
1.

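程雪林 吴开政 李宗葛 《计算机工程》2003,29(12):93-95

Describes the use of a triphone model and fundamental frequency (F0) information to improve the recognition rate of Mandarin continuous digit strings. In Mandarin continuous digit-string recognition, "8" and "2" are easily confused, while "9" and "6" tend to have a "5" inserted at the end during recognition, turning them into "95" and "65". The triphone model distinguishes the same digit in different contexts and markedly improves the recognition rate. F0 reflects tone variation, and using it in post-processing further reduces the error rate.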
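To illustrate how a triphone model separates the same digit by context, the sketch below expands a digit string into context-dependent units. The toy lexicon (pinyin-like phones per digit) and the `left-center+right` label format are assumptions made for this example, not the unit inventory used in the paper.

```python
# Hypothetical toy lexicon: each Mandarin digit as a list of pinyin-like phones.
LEXICON = {
    "2": ["er"],
    "5": ["w", "u"],
    "6": ["l", "iu"],
    "8": ["b", "a"],
    "9": ["j", "iu"],
}

def to_triphones(digits):
    """Expand a digit string into left-center+right triphone labels."""
    phones = ["sil"]                      # leading silence
    for d in digits:
        phones.extend(LEXICON[d])
    phones.append("sil")                  # trailing silence
    # Every phone between the two silences gets a left and a right context.
    return [f"{phones[i - 1]}-{phones[i]}+{phones[i + 1]}"
            for i in range(1, len(phones) - 1)]

print(to_triphones("9"))
print(to_triphones("95"))
```

In this sketch the final of "9" is labeled `j-iu+sil` when the digit ends the utterance and `j-iu+w` when a "5" follows, so the two contexts get separate models.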

2.

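A Mandarin Continuous Digit-String Speech Recognition System  Total citations: 1, self-citations: 2, other citations: 1

许海天 吴及 王作英 《计算机工程与应用》2002,38(2):97-98

Mandarin digit strings hold an important place in speech recognition. The paper designs and implements a practical Mandarin continuous digit-string speech recognition system, analyzes the high confusability of Mandarin digits, and proposes model improvements and a speaking-rate control strategy that give the system good overall performance.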

3.

Optimization of Triphone Models in Mandarin Continuous Speech Recognition Systems

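齐耀辉 潘复平 葛凤培 颜永红 《计算机应用研究》2013,30(10):2920-2922

The aim is to estimate the model parameters of tonal triphones more accurately before state clustering, and thereby improve the accuracy of the tied states after clustering and the recognition performance of the system. In Mandarin continuous speech recognition, some tonal triphones have very few training samples while their corresponding toneless triphones have relatively many. For these cases the paper proposes initializing each tonal triphone with the model parameters of its corresponding toneless triphone and training the model under the maximum a posteriori (MAP) criterion. Mandarin large-vocabulary continuous speech recognition experiments show that the method improves the pre-clustering model accuracy of sparse triphones in the training corpus and thus the recognition performance of the system.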
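The idea of starting from a better-trained prior and letting sparse data pull the estimate only slightly can be sketched with the textbook MAP update for a single Gaussian mean. This is an illustrative simplification, not the paper's HMM training procedure; the prior weight `tau` and the numbers are hypothetical.

```python
def map_mean(prior_mean, samples, tau=10.0):
    """MAP estimate of a Gaussian mean under a conjugate prior at prior_mean.

    tau is a hypothetical prior weight: roughly, how many observed samples
    the prior (here, the toneless-triphone mean) is worth.
    """
    n = len(samples)
    return (tau * prior_mean + sum(samples)) / (tau + n)

toneless_mean = 1.0                 # assumed prior from the well-trained model
sparse = [2.0, 2.0]                 # a tonal triphone with only two samples
plentiful = [2.0] * 200             # a tonal triphone with many samples

print(map_mean(toneless_mean, sparse))
print(map_mean(toneless_mean, plentiful))
```

With two samples the estimate stays close to the prior mean; with two hundred it moves almost all the way to the sample mean, which is the behavior that makes MAP training attractive for sparse units.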

4.

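An HMM-Based Uyghur Continuous Speech Recognition System

那斯尔江·吐尔逊 吾守尔·斯拉 《计算机应用》2009,29(7):2009-2011

Uyghur is an agglutinative language: rich affixes can generate an enormous vocabulary from the same stems, which makes research on Uyghur speech recognition very difficult. Taking the characteristics of Uyghur into account, the authors built a Uyghur continuous speech corpus and implemented a Uyghur continuous speech recognition system based on hidden Markov models (HMMs) with the HTK (HMM Tool Kit) toolkit. At the acoustic level, triphones were chosen as the basic recognition units and a Uyghur triphone acoustic model was built; decision trees, triphone tying, fixing of the silence models, and additional Gaussian mixture components were used to improve recognition accuracy. At the language level, a statistical bigram language model suited to the characteristics of Uyghur speech was used. Finally, Uyghur continuous speech recognition experiments were carried out with the system.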
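A statistical bigram language model of the kind mentioned above can be sketched as follows. This is a minimal maximum-likelihood version without smoothing, and the placeholder tokens stand in for real words; the paper's actual training tools and data are not described in the abstract.

```python
from collections import Counter

def train_bigram(sentences):
    """Return P(w2 | w1) estimated by maximum likelihood from tokenized sentences."""
    histories, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]    # sentence boundary markers
        histories.update(tokens[:-1])           # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return {pair: count / histories[pair[0]] for pair, count in bigrams.items()}

# Placeholder tokens; a real system would use words from the corpus.
model = train_bigram([["a", "b"], ["a", "c"]])
print(model[("a", "b")])
print(model[("<s>", "a")])
```

Unseen word pairs get no entry here, so a practical model would add smoothing or back-off.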

5.

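Syllable Segmentation of Connected Digit Strings Based on Inter-Frame Correlation and Mandarin Syllable Composition Rules

陈雁翔 戴蓓蒨 周曦 李辉 《模式识别与人工智能》2003,16(3)

This paper proposes a syllable segmentation method for continuous speech based on inter-frame correlation. The method uses the inter-frame correlation characteristic, which reflects how strongly the LPC coefficients of adjacent frames are correlated, together with its parameters to split the continuous speech stream into segments. Time-domain parameters are then used to label the phonemic nature of each segment, and the syllable segmentation and its boundaries are finally determined from Mandarin syllable composition rules. Syllable segmentation experiments on Mandarin digit-string speech demonstrate the effectiveness of the method.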

6.

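Research on Continuous Digit-String Recognition Based on a Hierarchical Strategy

汤霖 蔡莲红 《计算机工程与应用》2003,39(21):83-86

Based on an analysis of the speech characteristics of Mandarin digit strings, a continuous digit-string recognition system built on a hierarchical strategy is designed. The system first performs a deterministic pre-segmentation of the continuous digit string, then applies the Level Building algorithm to each segment for recognition based on fuzzy grouping of templates, and finally uses a weighted-matrix recognition algorithm on that result to further separate easily confused speech pairs. The system cuts computation time to 35.2% of the original while raising the recognition rate to 94.08%.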

7.

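Tone Recognition for Mandarin Continuous Speech Based on Multi-Space Probability Distribution

倪崇嘉 刘文举 徐波 《计算机科学》2011,38(9):224-226

Mandarin is a tonal language, and tone information is very important in Mandarin speech recognition. A method that combines an embedded tone model with an explicit tone model is proposed for recognizing tones in Mandarin continuous speech. The method combines frame-by-frame F0 information with F0 information over longer durations to recognize tones. Experiments on the "863-Test" and "TestCorpus98" test sets show that it achieves tone recognition accuracies of 96.12% and 93.78%, respectively.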

8.

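Experiments and Research on Mandarin Connected-Word Speech Recognition

黄学东 方棣棠 胡起秀 《计算机应用与软件》1987,(5)

This paper presents a Mandarin connected-word speech recognition system implemented on a microcomputer. The system adopts the basic idea of the multi-level dynamic programming algorithm. The paper first introduces dynamic programming and multi-level dynamic programming, then discusses practical issues in applying them to Mandarin continuous digit-string recognition. For Mandarin digit strings of length 3 the average recognition rate is 90.9%; for strings of length 2 it reaches 98.5%.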
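The basic dynamic-programming step that such systems build on can be sketched as a plain dynamic time warping (DTW) distance between an input and one template. This shows only the single-level alignment, not the multi-level variant the paper uses, and the one-dimensional features are a simplification chosen for this example.

```python
def dtw_distance(a, b):
    """Dynamic-programming alignment cost between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b frame repeats
                                     cost[i][j - 1],      # b advances, a frame repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]

template = [1, 2, 3]
slow_utterance = [1, 2, 2, 3]       # same pattern spoken more slowly
print(dtw_distance(template, slow_utterance))
```

Because frames may repeat along the alignment path, the slower utterance matches the template at zero cost, which is why dynamic programming suits speech with variable speaking rate.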

9.

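A Decision-Tree-Based Triphone Model for Lhasa Tibetan

李冠宇 于洪志 李永宏 马宁 《计算机工程与科学》2013,35(9):146

The distribution of monophones and triphones in Lhasa Tibetan is surveyed statistically, and the need for context-dependent acoustic models in Tibetan large-vocabulary continuous speech recognition is analyzed. Phonemes are chosen as the modeling units and, in line with the characteristics of Tibetan, a pronunciation dictionary with the syllable as its unit is built. Several key issues and the basic algorithms for building triphone models with decision trees are discussed. Combining International Phonetic Alphabet classification with empirical knowledge, a set of 38 Lhasa Tibetan phone classes and the corresponding decision-tree question set are defined. A training corpus of 8,170 sentences from 20 speakers is built, and a decision-tree-based Lhasa Tibetan triphone model is built and trained on the HTK platform. Recognition results for different numbers of HMM states and Gaussian mixture components are analyzed, and a complete scheme for Tibetan large-vocabulary continuous speech recognition is established.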

10.

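Building a Mandarin Bimodal Corpus Based on Video Triphones  Total citations: 2, self-citations: 0, other citations: 2

赵晖 林成龙 唐朝京 《中文信息学报》2009,23(5):98-104

Visual speech synthesis and bimodal speech recognition require a suitable bimodal corpus. This paper proposes a method for building a Mandarin bimodal corpus. Existing triphone models are clustered according to the articulatory features of the lips in video, forming video triphones. On that basis, an evaluation function scores the sentences of the raw corpus and the corpus is selected automatically. Compared with other bimodal corpora, the corpus built here improves considerably in coverage, coverage efficiency, and high-frequency word distribution, and reflects bimodal language phenomena in Mandarin more faithfully.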

11.

From English pitch accent detection to Mandarin stress detection, where is the difference?

Chongjia Ni Wenju Liu Bo Xu 《Computer Speech and Language》2012,26(3):127-148

Although English pitch accent detection has been studied extensively, relatively few works explore Mandarin stress detection. Moreover, these counterpart tasks have not yet been compared and analyzed against each other. In this paper, we discuss Mandarin stress detection and compare it with English pitch accent detection. The paper makes two contributions. First, we use a classifier combination method to detect Mandarin stress and English pitch accent using acoustic, lexical and syntactic evidence. Our proposed method achieves better performance than the baseline system on both the Mandarin prosodic annotation corpus ASCCD and the English prosodic annotation corpus, the Boston University Radio News Corpus (BURNC). We also verify our proposed method on other prosodic annotation and continuous speech corpora. Second, we present a feature analysis: duration, pitch, energy and intensity features are compared for Mandarin stress detection and English pitch accent detection. Based on the analysis of prosodic annotation corpora, we also verify some linguistic conclusions.

12.

Detection of time varying pitch in tonal languages: an approach based on ensemble empirical mode decomposition

Hong HONG Xiao-hua ZHU Wei-min SU Run-tong GENG Xin-long WANG 《浙江大学学报:C卷英文版》2012,(2):139-145

A method based on ensemble empirical mode decomposition (EEMD) is proposed for accurately detecting the time varying pitch of speech in tonal languages. Unlike frame-, event-, or subspace-based pitch detectors, the time varying information of pitch within the short duration, which is of crucial importance in speech processing of tonal languages, can be accurately extracted. The Chinese Linguistic Data Consortium (CLDC) database for Mandarin Chinese was employed as standard speech data for the evaluation of the effectiveness of the method. It is shown that the proposed method provides more accurate and reliable results, particularly in estimating the tones of non-monotonically varying pitches like the third one in Mandarin Chinese. Also, it is shown that the new method has strong resistance to noise disturbance.

13.

Discovering salient prosodic cues and their interactions for automatic story segmentation in Mandarin broadcast news

Lei Xie 《Multimedia Systems》2008,14(4):237-253

This paper investigates speech prosody for automatic story segmentation in Mandarin broadcast news. Prosodic cues effectively used in English story segmentation deserve a re-investigation since the lexical tones of Mandarin may complicate the expressions of pitch declination and reset. Our data-oriented study shows that story boundaries cannot be clearly discriminated from utterance boundaries by speaker normalized pitch features due to their large variations across different Mandarin syllable tones. We thus propose to use speaker- and tone-normalized pitch features that can provide clear separations between utterance and story boundaries. Our study also shows that speaker-normalized pause duration is quite effective to separate between story and utterance boundaries, while speaker-normalized speech energy and syllable duration are not effective. Experiments using decision trees for story boundary detection reinforce the difference between English and Chinese, i.e., speaker- and tone-normalized pitch features should be favorably adopted in Mandarin story segmentation. We show that the combination of different prosodic cues can achieve a very high F-measure of 93.04% due to the complementarity between pause, pitch and energy. Analysis of the decision tree uncovered five major heuristics that show how speakers jointly utilize pause duration and pitch to separate speech into stories.

14.

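Learning Mandarin Prosody Rules Based on Data Mining

朱廷劭 高文 《计算机学报》2000,23(11):1179-1183

Mandarin prosody rules matter for speech synthesis and phonetics research. To learn prosody rules more effectively, this paper uses data-mining techniques to acquire rules from a corpus. F0 patterns are extracted by cluster analysis and then used to discretize F0 sequences. The parameters of each syllable in the training sentences are derived from the results of linguistic analysis, and decision trees and neural networks learn the rules of prosodic variation for syllables. Tests show that data-mining-based prosody rule learning gives good results, confirming the effectiveness of the method.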

15.

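A New Pitch Smoothing Method for Mandarin Tone Recognition  Total citations: 13, self-citations: 0, other citations: 13

朱小燕 王昱 刘俊 《中文信息学报》2001,15(2):46-51

Mandarin is a tonal language, and tone can be described by the contour of the pitch. The traditional pitch smoothing methods (linear smoothing, median smoothing, and ordinary linear interpolation) do not cope well when a continuous pitch-frequency track contains random erroneous points. This paper proposes a new pitch smoothing method that obtains a more accurate pitch contour through search. The method is simple, reliable, fast, and efficient. Experiments show that it lowers the recognition error rate by about 40% compared with the traditional methods.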
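One way a traditional smoother can fall short is easy to reproduce. The sketch below is a plain 3-point median smoother, one of the traditional baselines named above and not the search-based method the paper proposes; the F0 values in hertz are made up for the illustration.

```python
from statistics import median

def median_smooth(f0, width=3):
    """Sliding-window median over an F0 track (edges use a truncated window)."""
    half = width // 2
    return [median(f0[max(0, i - half): i + half + 1]) for i in range(len(f0))]

# Hypothetical F0 tracks with pitch-doubling errors near 400 Hz.
isolated = [200, 202, 400, 204, 206]        # a single erroneous frame
burst = [200, 202, 400, 404, 204, 206]      # two erroneous frames in a row

print(median_smooth(isolated))
print(median_smooth(burst))
```

The isolated spike disappears, but the two adjacent errors outvote their neighbors inside the window and survive smoothing, which is the kind of case that motivates a more careful method.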

16.

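A Mandarin Voice Conversion Method Using a Tone Mapping Codebook  Total citations: 3, self-citations: 0, other citations: 3

左国玉 刘文举 阮晓钢 《数据采集与处理》2005,20(2):144-149

Alongside a Gaussian mixture model that transforms the spectral envelope of the speaker's speech, a Mandarin tone codebook mapping technique is proposed to make the converted speech lean further toward the characteristics of the target speaker. F0 contours of Mandarin monosyllables are extracted from the source and target speech as F0 transformation units. After preprocessing and clustering they form source and target tone codebooks, and a tone-pattern mapping codebook from the source feature space to the target feature space is built according to the time-alignment principle. Voice conversion experiments evaluate the tone codebook mapping algorithm. The results show that the algorithm captures the mapping between the F0 contours of the source and target speakers well and improves voice conversion performance.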

17.

A Diachronic Comparative Analysis of Speech in Film Dialogue  Total citations: 1, self-citations: 1, other citations: 0

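王燕 侯敏 邹煜 《计算机工程与应用》2011,47(22):6-9

Putonghua has been in use for nearly a century, yet few systematic experimental studies have examined its diachronic phonetic changes and prosodic features. Taking representative radio and television talk-show speech from 2005 as baseline data, the study selects films of the same title shot in the 1950s and the 1970s from the 《现代汉语普通话数字化样本库》 and compares pitch, duration, and other phonetic features of the main characters' dialogue across time. The analysis finds that, in audio media such as broadcasting, television, and film, the mean syllable duration of 1970s speech is longer than that of the 1950s, most markedly for the yinping (first) tone. In pitch, both the high and low points are higher than in the 1950s, and the pitch range is wider. This suggests that 1970s speech sounds more exaggerated and less natural in articulation, which is related to the particular historical period of the 1960s and 1970s.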

18.

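Decision-Tree-Based F0 Model Extraction in a Chinese Text-to-Speech System

谢崇文 柴佩琪 《微型电脑应用》2007,23(7):4-7

Mandarin is a tonal language, and F0 is a very important parameter when selecting units in a TTS system. To select speech units by this acoustic parameter, a mapping between textual context information and F0 contours, that is, an F0 model, must be established. This paper extracts the model with a decision-tree method and applies it to a Mandarin text-to-speech system.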

19.

Tone Modeling for Continuous Mandarin Speech Recognition

Yang Cao Shuwu Zhang Taiyi Huang Bo Xu 《International Journal of Speech Technology》2004,7(2-3):115-128

Tone study is very important for Mandarin speech recognition. In this paper, a Mixture Stochastic Polynomial Tone Model (MSPTM) is proposed for tone modeling in continuous Mandarin speech. In this model the pitch contour, main representative of tone pattern, is described as a mixed stochastic trajectory. The mean trajectory is represented by a polynomial function of normalized time while the variance is time varying. Effective training and tone recognition algorithms were developed. The experimental results based on the proposed MSPTM showed 40.7% tone recognition error rate reduction relative to the traditional Hidden Markov Model (HMM) tone model. We also present a decision tree based approach to learning the tone pattern variation in continuous speech. The phonetic and linguistic factors that may affect the tone patterns were taken into consideration while constructing the tree. After the tree was established, 28 different tone patterns were obtained. We found that in addition to the tone of the neighboring syllable, Consonant/Vowel type of the syllable and the position of the syllable in the utterance also made important contributions to tone pattern variations in continuous speech. Finally, a new approach of integrating tone information into the search process at word level is discussed. Experiments on continuous Mandarin speech recognition showed that the new tone model and tone information integration method were efficient, achieving a 16.2% relative character error rate reduction.