Similar Articles
1.
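An algorithm is designed for segmenting the initials and finals of cleft palate speech. Subjective waveform inspection and objective F-tests and t-tests show that cleft palate speech differs significantly from normal speech. Syllables whose initial has the characteristics of an unvoiced phoneme are defined as Class I syllables, and syllables whose initial has the characteristics of a voiced phoneme are defined as Class II syllables. Class I and Class II syllables are first identified automatically with a hierarchical clustering model. A voiced-likeness weight function and an unvoiced-likeness probability function are then defined to perform first-level initial/final segmentation of Class I syllables, and second-level segmentation is obtained from the first-order difference of the number of peaks of the short-time autocorrelation function. For Class II syllables, initial/final segmentation exploits the waveform differences between initials and finals by detecting energy jump points of the short-time autocorrelation function. Large-sample experiments show that the proposed automatic initial/final discrimination algorithm for cleft palate speech is highly accurate: 90.72% for Class I syllables and 92.90% for Class II syllables.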
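The second-level split of Class I syllables relies on how the number of short-time autocorrelation peaks changes from frame to frame: a noise-like unvoiced initial produces almost no strong peaks, while a periodic final produces several. A minimal sketch of that idea, assuming 16 kHz audio, 25 ms frames with a 10 ms hop, and a hypothetical peak threshold of 0.3; the function names are illustrative and this is not the authors' implementation.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def autocorr_peak_count(frame, threshold=0.3):
    """Count local maxima (lags >= 1) of the normalized short-time
    autocorrelation that exceed `threshold` (hypothetical value)."""
    frame = frame - frame.mean()
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1 :]  # lags 0..N-1
    if r[0] <= 0:
        return 0
    r = r / r[0]
    inner = r[1:-1]
    peaks = (inner > r[:-2]) & (inner > r[2:]) & (inner > threshold)
    return int(peaks.sum())

def second_level_boundary(x, frame_len=400, hop=160):
    """Return (boundary sample, per-frame peak counts). The boundary is placed
    at the frame where the first-order difference of the peak count is largest."""
    counts = np.array([autocorr_peak_count(f) for f in frame_signal(x, frame_len, hop)])
    diff = np.diff(counts)            # first-order difference across frames
    k = int(np.argmax(diff)) + 1      # frame where the count jumps the most
    return k * hop, counts
```

On a synthetic "noise then 200 Hz tone" signal, the peak count stays near zero over the noise and jumps to a few peaks once the tone starts, so the detected boundary lands near the junction.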

2.
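A cartoon facial animation method driven jointly by speech and text is proposed. A syllable-viseme parameter library for cartoon faces is built, and unsupervised cluster analysis of the syllable-viseme parameters yields 32 basic viseme types. Syllables are segmented using the text information to obtain accurate duration parameters. By combining the basic viseme types with the speech duration parameters, the input speech/text can be rendered as continuously concatenated animation. Experiments on 100 entertaining speech/text clips collected from films and TV shows show that the proposed method overcomes the shortcomings of purely speech-driven or purely text-driven approaches and produces good cartoon facial animation.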

3.
张扬, 赵晓群, 王缔罡. Journal of Computer Applications (计算机应用), 2016, 36(5): 1410-1414
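Research on syllable segmentation of natural Chinese speech has clear practical value: a sufficiently accurate segmentation method could replace manual annotation of speech for which a reference transcript is available. To date, however, no fully accurate method for segmenting Chinese speech into syllables exists. Based on two assumptions, that the durations of Chinese syllables produced in the same speaking context follow a Gaussian distribution and that a short-time energy valley lies between adjacent syllables, a Chinese syllable segmentation method based on Gaussian fitting of syllable durations is proposed. Analysis of the algorithm shows that the short-time energy valleys found by the initial segmentation are scattered across the individual speech segments; exploiting this property, a simplified algorithm is proposed that effectively reduces the time complexity of the method. Experimental results show that segmentation accuracy (the mean squared distance to manually annotated boundary times) reaches the third decimal place, and the running time in Matlab on a desktop computer never exceeds 1 s, which meets application requirements.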
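The core idea, choosing among short-time energy valleys the boundaries whose implied syllable durations best fit a Gaussian, can be illustrated with a plain dynamic program. This sketch assumes the number of syllables is known from the reference text and that the duration mean `mu` and standard deviation `sigma` (in frames) were estimated beforehand; it is not the authors' algorithm, and their simplified variant is not reproduced.

```python
import numpy as np

def short_time_energy(x, frame_len, hop):
    """Sum of squared samples in each frame."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.array([np.sum(x[i * hop : i * hop + frame_len] ** 2) for i in range(n)])

def energy_valleys(energy):
    """Frame indices that are strict local minima of the energy curve."""
    e = energy
    return [i for i in range(1, len(e) - 1) if e[i] < e[i - 1] and e[i] < e[i + 1]]

def gaussian_fit_segmentation(valleys, n_frames, n_syllables, mu, sigma):
    """Pick n_syllables - 1 boundaries from the valley frames maximizing the total
    Gaussian log-likelihood of the syllable durations (in frames).
    Requires at least n_syllables - 1 valleys."""
    cands = [0] + sorted(valleys) + [n_frames]
    m = len(cands)

    def ll(d):
        # Constant -log(sigma*sqrt(2*pi)) is dropped: every path has n_syllables terms.
        return -0.5 * ((d - mu) / sigma) ** 2

    NEG = -np.inf
    best = np.full((n_syllables + 1, m), NEG)   # best[k, j]: k syllables ending at cands[j]
    back = np.zeros((n_syllables + 1, m), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, n_syllables + 1):
        for j in range(1, m):
            for i in range(j):
                if best[k - 1, i] == NEG:
                    continue
                s = best[k - 1, i] + ll(cands[j] - cands[i])
                if s > best[k, j]:
                    best[k, j], back[k, j] = s, i
    # Backtrack from the end of the utterance.
    bounds, j = [], m - 1
    for k in range(n_syllables, 0, -1):
        j = back[k, j]
        bounds.append(cands[j])
    return sorted(bounds)[1:]  # drop the leading 0
```

For example, with valleys at frames 20, 25, 40, 60, and 63 in an 80-frame utterance of four syllables and `mu=20`, the program prefers the evenly spaced boundaries 20, 40, 60.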

4.
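Based on a large-scale speech database, this study builds a duration prediction model for initials and finals in continuous Mandarin speech, examines the factors that affect initial and final durations, and explores how the types of duration variation of Mandarin initials and finals in continuous speech relate to the prosodic structure of utterances. Preliminary results show that prosodic structure has little effect on initial duration but a significant effect on final duration. This effect appears mainly in whether the final of the last syllable of a prosodic unit is significantly lengthened, which is closely tied to prosodic structure: syllables at the end of major prosodic phrases and intonational phrases usually show significant final lengthening, whereas syllables inside or at the end of prosodic words usually do not. Syllables at the end of minor prosodic phrases behave inconsistently in final duration, with no clear pattern, and may need to be further subdivided.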

5.
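A Merging-Based Syllable Detection Automaton for Continuous Chinese Speech Recognition (cited 4 times: 4 by others, 0 self-citations)
张继勇, 郑方, 杜术, 宋战江, 徐明星. Journal of Software (软件学报), 1999, 10(11): 1212-1215
This paper studies and implements an automatic syllable segmentation algorithm for continuous Chinese speech, the merging-based syllable detection automaton (MBSDA). MBSDA uses several feature parameters, including short-time energy, zero-crossing rate, and pitch period, and merges adjacent frames (one or more) whose feature parameters are highly similar into merged similar segments (MSS), each of which is taken to belong to the same state of the same syllable. These MSSs are then passed through a "syllable detection automaton (…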
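The merging step can be illustrated with a greedy sketch that groups adjacent frames whose features stay close to the running segment mean. Assumptions: only short-time log energy and zero-crossing rate are used (pitch period is omitted for brevity), the per-dimension tolerance `tol` is a hypothetical parameter, and the automaton that consumes the merged segments is not modeled.

```python
import numpy as np

def frame_features(x, frame_len=400, hop=160):
    """Per-frame features: [log10 energy, zero-crossing rate]."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    feats = []
    for i in range(n):
        f = x[i * hop : i * hop + frame_len]
        log_e = np.log10(np.sum(f ** 2) + 1e-10)
        zcr = np.mean(np.abs(np.diff(np.sign(f))) > 0)  # fraction of sign changes
        feats.append([log_e, zcr])
    return np.array(feats)

def merge_similar_frames(feats, tol):
    """Greedy left-to-right merging: a frame joins the current segment if every
    feature is within `tol` (hypothetical, per dimension) of the segment mean;
    otherwise it starts a new segment. Returns (start, end) ranges, end exclusive."""
    segments, start = [], 0
    mean, count = feats[0].astype(float), 1      # astype makes a copy
    for i in range(1, len(feats)):
        if np.all(np.abs(feats[i] - mean) <= tol):
            count += 1
            mean += (feats[i] - mean) / count    # running mean of the segment
        else:
            segments.append((start, i))
            start, mean, count = i, feats[i].astype(float), 1
    segments.append((start, len(feats)))
    return segments
```

Using the running mean rather than only the previous frame keeps slow drifts from chaining very different frames into one segment.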

6.
张扬, 赵晓群, 王缔罡. Journal of Computer Applications (计算机应用), 2016, 36(11): 3222-3228
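A reasonably accurate speech segmentation method can greatly improve the efficiency of tasks such as corpus annotation and helps align speech with models in applications such as speech recognition. A new Chinese syllable segmentation method is designed using the two-dimensional time-frequency energy characteristics of Chinese speech. Silent frames are detected with a conventional method; unvoiced frames are detected from the two-dimensional energy across different frequencies at the same time; voiced frames and speech-present frames are detected from the binary (0-1) two-dimensional energy of a specific frequency band across different times. The four decisions are then combined to locate the syllable boundaries. Experimental results show that the method segments more accurately than the merging-based syllable detection automaton (MBSDA) and the Gaussian fitting method, with a syllable segmentation error of 0.0297 s and a syllable segmentation deviation rate of 7.93%.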
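A simplified per-frame classifier in the spirit of this method: silence is decided from overall frame energy, and unvoiced frames from how the energy of a single frame is spread across frequencies. The thresholds (`silence_db`, `split_hz`, `hf_ratio`) are hypothetical, and the 0-1 energy map across time used for voiced and speech-present frames is not reproduced; here "voiced" is simply the remaining class.

```python
import numpy as np

def classify_frames(x, sr=16000, frame_len=400, hop=160,
                    silence_db=-40.0, split_hz=3000.0, hf_ratio=0.5):
    """Label each frame 'sil', 'unvoiced' or 'voiced' (thresholds are hypothetical).
    - 'sil': frame energy, in dB relative to the loudest frame, below silence_db
    - 'unvoiced': more than hf_ratio of the frame's spectral energy above split_hz
    - 'voiced': everything else"""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    # Power spectrum per frame: rows = time, columns = frequency.
    spec = np.array([np.abs(np.fft.rfft(window * x[i * hop : i * hop + frame_len])) ** 2
                     for i in range(n)])
    energy = spec.sum(axis=1) + 1e-12
    energy_db = 10 * np.log10(energy / energy.max())
    high = spec[:, freqs >= split_hz].sum(axis=1) / energy
    labels = []
    for e_db, h in zip(energy_db, high):
        if e_db < silence_db:
            labels.append("sil")
        elif h > hf_ratio:
            labels.append("unvoiced")
        else:
            labels.append("voiced")
    return labels
```

Syllable boundaries would then be placed where the label sequence changes in a way consistent with Chinese syllable structure (for example, silence or unvoiced frames before a voiced run).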

7.
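This paper proposes a syllable segmentation method for continuous speech streams based on inter-frame correlation. Inter-frame correlation features and parameters, which reflect how strongly the LPC coefficients of adjacent frames are correlated, are used to split the continuous speech stream into segments. Each segment is then labeled with its phoneme class using time-domain parameters, and finally the syllables and their boundaries are determined according to the rules of Chinese syllable structure. Syllable segmentation experiments on spoken Chinese digit strings demonstrate the effectiveness of the method.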
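Inter-frame LPC correlation can be sketched as follows: estimate LPC coefficients per frame with the autocorrelation method (Levinson-Durbin recursion), then take the Pearson correlation between the coefficient vectors of adjacent frames; dips suggest segment boundaries. The LPC order, frame size, and the use of Pearson correlation are assumptions, since the abstract does not define the exact correlation parameters.

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients a[1..order] via the autocorrelation method and the
    Levinson-Durbin recursion, for the predictor x[n] ~ -sum_k a[k] x[n-k]."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1 : len(frame) + order]
    if r[0] <= 0:                       # all-zero frame: no meaningful model
        return np.zeros(order)
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err                  # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k              # prediction error update
    return a[1:]

def interframe_lpc_correlation(x, order=10, frame_len=400, hop=160):
    """Pearson correlation between LPC vectors of each pair of adjacent frames.
    Assumes the signal contains no digital silence (see the guard in lpc)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    window = np.hamming(frame_len)
    coeffs = [lpc(window * x[i * hop : i * hop + frame_len], order) for i in range(n)]
    return np.array([np.corrcoef(coeffs[i], coeffs[i + 1])[0, 1] for i in range(n - 1)])
```

Frames drawn from the same stationary sound give nearly parallel LPC vectors and correlations close to 1; a change in spectral shape makes the correlation drop.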

8.
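Research on Speech Endpoint Detection Based on Changes in Fractal Features (cited once: 1 by others, 0 self-citations)
Endpoint detection is a basic problem in speech recognition. At a minimum it must distinguish noise from speech; if it can also segment syllables or even phonemes, it benefits language recognition, keyword spotting, and continuous speech recognition. This paper proposes an endpoint detection algorithm based on the box-counting dimension and the information dimension. Noise and speech segments are first separated using a threshold adapted from the information dimension; building on this, a Chinese syllable segmentation algorithm is derived from changes in the box-counting and information dimensions and from the characteristics of Chinese syllables. Tests on real telephone-channel speech data show that the proposed method is effective: speech segment detection accuracy is high, reaching 95%, while syllable segmentation accuracy reaches 85% and needs further work.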
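As an illustration of one of the two fractal features, the box-counting dimension of a short waveform frame can be estimated as below. The grid scales and the min-max column covering are assumptions for this sketch; the information dimension and the paper's adaptive thresholds are not reproduced. Smooth, tonal frames tend toward a dimension near 1, while noise-like frames fill more of the plane and approach 2.

```python
import numpy as np

def box_counting_dimension(frame, scales=(2, 4, 8, 16, 32)):
    """Estimate the box-counting dimension of a waveform drawn as a curve in the
    unit square. For an s x s grid, each column is covered by the boxes between
    the lowest and highest sample in that column; the dimension is the slope of
    log N(s) versus log s. The scales are a hypothetical choice."""
    y = np.asarray(frame, dtype=float)
    span = y.max() - y.min()
    y = (y - y.min()) / span if span > 0 else np.zeros_like(y)
    t = np.linspace(0.0, 1.0, len(y), endpoint=False)
    counts = []
    for s in scales:
        col = np.minimum((t * s).astype(int), s - 1)
        row = np.minimum((y * s).astype(int), s - 1)
        n = 0
        for c in range(s):
            rows = row[col == c]
            if rows.size:
                n += rows.max() - rows.min() + 1   # boxes spanned in this column
        counts.append(n)
    slope, _ = np.polyfit(np.log(scales), np.log(counts), 1)
    return slope
```

A straight ramp gives exactly one box per column at every scale, hence a slope of 1; white noise covers most boxes in each column and gives a markedly larger value.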

9.
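The dialect conversion system performs real-time voice conversion from Mandarin into the Jinan, Shenyang, and Xi'an dialects. Differences among northern dialects lie mainly in tone, and tone is a property of the syllable, so tone pattern conversion is carried out syllable by syllable. This work focuses on the key technique of the dialect conversion system: a syllable segmentation algorithm for continuous speech. An automaton-based stepwise syllable segmentation algorithm is proposed, consisting of three parts: speech-segment splitting, a syllable segmentation automaton, and automatic boundary correction. With an error tolerance of 48 ms the algorithm reaches an accuracy of 72.55%, and it successfully supports the F0 pattern conversion used in dialect conversion.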
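The 72.55% figure is an accuracy under a 48 ms tolerance. One common way to compute such a number is sketched below; the one-to-one greedy matching rule is an assumption, since the abstract does not say how detected and reference boundaries were paired.

```python
def boundary_accuracy(detected, reference, tol=0.048):
    """Fraction of reference boundaries (seconds) matched by a detected boundary
    within +/- tol. Each detected boundary is used at most once; each reference
    boundary takes the closest unused detection (an assumed matching rule)."""
    detected = sorted(detected)
    used = [False] * len(detected)
    hits = 0
    for ref in sorted(reference):
        best, best_d = None, None
        for i, d in enumerate(detected):
            if not used[i] and abs(d - ref) <= tol:
                if best is None or abs(d - ref) < best_d:
                    best, best_d = i, abs(d - ref)
        if best is not None:
            used[best] = True
            hits += 1
    return hits / len(reference) if reference else 0.0
```

For instance, detections at 0.10, 0.52, and 0.90 s against references at 0.12, 0.50, 0.80, and 1.20 s match two of four references, giving 0.5.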

10.
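The syllable is the basic unit of Thai word formation and pronunciation, and Thai syllable segmentation is important for research on Thai morphological analysis, speech synthesis, and speech recognition. Taking the structure of Thai syllables into account, a Thai syllable segmentation method based on conditional random fields (CRFs) is proposed. The method defines features from letter categories and letter positions and uses a CRF to sequence-label the letters of a Thai sentence, thereby segmenting it into syllables. A Thai syllable segmentation corpus was annotated on the basis of the InterBEST 2009 Thai corpus. Experiments on this corpus show that the method makes effective use of letter category and position information, achieving a precision, recall, and F-score of 99.115%, 99.284%, and 99.199%, respectively.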
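The sequence-labeling formulation can be made concrete by giving each letter a feature dictionary built from letter categories and position, and by turning syllable boundaries into per-letter B/I tags. The coarse category table (by Unicode code point), the exact feature set, and the B/I scheme are hypothetical choices for illustration. CRF training itself would be done with a toolkit that accepts one feature dict per token (for example, sklearn-crfsuite) and is not shown.

```python
def thai_letter_class(ch):
    """Hypothetical coarse letter categories for Thai, by Unicode code point."""
    cp = ord(ch)
    if 0x0E01 <= cp <= 0x0E2E:
        return "CONS"                     # consonants
    if 0x0E40 <= cp <= 0x0E44:
        return "LEAD_VOWEL"               # vowels written before the consonant
    if 0x0E30 <= cp <= 0x0E3A or cp in (0x0E45, 0x0E47):
        return "VOWEL"
    if 0x0E48 <= cp <= 0x0E4B:
        return "TONE"                     # tone marks
    if cp in (0x0E4C, 0x0E4D, 0x0E4E):
        return "MARK"
    return "OTHER"

def letter_features(text, i):
    """Feature dict for letter i: its category, neighboring categories, position."""
    def cls(j):
        return thai_letter_class(text[j]) if 0 <= j < len(text) else "PAD"
    return {
        "char": text[i],
        "cls": cls(i),
        "cls-1": cls(i - 1),
        "cls+1": cls(i + 1),
        "cls-1|cls": cls(i - 1) + "|" + cls(i),
        "is_first": i == 0,
        "is_last": i == len(text) - 1,
    }

def syllables_to_tags(syllables):
    """['ab', 'c'] -> ['B', 'I', 'B']: 'B' marks the first letter of a syllable."""
    return ["B" if k == 0 else "I" for syl in syllables for k in range(len(syl))]

def tags_to_syllables(text, tags):
    """Inverse of syllables_to_tags: rebuild syllables from per-letter tags."""
    out = []
    for ch, tag in zip(text, tags):
        if tag == "B" or not out:
            out.append(ch)
        else:
            out[-1] += ch
    return out
```

Training pairs would be `[letter_features(s, i) for i in range(len(s))]` with `syllables_to_tags(gold_syllables)`; at test time the predicted tags are mapped back with `tags_to_syllables`.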

11.
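Chinese is a tonal language, so tone information plays a crucial role in Chinese speech recognition. How to use tone information effectively within the existing hidden Markov model (HMM) framework remains an open question. Existing Chinese speech recognition systems use tone information mainly in two ways: the Embedded Tone Model, in which tone feature vectors and acoustic feature vectors are combined into a single stream to train the models, and the Explicit Tone Model, in which tone is modeled separately and the model is then used to refine the original decoding network. This paper unifies the two approaches: an Embedded Tone Model trained with two streams rather than one first produces N-best candidates, and an Explicit Tone Model with left-context-dependent tone modeling then rescores the N-best list to obtain the recognition result, improving performance. Compared with a conventional toneless model, the method improves recognition accuracy by more than 3.0% absolute on average, and by 5.36% absolute on the third test set.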
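The second pass amounts to re-ranking the first-pass N-best list with an extra tone-model score. The log-linear combination and the weight `tone_weight` below are assumptions (the abstract does not give the combination rule), and the scores are placeholders rather than outputs of a real recognizer.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    first_pass_score: float   # log score from the first (two-stream embedded-tone) pass
    tone_score: float         # log score from an explicit tone model

def rescore_nbest(nbest, tone_weight=0.5):
    """Re-rank hypotheses best-first by first_pass_score + tone_weight * tone_score.
    tone_weight is a hypothetical tuning parameter."""
    return sorted(nbest,
                  key=lambda h: h.first_pass_score + tone_weight * h.tone_score,
                  reverse=True)
```

A hypothesis that is slightly worse acoustically but much better tonally can move to the top, which is how tone information corrects errors among homophone-like candidates.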

12.
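Based on statistics from Putonghua (Standard Mandarin) proficiency test data, this paper analyzes how dialect background and academic discipline affect Putonghua proficiency, finding that students in literature-related disciplines score higher while the other disciplines are roughly on par. It examines the syllables most often mispronounced in the test and analyzes the main causes of lost points and the main learning difficulties, such as flat-tongue (dental) fricatives, retroflex fricatives, alveolo-palatal fricatives, nasal finals, and the third tone. Correlation statistics on the examiners' subjective scores show that the ratings of different examiners are highly correlated, so the scores are objective.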

13.
This paper investigates speech prosody for automatic story segmentation in Mandarin broadcast news. Prosodic cues used effectively in English story segmentation deserve re-investigation, since the lexical tones of Mandarin may complicate the expression of pitch declination and reset. Our data-oriented study shows that speaker-normalized pitch features cannot clearly discriminate story boundaries from utterance boundaries, owing to their large variations across different Mandarin syllable tones. We therefore propose speaker- and tone-normalized pitch features, which provide clear separation between utterance and story boundaries. Our study also shows that speaker-normalized pause duration is quite effective at separating story boundaries from utterance boundaries, while speaker-normalized speech energy and syllable duration are not. Experiments using decision trees for story boundary detection reinforce the difference between English and Chinese: speaker- and tone-normalized pitch features should be favored in Mandarin story segmentation. We show that combining different prosodic cues achieves a very high F-measure of 93.04%, owing to the complementarity of pause, pitch, and energy. Analysis of the decision tree uncovered five major heuristics that show how speakers jointly use pause duration and pitch to separate speech into stories.
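Speaker- and tone-normalized pitch features can be computed as z-scores within each (speaker, tone) group. Using per-syllable mean log-F0 and z-score normalization is an assumption about the exact normalization; the point is that a high-pitched tone 1 syllable and a low tone 3 syllable are each compared only with syllables of the same tone from the same speaker.

```python
import numpy as np
from collections import defaultdict

def speaker_tone_normalize(f0, speakers, tones):
    """z-score each syllable's mean F0 (in log domain) within its (speaker, tone)
    group. f0: per-syllable mean F0 in Hz; speakers, tones: parallel labels."""
    logf0 = np.log(np.asarray(f0, dtype=float))
    groups = defaultdict(list)
    for i, key in enumerate(zip(speakers, tones)):
        groups[key].append(i)
    z = np.zeros_like(logf0)
    for idx in groups.values():
        vals = logf0[idx]
        std = vals.std()
        z[idx] = (vals - vals.mean()) / (std if std > 0 else 1.0)  # guard singleton groups
    return z
```

Speaker-only normalization would instead group by speaker alone, leaving the tone-dependent offsets that the abstract identifies as the source of confusion.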

14.
An experimental Mandarin dictation machine for entering Mandarin speech (spoken Chinese) into computers is described. Because of the special characteristics of the Chinese language, syllables are chosen as the basic units for dictation. The machine follows a hierarchical language recognition approach: acoustic signals are first recognized as a sequence of syllables, possible word hypotheses are then formed from the syllables, and complete sentences are finally obtained. This approach is implemented by two subsystems. The first recognizes syllables using speech signal processing techniques; the second identifies the exact characters from the syllables and corrects syllable recognition errors. The detailed syllable recognition algorithms, word formation rules, parser, grammar, and syntactic checking algorithms are described. With newspaper text in the form of isolated syllables as input, preliminary test results indicate that such a dictation machine is not only practically attractive but also technically feasible.

15.
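Tone assessment in continuous Chinese speech is an important part of Chinese pronunciation assessment. Exploiting the close relationship between prosodic coupling effects in continuous speech and prosodic structure, tone models are built with the prosodic word as the basic modeling unit, using HMMs based on multi-space probability distributions (MSD-HMM). This raises the tone recognition rate of a Putonghua proficiency assessment system on standard continuous speech from 82.0% to 84.6%; for non-standard pronunciation by speakers with a dialect background, the correlation between machine and expert scores improves by more than 3.0% absolute.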
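An MSD-HMM treats F0 as living in two spaces: a zero-dimensional space for unvoiced frames, which carry no F0 value, and a one-dimensional continuous space for voiced frames. A minimal sketch of one state's output probability, assuming log-F0 observations and a single Gaussian in the voiced space; the prosodic-word modeling units and model training are not shown.

```python
import math

def msd_f0_likelihood(obs, w_voiced, mean, var):
    """Output probability of one MSD state for a single F0 observation.
    obs is None for an unvoiced frame (zero-dimensional space) or a log-F0 value
    for a voiced frame (1-D Gaussian space). Space weights sum to 1."""
    if obs is None:
        return 1.0 - w_voiced
    gauss = math.exp(-0.5 * (obs - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)
    return w_voiced * gauss
```

Because both voiced and unvoiced frames receive a proper probability, the same state sequence can be scored across voicing changes without interpolating F0 through unvoiced regions.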

16.
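Automatic recognition, annotation, and retrieval of broadcast speech is an interdisciplinary topic spanning speech technology, natural language processing, and information retrieval. After surveying research on automatic annotation and retrieval of broadcast speech and analyzing the key technologies involved, this paper proposes a multi-level automatic annotation framework for Mandarin broadcast speech and a speech retrieval scheme based on the multi-level annotations. It discusses annotation attributes at the document, sentence, and word levels, refines the attributes level by level with a recursive annotation method, and discusses issues critical to automatic speech annotation, such as the speech recognition engine and speech stream segmentation. Using the proposed method, 10 hours of Mandarin broadcast speech were annotated and retrieved, with fairly satisfactory results.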

17.
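For generating context-dependent labels in Chinese statistical parametric speech synthesis, a six-layer context-dependent label format is designed, covering the initial/final, syllable, word, prosodic word, prosodic phrase, and sentence layers. The input Chinese sentence is text-normalized and parsed to obtain its structure and word segmentation; grapheme-to-phoneme conversion yields the initial, final, and tone of each character; and the TBL (transformation-based error-driven learning) algorithm predicts the prosodic word and prosodic phrase boundaries of the input text. From this, the initial/final information and contextual structure of every character are obtained, producing the context-dependent labels needed for statistical parametric speech synthesis. An HMM-based statistical parametric Mandarin speech synthesis system using initials and finals as synthesis units is built, and subjective and objective experiments evaluate how different label information affects the quality of the synthesized speech. The results show that the richer the context-dependent label information, the better the quality of the synthesized speech.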

18.
Suprasegmental (prosodic) features of discourse provide a vehicle by which speakers convey their intentions to listeners. Generating suitable prosody information is critical to expressing messages and to improving the intelligibility and naturalness of synthetic speech. A generic prosody generator should provide speech synthesizers with information about pitch frequency (F0) contours, energy levels, word durations, and inter-word pause durations. The present study used a recurrent neural network (RNN) for prosody generation, with word-level and syllable-level linguistic features as inputs. To supply data efficiently to the RNN-based prosody generator in the training, validation, and test phases, phonemes were automatically segmented and labeled. The number of RNN inputs was reduced by using a binary gravitational search algorithm (BGSA) for feature selection (FS). The proposed prosody generator provides 12 output prosodic parameters for the current syllable, representing the pitch contour, log-energy contour, inter-syllable pause duration, syllable duration, duration of the vowel in the syllable, and vowel onset time. Experimental results demonstrated that the RNN-based prosody generator synthesizes these six prosodic elements with acceptable root mean square error (RMSE). With the BGSA-based FS unit, a lighter neural model was obtained with 53% fewer weight connections, and its RMSEs degraded only acceptably relative to the prosody generator without FS. The BGSA-based FS method was compared with a binary particle swarm optimization (BPSO) algorithm, and BGSA showed slightly better results. A modified mean opinion score scale was used to evaluate the intelligibility and naturalness of speech synthesized with the proposed method.

19.
Tone modeling is very important for Mandarin speech recognition. In this paper, a Mixture Stochastic Polynomial Tone Model (MSPTM) is proposed for tone modeling in continuous Mandarin speech. In this model the pitch contour, the main representative of the tone pattern, is described as a mixed stochastic trajectory: the mean trajectory is represented by a polynomial function of normalized time, while the variance is time-varying. Effective training and tone recognition algorithms were developed. Experimental results showed that the MSPTM reduces the tone recognition error rate by 40.7% relative to the traditional hidden Markov model (HMM) tone model. We also present a decision-tree-based approach to learning tone pattern variation in continuous speech. The phonetic and linguistic factors that may affect tone patterns were taken into account when constructing the tree, and the resulting tree yielded 28 different tone patterns. We found that, in addition to the tone of the neighboring syllable, the consonant/vowel type of the syllable and its position in the utterance also contribute significantly to tone pattern variation in continuous speech. Finally, a new approach to integrating tone information into the search process at the word level is discussed. Experiments on continuous Mandarin speech recognition showed that the new tone model and tone integration method were effective, achieving a 16.2% relative reduction in character error rate.
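The mean trajectory of the MSPTM is a polynomial in normalized time. The sketch fits such a polynomial to log-F0 contours with `numpy.polyfit` and averages the fits for one tone; the use of log-F0, the polynomial degree, and averaging fitted coefficients are assumptions, and the mixture components and time-varying variance of the actual model are omitted.

```python
import numpy as np

def fit_tone_trajectory(f0, degree=3):
    """Fit the log-F0 of one syllable as a polynomial in normalized time t in [0, 1].
    Returns coefficients, highest power first (np.polyfit convention)."""
    y = np.log(np.asarray(f0, dtype=float))
    t = np.linspace(0.0, 1.0, len(y))
    return np.polyfit(t, y, degree)

def mean_trajectory(contours, degree=3, n_points=20):
    """Mean polynomial trajectory over several syllables of the same tone,
    sampled at n_points normalized-time positions. Averaging the coefficients
    equals averaging the fitted polynomials, since polynomials are linear in them."""
    coeffs = np.mean([fit_tone_trajectory(c, degree) for c in contours], axis=0)
    t = np.linspace(0.0, 1.0, n_points)
    return np.polyval(coeffs, t)
```

Normalizing time to [0, 1] lets syllables of different durations share one trajectory model, which is what makes a single polynomial per tone pattern meaningful.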


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司), ICP registration 京ICP备09084417号