首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到18条相似文献,搜索用时 218 毫秒
1.
构造了两个单流单音素的动态贝叶斯网络(DBN)模型,以实现基于音频和视频特征的连续语音识别,并在描述词和对应音素具体关系的基础上,实现对音素的时间切分。实验结果表明,在基于音频特征的识别率方面:在低信噪比(0~15dB)时,DBN模型的识别率比HMM模型平均高12.79%;而纯净语音下,基于DBN模型的音素时间切分结果和三音素HMM模型的切分结果很接近。对基于视频特征的语音识别,DBN模型的识别率比HMM识别率高2.47%。实验最后还分析了音视频数据音素时间切分的异步关系,为基于多流DBN模型的音视频连续语音识别和确定音频和视频的异步关系奠定了基础。  相似文献   

2.
基于三音素动态贝叶斯网络模型的大词汇量连续语音识别   总被引:1,自引:0,他引:1  
考虑连续语音中的协同发音现象,基于词-音素结构的DBN(WP-DBN)模型和词-音素-状态结构的DBN(WPS-DBN)模型,引入上下文相关的三音素单元,提出两个新颖的单流DBN模型:基于词-三音素结构的DBN(WT-DBN)模型和基于词-三音素-状态的DBN(WTS-DBN)模型.WTS-DBN模型是三音素模型,识别基元为三音素,以显式的方式模拟了基于三音素状态捆绑的隐马尔可夫模型(HMM).大词汇量语音识别实验结果表明:在纯净语音环境下,WTS-DBN模型的识别率比HMM,WT-DBN,WP-DBN和WPS-DBN模型的识别率分别提高了20.53%,40.77%,42.72%和7.52%.  相似文献   

3.
面向语音合成的维吾尔语音素自动切分算法研究   总被引:2,自引:0,他引:2  
结合维吾尔语语音特征,以建立维吾尔音素语料库为目标,为了减少人工工作量,通过HTK工具实现了音素的自动切分算法:首先完成了文本设计、录音和手动标注等准备工作,设计了上下文属性集,通过训练获得了每个音素的HMM模型,随后对任意输入的语音句子进行了其音素构成部分的自动切分,最后分析了其切分准确度、存在的问题及对策等。实践表明,在语料库的建设中,该研究策略确实节省了大量的时间和人力成本,提高了语音语料库标注信息的一致性和准确性。  相似文献   

4.
藏语语音合成及语音学研究中,经常需要切分音素。人工切分费时费力,但是由于藏语语料缺乏,训练的藏语声学模型不够精确和鲁棒,自动切分的音素边界不够准确。以藏语拉萨方言为研究对象,在确定拉萨方言音素集、建立拉萨方言发音词典的基础上,通过计算音素模型间的距离,确定了拉萨方言和英语的共同音素,融合拉萨方言和英语GMM HMM模型,并自动判断语音中的静音和短时停顿,构造语音对应的词网络,查询发音词典,将词网络扩展为模型(音素)网络,使用Viterbi算法将每一帧特征参数对应到模型的每一个状态上,进而对音素进行切分。实验表明,切分效果要优于单纯的藏语模型方法。  相似文献   

5.
提出了一种基于迭代的有限状态转录机的字素音素转换学习语料库的自动生成算法,针对一个德语单词及其音标进行基于形态规则和语音规则的迭代,最终将字素和音素完全一一对应起来。经过抽样测试,算法的正确率可以达到98.6%。  相似文献   

6.
基于动态贝叶斯网络的语音识别及音素切分研究   总被引:1,自引:1,他引:0  
研究了一种基于动态贝叶斯网络(dynamic bayesian networks, DBN)的语音识别建模方法,利用GMTK(graphical model tool kits)工具构建音素级音频流DBN语音训练和识别模型,同时与传统的基于隐马尔可夫的语音识别结果进行比较,并给出词与音素的切分结果.实验表明,在各种信噪比测试条件下,基于DBN的语音识别结果与基于HMM的语音识别结果相当,并表现出一定的抗噪性,音素的切分结果也比较准确.  相似文献   

7.
基于HMM模型的语音单元边界的自动切分   总被引:1,自引:0,他引:1  
基于隐尔马可夫模型(HMM)的强制对齐方法被用于文语转换系统(TTS)语音单元边界切分.为提高切分准确性,本文对HMM模型的特征选择,模型参数和模型聚类进行优化.实验表明:12维静态Mel频率倒谱系数(MFCC)是最优的语音特征;HMM模型中的状态模型采用单高斯;对于特定说话人的HMM模型,使用分类与衰退树(CART)聚类生成的绑定状态模型个数在3 000左右最优.在英文语音库中音素边界切分的实验中,切分准确率从模型优化前的77.3%提高到85.4%.  相似文献   

8.
字素音素转换是德语自然语言处理中的难点之一。提出一种基于决策树的字素音素转换的监督学习算法。在一个字素音素平行语料库的基础上,通过决策树进行字素音素转换的监督学习,生成字素音素转换规则。经交叉测试,平均转换正确率可达98. 03%。  相似文献   

9.
考虑连续语音中的协同发音问题,提出基于词内扩展的单流上下文相关三音素动态贝叶斯网络(SS-DBN-TRI)模型和词间扩展的单流上下文相关三音素DBN(SS-DBN-TRI-CON)模型。SS-DBN-TRI模型是Bilmes提出单流DBN(SS-DBN)模型的改进,采用词内上下文相关三音素节点替代单音素节点,每个词由它的对应三音素单元构成,而三音素单元和观测向量相联系;SS-DBN-TRI-CON模型基于SS-DBN模型,通过增加当前音素的前音素节点和后音素节点,构成一个新的词间扩展的三音素变量节点,新的三音素节点和观测向量相联系,采用高斯混合模型来描述,采用数字连续语音数据库的实验结果表明:SS-DBN-TRI-CON具备最好的语音识别性能。  相似文献   

10.
为实现文本/语音驱动的说话人头部动画,本文提出基于贝叶斯切线形状模型的口形轮廓特征提取方法和基于动态贝叶斯网络(Dynamic Bayesian Network, DBN)模型的唇读系统。在描述词与它的组成视素关系的基础上,得到视素时间切分序列。为比较性能,音素DBN模型和HMM的音素识别结果被影射成视素序列。在评价准则上,提出绝对视素切分正确性和基于图像与嘴唇几何特征两种相对视素切分正确性的评价标准。实验表明,DBN模型识别性能优于HMM,而基于视素的DBN模型能为说话人头部动画提供最好的口形。  相似文献   

11.
The speech recognition system basically extracts the textual information present in the speech. In the present work, speaker independent isolated word recognition system for one of the south Indian language—Kannada has been developed. For European languages such as English, large amount of research has been carried out in the context of speech recognition. But, speech recognition in Indian languages such as Kannada reported significantly less amount of work and there are no standard speech corpus readily available. In the present study, speech database has been developed by recording the speech utterances of regional Kannada news corpus of different speakers. The speech recognition system has been implemented using the Hidden Markov Tool Kit. Two separate pronunciation dictionaries namely phone based and syllable based dictionaries are built in-order to design and evaluate the performances of phone-level and syllable-level sub-word acoustical models. Experiments have been carried out and results are analyzed by varying the number of Gaussian mixtures in each state of monophone Hidden Markov Model (HMM). Also, context dependent triphone HMM models have been built for the same Kannada speech corpus and the recognition accuracies are comparatively analyzed. Mel frequency cepstral coefficients along with their first and second derivative coefficients are used as feature vectors and are computed in acoustic front-end processing. The overall word recognition accuracy of 60.2 and 74.35 % respectively for monophone and triphone models have been obtained. The study shows a good improvement in the accuracy of isolated-word Kannada speech recognition system using triphone HMM models compared to that of monophone HMM models.  相似文献   

12.
在维吾尔语连续语音识别试验的声学层建模基础上,引用DDBHMM模型将上下文相关的三音子作为基本识别单元,并提出一种状态绑定的思想,对状态进行优化。为得到更充分的训练模型,提高识别效率,对语料库进行扩充,在多组对比试验的基础上,分析扩充前后对声学层识别速度、准确率等各个方面的影响。  相似文献   

13.
基于视频三音子的汉语双模态语料库的建立   总被引:2,自引:0,他引:2  
为实现可视语音合成和双模态语音识别,需要建立符合条件的双模态语料库。该文提出了一种汉语双模态语料库的建立方法。根据视频中唇部发音特征,对已有的三音子模型聚类,形成视频三音子。在视频三音子的基础上,利用评估函数对原始语料中的句子打分,并实现语料的自动选取。与其他双模态语料库相比,该文所建立的语料库在覆盖率、覆盖效率和高频词分布律有了较大改进,能够更加真实反映汉语中的双模态语言现象。  相似文献   

14.
15.
Building a continuous speech recognizer for the Bangla (widely used as Bengali) language is a challenging task due to the unique inherent features of the language like long and short vowels and many instances of allophones. Stress and accent vary in spoken Bangla language from region to region. But in formal read Bangla speech, stress and accents are ignored. There are three approaches to continuous speech recognition (CSR) based on the sub-word unit viz. word, phoneme and syllable. Pronunciation of words and sentences are strictly governed by set of linguistic rules. Many attempts have been made to build continuous speech recognizers for Bangla for small and restricted tasks. However, medium and large vocabulary CSR for Bangla is relatively new and not explored. In this paper, the authors have attempted for building automatic speech recognition (ASR) method based on context sensitive triphone acoustic models. The method comprises three stages, where the first stage extracts phoneme probabilities from acoustic features using a multilayer neural network (MLN), the second stage designs triphone models to catch context of both sides and the final stage generates word strings based on triphone hidden Markov models (HMMs). The objective of this research is to build a medium vocabulary triphone based continuous speech recognizer for Bangla language. In this experimentation using Bangla speech corpus prepared by us, the recognizer provides higher word accuracy as well as word correct rate for trained and tested sentences with fewer mixture components in HMMs.  相似文献   

16.
We describe the automatic determination of a large and complicated acoustic model for speech recognition by using variational Bayesian estimation and clustering (VBEC) for speech recognition. We propose an efficient method for decision tree clustering based on a Gaussian mixture model (GMM) and an efficient model search algorithm for finding an appropriate acoustic model topology within the VBEC framework. GMM-based decision tree clustering for triphone HMM states features a novel approach designed to reduce the overly large number of computations to a practical level by utilizing the statistics of monophone hidden Markov model states. The model search algorithm also reduces the search space by utilizing the characteristics of the acoustic model. The experimental results confirmed that VBEC automatically and rapidly yielded an optimum model topology with the highest performance.  相似文献   

17.
利用投票选择机制进行语音分割的新方法   总被引:1,自引:1,他引:0       下载免费PDF全文
针对在噪声背景下连续语音信号的语音分割性能会明显下降的问题,提出了一种针对连续语音信号分割的新方法。该方法不再采用单一的端点检测方法,而是将基于分形维数的端点检测方法,基于倒谱特征的端点检测方法,基于HMM的端点检测方法等多种不同方法下得到的端点检测结果,通过投票选择的方式,得到最终的端点检测结果,从而达到对连续语音信号进行分割的目的。实验结果表明,该方法较明显地提高了语音分割的准确性。  相似文献   

18.
针对有声出版物语音分割系统,提出了一种阈值自适应加相似度判决的系统分割模型,基于脚本中的先验知识提出了能量阈值自适应分割算法.对于传统的端点检测算法无法排除的干扰,为提高系统的抗干扰能力以增强其适用性,提出了基于语音单元相似性进行分析判决的新方法.测试结果表明,无干扰时,系统分割的正确率100%,每个语音文件包含两个人为干扰信号时,系统分割正确率98.8%,能够满足有声出版物语音自动分割的需要.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号