首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到18条相似文献,搜索用时 171 毫秒
1.
基于三音素动态贝叶斯网络模型的大词汇量连续语音识别   总被引:1,自引:0,他引:1  
考虑连续语音中的协同发音现象,基于词-音素结构的DBN(WP-DBN)模型和词-音素-状态结构的DBN(WPS-DBN)模型,引入上下文相关的三音素单元,提出两个新颖的单流DBN模型:基于词-三音素结构的DBN(WT-DBN)模型和基于词-三音素-状态的DBN(WTS-DBN)模型.WTS-DBN模型是三音素模型,识别基元为三音素,以显式的方式模拟了基于三音素状态捆绑的隐马尔可夫模型(HMM).大词汇量语音识别实验结果表明:在纯净语音环境下,WTS-DBN模型的识别率比HMM,WT-DBN,WP-DBN和WPS-DBN模型的识别率分别提高了20.53%,40.77%,42.72%和7.52%.  相似文献   

2.
构造了两个单流单音素的动态贝叶斯网络(DBN)模型,以实现基于音频和视频特征的连续语音识别,并在描述词和对应音素具体关系的基础上,实现对音素的时间切分。实验结果表明,在基于音频特征的识别率方面:在低信噪比(0~15dB)时,DBN模型的识别率比HMM模型平均高12.79%;而纯净语音下,基于DBN模型的音素时间切分结果和三音素HMM模型的切分结果很接近。对基于视频特征的语音识别,DBN模型的识别率比HMM识别率高2.47%。实验最后还分析了音视频数据音素时间切分的异步关系,为基于多流DBN模型的音视频连续语音识别和确定音频和视频的异步关系奠定了基础。  相似文献   

3.
构建了一种新的基于动态贝叶斯网络(Dynamic Bayesian Network,DBN)的异步整词-发音特征语音识别模型AWA-DBN(每个词由其发音特征的运动来描述),定义了各发音特征节点及异步检查节点的条件概率分布。在标准数字语音库Aurora5.0上的语音识别实验表明,与整词-状态DBN(WS-DBN,每个词由固定个数的整词状态构成)和整词-音素DBN(WP-DBN,每个词由其对应的音素序列构成)模型相比,WS-DBN模型虽然具有最高的识别率,但其只适用于小词汇量孤立词语音识别,AWA-DBN和WP-DBN可以为大词汇量连续语音建模,而AWA-DBN模型比WP-DBN模型具有更高的语音识别率和系统鲁棒性。  相似文献   

4.
提出一种基于词级区分性点过程模型的连续语音关键词检测方法。利用时间模式结构和多层感知器计算每个音素帧级后验概率,使用区分性点过程模型将一段时间内多个音素事件形成的点过程作为整体,把关键词检测看作二元分类问题,经分段和拼接构成超矢量,输入支持向量机分类器,判断该段语音是否为待检测关键词。该方法充分考虑语音信号上下文相关性,直接以词作为基本单元建模,提高了系统检测的准确性和鲁棒性。实验结果表明,对采样的语音,其关键词平均召回率和准确率分别可达71.5%和84.6%以上,并且结合相关语言模型知识,系统性能将会进一步提高。  相似文献   

5.
基于无监督预训练技术的wav2vec 2.0在许多低资源语种上获得了良好的性能,成为研究的热点。本文在预训练模型的基础上进行越南语连续语音识别。将语音学信息引入到基于链接时序分类代价函数(Connectionist temporal classification,CTC)的声学建模中,选取音素与含位置信息的音素作为基础单元。为了平衡建模单元数目以及模型的精细程度,采用字节对编码(Byte-pair encoding,BPE)算法生成音素子词,将上下文信息结合到声学建模过程。实验在美国NIST的BABEL任务低资源的越南语开发集上进行,所提算法相对wav2vec 2.0基线系统有明显改进,识别词错误率由37.3%降低到29.4%。  相似文献   

6.
藏语语音合成及语音学研究中,经常需要切分音素。人工切分费时费力,但是由于藏语语料缺乏,训练的藏语声学模型不够精确和鲁棒,自动切分的音素边界不够准确。以藏语拉萨方言为研究对象,在确定拉萨方言音素集、建立拉萨方言发音词典的基础上,通过计算音素模型间的距离,确定了拉萨方言和英语的共同音素,融合拉萨方言和英语GMM HMM模型,并自动判断语音中的静音和短时停顿,构造语音对应的词网络,查询发音词典,将词网络扩展为模型(音素)网络,使用Viterbi算法将每一帧特征参数对应到模型的每一个状态上,进而对音素进行切分。实验表明,切分效果要优于单纯的藏语模型方法。  相似文献   

7.
藏语拉萨话大词表连续语音识别声学模型研究   总被引:1,自引:0,他引:1       下载免费PDF全文
李冠宇  孟猛 《计算机工程》2012,38(5):189-191
根据藏语的特点,提出藏语拉萨话大词表连续语音识别声学模型,利用高层次的藏语语言知识减少模式匹配的模糊性。以音素和声韵母为声学建模单元,在HTK平台上建立上下文相关的连续隐马尔可夫声学模型,以实现藏语拉萨话特定人大词表连续语音识别。实验结果表明,在最优情况下,该模型词错误率只有7.8%。  相似文献   

8.
面向语音合成的维吾尔语音素自动切分算法研究   总被引:2,自引:0,他引:2  
结合维吾尔语语音特征,以建立维吾尔音素语料库为目标,为了减少人工工作量,通过HTK工具实现了音素的自动切分算法:首先完成了文本设计、录音和手动标注等准备工作,设计了上下文属性集,通过训练获得了每个音素的HMM模型,随后对任意输入的语音句子进行了其音素构成部分的自动切分,最后分析了其切分准确度、存在的问题及对策等。实践表明,在语料库的建设中,该研究策略确实节省了大量的时间和人力成本,提高了语音语料库标注信息的一致性和准确性。  相似文献   

9.
一种基于改进CP网络与HMM相结合的混合音素识别方法   总被引:2,自引:0,他引:2  
提出了一种基于改进对偶传播(CP)神经网络与隐驰尔可夫模型(HMM)相结合的混合音素识别方法.这一方法的特点是用一个具有有指导学习矢量量化(LVQ)和动态节点分配等特性的改进的CP网络生成离散HMM音素识别系统中的码书。因此,用这一方法构造的混合音素识别系统中的码书实际上是一个由有指导LVQ算法训练的具有很强分类能力的高性能分类器,这就意味着在用HMM对语音信号进行建模之前,由码书产生的观测序列中  相似文献   

10.
汉语连续语音识别之音素声学模型的改进   总被引:1,自引:1,他引:1  
研究基于主元音音素基元的声学模型的改进。由于汉语语音特点,主元音模型得到了广泛的应用。通过分析主元音音素模型,发现该模型存在词组音节序列字界线有歧义,从而提出主元音的改进方法以明确音节序列中字的分界,减小基元规模,提高语音系统识别率。为了描述连续语意中的协同发音现象,还针对改进后的主元音基元,设计了相应的有调问题集,利用决策树的参数共享策略建立了上下文相关的音素模型。实验结果表明,改进后的有调音素集合在削减了原有基元个数的基础上,字误识率(CER)有0.4%-0.6%的明显改善。  相似文献   

11.
构建一种基于发音特征的音视频双流动态贝叶斯网络(DBN)语音识别模型(AFAV_DBN),定义节点的条件概率关系,使发音特征状态的变化可以异步.在音视频语音数据库上的语音识别实验表明,通过调整发音特征之问的异步约束,AF- AV_DBN模型能得到比基于状态的同步和异步DBN模型以及音频单流模型更高的识别率,对噪声也具有...  相似文献   

12.
In automatic speech recognition, the phone has probably been a dominating sub-word unit for more than one decade. Context Dependent phone or triphone modeling accounts for contextual variations between adjacent phones and state tying addresses modeling of triphones that are not seen during training. Recently, syllable is gaining momentum as a new sub-word unit. Syllable being a larger unit than a phone addresses the severe contextual variations between phones within it. Therefore, it is more stable than a phone and models pronunciation variability in a systematic way. Tamil language has challenging features like agglutination and morpho-phonology. In this paper, attempts have been made to provide solutions to these issues by using the syllable as a sub-word unit in an acoustic model. Initially, a small vocabulary context independent word models and a medium vocabulary context dependent phone models are developed. Subsequently, an algorithm based on prosodic syllable is proposed and two experiments have been conducted. First, syllable based context independent models have been trained and tested. Despite large number of syllables, this system has performed reasonably well compared to context independent word models in terms of word error rate and out of vocabulary words. Subsequently, in the second experiment, syllable information is integrated in conventional triphone modeling wherein cross-syllable triphones are replaced with monophones and the number of context dependent phone models is reduced by 22.76% in untied units. In spite of reduction in the number of models, the accuracy of the proposed system is comparable to that of the baseline triphone system.  相似文献   

13.
Building a continuous speech recognizer for the Bangla (widely used as Bengali) language is a challenging task due to the unique inherent features of the language like long and short vowels and many instances of allophones. Stress and accent vary in spoken Bangla language from region to region. But in formal read Bangla speech, stress and accents are ignored. There are three approaches to continuous speech recognition (CSR) based on the sub-word unit viz. word, phoneme and syllable. Pronunciation of words and sentences are strictly governed by set of linguistic rules. Many attempts have been made to build continuous speech recognizers for Bangla for small and restricted tasks. However, medium and large vocabulary CSR for Bangla is relatively new and not explored. In this paper, the authors have attempted for building automatic speech recognition (ASR) method based on context sensitive triphone acoustic models. The method comprises three stages, where the first stage extracts phoneme probabilities from acoustic features using a multilayer neural network (MLN), the second stage designs triphone models to catch context of both sides and the final stage generates word strings based on triphone hidden Markov models (HMMs). The objective of this research is to build a medium vocabulary triphone based continuous speech recognizer for Bangla language. In this experimentation using Bangla speech corpus prepared by us, the recognizer provides higher word accuracy as well as word correct rate for trained and tested sentences with fewer mixture components in HMMs.  相似文献   

14.
在维吾尔语连续语音识别试验的声学层建模基础上,引用DDBHMM模型将上下文相关的三音子作为基本识别单元,并提出一种状态绑定的思想,对状态进行优化。为得到更充分的训练模型,提高识别效率,对语料库进行扩充,在多组对比试验的基础上,分析扩充前后对声学层识别速度、准确率等各个方面的影响。  相似文献   

15.
The speech recognition system basically extracts the textual information present in the speech. In the present work, speaker independent isolated word recognition system for one of the south Indian language—Kannada has been developed. For European languages such as English, large amount of research has been carried out in the context of speech recognition. But, speech recognition in Indian languages such as Kannada reported significantly less amount of work and there are no standard speech corpus readily available. In the present study, speech database has been developed by recording the speech utterances of regional Kannada news corpus of different speakers. The speech recognition system has been implemented using the Hidden Markov Tool Kit. Two separate pronunciation dictionaries namely phone based and syllable based dictionaries are built in-order to design and evaluate the performances of phone-level and syllable-level sub-word acoustical models. Experiments have been carried out and results are analyzed by varying the number of Gaussian mixtures in each state of monophone Hidden Markov Model (HMM). Also, context dependent triphone HMM models have been built for the same Kannada speech corpus and the recognition accuracies are comparatively analyzed. Mel frequency cepstral coefficients along with their first and second derivative coefficients are used as feature vectors and are computed in acoustic front-end processing. The overall word recognition accuracy of 60.2 and 74.35 % respectively for monophone and triphone models have been obtained. The study shows a good improvement in the accuracy of isolated-word Kannada speech recognition system using triphone HMM models compared to that of monophone HMM models.  相似文献   

16.
基于动态贝叶斯网络的语音识别及音素切分研究   总被引:1,自引:1,他引:0  
研究了一种基于动态贝叶斯网络(dynamic bayesian networks, DBN)的语音识别建模方法,利用GMTK(graphical model tool kits)工具构建音素级音频流DBN语音训练和识别模型,同时与传统的基于隐马尔可夫的语音识别结果进行比较,并给出词与音素的切分结果.实验表明,在各种信噪比测试条件下,基于DBN的语音识别结果与基于HMM的语音识别结果相当,并表现出一定的抗噪性,音素的切分结果也比较准确.  相似文献   

17.
陈斌  牛铜  张连海  李弼程  屈丹 《自动化学报》2014,40(12):2899-2907
提出了一种基于动态加权的数据选取方法, 并应用到连续语音识别的声学模型区分性训练中. 该方法联合后验概率和音素准确率选取数据, 首先, 采用后验概率的Beam算法裁剪词图, 在此基础上依据候选词所在候选路径的错误率, 基于后验概率动态的赋予候选词不同的权值; 其次, 通过统计音素对之间的混淆程度, 给易混淆音素对动态地加以不同的惩罚权重, 计算音素准确率; 最后, 在估计得到弧段期望准确率分布的基础上, 采用高斯函数形式对所有竞争弧段的期望音素准确率软加权.实验结果表明, 与最小音素错误准则相比, 该动态加权方法识别准确率提高了0.61%, 可有效减少训练时间.  相似文献   

18.
汉语语音识别中的区分性声调建模方法   总被引:1,自引:0,他引:1       下载免费PDF全文
提出从特征提取参数、模型参数对隐马尔可夫声调模型进行区分型训练,来提高声调识别率;提出模型相关的权重对谱特征模型和声调模型的概率进行加权,并根据最小音子错误区分性目标函数对权重进行训练,来提高声调模型加入连续语音识别时的性能。声调识别实验表明区分性的声调模型训练以及特征提取方法显著提高了声调识别率。区分性模型权重训练能够在声调模型加入之后进一步连续语音识别系统的识别率。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号