17 similar documents found.
1.
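Two single-stream monophone dynamic Bayesian network (DBN) models are constructed to perform continuous speech recognition from audio and video features and, by describing the specific relationship between each word and its corresponding phonemes, to segment phonemes in time. Experimental results show that, for recognition based on audio features, the DBN model's recognition rate is on average 12.79% higher than that of the HMM at low signal-to-noise ratios (0 to 15 dB), while on clean speech the phoneme time segmentation produced by the DBN model is very close to that of the triphone HMM. For speech recognition based on video features, the DBN model's recognition rate is 2.47% higher than the HMM's. Finally, the experiments analyze the asynchrony between the phoneme time segmentations of the audio and video data, laying a foundation for audio-visual continuous speech recognition based on multi-stream DBN models and for determining the asynchrony between audio and video.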
2.
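A new asynchronous whole-word/articulatory-feature speech recognition model based on a dynamic Bayesian network (DBN), AWA-DBN, is constructed, in which each word is described by the movements of its articulatory features. The conditional probability distributions of the articulatory feature nodes and the asynchrony check nodes are defined. Speech recognition experiments on the standard digit speech corpus Aurora5.0 compare AWA-DBN with the whole-word/state DBN (WS-DBN, in which each word consists of a fixed number of whole-word states) and the whole-word/phoneme DBN (WP-DBN, in which each word consists of its corresponding phoneme sequence). Although WS-DBN achieves the highest recognition rate, it is suitable only for small-vocabulary isolated-word recognition. AWA-DBN and WP-DBN can model large-vocabulary continuous speech, and AWA-DBN achieves a higher recognition rate and greater robustness than WP-DBN.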
3.
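Speech recognition and phoneme segmentation based on dynamic Bayesian networks (total citations: 1; self-citations: 1; citations by others: 0)
A speech recognition modeling method based on dynamic Bayesian networks (DBN) is studied. The GMTK (Graphical Models Toolkit) tool is used to build phoneme-level audio-stream DBN models for training and recognition. The results are compared with those of traditional hidden Markov model (HMM) based speech recognition, and word and phoneme segmentation results are given. Experiments show that, under all tested signal-to-noise ratio conditions, DBN-based recognition results are comparable to HMM-based results and show some robustness to noise, and that the phoneme segmentation results are fairly accurate.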
4.
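To address coarticulation in continuous speech, a single-stream context-dependent triphone dynamic Bayesian network model with word-internal expansion (SS-DBN-TRI) and a single-stream context-dependent triphone DBN model with cross-word expansion (SS-DBN-TRI-CON) are proposed. The SS-DBN-TRI model improves on the single-stream DBN (SS-DBN) model proposed by Bilmes: word-internal context-dependent triphone nodes replace the monophone nodes, each word is composed of its corresponding triphone units, and the triphone units are associated with the observation vectors. The SS-DBN-TRI-CON model is also based on the SS-DBN model: nodes for the preceding and following phonemes of the current phoneme are added to form a new cross-word triphone variable node, which is associated with the observation vectors and described by Gaussian mixture models. Experimental results on a continuous digit speech database show that SS-DBN-TRI-CON achieves the best speech recognition performance.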
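The difference between word-internal and cross-word triphone expansion described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' DBN implementation: it only generates context-dependent unit labels, and it assumes the HTK-style `left-phone+right` label notation, which the abstract does not specify. The function names and the example pronunciations are hypothetical.

```python
def word_internal_triphones(phones):
    """Expand one word's phone list into triphone labels.

    Context never crosses the word boundary, so the first and last
    phones of the word get only one-sided context.
    Label format (assumed): left-phone+right.
    """
    labels = []
    for i, phone in enumerate(phones):
        label = phone
        if i > 0:  # add left context if a preceding phone exists
            label = f"{phones[i - 1]}-{label}"
        if i < len(phones) - 1:  # add right context if a following phone exists
            label = f"{label}+{phones[i + 1]}"
        labels.append(label)
    return labels


def cross_word_triphones(words):
    """Expand a word sequence so that context crosses word boundaries.

    `words` is a list of phone lists. Only the first and last phones
    of the whole utterance lack a context on one side.
    """
    flat = [phone for word in words for phone in word]
    return word_internal_triphones(flat)


if __name__ == "__main__":
    one = ["w", "ah", "n"]  # hypothetical pronunciation of "one"
    two = ["t", "uw"]       # hypothetical pronunciation of "two"
    print(word_internal_triphones(one) + word_internal_triphones(two))
    print(cross_word_triphones([one, two]))
```

With word-internal expansion the last phone of "one" is labeled `ah-n`, whereas cross-word expansion labels it `ah-n+t`, which is the kind of extra context the cross-word model adds.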
5.
6.
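In recent years, speech recognition based on dynamic Bayesian networks (DBN) has attracted growing attention because DBNs are more interpretable, factorizable, and extensible than traditional hidden Markov models (HMM). However, current research on DBN-based speech recognition concentrates mainly on isolated speech recognition, and the frameworks and recognition algorithms for continuous speech recognition are far less mature and flexible than those for HMMs. To improve the flexibility and extensibility of DBN-based continuous speech recognition, the token passing model, which addresses these issues well in HMM-based continuous speech recognition, is modified so that it applies to DBNs. On the basis of this model, a basic framework for DBN-based continuous speech recognition is proposed, and within this framework a new recognition algorithm that is independent of the upper-level language model is presented. The paper also introduces DTK, a toolkit developed by the authors on this framework that can be used for continuous speech recognition and other time-series systems.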
7.
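To realize text-driven and speech-driven talking-head animation, this paper proposes a mouth-contour feature extraction method based on the Bayesian tangent shape model and a lip-reading system based on a dynamic Bayesian network (DBN) model. By describing the relationship between each word and its constituent visemes, a time-segmented viseme sequence is obtained. For performance comparison, the phoneme recognition results of the phoneme DBN model and of the HMM are mapped to viseme sequences. For evaluation, an absolute viseme segmentation accuracy criterion and two relative viseme segmentation accuracy criteria, based on the image and on lip geometric features, are proposed. Experiments show that the DBN models outperform the HMM in recognition, and that the viseme-based DBN model provides the best mouth shapes for talking-head animation.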
8.
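A Mandarin large-vocabulary continuous speech recognition system based on a hybrid of the hidden Markov model (HMM) and the artificial neural network (ANN) is proposed. In the hybrid system, several models work together: the ANN models the physical characteristics of phoneme pronunciation, and the HMM, combined with a linguistic model, recognizes the input speech. The hybrid system can thus combine the strengths of both models: the HMM is good at modeling the structure of time series, while the ANN offers strong nonlinear prediction and modeling ability, robustness, and ease of hardware implementation. Experimental results show that the HMM/ANN hybrid system effectively combines the strengths of the two models and improves the recognition rate.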
9.
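An audio-visual two-stream dynamic Bayesian network (DBN) speech recognition model based on articulatory features (AFAV_DBN) is constructed, and the conditional probability relationships of its nodes are defined so that the articulatory feature states can change asynchronously. Speech recognition experiments on an audio-visual speech database show that, by adjusting the asynchrony constraints between the articulatory features, the AFAV_DBN model achieves a higher recognition rate than state-based synchronous and asynchronous DBN models and the audio single-stream model, and with respect to noise it also has...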
10.
11.
12.
Foyzul Hassan, Mohammed Rokibul Alam Kotwal, Ghulam Muhammad, Mohammad Nurul Huda. International Journal of Speech Technology, 2011, 14(3): 183-191
Building a continuous speech recognizer for the Bangla (widely known as Bengali) language is a challenging task because of the language's unique inherent features, such as long and short vowels and many instances of allophones. Stress and accent in spoken Bangla vary from region to region, but in formal read Bangla speech, stress and accent are ignored. There are three approaches to continuous speech recognition (CSR) based on the sub-word unit, viz. word, phoneme and syllable. Pronunciation of words and sentences is strictly governed by a set of linguistic rules. Many attempts have been made to build continuous speech recognizers for Bangla for small and restricted tasks. However, medium and large vocabulary CSR for Bangla is relatively new and unexplored. In this paper, the authors attempt to build an automatic speech recognition (ASR) method based on context-sensitive triphone acoustic models. The method comprises three stages: the first stage extracts phoneme probabilities from acoustic features using a multilayer neural network (MLN), the second stage designs triphone models to capture context on both sides, and the final stage generates word strings based on triphone hidden Markov models (HMMs). The objective of this research is to build a medium-vocabulary triphone-based continuous speech recognizer for the Bangla language. In experiments using a Bangla speech corpus prepared by the authors, the recognizer provides higher word accuracy as well as a higher word correct rate for trained and tested sentences with fewer mixture components in the HMMs.
13.
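The performance of a large-vocabulary continuous speech recognition system depends heavily on the quality of its speech corpus, and the central step in corpus design is the selection of text material. Traditional selection methods, however, often consider only a single factor, which prevents the recognition system from making effective use of linguistic information. The selection method used for this corpus takes several factors into account together, including triphone coverage, triphone coverage efficiency, triphone sparsity, and the distribution of common words. It is fully automated, makes full use of the raw text, and yields a selection that is richer in information. The automatically selected material covers 94.1% of the triphones and 75.4% of the most common words, and its coverage efficiency and sparsity are also considerably better than those of traditional methods.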
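As a rough illustration of automatic selection driven by triphone coverage, the following Python sketch greedily picks the sentence that adds the most unseen triphones per phone. This is an assumed simplification, not the method of the paper: it uses only a coverage-efficiency score and ignores triphone sparsity and common-word distribution, and the function names and toy data are hypothetical.

```python
def sentence_triphones(phones):
    """Return the set of phone triples occurring in one sentence."""
    return {tuple(phones[i:i + 3]) for i in range(len(phones) - 2)}


def greedy_select(sentences, budget):
    """Greedily choose up to `budget` sentence ids.

    `sentences` maps a sentence id to its phone list. At each step the
    sentence with the highest number of new triphones per phone is
    chosen (an assumed stand-in for coverage efficiency). Selection
    stops early when no sentence adds a new triphone.
    """
    triples = {sid: sentence_triphones(p) for sid, p in sentences.items()}
    covered = set()
    chosen = []
    candidates = sorted(sentences)  # sorted ids make tie-breaking deterministic
    while candidates and len(chosen) < budget:
        def score(sid):
            return len(triples[sid] - covered) / max(len(sentences[sid]), 1)
        best = max(candidates, key=score)
        if not triples[best] - covered:
            break  # nothing new can be covered
        covered |= triples[best]
        chosen.append(best)
        candidates.remove(best)
    return chosen, covered


if __name__ == "__main__":
    toy = {"s1": list("abcde"), "s2": list("abc"), "s3": list("xyz")}
    chosen, covered = greedy_select(toy, budget=3)
    all_triples = set().union(*(sentence_triphones(p) for p in toy.values()))
    print(chosen, len(covered) / len(all_triples))
```

In the toy data, `s2` is never chosen because its only triphone is already covered by `s1`, which shows how a coverage-driven criterion avoids redundant material.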
14.
Ananthakrishna Thalengala, Kumara Shama. International Journal of Speech Technology, 2016, 19(4): 817-826
A speech recognition system extracts the textual information present in speech. In the present work, a speaker-independent isolated-word recognition system has been developed for one of the south Indian languages, Kannada. For European languages such as English, a large amount of research has been carried out on speech recognition. For Indian languages such as Kannada, however, significantly less work has been reported, and no standard speech corpus is readily available. In the present study, a speech database has been developed by recording different speakers' utterances of a regional Kannada news corpus. The speech recognition system has been implemented using the Hidden Markov Model Toolkit (HTK). Two separate pronunciation dictionaries, phone based and syllable based, are built in order to design and evaluate the performance of phone-level and syllable-level sub-word acoustic models. Experiments have been carried out, and the results are analyzed by varying the number of Gaussian mixtures in each state of the monophone hidden Markov model (HMM). Context-dependent triphone HMMs have also been built for the same Kannada speech corpus, and the recognition accuracies are compared. Mel frequency cepstral coefficients, along with their first and second derivative coefficients, are computed in acoustic front-end processing and used as feature vectors. Overall word recognition accuracies of 60.2 and 74.35 % have been obtained for the monophone and triphone models, respectively. The study shows a good improvement in the accuracy of the isolated-word Kannada speech recognition system when triphone HMMs are used instead of monophone HMMs.
15.
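Uyghur is an agglutinative language in which a rich set of affixes can generate an enormous vocabulary from the same stems, which makes research on Uyghur speech recognition very difficult. Taking the characteristics of Uyghur into account, a Uyghur continuous speech corpus is built, and a Uyghur continuous speech recognition system based on the hidden Markov model (HMM) is implemented with the HTK (HMM Toolkit) tools. At the acoustic level, triphones are chosen as the basic recognition units and a Uyghur triphone acoustic model is built; decision trees, triphone tying, fixing of the silence models, and an increased number of Gaussian mixture components are used to improve the model's recognition accuracy. At the language level, a statistical bigram language model suited to the characteristics of Uyghur speech is used. Finally, Uyghur continuous speech recognition experiments are carried out with the system.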
16.
17.
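Audio/video two-stream speech recognition model based on articulatory features (total citations: 1; self-citations: 0; citations by others: 1)
An audio/video two-stream dynamic Bayesian network (DBN) speech recognition model based on articulatory features is constructed, and the conditional probability relationships of its nodes and the asynchrony constraints between the articulatory features are defined. Speech recognition experiments are then carried out on an audio/video connected-digit speech database, and recognition performance at different signal-to-noise ratios is compared with that of audio single-stream and video single-stream DBN models. The results show that at low signal-to-noise ratios the articulatory-feature-based audio/video two-stream model gives the best recognition performance, and that its recognition rate declines fairly gently as the noise increases, indicating that the model is highly robust to noise and better suited to speech recognition in low signal-to-noise ratio environments.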