Similar Articles
20 similar articles found.
1.
Obtaining training material for rarely used English words and common given names from countries where English is not spoken is difficult due to excessive time, storage and cost factors. By considering pe...

2.
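A Unified Model of Dynamic Time Warping and Hidden Markov Models in Speech Recognition   Total citations: 1 (self-citations: 0; citations by others: 1)
This paper examines the intrinsic connection between two techniques widely used in speech recognition, dynamic time warping (DTW) and the hidden Markov model (HMM). It proposes a unified model of the two (DHUM, DTW and HMM Unified Model) and gives the transformations from DTW to DHUM and from HMM to DHUM. The paper also proposes using DHUM to reduce the excessive computation required when speech is recognized with higher-order HMMs, which are closer to the actual behaviour of speech. Recognition experiments on a medium-sized vocabulary show that the recognizer built on DHUM achieves recognition performance no lower than...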

3.
To recognize speech, handwriting or sign language, many hybrid approaches have been proposed that combine Dynamic Time Warping (DTW) or Hidden Markov Models (HMM) with discriminative classifiers. However, all of these methods rely directly on the likelihood models of DTW/HMM. We hypothesize that time warping and classification should be separated because they place conflicting demands on likelihood modelling. To overcome these restrictions, we propose to use Statistical DTW (SDTW) only for time warping, while classifying the warped features with a different method. Two novel statistical classifiers are proposed (CDFD and Q-DFFM), both using a selection of discriminative features (DF), and are shown to outperform HMM and SDTW. However, we have found that combining the likelihoods of multiple models in a second classification stage degrades the performance of the proposed classifiers, while it improves performance with HMM and SDTW. A proof-of-concept experiment, combining DFFM mappings of multiple SDTW models with SDTW likelihoods, shows that hybrid classification can provide a significant improvement over SDTW for model combining as well. Although recognition is mainly based on 3D hand motion features, these results can be expected to generalize to recognition with more detailed measurements such as hand/body pose and facial expression.

4.
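Recognition of Gesture Acceleration Signals Based on Dynamic Time Warping   Total citations: 1 (self-citations: 0; citations by others: 1)
To improve the performance of accelerometer-based dynamic gesture recognition, this paper adopts the dynamic time warping (DTW) recognition algorithm. The algorithm computes the similarity between the test template and each reference template to produce the recognition result. To validate the method, the authors built a wireless system for acquiring gesture acceleration data and collected gesture data from 41 volunteers. Experimental results show an average gesture recognition rate above 97%. Compared with the HMM recognition algorithm, DTW achieved higher recognition accuracy.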
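
A minimal sketch of the template-matching step described above. It assumes each gesture is a list of 3-axis acceleration samples (tuples of floats), Euclidean frame distance, and the common symmetric step pattern. The paper's actual features and step pattern are not given in the abstract, and `dtw_distance` and `classify` are illustrative names.

```python
import math


def dtw_distance(test, ref, dist=math.dist):
    """Accumulated DTW distance between two sequences of feature vectors.

    Uses the symmetric step pattern (i-1, j), (i, j-1), (i-1, j-1);
    this step pattern is an assumption, not taken from the paper.
    """
    n, m = len(test), len(ref)
    inf = float("inf")
    # D[i][j] = cost of the best path aligning test[:i] with ref[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(test[i - 1], ref[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def classify(test, references):
    """Return the label of the reference template closest to `test` under DTW."""
    return min(references, key=lambda label: dtw_distance(test, references[label]))
```

For example, `classify(sample, {"up": up_template, "left": left_template})` returns the label of whichever reference template has the smallest accumulated warping distance.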

5.
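Research on Key-Frame-Based Multi-Level Classification for Sign Language Recognition   Total citations: 6 (self-citations: 1; citations by others: 6)
A key-frame-based, multi-level classification method for sign language recognition is proposed. It performs multi-level classification by template matching with HDR (hierarchical discriminant regression) and DTW (dynamic time warping). Because a sign is expressed over many frames, the SIFT (scale-invariant feature transform) algorithm locates the key frames of each sign and extracts their feature vectors. HDR uses these key frames to narrow the search range. DTW then matches the features of the sign to be recognized against every sign in that range, and the sign with the highest probability is taken as the recognition result. At the same recognition rate, this method is about 8.2% faster than HMM-based recognition. It also addresses the rapid drop in recognition rate that template matching suffers on large vocabularies.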

6.
We address the novel problem of jointly evaluating multiple speech patterns for automatic speech recognition and training. We propose solutions based on both the non-parametric dynamic time warping (DTW) algorithm and the parametric hidden Markov model (HMM). We show that a hybrid approach is quite effective for noisy speech recognition. We extend the concept to HMM training in which some patterns may be noisy or distorted. Using the concept of a "virtual pattern" developed for joint evaluation, we propose selective iterative training of HMMs. We evaluated the new algorithms on burst/transient noisy speech and isolated word recognition. They yield significantly higher recognition accuracy than algorithms that do not use the joint evaluation strategy.

7.
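To speed up large-vocabulary sign language recognition, this paper proposes a multi-level method that combines dynamic time warping (DTW) and hidden Markov models (HMM). A global coarse search first assigns the sign to be recognized to one of several smaller vocabulary groups. A more precise local HMM search then identifies the word. The vocabulary groups are generated with the DTW/ISODATA algorithm. Experiments on 4,942 isolated signs compare the method with single-level HMM recognition. Recognition time per word fell from 2.364 s to 0.137 s, a 94.2% reduction, and recognition accuracy rose by 4.66%.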
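
The coarse-to-fine idea can be sketched generically. This is a hypothetical illustration, not the paper's DTW/ISODATA grouping or its HMMs. Each group is assumed to have one representative used in the cheap global pass. `fine_dist` stands in for the precise local scorer; for HMMs, a negative log-likelihood would play that role. The function also reports how many fine evaluations were needed, which is where the speed-up comes from.

```python
def two_stage_recognize(x, groups, coarse_dist, fine_dist, top_k=1):
    """Hypothetical coarse-to-fine recognizer.

    groups: dict mapping group id -> (representative, {word: model})
    coarse_dist(x, representative): cheap distance for the global search
    fine_dist(x, model): precise distance, evaluated only inside kept groups
    top_k: number of nearest groups kept after the coarse pass
    Returns (best word, number of fine evaluations performed).
    """
    # Global coarse search: rank groups by distance to their representative.
    ranked = sorted(groups, key=lambda g: coarse_dist(x, groups[g][0]))
    candidates = {}
    for g in ranked[:top_k]:
        candidates.update(groups[g][1])
    # Local fine search restricted to the surviving candidates.
    best = min(candidates, key=lambda w: fine_dist(x, candidates[w]))
    return best, len(candidates)
```

Raising `top_k` trades speed for robustness against a wrong coarse decision.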

8.
This paper studies several pattern recognition algorithms for on-line signature recognition: vector quantization (VQ), nearest neighbor (NN), dynamic time warping (DTW) and hidden Markov models (HMM). We used a database of 330 users that includes 25 skilled forgeries performed by five different impostors. This database is larger than those typically found in the literature. Experimental results show that our first proposed combination of VQ and DTW (by means of score fusion) outperforms the other algorithms (DTW, HMM). It achieves a minimum detection cost function (DCF) value of 1.37% for random forgeries and 5.42% for skilled forgeries. In addition, we present another combined DTW-VQ scheme that improves privacy for remote authentication systems. It avoids submitting the whole original dynamic signature information by sending codewords instead of feature vectors. This system achieves performance similar to that of DTW.
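
Score-level fusion of the kind mentioned above can be sketched as a weighted sum of normalized distances. The abstract does not state which fusion rule the authors used. The min-max normalization, the equal default weight `w=0.5`, and the normalization bounds (which would be estimated on development data) are all assumptions.

```python
def min_max_normalize(score, lo, hi):
    """Map a raw distance into [0, 1]; lo/hi are assumed to come from development data."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(1.0, max(0.0, (score - lo) / (hi - lo)))


def fuse_scores(dtw_score, vq_score, dtw_bounds, vq_bounds, w=0.5):
    """Weighted-sum fusion of two normalized distances (lower = more likely genuine).

    The weight w and the min-max normalization are illustrative assumptions.
    """
    d = min_max_normalize(dtw_score, *dtw_bounds)
    v = min_max_normalize(vq_score, *vq_bounds)
    return w * d + (1.0 - w) * v


def accept(fused, threshold):
    """Accept the signature as genuine when the fused distance is small enough."""
    return fused <= threshold
```

The decision threshold would be tuned on held-out data, for example to minimize the detection cost function.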

9.
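A speaker-dependent speech recognition system for automatic endoscope positioning is proposed. The endoscope is positioned by recognizing a specific doctor's spoken control commands, which offers a more intelligent solution for handheld endoscope operation. For recognition, the paper proposes a dynamic time warping algorithm that normalizes and averages the reference templates (Normalized Average-Dynamic Time Warping, NA-DTW), which achieves a higher recognition rate. The system runs on an embedded Windows CE operating system and an ARM processor as its software and hardware platform. The authors tested 1,250 samples from 10 different speakers. Compared with conventional DTW, NA-DTW raised the recognition rate from 96.6% to 99.76% and cut the computation time from 469 ms to 241 ms. These results show that NA-DTW can perform speaker-dependent, isolated-word speech recognition and meets the real-time requirements of an embedded system.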
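
The abstract does not spell out how NA-DTW normalizes and averages reference templates. One common way to build an averaged template is shown here only as an assumption. Every training utterance of a command is aligned to a base utterance with DTW, and the frames mapped onto each base frame are averaged, so the template keeps the base utterance's length. `dtw_path` and `average_template` are hypothetical names.

```python
import math


def dtw_path(a, b):
    """Optimal DTW alignment between a and b as a list of (i, j) pairs."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): D[i - 1][j - 1],
                 (i - 1, j): D[i - 1][j],
                 (i, j - 1): D[i][j - 1]}
        i, j = min(steps, key=steps.get)
    return path[::-1]


def average_template(utterances):
    """Hypothetical template averaging: align each utterance to the first one
    and average all frames that DTW maps onto each base frame."""
    base = utterances[0]
    sums = [list(f) for f in base]
    counts = [1] * len(base)
    for utt in utterances[1:]:
        for i, j in dtw_path(base, utt):
            sums[i] = [s + x for s, x in zip(sums[i], utt[j])]
            counts[i] += 1
    return [tuple(s / c for s in row) for row, c in zip(sums, counts)]
```

Using the first utterance as the base is arbitrary. Picking the medoid utterance (the one with the smallest total DTW distance to the others) is a common refinement.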

10.
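A new template training algorithm combining LBG and DTW is proposed. It has three parts: template training, initial template setup, and empty-subset handling, which together fully and effectively solve the template training problem in speech recognition. The algorithm clusters the feature matrices of speech signals and generates their centroids. This makes an isolated-word recognition system better suited to speaker-independent use and raises the correct recognition rate for speakers outside the training set. A recognition system was designed and implemented. Fast convergence during template training and a high system recognition rate verify the algorithm's performance.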
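
For reference, the standard LBG (Linde-Buzo-Gray) codebook design on individual feature vectors works as follows. Split every codeword, run nearest-neighbour/centroid iterations until the distortion stops falling, and repeat. The paper applies the idea to whole feature matrices with DTW, which is not reproduced here. The empty-cell rule below, which moves an unused codeword to the worst-represented training vector, is an assumed policy and not necessarily the paper's.

```python
import math


def _centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return tuple(sum(c) / len(vectors) for c in zip(*vectors))


def _nearest(v, codebook):
    """Index of the codeword closest to v (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda k: math.dist(v, codebook[k]))


def lbg(data, size, eps=0.01, tol=1e-6, max_iter=100):
    """Standard LBG codebook design; `size` must be a power of two."""
    if size < 1 or size & (size - 1):
        raise ValueError("size must be a power of two")
    codebook = [_centroid(data)]
    while len(codebook) < size:
        # Split each codeword into two slightly perturbed copies.
        codebook = [tuple(c + s for c in cw) for cw in codebook for s in (eps, -eps)]
        prev = float("inf")
        for _ in range(max_iter):
            cells = [[] for _ in codebook]
            for v in data:
                cells[_nearest(v, codebook)].append(v)
            for k, cell in enumerate(cells):
                if cell:
                    codebook[k] = _centroid(cell)
                else:
                    # Assumed empty-cell policy: move the unused codeword to the
                    # training vector that is currently worst represented.
                    codebook[k] = max(data, key=lambda v: min(math.dist(v, c) for c in codebook))
            distortion = sum(min(math.dist(v, c) for c in codebook) ** 2 for v in data) / len(data)
            if prev - distortion <= tol * distortion:
                break  # relative improvement too small: this stage has converged
            prev = distortion
    return codebook
```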

11.
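Zhu Shuqin, Zhao Ying. Microcomputer Information (微计算机信息), 2012, (5): 150-151, 163
This paper studies the dynamic time warping (DTW) speech recognition algorithm. Conventional DTW must store a large matrix, so direct computation takes considerable memory and processing, which places high demands on the hardware. To reduce the computational cost of DTW and speed up recognition, the paper optimizes the algorithm. It combines local path constraints with global path constraints, so that the dynamic programming path is searched, and the accumulated matching distance computed, only within a prescribed width. Simulation results show that the method lowers the computational load, speeds up recognition, and also improves the recognition rate to some extent.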
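
A band-limited DTW of this kind can be sketched as follows. The global constraint is assumed to be a Sakoe-Chiba-style band of half-width `r` around the length-scaled diagonal. The three-step symmetric pattern serves as the local constraint, and frames are scalars for brevity. Only two rows of the cost matrix are stored, so memory drops from O(nm) to O(m), and only about n*(2r+1) cells are evaluated.

```python
import math


def dtw_banded(a, b, r):
    """DTW restricted to a band of half-width r around the diagonal.

    The band (an assumed Sakoe-Chiba-style global constraint) is centred on
    i * m / n so that sequences of different lengths are handled. Only two
    rows of the cost matrix are kept. Returns inf if no path fits the band.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        centre = i * m / n
        lo = max(1, math.ceil(centre - r))
        hi = min(m, math.floor(centre + r))
        for j in range(lo, hi + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Local constraint: symmetric steps from (i-1, j), (i, j-1), (i-1, j-1).
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]
```

A band that is too narrow can exclude the optimal path. With `r=0`, `dtw_banded([0, 0, 5], [0, 5, 5], 0)` is forced onto the diagonal and returns 5, while a wide band finds the zero-cost alignment.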

12.
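Optimizing Discrete Hidden Markov Models with an Improved Tabu Search Algorithm   Total citations: 1 (self-citations: 0; citations by others: 1)
The hidden Markov model (HMM) is a statistical pattern recognition method widely used in speech recognition and gesture recognition. This paper proposes an improved tabu search (ITS) to optimize HMM parameters. In ITS, conventional tabu search (TS) alternates with a local search algorithm (maximum likelihood estimation), which speeds up convergence and yields an optimized solution. HMMs trained with TS and with ITS were each applied to dynamic gesture recognition. The results show that ITS achieves a higher recognition rate and reaches a global optimum.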

13.
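Li Hongyan, Sheng Liyuan, Chen Ni. Computer Engineering and Design (计算机工程与设计), 2007, 28(19): 4702-4704, 4737
Conventional DTW speech recognition requires a great deal of computation and storage. To address this, the paper proposes an improved DTW method based on vector quantization and lookup tables. Vector quantization converts the continuous feature vector space into a discrete one, which reduces the storage needed for patterns. On this basis, a table of vector distortion measures is built, and hash-table lookup locates each address exactly. This eliminates the many distance computations that dynamic programming would otherwise require and greatly speeds up matching. Theoretical analysis and experimental results confirm the effectiveness of the improved method. For research convenience, a real-time DTW speech recognition system was also designed and developed on the Matlab platform.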
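
The lookup-table idea can be sketched like this. The sketch assumes a codebook has already been trained (for instance with LBG) and uses Euclidean distortion between codewords. Each frame is stored as a codeword index and the codeword-to-codeword distances are computed once. DTW then only indexes into that table. A plain 2-D list stands in for the hash table described in the paper.

```python
import math


def quantize(frames, codebook):
    """Replace each feature vector by the index of its nearest codeword."""
    return [min(range(len(codebook)), key=lambda k: math.dist(f, codebook[k])) for f in frames]


def distortion_table(codebook):
    """Precompute the distance between every pair of codewords once."""
    return [[math.dist(p, q) for q in codebook] for p in codebook]


def dtw_lookup(a_idx, b_idx, table):
    """DTW over codeword-index sequences; local costs are table lookups."""
    n, m = len(a_idx), len(b_idx)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        row = table[a_idx[i - 1]]  # distances from this test frame's codeword
        for j in range(1, m + 1):
            D[i][j] = row[b_idx[j - 1]] + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Storing templates as index lists saves memory, and the per-cell cost becomes a lookup instead of a vector distance computation.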

14.
This paper presents realistic visual speech synthesis based on a hybrid concatenation method. Unlike previous methods based on phoneme-level unit selection or hidden Markov models (HMM), the hybrid concatenation method combines frame-level unit selection with a fused HMM, and can generate more expressive and stable facial animations. The fused HMM explicitly models the loose synchronization of tightly coupled streams and gives much better results than a normal HMM for audiovisual mapping. After the fused HMM is created, facial animation is generated by frame-level unit selection using the fused HMM output probabilities. To improve the computational efficiency of unit selection on a large corpus, this paper also proposes a two-layer Viterbi search in which only the subsets selected in the first layer are further checked in the second layer. With this approach, the system has been successfully integrated into real-time applications. The paper also proposes a mapping method, based on Gaussian mixture models (GMMs), that generates emotional facial expressions from neutral ones. Final experiments show that the method outputs high-quality synthesized facial parameters. Compared with other audiovisual mapping methods, it performs better in expressiveness, stability, and running speed.

15.
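A New Hybrid Speech Recognition Method Based on HMM and RBF   Total citations: 5 (self-citations: 0; citations by others: 5)
A new speech recognition method combining a hidden Markov model (HMM) with a radial basis function (RBF) neural network is proposed. The method first uses the HMM to generate the optimal sequence of speech states. It then applies a function approximation technique to time-normalize that state sequence. Finally, an RBF neural network performs the classification. Theoretical and experimental results show that the system recognizes speech better than the HMM alone, with a particularly marked improvement on easily confused words.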

16.
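This paper studies how to improve speech recognition accuracy. Speech is a non-stationary signal that contains a great deal of noise. Most current recognition algorithms rest on linear theory, so they struggle to capture the nonlinear variation of speech, and their accuracy is low. The paper combines a hidden Markov model (HMM) and a support vector machine (SVM) into a hybrid noise-robust speech recognition model (HMM-SVM). The HMM models the temporal structure of the speech signal and produces output probabilities for the utterance to be recognized. These output probabilities are then fed to the SVM as inputs for learning, which yields the class information. The final recognition decision is made from the HMM-SVM output. Simulation results show that HMM-SVM improves recognition accuracy. Its benefit is largest at low signal-to-noise ratios, where it markedly improves the performance of the speech recognition system.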
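
The step that turns HMM outputs into SVM inputs can be sketched with a scaled forward algorithm for discrete HMMs. Each class model contributes one log-likelihood, and the resulting vector becomes the SVM's feature vector. Discrete observation symbols and the `(pi, A, B)` tuple layout are assumptions for brevity, and the SVM itself is left out.

```python
import math


def forward_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm for a discrete HMM.

    obs: list of symbol indices; pi[i]: initial state probabilities;
    A[i][j]: transition probabilities; B[i][k]: probability that state i
    emits symbol k. Returns log P(obs | model).
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]] for j in range(n)]
        scale = sum(alpha)  # equals P(o_t | o_1..o_{t-1})
        if scale == 0.0:
            return float("-inf")
        log_lik += math.log(scale)
        alpha = [x / scale for x in alpha]  # renormalize to avoid underflow
    return log_lik


def hmm_score_vector(obs, models):
    """One log-likelihood per class model; this vector would be the SVM input."""
    return [forward_log_likelihood(obs, *model) for model in models]
```

Scaling at every step keeps the recursion numerically stable on long utterances, where unscaled forward probabilities would underflow.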

17.
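This paper studies a method for time-normalizing dynamic speech patterns with hidden Markov models (HMM). It introduces a way of using an HMM to segment the observation sequence of a speech unit, called the HMM full-state segmentation of that sequence. It also defines a conformity score for HMM full-state segmentations. The conformity score determines the optimal HMM full-state segmentation of the observation sequence. That optimal segmentation then converts the observation sequence into a vector of fixed dimension, which time-normalizes the dynamic speech pattern. The method was applied within a hybrid speech recognition approach combining HMMs and artificial neural networks (ANN), and experimental results demonstrate its effectiveness.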
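
A simplified version of state-based time normalization is sketched below. It assumes a left-to-right model whose states have known mean vectors, unit-variance Gaussian emissions, and equal stay/advance probabilities. Under those assumptions every path has the same transition score. Viterbi decoding that must visit every state then reduces to splitting the frames into contiguous segments that minimize the squared distance to each state mean. The paper's conformity score is not reproduced, and `viterbi_segment` and `normalize_length` are hypothetical names. Averaging the frames in each segment gives a vector of fixed length (states times feature dimension) that a neural network can take as input.

```python
import math


def viterbi_segment(frames, means):
    """Best left-to-right segmentation that visits every state.

    Assumes unit-variance Gaussian states with the given means and equal
    stay/advance probabilities, so only the emission term matters.
    Returns the start index of each state's segment.
    """
    T, N = len(frames), len(means)
    if T < N:
        raise ValueError("need at least one frame per state")
    inf = float("inf")

    def sq(t, s):
        return math.dist(frames[t], means[s]) ** 2

    cost = [[inf] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    cost[0][0] = sq(0, 0)  # the path must start in state 0
    for t in range(1, T):
        for s in range(N):
            stay = cost[t - 1][s]
            move = cost[t - 1][s - 1] if s > 0 else inf
            back[t][s] = s if stay <= move else s - 1
            cost[t][s] = min(stay, move) + sq(t, s)
    # Backtrack from the last state at the last frame.
    starts, s = [0] * N, N - 1
    for t in range(T - 1, 0, -1):
        prev = back[t][s]
        if prev != s:
            starts[s] = t  # state s was entered at frame t
        s = prev
    return starts


def normalize_length(frames, means):
    """Average the frames of each state segment and concatenate the averages
    into one vector of fixed length len(means) * dim."""
    bounds = viterbi_segment(frames, means) + [len(frames)]
    vec = []
    for s in range(len(means)):
        seg = frames[bounds[s]:bounds[s + 1]]
        vec.extend(sum(c) / len(seg) for c in zip(*seg))
    return vec
```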

18.
In this paper, we propose a novel optimization algorithm called constrained line search (CLS) for discriminative training (DT) of Gaussian mixture continuous density hidden Markov models (CDHMM) in speech recognition. The CLS method is formulated under a general framework for optimizing any discriminative objective function, including maximum mutual information (MMI), minimum classification error (MCE), and minimum phone error (MPE)/minimum word error (MWE). In this method, discriminative training of HMMs is first cast as a constrained optimization problem in which the Kullback-Leibler divergence (KLD) between models is explicitly imposed as a constraint. Based on the idea of line search, we show that a simple formula for the HMM parameters can be found by constraining the KLD between the HMMs of two successive iterations in a quadratic form. The proposed CLS method can be applied to optimize all model parameters in Gaussian mixture CDHMMs, including means, covariances, and mixture weights. We have investigated the proposed CLS approach on several benchmark speech recognition databases, including TIDIGITS, Resource Management (RM), and Switchboard. Experimental results show that the new CLS method consistently outperforms the conventional extended Baum-Welch (EBW) method in both recognition performance and convergence behavior.
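
Two facts behind the KLD constraint can be checked numerically. First, the KLD between diagonal-covariance Gaussians has a standard closed form. Second, when only a mean moves and the variances stay fixed, the KLD is a quadratic function of the step size, so the largest step within a KLD budget `rho` also has a closed form. This per-Gaussian illustration is a simplification under those assumptions, not the paper's CLS update formula.

```python
import math


def kl_diag_gauss(mu0, var0, mu1, var1):
    """KL(N(mu0, diag(var0)) || N(mu1, diag(var1))) in closed form."""
    return 0.5 * sum(
        v0 / v1 + (m1 - m0) ** 2 / v1 - 1.0 + math.log(v1 / v0)
        for m0, v0, m1, v1 in zip(mu0, var0, mu1, var1)
    )


def max_mean_step(direction, var, rho):
    """Largest t such that shifting the mean by t * direction (variances fixed)
    keeps the KLD to the old Gaussian at most rho.

    With equal variances the KLD is 0.5 * t**2 * sum(d**2 / var), a quadratic
    in t, so t_max = sqrt(2 * rho / sum(d**2 / var)).
    """
    q = sum(d * d / v for d, v in zip(direction, var))
    if q == 0.0:
        return float("inf")
    return math.sqrt(2.0 * rho / q)
```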

19.
We present a new framework for joint analysis of throat and acoustic microphone (TAM) recordings to improve throat-microphone-only speech recognition. The proposed framework learns joint sub-phone patterns of throat and acoustic microphone recordings through a parallel-branch HMM structure. The joint sub-phone patterns define temporally correlated neighborhoods, in which a linear prediction filter estimates a spectrally rich acoustic feature vector from throat feature vectors. Multimodal speech recognition with throat and throat-driven acoustic features significantly improves throat-only speech recognition performance. Experimental evaluations on a parallel TAM database yield benchmark phoneme recognition rates of 46.81% and 60.69% for the throat-only and multimodal TAM speech recognition systems, respectively. The proposed throat-driven multimodal system raises the phoneme recognition rate to 52.58%, a significant relative improvement over the throat-only benchmark system.
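
As a stand-in for the linear prediction filter, the sketch below fits an affine least-squares map from throat feature vectors to acoustic feature vectors within one neighbourhood, for example the frames aligned to one sub-phone state. Ordinary least squares with a bias term is an assumption, since the abstract does not give the filter's exact form. The fit needs at least as many linearly independent frames as input dimensions plus one; otherwise the normal equations are singular.

```python
def solve(M, v):
    """Solve M w = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [val] for row, val in zip(M, v)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def fit_linear_map(throat, acoustic):
    """Least-squares affine map acoustic ~ W [throat; 1] for one neighbourhood.
    Returns one weight row per acoustic feature dimension."""
    X = [list(x) + [1.0] for x in throat]  # append a bias term
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    W = []
    for k in range(len(acoustic[0])):
        Xty = [sum(r[i] * y[k] for r, y in zip(X, acoustic)) for i in range(p)]
        W.append(solve(XtX, Xty))  # normal equations for output dimension k
    return W


def predict(W, x):
    """Estimate an acoustic feature vector from one throat feature vector."""
    xb = list(x) + [1.0]
    return [sum(w * v for w, v in zip(row, xb)) for row in W]
```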

20.
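Xie Benming, Han Mingming, Zhang Pan, Zhang Wei. Journal of Computer Applications (计算机应用), 2018, 38(6): 1771-1776
This work targets intelligent voice control of aircraft tractors, so that a tractor can recognize a pilot's spoken commands accurately and efficiently in the airport environment. Conventional dynamic time warping (DTW) suffers from heavy computation, high time complexity, and low recognition efficiency. To address this, the paper proposes a DTW optimization for vehicle speech recognition constrained by a hexagonal warping window. First, the effect of the warping window on DTW accuracy and efficiency is analyzed from three angles: the principle of DTW, the speech characteristics of tractor commands, and the airport environment. Next, building on DTW constrained by the diamond-shaped Itakura Parallelogram warping window, the paper proposes a global DTW optimization constrained by a hexagonal warping window. Finally, the optimization coefficients are varied to obtain the optimal hexagonal-window DTW scheme. Isolated-word recognition experiments compare the optimal algorithm with conventional DTW and with diamond-window DTW. It lowers the recognition error rate by 77.14% and 69.27% and raises recognition efficiency by 48.92% and 27.90%, respectively. The optimal algorithm is more robust and time-efficient, and can serve as an ideal command input interface for intelligent control of aircraft tractors.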
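
The paper's hexagonal window is not defined in the abstract, so the sketch covers only the Itakura parallelogram it builds on. A cell is allowed if it lies between lines of slope `1/s` and `s` drawn through both the start and the end point (here `s=2`). Counting allowed cells shows how much of the cost matrix the window skips. `itakura_mask` and `dtw_masked` are illustrative names, and scalar frames are used for brevity.

```python
def itakura_mask(n, m, s=2.0):
    """Cells (i, j) inside an Itakura-style parallelogram: between slopes 1/s
    and s measured from (0, 0) and also from (n-1, m-1)."""
    mask = []
    for i in range(n):
        di = n - 1 - i  # rows remaining to the end point
        row = []
        for j in range(m):
            dj = m - 1 - j  # columns remaining to the end point
            row.append(j <= s * i and i <= s * j and dj <= s * di and di <= s * dj)
        mask.append(row)
    return mask


def dtw_masked(a, b, mask):
    """DTW over scalar frames, evaluating only cells where mask[i][j] is True."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if mask[i - 1][j - 1]:
                D[i][j] = abs(a[i - 1] - b[j - 1]) + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

For two 5-frame sequences the mask keeps 9 of the 25 cells. The same pruning that saves work can also exclude a strongly non-linear alignment, which is the accuracy/efficiency trade-off the paper tunes.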


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417