1.

Hybrid continuous speech recognition systems by HMM,MLP and SVM: a comparative study

Elyes Zarrouk Yassine Ben Ayed Faiez Gargouri 《International Journal of Speech Technology》2014,17(3):223-233

This paper presents a new hybrid method for continuous Arabic speech recognition based on triphone modelling. To do this, we apply a Support Vector Machine (SVM) as an estimator of posterior probabilities within the standard Hidden Markov Model (HMM) framework. In this work, we describe a new approach that categorises Arabic vowels into long and short vowels, to be applied in the labelling phase of speech signals. Using this new labelling method, we find that the SVM/HMM hybrid model is more efficient than standard HMMs and than the hybrid Multi-Layer Perceptron (MLP)/HMM system. The results obtained for the triphone-based Arabic speech recognition system are 64.68% with HMMs, 72.39% with MLP/HMM and 74.01% with the SVM/HMM hybrid model. The WER obtained for continuous speech recognition by the three systems confirms the performance of SVM/HMM, which achieves the lowest average WER, 11.42%, over the 4 tested speakers.
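The abstract compares systems by word error rate (WER). As a minimal sketch of the standard definition (not the authors' scoring tool), WER is the word-level edit distance between reference and hypothesis divided by the number of reference words; the function name below is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion, giving 2/6.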

2.

Automatic phonetic segmentation of Hindi speech using hidden Markov model

Archana Balyan S. S. Agrawal Amita Dev 《AI & Society》2012,27(4):543-549

3.

Continuous speech recognition using linear dynamic models

Tao Ma Sundararajan Srinivasan Georgios Lazarou Joseph Picone 《International Journal of Speech Technology》2014,17(1):11-16

Hidden Markov models (HMMs) with Gaussian mixture distributions rely on an assumption that speech features are temporally uncorrelated, and often assume a diagonal covariance matrix where correlations between feature vectors for adjacent frames are ignored. A Linear Dynamic Model (LDM) is a Markovian state-space model that also relies on hidden state modeling, but explicitly models the evolution of these hidden states using an autoregressive process. An LDM is capable of modeling higher order statistics and can exploit correlations of features in an efficient and parsimonious manner. In this paper, we present a hybrid LDM/HMM decoder architecture that postprocesses segmentations derived from the first pass of an HMM-based recognition. This smoothed trajectory model is complementary to existing HMM systems. An Expectation-Maximization (EM) approach for parameter estimation is presented. We demonstrate a 13 % relative WER reduction on the Aurora-4 clean evaluation set, and a 13 % relative WER reduction on the babble noise condition.
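To make the LDM idea concrete, here is a minimal sketch of how such a model can score a segment of feature vectors: the Kalman filter's prediction-error decomposition gives the exact Gaussian log-likelihood. The parameter names `F`, `H`, `Q`, `R`, `mu0`, `P0` are conventional choices assumed here, not taken from the paper, and the paper's EM training and LDM/HMM decoder are not reproduced.

```python
import numpy as np

def ldm_log_likelihood(Y, F, H, Q, R, mu0, P0):
    """Log-likelihood of observations Y (T x p) under a linear dynamic model:
        x_1 ~ N(mu0, P0),  x_t = F x_{t-1} + w_t,  w_t ~ N(0, Q)
        y_t = H x_t + v_t,  v_t ~ N(0, R)
    computed with the Kalman filter (prediction-error decomposition)."""
    x_pred, P_pred = mu0, P0
    loglik = 0.0
    for y in Y:
        # Innovation (prediction error) and its covariance
        e = y - H @ x_pred
        S = H @ P_pred @ H.T + R
        _, logdet = np.linalg.slogdet(S)
        loglik += -0.5 * (len(y) * np.log(2 * np.pi) + logdet
                          + e @ np.linalg.solve(S, e))
        # Measurement update: condition the hidden state on y_t
        K = P_pred @ H.T @ np.linalg.inv(S)
        x_filt = x_pred + K @ e
        P_filt = P_pred - K @ H @ P_pred
        # Time update: predict the hidden state for the next frame
        x_pred = F @ x_filt
        P_pred = F @ P_filt @ F.T + Q
    return loglik
```

Because the hidden state carries information across frames, correlated adjacent feature vectors are scored jointly rather than independently, which is the property the abstract contrasts with diagonal-covariance HMMs.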

4.

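Broadcast news segmentation and classification based on hidden Markov chains (total citations: 4; self-citations: 2; citations by others: 4)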

Yueting Zhuang Yi Mao Fei Wu Yunhe Pan 《Journal of Computer Research and Development》2002,39(9):1057-1063

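An algorithm is proposed that performs broadcast news segmentation and classification with hidden Markov chains, which are well suited to modeling random time-series data. First, a hidden Markov chain with hidden semantic states roughly classifies the raw broadcast news into two parts: start/end and speech. Next, three hidden Markov chains pre-recognize the speech segments as anchor introductions, commercials, or weather forecasts by the maximum-likelihood method. Finally, on-site news reports are identified from the rate of semantic change, completing the fine-grained segmentation and classification of the broadcast news.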
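The three-HMM pre-recognition step is a maximum-likelihood decision: score the segment under each model and keep the best. A minimal sketch, assuming discrete observation symbols and each model given as a `(log_pi, log_A, log_B)` tuple; the labels and toy parameters are hypothetical, and the likelihood is computed with a log-domain forward recursion.

```python
import numpy as np

def log_forward(obs, log_pi, log_A, log_B):
    """log P(obs | HMM) for a discrete-output HMM, computed in the log domain.
    log_pi: (N,), log_A: (N, N) with A[i, j] = P(state j | state i), log_B: (N, M)."""
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # log-sum-exp over previous states i of alpha[i] + log A[i, j]
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return np.logaddexp.reduce(alpha)

def classify(obs, models):
    """Return the label whose HMM gives obs the highest likelihood."""
    return max(models, key=lambda label: log_forward(obs, *models[label]))
```

With hypothetical models such as `{"anchor": ..., "commercial": ..., "weather": ...}`, `classify(segment, models)` returns the label of the most likely class for that segment.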

5.

Joint evaluation of multiple speech patterns for speech recognition and training

Nishanth Ulhas Nair T.V. Sreenivas 《Computer Speech and Language》2010,24(2):307-340

We are addressing the novel problem of jointly evaluating multiple speech patterns for automatic speech recognition and training. We propose solutions based on both the non-parametric dynamic time warping (DTW) algorithm, and the parametric hidden Markov model (HMM). We show that a hybrid approach is quite effective for the application of noisy speech recognition. We extend the concept to HMM training wherein some patterns may be noisy or distorted. Utilizing the concept of “virtual pattern” developed for joint evaluation, we propose selective iterative training of HMMs. Evaluating these algorithms for burst/transient noisy speech and isolated word recognition, significant improvement in recognition accuracy is obtained using the new algorithms over those which do not utilize the joint evaluation strategy.
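The joint-evaluation methods extend DTW to several patterns at once. That multi-pattern extension is not reproduced here; as a minimal sketch of the non-parametric building block only, the function below computes the standard two-sequence DTW distance, assuming Euclidean frame distances and the common symmetric step pattern.

```python
import numpy as np

def dtw_distance(x, y):
    """Dynamic time warping distance between feature sequences x (T1 x d)
    and y (T2 x d); 1-D inputs are treated as sequences of scalar frames.
    Allowed steps: (i-1, j), (i, j-1), (i-1, j-1)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    T1, T2 = len(x), len(y)
    # D[i, j] = cost of the best warping path aligning x[:i] with y[:j]
    D = np.full((T1 + 1, T2 + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[T1, T2]
```

Repeated frames are absorbed by the warping, so `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0.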

6.

A hybrid English speech recognition algorithm based on HMM and clustering

Xiang Zhu 《Computer Measurement & Control》2020,28(5):175-179

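For English speech, which involves a large amount of feature data and complex pronunciation variation, the hidden Markov model (HMM) faces more problems than for isolated words, such as the computational complexity of the Viterbi algorithm and the probability distributions in the Gaussian mixture model. To build a speaker-independent English speech recognition system based on HMM and clustering, a combination of a segmental mean algorithm for reducing the dimensionality of speech feature parameters, a clustering cross-grouping algorithm, and an HMM grouping algorithm is proposed. Experimental results show that, compared with a single HMM, the algorithm not only raises the English speech recognition rate by nearly 3% but also increases the system's recognition speed by 20.1%.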
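The abstract does not define the segmental mean algorithm. Assuming the common interpretation, in which an utterance's frames are split into a fixed number of consecutive segments and each segment is replaced by its mean, a minimal sketch looks like this; `n_segments` is a hypothetical parameter.

```python
import numpy as np

def segment_means(features, n_segments):
    """Compress a (T x d) feature matrix to (n_segments x d) by averaging the
    frames within n_segments consecutive, roughly equal-length segments."""
    features = np.asarray(features, dtype=float)
    if len(features) < n_segments:
        raise ValueError("need at least one frame per segment")
    chunks = np.array_split(features, n_segments, axis=0)
    return np.vstack([chunk.mean(axis=0) for chunk in chunks])
```

Under this assumption every utterance maps to the same fixed size regardless of its duration, which reduces the amount of feature data passed to later clustering and HMM stages.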

7.

Simplified scoring methods for HMM-based speech recognition

Pavel Paramonov Nadezhda Sutula 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2016,20(9):3455-3460

Most contemporary speech recognition systems exploit complex algorithms based on Hidden Markov Models (HMMs) to achieve high accuracy. However, in some cases rich computational resources are not available, and even isolated-word recognition becomes a challenging task. In this paper, we present two ways to simplify scoring in HMM-based speech recognition in order to reduce its computational complexity. We focus on the core HMM procedure, the forward algorithm, which uses dynamic programming to find the probability that a given HMM generates an observation sequence. All proposed approaches were tested on Russian word recognition, and the results were compared with those of the conventional forward algorithm.
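The paper's two simplifications are not described in the abstract, so they are not shown. As a sketch of the conventional baseline they are compared against, here is the forward algorithm for a discrete-output HMM with per-frame scaling (to avoid underflow), checked against brute-force enumeration of all state paths.

```python
import itertools
import numpy as np

def forward_log_prob(obs, pi, A, B):
    """Conventional forward algorithm with per-frame scaling.
    Returns log P(obs | HMM); pi: (N,), A: (N, N), B: (N, M) discrete emissions."""
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    alpha = alpha / c
    log_prob = np.log(c)
    for o in obs[1:]:
        # alpha_t(j) = sum_i alpha_{t-1}(i) A[i, j] * b_j(o_t), then rescale
        alpha = (alpha @ A) * B[:, o]
        c = alpha.sum()
        alpha = alpha / c
        log_prob += np.log(c)  # the scale factors multiply to P(obs)
    return log_prob

def brute_force_log_prob(obs, pi, A, B):
    """Reference check: sum P(obs, path) over every state path (O(N^T), tiny inputs only)."""
    total = 0.0
    for path in itertools.product(range(len(pi)), repeat=len(obs)):
        p = pi[path[0]] * B[path[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
        total += p
    return np.log(total)
```

The forward recursion costs O(N²T) operations instead of the O(N^T) paths enumerated by the reference check, which is why it is the procedure worth simplifying further on constrained hardware.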

8.

Large margin hidden Markov models for speech recognition (total citations: 1; self-citations: 0; citations by others: 1)

Hui Jiang Xinwei Li Chaojun Liu 《IEEE transactions on audio, speech, and language processing》2006,14(5):1584-1595

In this paper, motivated by large margin classifiers in machine learning, we propose a novel method to estimate continuous-density hidden Markov model (CDHMM) for speech recognition according to the principle of maximizing the minimum multiclass separation margin. The approach is named large margin HMM. First, we show this type of large margin HMM estimation problem can be formulated as a constrained minimax optimization problem. Second, we propose to solve this constrained minimax optimization problem by using a penalized gradient descent algorithm, where the original objective function, i.e., minimum margin, is approximated by a differentiable function and the constraints are cast as penalty terms in the objective function. The new training method is evaluated in the speaker-independent isolated E-set recognition and the TIDIGITS connected digit string recognition tasks. Experimental results clearly show that the large margin HMMs consistently outperform the conventional HMM training methods. It has been consistently observed that the large margin training method yields significant recognition error rate reduction even on top of some popular discriminative training methods.
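The objective being maximized can be stated concretely. A minimal sketch, assuming each training token comes with its log-likelihood under every class model: a token's margin is the correct model's score minus the best competitor's, and the training set's minimum margin is what large margin estimation pushes up. The log-sum-exp "soft minimum" is one common differentiable approximation, shown as an assumption; the paper's exact smoothing function, constraints, and penalized gradient descent are not reproduced.

```python
import numpy as np

def separation_margin(log_likelihoods, correct):
    """Margin of one token: correct model's log-likelihood minus the best
    competing model's. Positive means the token is correctly classified."""
    scores = np.asarray(log_likelihoods, dtype=float)
    competitors = np.delete(scores, correct)
    return scores[correct] - competitors.max()

def minimum_margin(score_matrix, labels):
    """Minimum margin over a training set (rows: tokens, columns: class models)."""
    return min(separation_margin(row, y) for row, y in zip(score_matrix, labels))

def soft_minimum_margin(margins, eta=10.0):
    """Differentiable approximation of min(margins):
    -(1/eta) * log(sum(exp(-eta * d))). It never exceeds the true minimum
    and approaches it as eta grows."""
    d = np.asarray(margins, dtype=float)
    return -np.logaddexp.reduce(-eta * d) / eta
```

A token with margin 1.0 and another with margin 2.0 give a minimum margin of 1.0; the smoothed version lets gradients flow from every token instead of only the single worst one.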

9.

Maximum entropy direct models for speech recognition

Hong-Kwang Jeff Kuo Yuqing Gao 《IEEE transactions on audio, speech, and language processing》2006,14(3):873-881

Traditional statistical models for speech recognition have mostly been based on a Bayesian framework using generative models such as hidden Markov models (HMMs). This paper focuses on a new framework for speech recognition using maximum entropy direct modeling, where the probability of a state or word sequence given an observation sequence is computed directly from the model. In contrast to HMMs, features can be asynchronous and overlapping. This model therefore allows for the potential combination of many different types of features, which need not be statistically independent of each other. In this paper, a specific kind of direct model, the maximum entropy Markov model (MEMM), is studied. Even with conventional acoustic features, the approach already shows promising results for phone level decoding. The MEMM significantly outperforms traditional HMMs in word error rate when used as stand-alone acoustic models. Preliminary results combining the MEMM scores with HMM and language model scores show modest improvements over the best HMM speech recognizer.

10.

Robust combination of neural networks and hidden Markov models for speech recognition (total citations: 2; self-citations: 0; citations by others: 2)

Trentin E. Gori M. 《Neural Networks, IEEE Transactions on》2003,14(6):1519-1531

Acoustic modeling in state-of-the-art speech recognition systems usually relies on hidden Markov models (HMMs) with Gaussian emission densities. HMMs suffer from intrinsic limitations, mainly due to their arbitrary parametric assumption. Artificial neural networks (ANNs) appear to be a promising alternative in this respect, but they historically failed as a general solution to the acoustic modeling problem. This paper introduces algorithms based on a gradient-ascent technique for global training of a hybrid ANN/HMM system, in which the ANN is trained to estimate the emission probabilities of the states of the HMM. The approach is related to the major hybrid systems proposed by Bourlard and Morgan and by Bengio, with the aim of combining their benefits within a unified framework and overcoming their limitations. Several viable solutions are proposed to the "divergence problem", which may arise when training is carried out under the maximum-likelihood (ML) criterion. Experimental results in speaker-independent, continuous speech recognition over Italian digit strings validate the novel hybrid framework, allowing for improved recognition performance over HMMs with mixtures of Gaussian components, as well as over Bourlard and Morgan's paradigm. In particular, it is shown that the maximum a posteriori (MAP) version of the algorithm yields a 46.34% relative word error rate reduction with respect to standard HMMs.
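In the Bourlard and Morgan style of hybrid that this work builds on, a network trained as a classifier outputs state posteriors P(q | x), and Bayes' rule turns them into scaled likelihoods usable as HMM emission scores: p(x | q) / p(x) = P(q | x) / P(q). A minimal sketch of that conversion follows; the probability floor is an assumed safeguard, and Trentin and Gori's global gradient-ascent training is not shown.

```python
import numpy as np

def scaled_log_likelihoods(posteriors, state_priors, floor=1e-10):
    """Convert network state posteriors P(q | x_t), shape (T, N), into scaled
    log-likelihoods log p(x_t | q) - log p(x_t) = log P(q | x_t) - log P(q).
    The p(x_t) term is the same for every state, so decoding is unaffected."""
    post = np.maximum(np.asarray(posteriors, dtype=float), floor)
    priors = np.maximum(np.asarray(state_priors, dtype=float), floor)
    return np.log(post) - np.log(priors)
```

The state priors are usually estimated from state occupancy frequencies in the training alignment. Dividing by them prevents frequent states from dominating the decoder simply because the classifier saw them more often.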

11.

Design of a DSP speech recognition system based on denoising techniques

Gaowu Wei Zuyong Feng 《Transducer and Microsystem Technologies》2017,36(1)

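To meet the practical need for speech recognition systems to withstand environmental noise, a two-stage combined noise-suppression technique is proposed, and a noise-robust embedded speech recognition system is studied and designed that uses a digital signal processor (DSP) as the hardware platform and the hidden Markov model (HMM) as the recognition algorithm. The DSP is a TMS320VC5509A chip, which together with peripheral hardware circuits forms the hardware platform of the speech recognition system. The software is programmed with the discrete hidden Markov model (DHMM) as the recognition algorithm and consists of four main modules: recognition, training, learning, and USB. Experimental results show that the speech recognition system based on the two-stage combined denoising technique achieves better noise robustness.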

12.

Spoken query based word spotting in digitized Tamil documents

AN. Sigappi S. Palanivel 《AI & Society》2014,29(1):113-121

This paper presents an integrated approach to spotting spoken keywords in digitized Tamil documents by combining word image matching and spoken word recognition techniques. The work involves segmenting document images into words, creating an index of keywords, and constructing a word image hidden Markov model (HMM) and a speech HMM for each keyword. The word image HMMs are constructed using seven-dimensional profile and statistical moment features and are used to recognize a segmented word image for possible inclusion of the keyword in the index. The spoken query word is recognized by maximum likelihood over the speech HMMs, using 39-dimensional mel frequency cepstral coefficients derived from speech samples of the keywords. The positional details of the search keyword obtained from the automatically updated index are used to retrieve the relevant portion of text from the document during word spotting. Performance measures such as recall, precision, and F-measure are calculated for 40 test words from four groups of literary documents to illustrate the ability of the proposed scheme and highlight its value in the emerging multilingual information retrieval scenario.
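The abstract does not say how the 39 dimensions are composed. A common convention, assumed here, is 13 static MFCCs plus their first (delta) and second (delta-delta) time derivatives, each computed with the regression formula d_t = Σ_{n=1}^{N} n (c_{t+n} − c_{t−n}) / (2 Σ n²) with edge frames repeated. A minimal sketch:

```python
import numpy as np

def deltas(feats, N=2):
    """Regression deltas over a (T x d) matrix:
    d_t = sum_{n=1}^N n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1}^N n^2),
    with the first and last frames repeated at the edges."""
    feats = np.asarray(feats, dtype=float)
    T = len(feats)
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(feats)
    for n in range(1, N + 1):
        # frame t sits at padded[t + N]; slices select c_{t+n} and c_{t-n}
        out += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return out / denom

def add_dynamic_features(mfcc):
    """Stack static, delta and delta-delta coefficients (13 -> 39 dims)."""
    d1 = deltas(mfcc)
    d2 = deltas(d1)
    return np.hstack([mfcc, d1, d2])
```

On a coefficient that rises by 1 per frame, the interior delta is exactly 1, a quick sanity check that the regression recovers the slope.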

13.

Audio-visual continuous speech recognition and phoneme segmentation based on dynamic Bayesian networks

Guoyun Lü Dongmei Jiang Xiaoyue Jiang Rongchun Zhao Yunshu Hou Ali Sun H. Sahli W. Verhelst 《Journal of Computer Applications》2007,27(7):1670-1673

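Two single-stream, monophone dynamic Bayesian network (DBN) models are constructed to perform continuous speech recognition from audio features and from visual features, and, based on a description of the relationship between words and their corresponding phonemes, temporal segmentation of phonemes is achieved. Experimental results show that, for recognition with audio features, at low signal-to-noise ratios (0 to 15 dB) the recognition rate of the DBN model is on average 12.79% higher than that of the HMM, while for clean speech the phoneme time segmentation produced by the DBN model is very close to that of a triphone HMM. For recognition with visual features, the recognition rate of the DBN model is 2.47% higher than that of the HMM. Finally, the asynchrony between the phoneme time segmentations of the audio and video data is analyzed, laying the groundwork for multi-stream DBN audio-visual continuous speech recognition and for determining the asynchrony between audio and video.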

14.

Design and simulation of a hybrid speech recognition model

Zhizhang Song Li Ma Shengfei Liu Qinan Li 《Computer Simulation》2012,29(5):152-155

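This work studies the problem of speech recognition accuracy. Speech is a non-stationary signal containing a great deal of noise, and most current recognition algorithms are based on linear theory, which makes it difficult to recognize the nonlinear variation of speech correctly and leads to low accuracy. A hybrid noise-robust speech recognition model (HMM-SVM) is formed by combining the hidden Markov model (HMM) with a support vector machine (SVM). The HMM models the temporal structure of the speech signal and yields output probabilities for the speech to be recognized; these output probabilities are then used as SVM inputs for learning, which produces the speech class information; finally, the HMM-SVM recognition result is used to make the recognition decision. Simulation results show that HMM-SVM improves recognition accuracy and, especially at low signal-to-noise ratios, noticeably improves the performance of the speech recognition system.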

15.

Robust speech segmentation based on a Bayesian approach (total citations: 1; self-citations: 0; citations by others: 1)

Wenjun Zhang Jianying Xie Cong Li 《Journal of Data Acquisition and Processing》2002,17(3):260-264

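Building on speech segmentation with hidden Markov models, a prior segmentation model that is unaffected by noise is incorporated, and a Bayesian speech segmentation method is proposed. Within the Bayesian segmentation framework, the authors first transform the speech sequence, converting the sequence of segmentation points into a sequence of syllable lengths. Then, assuming the syllable-length sequence follows a first-order Markov process, they derive, after normalization, a formula for the prior probability of a segmentation, which yields the Bayesian segmentation model. Experiments in noisy environments show that, because the segmentation model is independent of noise, it compensates well for acoustic model mismatch under noise and greatly increases the robustness of speech segmentation.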
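The two transformation steps can be sketched directly: boundaries become syllable lengths, and a first-order Markov model over lengths gives the log prior that is added to the acoustic log-likelihood, so the MAP segmentation maximizes log P(X | S) + log P(S). The abstract does not give the normalization step or how lengths are quantized, so the dictionary-based `initial` and `transition` tables below are hypothetical.

```python
import math

def boundaries_to_lengths(boundaries):
    """Convert sorted segmentation points (frame indices, including 0 and the
    total frame count) into a sequence of syllable lengths."""
    return [b - a for a, b in zip(boundaries, boundaries[1:])]

def log_length_prior(lengths, initial, transition):
    """log P(l_1, ..., l_K) = log P(l_1) + sum_k log P(l_k | l_{k-1}).
    initial: length -> probability; transition: (prev, cur) -> probability."""
    logp = math.log(initial[lengths[0]])
    for prev, cur in zip(lengths, lengths[1:]):
        logp += math.log(transition[(prev, cur)])
    return logp

def log_posterior_score(acoustic_loglik, boundaries, initial, transition):
    """Bayesian segmentation score, up to a constant: log P(X | S) + log P(S)."""
    lengths = boundaries_to_lengths(boundaries)
    return acoustic_loglik + log_length_prior(lengths, initial, transition)
```

Because the prior term depends only on the boundary positions, it stays the same when noise corrupts the acoustics, which is the source of the robustness the abstract reports.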

16.

Implementation of a support vector machine speech recognition algorithm on the DM6446

Yanbo Niu Xueying Zhang Xiaofeng Liu 《Computer Engineering and Applications》2012,48(20):67-69,86

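To meet the real-time and portability requirements of speech recognition systems, an MFCC/SVM-based implementation on the DM6446 embedded development platform is proposed, yielding a speaker-independent speech recognition system, and the directed acyclic graph (DAG) multi-class support vector machine algorithm is ported to this platform. Speaker-independent isolated-word and connected-word recognition with the DAG method on this platform shows clear advantages over hidden Markov models. A sample pre-selection algorithm is applied to the training samples and incorporated into the embedded speech recognition system, greatly reducing training and testing time.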
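The DAG multi-class SVM decides among K classes with K(K−1)/2 pairwise classifiers but evaluates only K−1 of them per input: it tests the first remaining class against the last and discards the loser until one class is left. A minimal sketch of that decision procedure, assuming the trained binary SVMs are wrapped in a hypothetical `pairwise(x, a, b)` callable that returns the winning class:

```python
from typing import Callable, Hashable, Sequence

def dag_classify(x, classes: Sequence[Hashable],
                 pairwise: Callable[[object, Hashable, Hashable], Hashable]):
    """Decision-DAG multi-class classification from binary classifiers.
    pairwise(x, a, b) returns a if the a-vs-b classifier prefers a, else b.
    Each test eliminates one class, so exactly K - 1 tests are evaluated."""
    remaining = list(classes)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        if pairwise(x, a, b) == a:
            remaining.pop()    # b is ruled out
        else:
            remaining.pop(0)   # a is ruled out
    return remaining[0]
```

Evaluating only K−1 classifiers per utterance, rather than all pairwise ones as in max-wins voting, keeps the per-decision cost low, which suits an embedded platform like the one described.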

17.

Relationship between product-HMM weighting coefficients and instantaneous SNR in bimodal speech recognition

Hui Zhao Yaqiang Gu Chaojing Tang 《Journal of Computer Applications》2009,29(Z2)

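To obtain higher speech recognition rates under complex conditions such as noise contamination, a new product hidden Markov model (HMM) is proposed for bimodal speech recognition, and the relationship between the model's weighting coefficients and the instantaneous signal-to-noise ratio (SNR) is studied and determined. Building on independently trained audio and video HMMs, the model constructs a two-dimensional training model and uses a re-estimation strategy to ensure higher accuracy. The generalized probabilistic descent (GPD) algorithm is also introduced to adjust the weighting coefficients of the audio and video features. Experimental results show that the proposed method delivers good and stable recognition performance in noisy environments.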
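Stream weighting in product HMMs is commonly written as exponent weights on the audio and video emission likelihoods, which become a weighted sum in the log domain; that form is assumed here. The paper's fitted relationship between weight and instantaneous SNR is not given in the abstract, so the logistic mapping and its `midpoint_db` and `slope` parameters are purely hypothetical placeholders for a curve that trusts audio more as SNR rises.

```python
import math

def combined_log_emission(log_b_audio, log_b_video, audio_weight):
    """Stream fusion: log b(o) = w * log b_a(o_a) + (1 - w) * log b_v(o_v)."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * log_b_audio + (1.0 - audio_weight) * log_b_video

def audio_weight_from_snr(snr_db, midpoint_db=10.0, slope=0.3):
    """Hypothetical monotone SNR -> audio-weight mapping (logistic curve);
    not the relationship determined in the paper."""
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint_db)))
```

Recomputing the weight frame by frame from an instantaneous SNR estimate lets the recognizer lean on the video stream only during noisy stretches.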

18.

Off-line recognition of realistic Chinese handwriting using segmentation-free strategy

Tong-Hua Su Tian-Wen Zhang Hu-Jie Huang 《Pattern recognition》2009,42(1):167-182

Off-line recognition of realistic Chinese handwriting faces great challenges. This paper presents a segmentation-free strategy based on the Hidden Markov Model (HMM) to handle this problem, in which the character segmentation stage prior to recognition is avoided. Handwritten text lines are first converted to observation sequences by sliding windows. Then the embedded Baum-Welch algorithm is adopted to train character HMMs. Finally, the character string with the maximum a posteriori probability is located through the Viterbi algorithm. Experiments are conducted on the HIT-MW database, written by more than 780 writers. The results show the feasibility of such systems and reveal apparent complementary capacities between segmentation-free and segmentation-based systems.
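The final decoding step uses the Viterbi algorithm. As a minimal sketch of that algorithm alone (the sliding-window features and embedded Baum-Welch training are not shown), assume the character HMMs have been concatenated into one state space and that `log_B_frames` holds precomputed per-frame log emission scores.

```python
import numpy as np

def viterbi(log_pi, log_A, log_B_frames):
    """Most likely state sequence.
    log_pi: (N,), log_A: (N, N), log_B_frames: (T, N) per-frame log emission scores.
    Returns (best_path, best_log_score)."""
    T, N = log_B_frames.shape
    delta = log_pi + log_B_frames[0]
    backptr = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A      # scores[i, j]: best path into j via i
        backptr[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B_frames[t]
    # Trace back from the best final state
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1], float(delta.max())
```

In a segmentation-free recognizer, the character string is read off by mapping the returned state path back to the character HMMs it passes through, so character boundaries emerge from decoding rather than being fixed beforehand.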

19.

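Experimental evaluation of the classical HMM in Mandarin continuous speech recognition (total citations: 2; self-citations: 1; citations by others: 1)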

Jie Hao Xing Li 《Computer Engineering and Applications》2001,37(13):1-4,101

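Quantitatively analyzing and evaluating the performance of the classical hidden Markov model (HMM) is an unsolved and pressing problem in Mandarin continuous speech recognition research. This paper builds a Mandarin continuous speech recognition system based on the classical HMM. For various combinations along two degrees of freedom, the speech unit and the output probability, it studies how the complexity, robustness, and accuracy of the classical HMM relate to properties such as training-set size, training time, and decoding efficiency, and it analyzes experimentally the significance of multi-candidate construction and pruning. The system achieves recognition rates comparable to those of the THEESP system, the best-performing system in China, and the experimental results and conclusions provide a necessary reference and basis for further research on Mandarin speech recognition.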

20.

Real-time Unsupervised Segmentation of human whole-body motion and its application to humanoid robot acquisition of motion symbols

《Robotics and Autonomous Systems》2016

An interactive loop between motion recognition and motion generation is a fundamental mechanism for humans and humanoid robots. We have been developing an intelligent framework for motion recognition and generation based on symbolizing motion primitives. The motion primitives are encoded into Hidden Markov Models (HMMs), which we call “motion symbols”. However, to determine the motion primitives to use as training data for the HMMs, this framework requires a manual segmentation of human motions. Essentially, a humanoid robot is expected to participate in daily life and must learn many motion symbols to adapt to various situations. For this use, manual segmentation is cumbersome and impractical for humanoid robots. In this study, we propose a novel approach to segmentation, the Real-time Unsupervised Segmentation (RUS) method, which comprises three phases. In the first phase, short human movements are encoded into feature HMMs. Seamless human motion can be converted to a sequence of these feature HMMs. In the second phase, the causality between the feature HMMs is extracted. The causality data make it possible to predict movement from observation. In the third phase, movements having a large prediction uncertainty are designated as the boundaries of motion primitives. In this way, human whole-body motion can be segmented into a sequence of motion primitives. This paper also describes an application of RUS to AUtonomous Symbolization of motion primitives (AUS). Each derived motion primitive is classified into an HMM for a motion symbol, and parameters of the HMMs are optimized by using the motion primitives as training data in competitive learning. The HMMs are gradually optimized in such a way that the HMMs can abstract similar motion primitives. We tested the RUS and AUS frameworks on captured human whole-body motions and demonstrated the validity of the proposed framework.
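The third RUS phase marks boundaries where prediction is most uncertain. The abstract does not say how uncertainty is measured; as a minimal sketch under that gap, assume the causality data yield, at each step, a predictive distribution over the next feature-HMM label, and use its Shannon entropy against a hypothetical `threshold`.

```python
import math

def entropy(dist):
    """Shannon entropy (nats) of a discrete distribution given as probabilities."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def uncertainty_boundaries(predictive_dists, threshold):
    """Indices t whose predictive distribution over the next feature-HMM label
    has entropy above `threshold`; these are treated as primitive boundaries."""
    return [t for t, dist in enumerate(predictive_dists) if entropy(dist) > threshold]
```

A near-deterministic prediction (one label with probability close to 1) has entropy near 0 and stays inside a primitive, while a flat prediction over several labels signals that the current movement could continue in many ways, which this sketch treats as a boundary.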