Similar Documents
 17 similar documents found (search time: 140 ms)
1.
A speech recognition system based on HMM and a genetic neural network   (Cited: 1; self: 0; others: 1)
This paper proposes a hybrid-model speech recognition method based on a hidden Markov model (HMM) and a back-propagation network optimized by a genetic algorithm (GA-BP). The method first uses the HMM to model the temporal structure of the speech signal and computes output-probability scores of the speech against the HMM; these scores are then fed as input to the optimized back-propagation network to obtain classification information, and the final recognition decision is made by the hybrid model's recognition algorithm. The available sample data were trained and tested in Matlab. Simulation results show that, because the design fully exploits the strong temporal modeling capability of the HMM and the strong classification capability of the GA-BP network, the hybrid model is more noise-robust than a plain HMM, avoids the neural network's local-optimum problem, greatly increases recognition speed, and markedly improves the performance of the speech recognition system.

2.
Research on key technologies of speech recognition   (Cited: 11; self: 0; others: 11)
Acoustic modeling with hidden Markov models (HMMs) is one of the main reasons for the breakthrough progress in large-vocabulary continuous speech recognition, yet some of the unrealistic modeling assumptions HMMs rely on, together with their non-discriminative training algorithms, are becoming a bottleneck for the future development of speech recognition systems. Neural networks can store knowledge and maintain long-term memory through their weights, but their memory of the transient response to an input pattern is comparatively weak. A hybrid HMM/ANN model is adopted to revise some of the HMM's unrealistic modeling assumptions and training algorithms: the hybrid model replaces the Gaussian mixtures (GM) with a non-parametric neural-network probability model to compute the observation probabilities required by the HMM states. The network structure was also optimized, with good results.
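In hybrid HMM/ANN systems of the kind described above, the network's state posteriors are commonly converted into scaled likelihoods by dividing out the state priors before Viterbi decoding. A minimal sketch of that conversion (all function and variable names here are illustrative, not from the paper):

```python
import math

def scaled_log_likelihoods(log_posteriors, log_priors):
    """Convert per-state log posteriors log p(s|x) from a neural network
    into scaled log-likelihoods log p(x|s) + const = log p(s|x) - log p(s),
    the quantity a hybrid HMM/ANN decoder uses in place of GMM scores."""
    return [lp - pr for lp, pr in zip(log_posteriors, log_priors)]

# One frame, three HMM states (illustrative numbers only).
log_post = [math.log(0.7), math.log(0.2), math.log(0.1)]
log_prior = [math.log(0.5), math.log(0.3), math.log(0.2)]
scores = scaled_log_likelihoods(log_post, log_prior)
```

The constant offset (the frame likelihood) is the same for every state, so it does not affect which path the decoder picks.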

3.
This paper studies telephone-channel speech recognition based on data simulation and HMM (hidden Markov model) adaptation. The simulated data mimic the behavior of clean speech under different telephone-channel conditions. Baseline HMMs were trained on clean speech and on simulated speech, respectively. Recognition experiments evaluated each baseline HMM before and after unsupervised adaptation with the MLLR (maximum likelihood linear regression) algorithm. The experiments show that simulated speech converted from clean speech effectively reduces the acoustic mismatch between training and test speech and substantially raises telephone speech recognition accuracy. The adaptation results show that adapting on simulated data outperforms adapting on clean speech by up to 9.8%, indicating a further improvement in telephone speech recognition performance and system robustness.
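MLLR adaptation, as used above, shifts every Gaussian mean of the baseline HMMs with a shared linear regression transform, mu' = A*mu + b. A minimal sketch of applying such a transform (estimating A and b from adaptation data is the actual MLLR step and is omitted here; all names are hypothetical):

```python
def mllr_adapt_mean(A, b, mu):
    """Apply an MLLR mean transform mu' = A @ mu + b to one Gaussian mean.
    In real MLLR, A and b are estimated by maximizing the likelihood of
    the adaptation data; here they are simply given."""
    return [sum(a_ij * m_j for a_ij, m_j in zip(row, mu)) + b_i
            for row, b_i in zip(A, b)]

# Identity transform leaves the mean unchanged (sanity check).
mu_new = mllr_adapt_mean([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [2.0, 3.0])
```

Because one transform is shared across many Gaussians, MLLR can adapt a large model from a small amount of unsupervised data, which is why it suits the telephone-channel mismatch scenario in the abstract.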

4.
To handle the overlap between adjacent frames of the speech signal and improve its adaptability, this paper proposes a speech recognition system improved with a hidden Markov model (HMM) and a genetic-algorithm neural network. The improved method first trains the Mel-frequency cepstral coefficients (MFCC) with a wavelet neural network, then uses the HMM to model the temporal structure of the speech signal and computes the HMM output-probability scores, which serve as input to the genetic neural network, yielding the classification information of the speech. Experimental results show that the improved system is more noise-robust than a plain HMM and improves the performance of the speech recognition system.

5.
A bimodal speech recognition method based on product HMMs   (Cited: 3; self: 2; others: 1)
For speech recognition in noisy environments, a product hidden Markov model (HMM) for bimodal speech recognition is proposed. On top of independently trained audio and video HMMs, a two-dimensional training model is built to capture the asynchrony between the audio and video streams. A weighting coefficient is introduced to adaptively adjust the relative weights of the audio and video streams under different noise conditions. Experimental results show that the method achieves higher recognition performance than other bimodal speech recognition methods.
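The stream weighting mentioned above is typically applied in the log domain: each state's observation score is a convex combination of the audio and video log-likelihoods, with the weight raised for the cleaner modality. A minimal sketch (names and numbers are illustrative):

```python
def fused_log_likelihood(logb_audio, logb_video, lam):
    """Combine audio and video observation log-likelihoods for one state:
    log b = lam * log b_audio + (1 - lam) * log b_video.
    lam close to 1 trusts audio (clean conditions); close to 0 trusts
    video (heavy acoustic noise)."""
    return lam * logb_audio + (1.0 - lam) * logb_video

# In noise, down-weight the audio stream.
score_noisy = fused_log_likelihood(-8.0, -3.0, 0.2)
```

Choosing lam as a function of an estimated SNR is one common way to make the combination adaptive, as the abstract describes.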

6.
姚煜  RYAD Chellali 《计算机应用》2018,38(9):2495-2499
To address the unrealistic conditional-independence assumptions of the hidden Markov model (HMM) in speech recognition, the sequence-modeling ability of recurrent neural networks was investigated further, and an acoustic model based on a bidirectional long short-term memory network was proposed. The Connectionist Temporal Classification (CTC) training criterion was successfully applied to train this acoustic model, producing an end-to-end Chinese speech recognition system that does not depend on HMMs. A decoding method based on weighted finite-state transducers (WFST) was also designed, which effectively solves the difficulty of integrating the pronunciation lexicon and the language model into the decoding process. Compared with a traditional GMM-HMM system and a hybrid DNN-HMM system, experimental results show that the end-to-end system not only clearly lowers the recognition error rate but also greatly speeds up decoding, indicating that the acoustic model effectively improves model discrimination and streamlines the system structure.
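At the heart of CTC is a many-to-one mapping that collapses a frame-level path (with a special blank symbol) into an output label sequence by merging repeated labels and removing blanks. A small sketch of that mapping (the training loss itself, which sums over all paths that collapse to the target, is not shown):

```python
def ctc_collapse(path, blank=0):
    """Map a frame-level CTC path to its label sequence: merge consecutive
    repeats, then delete blanks. Repeats separated by a blank survive as
    distinct labels."""
    out, prev = [], None
    for label in path:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

# Blank (0) between the two 3s keeps them as separate output labels.
labels = ctc_collapse([0, 3, 3, 0, 3, 5, 5, 0])  # -> [3, 3, 5]
```

This mapping is what frees the bidirectional LSTM from needing an HMM-style frame alignment during training.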

7.
黄光球  汪晓海 《计算机工程》2007,33(10):131-133,163
A network intrusion detection method based on a BP-HMM model is proposed, together with the model's training and recognition procedures. A classifier built from pure HMMs cannot simultaneously give each model strong recognition ability for its own target class and maximize the differences between models, so a BP neural network is integrated into the HMM framework, with the BP network supplying the state probability outputs for the HMM. The coarse classification performed by the BP network overcomes this shortcoming of the HMM and improves the system's classification and recognition ability.

8.
To address the sharp performance degradation of most speech recognition systems in noisy environments, a new speech feature extraction method is proposed. Built on an auditory model, it obtains frequency information from the combined up-going zero-crossing rates of the speech signal and its difference signal, and intensity information from peak detection with nonlinear amplitude weighting; the two are combined into the output speech features, which are then trained and recognized with a BP neural network and an HMM, respectively. Speaker-independent recognition of 50 words was simulated at different signal-to-noise ratios, and the results show that the zero-crossing-with-peak-amplitude features combined with difference information have strong noise robustness.
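The basic measurements behind such features are simple to state: count upward zero crossings of the frame (and of its first difference) as frequency cues, and take the peak magnitude as an intensity cue. A minimal per-frame sketch (the auditory-model filtering and nonlinear amplitude weighting of the paper are omitted; all names are illustrative):

```python
def upward_zero_crossings(x):
    """Count up-going sign changes (negative to non-negative) in a frame."""
    return sum(1 for a, b in zip(x, x[1:]) if a < 0 <= b)

def frame_features(frame):
    """Illustrative per-frame feature triple in the spirit of the method:
    up-going zero-crossing counts of the signal and of its first
    difference (frequency cues), plus the peak magnitude (intensity cue)."""
    diff = [b - a for a, b in zip(frame, frame[1:])]
    return (upward_zero_crossings(frame),
            upward_zero_crossings(diff),
            max(abs(v) for v in frame))

feats = frame_features([-1.0, 2.0, -1.0, 2.0])
```

Zero-crossing counts are attractive in noise because they depend only on sign changes, not on absolute amplitude.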

9.
Because of the dynamics of the speech signal, the overlap between frames, and the influence of noise, the feature coefficients computed by MFCC cannot fully reflect the information in the speech. A noise-robust speech recognition method based on a hybrid hidden Markov model (HMM) and wavelet neural network (WNN) model is proposed: the MFCC feature coefficients are trained with a wavelet neural network to obtain new MFCC coefficients. Experimental results show that, in noisy environments, the hybrid model is more noise-robust than a plain HMM and clearly improves the performance of the speech recognition system.

10.
A speech emotion recognition classifier based on HMM and ANN   (Cited: 2; self: 0; others: 2)
罗毅 《微计算机信息》2007,23(34):218-219,296
To remedy the inherently weak classification ability of an isolated hidden Markov model (HMM) in speech emotion recognition, this paper proposes using an HMM together with a radial basis function (RBF) neural network to recognize five speech emotions: surprise, anger, joy, sadness, and disgust. The HMM is used to normalize the emotion feature vectors, and the RBF network serves as the final decision classifier. Experimental results show that, under the experimental conditions of this paper, the method outperforms an isolated HMM, with a notably improved recognition rate for disgust.

11.
Links between Markov models and multilayer perceptrons   (Cited: 2; self: 0; others: 2)
The statistical use of a particular classic form of a connectionist system, the multilayer perceptron (MLP), is described in the context of the recognition of continuous speech. A discriminant hidden Markov model (HMM) is defined, and it is shown how a particular MLP with contextual and extra feedback input units can be considered as a general form of such a Markov model. A link between these discriminant HMMs, trained along the Viterbi algorithm, and any other approach based on least mean square minimization of an error function (LMSE) is established. It is shown theoretically and experimentally that the outputs of the MLP (when trained along the LMSE or the entropy criterion) approximate the probability distribution over output classes conditioned on the input, i.e. the maximum a posteriori probabilities. Results of a series of speech recognition experiments are reported. The possibility of embedding MLP into HMM is described. Relations with other recurrent networks are also explained.

12.
This paper discusses the use of an integrated HMM/NN classifier for speech recognition. The proposed classifier combines the time normalization property of the HMM classifier with the superior discriminative ability of the neural net (NN) classifier. Speech signals display a strong time varying characteristic. Although the neural net has been successful in many classification problems, its success (compared to HMM) is secondary to HMM in the field of speech recognition. The main reason is the lack of time normalization characteristics of most neural net structures (time-delay neural net is one notable exception but its structure is very complex). In the proposed integrated hybrid HMM/NN classifier, a left-to-right HMM module is used first to segment the observation sequence of every exemplar into a fixed number of states. Subsequently, all the frames belonging to the same state are replaced by one average frame. Thus, every exemplar, irrespective of its time scale variation, is transformed into a fixed number of frames, i.e., a static pattern. The multilayer perceptron (MLP) neural net is then used as the classifier for these time normalized exemplars. Some experimental results using telephone speech databases are presented to demonstrate the potential of this hybrid integrated classifier.
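The time-normalization step described above, replacing all frames aligned to a state with their average so that every utterance becomes a fixed-length static pattern, can be sketched in a few lines (a minimal illustration given a state alignment; all names are hypothetical):

```python
def time_normalize(frames, state_ids):
    """Collapse a variable-length utterance into one averaged frame per
    HMM state, given a frame-to-state alignment. The result has a fixed
    number of frames regardless of the utterance's duration."""
    by_state = {}
    for frame, s in zip(frames, state_ids):
        by_state.setdefault(s, []).append(frame)
    return [
        [sum(col) / len(col) for col in zip(*by_state[s])]
        for s in sorted(by_state)
    ]

# Two frames in state 0, one in state 1 -> exactly two output frames.
pattern = time_normalize([[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]], [0, 0, 1])
```

The fixed-size output is what lets a plain MLP, which has no temporal structure of its own, act as the final classifier.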

13.
Multifeedback-Layer Neural Network   (Cited: 1; self: 0; others: 1)
The architecture and training procedure of a novel recurrent neural network (RNN), referred to as the multifeedback-layer neural network (MFLNN), is described in this paper. The main difference of the proposed network compared to the available RNNs is that the temporal relations are provided by means of neurons arranged in three feedback layers, not by simple feedback elements, in order to enrich the representation capabilities of the recurrent networks. The feedback layers provide local and global recurrences via nonlinear processing elements. In these feedback layers, weighted sums of the delayed outputs of the hidden and of the output layers are passed through certain activation functions and applied to the feedforward neurons via adjustable weights. Both online and offline training procedures based on the backpropagation through time (BPTT) algorithm are developed. The adjoint model of the MFLNN is built to compute the derivatives with respect to the MFLNN weights which are then used in the training procedures. The Levenberg-Marquardt (LM) method with a trust region approach is used to update the MFLNN weights. The performance of the MFLNN is demonstrated by applying it to several illustrative temporal problems including chaotic time series prediction and nonlinear dynamic system identification, and it performed better than several networks available in the literature.

14.
In recent years, the use of Multi-Layer Perceptron (MLP) derived acoustic features has become increasingly popular in automatic speech recognition systems. These features are typically used in combination with standard short-term spectral-based features, and have been found to yield consistent performance improvements. However, there are a number of design decisions and issues associated with the use of MLP features for state-of-the-art speech recognition systems. Two modifications to the standard training/adaptation procedures are described in this work. First, the paper examines how MLP features, and the associated acoustic models, can be trained efficiently on large training corpora using discriminative training techniques. An approach that combines multiple individual MLPs is proposed, and this reduces the time needed to train MLPs on large amounts of data. In addition, to further speed up discriminative training, a lattice re-use method is proposed. The paper also examines how systems with MLP features can be adapted to a particular speaker, or acoustic environment. In contrast to previous work (where standard HMM adaptation schemes are used), linear input network adaptation is investigated. System performance is investigated within a multi-pass adaptation/combination framework. This allows the performance gains of individual techniques to be evaluated at various stages, as well as the impact in combination with other sub-systems. All the approaches considered in this paper are evaluated on an Arabic large vocabulary speech recognition task which includes both Broadcast News and Broadcast Conversation test data.

15.
General Regression Neural Networks (GRNN) have been applied to phoneme identification and isolated word recognition in clean speech. In this paper, the authors extended this approach to Arabic spoken word recognition in adverse conditions. In fact, noise robustness is one of the most challenging problems in Automatic Speech Recognition (ASR) and most of the existing recognition methods, which have shown to be highly efficient under noise-free conditions, fail drastically in noisy environments. The proposed system was tested for Arabic digit recognition at different Signal-to-Noise Ratio (SNR) levels and under four noisy conditions: multispeakers babble background, car production hall (factory), military vehicle (leopard tank) and fighter jet cockpit (buccaneer) issued from NOISEX-92 database. The proposed scheme was successfully compared to the similar recognizers based on the Multilayer Perceptrons (MLP), the Elman Recurrent Neural Network (RNN) and the discrete Hidden Markov Model (HMM). The experimental results showed that the use of nonparametric regression with an appropriate smoothing factor (spread) improved the generalization power of the neural network and the global performance of the speech recognizer in noisy environments.
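A GRNN prediction is a Gaussian-kernel weighted average of the training targets (nonparametric Nadaraya-Watson regression), and the smoothing factor ("spread") the abstract tunes controls how widely each training point's influence extends. A one-dimensional sketch (all names illustrative):

```python
import math

def grnn_predict(x, train_x, train_y, spread):
    """GRNN output for input x: a Gaussian-kernel weighted average of the
    stored training targets. A larger `spread` smooths more aggressively,
    which is what helps generalization in noisy conditions."""
    w = [math.exp(-((x - xi) ** 2) / (2.0 * spread ** 2)) for xi in train_x]
    return sum(wi * yi for wi, yi in zip(w, train_y)) / sum(w)

# Equidistant neighbors contribute equally, so the prediction is their mean.
y = grnn_predict(0.0, [-1.0, 1.0], [0.0, 2.0], 1.0)  # -> 1.0
```

Because the GRNN stores the training set rather than fitting iteratively, there is no weight training to diverge; only the spread needs tuning.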

16.
A simplified neural network structure for small-vocabulary hybrid speech recognition systems that combine a hidden Markov model (HMM) with multilayer perceptrons (MLP) is studied. The regular two-dimensional array formed by the HMM states in a small-vocabulary hybrid system is exploited to factor the state observation probabilities. Based on this use of the HMM's two-dimensional structure, the state observation probabilities are estimated with a simplified network composed of several simple MLPs. Both theoretical analysis and speech recognition experiments show that this simplified network structure outperforms the simplified structure proposed by Franco et al.

17.
For a recurrent neural network (RNN), its transient response is a critical issue, especially for real-time signal processing applications. The conventional RNN training algorithms, such as backpropagation through time (BPTT) and real-time recurrent learning (RTRL), have not adequately addressed this problem because they suffer from low convergence speed. While increasing the learning rate may help to improve the performance of the RNN, it can result in unstable training in terms of weight divergence. Therefore, an optimal tradeoff between RNN training speed and weight convergence is desired. In this paper, a robust adaptive gradient-descent (RAGD) training algorithm of RNN is developed based on a novel RNN hybrid training concept. It switches the training patterns between standard real-time online backpropagation (BP) and RTRL according to the derived convergence and stability conditions. The weight convergence and $L_2$-stability of the algorithm are derived via the conic sector theorem. The optimized adaptive learning maximizes the training speed of the RNN for each weight update without violating the stability and convergence criteria. Computer simulations are carried out to demonstrate the applicability of the theoretical results.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号