20 similar documents found.
1.
Karsten Vandborg Sorensen, Søren Vang Andersen 《IEEE Transactions on Audio, Speech, and Language Processing》2007,15(3):901-917
In this paper, we propose a new statistical model for noise periodogram modeling and estimation. The proposed model is a hidden Markov model (HMM) with a Rayleigh mixture model (RMM) in each state. For this new model, we derive an expectation-maximization (EM) training algorithm and a minimum mean-square error (MMSE) noise periodogram estimator. It is shown that, compared to the Gaussian mixture model (GMM)-based HMM, the RMM-based HMM has computationally less complex EM iterations and fits the noise periodograms better when the mixture models have a low number of components. Furthermore, we propose a specialization of the proposed model, which is shown to provide better MMSE noise periodogram estimates than any of the other tested HMM initializations for cyclo-stationary noise types.
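To make the RMM building block concrete, here is a minimal sketch of EM for a plain one-dimensional Rayleigh mixture, assuming independent positive scalar observations. It omits the HMM layer, any per-frequency structure, and the MMSE estimator described above, and the initialization is an arbitrary choice. Each M-step update for a Rayleigh component reduces to a single responsibility-weighted second moment.

```python
import numpy as np

def rayleigh_logpdf(x, sigma2):
    """Log-density of a Rayleigh distribution with scale parameter sigma2 (= sigma^2), x > 0."""
    return np.log(x) - np.log(sigma2) - x ** 2 / (2.0 * sigma2)

def fit_rayleigh_mixture(x, n_components=2, n_iter=100, seed=0):
    """EM for a 1-D Rayleigh mixture; returns (weights, sigma2). No HMM layer."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    weights = np.full(n_components, 1.0 / n_components)
    # Start around the single-Rayleigh ML estimate, randomly perturbed so components differ.
    sigma2 = np.mean(x ** 2) / 2.0 * rng.uniform(0.5, 1.5, n_components)
    for _ in range(n_iter):
        # E-step: responsibilities, normalised in the log domain for numerical stability.
        log_r = np.log(weights)[None, :] + rayleigh_logpdf(x[:, None], sigma2[None, :])
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: closed form; each sigma2 is a responsibility-weighted second moment / 2.
        nk = r.sum(axis=0)
        weights = nk / len(x)
        sigma2 = (r * x[:, None] ** 2).sum(axis=0) / (2.0 * nk)
    return weights, sigma2
```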
2.
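Driver assistance systems are regarded as an effective means of addressing traffic safety problems. The foundation for developing such systems is accurate recognition of vehicle behavior, for use in vehicle safety warning, path planning, intelligent navigation, and similar applications. Existing behavior-recognition methods based on support vector machines, hidden Markov models, convolutional neural networks, and so on still struggle to balance computational cost against accuracy. This paper combines the hidden Markov model with the Gaussian mixture model to propose a Gaussian mixture hidden Markov model (GMM-HMM), and validates the method experimentally on the NGSIM dataset of the U.S. Federal Highway Administration. The results show that the method recognizes free lane-change behavior with high accuracy. The paper also optimizes the experimental parameters of the GMM-HMM to achieve the best recognition performance, providing a reference for vehicle behavior recognition in future intelligent driving.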
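The abstract does not give the model topology or features, so the following is only a minimal sketch of how a GMM-HMM scores an observation sequence: the forward algorithm in the log domain with diagonal-covariance GMM emissions. Recognition would then train one such model per behavior (for example, lane change versus lane keeping) and pick the model with the highest log-likelihood; the features (e.g. lateral offset and speed from NGSIM trajectories) are an assumption, not taken from the paper.

```python
import numpy as np

def logsumexp(a, axis=None):
    """Numerically stable log(sum(exp(a))); all -inf slices give -inf instead of nan."""
    a = np.asarray(a, dtype=float)
    m = np.max(a, axis=axis, keepdims=True)
    m = np.where(np.isfinite(m), m, 0.0)
    with np.errstate(divide="ignore"):
        out = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    return out.item() if axis is None else np.squeeze(out, axis=axis)

def gmm_logpdf(x, weights, means, variances):
    """Diagonal-covariance GMM log-density for each row of x.
    x: (T, D); weights: (M,); means, variances: (M, D). Returns shape (T,)."""
    diff = x[:, None, :] - means[None, :, :]                                  # (T, M, D)
    log_comp = -0.5 * np.sum(np.log(2 * np.pi * variances)[None] + diff ** 2 / variances[None], axis=2)
    return logsumexp(np.log(weights)[None, :] + log_comp, axis=1)

def forward_loglik(x, log_pi, log_A, states):
    """log p(x | HMM) via the forward algorithm in the log domain.
    states: one (weights, means, variances) GMM per HMM state; log_A[i, j] = log P(j | i)."""
    log_b = np.stack([gmm_logpdf(x, *s) for s in states], axis=1)             # (T, N)
    alpha = log_pi + log_b[0]
    for t in range(1, len(x)):
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_b[t]
    return logsumexp(alpha)
```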
3.
Dong Yu, Li Deng, Yifan Gong, A. Acero 《IEEE Transactions on Audio, Speech, and Language Processing》2009,17(7):1348-1360
We propose a new framework and the associated maximum-likelihood and discriminative training algorithms for the variable-parameter hidden Markov model (VPHMM), whose mean and variance parameters vary as functions of additional environment-dependent conditioning parameters. Our framework differs from the VPHMM proposed by Cui and Gong (2007) in that piecewise spline interpolation, instead of global polynomial regression, is used to represent the dependency of the HMM parameters on the conditioning parameters, and a more effective functional form is used to model the variances. Our framework unifies and extends the conventional discrete VPHMM. It no longer requires quantization when estimating the model parameters and naturally supports both parameter sharing and instantaneous conditioning parameters. We investigate the strengths and weaknesses of the model on the Aurora-3 corpus. We show that, under the well-matched condition, the proposed discriminatively trained VPHMM outperforms the conventional HMM trained in the same way, with relative word error rate (WER) reductions of 19% and 15% when only the means are updated and when both means and variances are updated, respectively.
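As a rough illustration of the spline idea (not the paper's exact formulation), the sketch below makes a Gaussian mean a piecewise-linear function of a scalar conditioning parameter, here assumed to be SNR in dB. Piecewise-linear interpolation is the simplest spline; the knot positions in `KNOTS_DB`, the clamping outside the knot range, and the fixed variance are all assumptions. In training, the per-knot means would be the parameters to estimate.

```python
import numpy as np

# Hypothetical knot positions over the conditioning parameter (utterance SNR in dB).
KNOTS_DB = np.array([0.0, 5.0, 10.0, 15.0, 20.0])

def vp_mean(snr_db, knot_means):
    """Mean vector at a given SNR by piecewise-linear interpolation between per-knot means.
    knot_means: (K, D), one mean vector per knot. np.interp clamps outside the knot range."""
    return np.array([np.interp(snr_db, KNOTS_DB, knot_means[:, d]) for d in range(knot_means.shape[1])])

def vp_gauss_loglik(x, snr_db, knot_means, var):
    """Diagonal Gaussian log-likelihood with an SNR-dependent mean and a fixed variance."""
    mu = vp_mean(snr_db, knot_means)
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
```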
4.
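A multi-stream subband speech recognition method for noisy conditions is proposed. Although traditional subband feature methods can improve recognition performance in noise, they usually degrade performance under noise-free conditions. The new method extracts perceptual linear prediction (PLP) features and subband features, recognizes each separately, and then combines the two at the recognition-probability level. Recognition experiments on the E-Set with white noise from NoiseX92 show that the new method not only is more robust to noise but also improves recognition performance under noise-free conditions.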
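The abstract says the PLP and subband recognizers are combined at the recognition-probability level but does not give the rule. A common choice, used here as an assumption, is a weighted sum of per-class log-probabilities (equivalently, a weighted product of likelihoods); the class labels and weights below are hypothetical.

```python
def combine_streams(stream_logprobs, weights):
    """Fuse per-class log-probabilities from several feature streams (e.g. PLP and subband)
    by a weighted sum; return the best class and the combined scores."""
    classes = stream_logprobs[0].keys()
    combined = {c: sum(w * s[c] for w, s in zip(weights, stream_logprobs)) for c in classes}
    return max(combined, key=combined.get), combined
```

For example, with `{"b": -10.0, "d": -12.0}` from one stream and `{"b": -15.0, "d": -11.0}` from the other, equal weights of 0.5 give combined scores of -12.5 for "b" and -11.5 for "d", so "d" is chosen even though the first stream alone preferred "b".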
5.
Scott Axelrod, Vaibhava Goel, Ramesh Gopinath, Peder Olsen, Karthik Visweswariah 《IEEE Transactions on Audio, Speech, and Language Processing》2007,15(1):172-189
In this paper, we study discriminative training of acoustic models for speech recognition under two criteria: maximum mutual information (MMI) and a novel "error-weighted" training technique. We present a proof that the standard MMI training technique is valid for a very general class of acoustic models with any kind of parameter tying. We report experimental results for subspace constrained Gaussian mixture models (SCGMMs), where the exponential model weights of all Gaussians are required to belong to a common "tied" subspace, as well as for subspace precision and mean (SPAM) models, which impose separate subspace constraints on the precision matrices (i.e., inverse covariance matrices) and means. It has been shown previously that SCGMMs and SPAM models generalize, and yield significant error rate improvements over, previously considered model classes such as diagonal models, models with semitied covariances, and extended maximum likelihood linear transformation (EMLLT) models. We show here that MMI and error-weighted training each individually yield over 20% relative reduction in word error rate on a digit task compared with maximum-likelihood (ML) training. We also show that a relative gain of as much as 28% can be achieved by combining these two discriminative estimation techniques.
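The MMI criterion is the sum over training utterances r of the log posterior of the reference transcription: F = sum_r log[ p(O_r | w_r) P(w_r) / sum_w p(O_r | w) P(w) ]. The sketch below only evaluates this objective from precomputed log-domain scores; it does not implement the parameter updates, and the hypothesis sets are assumed to be given (for a digit task, possibly the whole vocabulary).

```python
import math

def log_sum_exp(values):
    """Stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def mmi_objective(utterances):
    """Sum of log posteriors of the reference transcriptions.
    Each utterance is (reference, {hypothesis: acoustic_loglik + lm_logprob}); the
    hypothesis dict must include the reference itself."""
    total = 0.0
    for ref, scores in utterances:
        total += scores[ref] - log_sum_exp(list(scores.values()))
    return total
```

The objective is at most zero and reaches zero only when every reference receives all the posterior mass, which is why maximizing it pushes competing hypotheses down rather than only pushing the reference up.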
6.
Since Turkish is a morphologically productive language, it is almost impossible for a word-based recognition system to model the Turkish language completely. Because a word-based system has difficulty recognizing words it has not been trained on, out-of-vocabulary words cause the recognition success rate to drop considerably. In this study, a speaker-dependent, phoneme-based word recognition system has been designed and implemented for Turkish to overcome this problem. An algorithm for finding phoneme boundaries has been devised to segment each word into its phonemes. After segmentation, each phoneme is assigned to a sub-group according to its position and its neighboring phonemes in the word. The resulting sub-groups are represented by hidden Markov models (HMMs), a statistical technique, using Mel-frequency cepstral coefficients (MFCCs) as feature vectors. Because a phoneme-based approach is adopted, many out-of-vocabulary words could be recognized successfully. 
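The abstract states only that each phoneme is grouped by its position in the word and its neighboring phonemes. A minimal sketch of such a labeling, with a hypothetical label format (`left-phone+right/position`, with `#` marking a word boundary) and four hypothetical position classes, might look like this:

```python
def context_labels(phonemes):
    """Tag each phoneme of a word with its position (initial/medial/final/single) and its
    left/right neighbours; '#' marks a word boundary. The label format is hypothetical."""
    labels = []
    n = len(phonemes)
    for i, p in enumerate(phonemes):
        if n == 1:
            pos = "single"
        elif i == 0:
            pos = "initial"
        elif i == n - 1:
            pos = "final"
        else:
            pos = "medial"
        left = phonemes[i - 1] if i > 0 else "#"
        right = phonemes[i + 1] if i < n - 1 else "#"
        labels.append(f"{left}-{p}+{right}/{pos}")
    return labels
```

For the Turkish word "ev" (house), this yields `["#-e+v/initial", "e-v+#/final"]`. Each distinct label would then get its own HMM, so an unseen word can be recognized as long as its labeled sub-groups occurred in training.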
7.
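This paper outlines the main characteristics of evoked potentials and the techniques for detecting them. Taking the brainstem auditory evoked potential (BAEP) as an example, wavelet transform techniques are applied to BAEP extraction based on its time and frequency characteristics, and a wavelet-transform denoising algorithm for the BAEP is proposed. Because the BAEP has a low signal-to-noise ratio in practice, an improved algorithm that combines this method with averaging is proposed. Compared with the conventional averaging method, this approach greatly reduces detection time and achieves a higher signal-to-noise ratio and satisfactory waveforms.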
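A minimal sketch of wavelet-threshold denoising combined with averaging, under several assumptions the abstract does not state: an orthonormal Haar wavelet, a fixed decomposition depth, a noise level estimated from the median absolute finest-scale coefficient, and the standard "universal" soft threshold. The input length must be divisible by 2**levels.

```python
import numpy as np

def haar_dwt(x, levels):
    """Multi-level orthonormal Haar DWT; details are returned finest scale first."""
    approx, details = np.asarray(x, dtype=float), []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))
        approx = (even + odd) / np.sqrt(2)
    return approx, details

def haar_idwt(approx, details):
    """Inverse of haar_dwt, rebuilding from the coarsest scale up."""
    for d in reversed(details):
        out = np.empty(2 * len(d))
        out[0::2] = (approx + d) / np.sqrt(2)
        out[1::2] = (approx - d) / np.sqrt(2)
        approx = out
    return approx

def soft_threshold(c, t):
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def denoise(x, levels=4):
    approx, details = haar_dwt(x, levels)
    sigma = np.median(np.abs(details[0])) / 0.6745      # noise level from finest details
    t = sigma * np.sqrt(2 * np.log(len(x)))             # universal threshold
    return haar_idwt(approx, [soft_threshold(d, t) for d in details])

def averaged_denoise(sweeps, levels=4):
    """Average a small number of stimulus-locked sweeps, then wavelet-denoise the average."""
    return denoise(np.mean(sweeps, axis=0), levels)
```

The intent of combining the two steps is that thresholding removes much of the residual noise left after averaging only a few sweeps, so fewer sweeps (and less recording time) may be needed for a given waveform quality.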
8.
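Based on the observation that subband features at different scales reflect different detailed characteristics of speech, a multi-layer subband (MLS) speech recognition method for noisy conditions is proposed. The speech spectrum is divided into multiple layers, each containing several subbands. Each subband is first recognized independently, and the recognition probabilities of all subbands in all layers are then combined to obtain the final result. The new method was applied in recognition experiments on the TIMIT dataset and the E-Set under NoiseX92 white noise and F16 noise. The results show that the multi-layer subband method substantially improves recognition performance both in noise and in noise-free conditions.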
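The abstract does not specify how the spectrum is divided, so this sketch assumes that layer l splits the filterbank channels into 2**l contiguous, roughly equal-width subbands, and that the per-subband recognizers' scores are fused by averaging their log-probabilities. Both choices are hypothetical.

```python
def multilayer_bands(n_channels, n_layers):
    """Partition channel indices 0..n_channels-1 into layers; layer l has 2**l contiguous subbands."""
    layers = []
    for l in range(n_layers):
        k = 2 ** l
        edges = [round(i * n_channels / k) for i in range(k + 1)]
        layers.append([range(edges[i], edges[i + 1]) for i in range(k)])
    return layers

def combine_layers(band_logprobs):
    """band_logprobs: one {class: log-prob} dict per (layer, subband) recognizer.
    Returns the class with the largest average log-prob (an assumed fusion rule)."""
    classes = band_logprobs[0].keys()
    avg = {c: sum(s[c] for s in band_logprobs) / len(band_logprobs) for c in classes}
    return max(avg, key=avg.get)
```

With 24 channels and 3 layers, this gives one full-band recognizer, two half-band recognizers, and four quarter-band recognizers, so wide-band and narrow-band evidence both contribute to the final decision.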
9.
10.
A Speech Emotion Recognition Classifier Based on HMM and ANN (total citations: 2; self-citations: 0; citations by others: 2)
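To address the poor inherent classification ability of hidden Markov models (HMMs) used alone in speech emotion recognition, this paper proposes a method that uses HMMs together with a radial basis function (RBF) neural network to recognize five speech emotions: surprise, anger, joy, sadness, and disgust. The method uses the HMM to normalize the speech emotion feature vectors and the RBF network as the final decision classifier. Experimental results show that, under the paper's experimental conditions, this method performs better than an HMM alone, with a marked improvement in the recognition rate for disgust.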
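The sketch below shows only the decision stage: an RBF network with Gaussian hidden units and a linear output layer fitted by least squares, which is one common way to train RBF networks. It assumes the HMM has already mapped each utterance to a fixed-length feature vector (the abstract does not describe how), and the centers and width are hypothetical parameters; in practice the centers might come from clustering the training vectors.

```python
import numpy as np

class RBFClassifier:
    """RBF network: Gaussian hidden units plus a bias, linear outputs fitted by least squares."""

    def __init__(self, centers, width):
        self.centers = np.asarray(centers, dtype=float)
        self.width = width

    def _hidden(self, X):
        d2 = ((np.asarray(X, dtype=float)[:, None, :] - self.centers[None]) ** 2).sum(axis=2)
        H = np.exp(-d2 / (2 * self.width ** 2))
        return np.hstack([H, np.ones((len(H), 1))])       # append a bias unit

    def fit(self, X, y, n_classes):
        T = np.eye(n_classes)[y]                           # one-hot targets, e.g. 5 emotions
        self.W, *_ = np.linalg.lstsq(self._hidden(X), T, rcond=None)
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.W, axis=1)
```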
11.
Di Huijun, Tao Linmi, Xu Guangyou 《IEEE Transactions on Pattern Analysis and Machine Intelligence》2009,31(10):1817-1830
Elastic motion is a nonrigid motion constrained only by some degree of smoothness and continuity. Consequently, estimating elastic motion by explicit feature matching involves two correlated subproblems, shape registration and motion tracking, which account for spatial smoothness and temporal continuity, respectively. If their interrelationship is ignored, solving either one alone is rather challenging, especially when cluttered features are involved. One straightforward way to integrate them into a probabilistic model is to model the dependence between their hidden states. With separate states, however, there are two different explanations of the motion, each still made under only one constraint (smoothness or continuity). Each can be error-prone, and their coupling causes error propagation. It is therefore highly desirable to design a probabilistic model in which a single state is shared by the two subproblems. This paper proposes such a model, a Mixture of Transformed Hidden Markov Models (MTHMM), in which a unique explanation of motion is made simultaneously under the spatiotemporal constraints. As a result, the MTHMM can find a coherent global interpretation of elastic motion from local, cluttered edge features, and experiments show its robustness to ambiguities, missing data, and outliers.
12.
Ishi C.T., Matsuda S., Kanda T., Jitsuhiro T., Ishiguro H., Nakamura S., Hagita N. 《IEEE Transactions on Robotics》2008,24(3):759-763
The application range of communication robots could be widely expanded by automatic speech recognition (ASR) systems with improved robustness to noise and to speakers of different ages. In past research, several modules have been proposed and evaluated for improving the robustness of ASR systems in noisy environments. However, their performance might degrade when applied to robots, owing to problems caused by distant speech and the robot's own noise. In this paper, we implemented the individual modules in a humanoid robot and evaluated ASR performance in a real-world noisy environment for adults' and children's speech. The performance of each module was verified by adding different levels of real environmental noise recorded in a cafeteria. Experimental results indicated that our ASR system could achieve over 80% word accuracy in 70-dBA noise. Further evaluation of adult speech recorded in a real noisy environment resulted in 73% word accuracy.
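Word accuracy is conventionally computed as (N - S - D - I) / N, where N is the number of reference words and S, D, I are the substitutions, deletions, and insertions in a minimum-edit-distance alignment. The abstract does not name its scoring tool, so this is the standard definition rather than the authors' exact procedure. Because the optimal alignment minimizes S + D + I, accuracy equals 1 minus the word-level edit distance divided by N:

```python
def word_accuracy(reference, hypothesis):
    """(N - S - D - I) / N computed from a word-level Levenshtein alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum substitutions + deletions + insertions aligning ref[:i] with hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return 1.0 - d[len(ref)][len(hyp)] / len(ref)
```

For instance, scoring "turn left the door now" against the reference "turn left at the door" finds one deletion and one insertion, giving 1 - 2/5 = 0.6. Unlike percent-correct, accuracy penalizes insertions and can become negative when they are numerous.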
13.
14.
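A Fast Endpoint Detection Algorithm for Continuous Speech Recognition in Noisy Environments (total citations: 2; self-citations: 0; citations by others: 2)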
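Based on the characteristics of Mandarin Chinese speech, the algorithm detects speech endpoints using the amplitude and the power spectrum, effectively eliminating interference from background noise and the DC component. The algorithm was analyzed with real speech samples, and test results show that it not only reliably marks the start and end points of speech but also has high computational efficiency.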
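A simplified sketch of amplitude-based endpoint detection with DC removal, assuming the first few frames contain only background noise. It uses only the short-time mean absolute amplitude; the paper also uses the power spectrum, which is omitted here, and the frame sizes and threshold ratio are hypothetical values.

```python
import numpy as np

def detect_endpoints(x, frame_len=256, hop=128, noise_frames=10, ratio=4.0):
    """Return (start_sample, end_sample) of the detected speech region, or None.
    A frame counts as speech if its mean absolute amplitude exceeds `ratio` times the
    average level of the first `noise_frames` frames (assumed to be background only)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                                        # remove the DC component
    n_frames = 1 + (len(x) - frame_len) // hop
    amp = np.array([np.mean(np.abs(x[i * hop:i * hop + frame_len])) for i in range(n_frames)])
    threshold = ratio * amp[:noise_frames].mean()
    active = np.flatnonzero(amp > threshold)
    if active.size == 0:
        return None
    return active[0] * hop, active[-1] * hop + frame_len
```

Without the DC removal, a constant offset would inflate every frame's mean absolute amplitude and mask the contrast between speech and background, which is one reason the abstract singles out the DC component.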
15.
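The two main shortcomings of the classical hidden Markov model for speech recognition are the "discrete state assumption" and the "independence assumption". The former ignores the nonstationarity of speech signals, and the latter ignores their correlation. This paper applies mixture of factor analyzers to speech modeling, proposes a hidden Markov model framework based on mixtures of factor analyzers, and represents it graphically as a dynamic Bayesian network. The framework not only addresses the above problems in theory but also offers many options for speech modeling. The statistical acoustic models in wide use today can all be regarded as special cases of this model.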
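In a mixture of factor analyzers, each component models an observation as x = mu_k + Lambda_k z + e, with z ~ N(0, I) and diagonal noise e ~ N(0, Psi), so its marginal density is Gaussian with covariance Lambda_k Lambda_k^T + Psi. The sketch evaluates this density as an HMM state's emission log-likelihood; it is a generic MFA density, not the paper's full dynamic Bayesian network framework, and sharing one Psi across components is an assumption.

```python
import numpy as np

def mfa_loglik(x, weights, means, loadings, psi):
    """Log-density of a mixture of factor analysers at one observation x of shape (D,).
    means: (K, D); loadings: (K, D, q) factor loading matrices; psi: (D,) diagonal
    noise variances shared by all components."""
    logs = []
    for w, mu, L in zip(weights, means, loadings):
        cov = L @ L.T + np.diag(psi)                        # low-rank plus diagonal covariance
        diff = x - mu
        _, logdet = np.linalg.slogdet(cov)
        maha = diff @ np.linalg.solve(cov, diff)            # squared Mahalanobis distance
        logs.append(np.log(w) - 0.5 * (len(x) * np.log(2 * np.pi) + logdet + maha))
    m = max(logs)
    return m + np.log(sum(np.exp(v - m) for v in logs))     # log-sum-exp over components
```

With all-zero loadings (or q = 0 factors) each component reduces to a diagonal-covariance Gaussian, so the familiar diagonal GMM-HMM emission falls out as a special case; larger q moves toward full-covariance Gaussians.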
16.
17.
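Speech recognition in noisy environments has long been a difficult problem in speech recognition. This paper uses spectral subtraction for denoising and performs isolated-word recognition (digits 0-9), improving the system's recognition rate.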
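A minimal sketch of magnitude spectral subtraction, assuming the leading frames are noise only and reusing the noisy phase for resynthesis. The frame length, over-subtraction factor, and spectral floor are hypothetical values; the abstract does not give the paper's settings.

```python
import numpy as np

def spectral_subtraction(x, frame_len=256, noise_frames=6, over_sub=1.0, floor=0.02):
    """Magnitude spectral subtraction with overlap-add resynthesis.
    A periodic Hann analysis window at 50% overlap sums to one, so unmodified frames
    reconstruct the input exactly away from the first and last half-frame."""
    x = np.asarray(x, dtype=float)
    hop = frame_len // 2
    n = np.arange(frame_len)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len)      # periodic Hann
    starts = range(0, len(x) - frame_len + 1, hop)
    spectra = [np.fft.rfft(window * x[s:s + frame_len]) for s in starts]
    noise_mag = np.mean([np.abs(S) for S in spectra[:noise_frames]], axis=0)
    out = np.zeros(len(x))
    for s, S in zip(starts, spectra):
        mag = np.abs(S) - over_sub * noise_mag
        mag = np.maximum(mag, floor * np.abs(S))                 # spectral floor vs. musical noise
        out[s:s + frame_len] += np.fft.irfft(mag * np.exp(1j * np.angle(S)), n=frame_len)
    return out
```

With `over_sub=0` and `floor=0` the function returns the input unchanged away from the edges, which is a convenient check on the framing before tuning the subtraction.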
19.
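Noise-Robust Speech Recognition and the Application of Speech Enhancement Algorithms (total citations: 1; self-citations: 0; citations by others: 1)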
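Improving the robustness of speech recognition systems is an important research topic in speech recognition technology. Recognition performance often degrades because the data in the training environment do not match the data in the recognition environment. To obtain satisfactory performance in noisy environments, this paper proposes a robust speech feature extraction method based on the characteristics of human auditory perception. Before MFCC feature extraction, the noisy speech features are first processed according to auditory masking properties and are also processed with a speech enhancement method, yielding robust speech features. Analysis of the results of four different experiments shows that applying this method to noise-robust analysis improves the system's noise robustness, and that this feature processing adapts well to different noise types at different signal-to-noise ratios.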
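The masking and enhancement steps are not described in enough detail to reproduce, so the sketch covers only the standard MFCC front end they feed into: pre-emphasis, Hamming-windowed frames, power spectrum, triangular mel filterbank, log, and DCT-II. The pre-emphasis coefficient, frame sizes, and filter counts are common defaults chosen as assumptions, not the paper's values.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(x, sr=16000, frame_len=400, hop=160, n_fft=512, n_mels=26, n_ceps=13):
    """Return an array of shape (n_frames, n_ceps)."""
    x = np.asarray(x, dtype=float)
    x = np.append(x[0], x[1:] - 0.97 * x[:-1])                     # pre-emphasis
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    frames *= np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft      # (T, n_fft // 2 + 1)
    # Triangular filters with centers equally spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)
    log_mel = np.log(power @ fbank.T + 1e-10)                       # (T, n_mels)
    # Orthonormal DCT-II of the log filterbank energies; keep the first n_ceps terms.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1) / (2 * n_mels))
    dct *= np.sqrt(2.0 / n_mels)
    dct[0] /= np.sqrt(2.0)
    return log_mel @ dct.T
```

A masking step of the kind the abstract describes would most naturally act on `power` or on the filterbank energies before the log, but where exactly the paper applies it is not stated.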
20.
Nikos Chatzichrisafis, Vassilios Diakoloukas, Vassilios Digalakis, Costas Harizakis 《IEEE Transactions on Audio, Speech, and Language Processing》2007,15(3):928-938
Porting a speech recognition system to a new language is usually a time-consuming and expensive process, since it requires collecting, transcribing, and processing a large amount of language-specific training sentences. This work presents techniques for improved cross-language transfer of speech recognition systems to new target languages. Such techniques are particularly useful for target languages where only minimal amounts of training data are available. We describe a novel method to produce a language-independent system by combining acoustic models from a number of source languages. This intermediate language-independent acoustic model is used to bootstrap a target-language system by applying language adaptation. In our experiments, we use acoustic models of seven source languages to develop a target Greek acoustic model. We show that our technique significantly outperforms a system trained from scratch when less than 8 h of read speech is available.
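The abstract does not name the language-adaptation method. One standard technique for adapting a seed acoustic model with little data is MAP re-estimation of the Gaussian means, sketched below as a stand-in rather than the authors' procedure; the prior weight `tau` is hypothetical, and the occupation probabilities are assumed to come from aligning target-language data with the language-independent model.

```python
import numpy as np

def map_adapt_means(prior_means, posteriors, X, tau=10.0):
    """MAP re-estimation of Gaussian means from a small amount of target-language data.
    prior_means: (M, D) means of the language-independent seed model.
    posteriors: (T, M) occupation probabilities of each Gaussian for each frame.
    X: (T, D) target-language feature frames. tau: prior weight (hypothetical value)."""
    occ = posteriors.sum(axis=0)                        # (M,) soft frame counts
    first_order = posteriors.T @ X                      # (M, D) posterior-weighted frame sums
    return (tau * prior_means + first_order) / (tau + occ)[:, None]
```

With little data the means stay near the seed model; as the soft counts grow, they move toward the target-language sample means, which is the behavior one wants when only a few hours of target speech are available.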