1.

Rayleigh Mixture Model-Based Hidden Markov Modeling and Estimation of Noise in Noisy Speech Signals

Karsten Vandborg Sørensen Søren Vang Andersen 《IEEE transactions on audio, speech, and language processing》2007,15(3):901-917

In this paper, we propose a new statistical model for noise periodogram modeling and estimation. The proposed model is a hidden Markov model (HMM) with a Rayleigh mixture model (RMM) in each state. For this new model, we derive an expectation-maximization (EM) training algorithm and a minimum mean-square error (MMSE) noise periodogram estimator. It is shown that, when compared to the Gaussian mixture model (GMM)-based HMM, the RMM-based HMM has less computationally complex EM iterations and gives a better fit of the noise periodograms when the mixture models have a low number of components. Furthermore, we propose a specialization of the proposed model, which is shown to provide better MMSE noise periodogram estimates than any of the other tested HMM initializations for cyclo-stationary noise types.
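As a rough illustration of why EM updates for a Rayleigh mixture are cheap, the sketch below fits a plain Rayleigh mixture (no HMM around it) to synthetic data. It is not the authors' algorithm: the chunk-based initialization, the function names, and the synthetic scales 1.0 and 5.0 are all assumptions made for the example.

```python
import math
import random

def rayleigh_pdf(x, sigma2):
    """Rayleigh density with squared scale sigma2, for x >= 0."""
    return (x / sigma2) * math.exp(-x * x / (2.0 * sigma2))

def fit_rayleigh_mixture(data, n_components=2, n_iter=50):
    """Fit mixture weights and squared scales by EM; returns (weights, sigma2)."""
    # Deterministic initialization (an assumption): split the sorted data into
    # equal chunks and use each chunk's Rayleigh maximum-likelihood scale.
    ordered = sorted(data)
    chunk = len(ordered) // n_components
    sigma2 = []
    for k in range(n_components):
        part = ordered[k * chunk:(k + 1) * chunk]
        sigma2.append(sum(x * x for x in part) / (2.0 * len(part)))
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        # E-step: accumulate posterior responsibilities per component.
        resp_sum = [0.0] * n_components
        resp_x2 = [0.0] * n_components
        for x in data:
            joint = [w * rayleigh_pdf(x, s2) for w, s2 in zip(weights, sigma2)]
            total = sum(joint)
            if total == 0.0:  # e.g. x == 0, where every density vanishes
                continue
            for k in range(n_components):
                r = joint[k] / total
                resp_sum[k] += r
                resp_x2[k] += r * x * x
        # M-step: each weight and scale has a closed-form update.
        norm = sum(resp_sum)
        weights = [rs / norm for rs in resp_sum]
        sigma2 = [rx2 / (2.0 * rs) for rx2, rs in zip(resp_x2, resp_sum)]
    return weights, sigma2

def sample_rayleigh(rng, sigma):
    """Inverse-CDF sampling: x = sigma * sqrt(-2 ln u), with u in (0, 1]."""
    return sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))

# Synthetic data from two Rayleigh components (scales are illustrative).
rng = random.Random(0)
data = [sample_rayleigh(rng, 1.0) for _ in range(1000)]
data += [sample_rayleigh(rng, 5.0) for _ in range(1000)]
weights, sigma2 = fit_rayleigh_mixture(data)
scales = sorted(math.sqrt(s2) for s2 in sigma2)
```

Each component carries a single scale parameter, so the M-step reduces to two weighted sums per component. In the model described in the abstract, one such mixture would sit in every HMM state.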

2.

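Free Lane-Change Recognition Based on a Gaussian Mixture Hidden Markov Model

杨志强朱家伟穆蕾安毅生《计算机系统应用》2022,31(8):388-394

Driver assistance systems are regarded as an effective means of addressing traffic safety problems. Developing such systems depends on accurately recognizing vehicle behavior, for use in vehicle safety warnings, path planning, intelligent navigation, and similar applications. Existing behavior recognition methods based on support vector machines, hidden Markov models, convolutional neural networks, and the like still struggle to balance computational cost against accuracy. This paper combines the hidden Markov model with the Gaussian mixture model to propose a Gaussian mixture hidden Markov model, and validates the method experimentally on the NGSIM dataset of the U.S. Federal Highway Administration. The results show that the method recognizes free lane-change behavior with high accuracy. The paper also optimizes the experimental parameters of the Gaussian mixture hidden Markov model to achieve the best recognition performance, providing a reference for vehicle behavior recognition in future intelligent driving.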
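One common way to use Gaussian mixture HMMs for recognition is to score a sequence under one model per behavior and pick the most likely model. The sketch below does this with the forward algorithm in the log domain. It is not the paper's implementation: the single feature (imagined here as lateral velocity), the model names, and every parameter value are hand-picked assumptions rather than values trained on NGSIM.

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for finite values."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def log_emission(x, mixture):
    """Log density of x under a mixture given as (weight, mean, var) tuples."""
    return logsumexp([math.log(w) + log_gaussian(x, m, v) for w, m, v in mixture])

def log_likelihood(obs, model):
    """Forward algorithm; all probabilities must be strictly positive."""
    start, trans, mixtures = model["start"], model["trans"], model["mixtures"]
    n = len(start)
    alpha = [math.log(start[s]) + log_emission(obs[0], mixtures[s]) for s in range(n)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[r] + math.log(trans[r][s]) for r in range(n)])
            + log_emission(x, mixtures[s])
            for s in range(n)
        ]
    return logsumexp(alpha)

def classify(obs, models):
    """Return the name of the model with the highest sequence likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, models[name]))

# Hypothetical two-state models; every number here is illustrative.
MODELS = {
    "keep_lane": {
        "start": [0.5, 0.5],
        "trans": [[0.5, 0.5], [0.5, 0.5]],
        "mixtures": [[(0.5, -0.05, 0.04), (0.5, 0.05, 0.04)]] * 2,
    },
    "change_lane": {
        "start": [0.9, 0.1],
        "trans": [[0.7, 0.3], [0.3, 0.7]],
        "mixtures": [[(1.0, 0.0, 0.04)], [(0.6, 0.9, 0.04), (0.4, 1.1, 0.04)]],
    },
}

label = classify([0.0, 0.1, 0.9, 1.0, 0.1], MODELS)
```

In practice the parameters would be estimated with EM (Baum-Welch) from labeled trajectories, and the observations would be feature vectors rather than a single scalar.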

3.

A Novel Framework and Training Algorithm for Variable-Parameter Hidden Markov Models

Dong Yu Li Deng Yifan Gong Acero A. 《IEEE transactions on audio, speech, and language processing》2009,17(7):1348-1360

We propose a new framework and the associated maximum-likelihood and discriminative training algorithms for the variable-parameter hidden Markov model (VPHMM) whose mean and variance parameters vary as functions of additional environment-dependent conditioning parameters. Our framework differs from the VPHMM proposed by Cui and Gong (2007) in that piecewise spline interpolation instead of global polynomial regression is used to represent the dependency of the HMM parameters on the conditioning parameters, and a more effective functional form is used to model the variances. Our framework unifies and extends the conventional discrete VPHMM. It no longer requires quantization in estimating the model parameters and can support both parameter sharing and instantaneous conditioning parameters naturally. We investigate the strengths and weaknesses of the model on the Aurora-3 corpus. We show that under the well-matched condition the proposed discriminatively trained VPHMM outperforms the conventional HMM trained in the same way with relative word error rate (WER) reduction of 19% and 15%, respectively, when only mean is updated and when both mean and variances are updated.

4.

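A Multi-Stream Subband Method for Noisy Speech Recognition

蒋文建韦岗《计算机工程与应用》2001,37(19):52-54

A multi-stream subband speech recognition method for noisy conditions is proposed. Conventional subband feature methods can improve recognition performance in noise, but they usually degrade performance in the noise-free case. The new method extracts perceptual linear prediction (PLP) features and subband features, performs recognition on each separately, and then combines the two at the recognition-probability level. Recognition experiments on E-Set with white noise from NoiseX92 show that the new method not only has better noise robustness but also improves recognition performance in the noise-free case.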
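Combining two recognizers at the probability level can be sketched as a weighted sum of per-word log-probabilities from each stream. The abstract does not state the combination rule, so the linear weighting, the function name, and the scores below are assumptions for illustration only.

```python
def combine_streams(plp_log_probs, subband_log_probs, plp_weight=0.5):
    """Pick the word with the highest weighted sum of stream log-probabilities."""
    combined = {
        word: plp_weight * plp_log_probs[word]
        + (1.0 - plp_weight) * subband_log_probs[word]
        for word in plp_log_probs
    }
    best = max(combined, key=combined.get)
    return best, combined

# Made-up log-probabilities for three placeholder word labels.
plp_scores = {"b": -10.0, "d": -11.0, "e": -15.0}
subband_scores = {"b": -14.0, "d": -9.0, "e": -16.0}
best_word, scores = combine_streams(plp_scores, subband_scores)
```

Setting `plp_weight` to 1.0 falls back to the PLP stream alone, which gives a simple way to compare the combined decision against either single stream.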

5.

Discriminative Estimation of Subspace Constrained Gaussian Mixture Models for Speech Recognition

Scott Axelrod Vaibhava Goel Ramesh Gopinath Peder Olsen Karthik Visweswariah 《IEEE transactions on audio, speech, and language processing》2007,15(1):172-189

In this paper, we study discriminative training of acoustic models for speech recognition under two criteria: maximum mutual information (MMI) and a novel "error-weighted" training technique. We present a proof that the standard MMI training technique is valid for a very general class of acoustic models with any kind of parameter tying. We report experimental results for subspace constrained Gaussian mixture models (SCGMMs), where the exponential model weights of all Gaussians are required to belong to a common "tied" subspace, as well as for subspace precision and mean (SPAM) models which impose separate subspace constraints on the precision matrices (i.e., inverse covariance matrices) and means. It has been shown previously that SCGMMs and SPAM models generalize and yield significant error rate improvements over previously considered model classes such as diagonal models, models with semitied covariances, and extended maximum likelihood linear transformation (EMLLT) models. We show here that MMI and error-weighted training each individually result in over 20% relative reduction in word error rate on a digit task over maximum-likelihood (ML) training. We also show that a gain of as much as 28% relative can be achieved by combining these two discriminative estimation techniques.

6.

A Phoneme-Based Approach for Eliminating Out-of-vocabulary Problem Turkish Speech Recognition Using Hidden Markov Model

Erdem Yavuz Vedat Topuz 《计算机系统科学与工程》2018,33(6):429-445

Because Turkish is a morphologically productive language, it is almost impossible for a word-based recognition system to model the language completely. Such a system has difficulty recognizing words it has not been trained on, so out-of-vocabulary words lower the recognition success rate considerably. In this study, a speaker-dependent, phoneme-based word recognition system has been designed and implemented for Turkish to overcome this problem. An algorithm for finding phoneme boundaries has been devised to segment each word into its phonemes. After segmentation, each phoneme is assigned to a sub-group according to its position and its neighboring phonemes in the word. The resulting sub-groups are represented by hidden Markov models, a statistical technique, using Mel-frequency cepstral coefficients as the feature vector. Because a phoneme-based approach is adopted, many out-of-vocabulary words could be recognized successfully.

7.

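Wavelet Transform Extraction of Brainstem Auditory Evoked Potentials

温玉汉卢战《数据采集与处理》1997,12(2):91-95

This paper outlines the main characteristics of evoked potentials and the techniques for detecting them. Taking the brainstem auditory evoked potential (BAEP) as an example, and based on its time-frequency characteristics, wavelet transform techniques are applied to BAEP extraction and a wavelet transform denoising algorithm for BAEP is proposed. Because the signal-to-noise ratio of BAEP is low in practice, an improved algorithm that incorporates averaging is also proposed. Compared with the conventional averaging method, this approach greatly reduces detection time and yields a higher signal-to-noise ratio and satisfactory waveforms.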
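The general idea of wavelet denoising combined with averaging can be sketched as follows: average a few recorded sweeps, transform the result, shrink the small detail coefficients, and invert the transform. The abstract does not name a wavelet or a threshold rule, so the Haar wavelet, soft thresholding, and all function names below are assumptions.

```python
import math

SQRT2 = math.sqrt(2.0)

def average_sweeps(sweeps):
    """Sample-wise mean of equally long sweeps (conventional averaging)."""
    return [sum(column) / len(sweeps) for column in zip(*sweeps)]

def haar_forward(signal):
    """One Haar level: returns (approximation, detail) coefficients."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert one Haar level."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out

def soft_threshold(coefficient, threshold):
    """Shrink a coefficient toward zero by `threshold`."""
    return math.copysign(max(abs(coefficient) - threshold, 0.0), coefficient)

def denoise(signal, levels, threshold):
    """Soft-threshold Haar details; len(signal) must be divisible by 2**levels."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_forward(approx)
        details.append([soft_threshold(c, threshold) for c in detail])
    for detail in reversed(details):
        approx = haar_inverse(approx, detail)
    return approx

# Two made-up sweeps of a constant response with alternating noise.
sweeps = [[1.3, 0.9, 1.3, 0.9], [1.1, 0.7, 1.1, 0.7]]
cleaned = denoise(average_sweeps(sweeps), levels=1, threshold=0.5)
```

With a threshold of zero the transform reconstructs its input, which is a convenient sanity check before tuning the threshold on real recordings.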

8.

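A New Multi-Layer Subband Method for Noisy Speech Recognition

蒋文建韦岗《数据采集与处理》2002,17(1):15-19

Because subband features at different scales reflect different details of speech, a multi-layer subband (MLS) speech recognition method for noisy conditions is proposed. The speech spectrum is divided into multiple subbands at multiple layers. Each subband is first recognized separately, and the recognition probabilities of all subbands at all layers are then combined to obtain the final result. The new method is applied to recognition experiments on E-Set from the TIMIT database under NoiseX92 white noise and F16 noise. The results show that the multi-layer subband method greatly improves recognition performance in both noisy and noise-free conditions.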

9.

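Research on Speech Recognition in Noisy Environments (total citations: 5; self-citations: 2; citations by others: 3)

杨大利徐明星吴文虎《计算机工程与应用》2003,39(20):1-4

This article describes in detail some commonly used denoising methods and presents the authors' research on noise-robust speech recognition. It concludes with several promising approaches to noise-robust recognition.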

10.

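A Speech Emotion Recognition Classifier Based on HMM and ANN (total citations: 2; self-citations: 0; citations by others: 2)

罗毅《微计算机信息》2007,23(34):218-219,296

The hidden Markov model (HMM), when used alone for speech emotion recognition, has inherently weak classification ability. To address this, the paper proposes a method that uses an HMM together with a radial basis function (RBF) neural network to recognize five speech emotions: surprise, anger, joy, sadness, and disgust. The method uses the HMM to normalize the speech emotion feature vectors and the RBF network as the final decision classifier. Experimental results show that, under the paper's experimental conditions, the method performs better than the HMM alone, with a marked improvement in the recognition rate for disgust.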

11.

A Mixture of Transformed Hidden Markov Models for Elastic Motion Estimation

Di Huijun Tao Linmi Xu Guangyou 《IEEE transactions on pattern analysis and machine intelligence》2009,31(10):1817-1830

Elastic motion is a nonrigid motion constrained only by some degree of smoothness and continuity. Consequently, elastic motion estimation by explicit feature matching actually contains two correlated subproblems: shape registration and motion tracking, which account for spatial smoothness and temporal continuity, respectively. If we ignore their interrelationship, solving each of them alone will be rather challenging, especially when the cluttered features are involved. To integrate them into a probabilistic model, one straightforward approach is to draw the dependence between their hidden states. With regard to their separated states, there are, however, two different explanations of motion which are still made under the individual constraint of smoothness or continuity. Each one can be error-prone, and their coupling causes error propagation. Therefore, it is highly desirable to design a probabilistic model in which a unified state is shared by the two subproblems. This paper is intended to propose such a model, i.e., a Mixture of Transformed Hidden Markov Models (MTHMM), where a unique explanation of motion is made simultaneously under the spatiotemporal constraints. As a result, the MTHMM could find a coherent global interpretation of elastic motion from local cluttered edge features, and experiments show its robustness under ambiguities, data missing, and outliers.

12.

A Robust Speech Recognition System for Communication Robots in Noisy Environments

Ishi C.T. Matsuda S. Kanda T. Jitsuhiro T. Ishiguro H. Nakamura S. Hagita N. 《Robotics, IEEE Transactions on》2008,24(3):759-763

The application range of communication robots could be widely expanded by the use of automatic speech recognition (ASR) systems with improved robustness for noise and for speakers of different ages. In past research, several modules have been proposed and evaluated for improving the robustness of ASR systems in noisy environments. However, this performance might be degraded when applied to robots, due to problems caused by distant speech and the robot's own noise. In this paper, we implemented the individual modules in a humanoid robot, and evaluated the ASR performance in a real-world noisy environment for adults' and children's speech. The performance of each module was verified by adding different levels of real environment noise recorded in a cafeteria. Experimental results indicated that our ASR system could achieve over 80% word accuracy in 70-dBA noise. Further evaluation of adult speech recorded in a real noisy environment resulted in 73% word accuracy.

13.

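Application of Markov Random Fields to Speech Recognition

傅国康赵荣椿刘志强《数据采集与处理》1999,14(4):433-437

To meet the needs of speech recognition, the authors overcome a shortcoming of the conventional hidden Markov model (HMM), which considers only the states preceding the current observation symbol, while retaining its use of a "hidden" layer. They bring the model into the Markov random field (MRF) framework, build an MRF-based speech recognition model, describe the system's training and recognition algorithms in some detail, and redefine the corresponding support function in the relaxation labeling algorithm. Typical experiments show that the MRF model achieves a higher recognition rate than the conventional HMM. With optimized initial parameters, recognition with the two models takes a comparable amount of time. When training is performed offline, the MRF model has a clear advantage.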

14.

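A Fast Endpoint Detection Algorithm for Continuous Speech Recognition in Noisy Environments (total citations: 2; self-citations: 0; citations by others: 2)

崔冬青李治柱吴亚栋《计算机工程与应用》2003,39(23):95-97,138

Based on the characteristics of Chinese speech, the algorithm detects speech endpoints using amplitude and the power spectrum, effectively eliminating interference from background noise and the DC component. The algorithm is analyzed on real speech samples, and the experimental results show that it not only marks the start and end points of speech effectively but is also highly computationally efficient.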
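For orientation, a very basic endpoint detector removes the DC component and thresholds short-time frame energy. This is a generic baseline, not the algorithm from the paper (which also uses the power spectrum); the frame length, the relative threshold, and the function name are assumptions.

```python
import math

def detect_endpoints(samples, frame_len=160, threshold_ratio=0.1):
    """Return (start, end) sample indices of the active region, or None."""
    if len(samples) < frame_len:
        return None
    # Remove the DC component so a constant offset does not count as energy.
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]
    # Short-time energy over non-overlapping frames.
    energies = [
        sum(v * v for v in centered[i:i + frame_len])
        for i in range(0, len(centered) - frame_len + 1, frame_len)
    ]
    # Frames above a fraction of the peak energy are treated as speech.
    threshold = threshold_ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len

# Synthetic test signal: silence, a 200 Hz tone at 8 kHz sampling, silence,
# all riding on a DC offset of 0.5.
tone = [0.5 + math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(1600)]
signal = [0.5] * 800 + tone + [0.5] * 800
endpoints = detect_endpoints(signal)
```

A fixed relative threshold like this breaks down at low signal-to-noise ratios, which is the kind of weakness that noise-aware detectors are designed to address.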

15.

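A Hidden Markov Model Based on a Mixture of Factor Analyzers

王新民姚天任《计算机工程与应用》2005,41(24):50-52

The classical hidden Markov model has two main shortcomings when used for speech recognition: the "discrete state assumption" and the "independent distribution assumption". The former ignores the nonstationarity of the speech signal, and the latter ignores its correlation. This article applies the mixture of factor analyzers to speech modeling, proposes a hidden Markov model framework based on it, and represents the framework graphically as a dynamic Bayesian network. The framework not only resolves the above problems in theory but also offers many choices for speech modeling. The statistical acoustic models in wide use today can all be viewed as special cases of this model.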

16.

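An Endpoint Detection Method for Noisy Chinese Speech Recognition (total citations: 4; self-citations: 0; citations by others: 4)

王朋塔维娜陈树中《计算机工程》2003,29(17):120-121,135

One cause of recognition errors in speech recognition systems is inaccurate endpoint detection. At high signal-to-noise ratios it is not difficult to determine speech endpoints correctly. Most practical speech recognition systems, however, must operate at low signal-to-noise ratios, where conventional endpoint detection methods such as energy-based detection do not work effectively. This paper uses an improved hidden Markov model (HMM) for speech detection so as to adapt to changes in noise. Experimental results show that the method achieves highly accurate endpoint detection for noisy speech.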

17.

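Isolated-Word Speech Recognition in Noisy Environments

刘鹏王怀杰《数字社区&智能家居》2007,(12):1399-1400,1404

Speech recognition in noisy environments has long been a difficult problem in the field. This paper applies spectral subtraction for denoising and recognizes isolated words (the digits 0-9), improving the system's recognition rate.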
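Spectral subtraction, in its basic power-domain form, subtracts a noise power estimate from each frequency bin of a frame and keeps the noisy phase. The sketch below shows one frame of that textbook variant with a spectral floor. The abstract gives no implementation details, so the over-subtraction factor, the floor, and the naive DFT (chosen to avoid external libraries) are assumptions.

```python
import cmath
import math

def dft(frame):
    """Naive O(n^2) discrete Fourier transform of a real frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]

def spectral_subtract(frame, noise_power, over_subtraction=1.0, floor=0.01):
    """Subtract a per-bin noise power estimate from one frame."""
    cleaned = []
    for value, noise in zip(dft(frame), noise_power):
        power = abs(value) ** 2
        # Never drop below a small fraction of the input power; the floor
        # limits the isolated spectral peaks heard as "musical noise".
        kept = max(power - over_subtraction * noise, floor * power)
        gain = math.sqrt(kept / power) if power > 0.0 else 0.0
        cleaned.append(gain * value)  # real gain, so the noisy phase is kept
    return idft(cleaned)

# Illustrative frame: a sinusoid plus an offset, with two noise estimates.
frame = [math.sin(2.0 * math.pi * 3.0 * t / 16.0) + 0.2 for t in range(16)]
unchanged = spectral_subtract(frame, [0.0] * 16)
attenuated = spectral_subtract(frame, [1e9] * 16)
```

In a full system the noise power would typically be estimated from frames judged to contain no speech, and frames would be windowed and overlap-added; an FFT would replace the naive DFT.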

18.

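Isolated-Word Speech Recognition in Noisy Environments

刘鹏王怀杰《数字社区&智能家居》2007,(23)

Speech recognition in noisy environments has long been a difficult problem in the field. This paper applies spectral subtraction for denoising and recognizes isolated words (the digits 0-9), improving the system's recognition rate.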

19.

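Application of Noise-Robust Speech Recognition and Speech Enhancement Algorithms (total citations: 1; self-citations: 0; citations by others: 1)

汤玲戴斌《计算机仿真》2006,23(9):80-82,143

Improving the robustness of speech recognition systems is an important research topic in speech recognition technology. Recognition performance often degrades because the data in the training environment do not match the data in the recognition environment. To achieve satisfactory performance in noisy environments, this paper proposes a robust speech feature extraction method based on the characteristics of human hearing. Before MFCC feature extraction, masking-property processing is applied to the noisy speech features, and the features are further processed with a speech enhancement method to obtain the final robust speech features. Analysis of four different experimental results shows that applying this method to noise-robust analysis improves the system's noise robustness, and that the feature processing adapts well to different noises at different signal-to-noise ratios.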

20.

Gaussian Mixture Clustering and Language Adaptation for the Development of a New Language Speech Recognition System

Nikos Chatzichrisafis Vassilios Diakoloukas Vassilios Digalakis Costas Harizakis 《IEEE transactions on audio, speech, and language processing》2007,15(3):928-938

The porting of a speech recognition system to a new language is usually a time-consuming and expensive process since it requires collecting, transcribing, and processing a large amount of language-specific training sentences. This work presents techniques for improved cross-language transfer of speech recognition systems to new target languages. Such techniques are particularly useful for target languages where minimal amounts of training data are available. We describe a novel method to produce a language-independent system by combining acoustic models from a number of source languages. This intermediate language-independent acoustic model is used to bootstrap a target-language system by applying language adaptation. For our experiments, we use acoustic models of seven source languages to develop a target Greek acoustic model. We show that our technique significantly outperforms a system trained from scratch when less than 8 h of read speech is available.