Similar literature
 20 similar documents found (search time: 38 ms)
1.
This paper attempts to overcome the tendency of the expectation-maximization (EM) algorithm to converge to a local rather than a global maximum when it is used to estimate hidden Markov model (HMM) parameters in speech signal modeling. We propose CEL-EM, a hybrid algorithm for HMM estimation in automatic speech recognition (ASR) that combines a constraint-based evolutionary algorithm (EA) with EM. The novelty of CEL-EM is that it is applicable to constraint-based models that have many constraints and large numbers of parameters and are normally estimated with EM, such as the HMM. Two versions of CEL-EM with different fusion strategies are proposed. The first uses a traditional constraint-handling mechanism of the EA. The second transforms the constrained optimization problem into an unconstrained one using Lagrange multipliers. Both use a staged-fusion approach in which EM is run periodically after the EA has executed for a specified period, which preserves the global sampling capability of the EA in the hybrid algorithm. A variable initialization approach (VIA) based on variable segmentation is proposed to provide a better initialization for the EA in CEL-EM. Experimental results on the TIMIT speech corpus show that CEL-EM achieves higher recognition accuracy than both the traditional EM algorithm and a stronger EM baseline (VIA-EM, constructed by applying the VIA to EM).

2.
This paper investigates a noise-robust technique for automatic speech recognition which exploits hidden Markov modeling of stereo speech features from clean and noisy channels. The HMM trained this way, referred to as the stereo HMM, has in each state a Gaussian mixture model (GMM) with a joint distribution over both clean and noisy speech features. Given noisy speech input, the stereo HMM gives rise to a two-pass compensation and decoding process: MMSE denoising based on N-best hypotheses is performed first, followed by decoding of the denoised speech in a reduced search space on a lattice. Compared to feature-space GMM-based denoising approaches, the stereo HMM is advantageous because it provides finer-grained noise compensation and uses information from the whole noisy feature sequence to predict each individual clean feature. Experiments on large-vocabulary spontaneous speech from speech-to-speech translation applications show that the proposed technique outperforms its feature-space counterpart in noisy conditions while still maintaining decent performance in clean conditions.

3.
In this paper, we propose a multi-environment model adaptation method based on vector Taylor series (VTS) for robust speech recognition. In the training phase, the clean speech is contaminated with noise at different signal-to-noise ratio (SNR) levels to produce several types of noisy training speech and each type is used to obtain a noisy hidden Markov model (HMM) set. In the recognition phase, the HMM set which best matches the testing environment is selected, and further adjusted to reduce the environmental mismatch by the VTS-based model adaptation method. In the proposed method, the VTS approximation based on noisy training speech is given and the testing noise parameters are estimated from the noisy testing speech using the expectation-maximization (EM) algorithm. The experimental results indicate that the proposed multi-environment model adaptation method can significantly improve the performance of speech recognizers and outperforms the traditional model adaptation method and the linear regression-based multi-environment method.
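The abstract above does not state the relation that the VTS approximation linearizes. As a minimal sketch, assume the log-spectral mismatch function often used in the VTS literature, y = x + h + log(1 + exp(n - x - h)), where x is the clean speech, n the additive noise and h the channel, all per frequency bin. The function names and the scalar (single-bin) setting are illustrative assumptions, not the paper's implementation.

```python
import math

def mismatch(x, n, h=0.0):
    """Noisy log-spectrum for clean log-spectrum x, noise n and channel h (one bin).

    Assumed form: y = x + h + log(1 + exp(n - x - h)).
    """
    return x + h + math.log1p(math.exp(n - x - h))

def vts_first_order(x, n, x0, n0, h=0.0):
    """First-order Taylor expansion of mismatch() around the point (x0, n0)."""
    # dy/dx = 1 / (1 + exp(n0 - x0 - h)) and dy/dn = 1 - dy/dx
    g = 1.0 / (1.0 + math.exp(n0 - x0 - h))
    return mismatch(x0, n0, h) + g * (x - x0) + (1.0 - g) * (n - n0)
```

Near the expansion point the linearized value tracks the exact one closely, which is what makes it possible to adapt Gaussian means in closed form; far from the expansion point the approximation degrades.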

4.
Accurate modeling and estimation of speech and noise gains facilitate good performance of speech enhancement methods using data-driven prior models. In this paper, we propose a hidden Markov model (HMM)-based speech enhancement method using explicit gain modeling. Through the introduction of stochastic gain variables, energy variation in both speech and noise is explicitly modeled in a unified framework. The speech gain models the energy variations of the speech phones, typically due to differences in pronunciation and/or different vocalizations of individual speakers. The noise gain helps to improve the tracking of the time-varying energy of nonstationary noise. The expectation-maximization (EM) algorithm is used to perform offline estimation of the time-invariant model parameters. The time-varying model parameters are estimated online using the recursive EM algorithm. The proposed gain modeling techniques are applied to a novel Bayesian speech estimator, and the performance of the proposed enhancement method is evaluated through objective and subjective tests. The experimental results confirm the advantage of explicit gain modeling, particularly for nonstationary noise sources.

5.
This paper investigates the modelling of interframe dependence in a hidden Markov model (HMM) for speech recognition. First, a new observation model that assumes dependence on multiple previous frames is proposed. This model represents the dependence structure with a weighted mixture of first-order conditional Gaussian densities, each mixture component accounting for a specific conditional frame. Next, the choice of conditional frames/segment is optimized in both training and recognition, which helps to remove the mismatch of the conditional segments due to different observation histories. An EM (Expectation–Maximization) iteration algorithm is developed for the estimation of the model parameters and for the optimization over the dependence structure. Experimental comparisons on a speaker-independent E-set database show that the new model, without optimization of the dependence structure, achieves better performance than the standard HMM, the bigram HMM and the linear-predictive HMM, all with comparable or smaller numbers of parameters. Optimization over the dependence structure leads to a further improvement in performance.

6.
In this paper, we present a novel competitive EM (CEM) algorithm for finite mixture models to overcome the two main drawbacks of the EM algorithm: it often gets trapped at local maxima and sometimes converges to the boundary of the parameter space. The proposed algorithm is capable of automatically choosing the number of clusters and efficiently selecting the "split" or "merge" operations based on the new competitive mechanism we propose. It is insensitive to the initial configuration of the number of mixture components and the model parameters. Experiments on synthetic data show that our algorithm has very promising performance for the parameter estimation of mixture models. The algorithm is also applied to the structure analysis of complicated Chinese characters. The results show that the proposed algorithm performs much better than previous methods at the cost of a slightly heavier computational burden.
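For reference, the baseline that CEM sets out to improve can be sketched as plain EM for a one-dimensional Gaussian mixture. This is the standard algorithm, not the competitive variant with split and merge operations; the function names, the fixed iteration count and the variance floor of 1e-6 are illustrative assumptions.

```python
import math

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, iters=50):
    """Plain EM for a 1-D Gaussian mixture; returns (means, variances, weights)."""
    k = len(means)
    means, variances, weights = list(means), list(variances), list(weights)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            s = sum(p)
            resp.append([pj / s for pj in p])
        # M-step: re-estimate parameters from responsibility-weighted statistics
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps a component from collapsing
            weights[j] = nj / len(data)
    return means, variances, weights
```

The result depends on the initial means, variances and weights passed in, and the number of components is fixed by the caller; these are the sensitivities that the abstract says CEM is designed to remove.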

7.
We present a factorial representation of Gaussian mixture models for observation densities in hidden Markov models (HMMs), which uses factorial learning in the HMM framework. We derive the reestimation formulas for estimating the factorized parameters by the Expectation Maximization (EM) algorithm and propose a novel method for initializing them. To compare the performance of the proposed models with that of factorial hidden Markov models and HMMs, we have carried out extensive experiments, which show that this modelling approach is effective and robust.

8.
Baibo  Changshui  Xing. Pattern Recognition, 2005, 38(12): 2351-2362
Gaussian Mixture Models (GMMs) have been broadly applied to fitting probability density functions. However, due to the intrinsic linearity of the GMM, many components are usually needed to fit the data distribution appropriately when there are curved manifolds in the data cloud.

To solve this problem and better represent data with curved manifolds, in this paper we propose a new nonlinear probability model, called the active curve axis Gaussian model. Intuitively, this model can be imagined as a Gaussian model bent along its first principal axis. The EM algorithm is employed to estimate the parameters of mixtures of this model.

Experiments on synthetic data and Chinese characters show that the proposed nonlinear mixture models can approximate distributions of data clouds with curved manifolds in a more concise and compact way than the GMM does. The performance of the proposed nonlinear mixture models is promising.


9.
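To compensate in a unified way for the effects of additive noise and convolutional channel response on telephone speech, this paper proposes a vector piecewise polynomial (VPP) approximation algorithm and applies it successfully to both stationary and non-stationary noise environments. For stationary noise, a batch EM (B EM) method is used in the log-spectral domain; for non-stationary noise, a recursive EM (REM) method is used in the cepstral domain. Both methods perform feature compensation under the minimum mean square error (MMSE) estimation criterion. Experimental results show that, for large-vocabulary continuous speech recognition affected by background noise and telephone channels (both fixed-line and GSM), the algorithm reduces the recognition error rate by about 18%.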

10.
The hidden Markov model (HMM) has been highly successful in many fields such as speech recognition and engineering. However, because it assumes that observations are conditionally independent given the state, the HMM has a very limited capacity for recognizing complex patterns involving more than first-order dependencies in customer relationship management. The Group Method of Data Handling (GMDH) can overcome these drawbacks of the HMM, so we propose a hybrid model that combines the HMM and GMDH to score customer credit. There are three phases in this model: training the HMM with multiple observations, adding GMDH into the HMM, and optimizing the hybrid model. The proposed hybrid model is compared with other existing methods in terms of average accuracy, Type I error, Type II error and AUC. Experimental results show that the proposed method performs better than HMM/ANN on two credit scoring datasets. The implementation of the HMM/GMDH hybrid model allows lenders and regulators to develop techniques to measure customer credit risk.

11.
In this paper, we propose a novel optimization algorithm called constrained line search (CLS) for discriminative training (DT) of the Gaussian mixture continuous density hidden Markov model (CDHMM) in speech recognition. The CLS method is formulated under a general framework for optimizing any discriminative objective function, including maximum mutual information (MMI), minimum classification error (MCE), minimum phone error (MPE)/minimum word error (MWE), etc. In this method, discriminative training of the HMM is first cast as a constrained optimization problem, where the Kullback-Leibler divergence (KLD) between models is explicitly imposed as a constraint during optimization. Based upon the idea of line search, we show that a simple formula for the HMM parameters can be found by constraining the KLD between the HMMs of two successive iterations in a quadratic form. The proposed CLS method can be applied to optimize all model parameters in Gaussian mixture CDHMMs, including means, covariances, and mixture weights. We have investigated the proposed CLS approach on several benchmark speech recognition databases, including TIDIGITS, Resource Management (RM), and Switchboard. Experimental results show that the new CLS optimization method consistently outperforms the conventional EBW method in both recognition performance and convergence behavior.
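To illustrate why a KLD constraint between successive models can take a quadratic form, the sketch below uses the closed-form KL divergence between two univariate Gaussians. With the variance held fixed, the divergence reduces to step**2 / (2 * var), so bounding it bounds the step length of a mean update. This is a one-dimensional illustration under that assumption, not the CLS update formula from the paper; both function names are hypothetical.

```python
import math

def kl_gauss(mu0, var0, mu1, var1):
    """KL( N(mu0, var0) || N(mu1, var1) ) for univariate Gaussians, in nats."""
    return 0.5 * (math.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

def constrained_mean_step(mu, direction, var, max_kl):
    """Move the mean as far as possible along `direction` with KL(old || new) <= max_kl.

    The variance is held fixed, so KL = step**2 / (2 * var), a quadratic in the step.
    """
    step = math.sqrt(2.0 * var * max_kl)
    return mu + step * (1.0 if direction >= 0 else -1.0)
```

For example, `constrained_mean_step(0.0, 1.0, 2.0, 0.1)` returns a new mean whose divergence from the old model equals the bound, which can be checked with `kl_gauss`.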

12.
This paper proposes a novel hidden Markov model (HMM) based on the simulated annealing (SA) algorithm and the expectation maximization (EM) algorithm for machinery diagnosis. Because the traditional HMM is sensitive to initial values and EM is prone to becoming trapped in a local optimum, SA is combined with EM to improve the HMM and overcome the local-optimum search problem. The proposed HMM has a strong ability to converge globally and improves the parameter estimation process. Finally, the computational results of a case study illustrate that this SAEM-HMM has high efficiency and accuracy, which could help machinery diagnosis in practice.

13.
Blind source separation (BSS) has attracted much attention in the signal processing community due to its 'blind' property and wide applications. However, there are still some open problems, such as underdetermined BSS and noisy BSS. In this paper, we propose a Bayesian approach that improves the separation performance for instantaneous mixtures of non-stationary sources by taking into account the internal organization of the non-stationary sources. A Gaussian mixture model (GMM) is used to model the distribution of the source signals, and a continuous density hidden Markov model (CDHMM) is derived to track the non-stationarity inside the source signals. Source signals can switch between several states, so that the separation performance can be significantly improved. An expectation-maximization (EM) algorithm is derived to estimate the mixing coefficients, the CDHMM parameters and the noise covariance. The source signals are recovered via a maximum a posteriori (MAP) approach. To ensure the convergence of the proposed algorithm, proper prior densities (conjugate priors) are assigned to the estimated coefficients to incorporate the prior information. The initialization scheme for the estimates is also discussed. Systematic simulations are used to illustrate the performance of the proposed algorithm. Simulation results show that, in the determined mixture case, the proposed algorithm has more robust separation performance in terms of similarity score in noisy environments than the classical BSS algorithms. Additionally, since the mixing matrix and the sources are estimated jointly, the proposed EM algorithm also works well in the underdetermined case. Furthermore, the proposed algorithm converges quickly with proper initialization.

14.
The mismatch between system training and operating conditions can seriously deteriorate the performance of automatic speech recognition (ASR) systems. Various techniques have been proposed to solve this problem in a specified speech environment. Employing these techniques often involves modifying the ASR system structure. In this paper, we propose an environment-independent (EI) ASR model parameter adaptation approach based on Bayesian parametric representation (BPR), which is able to adapt ASR models to new environments without changing the structure of an ASR system. The parameter set of the BPR is optimized by a maximum joint likelihood criterion, consistent with that of the hidden Markov model (HMM)-based ASR model, through an independent expectation-maximization (EM) procedure. Variations of the proposed approach are investigated in experiments designed for two different speech environments: the noisy environment provided by the AURORA 2 database, and the network environment provided by the NTIMIT database. The performance of the proposed EI ASR model compensation approach is compared to that of the cepstral mean normalization (CMN) approach, which is one of the standard techniques for additive noise compensation. The experimental results show that the performance of ASR models in different speech environments is significantly improved after adaptation by the proposed BPR model compensation approach.

15.
罗磊  黄博妍  孙金玮  温良. 自动化学报 (Acta Automatica Sinica), 2016, 42(9): 1432-1439
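To improve noise cancellation for mixed wideband and narrowband noise, this paper proposes an active noise control (ANC) system based on ensemble empirical mode decomposition (EEMD). A real-time EEMD algorithm decomposes the mixed noise, segment by segment, into several intrinsic mode function (IMF) components. Because these IMF components occupy different frequency bands, the wideband and narrowband components of the mixed noise are effectively separated. Processing them independently with ANC resolves the "spark" phenomenon that arises when handling mixed noise and avoids the effect of frequency mismatch found in conventional hybrid ANC (HANC) systems. The EEMD algorithm also acts as a stationarizing process for the mixed noise, so the proposed system maintains good stability when non-stationary changes occur in the mixed noise. Simulations in different noise environments show that the proposed ANC system has better stability and a smaller steady-state error than the HANC system.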

16.
In this paper, the family of conditional minimum mean square error (MMSE) spectral estimators is studied which take on the form $(E(X_p^\alpha \mid \vert X_p+D_p\vert))^{1/\alpha}$, where $X_p$ is the clean speech spectrum and $D_p$ is the noise spectrum, resulting in a Generalized MMSE estimator (GMMSE). The degree of noise suppression versus musical tone artifacts of these estimators is studied. The tradeoffs in the selection of $\alpha$, across noise spectral structure and signal-to-noise ratio (SNR) level, are also considered. Members of this family of estimators include the Ephraim–Malah (EM) amplitude estimator and, for high SNRs, the Wiener Filter. It is shown that the colorless residual noise observed in the EM estimator is a characteristic of this general family of estimators. An application of these estimators in an auditory enhancement scheme using the masking threshold of the human auditory system is formulated, resulting in the GMMSE-auditory masking threshold (AMT) enhancement method. Finally, a detailed evaluation of the proposed algorithms is performed over the phonetically balanced TIMIT database and the National Gallery of the Spoken Word (NGSW) audio archive using subjective and objective speech quality measures. Results show that the proposed GMMSE-AMT outperforms MMSE and log-MMSE enhancement methods using a detailed phoneme-based objective quality analysis.

17.
One serious difficulty in the deployment of wideband speech recognition systems for new tasks is the expense in both time and cost of obtaining sufficient training data. A more economical approach is to collect telephone speech and then restrict the application to operate at the telephone bandwidth. However, this generally results in suboptimal performance compared to a wideband recognition system. In this paper, we propose a novel expectation-maximization (EM) algorithm in which wideband acoustic models are trained using a small amount of wideband speech and a larger amount of narrowband speech. We show how this algorithm can be incorporated into the existing training schemes of hidden Markov model (HMM) speech recognizers. Experiments performed using wideband speech and telephone speech demonstrate that the proposed mixed-bandwidth training algorithm results in significant improvements in recognition accuracy over conventional training strategies when the amount of wideband data is limited.

18.
Hidden Markov models (HMMs) with Gaussian mixture distributions rely on an assumption that speech features are temporally uncorrelated, and often assume a diagonal covariance matrix where correlations between feature vectors for adjacent frames are ignored. A Linear Dynamic Model (LDM) is a Markovian state-space model that also relies on hidden state modeling, but explicitly models the evolution of these hidden states using an autoregressive process. An LDM is capable of modeling higher order statistics and can exploit correlations of features in an efficient and parsimonious manner. In this paper, we present a hybrid LDM/HMM decoder architecture that postprocesses segmentations derived from the first pass of an HMM-based recognition. This smoothed trajectory model is complementary to existing HMM systems. An Expectation-Maximization (EM) approach for parameter estimation is presented. We demonstrate a 13% relative WER reduction on the Aurora-4 clean evaluation set, and a 13% relative WER reduction on the babble noise condition.
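The abstract describes the LDM only in words. A common way to write such a state-space model, given here as an assumed textbook formulation and not necessarily the paper's parameterization, uses a hidden state $x_t$ that evolves by a first-order autoregressive process and an observation $y_t$ that is a noisy linear function of the state. The symbols $F$, $H$, $Q$, $R$, $\mu_0$ and $\Sigma_0$ are notation introduced for this sketch.

```latex
\begin{align}
x_t &= F\,x_{t-1} + w_t, & w_t &\sim \mathcal{N}(0,\,Q), \\
y_t &= H\,x_t + v_t,     & v_t &\sim \mathcal{N}(0,\,R), \\
x_1 &\sim \mathcal{N}(\mu_0,\,\Sigma_0).
\end{align}
```

Under this formulation, correlation between adjacent feature frames enters through $F$, which is the dependence that a standard HMM with frame-independent emissions leaves out.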

19.
An EM algorithm for the block mixture model
Although many clustering procedures aim to construct an optimal partition of objects or, sometimes, of variables, there are other methods, called block clustering methods, which consider simultaneously the two sets and organize the data into homogeneous blocks. Recently, we have proposed a new mixture model called block mixture model which takes into account this situation. This model allows one to embed simultaneous clustering of objects and variables in a mixture approach. We have studied this probabilistic model under the classification likelihood approach and developed a new algorithm for simultaneous partitioning based on the classification EM algorithm. In this paper, we consider the block clustering problem under the maximum likelihood approach and the goal of our contribution is to estimate the parameters of this model. Unfortunately, the application of the EM algorithm for the block mixture model cannot be made directly; difficulties arise due to the dependence structure in the model and approximations are required. Using a variational approximation, we propose a generalized EM algorithm to estimate the parameters of the block mixture model and, to illustrate our approach, we study the case of binary data by using a Bernoulli block mixture.

20.
In this paper we present a new event analysis framework based on the mixture hidden Markov model (HMM) for ice hockey videos. Hockey is a competitive sport, and hockey videos are hard to analyze because of the homogeneity of their frame features. However, the temporal dynamics of hockey videos are highly structured. Using the mixture representation of local observations and the Markov chain property of the hockey event structure, we successfully model the hockey event as a mixture HMM. Based on the mixture HMM, hockey events can be classified with high accuracy. Two types of mixture HMMs, Gaussian mixture and independent component analysis (ICA) mixture, are compared for hockey video event classification. The results confirm our analysis that the mixture HMM is a suitable model for videos with intensive activity. The new mixture HMM hockey event model could be a very useful tool for hockey game analysis.
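A common way to classify a sequence with HMMs, sketched below, is to train one model per event class and pick the class whose model assigns the sequence the highest likelihood, computed with the forward algorithm. The sketch assumes discrete observation symbols for brevity, whereas the paper uses Gaussian and ICA mixture observation densities; the function names and the `(start, trans, emit)` model layout are illustrative assumptions.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | HMM) via the forward algorithm for a discrete-observation HMM.

    start[i]: initial probability of state i
    trans[i][j]: probability of moving from state i to state j
    emit[i][o]: probability of emitting symbol o in state i
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    return sum(alpha)

def classify(obs, models):
    """Return the label whose HMM gives `obs` the highest likelihood."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))
```

For long sequences the unscaled forward variables underflow, so a practical version would rescale `alpha` at each step or work in the log domain.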
