Similar literature (20 results)
1.
This paper reports an upper bound for the Kullback–Leibler divergence (KLD) for a general family of transient hidden Markov models (HMMs). An upper bound KLD (UBKLD) expression for Gaussian mixture models (GMMs) is presented, which is generalized for the case of HMMs. Moreover, this formulation is extended to the case of HMMs with nonemitting states, where under some general assumptions, the UBKLD is proved to be well defined for a general family of transient models. In particular, the UBKLD has a computationally efficient closed form for HMMs with left-to-right topology and a final nonemitting state, which we refer to as left-to-right transient HMMs. Finally, the usefulness of the closed-form expression is experimentally evaluated for automatic speech recognition (ASR) applications, where left-to-right transient HMMs are used to model basic acoustic-phonetic units. Results show that the UBKLD is an accurate discrimination indicator for comparing acoustic HMMs used for ASR.
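
As a rough illustration of what an upper bound on the KLD between two GMMs can look like, the sketch below implements the classical convexity (log-sum inequality) bound for mixtures whose components are paired one to one. This is an assumption chosen for illustration and is not necessarily the UBKLD expression derived in the paper; the function names, the diagonal-covariance restriction, and the component pairing are all hypothetical choices.

```python
import math


def kl_categorical(w, v):
    """KL divergence between two weight vectors (zero weights contribute 0)."""
    return sum(wi * math.log(wi / vi) for wi, vi in zip(w, v) if wi > 0)


def kl_diag_gauss(mu0, var0, mu1, var1):
    """Closed-form KL divergence between two diagonal-covariance Gaussians."""
    return 0.5 * sum(
        math.log(s1 / s0) + (s0 + (m0 - m1) ** 2) / s1 - 1.0
        for m0, s0, m1, s1 in zip(mu0, var0, mu1, var1)
    )


def kl_gmm_upper_bound(w, mus_f, vars_f, v, mus_g, vars_g):
    """Upper bound on KL(f || g) for GMMs f and g with paired components.

    Assumed pairing: component i of f is matched with component i of g.
    Bound: KL(w || v) + sum_i w_i * KL(f_i || g_i).
    """
    bound = kl_categorical(w, v)
    for wi, mf, vf, mg, vg in zip(w, mus_f, vars_f, mus_g, vars_g):
        bound += wi * kl_diag_gauss(mf, vf, mg, vg)
    return bound
```

With a single component per mixture the bound is tight and reduces to the Gaussian KLD; how tight it is in general depends on how the components are paired.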

2.
A method of integrating the Gibbs distributions (GDs) into hidden Markov models (HMMs) is presented. The probabilities of the hidden state sequences of HMMs are modeled by GDs in place of the transition probabilities. The GDs offer a general way of modeling neighbor interactions of Markov random fields, of which the Markov chains in HMMs are special cases. An algorithm for estimating the model parameters is developed based on Baum reestimation, and an algorithm for computing the probability terms is developed using a lattice structure. The GD models were used for experiments in speech recognition on the TI speaker-independent, isolated digit database. The observation sequences of the speech signals were modeled by mixture Gaussian autoregressive densities. The energy functions of the GDs were developed using very few parameters and proved adequate in hidden layer modeling. The results of the experiments showed that the GD models performed at least as well as the HMM models.

3.
In this paper, we investigate a Hidden Markov Model (HMM)-based method to drive a lip movement sequence with input speech. In a previous study, we have already investigated a mapping method based on the Viterbi decoding algorithm which converts an input speech signal to a lip movement sequence through the most likely HMM state sequence using audio HMMs. However, the method can result in errors due to incorrectly decoded HMM states. This paper proposes a method to re-estimate visual parameters using HMMs of audio-visual joint probability using the Expectation-Maximization (EM) algorithm. In the experiments, the proposed mapping method results in a 26% error reduction when compared to the Viterbi-based algorithm at incorrectly decoded bilabial consonants.
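
The Viterbi decoding step mentioned in this abstract, which recovers the most likely HMM state sequence, can be sketched generically as below. This is a textbook log-domain Viterbi, not the authors' audio-to-lip mapping; the function name and the list-of-lists input layout are assumptions made for the example.

```python
import math


def viterbi(log_init, log_trans, log_emit):
    """Return (best state path, its log probability).

    log_init[s]     : log P(first state = s)
    log_trans[r][s] : log P(next state = s | current state = r)
    log_emit[t][s]  : log p(observation at time t | state s)
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backptrs = []
    for t in range(1, len(log_emit)):
        new_score, ptr = [], []
        for s in range(n_states):
            # Best predecessor of state s at time t.
            best_prev = max(range(n_states),
                            key=lambda r: score[r] + log_trans[r][s])
            ptr.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_emit[t][s])
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    best = score[state]
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, best
```

In a mapping scheme of the kind described, each decoded state would then be associated with visual parameters; an incorrectly decoded state propagates directly into the output, which is the weakness the abstract points out.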

4.
A fused hidden Markov model with application to bimodal speech processing (total citations: 2; self-citations: 0; citations by others: 2)
This paper presents a novel fused hidden Markov model (fused HMM) for integrating tightly coupled time series, such as audio and visual features of speech. In this model, the time series are first modeled by two conventional HMMs separately. The resulting HMMs are then fused together using a probabilistic fusion model, which is optimal according to the maximum entropy principle and a maximum mutual information criterion. Simulations and bimodal speaker verification experiments show that the proposed model can significantly reduce the recognition errors in noiseless or noisy environments.

5.
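An RBF-Gamma-HMM combined model for continuous speech recognition (total citations: 2; self-citations: 0; citations by others: 2)
This paper presents a distinctive, easily extensible continuous speech recognition model that combines a multi-module RBF-Gamma neural network with HMMs. The RBF network represents the phonetic-unit space, the Gamma network integrates temporal correlation information, and the HMM performs temporal integration and extension of phonetic units, so that the components complement one another. Based on this model, the improved classification learning algorithms proposed in the paper are applied to speaker-dependent continuous digit speech recognition, reaching a word accuracy of 98.9% and a string accuracy of 94.8%.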

6.
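A speech recognition method based on a hybrid PCANN/HMM structure (total citations: 1; self-citations: 0; citations by others: 1)
赵力, 邹采荣, 吴镇扬. 《信号处理》 (Signal Processing), 2001, 17(5): 473-476
This paper proposes a speech recognition method based on a hybrid PCANN/HMM structure. It uses feature parameter vectors formed from several consecutive frames as the input to the speech recognition HMM, which effectively introduces inter-frame correlation information into the HMM. To improve the output probability density function of the HMM under multi-frame feature input, a principal component analysis neural network (PCANN) that compresses the speech parameters is added at the front end of the HMM. Experiments on multi-speaker Mandarin continuous speech recognition confirm the effectiveness of the proposed method.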

7.
The authors demonstrate the effectiveness of phonemic hidden Markov models with Gaussian mixture output densities (mixture HMMs) for speaker-dependent large-vocabulary word recognition. Speech recognition experiments show that for almost any reasonable amount of training data, recognizers using mixture HMMs consistently outperform those employing unimodal Gaussian HMMs. With a sufficiently large training set (e.g., more than 2500 words), use of HMMs with 25-component mixture distributions typically reduces recognition errors by about 40%. It is also found that the mixture HMMs outperform a set of unimodal generalized triphone models having the same number of parameters. Previous attempts to employ mixture HMMs for speech recognition proved discouraging because of the high complexity and computational cost of implementing the Baum-Welch training algorithm. It is shown how mixture HMMs can be implemented very simply in unimodal transition-based frameworks by allowing multiple transitions from one state to another.

8.
Hidden Markov models (HMMs) represent a very important tool for analysis of signals and systems. In the past two decades, HMMs have attracted the attention of various research communities, including those in statistics, engineering, and mathematics. Their extensive use in signal processing and, in particular, speech processing is well documented. A major weakness of conventional HMMs is their inflexibility in modeling state durations. This weakness can be avoided by adopting a more complicated class of HMMs known as nonstationary HMMs. We analyze nonstationary HMMs whose state transition probabilities are functions of time that indirectly model state durations by a given probability mass function and whose observation spaces are discrete. The objective of our work is to estimate all the unknowns of a nonstationary HMM, which include its parameters and the state sequence. To that end, we construct a Markov chain Monte Carlo (MCMC) sampling scheme, where sampling from all the posterior probability distributions is very easy. The proposed MCMC sampling scheme has been tested in extensive computer simulations on finite discrete-valued observed data, and some of the simulation results are presented.

9.
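A parallel sub-band HMM maximum a posteriori adaptive nonlinear class estimation algorithm (total citations: 1; self-citations: 0; citations by others: 1)
At present, automatic speech recognition (ASR) systems achieve high recognition rates in laboratory conditions, but in real environments their performance degrades sharply because of background noise and transmission channel effects. Building on auditory experiments, this paper proposes a new nonlinear class estimation algorithm based on independent sub-band parallel maximum a posteriori probability to improve the robustness of recognition systems. The algorithm exploits the differences between the power spectra of various noises and of the speech content, as well as the differing influence of noise on the HMM across frequency bands, and uses a multilayer perceptron (MLP) to map the maximum a posteriori probability nonlinearly under noisy conditions, reducing the performance loss caused by environmental mismatch. Experiments show that the algorithm clearly outperforms the maximum a posteriori linear regression algorithm and the sub-band speech recognition algorithm proposed by Sangita.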

10.
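A method is proposed for removing additive noise from speech in the autocorrelation domain, using autocorrelation function values as parameters and exploiting the linear prediction error of the one-sided autocorrelation sequence. The noisy speech is first subjected to one-sided autocorrelation processing, so that the one-sided autocorrelation sequence of the speech signal replaces the speech signal sequence. Linear prediction analysis of this sequence then yields the linear prediction coefficients and the linear prediction error. A subtraction factor u is determined from the ratio of the error energy to the signal energy, and the prediction error, scaled according to u, is subtracted from the noisy speech to suppress the noise error energy. Experiments show that, even at low signal-to-noise ratios, the method preserves the spectral structure of the speech signal well, so that sound quality does not degrade.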

11.
For the acoustic models of embedded speech recognition systems, hidden Markov models (HMMs) are usually quantized and the original full space distributions are represented by combinations of a few quantized distribution prototypes. We propose a maximum likelihood objective function to train the quantized distribution prototypes. The experimental results show that the new training algorithm and the link structure adaptation scheme for the quantized HMMs reduce the word recognition error rate by 20.0%.

12.
In this paper, we describe an automatic unsupervised texture segmentation scheme using hidden Markov models (HMMs). First, the feature map of the image is formed using Laws' micromasks and directional macromasks. Each pixel in the feature map is represented by a sequence of 4-D feature vectors. The feature sequences belonging to the same texture are modeled as an HMM. Thus, if there are M different textures present in an image, there are M distinct HMMs to be found and trained. Consequently, the unsupervised texture segmentation problem becomes an HMM-based problem, where the appropriate number of HMMs, the associated model parameters, and the discrimination among the HMMs become the foci of our scheme. A two-stage segmentation procedure is used. First, coarse segmentation is used to obtain the approximate number of HMMs and their associated model parameters. Then, fine segmentation is used to accurately estimate the number of HMMs and the model parameters. In these two stages, the critical task of merging the similar HMMs is accomplished by comparing the discrimination information (DI) between the two HMMs against a threshold computed from the distribution of all DI's. A postprocessing stage of multiscale majority filtering is used to further enhance the segmented result. The proposed scheme is highly suitable for pipeline/parallel implementation. Detailed experimental results are reported. These results indicate that the present scheme compares favorably with respect to other successful schemes reported in the literature.

13.
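A Mandarin continuous speech recognition method based on a three-dimensional Viterbi algorithm (total citations: 1; self-citations: 0; citations by others: 1)
赵力, 邹采荣, 吴镇扬. 《电子学报》 (Acta Electronica Sinica), 2000, 28(7): 67-69, 58
This paper proposes a Mandarin continuous speech recognition method based on a three-dimensional Viterbi algorithm. The method uses hidden Markov models (HMMs) of 60 phoneme units and HMMs of 8 tone units as the basic recognition models. The recognition results of the phoneme models and the tone models are integrated by a Viterbi algorithm over a three-dimensional space spanned by the phoneme-unit HMM states, the tone-unit HMM states, and time.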

14.
The context-tree weighting method: basic properties (total citations: 4; self-citations: 0; citations by others: 4)
Describes a sequential universal data compression procedure for binary tree sources that performs the “double mixture.” Using a context tree, this method weights in an efficient recursive way the coding distributions corresponding to all bounded memory tree sources, and achieves a desirable coding distribution for tree sources with an unknown model and unknown parameters. Computational and storage complexity of the proposed procedure are both linear in the source sequence length. The authors derive a natural upper bound on the cumulative redundancy of the method for individual sequences. The three terms in this bound can be identified as coding, parameter, and model redundancy. The bound holds for all source sequence lengths, not only for asymptotically large lengths. The analysis that leads to this bound is based on standard techniques and turns out to be extremely simple. The upper bound on the redundancy shows that the proposed context-tree weighting procedure is optimal in the sense that it achieves the Rissanen (1984) lower bound.

15.
A performance bound for prediction of MIMO channels (total citations: 2; self-citations: 0; citations by others: 2)
Knowledge of future channel conditions can increase the performance of many types of wireless systems. This is especially true for radio channels with multiple transmit and receive antennas, i.e., multiple-input multiple-output (MIMO) systems. This paper derives a performance bound for MIMO channel prediction. It is assumed that prediction is based upon estimating a model for the channel and then extrapolating that model to predict future values of the channel. A vector formulation of the Cramér-Rao bound for functions of parameters is used to find a lower bound on the prediction error. Numerical evaluation of this bound shows that substantially longer prediction lengths are possible for MIMO channels than for single antenna channels. An intuitive interpretation of this result is that more of the channel structure is revealed when using multiple antennas at both ends. Finally, the longer prediction lengths for MIMO channels are confirmed by numerical results obtained by implementing a MIMO extension of a single-antenna prediction scheme.
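
For orientation, the standard vector Cramér-Rao bound for a function of parameters can be written as below. The notation is assumed here rather than taken from the paper: θ denotes the real-valued channel model parameters, g(θ) the future channel values obtained by extrapolating the model, J(θ) the Fisher information matrix, and ĝ any unbiased estimator of g(θ).

```latex
\operatorname{Cov}\bigl(\hat{g}\bigr) \;\succeq\;
\frac{\partial g(\theta)}{\partial \theta^{T}}\,
J(\theta)^{-1}\,
\left(\frac{\partial g(\theta)}{\partial \theta^{T}}\right)^{T}
```

The trace of the right-hand side then gives a lower bound on the total mean squared prediction error; how the paper specializes this to MIMO channel models is not stated in the abstract.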

16.
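黄国捷, 金慧, 俞一彪. 《信号处理》 (Signal Processing), 2018, 34(10): 1246-1251
A new method for voice conversion with non-parallel corpora using an enhanced variational autoencoder is proposed. The source speech first passes through an encoder network to produce a speech code that follows a Gaussian distribution; a decoder network reconstructs this code into the specified target speech; finally, an enhancement network refines the generated target speech. Each input of the enhancement network corresponds to one output, which gives the overall conversion system good denoising ability. In addition, a cyclic training method is introduced to improve the target tendency of the converted speech. Experimental results show that, compared with the baseline voice conversion system, the proposed system reduces spectral distortion, an objective metric, by 10.3% in cross-gender conversion, and also improves the subjective metrics of similarity and clarity. These results indicate that the proposed method gives the converted speech a good target tendency together with good conversion quality.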

17.
Hidden Markov modeling of flat fading channels (total citations: 2; self-citations: 0; citations by others: 2)
Hidden Markov models (HMMs) are a powerful tool for modeling stochastic random processes. They are general enough to model a large variety of processes with high accuracy, and are relatively simple, allowing us to compute analytically many important parameters of the process that are very difficult to calculate for other models (such as complex Gaussian processes). Another advantage of using HMMs is the existence of powerful algorithms for fitting them to experimental data and approximating other processes. In this paper, we demonstrate that communication channel fading can be accurately modeled by HMMs, and we find closed-form solutions for the probability distribution of fade duration and the number of level crossings.

18.
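高珍珍, 鲍长春. 《信号处理》 (Signal Processing), 2016, 32(8): 937-944
To address the energy mismatch between the training and test sets in speech enhancement algorithms based on the Mel Frequency Spectral domain Hidden Markov Model (MFS-HMM), this paper proposes an energy-matched MFS-HMM speech enhancement method. The method uses the iterative Expectation Maximization (EM) algorithm to estimate online the log-energy adjustment factors of the clean speech and the noise, and corrects the HMM parameters of the clean speech and the noise online, so that the energies of the training and test sets match. This effectively resolves the effect of energy mismatch on the quality of the enhanced speech. Subjective and objective tests show that the proposed method outperforms the reference algorithms.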

19.
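An SDTS-based HMM training algorithm (total citations: 7; self-citations: 0; citations by others: 7)
Training the HMMs of a speech recognition system with the conventional BW algorithm requires a large amount of speech data. Assuming that the subspace tying structure (SDTS) of the acoustic model system is known, this paper proposes a new training algorithm that effectively reduces the system's need for training data. Theoretical analysis and simulations show that, compared with the conventional BW algorithm, the new training algorithm (IBW) can compress the model parameters by a factor of 15 and thus greatly reduce the training data required. Although the new algorithm relies on prior knowledge of the system, it still shows many advantages.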

20.
A simple statistical extension at the individual tree level for an earlier-developed very high frequency forest backscatter model is proposed. This extended model treats trunk volumes as random quantities. A concept of random forest reflection coefficient is also introduced to characterize radar returns from individual trees. Based on the extended model, a set of algorithms for estimating the mean trunk (stem) volume from synthetic aperture radar data at the individual tree level is developed assuming that the areal tree density is known. The algorithms are specified for different scenarios related to a priori information on parameters of statistical distributions for the trunk volume and fluctuations of the forest reflection coefficient. An approximate lower bound on the standard deviation in the unbiased estimation of the mean trunk (stem) volume is proposed. This bound can be readily obtained by means of computer simulation for any specified statistical distribution for the trunk volume and fluctuations of the forest reflection coefficient. Performance analysis for the proposed algorithms is numerically performed by means of Monte Carlo simulation for a variety of scenarios. This analysis has shown that the algorithms provide nearly unbiased and efficient estimates, and the proposed lower bound is a very accurate approximation. The results of the study have demonstrated that the approach and methods developed in this paper suggest promising solutions in accurate forest parameter retrieval.
