Similar literature
20 similar documents found.
1.
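In the two-dimensional time-frequency grid, the presence or absence of speech at adjacent points is correlated, and a conventional Markov chain cannot adaptively model this two-dimensional time-frequency correlation. Based on the correlation of speech signals in the time-frequency domain, a method is proposed that estimates the speech mask with a two-dimensional correlation model. The method divides the log power spectrum of the noisy speech in the time-frequency domain into speech and non-speech classes. It describes the temporal correlation of speech with state transition probabilities along the time axis and a forward factor, and describes the spectral correlation with state transition probabilities along the frequency axis and a neighborhood factor. Through global statistical optimization, the model combines the temporal and spectral correlations. A sequential update procedure is given that updates the model frame by frame and estimates the speech presence probability. Given the current log power spectrum and model parameters, the speech state matrix obtained by maximizing the posterior probability can serve as the optimal estimate of the speech mask. The method is compared with several existing online speech mask estimation methods, and the experimental results show its superiority.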

2.
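Li Yu, Guo Leiyong, Tan Hongzhou. 《计算机应用》 (Journal of Computer Applications), 2011, 31(5): 1447-1449
To improve the detection performance of voice activity detection (VAD) based on a statistical-model likelihood ratio test, an improved VAD algorithm is proposed that exploits the statistical correlation between successive speech frames. The a priori SNR is estimated recursively from the speech spectral components of the previous frame, and the decision threshold is then set according to the detection state of the previous frame, yielding a dual-threshold hidden Markov model decision rule for voice activity. Experiments show that this inter-frame correlation VAD algorithm achieves better detection scores than Sohn's algorithm.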
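The idea of a decision threshold that depends on the previous frame's state can be illustrated with a small sketch. This is not the authors' algorithm: the function name, the two threshold values, and the assumption that a per-frame log-likelihood ratio is already available are all hypothetical, chosen only to show how a dual threshold makes the decision depend on the preceding frame.

```python
def vad_decisions(log_likelihood_ratios, enter_threshold=1.0, stay_threshold=0.5):
    """Frame-wise speech/non-speech decisions with a state-dependent threshold.

    enter_threshold and stay_threshold are illustrative values, not taken
    from the paper.
    """
    decisions = []
    previous_is_speech = False  # assume the stream starts in non-speech
    for llr in log_likelihood_ratios:
        # A lower threshold applies when the previous frame was speech, so a
        # detected speech segment is not dropped by one weak frame.
        threshold = stay_threshold if previous_is_speech else enter_threshold
        is_speech = llr > threshold
        decisions.append(is_speech)
        previous_is_speech = is_speech
    return decisions
```

With these assumed thresholds, a frame whose ratio is 0.7 is accepted as speech directly after a speech frame but rejected directly after a non-speech frame.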

3.
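Given the strong ability of hidden Markov models to represent speech signals and the good voice conversion performance of Gaussian mixture models, a method is proposed that combines a hidden Markov model with a Gaussian mixture model to convert line spectral frequencies. The theoretical derivation and algorithm flow are given, and prosodic features are converted using Gaussian modeling. Simulation experiments on two recorded speech segments show that the converted speech has good naturalness and intelligibility; ABX test results show that speech produced by the algorithm is perceived as closer to the target speaker with a probability of 90.2%.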

4.
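To further improve the accuracy of speech recognition systems and make speech products more convenient to use, a speech recognition method combining a hidden Markov model with an algebraic neural network is proposed. The hidden Markov model generates the best speech state sequence, the output probabilities of that sequence serve as the input to a feedforward neural network, and the algebraic neural network performs classification. Simulations on the Matlab 7.0 platform show that, compared with conventional neural networks, the method improves convergence speed, robustness, and recognition rate.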
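The abstract does not say how the best state sequence is computed. A common choice for this step is Viterbi decoding, sketched below in the log domain under that assumption. The function name and argument layout are hypothetical, and all probabilities are assumed to be strictly positive so that their logarithms exist.

```python
import math


def viterbi(log_init, log_trans, log_emit):
    """Return the most likely state path and its log probability.

    log_init[i]     : log P(first state = i)
    log_trans[i][j] : log P(next state = j | current state = i)
    log_emit[t][j]  : log P(observation at frame t | state = j)
    """
    n_states = len(log_init)
    scores = [log_init[j] + log_emit[0][j] for j in range(n_states)]
    backpointers = []
    for t in range(1, len(log_emit)):
        new_scores, pointers = [], []
        for j in range(n_states):
            # Best predecessor of state j at frame t.
            best_prev = max(range(n_states),
                            key=lambda i: scores[i] + log_trans[i][j])
            pointers.append(best_prev)
            new_scores.append(scores[best_prev] + log_trans[best_prev][j]
                              + log_emit[t][j])
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    last = max(range(n_states), key=lambda j: scores[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, scores[last]
```

In a hybrid setup like the one described, the per-frame emission scores along the returned path would be the quantities passed on to the neural network.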

5.
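The classical Markov model ignores the contextual correlation among states and observations in applications, as well as the dynamic, variable nature of state transition probabilities. To address this, a fuzzy deep hidden Markov model is proposed. By adding correlation between observations, handling the uncertainty in probability transitions, and improving the parameter optimization algorithm, the model can be applied effectively to pattern recognition under strong noise, missing training data, and similar conditions. It is proved theoretically that, at equal model complexity, the explicit fuzzy deep hidden Markov model offers a higher degree of model optimization, better discrimination, a lower misrecognition rate, and higher robustness.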

6.
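Ma E'e, Liu Ying, Wang Chengru. 《计算机工程》 (Computer Engineering), 2009, 35(18): 283-285
For a speech-driven lip motion synthesis system, speech features are extracted using wavelet packet analysis. The dynamic characteristics of speech are represented by feature differences and by multiple speech frames associated with the preceding and following mouth-shape frames, and principal component analysis reduces the dimensionality of the input speech features. The system is built on an audio-visual mapping model based on an input-output hidden Markov model (IOHMM). Experiments show that the extracted speech parameters are more robust than conventional Mel cepstral coefficients and that the synthesized mouth-shape sequences are more coherent and natural.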

7.
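Although the hidden Markov model (HMM) is currently the most popular speech recognition model, its assumption that state outputs are independent and identically distributed prevents it from describing the temporal correlation inherent in speech. This paper introduces a more flexible framework for studying frame correlation based on the duration distribution based HMM (DDBHMM), and on that basis proposes a hybrid model that describes the static information and the dynamic variation of speech features separately yet combines them organically, characterizing real speech more reasonably at a small computational cost. Experiments on Mandarin large-vocabulary speaker-independent continuous speech recognition show that exploiting frame correlation clearly improves system performance.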

8.
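This paper proposes a syllable segmentation method for continuous speech streams based on inter-frame correlation. The inter-frame correlation characteristic, which reflects the degree of correlation between the LPC coefficients of adjacent frames, and its parameters are used to segment the continuous speech stream. Time-domain parameters then label the phonemic nature of each segment, and the syllable segmentation and boundaries are finally determined according to the rules of Chinese syllable composition. Syllable segmentation experiments on Mandarin digit-string speech demonstrate the effectiveness of the method.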

9.
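Speech recognition systems often make recognition errors when audio quality is poor. To improve recognition accuracy, a speech recognition system for an English translation robot is designed based on the continuous hidden Markov model. In hardware, an audio signal receiver and the main processor of the robot's audio recognition module are designed. In software, the audio signal is quantized and pre-emphasized, and the ratio of frame shift to frame length is computed to obtain the analog signal conversion frequency and the basic-unit quantization index. A speech-to-text codec is built on the continuous hidden Markov model, the width of the window function is computed, Markov chain probability paths are obtained in the lattice, and the complexity of different probability paths is compared. A speech recognition algorithm for the English translation robot is then designed to produce the recognition results. Experimental data show that the system's recognition accuracy exceeds 75% under three different audio quality levels, that it is more stable than other systems, and that its accuracy is higher at equal audio quality, indicating that the continuous hidden Markov model system outperforms the other systems.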
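The pre-emphasis and framing steps mentioned in the software stage can be sketched as follows. The abstract gives no parameter values, so the filter coefficient 0.97, the function names, and the policy of discarding an incomplete final frame are assumptions made for illustration only.

```python
def pre_emphasize(samples, coefficient=0.97):
    """Apply the first-order filter y[n] = x[n] - coefficient * x[n-1].

    The coefficient 0.97 is a commonly used value, assumed here.
    """
    if not samples:
        return []
    return [samples[0]] + [samples[n] - coefficient * samples[n - 1]
                           for n in range(1, len(samples))]


def split_into_frames(samples, frame_length, frame_shift):
    """Cut the signal into overlapping frames; a partial last frame is dropped."""
    frames = []
    start = 0
    while start + frame_length <= len(samples):
        frames.append(samples[start:start + frame_length])
        start += frame_shift
    return frames
```

The ratio `frame_shift / frame_length` controls the overlap: a ratio of 0.5 means each frame shares half of its samples with the next one.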

10.
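Hierarchical B-frame coding is a temporally scalable video coding scheme based on closed-loop motion compensation, in which decoded and reconstructed B frames can serve as reference frames. Exploiting this property, a new bidirectional-prediction direct mode for B frames is proposed. By exploiting the temporal correlation between adjacent frames, the forward and backward motion vectors of a B frame used as a reference are scaled in time to refine the motion vector accuracy of the current block in direct mode. Simulations show that, relative to conventional coding, the method improves coding performance by 0.46 dB on average.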
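For orientation, the conventional temporal scaling of a co-located motion vector in direct mode can be sketched as below. This shows only the standard scaling idea, not the refined mode the paper proposes. The function name and the use of floating-point division (real codecs use fixed-point arithmetic) are simplifying assumptions.

```python
def scale_direct_mode_vectors(colocated_mv, tb, td):
    """Derive forward and backward vectors by temporal scaling.

    colocated_mv : (x, y) motion vector of the co-located block
    tb           : temporal distance from the current frame to the forward reference
    td           : temporal distance between the two reference frames
    """
    mv_x, mv_y = colocated_mv
    # Forward vector: co-located vector scaled by the distance ratio tb / td.
    forward = (mv_x * tb / td, mv_y * tb / td)
    # Backward vector: what remains of the co-located motion, with opposite sign.
    backward = (forward[0] - mv_x, forward[1] - mv_y)
    return forward, backward
```

For a B frame halfway between its references (`tb = 1`, `td = 2`), the forward and backward vectors are equal in magnitude and opposite in direction.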

11.
This paper investigates the modelling of the interframe dependence in a hidden Markov model (HMM) for speech recognition. First, a new observation model, assuming dependence on multiple previous frames, is proposed. This model represents such a dependence structure with a weighted mixture of a set of first-order conditional Gaussian densities, each mixture component accounting for a specific conditional frame. Next, an optimization in choosing the conditional frames/segment is performed in both training and recognition, thereby helping to remove the mismatch of the conditional segments due to different observation histories. An EM (Expectation–Maximization) iteration algorithm is developed for the estimation of the model parameters and for the optimization over the dependence structure. Experimental comparisons on a speaker-independent E-set database show that the new model, without optimization on the dependence structure, achieves better performance than the standard HMM, the bigram HMM and the linear-predictive HMM, all in comparable or smaller parameter sizes. The optimization over the dependence structure leads to further improvement in the performance.

12.
In this paper, we propose a novel approach to improve the performance of minima controlled recursive averaging (MCRA) based on a conditional maximum a posteriori (MAP) criterion. From an investigation of the MCRA scheme, it is discovered that the MCRA method cannot take full consideration of the inter-frame correlation of voice activity, since the noise power estimate is adjusted by the speech presence probability depending on an observation of the current frame. To avoid this phenomenon, the proposed MCRA approach incorporates the conditional MAP criterion, in which the noise power estimate is obtained using the speech presence probability conditioned on both the current observation and the speech activity decision in the previous frame. Experimental results show that, compared to the conventional MCRA method, the proposed MCRA technique based on conditional MAP obtains lower estimation error and, when integrated into a speech enhancement system, achieves improved speech quality.
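The recursion that MCRA uses to update the noise power estimate can be sketched as follows, as a reading aid. The speech presence probability is taken as a given input here. The conditional MAP variant described above differs in how that probability is computed, which this sketch does not attempt to reproduce. The function name and the default smoothing constant are assumptions.

```python
def update_noise_estimate(noise_power, frame_power, speech_presence_prob, alpha=0.95):
    """One MCRA-style update of the per-bin noise power estimate.

    noise_power          : current noise estimate for each frequency bin
    frame_power          : |Y|^2 of the current noisy frame for each bin
    speech_presence_prob : speech presence probability in [0, 1] for each bin
    alpha                : base smoothing constant (illustrative value)
    """
    updated = []
    for noise, power, prob in zip(noise_power, frame_power, speech_presence_prob):
        # The smoothing factor moves toward 1 as speech becomes more likely,
        # which freezes the noise estimate while speech is present.
        smoothing = alpha + (1.0 - alpha) * prob
        updated.append(smoothing * noise + (1.0 - smoothing) * power)
    return updated
```

When the probability is 1, the estimate is left unchanged. When it is 0, the update reduces to ordinary recursive averaging with constant `alpha`.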

13.
This paper presents a novel streamed hidden Markov model (HMM) framework for speech recognition. The factor analysis (FA) principle is adopted to explore the common factors from acoustic features. The streaming regularities in building HMMs are governed by the correlation between cepstral features, which is inherent in common factors. Those features corresponding to the same factor are generated by the identical HMM state. Accordingly, the multiple Markov chains are adopted to characterize the variation trends in different dimensions of cepstral vectors. An FA streamed HMM (FASHMM) method is developed to relax the assumption of standard HMM topology, namely, that all features of a speech frame perform the same state emission. The proposed FASHMM is more flexible than the streamed factorial HMM (SFHMM) where the streaming was empirically determined. To reduce the number of factor loading matrices in FA, we evaluated the similarity between individual matrices to find the optimal solution to parameter clustering of FA models. A new decoding algorithm was presented to perform FASHMM speech recognition. FASHMM carries out the streamed Markov chains for a sequence of multivariate Gaussian mixture observations through the state transitions of the partitioned vectors. In the experiments, the proposed method reduced the recognition error rates significantly when compared with the standard HMM and SFHMM methods.

14.
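A constrained line search optimization method for discriminative training
An optimization method called "constrained line search" is proposed and applied to discriminative training of Gaussian mixture continuous density hidden Markov models (CDHMMs) in speech recognition. The method can optimize discriminative training objective functions based on the maximum mutual information (MMI) criterion. Discriminative training of the hidden Markov model (HMM) is first cast as a constrained optimization problem, with the KL measure between models imposed as a constraint during optimization. Then, following the idea of line search, it is shown that by constraining the KL measure between the models before and after an update, the HMM parameters can be expressed in a simple quadratic form. The method can optimize any parameter of a Gaussian mixture CDHMM, including means, covariance matrices, and Gaussian weights. It is applied to two standard speech recognition tasks, the English TIDIGITS database and the Chinese 863 database. Experimental results show consistent gains over the conventional extended Baum-Welch method in both recognition performance and convergence behavior.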

15.
In this paper, we propose a novel optimization algorithm called constrained line search (CLS) for discriminative training (DT) of Gaussian mixture continuous density hidden Markov model (CDHMM) in speech recognition. The CLS method is formulated under a general framework for optimizing any discriminative objective functions including maximum mutual information (MMI), minimum classification error (MCE), minimum phone error (MPE)/minimum word error (MWE), etc. In this method, discriminative training of HMM is first cast as a constrained optimization problem, where Kullback-Leibler divergence (KLD) between models is explicitly imposed as a constraint during optimization. Based upon the idea of line search, we show that a simple formula of HMM parameters can be found by constraining the KLD between HMM of two successive iterations in a quadratic form. The proposed CLS method can be applied to optimize all model parameters in Gaussian mixture CDHMMs, including means, covariances, and mixture weights. We have investigated the proposed CLS approach on several benchmark speech recognition databases, including TIDIGITS, Resource Management (RM), and Switchboard. Experimental results show that the new CLS optimization method consistently outperforms the conventional EBW method in both recognition performance and convergence behavior.
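The quantity used as the constraint, the KLD between two Gaussians, has a closed form that is quadratic in the mean difference. The sketch below computes it for diagonal covariances. It illustrates only this building block, not the CLS update itself, and the function name and the diagonal-covariance restriction are assumptions.

```python
import math


def kl_diag_gaussians(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance matrices.

    Each argument is a list with one entry per feature dimension;
    variances must be strictly positive.
    """
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        # Per-dimension term: log variance ratio, scaled variance plus
        # squared mean shift, minus 1.
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total
```

With equal variances, the divergence reduces to half the variance-normalized squared distance between the means, which is the kind of quadratic dependence on the parameters that the abstract refers to.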

16.
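An improved HMM exploiting spatial correlation
The classical HMM used in speech recognition ignores correlation information within the speech signal. To address this, the spatial correlation of the speech signal is used to compensate the classical HMM, yielding an improved model. Through a spatial correlation transform, the method describes the spatial correlation between the current speech features and historical data, and thereby models the joint state output distribution. The decoding algorithm of the improved model is derived from the classical HMM decoding algorithm using the parameter update algorithm of the spatial correlation transform. Experimental results show a clear performance improvement on a speaker-independent continuous speech recognition system.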

17.
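Although the hidden Markov model based on diagonal-covariance Gaussian distributions (Hidden Markov Model Based on Diagonal Gaussian distributions, HMM-DG) is widely used in modern large-vocabulary continuous speech recognition systems, HMM-DG is deficient in modeling intra-frame feature correlation. This paper combines factor analysis with the Gaussian mixture modeling of HMM-DG to propose a flexible HMM framework for intra-frame feature correlation, the hidden Markov model based on factor analysis (Hidden Markov Model Based on Factor Analysis, HMM-FA), and derives a training algorithm for HMM-FA. Simulations show that, under the same conditions, HMM-FA outperforms HMM-DG.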

18.
This paper investigates a noise robust technique for automatic speech recognition which exploits hidden Markov modeling of stereo speech features from clean and noisy channels. The HMM trained this way, referred to as stereo HMM, has in each state a Gaussian mixture model (GMM) with a joint distribution of both clean and noisy speech features. Given the noisy speech input, the stereo HMM gives rise to a two-pass compensation and decoding process where MMSE denoising based on N-best hypotheses is first performed and followed by decoding the denoised speech in a reduced search space on lattice. Compared to the feature space GMM-based denoising approaches, the stereo HMM is advantageous as it has finer-grained noise compensation and makes use of information of the whole noisy feature sequence for the prediction of each individual clean feature. Experiments on large vocabulary spontaneous speech from speech-to-speech translation applications show that the proposed technique yields better performance than its feature space counterpart in noisy conditions while still maintaining decent performance in clean conditions.

19.
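Starting from the nonlinear manifold structure of the acoustic feature space of speech signals, a new acoustic model for speech recognition is built using the principle of compressed sensing on manifolds. The feature space is divided into multiple local regions, each approximated by a low-dimensional factor analysis model, giving a mixture of factor analyzers. The observation vectors of context-dependent states are constrained to this nonlinear low-dimensional manifold, from which the observation probability model is derived. Each state is ultimately determined by a weight vector subject to a sparsity constraint and several low-dimensional local factor vectors following the standard normal distribution. A criterion for determining the latent dimensionality of the local regions and an iterative algorithm for estimating the model parameters are given. Continuous speech recognition experiments on the RM corpus show that, compared with the conventional Gaussian mixture model (GMM) and the subspace Gaussian mixture model (SGMM), the new acoustic model reduces the average word error rate (WER) on the test set by 33.1% and 9.2% relative, respectively.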

20.
This paper presents a combination approach to robust speech recognition by using two-stage model-based feature compensation. Gaussian mixture model (GMM)-based and hidden Markov model (HMM)-based compensation approaches are combined together and conducted sequentially in the multiple-decoding recognition system. The clean speech is firstly modeled as a GMM in the initial pass, and then modeled as an HMM generated from the initial pass in the following passes. The environment parameter estimation for both modeling strategies is formulated under the maximum a posteriori (MAP) criterion. Experimental results show that a significant improvement is achieved compared to the European Telecommunications Standards Institute (ETSI) advanced compensation approach, the GMM-based feature compensation approach, the HMM-based feature compensation approach, and the acoustic model compensation approach.
