1.
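In the two-dimensional time-frequency grid, the presence or absence of speech at adjacent points is correlated, and a conventional Markov chain cannot adaptively model this two-dimensional time-frequency correlation. Based on the correlation of speech signals in the time-frequency domain, a method is proposed that estimates the speech mask with a two-dimensional correlation model. The method divides the log power spectrum of the noisy speech signal in the time-frequency domain into speech and non-speech classes. It describes the temporal correlation of the speech signal with state transition probabilities and a forward factor in the time domain, and describes the spectral correlation with state transition probabilities and a neighborhood factor in the frequency domain. Through global statistical optimization, the model combines the temporal and spectral correlations. A sequential update scheme is given that updates the model and estimates the speech presence probability frame by frame. Given the current log power spectrum and model parameters, the speech state matrix obtained by maximizing the posterior probability can serve as the optimal estimate of the speech mask. The method is compared with several existing online speech mask estimation methods, and the experimental results show its superiority.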
2.
3.
4.
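To further improve the accuracy of speech recognition systems and make speech products more convenient to use, a speech recognition method combining a hidden Markov model with an algebraic neural network is proposed. The hidden Markov model generates the best speech state sequence, the output probabilities of this best state sequence serve as the input to a feedforward neural network, and the algebraic neural network performs classification and recognition. Simulations on the Matlab 7.0 platform show that, compared with a conventional neural network, the method improves convergence speed, robustness, and recognition rate.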
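The abstract above relies on an HMM producing a best state sequence. As a rough illustration only, the sketch below uses the standard Viterbi algorithm for a discrete-observation HMM; the abstract does not state which decoder the authors used, and the two-state toy model (state names `A`/`B`, symbols `x`/`y`/`z`, all probabilities) is hypothetical. The neural-network stage is not shown.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_path) for a discrete observation sequence."""
    # delta[s]: best log-probability of any state path that ends in state s
    delta = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    backptrs = []
    for o in obs[1:]:
        new_delta, ptr = {}, {}
        for s in states:
            # Best predecessor of s at this time step
            prev = max(states, key=lambda r: delta[r] + log_trans[r][s])
            new_delta[s] = delta[prev] + log_trans[prev][s] + log_emit[s][o]
            ptr[s] = prev
        delta = new_delta
        backptrs.append(ptr)
    # Trace back from the best final state
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path

def logs(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: logs(v) if isinstance(v, dict) else math.log(v) for k, v in table.items()}

# Hypothetical toy model, chosen only so the result is easy to check by hand.
states = ["A", "B"]
log_start = logs({"A": 0.6, "B": 0.4})
log_trans = logs({"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}})
log_emit = logs({"A": {"x": 0.5, "y": 0.4, "z": 0.1},
                 "B": {"x": 0.1, "y": 0.3, "z": 0.6}})

best_log_prob, best_path = viterbi(["x", "y", "z"], states, log_start, log_trans, log_emit)
print(best_path, math.exp(best_log_prob))
```

Working in log-probabilities avoids numerical underflow on long utterances, which is the usual reason to prefer sums of logs over products of probabilities here.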
5.
6.
7.
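Although the hidden Markov model (HMM) is currently the most popular speech recognition model, it assumes that state outputs are independent and identically distributed, and therefore cannot describe the temporal correlation inherent in speech. This article introduces a more flexible framework for studying frame correlation based on the duration distribution based HMM (DDBHMM), and on this basis proposes a hybrid model that describes the static information and the dynamic variation of speech features separately yet combines them organically, characterizing real speech more reasonably at a small computational cost. Experiments on Mandarin large-vocabulary speaker-independent continuous speech recognition show that exploiting frame correlation clearly improves the performance of the recognition system.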
8.
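This paper proposes a syllable segmentation method for continuous speech streams based on inter-frame correlation. It uses inter-frame correlation features, which reflect the degree of correlation between the LPC coefficients of adjacent frames, together with their parameters to split the continuous speech stream into segments, labels the phonemic nature of each segment using time-domain parameters, and finally determines the syllable segmentation and boundaries according to the composition rules of Chinese syllables. Syllable segmentation experiments on Chinese digit-string speech streams demonstrate the effectiveness of the method.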
9.
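Speech recognition systems often make recognition errors when audio quality is poor. To improve recognition accuracy, a speech recognition system for an English translation robot is designed based on the continuous hidden Markov model. In hardware, an audio signal receiver and the main processor of the robot's audio recognition module are designed. In software, the audio signal is quantized and pre-emphasized, and the ratio of the frame shift to the frame length is computed to obtain the analog signal conversion frequency and the basic-unit quantization index; a speech-to-text codec is built on the continuous hidden Markov model, the width of the window function is computed, the Markov chain probability paths are obtained in the lattice, and the complexity of the different probability paths is compared; finally, a speech recognition algorithm for the English translation robot is designed to produce the recognition result. The experimental data show that the system's recognition accuracy exceeds 75% under all three audio quality conditions, that it is more stable than other systems, and that its accuracy is higher at the same audio quality, indicating that the speech recognition system based on the continuous hidden Markov model outperforms the other systems.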
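The front-end steps named in the abstract above (pre-emphasis, framing with a frame shift, windowing) can be sketched as follows. This is a generic textbook front end, not the system described in the abstract: the pre-emphasis coefficient 0.97, the Hamming window, and the function names are assumptions chosen for illustration.

```python
import math

def pre_emphasize(signal, coeff=0.97):
    """Apply y[n] = x[n] - coeff * x[n-1]; the first sample is passed through."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - coeff * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_signal(signal, frame_len, frame_shift):
    """Split a signal into overlapping frames; leftover tail samples are dropped."""
    if len(signal) < frame_len:
        return []
    count = 1 + (len(signal) - frame_len) // frame_shift
    return [signal[i * frame_shift:i * frame_shift + frame_len]
            for i in range(count)]

def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

# Hypothetical input: ten samples, frames of 4 samples shifted by 2,
# so the shift-to-length ratio is 0.5 (50% overlap).
samples = [float(v) for v in range(10)]
emphasized = pre_emphasize(samples)
frames = frame_signal(emphasized, frame_len=4, frame_shift=2)
window = hamming(4)
windowed = [[s * w for s, w in zip(frame, window)] for frame in frames]
print(len(frames), "frames, shift/length ratio =", 2 / 4)
```

The shift-to-length ratio controls how much consecutive frames overlap; a ratio of 0.5 means each sample (away from the edges) appears in two frames.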
10.
11.
This paper investigates the modelling of the interframe dependence in a hidden Markov model (HMM) for speech recognition. First, a new observation model, assuming dependence on multiple previous frames, is proposed. This model represents such a dependence structure with a weighted mixture of a set of first-order conditional Gaussian densities, each mixture component accounting for a specific conditional frame. Next, an optimization in choosing the conditional frames/segment is performed in both training and recognition, thereby helping to remove the mismatch of the conditional segments due to different observation histories. An EM (Expectation–Maximization) iteration algorithm is developed for the estimation of the model parameters and for the optimization over the dependence structure. Experimental comparisons on a speaker-independent E-set database show that the new model, without optimization on the dependence structure, achieves better performance than the standard HMM, the bigram HMM and the linear-predictive HMM, all with comparable or smaller parameter sizes. The optimization over the dependence structure leads to further improvement in the performance.
12.
In this paper, we propose a novel approach to improve the performance of minima controlled recursive averaging (MCRA) based on a conditional maximum a posteriori (MAP) criterion. From an investigation of the MCRA scheme, it is discovered that the MCRA method cannot take full consideration of the inter-frame correlation of voice activity, since the noise power estimate is adjusted by the speech presence probability depending on an observation of the current frame. To avoid this phenomenon, the proposed MCRA approach incorporates the conditional MAP criterion, in which the noise power estimate is obtained using the speech presence probability conditioned on both the current observation and the speech activity decision in the previous frame. Experimental results show that, compared to the conventional MCRA method, the proposed MCRA technique based on conditional MAP obtains low estimation error and, when integrated into a speech enhancement system, achieves improved speech quality.
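To make concrete how a speech presence probability adjusts the noise power estimate, here is a minimal sketch of the conventional MCRA-style recursive update for one frame. It does not reproduce the conditional MAP estimator proposed in the abstract above: the probability is simply passed in, and the function name and the smoothing constant `alpha` are illustrative assumptions.

```python
def update_noise_psd(noise_psd, noisy_power, speech_prob, alpha=0.95):
    """One frame of the recursive noise update, per frequency bin.

    noise_psd    previous noise power estimate for each bin
    noisy_power  |Y|^2 of the current noisy frame for each bin
    speech_prob  speech presence probability in [0, 1] for each bin
    alpha        base smoothing constant (assumed value)
    """
    updated = []
    for lam, y2, p in zip(noise_psd, noisy_power, speech_prob):
        # Time-varying smoothing factor: approaches 1 when speech is likely,
        # which freezes the noise estimate in that bin.
        a = alpha + (1.0 - alpha) * p
        updated.append(a * lam + (1.0 - a) * y2)
    return updated

# Hypothetical two-bin frame: speech certainly present in bin 0, absent in bin 1.
result = update_noise_psd([1.0, 1.0], [3.0, 3.0], [1.0, 0.0], alpha=0.5)
print(result)
```

In this update, a bin judged to contain speech keeps its previous noise estimate, while a speech-free bin moves toward the current frame's power. Conditioning the probability on the previous frame's decision, as the abstract proposes, would change only how `speech_prob` is computed.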
13.
Jen-Tzung Chien, Chuan-Wei Ting. IEEE Transactions on Audio, Speech, and Language Processing, 2009, 17(7): 1279-1291
This paper presents a novel streamed hidden Markov model (HMM) framework for speech recognition. The factor analysis (FA) principle is adopted to explore the common factors from acoustic features. The streaming regularities in building HMMs are governed by the correlation between cepstral features, which is inherent in common factors. Those features corresponding to the same factor are generated by the identical HMM state. Accordingly, multiple Markov chains are adopted to characterize the variation trends in different dimensions of cepstral vectors. An FA streamed HMM (FASHMM) method is developed to relax the assumption of standard HMM topology, namely, that all features of a speech frame perform the same state emission. The proposed FASHMM is more flexible than the streamed factorial HMM (SFHMM), where the streaming was empirically determined. To reduce the number of factor loading matrices in FA, we evaluated the similarity between individual matrices to find the optimal solution to parameter clustering of FA models. A new decoding algorithm was presented to perform FASHMM speech recognition. FASHMM carries out the streamed Markov chains for a sequence of multivariate Gaussian mixture observations through the state transitions of the partitioned vectors. In the experiments, the proposed method reduced the recognition error rates significantly when compared with the standard HMM and SFHMM methods.
14.
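A constrained line search optimization method for discriminative training   Cited by: 1 (self-citations: 0, citations by others: 1)
An optimization method called "constrained line search" is proposed and applied to discriminative training of Gaussian mixture continuous density hidden Markov models (CDHMMs) in speech recognition. The method can be used to optimize the discriminative training objective function based on the maximum mutual information (MMI) criterion. In this method, discriminative training of the hidden Markov model (HMM) is first treated as a constrained optimization problem, with the KL divergence between models used as a constraint during optimization. Then, based on the idea of line search, it is shown that constraining the KL divergence between the models before and after the update allows the HMM parameters to be expressed in a simple quadratic form. The method can optimize any parameter of a Gaussian mixture CDHMM, including means, covariance matrices, and Gaussian weights. It is applied to two standard speech recognition tasks, one English and one Chinese: the English TIDIGITS database and the Chinese 863 database. Experimental results show that the method consistently improves on the conventional extended Baum-Welch method in both recognition performance and convergence behavior.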
15.
Liu P., Liu C., Jiang H., Soong F., Wang R.-H. IEEE Transactions on Audio, Speech, and Language Processing, 2008, 16(5): 900-909
In this paper, we propose a novel optimization algorithm called constrained line search (CLS) for discriminative training (DT) of Gaussian mixture continuous density hidden Markov models (CDHMMs) in speech recognition. The CLS method is formulated under a general framework for optimizing any discriminative objective function, including maximum mutual information (MMI), minimum classification error (MCE), minimum phone error (MPE)/minimum word error (MWE), etc. In this method, discriminative training of the HMM is first cast as a constrained optimization problem, where the Kullback-Leibler divergence (KLD) between models is explicitly imposed as a constraint during optimization. Based upon the idea of line search, we show that a simple formula for the HMM parameters can be found by constraining the KLD between the HMMs of two successive iterations in a quadratic form. The proposed CLS method can be applied to optimize all model parameters in Gaussian mixture CDHMMs, including means, covariances, and mixture weights. We have investigated the proposed CLS approach on several benchmark speech recognition databases, including TIDIGITS, Resource Management (RM), and Switchboard. Experimental results show that the new CLS optimization method consistently outperforms the conventional EBW method in both recognition performance and convergence behavior.
16.
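An improved HMM exploiting spatial correlation   Cited by: 1 (self-citations: 0, citations by others: 1)
The classical HMM used in speech recognition ignores the correlation information between speech signals. To address this, the spatial correlation of the speech signal is used to compensate the classical HMM, yielding an improved model. Through a spatial correlation transform, the method describes the spatial correlation between the current speech feature and historical data, and thereby models the joint state output distribution. The decoding algorithm of the improved model is obtained from the decoding algorithm of the classical HMM, using the parameter update algorithm of the spatial correlation transform. Experimental results show that the method achieves a clear performance improvement on a speaker-independent continuous speech recognition system.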
17.
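Although the hidden Markov model based on diagonal Gaussian distributions (HMM-DG), that is, on Gaussians with diagonal covariance matrices, is widely used in modern large-vocabulary continuous speech recognition systems, HMM-DG is deficient in modeling intra-frame feature correlation. This paper combines factor analysis with the Gaussian mixture modeling of HMM-DG and proposes a flexible HMM framework for intra-frame feature correlation, the hidden Markov model based on factor analysis (HMM-FA), and derives the training algorithm for HMM-FA. Simulation experiments show that, under the same conditions, HMM-FA outperforms HMM-DG.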
18.
This paper investigates a noise-robust technique for automatic speech recognition which exploits hidden Markov modeling of stereo speech features from clean and noisy channels. The HMM trained this way, referred to as stereo HMM, has in each state a Gaussian mixture model (GMM) with a joint distribution of both clean and noisy speech features. Given the noisy speech input, the stereo HMM gives rise to a two-pass compensation and decoding process, where MMSE denoising based on N-best hypotheses is performed first, followed by decoding the denoised speech in a reduced search space on a lattice. Compared to the feature-space GMM-based denoising approaches, the stereo HMM is advantageous as it has finer-grained noise compensation and makes use of information from the whole noisy feature sequence for the prediction of each individual clean feature. Experiments on large-vocabulary spontaneous speech from speech-to-speech translation applications show that the proposed technique outperforms its feature-space counterpart in noisy conditions while still maintaining decent performance in clean conditions.
19.
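Starting from the nonlinear manifold structure of the acoustic feature space of speech signals, a new acoustic model for speech recognition is built using the principle of compressed sensing on manifolds. The feature space is divided into multiple local regions, and each local region is approximated by a low-dimensional factor analysis model, giving a mixture of factor analyzers. The observation vectors of context-dependent states are constrained to lie on this nonlinear low-dimensional manifold, from which the observation probability model is derived. In the end, each state is determined by a weight vector subject to a sparsity constraint and several low-dimensional local factor vectors that follow the standard normal distribution. A criterion for determining the latent dimension of each local region and an iterative algorithm for estimating the model parameters are given. Continuous speech recognition experiments on the RM corpus show that, compared with the conventional Gaussian mixture model (GMM) and the subspace Gaussian mixture model (SGMM), the new acoustic model achieves relative reductions of 33.1% and 9.2%, respectively, in average word error rate (WER) on the test set.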
20.
This paper presents a combination approach to robust speech recognition that uses two-stage model-based feature compensation.
Gaussian mixture model (GMM)-based and hidden Markov model (HMM)-based compensation approaches are combined and applied
sequentially in the multiple-decoding recognition system. The clean speech is first modeled as a GMM in the initial pass,
and then modeled as an HMM generated from the initial pass in the following passes. Environment parameter
estimation for both modeling strategies is formulated under the maximum a posteriori (MAP) criterion. Experimental results
show that a significant improvement is achieved compared with the European Telecommunications Standards Institute (ETSI) advanced
compensation approach, the GMM-based feature compensation approach, the HMM-based feature compensation approach, and the acoustic model
compensation approach.