1.

A Study of Variable-Parameter Gaussian Mixture Hidden Markov Modeling for Noisy Speech Recognition

Cui X., Gong Y. IEEE Transactions on Audio, Speech, and Language Processing, 2007, 15(4): 1366-1376

To improve recognition performance in noisy environments, multicondition training is usually applied in which speech signals corrupted by a variety of noise are used in acoustic model training. Published hidden Markov modeling of speech uses multiple Gaussian distributions to cover the spread of the speech distribution caused by noise, which distracts the modeling of speech event itself and possibly sacrifices the performance on clean speech. In this paper, we propose a novel approach which extends the conventional Gaussian mixture hidden Markov model (GMHMM) by modeling state emission parameters (mean and variance) as a polynomial function of a continuous environment-dependent variable. At the recognition time, a set of HMMs specific to the given value of the environment variable is instantiated and used for recognition. The maximum-likelihood (ML) estimation of the polynomial functions of the proposed variable-parameter GMHMM is given within the expectation-maximization (EM) framework. Experiments on the Aurora 2 database show significant improvements of the variable-parameter Gaussian mixture HMMs compared to the conventional GMHMMs.
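
A minimal sketch of the variable-parameter idea for one scalar Gaussian, assuming a fixed variance and per-frame occupancies gamma_t supplied by the E-step: the mean is a polynomial in a continuous environment variable (SNR in dB is assumed here), and maximizing the occupancy-weighted log-likelihood over the polynomial coefficients reduces to weighted least squares. The function names and the polynomial order are illustrative; the paper also models the variance as a polynomial, which this sketch omits.

```python
import numpy as np

def fit_mean_polynomial(x, v, gamma, order=2):
    """Weighted least-squares M-step for a scalar mean mu(v) = sum_k c[k] * v**k.

    x     : observations x_t (1-D array)
    v     : environment variable per frame (assumed: SNR in dB)
    gamma : occupancy of this Gaussian per frame, from the E-step
    With the variance held fixed, maximizing sum_t gamma_t * log N(x_t; mu(v_t), var)
    over c is equivalent to minimizing sum_t gamma_t * (x_t - mu(v_t))**2.
    """
    V = np.vander(v, order + 1, increasing=True)    # columns: 1, v, v**2, ...
    lhs = V.T @ (gamma[:, None] * V)                 # V^T diag(gamma) V
    rhs = V.T @ (gamma * x)                          # V^T diag(gamma) x
    return np.linalg.solve(lhs, rhs)                 # c[0] is the constant term

def instantiate_mean(c, v):
    """Evaluate the fitted mean at a given environment value (recognition time)."""
    return np.polyval(c[::-1], v)                    # polyval wants highest order first
```

At recognition time, calling `instantiate_mean(c, snr)` for every Gaussian plays the role of instantiating an HMM set for the measured environment value.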

2.

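Wavelet Transform Extraction of Brainstem Auditory Evoked Potentials

Wen Yuhan, Lu Zhan. Journal of Data Acquisition and Processing, 1997, 12(2): 91-95

The main characteristics of evoked potentials and their detection techniques are briefly reviewed. Taking the brainstem auditory evoked potential (BAEP) as an example, and based on its time-frequency characteristics, the wavelet transform is applied to BAEP extraction and a wavelet-transform denoising algorithm for the BAEP is proposed. Because the signal-to-noise ratio (SNR) of the BAEP is low in practice, an improved algorithm that combines wavelet denoising with averaging is also proposed. Compared with the conventional averaging method, this approach greatly reduces detection time while achieving a higher SNR and satisfactory waveforms.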
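
The combination of averaging and wavelet denoising can be sketched as follows. This is an assumed pipeline, not the paper's exact algorithm: the sweeps (stimulus-locked recordings) are averaged, a three-level orthonormal Haar transform is applied, and the detail coefficients are soft-thresholded with the universal threshold, estimating the noise level from the median absolute deviation of the finest details. The abstract names neither the wavelet nor the threshold rule, so both are assumptions.

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar DWT (len(x) must be even)."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # detail coefficients
    return a, d

def haar_idwt(a, d):
    """Inverse of haar_dwt."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x

def soft_threshold(c, t):
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def average_then_denoise(sweeps, levels=3):
    """sweeps: (n_sweeps, n_samples), with n_samples divisible by 2**levels."""
    avg = sweeps.mean(axis=0)                  # ensemble averaging across sweeps
    a, details = avg, []
    for _ in range(levels):
        a, d = haar_dwt(a)
        details.append(d)                      # details[0] is the finest scale
    sigma = np.median(np.abs(details[0])) / 0.6745    # MAD noise estimate
    t = sigma * np.sqrt(2.0 * np.log(len(avg)))       # universal threshold
    for d in reversed(details):                # rebuild from coarsest to finest
        a = haar_idwt(a, soft_threshold(d, t))
    return a
```

Averaging N sweeps lowers the noise standard deviation only by a factor of sqrt(N); thresholding the averaged waveform is one way to reach a usable SNR with fewer sweeps.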

3.

Noise Statistics Update Adaptive Beamformer With PSD Estimation for Speech Extraction in Noisy Environment

IEEE Transactions on Audio, Speech, and Language Processing, 2008, 16(8): 1633-1641

This paper addresses the problem of extracting a desired speech source from a multispeaker environment in the presence of background noise. A new adaptive beamforming structure is proposed for this speech enhancement problem. This structure incorporates power spectral density (PSD) estimation of the speech sources together with a noise statistics update. An inactive-source detector based on minimum statistics is developed to detect the speech presence and to track the noise statistics. Performance of the proposed beamformer is investigated and compared to the minimum variance distortionless response (MVDR) beamformer with or without a postfilter in a real hands-free communication environment. Evaluations show that the proposed beamformer offers good interference and noise suppression levels while maintaining low distortion of the desired source.
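
The MVDR beamformer used as the comparison baseline has a closed-form weight vector per frequency bin, w = R^{-1} d / (d^H R^{-1} d), which minimizes the output noise power w^H R w subject to the distortionless constraint w^H d = 1. The sketch below computes it; it is the baseline only, not the proposed PSD-based structure, and the noise covariance R and steering vector d are assumed to be given.

```python
import numpy as np

def mvdr_weights(R, d):
    """MVDR weights for one frequency bin.

    R : (M, M) noise-plus-interference spatial covariance, Hermitian positive definite
    d : (M,) steering vector (or relative transfer function) of the desired source
    The beamformer output is y = w^H x.
    """
    Rinv_d = np.linalg.solve(R, d)            # R^{-1} d without forming the inverse
    return Rinv_d / (np.conj(d) @ Rinv_d)     # normalize so that w^H d = 1
```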

4.

Joint Dereverberation and Residual Echo Suppression of Speech Signals in Noisy Environments

IEEE Transactions on Audio, Speech, and Language Processing, 2008, 16(8): 1433-1451

Hands-free devices are often used in a noisy and reverberant environment. Therefore, the received microphone signal does not only contain the desired near-end speech signal but also interferences such as room reverberation that is caused by the near-end source, background noise and a far-end echo signal that results from the acoustic coupling between the loudspeaker and the microphone. These interferences degrade the fidelity and intelligibility of near-end speech. In the last two decades, postfilters have been developed that can be used in conjunction with a single microphone acoustic echo canceller to enhance the near-end speech. In previous works, spectral enhancement techniques have been used to suppress residual echo and background noise for single microphone acoustic echo cancellers. However, dereverberation of the near-end speech was not addressed in this context. Recently, practically feasible spectral enhancement techniques to suppress reverberation have emerged. In this paper, we derive a novel spectral variance estimator for the late reverberation of the near-end speech. Residual echo will be present at the output of the acoustic echo canceller when the acoustic echo path cannot be completely modeled by the adaptive filter. A spectral variance estimator for the so-called late residual echo that results from the deficient length of the adaptive filter is derived. Both estimators are based on a statistical reverberation model. The model parameters depend on the reverberation time of the room, which can be obtained using the estimated acoustic echo path. A novel postfilter is developed which suppresses late reverberation of the near-end speech, residual echo and background noise, and maintains a constant residual background noise level. Experimental results demonstrate the beneficial use of the developed system for reducing reverberation, residual echo, and background noise.
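
A simpler relative of the late-reverberation estimator can be sketched with the exponential-decay (Polack) statistical model: if room energy decays as exp(-2δt) with δ = 3 ln(10)/T60, the late-reverberant spectral variance in frame l is approximately the reverberant spectral variance N_d frames earlier, scaled by exp(-2δ N_d R / f_s), where R is the hop size. This is the classic single-source estimator, not the paper's derivation (which also treats residual echo); the 50 ms boundary between early and late reverberation is an assumed default.

```python
import numpy as np

def late_reverb_psd(speech_psd, t60, fs, hop, t_late=0.05):
    """Late-reverberant spectral variance per STFT frame and bin.

    speech_psd : (n_frames, n_bins) smoothed spectral variance of the reverberant signal
    t60        : reverberation time in seconds
    fs, hop    : sample rate (Hz) and STFT hop size (samples)
    t_late     : start of the late reverberation in seconds (assumed 50 ms)
    """
    delta = 3.0 * np.log(10.0) / t60                  # energy drops 60 dB after t60 s
    n_d = max(1, int(round(t_late * fs / hop)))       # late-reverb delay in frames
    decay = np.exp(-2.0 * delta * n_d * hop / fs)     # energy decay over n_d frames
    late = np.zeros_like(speech_psd)
    late[n_d:] = decay * speech_psd[:-n_d]            # earlier frames have no history
    return late
```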

5.

Tree-Based Covariance Modeling of Hidden Markov Models

IEEE Transactions on Audio, Speech, and Language Processing, 2006, 14(6): 2134-2146

In this paper, we present a tree-based, full covariance hidden Markov modeling technique for automatic speech recognition applications. A multilayered tree is built first to organize all covariance matrices into a hierarchical structure. Kullback–Leibler divergence is used in the tree-building to measure inter-Gaussian distortion and successive splitting is used to construct the multilayer covariance tree. To cope with the data sparseness problem in estimating a full covariance matrix, we interpolate the diagonal covariance matrix of a leaf-node at the bottom of the tree with the full covariance of its parent and ancestors along the path up to the root node. The interpolation coefficients are estimated in the maximum likelihood sense via the EM algorithm. The interpolation is performed in three different parametric forms: 1) inverse covariance matrix, 2) covariance matrix, and 3) off-diagonal terms of the full covariance matrix. The proposed algorithm is tested on three different databases: 1) the DARPA Resource Management (RM), 2) the Switchboard, and 3) a Chinese dictation. In all three databases, we show that the proposed tree-based full covariance modeling consistently performs better than the baseline diagonal covariance modeling. The algorithm outperforms other covariance modeling techniques, including: 1) the semi-tied covariance modeling (STC), 2) heteroscedastic linear discriminant analysis (HLDA), 3) mixtures of inverse covariance (MIC), and 4) direct full covariance modeling.
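
Two ingredients of the tree can be sketched directly: the closed-form Kullback-Leibler divergence between two full-covariance Gaussians, used as the inter-Gaussian distortion when building the tree, and the interpolation of a leaf's diagonal covariance with the full covariances of its ancestors (the covariance-matrix form, option 2 in the abstract). The interpolation weights are passed in here; in the paper they are estimated by EM, which this sketch does not attempt.

```python
import numpy as np

def gauss_kl(mu0, S0, mu1, S1):
    """KL( N(mu0, S0) || N(mu1, S1) ) for full-covariance Gaussians."""
    k = mu0.shape[0]
    S1inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(S1inv @ S0) + diff @ S1inv @ diff - k + logdet1 - logdet0)

def interpolate_cov(leaf_diag, ancestor_covs, weights):
    """Weighted sum of the leaf's diagonal covariance and its ancestors' full covariances.

    weights[0] applies to the leaf diagonal, weights[i] to ancestor_covs[i - 1];
    nonnegative weights summing to one keep the result positive definite.
    """
    S = weights[0] * np.diag(leaf_diag)
    for w, C in zip(weights[1:], ancestor_covs):
        S = S + w * C
    return S
```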

6.

Single and Multiple F₀ Contour Estimation Through Parametric Spectrogram Modeling of Speech in Noisy Environments

Le Roux J., Kameoka H., Ono N., de Cheveigne A., Sagayama S. IEEE Transactions on Audio, Speech, and Language Processing, 2007, 15(4): 1135-1145

7.

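Hidden Markov Models Based on Mixtures of Factor Analyzers

Wang Xinmin, Yao Tianren. Computer Engineering and Applications, 2005, 41(24): 50-52

Classical hidden Markov models (HMMs) for speech recognition have two main shortcomings, the "discrete-state assumption" and the "independent-distribution assumption": the former ignores the nonstationarity of the speech signal, and the latter ignores its correlations. This paper applies mixtures of factor analyzers to speech modeling, proposes an HMM framework based on mixtures of factor analyzers, and represents it graphically as a dynamic Bayesian network. The framework not only resolves these problems in theory but also offers many options for speech modeling; the statistical acoustic models in wide use today can all be regarded as special cases of it.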
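
In a factor analyzer, an observation is modeled as x = μ + Λz + ε with z ~ N(0, I) and diagonal noise ε ~ N(0, Ψ), so its marginal covariance is ΛΛᵀ + Ψ. A minimal sketch of a state's observation log-likelihood under a mixture of factor analyzers follows; the dynamic-Bayesian-network structure of the paper's framework is not modeled, and the function names are illustrative. Setting Λ = 0 recovers a diagonal-covariance Gaussian mixture, one example of a common acoustic model appearing as a special case.

```python
import numpy as np

def fa_logpdf(x, mu, Lam, psi):
    """log N(x; mu, Lam Lam^T + diag(psi)) for one factor analyzer.

    Lam : (D, K) factor loading matrix; psi : (D,) diagonal noise variances
    """
    S = Lam @ Lam.T + np.diag(psi)            # low-rank plus diagonal covariance
    diff = x - mu
    _, logdet = np.linalg.slogdet(S)
    maha = diff @ np.linalg.solve(S, diff)    # squared Mahalanobis distance
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + maha)

def mfa_logpdf(x, weights, mus, Lams, psis):
    """Log-likelihood of x under a weighted mixture of factor analyzers."""
    comps = [np.log(w) + fa_logpdf(x, m, L, p)
             for w, m, L, p in zip(weights, mus, Lams, psis)]
    return np.logaddexp.reduce(comps)         # log-sum-exp over components
```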

8.

Multichannel Eigenspace Beamforming in a Reverberant Noisy Environment With Multiple Interfering Speech Signals

IEEE Transactions on Audio, Speech, and Language Processing, 2009, 17(6): 1071-1086

9.

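Pitch Period Detection for Noisy Speech Based on Pre-filtering and Wavelet Transform (total citations: 10; self-citations: 0; citations by others: 10)

Li Hui, Dai Beiqian, Lu Wei. Journal of Data Acquisition and Processing, 2005, 20(1): 100-104

Exploiting two properties of speech, that the pitch period lies within a limited range and that the signal changes abruptly at glottal closure instants, a pitch-period detection method based on pre-filtering and the wavelet transform is proposed. The noisy speech is first filtered by a third-order elliptic low-pass filter; a single-level wavelet transform, with the quadratic spline wavelet as the wavelet function, then detects the abrupt-change points of the signal, from which the pitch period is computed. Experiments show that, compared with the average magnitude difference function (AMDF) and autocorrelation function (ACF) methods, the proposed method extracts the pitch period more accurately; compared with multiscale wavelet-transform pitch detection, it requires less computation and is less affected by noise and by the speech formants.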
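
For reference, the ACF baseline mentioned above can be sketched in a few lines: the pitch period is taken as the autocorrelation peak within the lag range allowed by an assumed pitch range of 60-400 Hz. This is the comparison method, not the proposed elliptic-filter and quadratic-spline-wavelet detector.

```python
import numpy as np

def acf_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Pitch period (in samples) of one voiced frame from the autocorrelation peak."""
    frame = frame - np.mean(frame)
    n = len(frame)
    acf = np.correlate(frame, frame, mode="full")[n - 1:]   # acf[k] for lags 0..n-1
    lo = int(fs / fmax)                                      # shortest allowed period
    hi = min(int(fs / fmin), n - 1)                          # longest allowed period
    return lo + int(np.argmax(acf[lo:hi + 1]))
```

Restricting the search to the plausible pitch range is what keeps the peak at the fundamental period rather than at lag 0.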

10.

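Speech Enhancement Algorithm Based on Indirect Estimation of the Laplacian Model Factor

Ou Shifeng, Zhao Xiaohui, Gu Haijun. Journal of Data Acquisition and Processing, 2006, 21(4): 386-391

For DCT-domain speech enhancement based on a Laplacian statistical model, the estimation error of the model factor and its effect on overall enhancement performance are analyzed. Drawing on the concept and properties of the shape parameter of the generalized Gaussian distribution model, a new method for estimating the Laplacian model factor is proposed. The method is structurally simple: it obtains the model factor indirectly from the correspondence between the variance of the speech component and the model factor under the Laplacian model. The algorithm not only effectively removes the influence of the noise component on estimation accuracy but also tracks changes in the speech component quickly. Simulation results show that a speech enhancement algorithm using this model-factor estimator achieves better enhancement under a variety of noise backgrounds.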
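
The indirect estimation rests on a one-line relation. With the Laplacian parameterized as p(s) = (λ/2) exp(-λ|s|) (an assumed convention; the paper's "model factor" may be defined differently), the variance is 2/λ², so λ = sqrt(2/σ_s²). The sketch below tracks σ_s² per DCT coefficient by recursively smoothed power subtraction; that variance estimator, the smoothing constant and the floor are assumptions rather than the paper's method.

```python
import numpy as np

def factor_from_variance(var_s):
    """Laplacian p(s) = (lam/2) exp(-lam|s|) has variance 2/lam**2, so lam = sqrt(2/var)."""
    return np.sqrt(2.0 / var_s)

def track_laplace_factor(noisy_dct, noise_var, alpha=0.9, floor=1e-3):
    """Per-frame, per-coefficient factor estimates.

    noisy_dct : (n_frames, n_coeffs) DCT coefficients of noisy speech
    noise_var : (n_coeffs,) noise variance per coefficient, assumed known and > 0
    """
    lam = np.empty(noisy_dct.shape)
    var_y = noisy_dct[0] ** 2
    for t, y in enumerate(noisy_dct):
        var_y = alpha * var_y + (1.0 - alpha) * y ** 2           # smoothed noisy power
        var_s = np.maximum(var_y - noise_var, floor * noise_var)  # floored speech variance
        lam[t] = factor_from_variance(var_s)
    return lam
```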

11.

HMM-Based Gain Modeling for Enhancement of Speech in Noise

David Y. Zhao, W. Bastiaan Kleijn. IEEE Transactions on Audio, Speech, and Language Processing, 2007, 15(3): 882-892

Accurate modeling and estimation of speech and noise gains facilitate good performance of speech enhancement methods using data-driven prior models. In this paper, we propose a hidden Markov model (HMM)-based speech enhancement method using explicit gain modeling. Through the introduction of stochastic gain variables, energy variation in both speech and noise is explicitly modeled in a unified framework. The speech gain models the energy variations of the speech phones, typically due to differences in pronunciation and/or different vocalizations of individual speakers. The noise gain helps to improve the tracking of the time-varying energy of nonstationary noise. The expectation-maximization (EM) algorithm is used to perform offline estimation of the time-invariant model parameters. The time-varying model parameters are estimated online using the recursive EM algorithm. The proposed gain modeling techniques are applied to a novel Bayesian speech estimator, and the performance of the proposed enhancement method is evaluated through objective and subjective tests. The experimental results confirm the advantage of explicit gain modeling, particularly for nonstationary noise sources.

12.

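Noise-Robust Speech Recognition and the Application of Speech Enhancement Algorithms (total citations: 1; self-citations: 0; citations by others: 1)

Tang Ling, Dai Bin. Computer Simulation, 2006, 23(9): 80-82, 143

Improving robustness is an important research topic in speech recognition. Recognition performance often degrades because the data seen in training do not match the data encountered at recognition time. To obtain satisfactory performance in noisy environments, this paper proposes a robust speech feature extraction method based on the characteristics of human hearing. Before MFCC extraction, the noisy speech features are processed using auditory masking properties and, in addition, with a speech enhancement method, yielding robust speech features. Results from four different experiments show that the method improves the system's noise robustness and adapts well to different noise types at different SNRs.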

13.

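DOA Estimation Algorithm for Wideband Cyclostationary Signals in Impulsive Noise

You Guohong, Qiu Tianshuang, Lan Tian. Journal of Data Acquisition and Processing, 2012, 27(4): 399-403

Conventional second-order cyclic correlation algorithms degrade significantly in impulsive noise. Modeling the noise with an α-stable distribution, this paper proposes a direction-of-arrival (DOA) estimation algorithm based on fractional lower-order cyclic correlation. Using the phase-shift property of the fractional lower-order cyclic correlation, the DOA estimation problem for wideband cyclostationary signals is converted into a narrowband problem with "center frequency" ε, overcoming the difficulty of DOA estimation in the wideband case. Computer simulations verify the effectiveness of the algorithm and show that it outperforms the conventional SC-SSF (spectral correlation signal subspace fitting) algorithm.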

14.

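Delayed-State Estimation for Networked Control Systems Based on Hidden Markov Models

Li Nan, Yu Xiaoming. Computer Measurement & Control, 2010, 18(6)

Uncertain network delays in networked control systems (NCSs) make their analysis and design very difficult. To address this, a hidden-state estimation algorithm for hidden Markov models (HMMs) is applied, with a campus network as the experimental platform: an estimated signal is obtained from the observed NCS output, which indirectly estimates state signals that have been corrupted by random-delay noise and arrive out of order. The applicability of two approaches, direct state estimation and extended-state-space estimation, is analyzed and compared for different time intervals. Experimental results show that HMM-based delay estimation is simple and effective for small and medium-sized NCSs.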
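
Hidden-state estimation in a discrete HMM is commonly done with the normalized forward (filtering) recursion. A minimal sketch, assuming the hidden states stand for discretized delay levels and the observations for quantized output values; the abstract specifies neither, so the model below is hypothetical.

```python
import numpy as np

def forward_filter(pi, A, B, obs):
    """Filtered posteriors P(state_t | obs_1..t) for a discrete HMM.

    pi  : (S,) initial state distribution
    A   : (S, S) transitions, A[i, j] = P(next state j | current state i)
    B   : (S, O) emission probabilities, B[s, o] = P(symbol o | state s)
    obs : sequence of observed symbol indices
    """
    alpha = pi * B[:, obs[0]]
    alpha = alpha / alpha.sum()
    out = [alpha]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]        # predict with A, then weight by emission
        alpha = alpha / alpha.sum()          # normalize to avoid underflow
        out.append(alpha)
    return np.array(out)
```

The most probable state at each time is then `forward_filter(...).argmax(axis=1)`.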

15.

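Time-Delay Estimation for Single-Channel Mixtures of Co-Frequency Linearly Modulated Signals

Tu Shilong, Zheng Hui. Journal of Data Acquisition and Processing, 2010, 25(4)

To address time-delay estimation in single-channel blind separation of co-frequency linearly modulated signal mixtures, an estimation method based on the autocorrelation function is proposed. The method uses the autocorrelation function of the received baseband signal to set up nonlinear equations in the delay, constructs a well-posed system of equations by delayed sampling, and then solves the system iteratively by gradient descent. Theoretical analysis shows that the delay estimation error is proportional to the error of the autocorrelation estimate and decreases as the amount of data increases. Simulation results verify the effectiveness of the method.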

16.

Discriminative Estimation of Subspace Constrained Gaussian Mixture Models for Speech Recognition

Scott Axelrod, Vaibhava Goel, Ramesh Gopinath, Peder Olsen, Karthik Visweswariah. IEEE Transactions on Audio, Speech, and Language Processing, 2007, 15(1): 172-189

In this paper, we study discriminative training of acoustic models for speech recognition under two criteria: maximum mutual information (MMI) and a novel "error-weighted" training technique. We present a proof that the standard MMI training technique is valid for a very general class of acoustic models with any kind of parameter tying. We report experimental results for subspace constrained Gaussian mixture models (SCGMMs), where the exponential model weights of all Gaussians are required to belong to a common "tied" subspace, as well as for subspace precision and mean (SPAM) models which impose separate subspace constraints on the precision matrices (i.e., inverse covariance matrices) and means. It has been shown previously that SCGMMs and SPAM models generalize and yield significant error rate improvements over previously considered model classes such as diagonal models, models with semitied covariances, and extended maximum likelihood linear transformation (EMLLT) models. We show here that MMI and error-weighted training each individually result in over 20% relative reduction in word error rate on a digit task over maximum-likelihood (ML) training. We also show that a gain of as much as 28% relative can be achieved by combining these two discriminative estimation techniques.
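
The MMI criterion maximizes the log posterior of the correct transcription, log p(X|W_ref)P(W_ref) - log Σ_W p(X|W)P(W). A minimal per-utterance sketch over an explicit list of competing hypotheses follows (practical systems sum over lattices); the error-weighted variant and the SCGMM/SPAM parameter updates are not shown.

```python
import numpy as np

def mmi_objective(loglik, logprior, ref):
    """MMI criterion log P(W_ref | X) for one utterance.

    loglik   : acoustic log-likelihoods log p(X | W) for each competing hypothesis W
    logprior : language-model log-probabilities log P(W), same order
    ref      : index of the correct transcription
    """
    joint = np.asarray(loglik) + np.asarray(logprior)    # log p(X|W) + log P(W)
    return joint[ref] - np.logaddexp.reduce(joint)      # numerator minus log-sum-exp
```

The objective is at most 0 and approaches 0 as the reference hypothesis dominates the competitors; summing it over utterances gives the training criterion.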

17.

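Modeling Network Signal Transmission Based on Multi-Stream Hidden Markov Models

Li Nan, Yu Xiaoming. Computer Measurement & Control, 2010, 18(8)

Once an input signal enters a network, it is inevitably corrupted by network delay, so the output signal obtained at the receiver differs from the input signal sent by the transmitter. This error arises from different causes depending on whether the receiver uses event-driven or time-driven node triggering. Multi-stream hidden Markov models (MS-HMMs) are used to model the network signal transmission process for a receiver operating in each of the two triggering modes, and the Viterbi algorithm is used to estimate the network input signal. Experimental results show that the model estimates the network input signal effectively.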
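
The Viterbi step can be sketched for a single-stream discrete HMM in the log domain. The multi-stream emission model of the paper (which combines per-stream likelihoods) is not reproduced, and the meaning of states and symbols is left abstract.

```python
import numpy as np

def viterbi(log_pi, log_A, log_B, obs):
    """Most likely state sequence of a discrete HMM, computed in the log domain.

    log_pi : (S,) log initial probabilities
    log_A  : (S, S) log transitions, log_A[i, j] = log P(j | i)
    log_B  : (S, O) log emission probabilities
    obs    : sequence of observed symbol indices
    """
    T, S = len(obs), len(log_pi)
    delta = log_pi + log_B[:, obs[0]]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A       # scores[i, j]: best path ending with i -> j
        back[t] = np.argmax(scores, axis=0)   # best predecessor for each state j
        delta = scores[back[t], np.arange(S)] + log_B[:, obs[t]]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):             # backtrack through stored predecessors
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```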

18.

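An HMM-Based Time-Normalization Method for Dynamic Speech Patterns

Deng Wei, Zhao Yilan. Journal of Data Acquisition and Processing, 2003, 18(3): 277-281

A method for time-normalizing dynamic speech patterns with hidden Markov models (HMMs) is studied. A segmentation of the observation sequence of a speech unit made with the help of an HMM, called the HMM full-state segmentation, is introduced, and a matching degree for HMM full-state segmentations is defined. The optimal HMM full-state segmentation of the observation sequence is determined from this matching degree and is used to convert the sequence into a fixed-dimensional vector, thereby time-normalizing the dynamic speech pattern. The method is applied in a hybrid speech recognizer combining HMMs and artificial neural networks (ANNs), and experimental results demonstrate its effectiveness.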
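
Once an alignment assigns every frame to an HMM state, a variable-length sequence can be mapped to a fixed-length vector, for example by averaging the frames in each state's segment and concatenating the means. The sketch assumes that pooling rule and takes the alignment as given; the paper's matching-degree criterion for choosing the optimal full-state segmentation is not implemented.

```python
import numpy as np

def segment_mean_vector(features, states, n_states):
    """Map a (T, D) feature sequence to a fixed (n_states * D,) vector.

    states : per-frame state index from an alignment; every state must own at
             least one frame (a "full-state" segmentation)
    """
    states = np.asarray(states)
    parts = []
    for s in range(n_states):
        seg = features[states == s]           # frames assigned to state s
        if len(seg) == 0:
            raise ValueError(f"state {s} has no frames")
        parts.append(seg.mean(axis=0))        # one D-dimensional mean per state
    return np.concatenate(parts)
```

Sequences of different lengths then yield vectors of the same dimension, which is what a fixed-input classifier such as an ANN requires.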

19.

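A Speech Emotion Recognition Classifier Based on HMM and ANN (total citations: 2; self-citations: 0; citations by others: 2)

Luo Yi. Microcomputer Information, 2007, 23(34): 218-219, 296

Because a hidden Markov model (HMM) used on its own has inherently weak classification ability in speech emotion recognition, this paper proposes recognizing five speech emotions (surprise, anger, joy, sadness, and disgust) with an HMM combined with a radial basis function (RBF) neural network. The HMM is used to normalize the speech-emotion feature vectors, and the RBF network serves as the final decision classifier. Experimental results show that, under the paper's experimental conditions, the method outperforms a standalone HMM, with a marked improvement in the recognition rate for disgust.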

20.

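Isolated-Word Speech Recognition in Noisy Environments

Liu Peng, Wang Huaijie. Digital Community & Smart Home, 2007, (12): 1399-1400, 1404

Speech recognition in noisy environments has long been a difficult problem. This paper applies spectral subtraction for denoising before isolated-word recognition (the digits 0-9), improving the system's recognition rate.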
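
A minimal sketch of power spectral subtraction, assuming the first frames of the recording contain only noise: the noise power spectrum is averaged over those frames, subtracted from each frame's power spectrum with a small spectral floor, and the noisy phase is reused for overlap-add resynthesis. The frame size, floor and noise-only lead-in are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def spectral_subtraction(x, noise_frames=10, n_fft=256, beta=0.01):
    """Power spectral subtraction with a spectral floor and overlap-add resynthesis."""
    hop = n_fft // 2
    # Periodic Hann window: copies shifted by n_fft/2 sum to exactly 1,
    # so unmodified frames would overlap-add back to x.
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    X = np.fft.rfft(frames, axis=1)
    power = np.abs(X) ** 2
    noise_psd = power[:noise_frames].mean(axis=0)          # noise-only lead-in (assumed)
    clean_power = np.maximum(power - noise_psd, beta * noise_psd)
    Y = np.sqrt(clean_power) * np.exp(1j * np.angle(X))    # reuse the noisy phase
    y = np.zeros(len(x))
    for i, frame in enumerate(np.fft.irfft(Y, n=n_fft, axis=1)):
        y[i * hop:i * hop + n_fft] += frame                # overlap-add
    return y
```

The floor `beta` trades residual "musical noise" against speech distortion; the denoised signal would then go to the isolated-word recognizer.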