Similar Articles
1.
In this paper, a feature extraction (FE) method is proposed that is comparable to the traditional FE methods used in automatic speech recognition systems. Unlike conventional spectral‐based FE methods, the proposed method evaluates the similarities between an embedded speech signal and a set of predefined speech attractor models in the reconstructed phase space (RPS) domain. First, a set of Gaussian mixture models is trained to represent the speech attractors in the RPS. Next, for each new input speech frame, a posterior‐probability‐based feature vector is computed that represents the similarity between the embedded frame and the learned speech attractors. We conduct speech recognition experiments with a hidden-Markov-model-based toolkit on FARSDAT, a well‐known Persian speech corpus. The proposed FE method yields a 3.11% absolute reduction in phoneme error rate compared with the baseline system, which uses mel‐frequency cepstral coefficient (MFCC) features.
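The following is a minimal sketch of the feature computation described above. It makes several assumptions that the abstract does not specify:

- the attractor GMMs are already trained (EM is not shown);
- the GMMs use diagonal covariances;
- all attractor models have equal priors;
- a frame's similarity to an attractor is its average per-point log-likelihood.

The embedding dimension `dim` and delay `tau` are illustrative values, not the paper's settings.

```python
import numpy as np

def delay_embed(frame, dim=3, tau=2):
    """Time-delay embedding of a 1-D frame into a dim-dimensional phase space."""
    n = len(frame) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("frame too short for this dim/tau")
    return np.stack([frame[i * tau: i * tau + n] for i in range(dim)], axis=1)

def gmm_loglik(points, weights, means, variances):
    """Per-point log-likelihood under a diagonal-covariance GMM.
    points: (N, D); weights: (M,); means, variances: (M, D)."""
    diff = points[:, None, :] - means[None, :, :]
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)   # (M,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)  # (N, M)
    log_comp += np.log(weights)
    m = log_comp.max(axis=1, keepdims=True)  # log-sum-exp over components
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True))).ravel()

def attractor_posteriors(frame, attractor_gmms, dim=3, tau=2):
    """Posterior over attractor models (assumed equal priors), scoring each
    model by the mean per-point log-likelihood of the embedded frame."""
    pts = delay_embed(np.asarray(frame, dtype=float), dim, tau)
    scores = np.array([gmm_loglik(pts, *g).mean() for g in attractor_gmms])
    scores -= scores.max()
    post = np.exp(scores)
    return post / post.sum()
```

Each GMM is passed as a `(weights, means, variances)` tuple. The returned vector, one posterior per attractor model, is the frame's feature vector.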

2.
A segment-based speech recognition scheme is proposed. The basic idea is to explicitly model the correlation among successive frames of speech by using features that represent the contours of spectral parameters. The speech signal of an utterance is regarded as a template formed by directly concatenating a sequence of acoustic segments. Each acoustic segment is inherently of variable length. It is represented by a fixed-dimensional feature vector formed from the coefficients of discrete orthonormal polynomial expansions that approximate its spectral parameter contours. For training, an automatic algorithm is proposed to generate several segment-based reference templates for each syllable class. For testing, a frame-based dynamic programming procedure computes the matching score between the test utterance and each reference template. The performance of the scheme was examined by simulations of multi-speaker recognition of 408 highly confusable isolated Mandarin base-syllables. A recognition rate of 81.1% was achieved using 5-segment, 8-reference-template models with cepstral and delta-cepstral coefficients as recognition features. This is 4.5% higher than that of a well-modelled 12-state, 5-mixture continuous HMM (CHMM) using cepstral, delta-cepstral, and delta-delta-cepstral coefficients.
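One way to turn a variable-length segment into a fixed-dimensional vector, as described above, is to project each spectral-parameter contour onto a few discrete orthonormal polynomials. This sketch builds the basis by QR-orthonormalizing a Vandermonde matrix on equally spaced points. The polynomial order and the time normalization to [-1, 1] are assumptions, not the paper's settings.

```python
import numpy as np

def orthonormal_poly_basis(length, order):
    """Discrete orthonormal polynomial basis (columns) on `length` equally
    spaced points, built by QR-orthonormalizing a Vandermonde matrix."""
    if length <= order:
        raise ValueError("segment must have more frames than the polynomial order")
    t = np.linspace(-1.0, 1.0, length)
    vander = np.vander(t, order + 1, increasing=True)  # columns 1, t, t^2, ...
    q, r = np.linalg.qr(vander)
    q *= np.sign(np.diag(r))  # fix signs so each basis vector is well defined
    return q

def segment_feature(contours, order=2):
    """Map a (frames x dims) segment of spectral-parameter contours to a
    fixed vector of dims * (order + 1) expansion coefficients."""
    contours = np.asarray(contours, dtype=float)
    basis = orthonormal_poly_basis(contours.shape[0], order)
    coeffs = basis.T @ contours  # (order + 1, dims)
    return coeffs.T.ravel()      # grouped per contour dimension
```

A constant contour produces only a nonzero zeroth-order coefficient. Slope and curvature appear in the higher-order terms. Segments of different lengths therefore map to vectors of the same size.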

3.
Automatic speech recognition (ASR) under adverse noise conditions has been a challenging problem. When the noise can be assumed stationary, effective techniques provide excellent recognition accuracy. When this assumption does not hold, recognition performance declines rapidly. Missing data (MD) theory is a promising method for robust ASR under any noise condition. Unfortunately, the choice of features in the recognizer is commonly limited to spectral-based representations. The combination-of-recognizers approach to MD ASR allows cepstral-based features to be used within the MD framework by fusing features in the pattern recognition stage. Under two types of non-stationary noise, this fusion substantially increased recognition accuracy over both traditional MD and cepstral-based recognizers.

4.
The authors address automatic speech recognition in the presence of additive white noise. The effect of noise is modelled as an additive term in the power spectrum of the original clean speech. The cepstral coefficients of the noisy speech are derived from this model. Reference cepstral vectors trained on clean speech are adapted to their noisy versions so that they best fit the test cepstral vector. The LPC coefficients, the LPC-derived cepstral coefficients, and the distance between test and reference are all treated as functions of the noise ratio. The noise ratio is the spectral power ratio of noise to noisy speech. A gradient-based algorithm is proposed to find the optimal noise ratio and the minimum distance between the test cepstral vector and the noise-adapted reference. A recursive algorithm based on the Levinson-Durbin recursion is proposed to compute simultaneously the LPC coefficients and their derivatives with respect to the noise ratio. The stability of the proposed adaptation algorithm is also addressed. Experiments on multi-speaker (50 male and 50 female) isolated Mandarin digit recognition show large performance improvements over the uncompensated method in noisy environments. The results are also compared with the projection-based approach, and the proposed method outperforms it in severely noisy environments.
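The adaptation loop can be sketched as follows. The sketch assumes normalized autocorrelations, so white noise with noise ratio r mixes (1 - r) times the clean autocorrelation with r added at lag 0. The Levinson-Durbin recursion and the LPC-to-cepstrum recursion are the standard ones. Two parts differ from the paper:

- the paper's derivative recursion is not shown;
- a coarse grid search stands in for the paper's gradient-based search (the grid is a hypothetical choice).

```python
import numpy as np

def levinson_durbin(r, order):
    """Predictor coefficients a[1..p] (x[n] ~ sum_k a_k x[n-k]) from autocorrelation r."""
    a = np.zeros(order + 1)  # a[0] unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coefficient
        prev = a.copy()
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]

def lpc_to_cepstrum(a, n_ceps):
    """Standard recursion from LPC coefficients to LPC-derived cepstrum c_1..c_n."""
    p = len(a)
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

def adapted_cepstrum(r_clean, noise_ratio, order, n_ceps):
    """Cepstrum of the clean reference after adding white noise whose power is
    `noise_ratio` of the noisy-speech power (white noise only affects lag 0)."""
    r = np.asarray(r_clean, dtype=float) / r_clean[0]
    r_noisy = (1.0 - noise_ratio) * r
    r_noisy[0] += noise_ratio
    return lpc_to_cepstrum(levinson_durbin(r_noisy, order), n_ceps)

def best_noise_ratio(test_ceps, r_clean, order, n_ceps, grid=np.linspace(0.0, 0.9, 10)):
    """Grid search (stand-in for gradient search) for the closest adapted reference."""
    dists = [np.linalg.norm(test_ceps - adapted_cepstrum(r_clean, g, order, n_ceps))
             for g in grid]
    i = int(np.argmin(dists))
    return grid[i], dists[i]
```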

5.
We propose a new bandpass filter (BPF)‐based online channel normalization method that dynamically suppresses channel distortion when the speech and channel noise components are unknown. The method performs channel normalization with an adaptive modulation frequency filter, whereas conventional modulation filtering methods apply the same filter to every utterance. Only the two mel-frequency cepstral coefficients with large dynamic ranges (C0 and C1) are normalized, which reduces computational complexity and improves normalization accuracy. In addition, to update the filter weights dynamically, the learning rates are normalized by the power of each feature dimension in each frame. Speech recognition experiments show that the proposed BPF‐based blind channel normalization effectively removes channel distortion. Online processing causes only a minor loss of accuracy compared with batch processing.

6.
薛峰, 俞一彪. Journal of Signal Processing (《信号处理》), 2010, 26(1): 127-131
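Confidence analysis from missing-data theory has been applied to speaker recognition using filterbank speech features. This improves system robustness, but the overall error rate remains high. To reduce the error rate further, this paper builds on the confidences of the filterbank feature components. It proposes CBTM, a method for computing the confidence of each dimension of cepstral-domain MFCC features. CBTM uses a confidence transformation matrix to estimate the confidence of each MFCC dimension after Mel spectral subtraction. Based on these confidences, the variances of the GMM are weighted to reduce the influence of low-confidence feature components on the output probability, which improves system robustness. Speaker identification experiments were run on the SUDA2002 corpus with white, pink, and factory1 noise from the NOISEX-92 database. Under all three noise types, the method achieved lower error rates than traditional methods, demonstrating its effectiveness.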
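The variance-weighting idea can be sketched as below. The abstract does not give the paper's CBTM confidence transformation matrix, so `mfcc_confidence` uses a hypothetical matrix built from squared DCT weights. Dividing each variance by its dimension's confidence is likewise an assumed form of the weighting.

```python
import numpy as np

def dct_matrix(n_out, n_in):
    """Orthonormal DCT-II matrix (n_out x n_in), as used for MFCC."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    m = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (n + 0.5) / n_in)
    m[0] /= np.sqrt(2.0)
    return m

def mfcc_confidence(fbank_conf, n_ceps):
    """Hypothetical confidence transform: each cepstral dimension's confidence
    is a DCT-energy-weighted average of the filterbank-channel confidences."""
    t = dct_matrix(n_ceps, len(fbank_conf)) ** 2
    t /= t.sum(axis=1, keepdims=True)
    return t @ np.asarray(fbank_conf, dtype=float)

def weighted_gmm_loglik(x, weights, means, variances, conf, floor=1e-3):
    """Diagonal-GMM log-likelihood with each dimension's variance inflated by
    1 / confidence (assumed form), so low-confidence dimensions count less."""
    var = variances / np.maximum(conf, floor)  # (M, D)
    log_comp = (np.log(weights)
                - 0.5 * np.sum(np.log(2 * np.pi * var), axis=1)
                - 0.5 * np.sum((x - means) ** 2 / var, axis=1))
    m = log_comp.max()
    return m + np.log(np.exp(log_comp - m).sum())
```

With all confidences equal to 1, this reduces to the ordinary GMM log-likelihood. A badly mismatched dimension with low confidence is penalized much less.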

7.
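Based on the statistical distribution of cepstral coefficient vectors in feature space, this paper proposes a new equal-variance weighted cepstral distortion measure. The weighting function of this measure fully characterizes the fine structure of how speech cepstral vectors are distributed in feature space. The measure therefore discriminates effectively between the characteristics of different speakers. Experiments compare it with the conventional Euclidean distance and the inverse-variance weighted distance. The proposed measure significantly improves the correct identification rate of vector-quantization-based speaker recognition.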

8.
In this paper, alternative dynamic features for speech recognition are proposed. The goal is to improve recognition accuracy by deriving representations of distinctive dynamic characteristics from the speech spectrum. The work was motivated by two temporal dynamics of speech: its highly non‐stationary nature, and the inter‐frame change of its spectrum. A sub‐frame spectrum analyzer is used to capture very rapid spectral changes within a speech analysis frame. In addition, spectral fluctuations are measured in a more elaborate way than with traditional dynamic features such as delta and double‐delta coefficients. The proposed features were evaluated in speech recognition tests in smartphone environments. Simply appending the proposed features to the existing feature streams improves the recognition accuracy of a hidden Markov model (HMM)‐based speech recognizer.
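The sketch below shows two things:

- the standard regression delta that the proposed features are compared against;
- a hypothetical `subframe_flux`, one simple way to measure spectral change within a single analysis frame.

The sub-frame count and the flux definition are illustrative assumptions, not the paper's analyzer.

```python
import numpy as np

def delta(features, width=2):
    """Standard regression delta over +/- width frames (edge frames repeated)."""
    feats = np.asarray(features, dtype=float)
    padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    t = len(feats)
    num = sum(n * (padded[width + n: width + n + t] - padded[width - n: width - n + t])
              for n in range(1, width + 1))
    return num / (2 * sum(n * n for n in range(1, width + 1)))

def subframe_flux(frame, n_sub=4):
    """Hypothetical intra-frame dynamics: mean squared change of the
    log-magnitude spectrum between consecutive sub-frames."""
    subs = np.array_split(np.asarray(frame, dtype=float), n_sub)
    size = min(len(s) for s in subs)
    spec = np.log(np.abs(np.fft.rfft(np.stack([s[:size] for s in subs]), axis=1)) + 1e-10)
    return float(np.mean(np.diff(spec, axis=0) ** 2))
```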

9.
In this paper, we propose a robust distant-talking speech recognition method that combines a cepstral-domain denoising autoencoder (DAE) with a temporal structure normalization (TSN) filter. Because a DAE has a deep structure and nonlinear processing steps, it is flexible enough to model a highly nonlinear mapping between input and output spaces. We train a DAE to map reverberant and noisy speech features to the underlying clean speech features in the cepstral domain. First, the DAE is applied in the cepstral domain to suppress reverberation. Then a TSN-based post-processing filter further reduces noise and reverberation by normalizing the modulation spectra to reference spectra of clean speech. The method was evaluated on speech in simulated and real reverberant environments. Combining the cepstral-domain DAE with TSN reduced the average word error rate (WER) from 25.2% for the baseline system to 21.2% in simulated environments, and from 47.5% to 41.3% in real environments.
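Here is a simplified sketch of the TSN post-processing step: each cepstral trajectory is reshaped so that its modulation power spectrum matches a clean-speech reference. The published TSN filter is designed as an FIR filter. For brevity, this version instead applies a zero-phase gain directly in the DFT domain, which is an assumption. The DAE stage is not shown.

```python
import numpy as np

def normalize_modulation_spectrum(trajectory, ref_psd):
    """Filter one cepstral trajectory so its modulation power spectrum matches
    ref_psd (length len(trajectory)//2 + 1) using a real, zero-phase gain
    sqrt(ref / observed) applied per DFT bin."""
    x = np.asarray(trajectory, dtype=float)
    x = x - x.mean()  # remove DC before shaping the spectrum
    spec = np.fft.rfft(x)
    obs_psd = np.abs(spec) ** 2
    gain = np.sqrt(np.asarray(ref_psd, dtype=float) / np.maximum(obs_psd, 1e-12))
    return np.fft.irfft(spec * gain, n=len(x))
```

In practice, `ref_psd` would be an average modulation spectrum measured on clean training speech for the same cepstral dimension.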

10.
简志华, 杨震. Journal of Signal Processing (《信号处理》), 2007, 23(3): 383-387
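This paper proposes GMCSM, an improved algorithm for compensating feature parameters in the cepstral domain. Because speech signals are time-varying, GMCSM models the variance of the speech signal with a Generalized Auto-Regressive Conditional Heteroscedasticity (GARCH) model. It is compared with conventional cepstral subtraction (CSM) and MEMCSM. Experimental data show that GMCSM compensates more effectively for the cepstral feature distortion caused by additive noise and lowers the recognition error rate. Its advantage is most pronounced at low signal-to-noise ratios.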
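GMCSM builds on the GARCH(1,1) conditional-variance recursion, which can be written as below. The abstract does not say how GMCSM uses these variances inside cepstral subtraction, so only the recursion itself is shown. `omega`, `alpha`, and `beta` are model parameters that would have to be estimated from data.

```python
import numpy as np

def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variance of a GARCH(1,1) model:
    sigma2[t] = omega + alpha * e[t-1]**2 + beta * sigma2[t-1],
    started at the unconditional variance omega / (1 - alpha - beta)."""
    if alpha + beta >= 1.0:
        raise ValueError("need alpha + beta < 1 for a finite unconditional variance")
    e = np.asarray(residuals, dtype=float)
    sigma2 = np.empty(len(e))
    if len(e) == 0:
        return sigma2
    sigma2[0] = omega / (1.0 - alpha - beta)
    for t in range(1, len(e)):
        sigma2[t] = omega + alpha * e[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2
```

A large residual at frame t-1 raises the predicted variance at frame t. This lets the model track the time-varying variance of speech.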

11.
Robust Speech Recognition Technology: Current Status and Prospects   Cited by: 12 (self-citations: 0, others: 12)
姚文冰, 姚天任, 韩涛. Journal of Signal Processing (《信号处理》), 2001, 17(6): 484-493
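This paper first briefly describes the background from which robust speech recognition arose. It then focuses on the main techniques, the current state of research in China and abroad, and future directions of robust speech recognition. The interference sources that degrade speech quality and weaken the robustness of recognition systems are outlined first, together with their effects. The paper then reviews the approaches and state of development of the following techniques:

- speech enhancement;
- robust feature extraction;
- feature-based and model-based compensation;
- microphone arrays;
- auditory processing modeled on the human ear;
- audio-visual bimodal speech recognition.

Finally, it discusses future directions for robust speech recognition technology.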

12.
Extraction Process of MFCC as a Feature Parameter for Speaker Recognition   Cited by: 5 (self-citations: 0, others: 5)
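Speaker recognition is an important branch of recognizing individual human characteristics and is already widely used in everyday applications. The human auditory system is a near-ideal speaker recognition system. MFCCs (Mel-frequency cepstral coefficients) model human auditory characteristics, and as features consistent with human hearing they achieve high recognition rates in practice. This paper first uses a convolutional homomorphic system to briefly introduce cepstral analysis of speech signals. It then derives, from the Mel frequency scale, a bank of Mel-frequency equivalent filters consistent with human hearing. Finally, it presents the general procedure and algorithm for computing MFCCs.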
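The MFCC procedure outlined above can be sketched with NumPy. The steps are pre-emphasis, framing and windowing, power spectrum, Mel filterbank, logarithm, and DCT. The logarithm turns the convolution of excitation and vocal tract into a sum, which is the homomorphic step; the DCT then yields the cepstrum. The following are common choices assumed here, not values from the paper:

- the Mel formula 2595·log10(1 + f/700);
- 25 ms frames with a 10 ms hop at 16 kHz;
- 26 filters;
- 13 coefficients.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters equally spaced on the Mel scale (n_filters x n_fft//2+1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):   # rising edge
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13, preemph=0.97):
    x = np.asarray(signal, dtype=float)
    x = np.append(x[0], x[1:] - preemph * x[:-1])      # pre-emphasis
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    x = np.pad(x, (0, max(0, frame_len - len(x))))     # pad very short input
    frames = np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)            # windowing
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_e = np.log(np.maximum(energies, 1e-10))        # homomorphic (log) step
    # Orthonormal DCT-II of the log Mel energies gives the cepstrum.
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    dct = np.sqrt(2.0 / n_filters) * np.cos(np.pi * k * (n + 0.5) / n_filters)
    dct[0] /= np.sqrt(2.0)
    return log_e @ dct.T
```

For one second of 16 kHz audio, these defaults give 98 frames of 13 coefficients each.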

13.
We propose a novel phase‐based method for single‐channel speech enhancement that uses phase information to extract and enhance the desired signals in noisy environments. In this method, a phase‐dependent a priori signal‐to‐noise ratio (SNR) is estimated in the log‐mel spectral domain, so that both the magnitude and the phase of the input speech are used. The phase‐dependent estimator is incorporated into the conventional magnitude‐based decision‐directed approach, which recursively computes the a priori SNR from noisy speech. The estimated phase‐dependent a priori SNR has a one‐frame delay that degrades performance. We reduce this degradation with minimum mean square error (MMSE)‐based and maximum a posteriori (MAP)‐based estimators. In speech enhancement experiments, the proposed phase‐dependent a priori SNR estimator improves the output SNR by 2.6 dB over a conventional magnitude‐based estimator, with both the MMSE‐based and the MAP‐based estimator.
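For context, this is a sketch of the conventional magnitude-based decision-directed estimator that the paper extends, combined with a Wiener gain. The phase-dependent term and the MMSE/MAP delay compensation are not shown. The smoothing factor `alpha = 0.98` and the -25 dB floor on the a priori SNR are typical values assumed here.

```python
import numpy as np

def decision_directed_wiener(noisy_power, noise_power, alpha=0.98, xi_min=10 ** (-25 / 10)):
    """Magnitude-based decision-directed a priori SNR with a Wiener gain.
    noisy_power: (frames, bins) |Y|^2; noise_power: (bins,) noise PSD estimate.
    Returns per-bin gains and a priori SNR estimates."""
    noisy_power = np.asarray(noisy_power, dtype=float)
    gains = np.empty_like(noisy_power)
    xis = np.empty_like(noisy_power)
    prev_clean_power = np.zeros(noisy_power.shape[1])
    for l, y2 in enumerate(noisy_power):
        gamma = y2 / noise_power  # a posteriori SNR
        xi = (alpha * prev_clean_power / noise_power
              + (1 - alpha) * np.maximum(gamma - 1.0, 0.0))
        xi = np.maximum(xi, xi_min)
        g = xi / (1.0 + xi)  # Wiener gain
        gains[l], xis[l] = g, xi
        prev_clean_power = (g ** 2) * y2  # |A_hat(l)|^2 feeds the next frame
    return gains, xis
```

The estimate for frame l depends on the clean-speech estimate from frame l - 1. This recursion is the source of the one-frame delay that the abstract mentions.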

14.
Emotion recognition is one of the latest challenges in human-robot interaction. This paper describes the realization of emotional interaction for a Thinking Robot, focusing on speech emotion recognition. Speaker-independent systems generally show lower accuracy than speaker-dependent systems, because emotional feature values depend on the speaker and their gender. However, commercial applications require speaker-independent systems. This paper proposes a novel speaker-independent feature for building such a system. The feature is the ratio of a spectral flatness measure to a spectral center (RSS), and it varies little across speakers. Gender and emotion are classified hierarchically using the proposed RSS feature, pitch, energy, and mel-frequency cepstral coefficients. In speaker-independent mode, the proposed system achieves an average recognition rate of 57.2% (±5.7%) at a 90% confidence interval.
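The RSS feature can be sketched as below. The sketch assumes that "spectral center" means the power-weighted spectral centroid and that both quantities are computed from the frame's power spectrum. The abstract does not specify windowing or the frequency range.

```python
import numpy as np

def rss_feature(frame, sr):
    """Ratio of spectral flatness to spectral centroid (in Hz) for one frame."""
    power = np.abs(np.fft.rfft(np.asarray(frame, dtype=float))) ** 2 + 1e-12
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    flatness = np.exp(np.mean(np.log(power))) / np.mean(power)  # geometric / arithmetic mean
    centroid = np.sum(freqs * power) / np.sum(power)            # assumed meaning of "center"
    return flatness / centroid
```

Flatness lies in (0, 1], reaching 1 for a perfectly flat spectrum. Dividing by the centroid makes the ratio sensitive both to how noise-like a frame is and to where its energy sits.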

15.
周宇欢, 张雄伟, 付强, 徐鑫, 王金明. Journal of Signal Processing (《信号处理》), 2011, 27(12): 1914-1919
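Speech is a complex nonlinear signal. This makes it hard to further improve traditional speaker recognition techniques, which were developed from linear system theory. This paper proposes multifractal spectrum cluster analysis for analyzing the nonlinear characteristics of speech signals and applies it to short-utterance (2 s) speaker recognition. Simulation experiments on the Cantor set show that different scaling regions reflect the growth patterns of a system at different stages. The fractal properties of a system can therefore be characterized hierarchically by a set of continuously varying multifractal spectra; this is the multifractal spectrum cluster analysis method. Drawing on the fractal characteristics of speech signals, the paper then proposes a method for extracting a Multifractal Spectrum Cluster Feature (MSCF). Finally, several nonlinear features are combined with short-time spectral features for speaker recognition. Experiments with 50 speakers from the TIMIT database show strong complementarity between nonlinear and short-time spectral features. In particular, combining MSCF with MFCC and LPC features reduces the system error rate to 0.8%.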
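The multifractal analysis rests on the partition-function method:

1. Coarse-grain a measure into boxes of size ε.
2. Fit log Z(q, ε) against log ε over a chosen scaling region to get τ(q).
3. Apply a Legendre transform to get f(α).

The sketch below uses a binomial multiplicative cascade as a stand-in for the paper's Cantor-set simulation, because its τ(q) = -log2(p^q + (1-p)^q) is known in closed form. Passing different `scale_levels` corresponds to choosing different scaling regions. The sketch does not show how the paper builds a measure from speech or turns these spectra into MSCF vectors.

```python
import numpy as np

def binomial_cascade(p, levels):
    """Masses of the 2**levels dyadic boxes of a binomial multiplicative cascade
    (children of each box are stored next to each other)."""
    mass = np.array([1.0])
    for _ in range(levels):
        mass = np.column_stack([p * mass, (1 - p) * mass]).ravel()
    return mass

def tau_q(mass, qs, scale_levels):
    """Mass exponents tau(q) from Z(q, eps) = sum_i mu_i**q ~ eps**tau(q),
    fitted by least squares over the chosen scale levels (scaling region)."""
    taus = []
    for q in qs:
        log_eps, log_z = [], []
        for lev in scale_levels:
            boxes = mass.reshape(2 ** lev, -1).sum(axis=1)  # coarse-grain to 2**lev boxes
            boxes = boxes[boxes > 0]
            log_eps.append(-lev * np.log(2.0))
            log_z.append(np.log(np.sum(boxes ** q)))
        taus.append(np.polyfit(log_eps, log_z, 1)[0])
    return np.array(taus)

def f_alpha(qs, taus):
    """Legendre transform: alpha = d tau / d q, f(alpha) = q * alpha - tau."""
    alpha = np.gradient(taus, qs)
    return alpha, qs * alpha - taus
```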

16.
A discriminative temporal feature processing method for robust speech recognition is presented that combines knowledge-based and statistical methods. The cepstral features are first filtered by a RASTA method based on human auditory perception and then processed with the minimum classification error (MCE) algorithm. Improved recognition performance can be achieved in both quiet and noisy environments.
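The RASTA stage can be sketched as a causal IIR filter applied along time to each cepstral trajectory. The transfer function below is the commonly cited RASTA band-pass filter. The pole value 0.98 is assumed, and the MCE training stage is not shown.

```python
import numpy as np

def rasta_filter(trajectories, pole=0.98):
    """Causal RASTA filter applied along time (axis 0):
    H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - pole * z^-1).
    The causal form delays the output by 4 frames compared with the
    non-causal textbook definition."""
    x = np.asarray(trajectories, dtype=float)
    b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
    y = np.zeros_like(x)
    for t in range(len(x)):
        acc = sum(b[k] * x[t - k] for k in range(5) if t - k >= 0)  # FIR part
        y[t] = acc + (pole * y[t - 1] if t > 0 else 0.0)           # IIR part
    return y
```

The numerator coefficients sum to zero, so the filter blocks DC. A constant cepstral offset, such as one caused by a fixed channel, decays away.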

17.
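Building on missing-data techniques and acoustic backing-off, this paper proposes a robust speech recognition method based on fuzzy rules. First, fuzzy rules relating the reliability of feature components to their probability distributions are established from prior knowledge or assumptions. During recognition, the output probability of an observation vector is obtained from a rule-based fuzzy logic system. A concrete implementation for cepstral recognition systems is also given. Experimental results show that the proposed method clearly outperforms both missing-data techniques and acoustic backing-off.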

18.
In this paper, we derive the signal power bias that arises when spectral amplitudes are smoothed by reducing their variance in the cepstral domain (often referred to as cepstral smoothing), and we develop a power bias compensation method. We show that smoothing chi-distributed spectral amplitudes in the cepstral domain yields smoothed amplitudes that are also approximately chi-distributed. The smoothed amplitudes have more degrees of freedom and less signal power. The key finding behind the proposed compensation method is that the degrees of freedom of chi-distributed spectral amplitudes are directly related to their average cepstral variance. This work also gives new insight into the statistics of cepstral coefficients derived from chi-distributed spectral amplitudes when tapered spectral analysis windows are used. We derive explicit expressions for the variance and covariance of correlated chi-distributed spectral amplitudes and of the resulting cepstral coefficients, parameterized by the degrees of freedom. These results allow spectral quantities to be smoothed cepstrally without affecting their signal power. Because a parameterized chi-distribution is assumed for the spectral amplitudes, the results hold for Gaussian, super-Gaussian, and sub-Gaussian distributed complex spectral coefficients. The proposed bias compensation method is computationally inexpensive. It is shown to work very well for white and colored signals, with both rectangular and tapered spectral analysis windows.
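The bias itself is easy to reproduce numerically. This sketch smooths periodograms of white Gaussian noise by liftering their cepstra, then compares average powers. Each periodogram bin is approximately chi-squared with two degrees of freedom. The sketch only demonstrates the bias and does not implement the paper's compensation. The lifter length `n_keep` is an arbitrary choice.

```python
import numpy as np

def cepstral_smooth(power_spectrum, n_keep):
    """Smooth a full-length (N-point, symmetric) power spectrum by keeping only
    the lowest n_keep quefrencies of its cepstrum, then returning to power."""
    ceps = np.fft.ifft(np.log(power_spectrum)).real
    lifter = np.zeros(len(ceps))
    lifter[:n_keep] = 1.0
    lifter[len(ceps) - n_keep + 1:] = 1.0  # keep the symmetric counterparts
    return np.exp(np.fft.fft(ceps * lifter).real)

rng = np.random.default_rng(0)
N, frames = 256, 200
noise = rng.standard_normal((frames, N))
periodograms = np.abs(np.fft.fft(noise, axis=1)) ** 2 / N  # ~ chi^2, 2 DoF per bin
smoothed = np.array([cepstral_smooth(p, n_keep=8) for p in periodograms])
bias = smoothed.mean() / periodograms.mean()  # ratio of average powers
```

The smoothed spectra carry noticeably less power than the raw periodograms. This is expected, because averaging in the log domain estimates exp(E[log P]), which is smaller than E[P].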

19.
A comparative study is presented of three noise-compensation schemes for hidden-Markov-model-based speech recognition in adverse environments: spectral subtraction, Wiener filtering, and noise adaptation. The methods are evaluated on a spoken-digit database in the presence of car noise and helicopter noise at various signal-to-noise ratios (SNRs). Experimental results show that the noise-compensation methods substantially improve recognition accuracy across a wide range of SNRs. At an SNR of -6 dB, accuracy improves from 11% to 83%. The use of cepstral-time matrices as an improved speech representation is also considered, together with their combination with the noise-compensation methods. Experiments show that the cepstral-time matrix is a more robust feature than a vector of identical size composed of cepstral and differential cepstral features.
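A cepstral-time matrix can be sketched as a two-dimensional DCT of a block of log filterbank frames. The DCT across channels yields cepstra, and the DCT across time summarizes how they evolve. The block length and the numbers of retained coefficients are illustrative assumptions.

```python
import numpy as np

def dct2_matrix(n_out, n_in):
    """Orthonormal DCT-II matrix (n_out x n_in)."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    m = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (n + 0.5) / n_in)
    m[0] /= np.sqrt(2.0)
    return m

def cepstral_time_matrix(log_fbank_block, n_ceps, n_time):
    """2-D DCT of a (frames x channels) block of log filterbank energies.
    Returns an (n_time x n_ceps) matrix: rows index temporal modulation,
    columns index cepstral coefficients."""
    block = np.asarray(log_fbank_block, dtype=float)
    t_frames, channels = block.shape
    return dct2_matrix(n_time, t_frames) @ block @ dct2_matrix(n_ceps, channels).T
```

Row 0 is proportional to the mean cepstrum over the block. The higher rows play a role similar to differential cepstra, which is what the comparison above measures.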

20.
Speech Endpoint Detection Based on Signal Recurrence Analysis   Cited by: 1 (self-citations: 0, others: 1)
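This paper addresses speech endpoint detection at low signal-to-noise ratios and in non-stationary noise. It proposes a method based on the difference between the source-system dynamics of speech and of noise. The method analyzes changes in the signal's recurrence degree and determines speech endpoints with dual thresholds. Its accuracy is higher than that of the traditional energy method and the cepstral distance measure method. It offers a new approach for research on speech feature extraction and recognition.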
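Below is a sketch of the two ingredients named above, under assumptions the abstract does not fix:

- Recurrence degree is taken as the recurrence rate of the delay-embedded frame: the fraction of point pairs that lie within a radius, scaled by the frame's standard deviation.
- The dual-threshold rule starts a segment where the score exceeds a high threshold and extends it while the score stays above a low one.

`dim`, `tau`, `radius`, and both thresholds are hypothetical parameters.

```python
import numpy as np

def recurrence_rate(frame, dim=3, tau=1, radius=0.1):
    """Fraction of delay-embedded point pairs closer than radius * std(frame)."""
    x = np.asarray(frame, dtype=float)
    n = len(x) - (dim - 1) * tau
    pts = np.stack([x[i * tau: i * tau + n] for i in range(dim)], axis=1)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    eps = radius * (x.std() + 1e-12)
    return float(np.mean(d < eps))

def dual_threshold_endpoints(scores, high, low):
    """Return (start, end) frame-index pairs: a segment is triggered where the
    score reaches `high` and extended in both directions while it stays >= `low`."""
    segments, t, n = [], 0, len(scores)
    while t < n:
        if scores[t] >= high:
            start, end = t, t
            while start > 0 and scores[start - 1] >= low:
                start -= 1
            while end + 1 < n and scores[end + 1] >= low:
                end += 1
            segments.append((start, end))
            t = end + 1
        else:
            t += 1
    return segments
```

A per-frame score sequence, for example a recurrence-derived measure, would be fed into `dual_threshold_endpoints`. The high threshold suppresses spurious triggers, and the low threshold recovers weak onsets and offsets around each detected segment.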
