首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到17条相似文献,搜索用时 140 毫秒
1.
根据不同尺度子带特征反映语音的不同细节特性,提出一种噪声下的多层子带(MLS)语音识别方法。将语音频谱分成多层多个子带,首先各子带分另单独进行识别,然后将各层各子带识别概率综合起来得到最终识别结果。将新方法应用于TIMIT数据饣E-Set在NoiseX92白噪声和F16噪声下识别实验。实验结果表明,多层子带方法在噪声环境和无噪情况下识别性能都有很大提高。  相似文献   

2.
一种基于子带处理的PAC说话人识别方法研究   总被引:1,自引:1,他引:0  
目前,说话人识别系统对于干净语音已经达到较高的性能,但在噪声环境中,系统的性能急剧下降.一种基于子带处理的以相位自相关(PAC)系数及其能量作为特征的说话人识别方法,即宽带语音信号经Mel滤波器组后变为多个子带信号,对各个子带数据经DCT变换后提取PAC系数作为特征参数,然后对每个子带分别建立HMM模型进行识别,最后在识别概率层中将HMM得出的结果相结合之后得到最终的识别结果.实验表明,该方法在不同信噪比噪声和无噪声情况下的识别性能都有很大提高.  相似文献   

3.
基于子带信息的鲁棒语音特征提取框架   总被引:2,自引:1,他引:2  
本文提出一种鲁棒语音特征提取框架。通过使用一种基于子带能量分布的噪声估计方法,无需静音段,就可以估计出带噪语音的子带噪声,同时提出结合谱减和谱加权方法对特征进行处理,最终生成具有较高鲁棒性的特征。 实验证明,在语音识别系统中,这种特征可以有效提高语音识别的鲁棒性,在噪声较强(信噪比0dB到15dB)的情况下,识别率可以提高20%以上;并且,在干净语音的情况下又能保证识别率没有大的下降;同时,这种特征上的处理方法对各种噪声的适应能力都很强,无需对噪声进行预先分类即可得到很好的抗噪效果。  相似文献   

4.
提出一种用于语音识别的鲁棒特征提取算法。该算法基于子带主频率信息,实现子带主频率信息与子带能量信息相结合,在特征参数中保留语谱中子带峰值位置信息。使用该算法设计抗噪孤立词语音识别系统,分别在白高斯噪声和背景语音噪声环境下,与传统特征算法做多种信噪比对比实验。试验结果表明该特征算法在2种噪声环境下的识别率有不同程度提高,具有良好的噪声鲁棒性。  相似文献   

5.
基于子带能熵比的语音端点检测算法   总被引:1,自引:0,他引:1  
张毅  王可佳  席兵  颜博 《计算机科学》2017,44(5):304-307, 319
准确地识别语音端点是语音识别过程中的一个重要步骤。在低信噪比环境下,为更好地增强语音和噪声的区分度,提高语音端点检测系统的准确率,在分析了常规子带谱熵端点检测算法的基础上结合子带能量,提出了一种基于子带能熵比的语音端点检测算法。该算法将子带能量和子带谱熵的比值作为端点检测的重要参数,以此设定阈值进行语音端点的检测。实验表明,该算法快速高效,具有较高的鲁棒性,在较低的信噪比环境下能准确地进行语音端点检测。  相似文献   

6.
王娜  郑德忠  刘海龙 《控制工程》2007,14(5):495-498
干净语音环境下识别率很高的说话人识别系统,在有噪声语音环境下识别性能显著降低。针对这一问题,将小波语音增强算法应用于说话人识别系统,提出一种结点阈值去噪新方法。语音增强主要目的是从带噪语音中尽可能地提取纯净的原始语音。在不同信噪比条件下进行实验,结果表明,提出的方法比传统的阈值法能更好地提高语音质量。  相似文献   

7.
端点检测是语音识别系统的一个重要组成,尤其是在噪声环境中,其准确性对语音识别系统性能有直接影响。提出了一种基于小波子带倒谱系数(SBC)的语音信号端点检测方法,利用小波变换对频带进行尺度划分,采用小波子带倒谱能量检测语音端点。通过与MFCC的仿真对比以及大量实验分析,小波子带倒谱特征在语音端点检测中具有更好的识别性能。  相似文献   

8.
缪彩练  王阳生 《计算机工程》2003,29(18):11-13,76
PMC技术在提高语音识别的鲁棒性方面发挥重要作用。但PMC技术仍存在一些难点:如何获得精确的卷积噪声模型;如何在低信噪比情况下提高识别性能。该文提出了PMC技术的改进方法:引入新的残差噪声模型以及伪干净语音模型,结合信号增强技术,能有效地提高系统的鲁棒性。实验是在英国剑桥大学的HTK语音识别工具包的基础上进行,结果表明,新的PMC技术在噪声环境下能显著提高识别性能。  相似文献   

9.
为提高说话人识别中语音特征参数对噪声的鲁棒性,本文提出在对语音进行小波包分解基础上,分析噪声的特性,在不同子带内进行谱减并设立权重,提出了一种新的语音特征参数多层美尔倒谱系数.仿真实验表明,与MFCC特征参数相比,ML-MFCC在噪声环境下具有更好的抗噪性能和说话人识别率.  相似文献   

10.
改进的语音端点检测技术   总被引:1,自引:0,他引:1       下载免费PDF全文
为了提高低信噪比下语音端点检测的性能,提出了一种改进的基于谱减法和自适应子带谱熵的语音端点检测方法。该方法先利用谱减法对带噪语音消除加性噪声,及时更新背景噪声估计,再对增强后的语音信号利用改进的自适应子带谱熵进行端点检测。实验结果表明,该方法具有良好的检测性能,相对传统方法提高了端点检测的准确率,在低信噪比环境下仍能比较准确地检测到语音的端点。  相似文献   

11.
This paper presents a new approach to speech enhancement from single-channel measurements involving both noise and channel distortion (i.e., convolutional noise), and demonstrates its applications for robust speech recognition and for improving noisy speech quality. The approach is based on finding longest matching segments (LMS) from a corpus of clean, wideband speech. The approach adds three novel developments to our previous LMS research. First, we address the problem of channel distortion as well as additive noise. Second, we present an improved method for modeling noise for speech estimation. Third, we present an iterative algorithm which updates the noise and channel estimates of the corpus data model. In experiments using speech recognition as a test with the Aurora 4 database, the use of our enhancement approach as a preprocessor for feature extraction significantly improved the performance of a baseline recognition system. In another comparison against conventional enhancement algorithms, both the PESQ and the segmental SNR ratings of the LMS algorithm were superior to the other methods for noisy speech enhancement.  相似文献   

12.
为了改善传统语音特征参数在复杂环境下识别性能不足的问题,提出了一种基于Gammatone滤波器和子带能量规整的语音特征提取方法.该方法以能量规整倒谱系数(PNCC)特征算法为基础,在前端引入平滑幅度包络和归一化Gammatone滤波器组,并通过子带能量规整方法抑制真实环境的背景噪声,最后在后端进行特征弯折和信道补偿处理加以改进.实验采用高斯混合通用背景分类器模型(GMM-UBM)将该算法和其他特征参数进行对比.结果表明,在多种噪声环境中相比其他特征参数,本文方法表现出良好的抗噪能力,即使在低信噪比下仍有较好的识别效果.  相似文献   

13.
何志勇  朱忠奎 《计算机应用》2011,31(12):3441-3445
语音增强的目标在于从含噪信号中提取纯净语音,纯净语音在某些环境下会被脉冲噪声所污染,但脉冲噪声的时域分布特征却给语音增强带来困难,使传统方法在脉冲噪声环境下难以取得满意效果。为在平稳脉冲噪声环境下进行语音增强,提出了一种新方法。该方法通过计算确定脉冲噪声样本的能量与含噪信号样本的能量之比最大的频段,利用该频段能量分布情况逐帧判别语音信号是否被脉冲噪声所污染。进一步地,该方法只在被脉冲噪声污染的帧应用卡尔曼滤波算法去噪,并改进了传统算法执行时的自回归(AR)模型参数估计过程。实验中,采用白色脉冲噪声以及有色脉冲噪声污染语音信号,并对低输入信噪比的信号进行语音增强,结果表明所提出的算法能显著地改善信噪比和抑制脉冲噪声。  相似文献   

14.
Noise robustness and Arabic language are still considered as the main challenges for speech recognition over mobile environments. This paper contributed to these trends by proposing a new robust Distributed Speech Recognition (DSR) system for Arabic language. A speech enhancement algorithm was applied to the noisy speech as a robust front-end pre-processing stage to improve the recognition performance. While an isolated Arabic word engine was designed, and developed using HMM Model to perform the recognition process at the back-end. To test the engine, several conditions including clean, noisy and enhanced noisy speech were investigated together with speaker dependent and speaker independent tasks. With the experiments carried out on noisy database, multi-condition training outperforms the clean training mode in all noise types in terms of recognition rate. The results also indicate that using the enhancement method increases the DSR accuracy of our system under severe noisy conditions especially at low SNR down to 10 dB.  相似文献   

15.
This paper investigates a noise robust technique for automatic speech recognition which exploits hidden Markov modeling of stereo speech features from clean and noisy channels. The HMM trained this way, referred to as stereo HMM, has in each state a Gaussian mixture model (GMM) with a joint distribution of both clean and noisy speech features. Given the noisy speech input, the stereo HMM gives rise to a two-pass compensation and decoding process where MMSE denoising based on N-best hypotheses is first performed and followed by decoding the denoised speech in a reduced search space on lattice. Compared to the feature space GMM-based denoising approaches, the stereo HMM is advantageous as it has finer-grained noise compensation and makes use of information of the whole noisy feature sequence for the prediction of each individual clean feature. Experiments on large vocabulary spontaneous speech from speech-to-speech translation applications show that the proposed technique yields superior performance than its feature space counterpart in noisy conditions while still maintaining decent performance in clean conditions.  相似文献   

16.
This paper presents a new approach for speech feature enhancement in the log-spectral domain for noisy speech recognition. A switching linear dynamic model (SLDM) is explored as a parametric model for the clean speech distribution. Each multivariate linear dynamic model (LDM) is associated with the hidden state of a hidden Markov model (HMM) as an attempt to describe the temporal correlations among adjacent frames of speech features. The state transition on the Markov chain is the process of activating a different LDM or activating some of them simultaneously by different probabilities generated by the HMM. Rather than holding a transition probability for the whole process, a connectionist model is employed to learn the time variant transition probabilities. With the resulting SLDM as the speech model and with a model for the noise, speech and noise are jointly tracked by means of switching Kalman filtering. Comprehensive experiments are carried out using the Aurora2 database to evaluate the new algorithm. The results show that the new SLDM approach can further improve the speech feature enhancement performance in terms of noise-robust recognition accuracy, since the transition probabilities among the LDMs can be described more precisely at each time point.  相似文献   

17.
An analysis-based non-linear feature extraction approach is proposed, inspired by a model of how speech amplitude spectra are affected by additive noise. Acoustic features are extracted based on the noise-robust parts of speech spectra without losing discriminative information. Two non-linear processing methods, harmonic demodulation and spectral peak-to-valley ratio locking, are designed to minimize mismatch between clean and noisy speech features. A previously studied method, peak isolation [IEEE Transactions on Speech and Audio Processing 5 (1997) 451], is also discussed with this model. These methods do not require noise estimation and are effective in dealing with both stationary and non-stationary noise. In the presence of additive noise, ASR experiments show that using these techniques in the computation of MFCCs improves recognition performance greatly. For the TI46 isolated digits database, the average recognition rate across several SNRs is improved from 60% (using unmodified MFCCs) to 95% (using the proposed techniques) with additive speech-shaped noise. For the Aurora 2 connected digit-string database, the average recognition rate across different noise types, including non-stationary noise background, and SNRs improves from 58% to 80%.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号