共查询到20条相似文献,搜索用时 31 毫秒
1.
基于EMD和改进双门限法的语音端点检测 总被引:3,自引:0,他引:3
语音端点检测的准确与否直接影响到语音识别系统的计算复杂度和识别能力,在基于短时能量和过零率的端点检测算法中,能量计算方法不尽合理而且在低信噪比下检测效果大大降低。对此提出了一种基于经验模式分解和改进双门限法的语音端点检测算法,仿真结果表明在低信噪比情况下本文算法有更好的端点检测能力,显示了算法的优越性。 相似文献
2.
As a promising technique, sparse coding can be widely used for representation, compression, de-noising and separation of signals. This technique has been introduced into noisy speech processing, where enhancing speech itself or speech feature remains a challenge. Unlike other fields where noises are dense, the noises in speech are often sparse or partly sparse over the speech dictionary, re-sulting in performance degradation. It is necessary to un-derstand the noise conditions of speech environments and the applied range of sparse coding. This paper analyzes the assumptions of sparse coding and provides the bounds of reconstruction error for two sparse coding methods which are widely used. Based on this analysis, the performance of the two methods under different conditions are com-pared. The results show that the performance of sparse coding can be improved by a well-prepared noise dictio-nary. Experiments on speech enhancement and recognition are conducted, and the results coincide with the theoretical analysis well. 相似文献
3.
4.
Speech enhancement algorithms play an important role in speech signal processing. Over the past several decades, many algorithms have been studied for speech enhancement. A speech enhancement algorithm uses a noise removal method and a statistical model filter to analyze the speech signal in the frequency domain. Spectral subtraction and Wiener filters have been used as representative algorithms. These algorithms have excellent speech enhancement performance, but suffer from deterioration in performance due to specific noise or low signal-to-noise ratio (SNR) environments. In addition, according to estimations of erroneous noise, a noise existing in a voice signal is maintained so that a spectrum corresponding to a voice signal is distorted, or a frame corresponding to a voice signal cannot be retrieved, and voice recognition performance deteriorates. The problem of deterioration in speech recognition performance arises from the difference between speech recognition and training model. We use silence-feature normalization model as a methodology to improve the recognition rate resulting from the difference in the noisy environments. Conventional silence-feature normalization has a problem in that the silent part of the energy increases, which affects recognition performance due to unclear boundaries categorizing the voice. In this study, we use the cepstrum feature of the noise signals in the silence-feature normalization model to improve the performance of silence-feature normalization in a signal with a low SNR by setting a reference value for voiced and unvoiced classification. As a result of recognition rate confirmation, the recognition rates improve in performance, compared with other methods. 相似文献
5.
针对语音识别实际应用过程中的噪声问题,给出了一种新的抗噪声的特征提取算法,即先利用小波变换将语音信号进行小波子带分解,再根据人耳的听觉掩蔽效应,由谱压缩的技术,将小波变换后的子带语音信号进行压缩,从而提取其对应的语音特征。通过MATLAB软件建立实验平台,仿真实验结果表明该语音特征可以在噪声环境下得到较高的识别率。新的特征参数即充分利用了小波的抗噪声特性又有效地降低了语音识别中的训练环境和识别环境间的失配,具有抗噪声的特点。 相似文献
6.
Based on the observation that dissimilar speech enhancement algorithms perform differently for different types of interference and noise conditions, we propose a context-adaptive speech pre-processing scheme, which performs adaptive selection of the most advantageous speech enhancement algorithm for each condition. The selection process is based on an unsupervised clustering of the acoustic feature space and a subsequent mapping function that identifies the most appropriate speech enhancement channel for each audio input, corresponding to unknown environmental conditions. Experiments performed on the MoveOn motorcycle speech and noise database validate the practical value of the proposed scheme for speech enhancement and demonstrate a significant improvement in terms of speech recognition accuracy, when compared to the one of the best performing individual speech enhancement algorithm. This is expressed as accuracy gain of 3.3% in terms of word recognition rate. The advance offered in the present work reaches beyond the specifics of the present application, and can be beneficial to spoken interfaces operating in fast-varying noise environments. 相似文献
7.
Milner B.P. Vaseghi S.V. 《Vision, Image and Signal Processing, IEE Proceedings -》1994,141(5):280-288
A comparative study is presented of three noise-compensation schemes, namely spectral subtraction, Wiener filters, and noise adaptation, for hidden-Markov-model-based speech recognition in adverse environments. The noise-compensation methods are evaluated on a spoken-digit database, in the presence of car noise and helicopter noise at different signal-to-noise ratios. Experimental results demonstrate that the noise-compensation methods achieve a substantial improvement in recognition accuracy across a wide range of signal-to-noise ratios. At a signal-to-noise ratio of -6 dB the recognition accuracy is improved from 11% to 83%. The use of cepstral-time matrices as an improved speech representation is also considered, and their combination with the noise-compensation methods is shown. Experiments show that the cepstral-time matrix is a more robust feature than a vector of identical size, composed of a combination of cepstral and differential cepstral features 相似文献
8.
高脉冲噪声坏境中双门限法语音端点检测研究 总被引:1,自引:0,他引:1
语音端点检测是对有效语音段的识别关键技术,准确的端点检测使语音信号的后续处理计算量减少,有效地节约资源。现在多数语音端点检测技术例如能频值、谱熵、小波能量熵变换等都能准确检测出有效的语音段。文中介绍了一种双门限端点检测法,即利用短时平均过零率和短时平均能量法进行双门限检测,再设置一个最短时间门限,有效地在高脉冲噪声环境中准确识别汉语发音。通过与其他方法对比实验,文中双门限技术在短时高脉冲噪声环境下能有效提高语音识别率。仿真结果表明,端点检测正确率达93%。 相似文献
9.
10.
11.
12.
The state parameters of the hidden Markov model are represented by the autocorrelation coefficients of a context window that can be adaptively transformed to cepstral and delta cepstral coefficients according to the environmental noise. Experimental results show that it can significantly improve the speech recognition rate under noisy environments 相似文献
13.
Endpoint detection is one of the most important steps in speech recognition. In a high SNR environment, the algorithm based on short-time energy and zero rate could be used. But when the SNR is low, this method may not be accurate. Some researchers proposed an algorithm which is based on MFCC Euclidean distance. It has a better performance in a noise environment. But that algorithm needs two thresholds to find the start and end point. However, when the values of two thresholds are not suitable, the detected result could be extremely bad. In this paper, we proposed an improved algorithm which is based on MFCC cosine value. This method can reduce errors, since it only needs one single threshold. The benefit of this improved algorithm is that the result can surely contain the real voice component. According to the experiment data, this improved algorithm can improve the speech recognition rate by 10% even in noise environment (SNR = 0). Thus, it proved that this improved methods has better robustness. 相似文献
14.
It has become increasingly important to develop hands-free speech recognition techniques for the human-computer interface in car environments. However, severe car noise degrades the speech recognition performance substantially. To compensate the performance loss, it is necessary to adapt the original speech hidden Markov models (HMMs) to meet changing car environments. A novel frame-synchronous adaptation mechanism for in-car speech recognition is presented. This mechanism is intended to perform unsupervised model adaptation efficiently on a frame-by-frame basis instead of a conventional adaptation algorithm relying on batch adaptation data and supervision information. The proposed adaptation scheme is performed during frame likelihood calculation where an optimal equalisation factor is first computed to equalise the model mean vector and the input frame vector. This equalisation factor then serves as a reference index to retrieve an additional bias vector for model mean adaptation. As a result, a rapid and flexible algorithm is exploited to establish a new robust likelihood measure. In experiments on hands-free in-car speech recognition with the microphone far from the talker, this framework is found to be effective in terms of recognition rate and computational cost under various driving speeds 相似文献
15.
本文在文献(1)建立的外周听觉系统以及部分中枢听觉神经系统的基础上,建立了一个主意识别器。它由听觉模型作为语音声学前端处理器(即特征提取),由具有tonotopic组织结构的神经网络作为识别分类器。大量实验表明,由该听觉模型提取的特征参数不仅能很好地表示主意区别意义,而且对于噪声环境下的语音特征表示有较好tobustness。语音识别实验表明:在有噪声的情况下,采用听觉模型参数的识别器,其识别率明 相似文献
16.
17.
18.
Cun-Tai Guan Shu-Hung Leung Wing-Hong Lan 《Electronics letters》1998,34(1):30-32
A multi-model approach for noisy speech recognition is proposed. This approach comprised an SVD-based preprocessing front-end and a multi-model HMM recognition structure. It can provide a high recognition rate over a large range of SNRs for speech recognition in wide-band additive noise 相似文献
19.
并行子带HMM最大后验概率自适应非线性类估计算法 总被引:1,自引:0,他引:1
目前,自动语音识别(ASR)系统在实验室环境下获得了较高的识别率,但是在实际环境中,由于受到背景噪声和传输信道的影响,系统的识别性能急剧恶化.本文以听觉试验为基础,提出一种新的独立子带并行最大后验概率的非线性类估计算法,用以提高识别系统的鲁棒性.本算法利用多种噪声和识别内容功率谱差异,以及噪声在不同频带上对HMM影响的不同,采用多层感知机(MLP)对噪声环境下最大后验概率进行非线性映射,以减少识别系统由于环境不匹配而导致的识别性能下降.实验表明:该算法性能明显优于最大后验线性回归算法和Sangita提出的子带语音识别算法. 相似文献