Similar literature
1.
This paper addresses single-channel speech enhancement in environments where the noise level varies. Commonly used single-channel subtractive-type speech enhancement algorithms assume that the background noise level is fixed or slowly varying. In practice, the background noise level may vary quickly, which usually leads to incorrect speech/noise detection and, in turn, incorrect enhancement. To solve this problem, we propose a new subtractive-type speech enhancement scheme. The scheme uses the RTF (refined time-frequency parameter)-based RSONFIN (recurrent self-organizing neural fuzzy inference network) algorithm that we developed previously to detect word boundaries under variable background noise levels. In addition, a new parameter (MiFre) is proposed to estimate the varying background noise level. Based on this parameter, the noise level information used for subtractive-type speech enhancement can be estimated not only during speech pauses but also during speech segments. The scheme has been tested and found to perform well under both variable and fixed background noise levels.
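
To illustrate the subtractive-type enhancement this abstract refers to, the following is a minimal Python sketch of power spectral subtraction on a single frame. It is not the paper's scheme: the function name, the over-subtraction factor `alpha`, and the spectral `floor` are assumptions chosen for illustration, and the noise magnitude is taken as given rather than estimated with MiFre.

```python
def spectral_subtract(noisy_mag, noise_mag, alpha=1.0, floor=0.01):
    """Power spectral subtraction for one frame (illustrative sketch).

    noisy_mag: magnitude spectrum of the noisy frame.
    noise_mag: estimated noise magnitude spectrum (same length).
    alpha:     over-subtraction factor (assumed default).
    floor:     spectral floor, as a fraction of the noisy power (assumed).
    """
    if len(noisy_mag) != len(noise_mag):
        raise ValueError("spectra must have the same length")
    enhanced = []
    for y, n in zip(noisy_mag, noise_mag):
        # Subtract the scaled noise power from the noisy power.
        power = y * y - alpha * n * n
        # Clamp to a small fraction of the noisy power so the result
        # never becomes negative.
        power = max(power, floor * y * y)
        enhanced.append(power ** 0.5)
    return enhanced
```

If the noise estimate lags behind a quickly rising noise level, the subtraction removes too little; if it overshoots, bins are clamped to the floor. This is the failure mode that motivates estimating the noise level during speech segments as well as during pauses.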

2.
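Speech endpoint detection algorithm based on adaptive subband power spectral entropy   Total citations: 2 (self-citations: 1, citations by others: 1)
Robust endpoint detection is one of the most important areas of speech processing. This paper first proposes a feature parameter, subband power spectral entropy (SPSE). Combining this parameter with the adaptive subband method (ABS) proposed by Wu et al. yields a novel robust feature parameter, adaptive subband spectral entropy (ASPSE), which successfully detects speech endpoints under different background noises. Experimental results show that the ASPSE parameter is highly effective across different noise environments and signal-to-noise ratios, and that the algorithm outperforms other algorithms.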
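
As a rough illustration of a subband spectral entropy feature, the sketch below sums a frame's power spectrum into equal-width subbands and computes the Shannon entropy of the normalized subband powers. The equal-width banding and the function name are assumptions; the abstract does not give the exact SPSE definition or the adaptive band selection step.

```python
import math

def subband_spectral_entropy(power, n_subbands):
    """Shannon entropy (in nats) of normalized subband powers for one frame."""
    if n_subbands < 1 or n_subbands > len(power):
        raise ValueError("n_subbands must be between 1 and len(power)")
    size = len(power) // n_subbands
    band_power = []
    for b in range(n_subbands):
        start = b * size
        # The last subband absorbs any leftover bins.
        end = len(power) if b == n_subbands - 1 else start + size
        band_power.append(sum(power[start:end]))
    total = sum(band_power)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for p in band_power:
        q = p / total  # treat the subband powers as a probability mass function
        if q > 0:
            entropy -= q * math.log(q)
    return entropy
```

A flat, noise-like spectrum gives the maximum entropy (the logarithm of the number of subbands), while a spectrum concentrated in one subband gives an entropy near zero, which is why entropy-based features can separate speech from broadband noise.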

3.
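To overcome the poor time and frequency resolution, and the poor readability of the time-frequency distribution, in time-frequency analysis methods based on reassignment of the wavelet scalogram, this paper proposes an algorithm for detecting ship targets in sea clutter based on a parameter-optimized Morlet wavelet transform and singular value decomposition. The algorithm first uses the Shannon wavelet entropy as the objective function and, according to the characteristics of high-frequency surface wave radar signals, adaptively optimizes the time-bandwidth product parameter of the Morlet wavelet transform so that the time and frequency resolution of the subsequently reassigned scalogram are simultaneously optimal. It then denoises the reassigned wavelet scalogram using singular value decomposition to suppress environmental noise and further improve the readability of the time-frequency distribution. Experimental results show that, compared with traditional time-frequency analysis algorithms, the proposed algorithm has better time-frequency concentration and stronger noise suppression, and can effectively detect slowly moving ship targets with constant velocity or constant acceleration in sea clutter.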

4.
Several algorithms have been developed for tracking formant frequency trajectories of speech signals; however, most of these algorithms are either not robust in real-life noise environments or not suitable for real-time implementation. The algorithm presented in this paper obtains formant frequency estimates from voiced segments of continuous speech by using a time-varying adaptive filterbank to track individual formant frequencies. The formant tracker incorporates an adaptive voicing detector and a gender detector for formant extraction from continuous speech, for both male and female speakers. The algorithm has a low signal delay and provides smooth and accurate estimates of the first four formant frequencies at moderate and high signal-to-noise ratios. Thorough testing has shown that the algorithm is robust over a wide range of signal-to-noise ratios for various types of background noise.

5.
Speech recognizers achieve high recognition accuracy in quiet acoustic environments, but their performance degrades drastically when they are deployed in real environments, where the speech is degraded by additive ambient noise. This paper advocates a two-phase approach to robust speech recognition in such environments. First, a front-end subband speech enhancement stage with adaptive noise estimation (ANE) filters the noisy speech. The whole noisy speech spectrum is partitioned into eighteen dissimilar subbands based on the Bark scale, and the noise power in each subband is estimated by the ANE approach, which does not require speech pause detection. Second, the filtered speech spectrum is processed by a non-parametric frequency-domain algorithm based on human perception, and the back end builds a robust classifier to recognize the utterance. A suite of experiments evaluates the performance of the speech recognizer in a variety of real environments, with and without the front-end speech enhancement stage. Recognition accuracy is evaluated at the word level and over a wide range of signal-to-noise ratios for real-world noises. Experimental evaluations show that the proposed algorithm attains good recognition performance when the signal-to-noise ratio is lower than 5 dB.
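
The Bark-scale partitioning mentioned above can be sketched as follows. The sketch uses the widely cited Zwicker and Terhardt approximation for converting Hz to Bark and groups bins into 1-Bark-wide bands. The band edges, the function names, and the 8 kHz sampling rate in the usage note are assumptions; the abstract does not state how the paper's eighteen subbands are defined.

```python
import math

def hz_to_bark(f_hz):
    """Approximate Bark value of a frequency in Hz (Zwicker and Terhardt formula)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def bark_band_powers(power, sample_rate):
    """Sum a one-sided power spectrum (bins from 0 Hz to Nyquist) into 1-Bark bands."""
    n_bins = len(power)
    nyquist = sample_rate / 2.0
    n_bands = int(hz_to_bark(nyquist)) + 1
    bands = [0.0] * n_bands
    for k, p in enumerate(power):
        # Centre frequency of bin k, assuming evenly spaced bins.
        f = k * nyquist / (n_bins - 1) if n_bins > 1 else 0.0
        index = min(int(hz_to_bark(f)), n_bands - 1)
        bands[index] += p
    return bands
```

With an assumed sampling rate of 8 kHz, `bark_band_powers` returns 18 bands. A per-band noise estimate can then be tracked on these band powers rather than on individual FFT bins.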

6.
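A speech enhancement algorithm that suppresses complex noise is proposed, based on robust H∞ filter theory and a conjugate-gradient adaptive parameter estimation method. When this method is used to adaptively extract speech parameters from the noisy signal, the statistical properties of the noise source need not be known in advance; the only requirement is that the noise signal has finite energy. Because the algorithm is based on the H∞ filter, it guarantees that the degradation of the performance index caused by external disturbances and additive noise is minimized. Simulation results show that the algorithm is computationally fast, robust, effective at enhancing speech, easy to implement, and able to suppress complex background noise.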

7.
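Research on voice activity detection methods in speech recognition systems   Total citations: 1 (self-citations: 0, citations by others: 1)
To address the poor generality of traditional voice activity detection methods and the sharp drop in their detection performance at low signal-to-noise ratios, this work studies a speech recognition system for strong-noise (stationary and non-stationary), low-SNR environments that combines time-frequency speech enhancement with activity detection based on an improved spectral entropy energy feature. The background noise energy is estimated first, and the speech signal is enhanced in both the frequency domain and the time domain; a more robust feature parameter is then used to determine the speech endpoints. Validation results show that the method detects well under both stationary and non-stationary noise and has a wider range of application.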

8.
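李强, 陈浩, 陈丁当. 《计算机应用》 (Journal of Computer Applications), 2016, 36(11): 3212-3216
To address the poor noise-tracking performance of existing voice activity detection (VAD) algorithms based on hidden Markov models (HMMs), a method is proposed that uses the Baum-Welch algorithm to train on noises with different characteristics, generates the corresponding noise models, and builds a noise library. During voice activity detection, the noise model in the library is matched dynamically according to the background noise of the speech under test. In addition, to suit real-time processing of speech signals, the complexity of speech parameter extraction is reduced, and the decision threshold is improved to preserve the correlation between speech frames. The improved algorithm was tested in different noise environments and compared with the Adaptive Multi-Rate (AMR) coding standard and the G.729B standard of the International Telecommunication Union Telecommunication Standardization Sector (ITU-T). The results show that the improved algorithm effectively increases detection accuracy and noise-tracking ability in real-time speech signal processing.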

9.
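In non-stationary noise, existing hearing-aid speech enhancement algorithms leave a large amount of residual background noise and also introduce "musical noise", so the intelligibility and signal-to-noise ratio of the enhanced speech are unsatisfactory. To address this, a binary-masking speech enhancement algorithm based on noise estimation is proposed, drawing on the theory of human auditory perception, the auditory characteristics of the human ear, and the working mechanism of the cochlea. The Minima-Controlled Recursive Averaging (MCRA) algorithm is used to obtain the estimated noise and a preliminarily enhanced speech signal; the estimated noise and the preliminarily enhanced speech are each filtered by a gammatone filterbank, which can model an artificial cochlea, to obtain their time-frequency representations; the auditory masking property of the human ear is used to compute the binary mask of the noisy speech in the time-frequency domain; and the binary mask is applied to obtain the enhanced speech. Experimental results show that the algorithm largely removes the "musical noise" introduced by spectral subtraction and that, compared with MCRA-based spectral subtraction, the Speech Intelligibility Index (SII), the Perceptual Evaluation of Speech Quality (PESQ) score, and the signal-to-noise ratio (SNR) of the enhanced speech all improve.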
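
The binary-masking step can be sketched as below, assuming the time-frequency energies of the preliminarily enhanced speech and of the estimated noise are already available as nested lists (frames by channels). The local-SNR threshold `threshold_db` and both function names are assumptions for illustration; the abstract does not state the paper's masking criterion in this form, and the gammatone filtering is not reproduced here.

```python
def binary_mask(speech_energy, noise_energy, threshold_db=0.0):
    """Return a 0/1 mask: 1 where local speech energy exceeds the noise energy
    by more than threshold_db decibels."""
    ratio = 10.0 ** (threshold_db / 10.0)
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        # Comparing s > n * ratio avoids dividing by zero noise energy.
        mask.append([1 if s > n * ratio else 0 for s, n in zip(s_row, n_row)])
    return mask

def apply_mask(noisy_tf, mask):
    """Keep the time-frequency units selected by the mask and zero the rest."""
    return [[v * m for v, m in zip(v_row, m_row)]
            for v_row, m_row in zip(noisy_tf, mask)]
```

Units dominated by noise are discarded outright instead of being attenuated by a varying gain, which is one reason binary masking tends to avoid the isolated spectral peaks that are heard as musical noise.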

10.
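A voice activity detection algorithm based on a statistical model in the EEMD domain is proposed. The algorithm first decomposes the noisy speech using ensemble empirical mode decomposition (EEMD) to obtain the intrinsic mode function (IMF) components of the signal, and sums the two components most strongly correlated with the original signal to form the principal component. The principal component is then decomposed in the frequency domain, and a statistical model is introduced to derive the EEMD-domain feature parameters. Finally, voice activity is detected from the difference between the EEMD-domain feature parameters of noise and of speech. Experimental results show that, across different signal-to-noise ratios, the algorithm outperforms commonly used VAD algorithms, with a clear advantage when the noise is strong.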
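
The IMF selection step can be sketched as follows, assuming the IMFs have already been computed by some EEMD implementation (not shown). The use of the Pearson correlation coefficient and the function names are assumptions; the abstract only says that the two components most correlated with the original signal are summed.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence carries no correlation information
    return cov / math.sqrt(var_x * var_y)

def principal_component(signal, imfs, keep=2):
    """Sum the `keep` IMFs that correlate most strongly with the signal."""
    ranked = sorted(imfs, key=lambda imf: pearson(signal, imf), reverse=True)
    chosen = ranked[:keep]
    return [sum(samples) for samples in zip(*chosen)]
```

For example, `principal_component(x, imfs)` with four IMFs returns the sample-wise sum of the two whose correlation with `x` is highest.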

11.
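韦国刚, 周萍, 杨青. 《测控技术》 (Measurement & Control Technology), 2015, 34(2): 31-34
Speech endpoint detection is a very important component of a speech recognition system, and an ideal endpoint detection method must be robust in noisy environments. To improve robustness in noise, the proposed method combines short-time energy with spectral flatness and the dominant frequency of the magnitude spectrum: each feature makes its own decision, and a voting mechanism then determines the endpoint detection result. Experimental results show that, compared with the traditional short-time energy method and the short-time TEO energy method, the algorithm is robust under various additive noises and still separates the useful signal from noise accurately at low signal-to-noise ratios, which confirms its effectiveness.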
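
The voting idea can be sketched as below. Each of three per-frame features casts one vote and the frame is labelled speech when at least two agree. The thresholds, the dominant-frequency range, and the function names are assumptions for illustration; the abstract does not give the paper's decision rules.

```python
import math

def spectral_flatness(power):
    """Geometric mean divided by arithmetic mean of a power spectrum.
    Close to 1 for flat (noise-like) spectra, close to 0 for peaky ones."""
    eps = 1e-12  # guards against log(0) and division by zero
    n = len(power)
    log_geometric_mean = sum(math.log(p + eps) for p in power) / n
    arithmetic_mean = sum(power) / n + eps
    return math.exp(log_geometric_mean) / arithmetic_mean

def vote_speech(energy, flatness, dominant_hz,
                energy_thr, flatness_thr, freq_range=(100.0, 4000.0)):
    """Majority vote over three independent per-frame decisions."""
    low, high = freq_range
    votes = [
        energy > energy_thr,        # speech frames tend to carry more energy
        flatness < flatness_thr,    # speech spectra tend to be less flat
        low <= dominant_hz <= high  # dominant peak lies in an assumed speech range
    ]
    return sum(votes) >= 2
```

Voting means one unreliable feature, such as energy at low SNR, cannot by itself flip the decision.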

12.
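A robust feature extraction algorithm for speech recognition is proposed. The algorithm is based on subband dominant-frequency information: it combines the dominant frequency of each subband with the subband energy, so that the feature parameters retain the positions of the subband peaks in the spectrogram. A noise-robust isolated-word speech recognition system was designed with this algorithm and compared against traditional feature algorithms at multiple signal-to-noise ratios, under white Gaussian noise and under background speech noise. The results show that the feature algorithm improves the recognition rate to varying degrees in both noise environments and has good noise robustness.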

13.
This paper proposes a new speech detection method that uses a recurrent neural fuzzy network in variable noise-level environments. The method uses wavelet energy (WE) and zero crossing rate (ZCR) as detection parameters. The WE is a new and robust parameter derived using the wavelet transform; it can reduce the influence of different types of noise at different levels. With the inclusion of ZCR, speech can be detected robustly and effectively from noise with only two parameters. For the detector design, a singleton-type recurrent fuzzy neural network (SRNFN) is proposed. The SRNFN is constructed from recurrent fuzzy if-then rules with fuzzy singletons in the consequents, and its recurrent property makes it suitable for processing speech patterns with temporal characteristics. The learning ability of the SRNFN avoids the need to determine a threshold empirically, as normal detection algorithms require. Experiments with different types of noise and various signal-to-noise ratios (SNRs) are performed. The results show that the SRNFN based on the WE and ZCR parameters achieves good performance. Comparisons with another robust detection method, the refined time-frequency-based method, and with other detectors also verify the performance of the proposed method.
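
The two detection parameters can be illustrated with a minimal sketch. The ZCR below is the usual fraction of adjacent sample pairs that change sign. The wavelet energy is shown as the energy of one-level Haar detail coefficients, which is an assumption: the abstract does not say which wavelet or how many levels the WE parameter uses.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def haar_detail_energy(frame):
    """Energy of the one-level Haar detail coefficients of a frame.
    A trailing unpaired sample is ignored."""
    energy = 0.0
    for i in range(0, len(frame) - 1, 2):
        detail = (frame[i] - frame[i + 1]) / (2.0 ** 0.5)
        energy += detail * detail
    return energy
```

In a detector like the one described, these two numbers per frame would form the input vector to the classifier, here the SRNFN, instead of being compared with hand-tuned thresholds.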

14.
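Speech enhancement algorithm based on speech presence probability and auditory masking properties   Total citations: 1 (self-citations: 0, citations by others: 1)
宫云梅, 赵晓群, 史仍辉. 《计算机应用》 (Journal of Computer Applications), 2008, 28(11): 2981-2983
At low signal-to-noise ratios, the key problem that has always existed in spectral-subtraction speech enhancement, namely balancing the degree of noise removal, the residual musical noise, and the speech distortion, becomes especially prominent. To reduce the interference of noise with speech communication, a speech enhancement algorithm suited to low signal-to-noise ratios is proposed. Building on traditional spectral subtraction, the algorithm adaptively adjusts the subtraction parameters according to the auditory masking threshold of the noise and uses the speech presence probability to estimate the speech and noise signals. This avoids the inaccuracy of voice activity detection (VAD) at low signal-to-noise ratios and makes the algorithm more robust. Objective and subjective tests show that, compared with traditional spectral subtraction, the algorithm suppresses residual noise and background noise better with almost no loss of speech intelligibility, and that the improvement is most pronounced for speech corrupted by low-SNR and non-stationary noise.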
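
One common way to use a speech presence probability in place of a hard VAD decision is a soft noise update, sketched below. The recursion follows the general form used in recursive-averaging noise estimators and is not taken from this paper; the function name and the `smoothing` default are assumptions.

```python
def update_noise(noise_prev, noisy_power, speech_prob, smoothing=0.9):
    """Per-bin noise power update weighted by the speech presence probability.

    With probability p of speech in a bin, the effective smoothing factor is
    a = smoothing + (1 - smoothing) * p, so the estimate is frozen when
    p = 1 and tracks the noisy power at the base rate when p = 0.
    """
    updated = []
    for n, y, p in zip(noise_prev, noisy_power, speech_prob):
        a = smoothing + (1.0 - smoothing) * p
        updated.append(a * n + (1.0 - a) * y)
    return updated
```

Because the update is continuous in `p`, a misjudged frame only slows or speeds adaptation slightly, whereas a wrong hard VAD decision would either freeze the noise estimate or contaminate it with speech.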

15.
Accurate detection of the boundaries of a speech utterance during a recording interval has been shown to be crucial for reliable and robust automatic speech recognition. The endpoint detection problem is fairly straightforward for high-level speech signals spoken in low-level stationary noise environments (e.g. signal-to-noise ratios greater than 30 dB). However, these ideal conditions do not always exist. One example where reliable word detection is difficult is speech spoken in a mobile environment: because of road, tire, and fan noise, detection of speech often becomes problematic. Currently, most endpoint detection algorithms use only signal energy and duration information to perform the endpoint detection task. These algorithms perform quite well at reasonable signal-to-noise ratios. However, under the harshest conditions (e.g. in a car travelling at 60 mph with the fan on high) they begin to fail. In this paper, an endpoint detection algorithm is presented which is based on hidden Markov model (HMM) technology. The algorithm explicitly determines a set of speech endpoints based on the output of a Viterbi decoding algorithm. The algorithm was tested using a template-based speech recognition system and also using an HMM-based system. Several endpoint detection schemes were tested on a speaker-dependent speech database from four talkers, recorded in a mobile environment under five different driving conditions (including travelling at 60 mph with the fan on). The results showed that, under some conditions, the HMM-based approach to endpoint detection performed significantly better than the energy-based system. The overall accuracy of the system using the HMM endpoint detector, when trained with clean inputs and tested on the 11-word digit vocabulary (zero through nine and oh) with speech recorded in various mobile environments, was 99.7%. The equivalent accuracy of the energy-based endpoint detector was 95.2% in a template-based recognizer.
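
Endpoint detection from a Viterbi state sequence can be sketched with a toy two-state model (state 0 for background, state 1 for speech). The topology, the probabilities in the usage note, and the function names are assumptions for illustration; the paper's HMMs are not described at this level in the abstract.

```python
import math

def log_probs(rows):
    """Convert a nested list of probabilities to log probabilities."""
    return [[math.log(p) for p in row] for row in rows]

def viterbi(log_emissions, log_trans, log_init):
    """Most likely state path.
    log_emissions[t][s]: log likelihood of frame t under state s.
    log_trans[i][j]:     log probability of moving from state i to state j.
    log_init[s]:         log probability of starting in state s."""
    n_states = len(log_init)
    score = [log_init[s] + log_emissions[0][s] for s in range(n_states)]
    backpointers = []
    for t in range(1, len(log_emissions)):
        new_score, pointers = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: score[i] + log_trans[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + log_trans[best_i][j] + log_emissions[t][j])
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

def endpoints(path, speech_state=1):
    """First and last frame index labelled as speech, or None if there is none."""
    frames = [t for t, s in enumerate(path) if s == speech_state]
    return (frames[0], frames[-1]) if frames else None
```

With six frames whose emission probabilities are `[[0.9, 0.1], [0.9, 0.1], [0.05, 0.95], [0.55, 0.45], [0.05, 0.95], [0.9, 0.1]]`, transition probabilities `[[0.8, 0.2], [0.2, 0.8]]`, and initial probabilities `[0.9, 0.1]`, the decoded path keeps the ambiguous fourth frame inside the speech segment because leaving and re-entering speech costs two unlikely transitions. An energy threshold applied frame by frame has no such context.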

16.
Recently, several algorithms have been proposed to enhance noisy speech by estimating a binary mask that can be used to select those time-frequency regions of a noisy speech signal that contain more speech energy than noise energy. This binary mask encodes the uncertainty associated with enhanced speech in the linear spectral domain. The use of the cepstral transformation smears the information from the noise dominant time-frequency regions across all the cepstral features. We propose a supervised approach using regression trees to learn the nonlinear transformation of the uncertainty from the linear spectral domain to the cepstral domain. This uncertainty is used by a decoder that exploits the variance associated with the enhanced cepstral features to improve robust speech recognition. Systematic evaluations on a subset of the Aurora4 task using the estimated uncertainty show substantial improvement over the baseline performance across various noise conditions.

17.
We present a new speech enhancement scheme for a single-microphone system to meet the demand for quality noise reduction algorithms capable of operating at a very low signal-to-noise ratio. A psychoacoustic model is incorporated into the generalized perceptual wavelet denoising method to reduce the residual noise and improve the intelligibility of speech. The proposed method is a generalized time-frequency subtraction algorithm, which advantageously exploits the wavelet multirate signal representation to preserve the critical transient information. Simultaneous masking and temporal masking of the human auditory system are modeled by the perceptual wavelet packet transform via the frequency and temporal localization of speech components. The wavelet coefficients are used to calculate the Bark spreading energy and temporal spreading energy, from which a time-frequency masking threshold is deduced to adaptively adjust the subtraction parameters of the proposed method. An unvoiced speech enhancement algorithm is also integrated into the system to improve the intelligibility of speech. Through rigorous objective and subjective evaluations, it is shown that the proposed speech enhancement system is capable of reducing noise with little speech degradation in adverse noise environments and the overall performance is superior to several competitive methods.

19.
Speech processing devices such as voice-controlled devices, radios, and cell phones have become increasingly popular in the military, audio forensics, speech recognition, education, and health sectors. In the real world, a speech signal in communication always contains background noise. Voice activity detection (VAD) is a central task in speech-related applications, including speech communication, speech recognition, and speech coding. Noise-reduction schemes for speech communication may increase the quality of speech and improve working efficiency in military aviation. Most existing algorithms can improve the quality of speech but are unable to remove the background noise from it. This study gives researchers a summary of the challenges of speech communication in background noise and suggests research directions relevant to military personnel and other workers in noisy environments. The results of the study show that the DSP-based voice activity detection and background noise reduction algorithm reduced the spurious values of the speech signal.

20.
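A survey of, and outlook on, endpoint detection methods for speech signals   Total citations: 4 (self-citations: 1, citations by others: 3)
Endpoint detection is a very important step in speech signal processing, and its accuracy directly affects the speed and results of that processing. Research on endpoint detection methods, especially in noisy environments, has therefore long been a focus of speech signal processing. This survey gives a fairly comprehensive review of the development of endpoint detection methods, covering methods based on time-domain parameters, frequency-domain parameters, time-frequency parameters, and model matching. It compares the strengths and weaknesses of the methods, suggests improvements to them, and outlines future research directions for endpoint detection.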
