1.

Speech enhancement based on undecimated wavelet packet-perceptual filterbanks and MMSE–STSA estimation in various noise environments

Hacı Ergun 《Digital Signal Processing》2008,18(5):797-812

In this paper, we propose a new speech enhancement system that integrates a perceptual filterbank with minimum mean square error–short time spectral amplitude (MMSE–STSA) estimation, modified according to speech presence uncertainty. The perceptual filterbank was designed by adjusting the undecimated wavelet packet decomposition (UWPD) tree according to the critical bands of the psychoacoustic model of the human auditory system. The modified MMSE–STSA estimator was used to estimate speech in the undecimated wavelet packet domain. The perceptual filterbank provides a good auditory representation (sufficient frequency resolution), good perceptual quality of speech, and low computational load. The MMSE–STSA estimator is based on a priori SNR estimation. The a priori SNR, which is a key parameter of the MMSE–STSA estimator, was estimated using the “decision-directed” method, which provides a trade-off between noise reduction and signal distortion when correctly tuned. The experiments were conducted for various noise types. The results of the proposed method were compared with those of two other popular methods, Wiener estimation and MMSE–log spectral amplitude (MMSE–LSA) estimation in the frequency domain. To test the performance of the proposed system, three objective quality measures (SNR, segSNR, and Itakura–Saito distance (ISd)) were computed for various noise types and SNRs. The experimental results and objective quality measures confirmed the performance of the proposed system, which provided sufficient noise reduction and good intelligibility and perceptual quality without causing considerable signal distortion or musical background noise.
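
The decision-directed a priori SNR estimate named in this abstract is usually written as xi(l) = alpha * A(l-1)^2 / lambda_d + (1 - alpha) * max(gamma(l) - 1, 0), where gamma is the a posteriori SNR, A(l-1)^2 is the clean-speech power estimated in the previous frame, and lambda_d is the noise power. The following is a minimal sketch for a single subband, not the authors' implementation. The function name, the default `alpha`, the floor `xi_min`, the known constant noise power, and the Wiener gain used in place of the MMSE–STSA gain are all assumptions made for illustration.

```python
def decision_directed_snr(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Track the a priori SNR of one subband across frames.

    noisy_power: per-frame power of the noisy coefficient (list of floats)
    noise_power: noise power, assumed known and constant here
    """
    xi_track = []
    prev_clean_power = 0.0
    for frame_idx, y_pow in enumerate(noisy_power):
        gamma = y_pow / noise_power          # a posteriori SNR
        ml_term = max(gamma - 1.0, 0.0)      # instantaneous SNR estimate
        if frame_idx == 0:
            xi = ml_term                     # no previous frame to lean on
        else:
            xi = (alpha * prev_clean_power / noise_power
                  + (1.0 - alpha) * ml_term)
        xi = max(xi, xi_min)                 # floor limits musical noise
        gain = xi / (1.0 + xi)               # Wiener gain (stand-in, assumption)
        prev_clean_power = (gain ** 2) * y_pow
        xi_track.append(xi)
    return xi_track
```

A value of `alpha` close to 1 smooths the estimate heavily, which suppresses musical noise but slows the response to speech onsets. That is the trade-off between noise reduction and signal distortion the abstract refers to.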

2.

A perceptually motivated stationary wavelet packet filterbank using improved spectral over-subtraction for enhancement of speech in various noise environments

Navneet Upadhyay Abhijit Karmakar 《International Journal of Speech Technology》2014,17(2):117-132

In this paper, we propose a speech enhancement method in which the front-end decomposition of the input speech is performed by temporal processing with a filterbank. The proposed method incorporates a perceptually motivated stationary wavelet packet filterbank (PM-SWPFB) and an improved spectral over-subtraction (I-SOS) algorithm for the enhancement of speech in various noise environments. The stationary wavelet packet transform (SWPT) is a shift-invariant transform. The PM-SWPFB is obtained by selecting the stationary wavelet packet tree so that it closely matches the non-linear resolution of the critical band structure of the psychoacoustic model. After the decomposition of the input speech, the I-SOS algorithm is applied separately in each subband to estimate the speech. The I-SOS uses a continuous noise estimation approach and estimates the noise power in each subband without the need for explicit speech silence detection. The subband noise power is estimated and updated by adaptively smoothing the noisy signal power. The smoothing parameter in each subband is controlled by a function of the estimated signal-to-noise ratio (SNR). The performance of the proposed method is tested on speech signals degraded by various real-world noises. Using objective speech quality measures (SNR, segmental SNR (SegSNR), and perceptual evaluation of speech quality (PESQ) score), spectrograms, and informal listening tests, we show that the proposed method outperforms spectral subtractive-type algorithms and improves the quality and intelligibility of the enhanced speech.
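
The continuous noise estimation and over-subtraction described in this abstract can be sketched as follows for one subband. This is an illustration under stated assumptions rather than the I-SOS algorithm itself: the sigmoid mapping from SNR to the smoothing parameter, the parameter names `slope` and `offset`, and the over-subtraction factor and spectral floor are hypothetical choices, since the abstract does not give the actual functions.

```python
import math

def track_noise_power(noisy_power, init_noise, slope=5.0, offset=1.5):
    """Update a subband noise estimate without explicit silence detection.

    The smoothing parameter approaches 1 (estimate frozen) when the
    estimated SNR is high, and drops (fast update) when the frame looks
    like noise. init_noise must be positive.
    """
    noise = init_noise
    estimates = []
    for p in noisy_power:
        post_snr = p / noise
        # Assumed sigmoid control of the smoothing parameter by the SNR.
        alpha = 1.0 / (1.0 + math.exp(-slope * (post_snr - offset)))
        noise = alpha * noise + (1.0 - alpha) * p
        estimates.append(noise)
    return estimates

def over_subtract(noisy_power, noise_power, over=2.0, floor=0.01):
    """Power-domain over-subtraction with a spectral floor."""
    return max(noisy_power - over * noise_power, floor * noise_power)
```

With these defaults a loud speech frame leaves the noise estimate almost unchanged, while frames near the noise level pull the estimate toward the observed power.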

3.

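Bionic wavelet speech enhancement based on a genetic algorithm

Dong Hu, Jiang Weijin 《Measurement & Control Technology (测控技术)》2016,35(11):1-4

The principles and methods of genetic algorithms and the bionic wavelet transform are analyzed, and a bionic wavelet speech enhancement algorithm based on a genetic algorithm is proposed. First, the ordinary wavelet transform is converted into the bionic wavelet transform to obtain the bionic wavelet transform coefficients. Next, the selection, crossover, and mutation operations of the genetic algorithm are used to obtain optimized threshold parameters for the bionic wavelet, which determine the optimal wavelet threshold. Denoising is then performed by combining the optimal wavelet threshold with an improved threshold function. Finally, the thresholded bionic wavelet coefficients are transformed back to the ordinary wavelet domain and the inverse continuous wavelet transform is applied to obtain the enhanced speech signal. Simulation results show that, in low-SNR environments, speech enhanced by the proposed algorithm has less distortion and residual noise, and higher quality and intelligibility, than speech enhanced by the traditional minimum statistics and bionic wavelet transform algorithms.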

4.

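A speech enhancement method based on the wavelet transform and Kalman filtering (cited by: 1; self-citations: 0; citations by others: 1)

Zhang Endong, Huang Wenhao 《Pattern Recognition and Artificial Intelligence (模式识别与人工智能)》2009,22(1):28-31

For speech signals corrupted by additive noise, an effective speech enhancement method is proposed that applies Kalman filtering based on the wavelet transform. Problems encountered in practical processing are analyzed, including the dyadic wavelet transform, filter parameter estimation, and divergence of the Kalman filter. Enhancement performance is evaluated using the signal-to-noise ratio. Simulation experiments show that the method is effective when the additive noise is either white Gaussian noise or colored noise.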

5.

A speech enhancement method based on the bionic wavelet transform

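Wang Yue, Qu Baida, Xu Baoguo 《Computer Engineering and Applications (计算机工程与应用)》2008,44(11):165-167

A new speech enhancement method based on the bionic wavelet transform is proposed. The method achieves speech enhancement by thresholding the bionic wavelet transform coefficients. Experimental results show that, in four real noise environments, the method outperforms several classical methods, such as spectral subtraction, Wiener filtering, and threshold denoising based on the discrete wavelet transform, and yields better speech enhancement.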
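
Thresholding of wavelet coefficients, the core step in this abstract, can be illustrated with a much simpler transform. The sketch below uses a one-level Haar discrete wavelet transform and soft thresholding. It is not the bionic wavelet transform, and the function names and the choice of the Haar wavelet are assumptions made only to keep the example self-contained.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(signal):
    """One-level Haar DWT of an even-length sequence -> (approx, detail)."""
    approx, detail = [], []
    for a, b in zip(signal[0::2], signal[1::2]):
        approx.append((a + b) / SQRT2)
        detail.append((a - b) / SQRT2)
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out

def soft_threshold(coeff, thr):
    """Shrink a coefficient toward zero by thr; zero it if |coeff| <= thr."""
    mag = abs(coeff) - thr
    return math.copysign(mag, coeff) if mag > 0.0 else 0.0

def denoise(signal, thr):
    """Threshold the detail coefficients and reconstruct the signal."""
    approx, detail = haar_dwt(signal)
    detail = [soft_threshold(d, thr) for d in detail]
    return haar_idwt(approx, detail)
```

Small detail coefficients, which are dominated by noise, are set to zero, while larger ones are kept with reduced magnitude. A threshold of zero returns the input unchanged.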

6.

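A speech enhancement algorithm based on wavelet packets and adaptive Wiener filtering

Dong Hu, Xu Yuming, Ma Zhenzhong, Li Liewen, Ren Ke 《Computer Technology and Development (计算机技术与发展)》2020,(1):50-53

Speech enhancement is mainly used to improve the intelligibility and quality of noise-corrupted speech, and its main application is improving mobile communication quality in noisy environments. Traditional speech enhancement methods include spectral subtraction, Wiener filtering, and wavelet coefficient methods. To address the poor speech quality and musical noise that remain after traditional algorithms are applied in complex noise environments, a speech enhancement algorithm combining the wavelet packet transform with adaptive Wiener filtering is proposed. The role of wavelet packet multiresolution analysis in partitioning the signal spectrum is analyzed. The noisy signal is decomposed at multiple scales using wavelet packets, adaptive Wiener filtering is applied to the wavelet packet coefficients at each scale, and the filtered coefficients are used for reconstruction to obtain the enhanced speech signal. Simulation results show that, compared with traditional enhancement algorithms, the proposed algorithm not only raises the SNR of noisy speech more effectively in low-SNR, non-stationary noise environments but also better preserves the spectral characteristics of speech, improving the quality of the noisy speech.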

7.

Speech enhancement using Teager energy operated ERB-like perceptual wavelet packet decomposition

Anirban Bhowmick Mahesh Chandra Astik Biswas 《International Journal of Speech Technology》2017,20(4):813-827

In the recent past, wavelet packet (WP) based speech enhancement techniques have been gaining popularity due to their inherent noise-minimizing nature. WP-based techniques have appeared to be more robust and efficient than short-time Fourier transform based methods. In the present work, a speech enhancement method using Teager energy operated equivalent rectangular bandwidth (ERB)-like WP decomposition is proposed. A twenty-four sub-band perceptual wavelet packet decomposition (PWPD) structure is implemented according to the auditory ERB scale. An ERB scale based decomposition structure is used because the central frequency of the ERB scale distribution is similar to the frequency response of the human cochlea. The Teager energy operator is applied to estimate the threshold value for the PWPD coefficients. Lastly, Wiener filtering is applied to remove low-frequency noise before the final reconstruction stage. The proposed method has been evaluated on a Hindi sentence database corrupted with six noise conditions. Its performance is analysed with respect to several speech quality parameters and output signal-to-noise ratio levels. The results indicate that the proposed technique outperforms some traditional speech enhancement algorithms at all SNR levels.
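
The discrete Teager energy operator mentioned in this abstract has the standard form psi[n] = x[n]^2 - x[n-1] * x[n+1]. The sketch below shows only the operator; how the paper turns Teager energy into a threshold for the PWPD coefficients is not stated in the abstract and is not reproduced here. The function names are illustrative.

```python
import math

def teager_energy(x):
    """Discrete Teager energy for the interior samples of x.

    Returns len(x) - 2 values: psi[n] = x[n]^2 - x[n-1] * x[n+1].
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def tone(amp, omega, length):
    """Helper: a sampled cosine, amp * cos(omega * n)."""
    return [amp * math.cos(omega * n) for n in range(length)]
```

For a pure tone `amp * cos(omega * n)` the operator returns the constant `(amp * sin(omega)) ** 2`, so it tracks both amplitude and frequency. That property makes it useful for separating speech-dominated coefficients from noise-dominated ones.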

8.

A Generalized Time–Frequency Subtraction Method for Robust Speech Enhancement Based on Wavelet Filter Banks Modeling of Human Auditory System

Yu Shao Chip-Hong Chang 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2007,37(4):877-889

We present a new speech enhancement scheme for a single-microphone system to meet the demand for quality noise reduction algorithms capable of operating at a very low signal-to-noise ratio. A psychoacoustic model is incorporated into the generalized perceptual wavelet denoising method to reduce the residual noise and improve the intelligibility of speech. The proposed method is a generalized time-frequency subtraction algorithm, which advantageously exploits the wavelet multirate signal representation to preserve the critical transient information. Simultaneous masking and temporal masking of the human auditory system are modeled by the perceptual wavelet packet transform via the frequency and temporal localization of speech components. The wavelet coefficients are used to calculate the Bark spreading energy and temporal spreading energy, from which a time-frequency masking threshold is deduced to adaptively adjust the subtraction parameters of the proposed method. An unvoiced speech enhancement algorithm is also integrated into the system to improve the intelligibility of speech. Through rigorous objective and subjective evaluations, it is shown that the proposed speech enhancement system is capable of reducing noise with little speech degradation in adverse noise environments and the overall performance is superior to several competitive methods.

10.

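Bionic wavelet speech enhancement based on subband spectral entropy

Liu Yan, Ni Wanshun 《Journal of Computer Applications (计算机应用)》2015,35(3):868-871

Front-end noise processing directly affects the accuracy and stability of speech recognition. Because the signal separated by wavelet denoising algorithms is not the optimal estimate of the original signal, a bionic wavelet transform (BWT) denoising algorithm based on subband spectral entropy is proposed. The algorithm exploits the accuracy of subband spectral entropy endpoint detection to distinguish noisy speech segments from noise-only segments, updates the BWT threshold in real time, and accurately identifies the wavelet coefficients of the noise signal, thereby enhancing the speech. Experimental results show that, compared with Wiener filtering, the proposed method improves the signal-to-noise ratio (SNR) by about 8% on average and significantly enhances speech signals in noisy environments.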

11.

Multiple statistical models for soft decision in noisy speech enhancement

Joon-Hyuk Chang Saeed Gazor Sanjit K. Mitra 《Pattern recognition》2007,40(3):1123-1134

Most speech enhancement algorithms are based on the assumption that speech and noise are both Gaussian in the discrete cosine transform (DCT) domain. For further enhancement of noisy speech in the DCT domain, we consider multiple statistical distributions (i.e., Gaussian, Laplacian and Gamma) as a set of candidates to model the noise and speech. We first use the goodness-of-fit (GOF) test to measure how far the assumed model deviates from the actual distribution for each DCT component of noisy speech. Our evaluations illustrate that the best candidate is assigned to each frequency bin depending on the signal-to-noise ratio (SNR) and the power spectral flatness measure (PSFM). In particular, since the PSFM exhibits a strong relation with the best statistical fit, we employ a simple recursive estimation of the PSFM in the model selection. The proposed speech enhancement algorithm employs a soft estimate of the speech absence probability (SAP) separately for each frequency bin according to the selected distribution. Both objective and subjective tests are performed to evaluate the proposed algorithms on a large speech database, for various SNR values and types of background noise. Our evaluations show that the proposed soft decision scheme based on multiple statistical modeling or the PSFM provides further speech quality enhancement compared with recent methods across a number of subjective and objective tests.
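
A spectral flatness measure is conventionally defined as the ratio of the geometric mean to the arithmetic mean of the power spectrum. The sketch below assumes that conventional definition and a first-order recursive smoother; the abstract does not give the exact PSFM formula or recursion used in the paper, so the function names, `eps`, and `beta` are illustrative assumptions.

```python
import math

def spectral_flatness(power_spectrum, eps=1e-12):
    """Geometric mean over arithmetic mean of a power spectrum.

    Close to 1 for a flat (noise-like) spectrum and close to 0 for a
    peaky (tonal or voiced) spectrum. eps guards against log(0).
    """
    n = len(power_spectrum)
    log_mean = sum(math.log(p + eps) for p in power_spectrum) / n
    arith_mean = sum(power_spectrum) / n + eps
    return math.exp(log_mean) / arith_mean

def smooth_flatness(previous, current, beta=0.9):
    """First-order recursive average of the flatness across frames."""
    return beta * previous + (1.0 - beta) * current
```

A smoothed flatness value of this kind could then serve as the index for choosing among candidate distributions frame by frame.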

12.

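Application of blind source separation to a single-channel speech enhancement algorithm

Ma Jianfen, Li Hongyan, Zhang Xueying, Wang Huakui 《Journal of Computer Applications (计算机应用)》2006,26(11):2694-2695

A single-channel speech enhancement algorithm is proposed. First, a hypothetical noise source is constructed from the noise-containing part of the received single-channel speech signal. This noise source and the noisy signal are used as inputs to the multichannel adaptive decorrelation (MAD) blind separation algorithm to obtain an enhanced speech signal. The enhanced speech is then decomposed using Daubechies wavelets, filtered in the wavelet domain with a suitable threshold function, and synthesized back into a time-domain speech signal. The enhanced speech obtained through these steps has a higher SNR and higher intelligibility.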

13.

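Speech enhancement based on a priori SNR estimation and gain smoothing

An Koucheng 《Journal of Computer Applications (计算机应用)》2012,32(Z1):29-31,35

To address the residual “musical noise” left by speech enhancement algorithms, speech enhancement algorithms based on a priori SNR estimation are analyzed, and on this basis a method combining adaptive a priori SNR estimation with gain smoothing is proposed. The method first estimates the a priori SNR and then smooths the gain function, reducing random jumps between adjacent gain values and compensating for the shortcomings of traditional a priori SNR estimation. Finally, speech signals corrupted by white Gaussian noise are processed. Simulation results show that the algorithm improves the suppression of “musical noise” to some extent and improves speech enhancement performance.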

14.

Wavelet based speech presence probability estimator for speech enhancement

Daniel Pak-Kong Lun Tak-Wai Shen Tai-Chiu Hsung Dominic K.C. Ho 《Digital Signal Processing》2012,22(6):1161-1173

A reliable speech presence probability (SPP) estimator is important to many frequency-domain speech enhancement algorithms. It is known that a good estimate of SPP can be obtained by having a smooth a posteriori signal-to-noise ratio (SNR) function, which can be achieved by reducing the noise variance when estimating the speech power spectrum. Recently, wavelet denoising with the multitaper spectrum (MTS) estimation technique was suggested for this purpose. However, traditional approaches directly make use of the wavelet shrinkage denoiser, which has not been fully optimized for denoising the MTS of noisy speech signals. In this paper, we first propose a two-stage wavelet denoising algorithm for estimating the speech power spectrum. In stage 1, we apply the wavelet transform to the periodogram of a noisy speech signal. Using the resulting wavelet coefficients, an oracle is developed to indicate the approximate locations of the noise floor in the periodogram. In stage 2, we use this oracle to selectively remove the wavelet coefficients of the noise floor in the log MTS of the noisy speech. The remaining wavelet coefficients are then used to reconstruct a denoised MTS and in turn generate a smooth a posteriori SNR function. To adapt to the enhanced a posteriori SNR function, we further propose a new method to estimate the generalized likelihood ratio (GLR), which is an essential parameter for SPP estimation. Simulation results show that the new SPP estimator outperforms the traditional approaches and improves both the quality and intelligibility of the enhanced speech.

15.

A wavelet-based transform method for quality improvement in noisy speech patterns of Arabic language

Sachin Singh A. M. Mutawa 《International Journal of Speech Technology》2016,19(4):677-685

This paper addresses the problem of single-channel enhancement of Arabic noisy speech signals at low (negative) SNR. To this end, a binary mask thresholding function based on the coiflet5 mother wavelet transform is proposed for Arabic speech enhancement. The effectiveness of the proposed method is compared with that of the Wiener method, spectral subtraction, log-MMSE, test-PSC and p-mmse in the presence of babble, pink, white, f-16 and Volvo car interior noise. The noisy input speech signals are processed at input SNR levels ranging from −5 to −25 dB. Performance is evaluated using PESQ, SNR and a cepstral distance measure. The results obtained with the proposed method are very encouraging and show that it is considerably more helpful for Arabic speech enhancement than the other existing methods.

16.

A novel fast nonstationary noise tracking approach based on MMSE spectral power estimator

《Digital Signal Processing》2019

Estimating the noise power spectral density (PSD) from the corrupted speech signal is an essential component of speech enhancement algorithms. In this paper, a novel noise PSD estimation algorithm based on minimum mean-square error (MMSE) is proposed. The noise PSD estimate is obtained by recursively smoothing the MMSE estimate of the current noise spectral power. For the noise spectral power estimation, a spectral weighting function is derived, which depends on the a priori signal-to-noise ratio (SNR). Since the speech spectral power is highly important for the a priori SNR estimate, this paper proposes an MMSE spectral power estimator incorporating speech presence uncertainty (SPU) to estimate the speech spectral power and thereby improve the a priori SNR estimate. Moreover, a bias correction factor is derived to compensate for the bias in the speech spectral power estimate. The estimated speech spectral power is then used in the “decision-directed” (DD) estimator of the a priori SNR to achieve fast noise tracking. Compared with three state-of-the-art approaches, i.e., minimum statistics (MS), an MMSE-based approach, and a speech presence probability (SPP)-based approach, the experimental results show that the proposed algorithm exhibits better noise tracking capability under various nonstationary noise environments and SNR conditions. When employed in a speech enhancement system, it yields improved performance in terms of segmental SNR improvement (SSNR+) and perceptual evaluation of speech quality (PESQ).

17.

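An improved wavelet packet speech enhancement algorithm with a time-adaptive threshold

Tian Lan, Hou Zhengxin, Sun Jinsong 《Control and Decision (控制与决策)》2009,24(6)

To address the over-thresholding problem of traditional wavelet speech enhancement algorithms, an improved wavelet packet denoising algorithm with a time-adaptive threshold is proposed. The method decomposes noisy speech using an auditory perceptual wavelet packet to obtain the coefficients at the perceptual wavelet packet nodes, and automatically adjusts the denoising threshold frame by frame based on an estimate of the speech presence probability. Because the improved threshold better avoids over-thresholding of the speech wavelet packet coefficients, more of the original speech components are retained while noise is suppressed, further improving the denoising performance. Experimental results show that the algorithm yields a clearer enhanced speech signal than the conventional adaptive-threshold wavelet algorithm.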

18.

Simultaneous Detection and Estimation Approach for Speech Enhancement

《IEEE transactions on audio, speech, and language processing》2007,15(8):2348-2359

In this paper, we present a simultaneous detection and estimation approach for speech enhancement. A detector for speech presence in the short-time Fourier transform domain is combined with an estimator, which jointly minimizes a cost function that takes into account both detection and estimation errors. Cost parameters control the trade-off between speech distortion, caused by missed detection of speech components, and residual musical noise, resulting from false detection. Furthermore, a modified decision-directed a priori signal-to-noise ratio (SNR) estimation is proposed for transient-noise environments. Experimental results demonstrate the advantage of using the proposed simultaneous detection and estimation approach with the proposed a priori SNR estimator, which facilitates suppression of transient noise with a controlled level of speech distortion.

19.

Spectral difference for statistical model-based speech enhancement in speech recognition

Soojeong Lee Joon-Hyuk Chang 《Multimedia Tools and Applications》2017,76(23):24917-24929

In this paper, we propose a statistical model-based speech enhancement technique using a spectral difference scheme for speech recognition in virtual reality. In the analysis step, two principal parameters, the weighting parameter in the decision-directed (DD) method and the long-term smoothing parameter in noise estimation, are uniquely determined as optimal operating points according to the spectral difference under various noise conditions. These optimal operating points, which are specific to different spectral differences, are estimated based on the composite measure, which is a relevant criterion in terms of speech quality. An efficient mapping function is also presented to provide an index into the metric table associated with the spectral difference, so that operating points can be determined according to various noise conditions in the on-line step. In the on-line speech enhancement step, different parameters are chosen on a frame-by-frame basis from the metric table of the spectral difference. The performance of the proposed method is evaluated using objective and subjective speech quality measures in various noise environments. Our experimental results show that the proposed algorithm yields better performance than conventional algorithms.

20.

Speech enhancement using a masking threshold constrained Kalman filter and its heuristic implementations

Ning Ma Bouchard M. Goubran R.A. 《IEEE transactions on audio, speech, and language processing》2006,14(1):19-32

A masking threshold constrained Kalman filter for speech enhancement is derived in this paper. A key step in a traditional Kalman filter is minimizing the estimation error variance between a clean signal and its estimate. Our new method minimizes the estimation error variance under the constraint that the energy of the estimation error is smaller than a masking threshold, computed from both the time-domain forward masking and frequency-domain simultaneous masking properties of the human auditory system. The new Kalman filter provides a theoretical basis for applying the masking properties in Kalman filtering for speech enhancement. Due to the high computational cost of the proposed perceptually constrained Kalman filter, a perceptual post-filter concatenated with a standard Kalman filter is also proposed as a heuristic alternative for real-time implementation. The post-filter is constructed to make the estimation error obtained from the Kalman filter lower than the masking threshold. A wavelet Kalman filter with post-filtering is introduced to further reduce the computational load. Experimental results with colored noise show that the new constrained Kalman filter method produces the best performance when compared with other recent methods, and that the proposed heuristics with post-filtering can also produce a significant performance gain over other recent methods.