Similar documents
 Found 20 similar documents (search time: 46 ms)
1.
This paper discusses the problem of automatic word boundary detection in the presence of variable-level background noise. Commonly used robust word boundary detection algorithms assume that the background noise level is fixed. In fact, the background noise level may vary during recording. This is the main reason most robust word boundary detection algorithms perform poorly when the background noise level varies. To solve this problem, we first propose a refined time-frequency (RTF) parameter for extracting both the time and frequency features of noisy speech signals. The RTF parameter extends the time-frequency (TF) parameter proposed by Junqua et al. from single-band to multiband spectrum analysis, where the frequency bands help to make the distinction between speech signal and noise clear. The RTF parameter can extract useful frequency information. Based on this RTF parameter, we further propose a new word boundary detection algorithm that uses a recurrent self-organizing neural fuzzy inference network (RSONFIN). Since the RSONFIN can process temporal relations, the proposed RTF-based RSONFIN algorithm can track the variation of the background noise level and detect correct word boundaries when that level varies. Compared with normal neural networks, the RSONFIN can always find an economical network size for itself with high learning speed. Owing to the self-learning ability of the RSONFIN, the RTF-based RSONFIN algorithm avoids the need to determine ambiguous decision rules empirically, as normal word boundary detection algorithms require.
Experimental results show that this new algorithm achieves a higher recognition rate than the TF-based algorithm, which has been shown to outperform several commonly used word boundary detection algorithms, by about 12% under variable background noise levels. It also reduces the recognition error rate due to endpoint detection to about 23%, compared to an average of 47% obtained by the TF-based algorithm under the same conditions.

2.
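Speech enhancement algorithm based on speech presence probability and auditory masking properties   Total citations: 1, self-citations: 0, citations by others: 1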
宫云梅  赵晓群  史仍辉 《计算机应用》2008,28(11):2981-2983
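At low signal-to-noise ratios (SNRs), the key trade-off in spectral-subtraction speech enhancement among the degree of noise removal, residual musical noise, and speech distortion becomes especially pronounced. To reduce the interference of noise with speech communication, a speech enhancement algorithm suited to low SNRs is proposed. Building on conventional spectral subtraction, the algorithm adaptively adjusts the subtraction parameters according to the auditory masking threshold of the noise and uses the speech presence probability to estimate the speech and noise signals, which avoids the inaccuracy of endpoint detection (VAD) at low SNRs and gives stronger robustness. Objective and subjective tests show that, compared with conventional spectral subtraction, the algorithm suppresses residual and background noise better while hardly impairing speech intelligibility; the improvement is particularly evident for speech corrupted by low-SNR and nonstationary noise.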

3.
This paper presents a new approach to speech enhancement from single-channel measurements involving both noise and channel distortion (i.e., convolutional noise), and demonstrates its applications for robust speech recognition and for improving noisy speech quality. The approach is based on finding longest matching segments (LMS) from a corpus of clean, wideband speech. The approach adds three novel developments to our previous LMS research. First, we address the problem of channel distortion as well as additive noise. Second, we present an improved method for modeling noise for speech estimation. Third, we present an iterative algorithm which updates the noise and channel estimates of the corpus data model. In experiments using speech recognition as a test with the Aurora 4 database, the use of our enhancement approach as a preprocessor for feature extraction significantly improved the performance of a baseline recognition system. In another comparison against conventional enhancement algorithms, both the PESQ and the segmental SNR ratings of the LMS algorithm were superior to the other methods for noisy speech enhancement.

4.
Traditional single-channel subspace-based schemes for speech enhancement rely mostly on linear minimum mean-square error estimators, which are globally optimal only if the Karhunen-Loève transform (KLT) coefficients of the noise and speech processes are Gaussian distributed. In this paper we derive subspace-based nonlinear estimators under the assumption that the speech KLT coefficients follow a generalized super-Gaussian distribution, which has the Laplacian and the two-sided Gamma distributions as special cases. As with the traditional linear estimators, the derived estimators are functions of the a priori signal-to-noise ratio (SNR) in the subspaces spanned by the KLT vectors. We propose a scheme for estimating these a priori SNRs that generalizes the "decision-directed" approach well known from short-time Fourier transform (STFT)-based enhancement schemes. We show that the proposed a priori SNR estimation scheme leads to a significant reduction of the residual noise level, a conclusion which is confirmed in extensive objective speech quality evaluations as well as subjective tests. We also show that the derived estimators based on the super-Gaussian KLT coefficient distribution lead to improvements for different noise sources and levels compared with estimators that impose a Gaussian assumption.
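
The "decision-directed" a priori SNR estimate mentioned in the abstract above can be sketched in its classical STFT-domain form. This is only an illustration under stated assumptions, not the subspace generalization the paper proposes: the function names, the smoothing weight `alpha=0.98`, and the use of a linear Wiener gain (rather than the paper's nonlinear super-Gaussian estimators) are all choices made here for the example.

```python
def decision_directed_snr(noisy_power, noise_power, prev_clean_power, alpha=0.98):
    """Classical decision-directed a priori SNR estimate for one frame.

    All arguments are per-bin (or per-component) power values; the names and
    the default alpha are assumptions for this sketch, not from the paper.
    """
    xi = []
    for y2, d2, s2_prev in zip(noisy_power, noise_power, prev_clean_power):
        gamma = y2 / d2                     # a posteriori SNR
        ml_term = max(gamma - 1.0, 0.0)     # instantaneous estimate, floored at 0
        # Weighted mix of the previous frame's clean-speech estimate and the
        # current instantaneous estimate.
        xi.append(alpha * s2_prev / d2 + (1.0 - alpha) * ml_term)
    return xi


def wiener_gain(xi):
    """Linear Wiener gain xi / (1 + xi), used here only as a simple consumer of xi."""
    return [x / (1.0 + x) for x in xi]
```

A larger `alpha` smooths the estimate more strongly across frames, which is the usual explanation for the reduced residual (musical) noise of decision-directed schemes, at the cost of slower tracking of speech onsets.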

5.
In this paper, we propose a speech enhancement method in which the front-end decomposition of the input speech is performed by temporal processing with a filterbank. The proposed method incorporates a perceptually motivated stationary wavelet packet filterbank (PM-SWPFB) and an improved spectral over-subtraction (I-SOS) algorithm for the enhancement of speech in various noise environments. The stationary wavelet packet transform (SWPT) is a shift-invariant transform. The PM-SWPFB is obtained by selecting the stationary wavelet packet tree so that it closely matches the nonlinear resolution of the critical band structure of the psychoacoustic model. After the decomposition of the input speech, the I-SOS algorithm is applied separately in each subband to estimate the speech. The I-SOS uses a continuous noise estimation approach and estimates the noise power in each subband without the need for explicit speech silence detection. The subband noise power is estimated and updated by adaptively smoothing the noisy signal power. The smoothing parameter in each subband is controlled by a function of the estimated signal-to-noise ratio (SNR). The performance of the proposed speech enhancement method is tested on speech signals degraded by various real-world noises. Using objective speech quality measures (SNR, segmental SNR (SegSNR), and perceptual evaluation of speech quality (PESQ) score), together with spectrograms and informal listening tests, we show that the proposed method outperforms spectral subtractive-type algorithms and improves the quality and intelligibility of the enhanced speech.
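
The continuous noise estimation described in the abstract above, where the subband noise power is updated by smoothing the noisy power with an SNR-controlled smoothing parameter, might look roughly like the following. The abstract does not give the control function, so the sigmoid below and its `slope` and `offset` parameters are hypothetical stand-ins.

```python
import math


def update_noise_power(noise_prev, noisy_power, slope=1.0, offset=1.5):
    """One recursive update of a subband noise power estimate.

    The sigmoid mapping from the SNR-like ratio to the smoothing parameter is
    an assumption for this sketch; the paper's actual function is not given
    in the abstract.
    """
    # Ratio of current noisy power to the previous noise estimate
    # (an a posteriori SNR-like quantity).
    ratio = noisy_power / max(noise_prev, 1e-12)
    # High ratio (speech likely present) -> alpha near 1 -> estimate is held.
    # Low ratio (noise only) -> smaller alpha -> estimate follows the input.
    alpha = 1.0 / (1.0 + math.exp(-slope * (ratio - offset)))
    return alpha * noise_prev + (1.0 - alpha) * noisy_power
```

Because the smoothing parameter changes continuously with the ratio, no hard speech/silence decision is needed, which is the property the abstract emphasizes.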

6.
Noise reduction for speech enhancement is a useful technique, but in general it is a challenging problem. While a single-channel algorithm is easy to use in practice, it inevitably distorts the desired speech signal while reducing noise. Today, the explosive growth in computational power and the continuous drop in the cost and size of acoustic electric transducers are driving interest in employing multiple microphones in speech processing systems. This opens new opportunities for noise reduction. In this paper, we present an analysis of three multichannel noise reduction algorithms, namely the Wiener filter, subspace, and spatial-temporal prediction, in a common framework. We investigate whether multichannel noise reduction algorithms can reduce noise without speech distortion. Finally, we validate what we learn from the theoretical analyses through simulations using real impulse responses measured in the varechoic chamber at Bell Labs.

7.
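Based on robust H∞ filter theory and a conjugate-gradient adaptive parameter estimation method, a speech enhancement algorithm that suppresses complex noise is proposed. When this method is used to adaptively extract speech parameters from the noisy signal, the statistical properties of the noise source need not be known in advance; the only requirement is that the noise signal have finite energy. Because it is based on the H∞ filter, the method guarantees that the degradation of the performance index caused by external disturbances and additive noise is minimized. Simulation results show that the algorithm is computationally fast, robust, clearly effective at enhancing speech, easy to implement, and able to suppress complex background noise.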

8.
We consider the enhancement of speech corrupted by additive white Gaussian noise. In a Bayesian inference framework, maximum a posteriori (MAP) estimation of the signal is performed, along the lines developed by Lim & Oppenheim (1978). The speech enhancement problem is treated as a signal estimation problem, whose aim is to obtain a MAP estimate of the clean speech signal, given the noisy observations. The novelty of our approach, over previously reported work, is that we relate the variance of the additive noise and the gain of the autoregressive (AR) process to hyperparameters in a hierarchical Bayesian framework. These hyperparameters are computed from the noisy speech data to maximize the denominator in Bayes' formula, also known as the evidence. The resulting Bayesian scheme is capable of performing speech enhancement from the noisy data without the need for silence detection. Experimental results are presented for stationary and slowly varying additive white Gaussian noise. The Bayesian scheme is also compared to the Lim and Oppenheim system and to the spectral subtraction method.

9.
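Conventional single-channel speech enhancement methods reconstruct the time-domain signal using the noisy-speech phase in place of the clean-speech phase, which limits the improvement in subjective perceptual quality. To address this, an improved phase spectrum compensation speech enhancement algorithm is proposed. The algorithm introduces a sigmoid-type phase spectrum compensation function based on the input SNR of each speech frame, so that the phase spectrum of the noisy speech can be compensated flexibly as the noise changes. It combines an improved decision-directed (DD) a priori SNR estimate with a speech presence probability (SPP) algorithm to estimate the noise power spectrum, and combines the new SPP-based noise power spectrum estimate with phase spectrum compensation in Wiener filtering to improve enhancement. Compared with the conventional phase spectrum compensation (PSC) algorithm, the improved algorithm effectively suppresses various types of noise in audio signals while improving the perceptual quality and the intelligibility of the speech.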

10.
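To address the large speech distortion of the MMSE method, a method is proposed that introduces the probability of noise being masked into a high-resolution masking perception model. The noise spectrum is updated from an initial noise sequence, the noise masking parameters are then computed, and the data parameters are updated in a timely manner to determine the weight of each frame dynamically. Experimental results show that the method reduces musical noise while effectively suppressing background noise, and achieves better enhancement than the MMSE method for speech denoising.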

11.
We propose a noise estimation algorithm for single-channel noise suppression in dynamic noisy environments. A stochastic-gain hidden Markov model (SG-HMM) is used to model the statistics of nonstationary noise with time-varying energy. The noise model is adaptive and the model parameters are estimated online from noisy observations using a recursive estimation algorithm. The parameter estimation is derived for the maximum-likelihood criterion and the algorithm is based on the recursive expectation maximization (EM) framework. The proposed method facilitates continuous adaptation to changes of both noise spectral shapes and noise energy levels, e.g., due to movement of the noise source. Using the estimated noise model, we also develop an estimator of the noise power spectral density (PSD) based on recursive averaging of estimated noise sample spectra. We demonstrate that the proposed scheme achieves more accurate estimates of the noise model and noise PSD, and as part of a speech enhancement system facilitates a lower level of residual noise.

12.
This paper addresses the problem of extracting a desired speech source from a multispeaker environment in the presence of background noise. A new adaptive beamforming structure is proposed for this speech enhancement problem. This structure incorporates power spectral density (PSD) estimation of the speech sources together with a noise statistics update. An inactive-source detector based on minimum statistics is developed to detect the speech presence and to track the noise statistics. Performance of the proposed beamformer is investigated and compared to the minimum variance distortionless response (MVDR) beamformer with or without a postfilter in a real hands-free communication environment. Evaluations show that the proposed beamformer offers good interference and noise suppression levels while maintaining low distortion of the desired source.

13.
李艳生  刘园  张毅 《计算机应用》2019,39(3):894-898
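To address the residual noise left by non-negative matrix factorization (NMF) speech enhancement algorithms in low signal-to-noise ratio (SNR), nonstationary environments, a single-channel speech enhancement algorithm based on perceptual masking and reconstructed NMF (PM-RNMF) is proposed. First, psychoacoustic masking properties are applied to the NMF speech enhancement algorithm. Second, different masking thresholds are used for different frequency bins to build an adaptive perceptual masking gain function, and the residual noise energy and speech distortion energy are constrained by the thresholds. Finally, the perceptual gain is corrected using the speech presence probability (SPP) and the NMF algorithm is reconstructed, yielding a new objective function. Simulation results show that, in three nonstationary noise environments at different SNRs, compared with the NMF, reconstructed NMF (RNMF), and perceptual masking deep neural network (PM-DNN) algorithms, PM-RNMF raises the average perceptual evaluation of speech quality (PESQ) score by 0.767, 0.474, and 0.162, respectively, and the average source-to-distortion ratio (SDR) by 2.785, 1.197, and 0.948, respectively. The experimental results show that PM-RNMF denoises better at both low and high frequencies.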

14.
A new robust microphone array method to enhance speech signals generated by a moving person in a noisy environment is presented. This blind approach is based on a two-stage scheme. First, a subband time-delay estimation method is used to localize the dominant speech source. The second stage involves speech enhancement, based on the acquired spatial information, by means of a soft-constrained subband beamformer. The novelty of the proposed method lies in treating the spatial spreading of the sound source as equivalent to a time-delay spreading, which allows the estimated intersensor time delays to be used directly in the beamforming operations. In comparison to previous approaches, this new method requires no special array geometry, knowledge of the array manifold, or acquisition of calibration data to adapt the array weights. Furthermore, such a scheme allows the beamformer to adapt efficiently to speaker movement. The robustness of the time-delay estimation of speech signals in high noise levels is improved by exploiting the non-Gaussian nature of speech through a subband kurtosis-weighted structure. Evaluation in a real environment with a moving speaker shows promising results, with suppression levels of up to 16 dB for background noise and interfering (speech) signals and relatively little speech distortion.
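
The first stage in the abstract above relies on estimating the time delay between sensors. A minimal fullband sketch of that idea is to pick the lag that maximizes the cross-correlation of two sensor signals. The subband decomposition and kurtosis weighting that the paper uses are not reproduced here, and the function name and `max_lag` parameter are assumptions for the example.

```python
def estimate_delay(x, y, max_lag):
    """Return the lag (in samples) by which y trails x.

    Plain time-domain cross-correlation search; a simplified stand-in for the
    subband, kurtosis-weighted estimator described in the abstract.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n in range(len(x)):
            m = n + lag
            if 0 <= m < len(y):
                score += x[n] * y[m]    # correlate x[n] with y shifted by lag
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

For example, if `y` is a copy of `x` shifted two samples later, the search returns a lag of 2; a beamformer can then use such intersensor delays to align the channels before combining them.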

15.
This paper addresses the problem of speech enhancement and acoustic noise reduction by adaptive filtering algorithms. Recently, we proposed a new forward blind source separation algorithm that enhances very noisy speech signals with a subband approach. In this paper, we propose a new variable subband step-size algorithm that improves the behaviour of the previous algorithm when the number of subbands is large. The new algorithm uses recursive formulas to compute the variable step sizes of the cross-coupling filters from the decorrelation criterion between the estimated sub-signals at each subband output. It shows an important improvement in the steady-state behaviour and in the mean square error values. Throughout this paper, we present simulation results obtained with the proposed algorithm that confirm its superiority over its original version, which employs fixed step sizes for the cross-coupling adaptive filters, and over another fullband algorithm.

16.
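Frequency-domain speech enhancement algorithms reduce noise markedly at high signal-to-noise ratios (SNRs), but their performance drops sharply at low SNRs. To address this problem, voiceprint-based masking is applied to a frequency-domain speech enhancement network, using the prior information in the voiceprint to improve the network's ability to distinguish the speaker from noise. In addition, to further improve the performance of frequency-domain speech enhancement at low SNRs, a mapping-based voiceprint-embedding speech enhancement algorithm is proposed, which avoids the speech distortion that a masking scheme may cause. Experimental results show that, given the same voiceprint information, the mapping-based voiceprint-embedding network performs better at low SNRs, with a clear advantage in reducing speech distortion. Compared with the mask-based voiceprint masking network, the mapping-based voiceprint-embedding network achieves relative improvements of 6.40%, 1.46%, and 24.84% in PESQ, STOI, and SSNR, respectively.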

17.
This paper presents a new approach to speech enhancement based on a modified least mean square multi-notch adaptive digital filter (MNADF). This approach differs from traditional speech enhancement methods in that no a priori knowledge of the noise source statistics is required. Specifically, the proposed method is applied to the case where speech quality and intelligibility deteriorate in the presence of background noise. Speech coders and automatic speech recognition systems are designed to act on clean speech signals. Therefore, speech signals corrupted by noise must be enhanced before they are processed. The proposed method uses a primary input containing the corrupted speech signal and a reference input containing noise only. The new, computationally efficient algorithm is based on tracking the significant frequencies of the noise and implementing the MNADF at those frequencies. To track the noise frequencies, a time-frequency analysis method such as the short-time Fourier transform is used. Different types of noise from the Noisex-92 database are used to degrade real speech signals. Objective measures, namely global signal-to-noise ratio (SNR) and segmental SNR (segSNR), together with the study of speech spectrograms and subjective listening tests, demonstrate consistently superior enhancement performance of the proposed method over traditional speech enhancement methods such as spectral subtraction.

18.
Numerous efforts have focused on the problem of reducing the impact of noise on the performance of various speech systems such as speech coding, speech recognition and speaker recognition. These approaches consider alternative speech features, improved speech modeling, or alternative training for acoustic speech models. In this paper, we propose a new speech enhancement technique that integrates a newly proposed wavelet transform, which we call the stationary bionic wavelet transform (SBWT), and the maximum a posteriori estimator of the magnitude-squared spectrum (MSS-MAP). The SBWT is introduced to solve the perfect reconstruction problem associated with the bionic wavelet transform. The MSS-MAP estimator is used to estimate the speech in the SBWT domain. The results of the proposed technique were compared with those of other popular methods such as Wiener filtering and MSS-MAP estimation in the frequency domain. To test the performance of the proposed speech enhancement system, four objective quality measures [signal-to-noise ratio (SNR), segmental SNR, Itakura–Saito distance and perceptual evaluation of speech quality] were computed for various noise types, SNRs, and speech signals. The experimental results and objective quality measures confirmed the performance of the proposed technique: it provided sufficient noise reduction and good intelligibility and perceptual quality, without causing considerable signal distortion or musical background noise.
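
Segmental SNR, one of the objective measures named in the abstract above, averages a per-frame SNR over the signal. The sketch below follows the common textbook form; the frame length and the clamping range of -10 dB to 35 dB are conventional choices assumed here, not values taken from the paper.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0):
    """Mean per-frame SNR in dB between a clean reference and an enhanced signal.

    frame_len, lo, and hi are assumed defaults for this sketch. Frames with
    zero reference energy are skipped; each frame SNR is clamped to [lo, hi].
    """
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        est = enhanced[start:start + frame_len]
        sig = sum(c * c for c in ref)                       # reference energy
        err = sum((c - e) ** 2 for c, e in zip(ref, est))   # error energy
        if sig == 0.0:
            continue
        snr = hi if err == 0.0 else 10.0 * math.log10(sig / err)
        values.append(min(max(snr, lo), hi))
    return sum(values) / len(values) if values else float("nan")
```

Clamping keeps a few silent or near-perfect frames from dominating the average, which is why segmental SNR is usually reported alongside, rather than instead of, the global SNR.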

19.
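Noise degrades the quality of speech signals, so speech enhancement is required. Speech enhancement is a frontier area of speech signal processing whose main goal is to extract the clean original speech signal from noisy speech. This paper introduces the principles of speech enhancement methods and simulates conventional spectral subtraction and an improved spectral subtraction method. The improved method adjusts parameters for the noisy signal and then performs spectral subtraction in the frequency domain. Experimental results show that the improved method enhances speech clearly better than the conventional method. In addition, the SNRs of the two methods were computed, and the results show that the improved method yields a much higher SNR than conventional spectral subtraction.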
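
The difference between conventional spectral subtraction and a parameterized variant, as contrasted in the abstract above, can be illustrated per frequency bin. The abstract does not say which parameters the improved method adjusts, so the over-subtraction factor and spectral floor below are assumptions drawn from the widely used over-subtraction form, not the paper's method.

```python
def spectral_subtract(noisy_power, noise_power, over_subtraction=1.0, floor=0.0):
    """Power spectral subtraction for one frame.

    With the defaults this is the conventional rule max(|Y|^2 - |D|^2, 0).
    over_subtraction and floor are hypothetical tuning parameters standing in
    for the unspecified parameter adjustment of the improved method.
    """
    out = []
    for y2, d2 in zip(noisy_power, noise_power):
        diff = y2 - over_subtraction * d2
        # The floor keeps a small fraction of the noise power instead of
        # hard zeros, which tends to reduce musical noise.
        out.append(max(diff, floor * d2))
    return out
```

The enhanced magnitude is the square root of each output value, and the time-domain signal is typically rebuilt with the noisy phase.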

20.
In this paper, we propose a statistical model-based speech enhancement technique using multivariate polynomial regression (MPR) based on a spectral difference scheme. In the analysis step, three principal parameters, namely the weighting parameter in the decision-directed (DD) method, the long-term smoothing parameter for the noise estimation, and the control parameter of the minimum gain value, are estimated as optimal operating points according to the spectral difference under various noise conditions. These optimal operating points, which are specific to different spectral differences, are estimated based on the composite measure, which is a relevant criterion in terms of speech quality. We then apply the MPR technique, taking the spectral differences as independent variables, to estimate the optimal operating points. The MPR technique offers an effective way to represent the complex nonlinear input-output relationship between the optimal operating points and the spectral differences, so that operating points can be determined for various noise conditions in the off-line step. In the on-line speech enhancement step, different parameters are chosen on a frame-by-frame basis through the regression according to the spectral difference. The performance of the proposed method is evaluated using objective and subjective speech quality measures in various noise environments. Our experimental results show that the proposed algorithm performs better than conventional algorithms.
