首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
In this paper, we propose a statistical model-based speech enhancement technique using the spectral difference scheme for the speech recognition in virtual reality. In the analyzing step, two principal parameters, the weighting parameter in the decision-directed (DD) method and the long-term smoothing parameter in noise estimation, are uniquely determined as optimal operating points according to the spectral difference under various noise conditions. These optimal operating points, which are specific according to different spectral differences, are estimated based on the composite measure, which is a relevant criterion in terms of speech quality. An efficient mapping function is also presented to provide an index of the metric table associated with the spectral difference so that operating points can be determined according to various noise conditions for an on-line step. In the on-line speech enhancement step, different parameters are chosen on a frame-by-frame basis under the metric table of the spectral difference. The performance of the proposed method is evaluated using objective and subjective speech quality measures in various noise environments. Our experimental results show that the proposed algorithm yields better performances than conventional algorithms.  相似文献   

2.
We propose a new approach to estimate the a priori signal-to-noise ratio (SNR) based on a multiple linear regression (MLR) technique. In contrast to estimation of the a priori SNR employing the decision-directed (DD) method, which uses the estimated speech spectrum in previous frame, we propose to find the a priori SNR based on the MLR technique by incorporating regression parameters such as the ratio between the local energy of the noisy speech and its derived minimum along with the a posteriori SNR. In the experimental step, regression coefficients obtained using the MLR are assigned according to various noise types, for which we employ a real-time noise classification scheme based on a Gaussian mixture model (GMM). Evaluations using both objective speech quality measures and subjective listening tests under various ambient noise environments show that the performance of the proposed algorithm is better than that of the conventional methods.  相似文献   

3.
This paper proposes an improved semi-fragile speech watermarking scheme by quantization of linear prediction (LP) parameters, i.e., the inverse sine (IS) parameters. The spectral distortion due to watermark embedding is controlled to meet the ‘transparency’ criterion in speech coding. A modified bit allocation algorithm combined with watermarking is developed to determine the quantization step so that the ‘transparency’ requirement is satisfied. Due to the statistical nature, the LP coefficients estimated from the watermarked speech signal are different from the watermarked LP coefficients even in the absence of attacks. This effect is the cause of increase in decoding error and minimum authentication length. To tackle this problem, an Analysis by Synthesis (AbS) scheme is developed to reduce the difference between the estimated LP coefficients and the watermarked ones. The watermark detection threshold and minimum authentication length are then derived according to the probability of error and the signal to noise ratio (SNR) requirements. Experimental results show that the proposed AbS based method can effectively reduce the difference between the watermarked IS parameter and the extracted IS parameter when there is no attacks. In addition, the modified bit allocation algorithm can automatically find the appropriate quantization step used in the odd-even modulation so that the transparency requirement is satisfied.  相似文献   

4.
We propose a noise estimation algorithm for single-channel noise suppression in dynamic noisy environments. A stochastic-gain hidden Markov model (SG-HMM) is used to model the statistics of nonstationary noise with time-varying energy. The noise model is adaptive and the model parameters are estimated online from noisy observations using a recursive estimation algorithm. The parameter estimation is derived for the maximum-likelihood criterion and the algorithm is based on the recursive expectation maximization (EM) framework. The proposed method facilitates continuous adaptation to changes of both noise spectral shapes and noise energy levels, e.g., due to movement of the noise source. Using the estimated noise model, we also develop an estimator of the noise power spectral density (PSD) based on recursive averaging of estimated noise sample spectra. We demonstrate that the proposed scheme achieves more accurate estimates of the noise model and noise PSD, and as part of a speech enhancement system facilitates a lower level of residual noise.  相似文献   

5.
In this paper, we present a new technique for the estimation of short-term linear predictive parameters of speech and noise from noisy data and their subsequent use in waveform enhancement schemes. The method exploits a priori information about speech and noise spectral shapes stored in trained codebooks, parameterized as linear predictive coefficients. The method also uses information about noise statistics estimated from the noisy observation. Maximum-likelihood estimates of the speech and noise short-term predictor parameters are obtained by searching for the combination of codebook entries that optimizes the likelihood. The estimation involves the computation of the excitation variances of the speech and noise auto-regressive models on a frame-by-frame basis, using the a priori information and the noisy observation. The high computational complexity resulting from a full search of the joint speech and noise codebooks is avoided through an iterative optimization procedure. We introduce a classified noise codebook scheme that uses different noise codebooks for different noise types. Experimental results show that the use of a priori information and the calculation of the instantaneous speech and noise excitation variances on a frame-by-frame basis result in good performance in both stationary and nonstationary noise conditions.  相似文献   

6.
Discusses the problem of single-channel speech enhancement in variable noise-level environment. Commonly used, single-channel subtractive-type speech enhancement algorithms always assume that the background noise level is fixed or slowly varying. In fact, the background noise level may vary quickly. This condition usually results in wrong speech/noise detection and wrong speech enhancement process. In order to solve this problem, we propose a subtractive-type speech enhancement scheme. This new enhancement scheme uses the RTF (refined time-frequency parameter)-based RSONFIN (recurrent self-organizing neural fuzzy inference network) algorithm we developed previously to detect the word boundaries in the condition of variable background noise level. In addition, a new parameter (MiFre) is proposed to estimate the varying background noise level. Based on this parameter, the noise level information used for subtractive-type speech enhancement can be estimated not only during speech pauses, but also during speech segments. This new subtractive-type enhancement scheme has been tested and found to perform well, not only in variable background noise level condition, but also in fixed background noise level condition.  相似文献   

7.
In this paper, we propose a speech enhancement method where the front-end decomposition of the input speech is performed by temporally processing using a filterbank. The proposed method incorporates a perceptually motivated stationary wavelet packet filterbank (PM-SWPFB) and an improved spectral over-subtraction (I-SOS) algorithm for the enhancement of speech in various noise environments. The stationary wavelet packet transform (SWPT) is a shift invariant transform. The PM-SWPFB is obtained by selecting the stationary wavelet packet tree in such a manner that it matches closely the non-linear resolution of the critical band structure of the psychoacoustic model. After the decomposition of the input speech, the I-SOS algorithm is applied in each subband, separately for the estimation of speech. The I-SOS uses a continuous noise estimation approach and estimate noise power from each subband without the need of explicit speech silence detection. The subband noise power is estimated and updated by adaptively smoothing the noisy signal power. The smoothing parameter in each subband is controlled by a function of the estimated signal-to-noise ratio (SNR). The performance of the proposed speech enhancement method is tested on speech signals degraded by various real-world noises. Using objective speech quality measures (SNR, segmental SNR (SegSNR), perceptual evaluation of speech quality (PESQ) score), and spectrograms with informal listening tests, we show that the proposed speech enhancement method outperforms than the spectral subtractive-type algorithms and improves quality and intelligibility of the enhanced speech.  相似文献   

8.
This study proposes an unsupervised noise reduction scheme that improves the performance of voice-based information retrieval tasks in mobile environments. Various types of noises could interfere with speech processing tasks, and noise reduction has become an essential technique in this field. In particular, noise reduction needs to be carefully processed in mobile environments based on the speech coding system and the client-server architecture. In this study, we propose an effective noise reduction scheme that employs the adaptive comb filtering technique. A way of directly using several codec parameters during the filtering process is also investigated. In particular, we modify the conventional comb filter using line spectral pair parameters. To verify the efficiency of the proposed noise reduction approach, we conducted speech recognition experiments using the Aurora2 database. Our approach provided superior recognition performance under various noise conditions compared to the conventional techniques.  相似文献   

9.
A signal subspace scheme based on masking properties is proposed for enhancement of speech degraded by additive noise. Since the masking properties are related to the critical frequency band that is derived from the characteristics of human cochlea, the incorporation of masking threshold into a subspace technique requires the transformation between the frequency and eigen domains. We present and apply an invertible transformation between the frequency and eigen domains. In this paper, we use masking properties of the human auditory system to define the audible noise quantity in the eigendomain. We derive the eigen-decomposition of the estimated speech autocorrelation matrix with the assumption of white noise. Subsequently, an audible noise reduction scheme is developed based on a signal subspace technique, and the implementation of our proposed scheme is outlined. We further extend the scheme to the colored noise case. Simulation results show the superiority of our proposed scheme over other existing subspace methods in terms of segmental signal-to-noise ratio (SNR), perceptual evaluation of speech quality (PESQ), modified Bark spectral distortion (MBSD), spectrogram and informal listening tests.  相似文献   

10.
In this paper, we proposed a new speech enhancement system, which integrates a perceptual filterbank and minimum mean square error–short time spectral amplitude (MMSE–STSA) estimation, modified according to speech presence uncertainty. The perceptual filterbank was designed by adjusting undecimated wavelet packet decomposition (UWPD) tree, according to critical bands of psycho-acoustic model of human auditory system. The MMSE–STSA estimation (modified according to speech presence uncertainty) was used for estimation of speech in undecimated wavelet packet domain. The perceptual filterbank provides a good auditory representation (sufficient frequency resolution), good perceptual quality of speech and low computational load. The MMSE–STSA estimator is based on a priori SNR estimation. A priori SNR estimation, which is a key parameter in MMSE–STSA estimator, was performed by using “decision directed method.” The “decision directed method” provides a trade off between noise reduction and signal distortion when correctly tuned. The experiments were conducted for various noise types. The results of proposed method were compared with those of other popular methods, Wiener estimation and MMSE–log spectral amplitude (MMSE–LSA) estimation in frequency domain. To test the performance of the proposed speech enhancement system, three objective quality measurement tests (SNR, segSNR and Itakura–Saito distance (ISd)) were conducted for various noise types and SNRs. Experimental results and objective quality measurement test results proved the performance of proposed speech enhancement system. The proposed speech enhancement system provided sufficient noise reduction and good intelligibility and perceptual quality, without causing considerable signal distortion and musical background noise.  相似文献   

11.
一种改进的维纳滤波语音增强算法   总被引:1,自引:0,他引:1       下载免费PDF全文
提出了一种改进的语音增强算法,该算法以基于先验信噪比估计的维纳滤波法为基础。首先通过计算无声段的统计平均得到初始噪声功率谱;其次,计算语音段间带噪语音功率谱,并平滑处理初始噪声功率谱和带噪语音功率谱,更新了噪声功率谱;最后,考虑了某频率点处噪声急剧增大的情况,通过计算带噪语音功率谱与噪声功率谱的比值,自适应地调整噪声功率谱。将该算法与其他基于短时谱估计的语音增强算法进行了对比实验,实验结果表明:该算法能有效地减少残留噪声和语音畸变,提高语音可懂度。  相似文献   

12.
针对语音系统受外界强噪声干扰而导致识别精度降低以及通信质量受损的问题,提出一种基于自适应噪声估计的语音增强方法。通过端点检测将语音信号分为语音段与非语音段,对这两种情况的噪声幅度谱分别进行自适应估计,并对谱减法中不具有通用性的假设进行研究从而改进原理公式。实验结果表明,相对于传统谱减法,该方法能更好地抑制音乐噪声,并保持较高清晰度和可懂度,提高了强噪声环境下的语音识别精度和通信质量。  相似文献   

13.
提出一种可适应非平稳噪声环境的基于码本学习的改进谱减语音增强算法。该算法分为训练阶段和增强阶段。训练阶段,使用自回归模型对语音和噪声的频谱形状进行建模并构造语音和噪声码本;增强阶段,采用对数谱最小化算法估计出语音和噪声的频谱,通过谱相减消除噪声。算法在每个时间帧估计语音和噪声频谱,即使在语音存在时仍能够有效跟踪快速变化的非平稳噪声;采用自回归模型能得到噪声频谱的平滑估计,减少了音乐噪声。实验仿真表明,相比于传统谱减法和多带谱减法,改进的谱减法具有更好的噪声抑制性能并且语音失真更小。  相似文献   

14.
Some approximate indexing schemes have been recently proposed in metric spaces which sort the objects in the database according to pseudo-scores. It is known that (1) some of them provide a very good trade-off between response time and accuracy, and (2) probability-based pseudo-scores can provide an optimal trade-off in range queries if the probabilities are correctly estimated. Based on these facts, we propose a probabilistic enhancement scheme which can be applied to any pseudo-score based scheme. Our scheme computes probability-based pseudo-scores using pseudo-scores obtained from a pseudo-score based scheme. In order to estimate the probability-based pseudo-scores, we use the object-specific parameters in logistic regression and learn the parameters using MAP (Maximum a Posteriori) estimation and the empirical Bayes method. We also propose a technique which speeds up learning the parameters using pseudo-scores. We applied our scheme to the two state-of-the-art schemes: the standard pivot-based scheme and the permutation-based scheme, and evaluated them using various kinds of datasets from the Metric Space Library. The results showed that our scheme outperformed the conventional schemes, with regard to both the number of distance computations and the CPU time, in all the datasets.  相似文献   

15.
This paper considers estimation of the noise spectral variance from speech signals contaminated by highly nonstationary noise sources. The method can accurately track fast changes in noise power level (up to about 10 dB/s). In each time frame, for each frequency bin, the noise variance estimate is updated recursively with the minimum mean-square error (mmse) estimate of the current noise power. A time- and frequency-dependent smoothing parameter is used, which is varied according to an estimate of speech presence probability. In this way, the amount of speech power leaking into the noise estimates is kept low. For the estimation of the noise power, a spectral gain function is used, which is found by an iterative data-driven training method. The proposed noise tracking method is tested on various stationary and nonstationary noise sources, for a wide range of signal-to-noise ratios, and compared with two state-of-the-art methods. When used in a speech enhancement system, improvements in segmental signal-to-noise ratio of more than 1 dB can be obtained for the most nonstationary noise sources at high noise levels.  相似文献   

16.
Estimating the noise power spectral density (PSD) from the corrupted speech signal is an essential component for speech enhancement algorithms. In this paper, a novel noise PSD estimation algorithm based on minimum mean-square error (MMSE) is proposed. The noise PSD estimate is obtained by recursively smoothing the MMSE estimation of the current noise spectral power. For the noise spectral power estimation, a spectral weighting function is derived, which depends on the a priori signal-to-noise ratio (SNR). Since the speech spectral power is highly important for the a priori SNR estimate, this paper proposes an MMSE spectral power estimator incorporating speech presence uncertainty (SPU) for speech spectral power estimate to improve the a priori SNR estimate. Moreover, a bias correction factor is derived for speech spectral power estimation bias. Then, the estimated speech spectral power is used in “decision-directed” (DD) estimator of the a priori SNR to achieve fast noise tracking. Compared to three state-of-the-art approaches, i.e., minimum statistics (MS), MMSE-based approach, and speech presence probability (SPP)-based approach, it is clear from experimental results that the proposed algorithm exhibits more excellent noise tracking capability under various nonstationary noise environments and SNR conditions. When employed in a speech enhancement system, improved speech enhancement performances in terms of segmental SNR improvements (SSNR+) and perceptual evaluation of speech quality (PESQ) can be observed.  相似文献   

17.
针对非平稳噪声环境和低信噪比的情况,提出了一种基于低频区语音特性的非平稳噪声估计方法,通过构造一个时变的权值,实现对噪声的实时估计,同时结合人耳听觉掩蔽效应,利用估计出的噪声自适应设定增强系数。仿真实验表明,该方法能够较好地抑制背景噪声,提高信噪比,减少语音失真。  相似文献   

18.
This paper proposes a method for enhancing speech signals contaminated by room reverberation and additive stationary noise. The following conditions are assumed. 1) Short-time spectral components of speech and noise are statistically independent Gaussian random variables. 2) A room's convolutive system is modeled as an autoregressive system in each frequency band. 3) A short-time power spectral density of speech is modeled as an all-pole spectrum, while that of noise is assumed to be time-invariant and known in advance. Under these conditions, the proposed method estimates the parameters of the convolutive system and those of the all-pole speech model based on the maximum likelihood estimation method. The estimated parameters are then used to calculate the minimum mean square error estimates of the speech spectral components. The proposed method has two significant features. 1) The parameter estimation part performs noise suppression and dereverberation alternately. (2) Noise-free reverberant speech spectrum estimates, which are transferred by the noise suppression process to the dereverberation process, are represented in the form of a probability distribution. This paper reports the experimental results of 1500 trials conducted using 500 different utterances. The reverberation time RT60 was 0.6 s, and the reverberant signal to noise ratio was 20, 15, or 10 dB. The experimental results show the superiority of the proposed method over the sequential performance of the noise suppression and dereverberation processes.  相似文献   

19.
视觉语音参数估计在视觉语音的研究中占有重要的地位.从MPEG-4定义的人脸动画参数FAP中选择24个与发音有直接关系的参数来描述视觉语音,将统计学习方法和基于规则的方法结合起来,利用人脸颜色概率分布信息和先验形状及边缘知识跟踪嘴唇轮廓线和人脸特征点,取得了较为精确的跟踪效果.在滤除参考点跟踪中的高频噪声后,利用人脸上最为突出的4个参考点估计出主要的人脸运动姿态,从而消除了全局运动的影响,最后根据这些人脸特征点的运动计算出准确的视觉语音参数,并得到了实际应用.  相似文献   

20.
Recently, several algorithms have been proposed to enhance noisy speech by estimating a binary mask that can be used to select those time-frequency regions of a noisy speech signal that contain more speech energy than noise energy. This binary mask encodes the uncertainty associated with enhanced speech in the linear spectral domain. The use of the cepstral transformation smears the information from the noise dominant time-frequency regions across all the cepstral features. We propose a supervised approach using regression trees to learn the nonlinear transformation of the uncertainty from the linear spectral domain to the cepstral domain. This uncertainty is used by a decoder that exploits the variance associated with the enhanced cepstral features to improve robust speech recognition. Systematic evaluations on a subset of the Aurora4 task using the estimated uncertainty show substantial improvement over the baseline performance across various noise conditions.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号