1.

Speech Enhancement Using Gaussian Scale Mixture Models

Hao J, Lee TW, Sejnowski TJ 《IEEE transactions on audio, speech, and language processing》2010,18(6):1127-1136

This paper presents a novel probabilistic approach to speech enhancement. Instead of a deterministic logarithmic relationship, we assume a probabilistic relationship between the frequency coefficients and the log-spectra. The speech model in the log-spectral domain is a Gaussian mixture model (GMM). The frequency coefficients obey a zero-mean Gaussian whose covariance equals the exponential of the log-spectra. This results in a Gaussian scale mixture model (GSMM) for the speech signal in the frequency domain, since the log-spectra can be regarded as scaling factors. The probabilistic relation between frequency coefficients and log-spectra allows these to be treated as two random variables, both to be estimated from the noisy signals. Expectation-maximization (EM) was used to train the GSMM and Bayesian inference was used to compute the posterior signal distribution. Because exact inference of this full probabilistic model is computationally intractable, we developed two approaches to enhance the efficiency: the Laplace method and a variational approximation. The proposed methods were applied to enhance speech corrupted by Gaussian noise and speech-shaped noise (SSN). For both approximations, signals reconstructed from the estimated frequency coefficients provided higher signal-to-noise ratio (SNR) and those reconstructed from the estimated log-spectra produced lower word recognition error rate because the log-spectra fit the inputs to the recognizer better. Our algorithms effectively reduced the SSN, which algorithms based on spectral analysis were not able to suppress.

2.

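Speech enhancement combining OM-LSA and wavelet threshold denoising

刘凤增 李国辉 李博 《计算机科学与探索》2011,5(6):547-552

To address the residual noise left by the OM-LSA (optimally modified log-spectral amplitude estimator) algorithm, a speech enhancement algorithm combining OM-LSA with wavelet threshold denoising is proposed. First, the log-spectral amplitude of the speech is estimated. Next, the residual noise is estimated from the first-level wavelet coefficients of the noisy speech and the gain function under speech absence, which overcomes the inaccurate noise estimation of conventional methods on the enhanced speech. Finally, soft thresholding is applied to the speech signal in the wavelet domain. Experimental results show that the proposed algorithm effectively removes the residual noise of the OM-LSA algorithm and achieves considerable improvements in segmental signal-to-noise ratio (SegSNR) and log-spectral distortion (LSD).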
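The soft-thresholding step named in the abstract above can be sketched as follows. This is a minimal illustration of the standard soft-threshold rule only; the function name `soft_threshold` and the fixed threshold `thr` are assumptions for illustration, and the paper's residual-noise estimate that would set the threshold is not reproduced here.

```python
def soft_threshold(coeffs, thr):
    """Shrink each wavelet coefficient toward zero by thr (hypothetical helper).

    Coefficients with magnitude <= thr are set to zero; the others keep
    their sign and lose thr from their magnitude.
    """
    shrunk = []
    for c in coeffs:
        excess = abs(c) - thr
        if excess <= 0:
            shrunk.append(0.0)
        else:
            shrunk.append(excess if c > 0 else -excess)
    return shrunk


# Example with an assumed threshold of 1.0
result = soft_threshold([3.0, -0.5, -2.0, 0.2], 1.0)
```

In practice the threshold would typically be chosen per decomposition level from a noise estimate rather than fixed by hand.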

3.

Enhancement of speech signals separated from their convolutive mixture by FDICA algorithm

Rajkishore Prasad Hiroshi Saruwatari Kyohiro Shikano 《Digital Signal Processing》2009,19(1):127-133

This paper presents a novel method for the enhancement of independent components of a mixed speech signal segregated by the frequency domain independent component analysis (FDICA) algorithm. The enhancement algorithm proposed here is based on maximum a posteriori (MAP) estimation of the speech spectral components using the generalized Gaussian distribution (GGD) function as the statistical model for the time–frequency series of speech (TFSS) signal. The proposed MAP estimator has been used and evaluated as the post-processing stage for the separation of convolutive mixtures of speech signals by the fixed-point FDICA algorithm. It has been found that the combination of the separation algorithm with the proposed enhancement algorithm provides better separation performance under both reverberant and non-reverberant conditions.

4.

Speech enhancement based on undecimated wavelet packet-perceptual filterbanks and MMSE–STSA estimation in various noise environments

Hacı Ergun 《Digital Signal Processing》2008,18(5):797-812

In this paper, we propose a new speech enhancement system, which integrates a perceptual filterbank and minimum mean square error–short time spectral amplitude (MMSE–STSA) estimation, modified according to speech presence uncertainty. The perceptual filterbank was designed by adjusting the undecimated wavelet packet decomposition (UWPD) tree according to the critical bands of the psycho-acoustic model of the human auditory system. The MMSE–STSA estimation (modified according to speech presence uncertainty) was used for estimation of speech in the undecimated wavelet packet domain. The perceptual filterbank provides a good auditory representation (sufficient frequency resolution), good perceptual quality of speech and low computational load. The MMSE–STSA estimator is based on a priori SNR estimation. A priori SNR estimation, which is a key parameter in the MMSE–STSA estimator, was performed by using the “decision directed method.” The “decision directed method” provides a trade-off between noise reduction and signal distortion when correctly tuned. The experiments were conducted for various noise types. The results of the proposed method were compared with those of other popular methods, Wiener estimation and MMSE–log spectral amplitude (MMSE–LSA) estimation in the frequency domain. To test the performance of the proposed speech enhancement system, three objective quality measurement tests (SNR, segSNR and Itakura–Saito distance (ISd)) were conducted for various noise types and SNRs. Experimental results and objective quality measurement test results confirmed the performance of the proposed speech enhancement system. The proposed speech enhancement system provided sufficient noise reduction and good intelligibility and perceptual quality, without causing considerable signal distortion and musical background noise.
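The "decision directed method" mentioned in the abstract above is commonly written as a weighted sum of the previous frame's estimated clean-speech SNR and the current frame's maximum-likelihood SNR. The sketch below shows that textbook form for a single frequency bin; the function name, argument names, and the default smoothing weight `alpha=0.98` are assumptions for illustration and are not taken from the paper.

```python
def decision_directed_snr(prev_clean_power, noisy_power, noise_power, alpha=0.98):
    """A priori SNR estimate for one frequency bin (illustrative sketch).

    prev_clean_power: estimated clean-speech power in the previous frame
    noisy_power:      observed noisy power in the current frame
    noise_power:      estimated noise power (must be > 0)
    alpha:            smoothing weight in [0, 1] (assumed value)
    """
    posterior_snr = noisy_power / noise_power
    # Maximum-likelihood term, clipped so the SNR never goes negative
    ml_term = max(posterior_snr - 1.0, 0.0)
    return alpha * prev_clean_power / noise_power + (1.0 - alpha) * ml_term


# Example with alpha = 0.5 so both terms are easy to follow by hand
xi = decision_directed_snr(prev_clean_power=4.0, noisy_power=3.0,
                           noise_power=1.0, alpha=0.5)
```

A weight close to 1 smooths the estimate strongly, which tends to reduce musical noise at the cost of slower tracking; this is the trade-off the abstract refers to.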

5.

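High-accuracy frequency estimation algorithm for sinusoidal signals based on the fast Fourier transform

樊磊 齐国清 《计算机应用》2015,35(11):3280-3283

To further improve the frequency estimation accuracy of sinusoidal signals in additive white Gaussian noise, a new frequency estimation algorithm based on the interpolated fast Fourier transform (FFT) is proposed. First, the N-point sampled sinusoid is zero-padded in the time domain to twice its length and a 2N-point FFT is computed. Next, the position of the discrete spectral line with the largest magnitude is located to obtain a coarse frequency estimate. Finally, the fine estimate is computed from the maximum-magnitude spectral line and two samples of the discrete-time Fourier transform (DTFT) of the original signal taken on either side of that line. Simulation results show that, wherever the true frequency lies between two FFT spectral lines, the root-mean-square error of the proposed estimator is close to the Cramér-Rao lower bound and is consistent across positions. The estimation accuracy is higher than that of the Candan, Fang, rational combination of three spectrum lines (RCTSL), and Aboutanios algorithms, and the signal-to-noise ratio threshold is lower, so the estimator outperforms the existing frequency estimation algorithms it was compared with.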
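The coarse-then-fine structure described in the abstract above can be sketched as follows. The coarse stage follows the abstract (a spectrum evaluated on a 2N-point grid and a peak search). The fine stage is a swapped-in technique: plain parabolic interpolation of the three magnitudes around the peak, not the paper's DTFT-sample-based estimator, whose formula is not given in the abstract. The function name and the noiseless test signal are assumptions for illustration.

```python
import cmath
import math


def estimate_frequency(x):
    """Estimate the frequency (cycles per sample) of a complex sinusoid x."""
    n = len(x)
    m = 2 * n  # 2N-point grid; the padded zeros contribute nothing to the sums
    mags = []
    for k in range(m):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / m) for t in range(n))
        mags.append(abs(acc))
    # Coarse estimate: index of the largest spectral line
    k = max(range(m), key=lambda i: mags[i])
    # Fine estimate (assumed stand-in): parabolic fit through three magnitudes
    a, b, c = mags[(k - 1) % m], mags[k], mags[(k + 1) % m]
    denom = a - 2.0 * b + c
    delta = 0.0 if denom == 0.0 else 0.5 * (a - c) / denom
    return (k + delta) / m


# Noiseless complex sinusoid at 0.21 cycles per sample, N = 32
signal = [cmath.exp(2j * math.pi * 0.21 * t) for t in range(32)]
estimate = estimate_frequency(signal)
```

The direct sum is used only to keep the sketch dependency-free; an FFT routine would compute the same grid far more efficiently.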

6.

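Speech enhancement algorithm based on gain dictionary lookup

庞亮 陈亮 张翼鹏 黄清泉 《计算机科学》2015,42(10):16-19

In speech enhancement algorithms based on statistical models, different distribution models lead to different gain functions. Because speech signals are uncertain, no single distribution function can accurately model the distributions of the speech and noise spectra, so any fixed statistical model carries some error. A speech enhancement algorithm based on gain dictionary lookup is therefore proposed. Using the log-spectral distortion criterion, the algorithm trains gains on a speech and noise database to obtain a gain dictionary whose inputs are the estimated a priori and a posteriori signal-to-noise ratios. The algorithm is evaluated with ITU-T P.862 PESQ, segmental SNR, overall SNR, and log-spectral distortion, and is compared with algorithms based on the Gaussian and Laplacian distribution models. Experimental results show that the algorithm enhances speech better than the compared algorithms in both stationary and nonstationary noise, and that musical noise and residual background noise are also well suppressed.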
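The lookup step described in the abstract above, where a gain is read from a dictionary indexed by the a priori and a posteriori SNR estimates, might look like the sketch below. The dB grid (from -20 dB to 20 dB in 1 dB steps), the function names, and the placeholder table contents are all assumptions for illustration; the abstract does not specify how the dictionary is quantized or what values training produces.

```python
import math


def snr_to_index(snr_linear, lo_db=-20.0, hi_db=20.0, step_db=1.0):
    """Map a linear SNR to a grid index on an assumed uniform dB grid."""
    snr_db = 10.0 * math.log10(max(snr_linear, 1e-12))
    clipped = min(max(snr_db, lo_db), hi_db)
    return int(round((clipped - lo_db) / step_db))


def lookup_gain(table, prior_snr, post_snr):
    """Read the gain for a (prior SNR, posterior SNR) pair from the table."""
    return table[snr_to_index(prior_snr)][snr_to_index(post_snr)]


# Placeholder 41 x 41 table whose entries encode their own indices,
# standing in for gains that would come from training.
table = [[100 * i + j for j in range(41)] for i in range(41)]
value = lookup_gain(table, prior_snr=1.0, post_snr=1e6)
```

Here 0 dB maps to row 20 and the out-of-range 60 dB input is clipped to the last column, so the lookup never indexes outside the table.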

7.

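Modified speech enhancement algorithm based on the generalized Gamma speech model

赵改华 周彬 张雄伟 《计算机工程与应用》2014,50(18):230-235

The generalized Gamma model is a recently proposed speech distribution model that is more general and flexible than traditional Gaussian or super-Gaussian models. A speech enhancement algorithm based on the generalized Gamma speech model with speech presence probability correction is proposed. Assuming that the spectral amplitude coefficients of speech and noise follow generalized Gamma and Gaussian distributions, respectively, the minimum mean-square error estimator of the speech log-spectrum is derived. The speech presence probability is then derived under the same model and used to correct the minimum mean-square error estimate. Simulation results show that, compared with traditional short-time spectral estimation algorithms, the proposed algorithm further improves the SNR of the enhanced speech, effectively reduces its distortion, and improves its subjective perceptual quality.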

8.

A novel fast nonstationary noise tracking approach based on MMSE spectral power estimator

《Digital Signal Processing》2019

Estimating the noise power spectral density (PSD) from the corrupted speech signal is an essential component of speech enhancement algorithms. In this paper, a novel noise PSD estimation algorithm based on minimum mean-square error (MMSE) is proposed. The noise PSD estimate is obtained by recursively smoothing the MMSE estimate of the current noise spectral power. For the noise spectral power estimation, a spectral weighting function is derived, which depends on the a priori signal-to-noise ratio (SNR). Since the speech spectral power is highly important for the a priori SNR estimate, this paper proposes an MMSE spectral power estimator incorporating speech presence uncertainty (SPU) for the speech spectral power estimate to improve the a priori SNR estimate. Moreover, a bias correction factor is derived for the speech spectral power estimation bias. Then, the estimated speech spectral power is used in the “decision-directed” (DD) estimator of the a priori SNR to achieve fast noise tracking. Compared to three state-of-the-art approaches, i.e., minimum statistics (MS), the MMSE-based approach, and the speech presence probability (SPP)-based approach, experimental results show that the proposed algorithm exhibits better noise tracking capability under various nonstationary noise environments and SNR conditions. When employed in a speech enhancement system, improved speech enhancement performance in terms of segmental SNR improvement (SSNR+) and perceptual evaluation of speech quality (PESQ) can be observed.

9.

Optimal speech enhancement under signal presence uncertainty using Log Gabor Wavelet and Bayesian Joint Statistics

Suman Senapati 《International Journal of Speech Technology》2013,16(4):439-459

This paper investigates the problem of speech enhancement when only a single microphone is used and the statistics of the interfering noise and speech are not available a priori. Thus it seeks to address a pitfall of many current enhancement techniques and look towards a system which would have application in the real world. This paper focuses on a Log Gabor Wavelet (LGW) based Long Term Squared Spectral Amplitude estimator using the Maximum a Posteriori (MAP) criterion. To begin with, a long term cepstral mean subtraction technique with LGW is proposed to suppress telephone channel and handset effects from the speech signals. Then a novel speech enhancer based on a MAP Bayesian Bivariate Model is developed to suppress the background noise. This work also introduces an inter-scale dependency between the coefficients and their parents by a Circularly Symmetric probability density function related to the family of Spherically Invariant Random Processes (SIRPs). The corresponding joint estimator is derived by MAP estimation theory. The inter-scale noise variance of the coefficients is kept constant, which gives a closed form solution. Consideration of a speech presence uncertainty (SPU) estimator is another contribution to the proposed estimator. Therefore, in this paper, the main contributions are: (i) a combination of LGW, SIRPs and SPU for background noise reduction; (ii) LGW and Long Term Cepstral Mean Subtraction to reduce the effects of both telephone channel and handsets; (iii) a Circularly Symmetric probability density function to exploit the inter-scale dependency between the coefficients and their parents, with the corresponding joint estimators derived by MAP estimation theory; (iv) an inter-scale noise variance of the coefficients that is kept constant, which gives a closed form solution; (v) a refinement of the magnitude estimates by scaling them by the SPU probability.
Extensive comparisons are made between the proposed and existing speech enhancement algorithms on the NOIZEUS speech database, which contains different types of noise. We report subjective and objective evaluations of the proposed methods against four classes of algorithms: spectral subtractive, subspace, statistical model based and Wiener type. Experimental results show that the proposed estimator yields a higher improvement in Segmental SNR (SSNR), lower Log Area Ratio (LAR) and Weighted Spectral Slope (WSS) distortion, and higher Perceptual Evaluation of Speech Quality (PESQ) and Mean Opinion Score (MOS) compared to the existing speech enhancement algorithms. For the SSNR measure, the proposed methods show a 2 dB improvement over existing methods for almost every noise source. For the MOS measure, the proposed methods also show an improvement over existing methods for almost every noise source. The proposed methods therefore aim to enhance speech quality and intelligibility at the same time.

10.

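Speech enhancement algorithm based on an adaptive super-Gaussian mixture model

赵改华 《数据采集与处理》2014,29(2):232-237

The complex spectral structure of speech means that its short-time spectral distribution cannot be accurately described by a single probability density function (PDF). This paper therefore proposes a new speech enhancement method that models the spectral amplitude of speech with a super-Gaussian mixture model. First, the prior distribution of the speech spectral amplitude is modeled by a super-Gaussian mixture model, which describes the multiple classes of speech characteristics better than a traditional single model. Then, the PDFs of the mixture components and their weights are updated adaptively during enhancement, overcoming the difficulty traditional models have in tracking dynamic changes in the speech distribution. Simulation results show that, compared with traditional short-time spectral estimation algorithms, the proposed algorithm substantially improves noise suppression and noticeably improves the subjective perceptual quality of the enhanced speech.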

11.

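Speech enhancement algorithm based on joint maximum a posteriori probability

李婉玲 张秋菊 《计算机系统应用》2018,27(12):163-168

To address the shortcomings of conventional spectral subtraction, an improved spectral subtraction method based on joint maximum a posteriori probability is proposed. Conventional spectral subtraction takes the difference between the magnitudes of the noisy speech and the noise and reconstructs the speech signal with the phase of the noisy speech. The subtraction produces “musical noise”, and the inaccurate phase estimate leads to unsatisfactory enhancement at low SNR. Multiband spectral subtraction and phase estimation are therefore introduced: the spectrum is divided into sub-bands and spectral subtraction is applied in each, which effectively reduces the musical noise. In addition, a phase estimator based on maximum a posteriori probability is constructed, which combines the signal magnitude and phase functions and obtains the phase estimate through repeated alternating iterations. Experimental results show that, compared with conventional spectral subtraction, the algorithm effectively improves the perceived quality and intelligibility of the enhanced speech at low SNR.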
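The conventional spectral subtraction that the abstract above takes as its baseline (subtract the noise magnitude, keep the noisy phase) can be sketched for one frame as follows. The function name and the spectral floor parameter `floor` are assumptions for illustration; the floor is a common safeguard against negative magnitudes and is not described in the abstract. The multiband and MAP phase-estimation extensions proposed in the paper are not reproduced.

```python
import cmath


def spectral_subtract(noisy_spectrum, noise_magnitude, floor=0.01):
    """Magnitude spectral subtraction for one frame (illustrative sketch).

    noisy_spectrum:  complex DFT coefficients of the noisy frame
    noise_magnitude: estimated noise magnitude per bin
    floor:           assumed fraction of the noisy magnitude kept as a minimum
    """
    enhanced = []
    for y, d in zip(noisy_spectrum, noise_magnitude):
        # Subtract the noise magnitude, never dropping below the floor
        magnitude = max(abs(y) - d, floor * abs(y))
        # Rebuild the coefficient with the unchanged noisy phase
        enhanced.append(cmath.rect(magnitude, cmath.phase(y)))
    return enhanced


frame = spectral_subtract([3 + 4j, 1 + 0j], [1.0, 2.0])
```

In the first bin the magnitude drops from 5 to 4 with the phase preserved; in the second bin the subtraction would go negative, so the floor applies.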

12.

Low bit-rate speech coding based on multicomponent AFM signal model

Mohan Bansal Pradip Sircar 《International Journal of Speech Technology》2018,21(4):783-795

In this paper, we propose a novel multicomponent amplitude and frequency modulated (AFM) signal model for parametric representation of speech phonemes. An efficient technique is developed for parameter estimation of the proposed model. The Fourier–Bessel series expansion is used to separate a multicomponent speech signal into a set of individual components. The discrete energy separation algorithm is used to extract the amplitude envelope (AE) and the instantaneous frequency (IF) of each component of the speech signal. Then, the parameter estimation of the proposed AFM signal model is carried out by analysing the AE and IF parts of the signal component. The developed model is found to be suitable for representation of an entire speech phoneme (voiced or unvoiced) irrespective of its time duration, and the model is shown to be applicable for low bit-rate speech coding. The symmetric Itakura–Saito and the root-mean-square log-spectral distance measures are used for comparison of the original and reconstructed speech signals.

13.

Bayesian marginal statistics for speech enhancement using log Gabor wavelet

Suman Senapati Neeraj Bhende Goutam Saha 《International Journal of Speech Technology》2011,14(3):193-210

This paper deals with a single-channel speech enhancement technique. Initially, the suitability of the Log Gabor Wavelet (LGW) for speech enhancement is investigated and a novel speech enhancer based on Bayesian Maximum a Posteriori (MAP) Marginal Statistical Characterization (MSC) is developed. The LGW filters are a traditional choice for obtaining localized frequency information and offer the best simultaneous localization of time and frequency information. The MSC is applied in each scale of the LGW, which means that a level dependent shrinkage rule is used to suppress the background perturbations. The pdf of the LGW filtered speech coefficient is modeled with the Generalized Laplacian Distribution (GLD), which allows a high approximation accuracy for Laplace distributed real and imaginary parts of the speech coefficients. The robustness of the proposed framework is tested on the NOIZEUS speech corpus against seven different established speech enhancement algorithms. Experimental results show that the proposed estimator yields a higher improvement in Segmental SNR (S-SNR) and lower Log Area Ratio (LAR) and Weighted Spectral Slope (WSS) distortion compared to existing speech enhancement algorithms.

14.

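Multichannel post-filtering speech enhancement algorithm based on TF-GSC

马子骥 倪忠 余旭 《传感器与微系统》2018,(5):105-107,111

Conventional speech enhancement algorithms suffer a sharp drop in noise suppression under nonstationary noise, especially when the noise is itself speech. To address this, an improved multichannel post-filtering speech enhancement algorithm based on the transfer function generalized sidelobe canceller (TF-GSC) and the optimally modified log-spectral amplitude (OM-LSA) estimator is proposed. During post-filtering, the algorithm uses the relationship between the TF-GSC output signal and the reference noise to compute the speech presence probability and update the noise power spectrum estimate. Experimental results show that the algorithm effectively suppresses nonstationary noise and improves the robustness of speech enhancement in environments where the noise is speech.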

15.

Speech enhancement using a Bayesian evidence approach

《Computer Speech and Language》2001,15(2):101-125

We consider the enhancement of speech corrupted by additive white Gaussian noise. In a Bayesian inference framework, maximum a posteriori (MAP) estimation of the signal is performed, along the lines developed by Lim & Oppenheim (1978). The speech enhancement problem is treated as a signal estimation problem, whose aim is to obtain a MAP estimate of the clean speech signal, given the noisy observations. The novelty of our approach, over previously reported work, is that we relate the variance of the additive noise and the gain of the autoregressive (AR) process to hyperparameters in a hierarchical Bayesian framework. These hyperparameters are computed from the noisy speech data to maximize the denominator in Bayes' formula, also known as the evidence. The resulting Bayesian scheme is capable of performing speech enhancement from the noisy data without the need for silence detection. Experimental results are presented for stationary and slowly varying additive white Gaussian noise. The Bayesian scheme is also compared to the Lim and Oppenheim system, and the spectral subtraction method.

16.

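MMSE spectral subtraction speech enhancement algorithm under the Laplacian distribution

王永彪 张文喜 王亚慧 孔新新 吕彤 《计算机应用》2020,40(3):878-882

To address the residual noise and speech distortion in speech enhanced by spectral subtraction algorithms based on the Gaussian distribution, a minimum mean-square error (MMSE) spectral subtraction algorithm based on the Laplacian distribution is proposed. First, the original noisy speech signal is framed and windowed, and the Fourier transform of each processed frame is computed to obtain the discrete Fourier transform (DFT) coefficients of the short-time speech. Second, the log-spectral energy and spectral flatness of each frame are computed to detect noise frames and update the noise estimate. Third, under the assumption that the speech DFT coefficients follow a Laplacian distribution, the optimal spectral subtraction coefficient is solved under the MMSE criterion and used to perform spectral subtraction, yielding the enhanced signal spectrum. Finally, the enhanced spectrum is inverse Fourier transformed and the frames are reassembled to obtain the enhanced speech. Experimental results show that the proposed algorithm improves the signal-to-noise ratio (SNR) of the enhanced speech by 4.3 dB on average, a 2 dB gain over the over-subtraction method, and improves the average perceptual evaluation of speech quality (PESQ) score by 10% over the over-subtraction method. The algorithm suppresses noise better with less speech distortion and achieves considerable improvements on the SNR and PESQ metrics.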
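The two per-frame features that the abstract above uses for noise-frame detection, log-spectral energy and spectral flatness, can be computed as in the sketch below. Spectral flatness is taken in its usual form, the ratio of the geometric mean to the arithmetic mean of the power spectrum. The function name and the small constant `eps` are assumptions for illustration, and the paper's decision thresholds are not given in the abstract, so no detection rule is shown.

```python
import math


def frame_features(power_spectrum, eps=1e-12):
    """Return (log_energy, spectral_flatness) for one frame (sketch).

    power_spectrum: non-negative power values, one per frequency bin
    eps:            assumed small constant guarding log(0) and division by 0
    """
    n = len(power_spectrum)
    arithmetic_mean = sum(power_spectrum) / n
    # Geometric mean computed in the log domain for numerical stability
    log_mean = sum(math.log(p + eps) for p in power_spectrum) / n
    geometric_mean = math.exp(log_mean)
    log_energy = math.log(sum(power_spectrum) + eps)
    flatness = geometric_mean / (arithmetic_mean + eps)
    return log_energy, flatness


flat = frame_features([1.0, 1.0, 1.0, 1.0])       # noise-like, flat spectrum
peaky = frame_features([100.0, 0.01, 0.01, 0.01])  # tonal, peaky spectrum
```

A flat spectrum gives a flatness near 1, while a spectrum dominated by one bin gives a value near 0, which is why the feature helps separate noise-like frames from voiced speech.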

17.

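Improved spectral subtraction speech enhancement algorithm based on codebook learning

隋璐瑛 张雄伟 黄建军 赵改华 《计算机工程与应用》2013,49(16):216-220

An improved spectral subtraction speech enhancement algorithm based on codebook learning is proposed that adapts to nonstationary noise environments. The algorithm consists of a training stage and an enhancement stage. In the training stage, autoregressive models are used to model the spectral shapes of speech and noise, and speech and noise codebooks are constructed. In the enhancement stage, the speech and noise spectra are estimated with a log-spectrum minimization algorithm, and the noise is removed by spectral subtraction. Because the speech and noise spectra are estimated in every time frame, fast-changing nonstationary noise can be tracked effectively even while speech is present. The autoregressive model yields a smooth estimate of the noise spectrum, which reduces musical noise. Simulations show that, compared with conventional spectral subtraction and multiband spectral subtraction, the improved method suppresses noise better with less speech distortion.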

18.

Speech enhancement based on stationary bionic wavelet transform and maximum a posterior estimator of magnitude-squared spectrum

Talbi Mourad 《International Journal of Speech Technology》2017,20(1):75-88

Numerous efforts have focused on the problem of reducing the impact of noise on the performance of various speech systems such as speech coding, speech recognition and speaker recognition. These approaches consider alternative speech features, improved speech modeling, or alternative training for acoustic speech models. In this paper, we propose a new speech enhancement technique, which integrates a newly proposed wavelet transform, which we call the stationary bionic wavelet transform (SBWT), and the maximum a posteriori estimator of magnitude-squared spectrum (MSS-MAP). The SBWT is introduced in order to solve the problem of perfect reconstruction associated with the bionic wavelet transform. The MSS-MAP estimation was used for estimation of speech in the SBWT domain. The experiments were conducted for various noise types and different speech signals. The results of the proposed technique were compared with those of other popular methods such as Wiener filtering and MSS-MAP estimation in the frequency domain. To test the performance of the proposed speech enhancement system, four objective quality measurement tests [signal to noise ratio (SNR), segmental SNR, Itakura–Saito distance and perceptual evaluation of speech quality] were conducted for various noise types and SNRs. Experimental results and objective quality measurement test results confirmed the performance of the proposed speech enhancement technique. It provided sufficient noise reduction and good intelligibility and perceptual quality, without causing considerable signal distortion and musical background noise.

19.

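Speech enhancement based on Kalman filtering in impulsive noise environments

何志勇 朱忠奎 《计算机应用》2011,31(12):3441-3445

Speech enhancement aims to extract clean speech from a noisy signal. In some environments clean speech is corrupted by impulsive noise, whose time-domain distribution makes enhancement difficult, so conventional methods rarely achieve satisfactory results. A new method is proposed for speech enhancement in stationary impulsive noise. The method identifies the frequency band in which the ratio of impulsive-noise sample energy to noisy-signal sample energy is largest, and uses the energy distribution in that band to decide frame by frame whether the speech signal is corrupted by impulsive noise. Kalman filtering is then applied only to the corrupted frames, and the autoregressive (AR) model parameter estimation used by the conventional algorithm is improved. In the experiments, speech was corrupted by white and colored impulsive noise, and signals with low input SNR were enhanced. The results show that the proposed algorithm significantly improves the SNR and suppresses impulsive noise.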
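The Kalman filtering with an AR signal model mentioned in the abstract above can be illustrated with the simplest possible case: a scalar filter for a first-order AR process observed in additive noise. This is a generic textbook sketch under that simplifying assumption, not the paper's method, which uses an AR model whose order and improved parameter estimation are not described in the abstract. The function name and all parameter names are hypothetical.

```python
def kalman_ar1(observations, a, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x[k] = a*x[k-1] + w, y[k] = x[k] + v.

    a: AR(1) coefficient; q: process noise variance (assumed > 0);
    r: observation noise variance; x0, p0: initial state and variance.
    """
    x, p = x0, p0
    estimates = []
    for y in observations:
        # Prediction from the AR(1) model
        x_pred = a * x
        p_pred = a * a * p + q
        # Update with the new observation
        gain = p_pred / (p_pred + r)
        x = x_pred + gain * (y - x_pred)
        p = (1.0 - gain) * p_pred
        estimates.append(x)
    return estimates


ys = [1.0, 2.0, 3.0]
trusted = kalman_ar1(ys, a=0.9, q=1.0, r=0.0)   # noiseless observations
ignored = kalman_ar1(ys, a=0.9, q=1.0, r=1e9)   # extremely noisy observations
```

With zero observation noise the filter follows the observations exactly, and with very large observation noise it stays close to the model prediction; practical settings fall between these two extremes.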

20.

Directionlet-based method using the Gaussian mixture prior to SAR image despeckling

Yixiang Lu Dong Sun Dexiang Zhang 《International journal of remote sensing》2013,34(3):1143-1161

In this article, a new denoising algorithm is proposed based on the directionlet transform and the maximum a posteriori (MAP) estimation. The detailed directionlet coefficients of the logarithmically transformed noise-free image are considered to be Gaussian mixture probability density functions (PDFs) with zero means, and the speckle noise in the directionlet domain is modelled as additive noise with a Gaussian distribution. Then, we develop a Bayesian MAP estimator using these assumed prior distributions. Because the estimator that is the solution of the MAP equation is a function of the parameters of the assumed mixture PDF models, the expectation-maximization (EM) algorithm is also utilized to estimate the parameters, including weight factors and variances. Finally, the noise-free SAR image is restored from the estimated coefficients yielded by the MAP estimator. Experimental results show that the directionlet-based MAP method can be successfully applied to images and real synthetic aperture radar images to denoise speckle.