Similar Articles
20 similar articles found (search time: 15 ms)
1.
To address the insufficient sparsity of nonnegative matrix factorization (NMF), a speech enhancement algorithm based on nonsmooth NMF is proposed, in which a smoothing matrix is introduced to regulate the sparsity of the dictionary and coefficient matrices. The algorithm builds a joint dictionary matrix from prior dictionaries learned on speech and noise; it then enhances speech by using nonsmooth NMF to update the projection coefficients of the noisy speech over the joint dictionary, while a sliding-window scheme updates the prior noise dictionary in real time. Simulation results show that the algorithm suppresses noise better than both the standard NMF speech enhancement algorithm and the MMSE algorithm.
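The factorization step at the core of this family of methods can be sketched as plain NMF with Lee-Seung multiplicative updates (Euclidean loss). This is a minimal stand-in, not the paper's nonsmooth variant with a smoothing matrix; all parameter values are illustrative:

```python
import numpy as np

def nmf(V, rank, iters=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix V ~= W @ H via multiplicative updates.
    In NMF speech enhancement, V would be a magnitude spectrogram and the
    columns of W the learned dictionary atoms."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(iters):
        # Each update is guaranteed not to increase ||V - W @ H||^2.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

In the enhancement stage described in the abstract, W would be the fixed joint speech-plus-noise dictionary and only H would be updated on the noisy spectrogram.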

2.
He Zhiyong, Zhu Zhongkui. 《计算机应用》 (Journal of Computer Applications), 2011, 31(12): 3441-3445
Speech enhancement aims to extract clean speech from a noisy signal. In some environments clean speech is corrupted by impulsive noise, whose temporal distribution makes enhancement difficult, so traditional methods struggle to achieve satisfactory results. A new method is proposed for speech enhancement under stationary impulsive noise. It first determines the frequency band in which the ratio of impulsive-noise energy to noisy-signal energy is largest, then uses the energy distribution in that band to decide, frame by frame, whether the speech signal is contaminated by impulsive noise. The Kalman filtering algorithm is applied only to the contaminated frames, and the autoregressive (AR) model parameter estimation of the conventional algorithm is improved. In experiments, speech signals were corrupted by white and colored impulsive noise and enhanced at low input SNRs; the results show that the proposed algorithm significantly improves SNR and suppresses impulsive noise.
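The frame-by-frame contamination test described above can be sketched as a band-energy-ratio detector. The band and threshold below are illustrative assumptions; the paper selects the band where the impulse-to-noisy-signal energy ratio is largest:

```python
import numpy as np

def impulse_frames(frames, fs, band=(3000, 8000), thresh=0.5):
    """Flag frames whose fraction of spectral energy inside `band`
    exceeds `thresh`. Broadband impulses spread energy across the
    spectrum, so they score high; voiced speech concentrates energy
    at low frequencies and scores low."""
    flags = []
    for x in frames:
        spec = np.abs(np.fft.rfft(x)) ** 2
        freqs = np.fft.rfftfreq(len(x), 1 / fs)
        sel = (freqs >= band[0]) & (freqs < band[1])
        flags.append(spec[sel].sum() / (spec.sum() + 1e-12) > thresh)
    return np.array(flags)
```

Only frames flagged True would then be passed to the Kalman-filter denoising stage.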

3.
To address the poor enhancement quality and speech distortion of existing speech enhancement algorithms, an algorithm combining a sparse low-rank model with improved phase spectrum compensation is proposed. First, the magnitude spectrum of the noisy speech is processed with the sparse low-rank model to obtain the separated speech. Next, the compensation factor of the phase spectrum compensation algorithm is optimized with a normalized least-mean-square adaptive filter. The separated speech is then processed with the improved phase spectrum compensation to obtain the final enhanced speech, which is evaluated by perceptual speech quality assessment and spectral analysis. Experimental results show that the method removes noise effectively and preserves speech clarity at low SNRs.

4.
Speech enhancement via grouped-separable compressive sensing  (Cited by: 1; self-citations: 0; other citations: 1)
Compressive sensing (CS) is a sampling method based on signal sparsity that can efficiently extract the information a signal contains. A new grouped-separable CS speech enhancement algorithm is proposed. Exploiting the sparsity of speech in the discrete fast Fourier transform (FFT) domain, it designs a complex-valued measurement matrix and a soft threshold to compressively measure and denoise the noisy speech, then recovers the speech signal with the Sparse Reconstruction by Separable Approximation (SpaRSA) algorithm. Experiments show that compressive reconstruction of the noisy signal raises the SNR substantially and suppresses background noise more effectively.
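The soft-threshold denoising step used in this CS pipeline can be sketched as the standard magnitude-shrinkage operator (written complex-aware, since the measurements here are complex-valued; the threshold value would be tuned to the noise level):

```python
import numpy as np

def soft_threshold(x, lam):
    """Shrink each entry's magnitude toward zero by lam, preserving its
    sign/phase; entries with magnitude <= lam are zeroed. This is the
    proximal operator of the l1 norm used throughout sparse recovery."""
    mag = np.abs(x)
    return np.where(mag > lam, (1 - lam / np.maximum(mag, 1e-12)) * x, 0)
```

SpaRSA itself repeatedly alternates a gradient step on the data-fit term with exactly this kind of shrinkage step.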

5.
Sparse decomposition of speech signals based on the MP algorithm  (Cited by: 4; self-citations: 1; other citations: 3)
Sparse decomposition is a new way of decomposing a speech signal into a very compact approximate representation, and it can support many areas of speech processing, such as compression, denoising, and recognition. This work uses the Matching Pursuit (MP) algorithm to realize sparse decomposition of speech signals; experimental results show that MP-based sparse decomposition achieves good reconstruction accuracy and high sparsity.
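The MP algorithm referenced above can be sketched in a few lines: greedily pick the dictionary atom most correlated with the current residual, record its coefficient, and subtract its contribution (assuming unit-norm atoms as the columns of D):

```python
import numpy as np

def matching_pursuit(x, D, n_atoms):
    """Greedy sparse decomposition of x over dictionary D (unit-norm
    columns). Returns the coefficient vector and the final residual;
    x ~= D @ coeffs after n_atoms iterations."""
    residual = x.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        corr = D.T @ residual              # correlation with every atom
        k = np.argmax(np.abs(corr))        # best-matching atom
        coeffs[k] += corr[k]
        residual -= corr[k] * D[:, k]      # remove its contribution
    return coeffs, residual
```

For speech, D is typically a large overcomplete dictionary (e.g. Gabor atoms), so practical implementations accelerate the correlation search.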

6.
Based on human auditory perception, a new speech enhancement algorithm is proposed that combines a wavelet-packet perceptual filterbank with a statistical method. Following the Bark-scale frequency sensitivity of the human ear, the filterbank divides the noisy speech spectrum into 24 critical bands, and minimum mean-square error log-spectral amplitude (MMSE-LSA) estimation is applied within each band. The a priori SNR of each band is estimated to obtain the gain function between the target speech and the noisy speech, yielding the estimated speech for that band; the per-band estimates are finally recombined to give the enhanced speech. Experimental results show that the method outperforms other methods under a variety of noise conditions.
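The per-critical-band gain idea can be sketched as follows: map FFT bins to Bark bands with the standard Hz-to-Bark formula, estimate each band's a priori SNR, and apply a gain. A Wiener-style gain is used here as a simplified stand-in for the MMSE-LSA estimator (whose exact gain involves an exponential integral); all floor values are illustrative:

```python
import numpy as np

def bark_band(f_hz):
    """Map frequency in Hz to its critical-band (Bark) index, 0-23,
    using Zwicker's formula."""
    f = np.asarray(f_hz, dtype=float)
    z = 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)
    return np.minimum(z.astype(int), 23)

def bark_gain(noisy_psd, noise_psd, freqs, xi_min=1e-3):
    """One gain per Bark band from the band's estimated a priori SNR."""
    bands = bark_band(freqs)
    gain = np.empty_like(noisy_psd)
    for b in np.unique(bands):
        sel = bands == b
        xi = max(noisy_psd[sel].mean() / (noise_psd[sel].mean() + 1e-12) - 1.0,
                 xi_min)
        gain[sel] = xi / (1.0 + xi)  # Wiener gain as MMSE-LSA stand-in
    return gain
```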

7.
To overcome the shortcomings of conventional spectral subtraction, an improved spectral subtraction based on joint maximum a posteriori (MAP) estimation is proposed. Conventional spectral subtraction reconstructs the speech signal from the magnitude difference between the noisy speech and the noise together with the noisy phase; the subtraction introduces "musical noise," and the inaccurate phase estimate degrades enhancement at low SNR. The proposed method introduces multiband spectral subtraction and phase estimation: the spectrum is divided into sub-bands and subtraction is performed within each, which effectively reduces musical noise; meanwhile a MAP phase estimator is built that jointly models the magnitude and phase functions and obtains the phase estimate through alternating iterations. Experiments show that, compared with conventional spectral subtraction, the algorithm effectively improves both the perceived quality and the intelligibility of the enhanced speech at low SNR.
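The multiband subtraction step can be sketched as below: each sub-band gets an SNR-dependent over-subtraction factor, and the result is floored to limit musical noise. The slope and limits of alpha are common heuristics from the multiband spectral subtraction literature, not this paper's exact values:

```python
import numpy as np

def multiband_specsub(noisy_mag, noise_mag, band_edges, beta=0.02):
    """Per-band magnitude spectral subtraction. band_edges is a list of
    (lo, hi) bin ranges; low-SNR bands are over-subtracted harder, and
    every bin is floored at beta * noisy to avoid negative magnitudes."""
    out = np.empty_like(noisy_mag)
    for lo, hi in band_edges:
        snr_db = 10 * np.log10(
            (noisy_mag[lo:hi] ** 2).sum()
            / ((noise_mag[lo:hi] ** 2).sum() + 1e-12) + 1e-12)
        alpha = np.clip(4.0 - 0.15 * snr_db, 1.0, 5.0)  # over-subtraction
        out[lo:hi] = np.maximum(noisy_mag[lo:hi] - alpha * noise_mag[lo:hi],
                                beta * noisy_mag[lo:hi])
    return out
```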

8.
Audio classification is an important problem in signal processing and pattern recognition, with potential applications in audio retrieval, documentation and scene analysis. As in general signal classification systems, it involves both training and classification (or testing) stages. The performance of an audio classification system, such as its complexity and classification accuracy, depends highly on the choice of signal features and classifiers. Several features have been widely exploited in existing methods, such as mel-frequency cepstrum coefficients (MFCCs), line spectral frequencies (LSF) and short-time energy (STE). In this paper, instead of using these well-established features, we explore the potential of sparse features derived from a dictionary of signal atoms using sparse coding (e.g. orthogonal matching pursuit, OMP), where the atoms are adapted directly to the audio training data using the K-SVD dictionary learning algorithm. To reduce the computational complexity, we propose to perform pooling and sampling operations on the sparse coefficients. Such operations also help maintain a unified feature dimension, regardless of the various lengths of the training and testing signals. Using the popular support vector machine (SVM) as the classifier, we examine the performance of the proposed classification system on two binary classification problems, speech-music classification and male-female speech discrimination, and a multi-class problem, speaker identification. The experimental results show that the sparse (max-pooled and average-pooled) coefficients perform better than the classical MFCC features, in particular for noisy audio data.
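The pooling operation described in this abstract can be sketched in one step: collapse the frame axis of the sparse-code matrix so that signals of any length produce a feature vector of fixed dimension. This is a minimal illustration; the paper's exact pooling and sampling scheme may differ:

```python
import numpy as np

def pool_sparse_codes(C, mode="max"):
    """Pool an (n_atoms x n_frames) sparse-coefficient matrix into a
    single length-n_atoms feature vector: max-pooling keeps each atom's
    strongest activation, mean-pooling its average activation."""
    A = np.abs(C)
    return A.max(axis=1) if mode == "max" else A.mean(axis=1)
```

The pooled vector is what would be fed to the SVM, making the classifier input independent of the clip's duration.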

9.
To address the low recognition rate of noisy mask speech, speech enhancement is combined with recognition: noise suppression raises the SNR of the mask speech. An improved Wiener filtering method is proposed in which speech and non-speech frames detected by spectral entropy are used to update the noise power spectrum, and a parameter is introduced to control the gain function. Mel-frequency cepstral coefficients (MFCCs) of the mask speech are extracted as features, and a convolutional neural network (CNN) performs training and recognition, with local response normalization (LRN) applied after each pooling layer for optimization. Experimental results show that the recognition system substantially improves the recognition rate of noisy mask speech.

10.
Spectral subtraction based on the Gaussian distribution leaves residual noise and distorts the enhanced speech, so a minimum mean-square error (MMSE) spectral subtraction algorithm based on the Laplacian distribution is proposed. First, the noisy speech is framed, windowed, and Fourier transformed to obtain the discrete Fourier transform (DFT) coefficients of each short-time frame. Noise frames are then detected from each frame's log-spectral energy and spectral flatness to update the noise estimate. Next, under the assumption that the speech DFT coefficients follow a Laplacian distribution, the optimal subtraction coefficient is derived under the MMSE criterion and used for spectral subtraction to obtain the enhanced spectrum. Finally, the enhanced speech is reconstructed by inverse Fourier transform and frame synthesis. Experimental results show that speech enhanced by the proposed algorithm gains 4.3 dB of SNR on average, 2 dB more than over-subtraction, and its perceptual evaluation of speech quality (PESQ) score is on average 10% higher than over-subtraction; the algorithm suppresses noise better with less speech distortion, improving markedly on both the SNR and PESQ criteria.
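The spectral-flatness cue used for noise-frame detection above can be sketched directly: the ratio of the geometric to the arithmetic mean of the power spectrum is close to 1 for noise-like frames and close to 0 for harmonic speech frames (the decision threshold would be tuned empirically):

```python
import numpy as np

def spectral_flatness(frame):
    """Geometric mean over arithmetic mean of the frame's power
    spectrum; a standard noise-likeness measure in [0, 1]."""
    p = np.abs(np.fft.rfft(frame)) ** 2 + 1e-12  # floor avoids log(0)
    return np.exp(np.mean(np.log(p))) / np.mean(p)
```

Frames with high flatness (and low log-spectral energy) would be treated as noise-only and used to update the noise estimate.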

11.
In conventional single-channel speech enhancement, the noisy phase replaces the clean phase when the time-domain signal is reconstructed, which limits the improvement in perceived speech quality. An improved phase spectrum compensation (PSC) speech enhancement algorithm is therefore proposed. It introduces a sigmoid-shaped phase compensation function driven by each frame's input SNR, so that the compensation of the noisy phase spectrum adapts flexibly to changes in the noise; the noise power spectrum is estimated by combining an improved decision-directed (DD) a priori SNR estimate with a speech presence probability (SPP) method; and Wiener filtering combines the SPP-based noise power estimate with the phase compensation to improve enhancement. Compared with conventional PSC, the improved algorithm effectively suppresses various kinds of noise in the audio signal while improving both the perceived quality and the intelligibility of the speech.
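The SNR-driven sigmoid compensation strength described above can be sketched as a simple logistic curve: strong compensation at low frame SNR, fading out at high SNR. The shape parameters below are illustrative assumptions, not the paper's fitted values:

```python
import numpy as np

def psc_factor(snr_db, k=0.5, c=0.0, lam_max=3.0):
    """Sigmoid phase-compensation strength as a function of the frame's
    input SNR in dB: approaches lam_max for very noisy frames and 0 for
    clean frames. k sets the slope, c the midpoint SNR."""
    return lam_max / (1 + np.exp(k * (snr_db - c)))
```

In a PSC pipeline this factor scales the antisymmetric compensation term added to the noisy spectrum before the phase is extracted.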

12.
Looking at the speaker's face can help a listener hear a speech signal in a noisy environment and extract it from competing sources before identification. This suggests that the visual signals of speech (movements of the visible articulators) could be used in speech enhancement or extraction systems. In this paper, we present a novel algorithm that plugs the audiovisual coherence of speech signals, estimated with statistical tools, into audio blind source separation (BSS) techniques. The algorithm is applied to the difficult and realistic case of convolutive mixtures, and works mainly in the frequency (transform) domain, where the convolutive mixture becomes an additive mixture in each frequency channel. Frequency-by-frequency separation is performed by an audio BSS algorithm. The audio and visual information is modeled by a newly proposed statistical model, which is then used to resolve the standard source permutation and scale-factor ambiguities encountered in each frequency channel after the audio blind separation stage. The proposed method is shown to be efficient for 2 × 2 convolutive mixtures and offers promising perspectives for extracting a particular speech source of interest from complex mixtures.

13.
Building on the construction of the Bark wavelet, an improved construction method is proposed in which the Bark wavelet center frequencies are set directly from the critical-band center frequencies, keeping the passbands aligned with the critical bands and closely matched to the human auditory system. The noisy speech is decomposed with the Bark wavelet, thresholded at the sub-band level with an infinitely differentiable function similar to a soft threshold, and then enhanced a second time with spectral subtraction. Simulations show that the constructed Bark wavelet together with the enhancement algorithm raises both SNR and PESQ scores considerably; at high input SNR in particular, the enhanced speech is very clear and intelligible.

14.
To address the low clarity and intelligibility of noisy mask speech, an enhancement method combining compressive sensing with empirical mode decomposition (EMD) is proposed. The noisy mask speech is first decomposed by EMD into its intrinsic mode function components, and wavelet thresholding is applied to selected components; compressive sensing is then applied to all of the components, and finally the components are reconstructed to give the enhanced mask speech. Experimental results show that the proposed method denoises well, with small reconstruction error and high stability, effectively enhancing mask speech.

15.
This paper presents a method for reconstructing unreliable spectral components of speech signals using the statistical distributions of the clean components. Our goal is to model the temporal patterns in the speech signal and exploit correlations between speech features in the time and frequency domains simultaneously. In this approach, a hidden Markov model (HMM) is first trained on clean speech data to model the temporal patterns that appear in sequences of spectral components. Using this model, and according to the probability of a noisy spectral component occurring in each state, a probability distribution for each noisy component is estimated. Maximum a posteriori (MAP) estimation is then applied to these distributions to obtain the final estimates of the unreliable spectral components. The proposed method is compared with a common missing-feature method based on probabilistic clustering of the feature vectors, and with a state-of-the-art method based on sparse reconstruction. The experimental results exhibit a significant improvement in recognition accuracy on a noise-polluted Persian corpus.

16.
In this paper, we propose a new speech enhancement system that integrates a perceptual filterbank with minimum mean square error-short time spectral amplitude (MMSE-STSA) estimation, modified according to speech presence uncertainty. The perceptual filterbank is designed by adjusting an undecimated wavelet packet decomposition (UWPD) tree according to the critical bands of the psychoacoustic model of the human auditory system. The modified MMSE-STSA estimation is used to estimate speech in the undecimated wavelet packet domain. The perceptual filterbank provides a good auditory representation (sufficient frequency resolution), good perceptual speech quality and low computational load. The MMSE-STSA estimator relies on a priori SNR estimation, its key parameter, which is performed with the "decision-directed" method; when correctly tuned, this method provides a trade-off between noise reduction and signal distortion. Experiments were conducted for various noise types, and the results of the proposed method were compared with those of other popular methods, Wiener estimation and MMSE-log spectral amplitude (MMSE-LSA) estimation in the frequency domain. To test the performance of the proposed system, three objective quality measures (SNR, segSNR and the Itakura-Saito distance, ISd) were computed for various noise types and SNRs. The experimental results and objective quality measures confirm that the proposed speech enhancement system provides substantial noise reduction with good intelligibility and perceptual quality, without causing considerable signal distortion or musical background noise.
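The decision-directed a priori SNR estimate named in this abstract (originally due to Ephraim and Malah) can be sketched per frequency channel as a blend of the previous frame's clean-speech estimate and the current instantaneous SNR:

```python
import numpy as np

def decision_directed(gamma, prev_speech_power, prev_noise_power, alpha=0.98):
    """A priori SNR estimate: alpha weights the previous frame's
    speech-power estimate over the noise power; (1 - alpha) weights the
    current a posteriori SNR gamma minus one, floored at zero."""
    return alpha * prev_speech_power / (prev_noise_power + 1e-12) \
        + (1 - alpha) * np.maximum(gamma - 1.0, 0.0)
```

The smoothing constant alpha close to 1 is exactly the "tuning" the abstract mentions: larger values reduce musical noise at the cost of slower tracking.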

17.
A multiband nonlinear spectral subtraction based on the human auditory system is proposed for speech enhancement. Following auditory characteristics, the noisy speech signal is divided into 24 critical bands, and the subtraction parameter of each band is determined by that band's SNR. Experimental results show that, under identical conditions, the method yields a higher output SNR than power spectral subtraction (PSS), nonlinear spectral subtraction (NSS), and conventional multiband spectral subtraction (MBSS); it removes background noise and suppresses residual noise better, and the enhanced speech has better intelligibility and clarity.

18.
By analyzing the spectral characteristics of noise and speech, this paper applies different endpoint detection methods to different operating environments: short-time zero-crossing rate, short-time amplitude, and speech duration are used for endpoint detection in quasi-quiet environments, while spectral enhancement of the speech is applied first to achieve endpoint detection in noisy environments.
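The quasi-quiet-environment detector described above can be sketched with the two classic short-time features: amplitude flags voiced segments, and zero-crossing rate catches weak fricatives. The thresholds below are illustrative and would in practice be calibrated on leading silence:

```python
import numpy as np

def detect_endpoints(x, frame_len=256, energy_thr=0.01, zcr_thr=0.3):
    """Mark each frame as speech (True) when its mean absolute amplitude
    or its zero-crossing rate exceeds its threshold. Thresholds assume a
    signal roughly normalized to [-1, 1]."""
    n = len(x) // frame_len
    flags = []
    for i in range(n):
        f = x[i * frame_len:(i + 1) * frame_len]
        amp = np.mean(np.abs(f))                      # short-time amplitude
        zcr = np.mean(np.abs(np.diff(np.sign(f)))) / 2  # crossings per sample
        flags.append(amp > energy_thr or zcr > zcr_thr)
    return np.array(flags)
```

A full detector would add the duration constraint from the abstract, discarding speech runs shorter than a minimum word length.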

19.
A sparse nonnegative matrix factorization speech enhancement algorithm based on the alternating direction method of multipliers (ADMM) is proposed. It overcomes the slow convergence and tendency toward local optima of classical nonnegative matrix factorization (NMF) speech enhancement, while exploiting the strong sparsity of ADMM-based factorization. The algorithm has a training stage and an enhancement stage: in training, ADMM-based NMF is applied to the noise spectrum to extract a noise dictionary, which is stored as prior information for the enhancement stage; in enhancement, sparse NMF estimates the speech dictionary and speech encoding from the noisy speech spectrum and reconstructs the original clean speech. Experiments show that the algorithm is faster and introduces less distortion into the enhanced speech, with especially marked gains under transient noise.

20.
An effective way to increase noise robustness in automatic speech recognition is to label the noisy speech features as either reliable or unreliable ('missing'), and to replace ('impute') the missing ones with clean speech estimates. Conventional imputation techniques employ parametric models and impute the missing features on a frame-by-frame basis. At low SNRs, frame-based imputation techniques fail because many time frames contain few, if any, reliable features. In previous work, we introduced an exemplar-based method, dubbed sparse imputation, which can impute missing features using reliable features from neighbouring frames; it achieved substantial gains in performance at low SNRs on a connected digit recognition task. In this work, we investigate whether the exemplar-based approach can be generalised to a large vocabulary task. Experiments on artificially corrupted speech show that sparse imputation substantially outperforms a conventional imputation technique when the ideal 'oracle' reliability of the features is used. With error-prone estimates of feature reliability, sparse imputation performs comparably to our baseline imputation technique in the cleanest conditions, and substantially better at lower SNRs. With noisy speech recorded in realistic noise conditions, sparse imputation performs slightly worse than our baseline imputation technique in the cleanest conditions, but substantially better in the noisier conditions.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号