Similar literature
1.
葛宛营, 张天骐. 《计算机应用》 (Journal of Computer Applications), 2019, 39(10): 3065-3070
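Single-channel speech enhancement algorithms obtain enhanced speech by estimating and suppressing the noise component of noisy speech. However, noise estimation algorithms tend to overestimate, so some estimated noise energy values exceed the true values. Although these overestimates can be removed by compensation, the error this introduces also lowers the overall quality of the enhanced speech. To address this problem, a time-frequency mask estimation and optimization algorithm based on computational auditory scene analysis (CASA) is proposed. First, the decision-directed (DD) algorithm is used to estimate the a priori signal-to-noise ratio (SNR) and compute an initial mask. Second, the cross-correlation (ICC) coefficient between the noise and the noisy speech within each Gammatone band is used to compute the noise presence probability, which is combined with the noisy-speech energy spectrum to obtain a new noise estimate with less overestimation than the original. Third, an optimization algorithm iterates on the initial mask to reduce the errors caused by noise overestimation and to increase the target-speech component; iteration stops once the stopping condition is met, yielding a new mask. Finally, the new mask is used to synthesize the enhanced speech. Experimental results show that, under different background noises, the new mask gives the enhanced speech higher Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI) scores than the mask before optimization, improving both listening quality and intelligibility.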
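
The first step of the abstract above (a decision-directed a priori SNR estimate followed by an initial mask) can be illustrated with a minimal sketch. It uses the textbook decision-directed recursion and a Wiener-type gain as the mask; the paper's actual mask definition, the smoothing constant `alpha`, and the floor `xi_min` are not given in the abstract, so all three are assumptions here, as is the function name.

```python
def decision_directed_mask(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Track the a priori SNR of one frequency channel across frames.

    noisy_power[t] is |Y(t)|^2 and noise_power[t] is the (strictly positive)
    estimated noise power in frame t. alpha and xi_min are illustrative
    defaults, not values from the abstract.
    Returns (xi, mask): the a priori SNR and a Wiener-type mask per frame.
    """
    xi, mask = [], []
    prev_clean_power = 0.0  # |S_hat(t-1)|^2, taken as zero before frame 0
    for y2, d2 in zip(noisy_power, noise_power):
        gamma = y2 / d2                     # a posteriori SNR
        ml_term = max(gamma - 1.0, 0.0)     # instantaneous SNR estimate
        # Decision-directed rule: blend the previous frame's clean-speech
        # estimate with the current instantaneous estimate.
        est = alpha * prev_clean_power / d2 + (1.0 - alpha) * ml_term
        est = max(est, xi_min)              # keep the estimate above a floor
        gain = est / (1.0 + est)            # Wiener-type mask value in (0, 1)
        xi.append(est)
        mask.append(gain)
        prev_clean_power = (gain ** 2) * y2
    return xi, mask
```

A large `alpha` smooths the SNR track across frames, which is the usual reason the decision-directed rule is preferred over the purely instantaneous estimate.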

2.
Single-channel speech enhancement method based on a perceptual-masking deep neural network   Cited 1 time (self-citations: 0; by others: 1)
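This paper applies psychoacoustic masking properties to single-channel speech enhancement based on deep neural networks (DNNs) and proposes a DNN structure with perceptual masking. First, the proposed DNN is trained on magnitude-spectrum features of noisy speech and produces separate magnitude-spectrum estimates of the clean speech and the noise. Second, the estimated clean-speech magnitude spectrum is used to compute the noise masking threshold. Third, the noise masking threshold and the estimated noise magnitude spectrum are combined to compute a perceptual gain function. Finally, the perceptual gain function is used to estimate the enhanced speech magnitude spectrum from the noisy-speech magnitude spectrum. Simulations on the TIMIT database with 20 noise types at different SNRs show that, whether or not the noise type appears in the training set, the proposed perceptual-masking DNN removes noise effectively while keeping speech distortion low, and clearly outperforms common DNN-based enhancement methods and nonnegative matrix factorization (NMF) enhancement.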

3.
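Conventional single-channel speech enhancement methods reconstruct the time-domain signal using the noisy-speech phase in place of the clean-speech phase, which limits the improvement in perceived speech quality. To address this, an improved phase spectrum compensation algorithm for speech enhancement is proposed. The algorithm introduces a Sigmoid-type phase spectrum compensation function driven by the input SNR of each frame, so the phase spectrum of the noisy speech is compensated flexibly as the noise changes. The noise power spectrum is estimated by combining an improved DD a priori SNR estimate with a speech presence probability (SPP) algorithm. The new SPP-based noise power spectrum estimate and the phase spectrum compensation are then combined in Wiener filtering to improve enhancement. Compared with the conventional phase spectrum compensation (PSC) algorithm, the improved algorithm effectively suppresses various types of noise in the audio signal while improving the perceptual quality and intelligibility of the speech.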

4.
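This paper proposes an improved single-channel speech enhancement algorithm based on multi-band spectral subtraction. The phase compensation function used in the compensated phase spectrum is modified by applying the equivalent rectangular bandwidth (ERB) scale to it. The spectrally subtracted speech magnitude spectrum is then combined with the modified compensated phase spectrum, rather than with the phase spectrum of the noisy speech, to form the enhanced complex speech spectrum. Performance analysis shows that, in both subjective and objective evaluations, the proposed algorithm effectively suppresses background noise and residual noise.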

5.
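Deep neural networks (DNNs) are widely used in speech enhancement owing to their strong feature-extraction ability. To further improve DNN-based speech enhancement, a new network structure is proposed in which a DNN and a constrained Wiener filter are jointly trained and optimized. The network is first trained on the noisy-speech magnitude spectrum and produces separate magnitude-spectrum estimates of the clean speech and the noise. These estimates are then used to compute a constrained Wiener gain function. Finally, the constrained Wiener gain function is used to estimate the enhanced speech magnitude spectrum from the noisy-speech magnitude spectrum, and this estimate serves as the training output of the network. Simulations with 20 noise types at different SNRs show that, whether or not the noise type appears in the network's training set, the proposed method removes noise effectively while keeping speech distortion low, and clearly outperforms the DNN and NMF enhancement methods.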

6.
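To address the shortcomings of conventional single-channel speech enhancement methods in handling phase, and the speech distortion that commonly arises during noise reduction, a speech enhancement method combining improved phase compensation with harmonic reconstruction is proposed. A deep learning model estimates the a priori SNR, which is used to improve the conventional phase spectrum compensation (PSC) function; to counter the speech distortion introduced by noise reduction, the enhanced speech undergoes a second enhancement stage through harmonic reconstruction. Experimental results show that the improved phase compensation...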

7.
Speech enhancement based on Bark-domain noise estimation and masking effects   Cited 4 times (self-citations: 3; by others: 1)
赵欢, 熊敏, 侯卫国. 《计算机工程》 (Computer Engineering), 2009, 35(12): 261-263
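To address the degradation of noise estimation and speech enhancement performance in non-stationary environments, a fast adaptive noise spectrum estimation algorithm in the Bark domain is proposed. Based on an auditory model, it transforms the noisy signal into the Bark domain and performs speech enhancement there using the masking properties of the human ear. Simulations show that the algorithm makes full use of the correlation between frequency bins within each Bark band, tracks rapidly changing background noise, improves speech enhancement performance, and reduces computational load and complexity.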

8.
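Existing speech enhancement algorithms for hearing aids leave substantial residual background noise in non-stationary noise environments and also introduce "musical noise", resulting in unsatisfactory intelligibility and SNR of the enhanced speech. To address these problems, a binary-masking speech enhancement algorithm based on noise estimation is proposed, drawing on auditory perception theory, the characteristics of human hearing, and the working mechanism of the cochlea. The minima-controlled recursive averaging (MCRA) algorithm is used to obtain the estimated noise and a preliminary enhanced speech signal. Both are then filtered by a gammatone filterbank, which can simulate a cochlear model, to obtain their time-frequency representations. The auditory masking properties of the human ear are used to compute a binary mask for the noisy speech in the time-frequency domain, and the binary mask is applied to obtain the enhanced speech. Experimental results show that the algorithm largely removes the "musical noise" introduced by spectral subtraction and that, compared with MCRA-based spectral subtraction, the Speech Intelligibility Index (SII), Perceptual Evaluation of Speech Quality (PESQ), and signal-to-noise ratio (SNR) of the enhanced speech all improve.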
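
As a rough illustration of the masking step in the abstract above, the sketch below builds a binary mask from two time-frequency energy maps, one for the preliminary enhanced speech and one for the estimated noise. The abstract does not state the decision rule, so the local-SNR threshold used here (keep a unit when speech energy exceeds noise energy by `threshold_db`) is an assumption, as are the function names and the small constant `eps`.

```python
import math

def binary_mask(speech_energy, noise_energy, threshold_db=0.0, eps=1e-12):
    """Return a 0/1 mask with one entry per time-frequency unit.

    speech_energy and noise_energy are lists of rows (channels) of
    non-negative energies with identical shapes. A unit is kept (1) when its
    local SNR in dB is strictly above threshold_db (assumed criterion).
    """
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            local_snr_db = 10.0 * math.log10((s + eps) / (n + eps))
            row.append(1 if local_snr_db > threshold_db else 0)
        mask.append(row)
    return mask

def apply_mask(noisy_units, mask):
    """Zero out the time-frequency units that the mask marks as noise."""
    return [[x * m for x, m in zip(x_row, m_row)]
            for x_row, m_row in zip(noisy_units, mask)]
```

Because every unit is either kept or discarded outright, a binary mask avoids the isolated, fluctuating spectral peaks that are heard as musical noise, at the cost of a harder decision near the threshold.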

9.
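Deep belief network (DBN) models generalize poorly, which degrades speech enhancement. To address this, a regression DBN speech enhancement algorithm with joint feature optimization is proposed. The algorithm makes no assumptions about the speech or the noise. It extracts both log-Mel frequency power spectrum (LMPS) and Mel-frequency cepstral coefficient (MFCC) features from the speech signal. LMPS is used to reconstruct the enhanced speech directly, which preserves auditory quality, while MFCC serves as an auxiliary secondary feature. The two features are fed jointly into the DBN to optimize the network parameters. This joint optimization adds an MFCC constraint to the direct prediction of LMPS, improving the model's generalization in LMPS estimation and allowing more accurate reconstruction of the enhanced speech. Simulations show that, across SNR conditions, joint LMPS and MFCC optimization yields higher PESQ and SNR than single-feature optimization with LPS (log power spectrum) or LMPS, improving speech quality and intelligibility.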

10.
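In neural-network-based speech enhancement, inaccurate phase estimation leads to poor quality of the enhanced speech. To address this, a speech enhancement algorithm based on a complex convolutional recurrent neural network is proposed, which enhances speech magnitude and phase simultaneously in the complex domain. An encoder built on a complex convolutional network extracts local speech features in the complex domain, a complex convolutional recurrent network models long-term speech information, and a complex convolutional upsampling decoder computes a complex time-frequency mask for the speech, achieving magnitude and phase enhancement. Experiments on public datasets show that the enhanced speech obtained with the proposed method outperforms mainstream methods in both speech quality and SNR improvement, confirming the effectiveness of the network model for speech enhancement.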

11.
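The MMSE speech enhancement algorithm produces considerable speech distortion at low SNR. To address this shortcoming, an MMSE speech enhancement algorithm incorporating the auditory masking effect of the human ear is proposed. The algorithm uses the masking threshold to adjust the gain in the MMSE algorithm so that the enhanced speech has less residual noise and less distortion. SNR analysis of the speech before and after enhancement in computer simulations, together with subjective listening tests, shows that the improved MMSE algorithm not only raises the SNR of the speech signal but also reduces speech distortion and improves intelligibility.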

12.
李艳生, 刘园, 张毅. 《计算机应用》 (Journal of Computer Applications), 2019, 39(3): 894-898
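Nonnegative matrix factorization (NMF) speech enhancement leaves residual noise in non-stationary environments at low signal-to-noise ratio (SNR). To address this, a single-channel speech enhancement algorithm based on perceptual-masking reconstructed NMF (PM-RNMF) is proposed. First, psychoacoustic masking properties are applied to the NMF speech enhancement algorithm. Second, different masking thresholds are used for different frequency bins to build an adaptive perceptual masking gain function, with the thresholds constraining the residual noise energy and the speech distortion energy. Finally, the perceptual gain is corrected using the speech presence probability (SPP) and the NMF algorithm is reconstructed, yielding a new objective function. Simulations show that, in three non-stationary noise environments at different SNRs, PM-RNMF improves the average Perceptual Evaluation of Speech Quality (PESQ) score by 0.767, 0.474, and 0.162, and the average source-to-distortion ratio (SDR) by 2.785, 1.197, and 0.948, relative to NMF, reconstructed NMF (RNMF), and the perceptual-masking deep neural network (PM-DNN), respectively. The experiments also show that PM-RNMF gives better noise reduction at both low and high frequencies.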

13.
Research on speech enhancement based on short-time spectral estimation   Cited 3 times (self-citations: 2; by others: 1)
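Speech enhancement algorithms based on short-time spectral estimation offer good noise reduction and are efficient and easy to implement. This paper systematically examines algorithms of this class, including spectral subtraction, Wiener filtering, and minimum mean-square error (MMSE) estimation. Using experiments, it analyzes and compares their performance and identifies the strengths, weaknesses, and suitable operating environments of each.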
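
Two of the estimators named in the abstract above can be written as gain functions applied to each spectral bin. The sketch below uses the textbook forms of power spectral subtraction and the Wiener filter, not the specific variants evaluated in the paper; the spectral floor value and the function names are assumptions. The MMSE short-time spectral amplitude gain is left out because it needs modified Bessel functions, which the Python standard library does not provide.

```python
import math

def spectral_subtraction_gain(gamma, floor=0.01):
    """Power spectral subtraction gain for a posteriori SNR gamma = |Y|^2 / noise power.

    The squared gain is 1 - 1/gamma, limited from below by `floor`
    (an assumed spectral floor that prevents negative power).
    """
    return math.sqrt(max(1.0 - 1.0 / gamma, floor))

def wiener_gain(xi):
    """Wiener gain for a priori SNR xi = clean speech power / noise power."""
    return xi / (1.0 + xi)

def enhance_bin(noisy_magnitude, gain):
    """Apply a gain to one bin's magnitude; the noisy phase is reused unchanged."""
    return gain * noisy_magnitude
```

If the a priori SNR is replaced by its instantaneous estimate `gamma - 1`, the Wiener gain equals the square of the spectral subtraction gain, so under that estimate the Wiener filter attenuates low-SNR bins more strongly.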

14.
Speech is the main medium for human communication and interaction. Apart from traditional telephones, more and more applications come with speech interfaces, which use the speech signal as input for various purposes. However, many of these applications might fail to perform in noisy environments as the signal-to-noise ratio (SNR) degrades. Two important measures for any speech enhancement algorithm are noise suppression and speech distortion. Naturally, different speech enhancement algorithms will have different trade-offs. Moreover, depending on the environment, it is possible that one algorithm will outperform the others in some respects. This paper proposes a multi-filter system, which has the capability of continually adjusting the noise suppression level and the speech distortion level in a Pareto fashion. Moreover, we show that the system works under a variety of noisy environments and we obtain the efficient frontier of the combined filters for each background noise. Because the multi-filters adapt in parallel, the final system can be implemented efficiently on an FPGA.

15.

Speech signals are affected by background noise distortion, which harms both intelligibility and speech quality. Most speech processing algorithms operate on the spectral magnitude and leave the spectral phase unexplored and unstructured. The proposed single-channel speech enhancement model, called Adaptive Recurrent Nonnegative Matrix Factorization (AR-NMF), is designed around a phase compensation strategy with deep learning. It has two major phases: training and testing. During training, the noisy speech signal is decomposed by Hurst exponent-based Empirical Mode Decomposition (HEMD) and converted into the frequency domain using the Short-Time Fourier Transform. The new AR-NMF is then used for denoising, with the tuning factor generated by an optimized RNN. The hidden neurons are optimized using the proposed Adaptive Attack Power-based Sail Fish Optimization (AAP-SFO), which minimizes the Mean Absolute Error between the actual and predicted values. Finally, the phase-compensated speech signal is passed to the ISTFT, which yields the final denoised speech signal. In the analysis, the CSED of AAP-SFO-AR-NMF for street noise is 58.24%, 57.34%, 56.72%, and 77.37% higher than that of RNMF, esHRNR, esTSNR, and Vuvuzela, respectively. The proposed deep enhancement method is evaluated extensively across diverse adverse noisy environments, and the comparisons indicate its superiority.


16.
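To reduce the musical noise introduced by conventional spectral subtraction, a speech enhancement algorithm combining multi-band spectral subtraction with the auditory masking effect is proposed. The noise power spectrum is estimated by weighted recursive smoothing; multi-band spectral subtraction is applied to the noisy speech; the auditory masking threshold is computed; the subtraction factor is then adjusted dynamically according to the masking threshold; and the spectrum of the enhanced speech is obtained through a gain function. Simulations show that, compared with conventional spectral subtraction, the algorithm effectively suppresses background and residual noise at low SNR and markedly improves the clarity and intelligibility of the speech signal.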
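
The idea of adjusting the subtraction factor according to the masking threshold can be sketched as follows. The abstract does not give the mapping, so this example assumes one common choice: a linear interpolation that subtracts most aggressively where the masking threshold is lowest (noise is most audible) and least where it is highest. The bounds `alpha_min` and `alpha_max`, the floor factor `beta`, and both function names are illustrative assumptions.

```python
def oversubtraction_factor(threshold, t_min, t_max, alpha_min=1.0, alpha_max=6.0):
    """Map a masking threshold to a subtraction factor (assumed linear rule).

    threshold == t_min gives alpha_max; threshold == t_max gives alpha_min.
    """
    if t_max <= t_min:
        return alpha_max  # degenerate range: fall back to the strongest setting
    frac = (threshold - t_min) / (t_max - t_min)
    frac = min(max(frac, 0.0), 1.0)  # clamp thresholds outside the range
    return alpha_max - frac * (alpha_max - alpha_min)

def subtract_band(noisy_power, noise_power, alpha, beta=0.02):
    """Subtract alpha times the noise power, keeping a floor of beta times it."""
    return max(noisy_power - alpha * noise_power, beta * noise_power)
```

The floor keeps a small amount of broadband noise in place of isolated spectral peaks, which is the standard way spectral subtraction variants trade residual noise against musical noise.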

17.
A gain factor adapted by both the intra-frame masking properties of the human auditory system and the inter-frame SNR variation is proposed to enhance a speech signal corrupted by additive noise. In this article we employ an averaging factor, varying with time and frequency, to improve the estimate of the a priori SNR. In turn, this SNR estimate is used to adapt a gain factor for speech enhancement. This gain factor reduces the spectral variation over successive frames, so the effect of musical residual noise is mitigated. In addition, the simultaneous masking property of the human ear is employed to adapt the gain factor. Imperceptible residual noise with energy below the noise masking threshold is retained, resulting in a reduction of speech distortion. Experimental results show that the proposed scheme can efficiently reduce the effect of musical residual noise.

18.
Speech enhancement has received a significant amount of research attention over the past several decades. Enhancement is needed to improve a degraded speech signal; the goal is to separate a single mixture into its underlying clean speech and interferer components. This is achieved by acquiring prior knowledge through learning and generating masks accordingly. This paper employs a hybrid of spectral filtering and an optimization algorithm for speech enhancement. The proposed technique uses MMSE (Minimum Mean Squared Error) and PSO (Particle Swarm Optimization) for effective enhancement. It consists of three modules: a pre-processing module, an optimization module, and a spectral filtering module. Loizou's database and the Aurora dataset are used to evaluate the proposed technique with standard evaluation metrics, namely PESQ and SNR. A comparative analysis is also made against existing techniques such as MMSE and BNMF. The highest PESQ for the proposed technique is 2.75 and the highest SNR is about 32.97. The technique gave an average PESQ of 2.18 and an average SNR of 20.53, both higher than the averages for the other techniques. Hence, the proposed technique yields better evaluation metrics than the existing methods.

19.
刘威, 郭直清, 刘光伟, 靳宝, 王东. 《智能系统学报》 (CAAI Transactions on Intelligent Systems), 2022, 17(3): 602-616
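Atom search optimization suffers from low optimization accuracy and a tendency to become trapped in local optima. To address this, an improved atom search optimization algorithm (IASO) is proposed from the perspectives of population diversity, parameter adaptivity, and positional dynamics; it combines chaotic optimization, random amplitude compensation, and a step-size evolution mechanism, and is successfully applied to a classification task. First, the tent map (Tent chaos) is introduced to make the atom population more uniformly distributed over the search space. Second, an amplitude function is constructed to randomly perturb the algorithm parameters, and a step-size evolution factor is added to the atom position update, strengthening global search and convergence. Finally, the improved algorithm is applied to optimize the parameters of an error back-propagation (BP) neural network. Numerical comparisons with six metaheuristic algorithms on 20 benchmark functions show that IASO performs well on multidimensional benchmark functions and, when optimizing BP neural network parameters, achieves higher classification accuracy than the two algorithms it is compared with.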
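
The tent-map initialization mentioned in the abstract above can be sketched in a few lines. This is a generic chaotic initializer, not the authors' implementation: the skewed tent map with breakpoint `a = 0.7`, the seed `x0 = 0.3`, and the function names are all assumptions. A breakpoint other than 0.5 is used because the symmetric map only doubles its argument, which in binary floating point drives the sequence to exactly zero after a few dozen iterations.

```python
def tent_map(x, a=0.7):
    """One step of the skewed tent map on [0, 1] with breakpoint a (assumed value)."""
    return x / a if x < a else (1.0 - x) / (1.0 - a)

def tent_population(n_atoms, dim, lower, upper, x0=0.3, a=0.7):
    """Generate n_atoms positions of dimension dim inside [lower, upper].

    Each coordinate is the next value of one chaotic sequence, rescaled from
    [0, 1] to the search interval. x0 is an arbitrary seed in (0, 1).
    """
    x = x0
    population = []
    for _ in range(n_atoms):
        atom = []
        for _ in range(dim):
            x = tent_map(x, a)
            atom.append(lower + x * (upper - lower))
        population.append(atom)
    return population
```

For example, `tent_population(30, 10, -100.0, 100.0)` would give 30 ten-dimensional starting positions. The sequence is deterministic for a given seed, so runs are reproducible without a random number generator.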

