首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
It is essential to ensure quality of service (QoS) when offering a speech recognition service for use in noisy environments. This means that the recognition performance in the target noise environment must be investigated. One approach is to estimate the recognition performance from a distortion value, which represents the difference between noisy speech and its original clean version. Previously, estimation methods using the segmental signal-to-noise ratio (SNRseg), the cepstral distance (CD), and the perceptual evaluation of speech quality (PESQ) have been proposed. However, their estimation accuracy has not been verified for the case when a noise reduction algorithm is adopted as a preprocessing stage in speech recognition. We, therefore, evaluated the effectiveness of these distortion measures by experiments using the AURORA-2J connected digit recognition task and four different noise reduction algorithms. The results showed that in each case the distortion measure correlates well with the word accuracy when the estimators used are optimized for each individual noise reduction algorithm. In addition, it was confirmed that when a single estimator, optimized for all the noise reduction algorithms, is used, the PESQ method gives a more accurate estimate than SNRseg and CD. Furthermore, we have proposed the use of artificial voice of several seconds duration instead of a large amount of real speech and confirmed that a relatively accurate estimate can be obtained by using the artificial voice.  相似文献   

2.
We describe a novel single-ended algorithm constructed from models of speech signals, including clean and degraded speech, and speech corrupted by multiplicative noise and temporal discontinuities. Machine learning methods are used to design the models, including Gaussian mixture models, support vector machines, and random forest classifiers. Estimates of the subjective mean opinion score (MOS) generated by the models are combined using hard or soft decisions generated by a classifier which has learned to match the input signal with the models. Test results show the algorithm outperforming ITU-T P.563, the current “state-of-art” standard single-ended algorithm. Employed in a distributed double-ended measurement configuration, the proposed algorithm is found to be more effective than P.563 in assessing the quality of noise reduction systems and can provide a functionality not available with P.862 PESQ, the current double-ended standard algorithm.  相似文献   

3.
柏财通  崔翛龙  郑会吉  李爱 《计算机应用》2022,42(10):3217-3223
针对标注神经网络训练数据的成本日益增加与噪声干扰阻碍语音识别系统性能提升的问题,提出一种基于自监督知识迁移的鲁棒性语音识别模型的模型训练算法。首先,在预处理阶段提取原始语音样本的三个人工特征;然后,在训练阶段将特征提取网络生成的高级特征分别通过三个浅层网络来拟合预处理阶段提取的人工特征;同时,把特征提取前端与语音识别后端进行交叉训练,并合并它们的损失函数;最后,通过梯度反向传播令特征提取网络学会提取更有助于去噪语音识别的高级特征,从而实现人工知识迁移与去噪,并高效利用了训练数据。在军事装备控制的应用场景下,基于加噪后的THCHS-30、希尔贝壳数据集AISHELL-1与ST-CMDS这三个开源中文语音识别数据集以及军事装备控制指令的数据集上进行测试,实验结果表明,基于自监督知识迁移的鲁棒性语音识别模型的模型训练算法词错率可以降低到0.12,不仅可以实现对鲁棒性语音识别模型的模型训练,同时通过自监督知识迁移提高了训练样本的利用率,可完成装备控制任务。  相似文献   

4.
针对OM-LSA(optimally modified log-spectral amplitude estimator)算法产生的残留噪声,提出了一种结合OM-LSA和小波阈值去噪的语音增强算法。首先,进行语音对数幅度谱估计;然后,估计残留噪声,利用带噪语音第一级小波系数和语音不存在时的增益函数进行估计,解决了常规方法对增强后语音噪声估计不准确的问题;最后,在小波域利用软阈值法对语音信号进行阈值处理。实验结果表明,提出的算法有效地去除了OM-LSA算法中的残余噪声,在分段信噪比(segmental signal-to-noise ratio,SegSNR)和对数谱失真(log-spectral distortion,LSD)等指标评价上有较大的提高。  相似文献   

5.
维纳滤波算法是改善噪声环境下听障患者语音理解度的常用算法之一。针对传统维纳滤波算法噪声谱估计偏差大的问题,提出一种基于改进的多通道维纳滤波算法的助听器语音降噪算法。算法首先结合人耳听觉特性和助听器响度补偿的特点,将语音信号进行Gammatone分解为多路子带信号。然后在每个子带内用基于先验信噪比估计的维纳滤波器进行语音增强处理。最后通过综合子带信号,得到增强的语音。此外,为了改善维纳滤波算法噪声谱估计的问题,提出一种基于包络估计的语音活动检测算法,并用于改善维纳滤波性能。实验结果表明,与传统维纳滤波法相比,该方法能更有效地抑制残留噪声,提高语音可懂度,具有较高的实用价值。  相似文献   

6.
提出了一种新的基于阈值的小波域语音降噪算法。采用小波包对含噪语音进行分解,克服了传统的正交小波变换的缺陷。采用自适应阈值的方法,对每一尺度上的噪声最大量进行去噪,保留有用信号,可以进一步提高信噪比,仿真实验表明,该方法有更好的去噪效果。  相似文献   

7.
为解决噪声环境下语音识别率降低以及传统波束形成算法难以处理空间噪声的问题,基于双微阵列结构提出了一种改进的最小方差无畸变响应(MVDR)波束形成方法。首先,采用对角加载提高双微阵列增益,并利用递归矩阵求逆降低计算复杂度;然后,通过后置调制域谱减法对语音作进一步处理,解决了一般谱减法容易产生音乐噪声的问题,有效减小了语音畸变,获得了良好的噪声抑制效果;最后,采用卷积神经网络(CNN)进行语音模型的训练,提取语音深层次的特征,有效地解决了语音信号多样性问题。实验结果表明,提出的方法在经CNN训练的语音识别系统模型中取得了较好的识别效果,在信噪比为10 dB的F16噪声环境下的语音识别率达到了92.3%,具有良好的稳健性。  相似文献   

8.
We present a new speech enhancement scheme for a single-microphone system to meet the demand for quality noise reduction algorithms capable of operating at a very low signal-to-noise ratio. A psychoacoustic model is incorporated into the generalized perceptual wavelet denoising method to reduce the residual noise and improve the intelligibility of speech. The proposed method is a generalized time-frequency subtraction algorithm, which advantageously exploits the wavelet multirate signal representation to preserve the critical transient information. Simultaneous masking and temporal masking of the human auditory system are modeled by the perceptual wavelet packet transform via the frequency and temporal localization of speech components. The wavelet coefficients are used to calculate the Bark spreading energy and temporal spreading energy, from which a time-frequency masking threshold is deduced to adaptively adjust the subtraction parameters of the proposed method. An unvoiced speech enhancement algorithm is also integrated into the system to improve the intelligibility of speech. Through rigorous objective and subjective evaluations, it is shown that the proposed speech enhancement system is capable of reducing noise with little speech degradation in adverse noise environments and the overall performance is superior to several competitive methods.  相似文献   

9.
We present a new speech enhancement scheme for a single-microphone system to meet the demand for quality noise reduction algorithms capable of operating at a very low signal-to-noise ratio. A psychoacoustic model is incorporated into the generalized perceptual wavelet denoising method to reduce the residual noise and improve the intelligibility of speech. The proposed method is a generalized time-frequency subtraction algorithm, which advantageously exploits the wavelet multirate signal representation to preserve the critical transient information. Simultaneous masking and temporal masking of the human auditory system are modeled by the perceptual wavelet packet transform via the frequency and temporal localization of speech components. The wavelet coefficients are used to calculate the Bark spreading energy and temporal spreading energy, from which a time-frequency masking threshold is deduced to adaptively adjust the subtraction parameters of the proposed method. An unvoiced speech enhancement algorithm is also integrated into the system to improve the intelligibility of speech. Through rigorous objective and subjective evaluations, it is shown that the proposed speech enhancement system is capable of reducing noise with little speech degradation in adverse noise environments and the overall performance is superior to several competitive methods.  相似文献   

10.
联合听觉掩蔽效应的子空间语音增强算法   总被引:1,自引:0,他引:1       下载免费PDF全文
在经典子空间语音增强算法中,因语音特征值估计偏差会造成语音失真和音乐噪声。针对该问题,提出一种联合听觉掩蔽效应的语音增强算法。该算法联合掩蔽阈值自适应调节噪声特征值的抑制系数,并利用维纳滤波对音乐噪声的抑制性,对该特征值并行修正,最终还原出纯净的语音。实验结果证明,该算法在白噪声和有色噪声的背景下,与经典子空间的语音增强算法相比,能提高信噪比,减少语音失真和音乐噪声。  相似文献   

11.
This paper proposes a speech enhancement approach to suppress the interference of car noise. A linear microphone array is adopted for far-talking speech acquisition and delay-and-sum beamforming noise reduction. We present an effective time delay estimator using the coherence function between the reference microphone and the beamformed speech. To further enhance the beamformed speech, we exploit an improved Wiener filter where the resulting noise correlation in microphone array is relatively small so that the performance of optimal Wiener filtering could be achieved. Also, due to the serious degradation in low frequency car speech, we develop a spectral weighting function to compensate the low frequency filtering. These two processing units serve as the post filters to attain the desirable enhancement performance. In the experiments on microphone array speech in presence of real and simulated car noises, we find that the proposed algorithm performs well. Performance is measured in terms of the signal-to-noise ratio and the word error rate. The combined delay-and-sum beamformer and two post filters obtain the best results compared to other methods.  相似文献   

12.
含噪语音实时迭代维纳滤波   总被引:1,自引:1,他引:0       下载免费PDF全文
针对传统去噪方法在强背景噪声情况下,提取声音信号的能力变弱甚至失效与对不同噪声环境适应性差,提出了迭代维纳滤波声音信号特征提取方法。给出了语音噪声频谱与功率谱信噪比迭代更新机制与具体实施方案。实验仿真表明,该算法能有效地去噪滤波,显著地提高语音识别系统性能,且在不同的噪声环境和信噪比条件下具有鲁棒性。该算法计算代价小,简单易实现,适用于嵌入式语音识别系统。  相似文献   

13.
基于语音增强失真补偿的抗噪声语音识别技术   总被引:1,自引:0,他引:1  
本文提出了一种基于语音增强失真补偿的抗噪声语音识别算法。在前端,语音增强有效地抑制背景噪声;语音增强带来的频谱失真和剩余噪声是对语音识别不利的因素,其影响将通过识别阶段的并行模型合并或特征提取阶段的倒谱均值归一化得到补偿。实验结果表明,此算法能够在非常宽的信噪比范围内显著的提高语音识别系统在噪声环境下的识别精度,在低信噪比情况下的效果尤其明显,如对-5dB的白噪声,相对于基线识别器,该算法可使误识率下降67.4%。  相似文献   

14.
噪声环境下的语音识别性能研究   总被引:7,自引:1,他引:6  
在变强噪音的情况下,语音识别的正确率会迅速下降;当噪声较强并且强度不断发生变化的时候,端点检测是一个难题;提出了两种方法保证噪声较强而且强度不断发生改变情况下的语音识别率:基于LPC距离的端点检测算法和白适应的抗噪语音特征参数提取算法;经过实验证明,采用了这两种不同于传统算法的新方法后,变强噪音情况下的语音识别率得到了明显的改善。  相似文献   

15.
We propose a new method for estimating directions of arrival (DOAs) of sound sources, both in azimuthal and elevation angle, using two directional microphones. This method adopts weighted Wiener gain (WWG) for DOA estimation. WWG is an estimate of the Wiener gain that we proposed for use in automatic gain control to enhance speech that is degraded by additive noise. Angular resolution of WWG arises from spectral subtraction (SS)-based noise reduction involved in the WWG calculation, which enhances the signal from the look direction while suppressing signals from other directions. Because WWG involves two-channel SS, which can deal with instantaneous noise, noise sources need not to be stationary, as they must be with ordinary single-channel SS. We further propose the exploitation of a pair of directional microphones whose front directions are arranged in rotational symmetry. The time difference and amplitude difference between the two-channel signal provided by the microphones are utilized to yield a two-dimensional resolution of DOA. We evaluated the proposed method through computer simulations and compared it to three DOA estimation methods that are based on a cross-correlation function and two popular high-resolution methods of multiple signal classification and minimum variance method. Evaluation results of the source detection rate and estimation accuracy demonstrate the remarkable superiority of our method compared to the other methods in conditions where multiple speech sources exist  相似文献   

16.
In this paper, we propose a speech enhancement method where the front-end decomposition of the input speech is performed by temporally processing using a filterbank. The proposed method incorporates a perceptually motivated stationary wavelet packet filterbank (PM-SWPFB) and an improved spectral over-subtraction (I-SOS) algorithm for the enhancement of speech in various noise environments. The stationary wavelet packet transform (SWPT) is a shift invariant transform. The PM-SWPFB is obtained by selecting the stationary wavelet packet tree in such a manner that it matches closely the non-linear resolution of the critical band structure of the psychoacoustic model. After the decomposition of the input speech, the I-SOS algorithm is applied in each subband, separately for the estimation of speech. The I-SOS uses a continuous noise estimation approach and estimate noise power from each subband without the need of explicit speech silence detection. The subband noise power is estimated and updated by adaptively smoothing the noisy signal power. The smoothing parameter in each subband is controlled by a function of the estimated signal-to-noise ratio (SNR). The performance of the proposed speech enhancement method is tested on speech signals degraded by various real-world noises. Using objective speech quality measures (SNR, segmental SNR (SegSNR), perceptual evaluation of speech quality (PESQ) score), and spectrograms with informal listening tests, we show that the proposed speech enhancement method outperforms than the spectral subtractive-type algorithms and improves quality and intelligibility of the enhanced speech.  相似文献   

17.
提出了一种基于声音场景分类的噪声抑制算法。算法使用调制滤波法对纯语音、纯噪音和含噪语音3种场景进行分类,并根据分类结果调整噪声抑制算法参数集,得到不同的抑制系数。本文方法在助听器测试系统中取得了良好的实验效果,场景分类正确率在95%以上。在不同噪声类型情况下,经过本文算法处理的输出语音信号取得了良好的信噪比和MOS评分的提升。本文算法可以有效地提高数字助听器输出语音质量。  相似文献   

18.
Speech recognition systems intended for everyday use must be able to cope with a large variety of noise types and levels, including highly non-stationary multi-source mixtures. This study applies spectral factorisation algorithms and long temporal context for separating speech and noise from mixed signals. To adapt the system to varying environments, noise models are acquired from the context, or learnt from the mixture itself without prior information. We also propose methods for reducing the size of the bases used for speech and noise modelling by 20–40 times for better practical applicability. We evaluate the performance of the methods both as a standalone classifier and as a signal-enhancing front-end for external recognisers. For the CHiME noisy speech corpus containing non-stationary multi-source household noises at signal-to-noise ratios ranging from +9 to ?6 dB, we report average keyword recognition rates up to 87.8% using a single-stream sparse classification algorithm.  相似文献   

19.
杨海燕  吴雷  周萍 《测控技术》2019,38(5):88-93
在连续语音识别系统中,针对强噪声环境下传统双门限语音检测方法出现的误检问题,提出了一种结合压缩感知理论和MFCC倒谱系数的端点检测算法。该算法采用Hadamard随机观测矩阵和改进的OMP重构算法对语音信号进行压缩感知与重构,利用语音信号在离散余弦基上的近似稀疏性,提取重构信号的MFCC倒谱系数来检测语音信号的端点。仿真结果表明,提出的改进算法具有较强的鲁棒性,能满足在强噪声环境下对连续语音信号进行有效端点检测的要求。  相似文献   

20.
A speech pre-processing algorithm is presented that improves the speech intelligibility in noise for the near-end listener. The algorithm improves intelligibility by optimally redistributing the speech energy over time and frequency according to a perceptual distortion measure, which is based on a spectro-temporal auditory model. Since this auditory model takes into account short-time information, transients will receive more amplification than stationary vowels, which has been shown to be beneficial for intelligibility of speech in noise. The proposed method is compared to unprocessed speech and two reference methods using an intelligibility listening test. Results show that the proposed method leads to significant intelligibility gains while still preserving quality. Although one of the methods used as a reference obtained higher intelligibility gains, this happened at the cost of decreased quality. Matlab code is provided.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号