Similar Literature
16 similar documents were retrieved.
1.
An efficient real-time speech endpoint detection algorithm suited to complex environments is proposed, together with a method for estimating the noise power spectrum of each frame during filtering. The spectrum of each speech frame is first processed with iterative Wiener filtering, then divided into several subbands and the spectral entropy of each subband is computed; the subband spectral entropies of several consecutive frames are passed through a bank of median filters to obtain the spectral entropy of each frame, and the input speech is classified according to this entropy value. Experimental results show that the algorithm distinguishes speech from noise effectively, significantly improves the performance of speech recognition systems, and remains robust under different noise conditions. It is computationally cheap, simple to implement, and well suited to real-time speech recognition systems.
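A minimal sketch of the per-frame subband spectral entropy feature described above (the iterative Wiener pre-filtering step is omitted); the frame layout, subband count, and median-filter span are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.signal import medfilt

def subband_spectral_entropy(frames, n_subbands=8):
    """Spectral entropy of each frame, computed over subband energies.

    frames: 2-D array (n_frames, frame_len) of windowed time-domain frames.
    Returns a 1-D array with one smoothed entropy value per frame.
    """
    spectra = np.abs(np.fft.rfft(frames, axis=1)) ** 2          # power spectrum per frame
    edges = np.linspace(0, spectra.shape[1], n_subbands + 1, dtype=int)  # equal-width subbands
    band_energy = np.stack(
        [spectra[:, edges[i]:edges[i + 1]].sum(axis=1) for i in range(n_subbands)],
        axis=1,
    )
    prob = band_energy / (band_energy.sum(axis=1, keepdims=True) + 1e-12)
    entropy = -(prob * np.log(prob + 1e-12)).sum(axis=1)         # subband spectral entropy
    return medfilt(entropy, kernel_size=5)                       # median-smooth across consecutive frames

# Frames whose entropy falls below a threshold would be classified as speech,
# since speech spectra are more structured (lower entropy) than broadband noise.
```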

2.
To improve the effectiveness and robustness of speech enhancement at low signal-to-noise ratios (SNR), a new speech enhancement algorithm is proposed that combines Wiener filtering with an endpoint detection algorithm based on frequency-domain features. The endpoint detector uses the spectral entropy of wavelet-packet ERB subbands together with an improved energy-to-entropy ratio of the frequency-domain energy. The wavelet-packet ERB subband spectral entropy accounts for the auditory masking model of the human ear and for the difference in frequency distribution between speech and noise, while the frequency-domain energy exploits the energy difference between speech and non-speech frames. The Wiener filter processes speech data in real time, uses the new parameters to distinguish non-speech segments from speech segments, and smoothly updates the noise spectrum during non-speech segments. Experimental results show that the endpoint detector separates speech and non-speech segments well, which improves speech enhancement at low SNR while preserving robustness and real-time performance. Compared with two other algorithms, it achieves better enhancement results.

3.
刘艳  倪万顺 《计算机应用》2015,35(3):868-871
Front-end noise processing directly affects the accuracy and stability of speech recognition. Since the signal separated by wavelet denoising is not an optimal estimate of the original signal, a bionic wavelet transform (BWT) denoising algorithm based on subband spectral entropy is proposed. It exploits the accuracy of subband spectral entropy endpoint detection to distinguish noisy-speech segments from noise-only segments, updates the BWT threshold in real time, and accurately identifies the wavelet coefficients belonging to the noise, thereby achieving speech enhancement. Experimental results show that, compared with Wiener filtering, the proposed subband-spectral-entropy BWT method raises the signal-to-noise ratio (SNR) by about 8% on average and markedly enhances speech signals in noisy environments.

4.
To address the low recognition rate of noisy mask speech, a speech enhancement algorithm is applied to suppress noise in the mask speech and raise its SNR. An improved Wiener filtering method is proposed in which spectral entropy is used to detect speech and non-speech frames so that the noise power spectrum can be updated, and a parameter is introduced to control the gain function. Mel-frequency cepstral coefficients (MFCC) are extracted from the mask speech as feature parameters, and a convolutional neural network (CNN) is used for training and recognition, with local response normalization (LRN) applied after each pooling layer for optimization. Experimental results show that the system substantially improves the recognition rate of noisy mask speech.
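A hedged sketch of the Wiener-filter part of the pipeline above: the noise power spectrum is updated only on frames that an entropy-based detector flags as non-speech, and a gain-control parameter shapes the suppression. The entropy threshold, smoothing factor, and beta are illustrative choices, not the paper's values.

```python
import numpy as np

def wiener_enhance(frames, entropy, entropy_thresh, alpha=0.95, beta=1.0):
    """frames: (n_frames, frame_len) windowed frames; entropy: per-frame spectral entropy."""
    spec = np.fft.rfft(frames, axis=1)
    power = np.abs(spec) ** 2
    noise_psd = power[0].copy()                        # assume the first frame is noise-only
    enhanced = np.empty_like(spec)
    for t in range(frames.shape[0]):
        if entropy[t] > entropy_thresh:                # high entropy -> non-speech: update noise PSD
            noise_psd = alpha * noise_psd + (1 - alpha) * power[t]
        snr_est = np.maximum(power[t] / (noise_psd + 1e-12) - 1.0, 0.0)  # ML a priori SNR estimate
        gain = snr_est / (snr_est + beta)              # parameter-controlled Wiener gain
        enhanced[t] = gain * spec[t]
    return np.fft.irfft(enhanced, n=frames.shape[1], axis=1)
```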

5.
An improved endpoint detection method based on adaptive spectral entropy (total citations: 1; self-citations: 0; citations by others: 1)
To improve the separability of speech from noise at low SNR, a speech endpoint detection method adapted to low-SNR environments is proposed. By improving the feature parameters used for endpoint detection, speech and noise are distinguished more reliably and detection accuracy at low SNR is increased. Starting from subband spectral entropy, a positive constant is introduced into the basic entropy computation to obtain an improved negative spectral entropy feature, which is combined with an adaptive subband selection method to yield a novel feature parameter: the adaptive-subband constant negative spectral entropy. The feature has strong noise immunity at low SNR and allows speech endpoints to be detected accurately. Experimental results show that the method is fast, effective, and robust, and is well suited to endpoint detection at low SNR.
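A rough sketch of how a positive constant can be folded into the subband entropy to obtain a "constant negative spectral entropy" style feature; the exact placement of the constant and the adaptive subband selection rule are assumptions here, not reproductions of the paper's formulas.

```python
import numpy as np

def constant_negative_spectral_entropy(band_energy, c=0.1):
    """band_energy: (n_frames, n_subbands) subband energies for each frame.

    Adding a positive constant to every subband before normalising flattens
    the distribution for noise-only frames, which sharpens the contrast
    between speech and noise once the (negated) entropy is thresholded.
    """
    biased = band_energy + c * band_energy.mean(axis=1, keepdims=True)
    prob = biased / biased.sum(axis=1, keepdims=True)
    return (prob * np.log(prob + 1e-12)).sum(axis=1)   # negative of the usual entropy
```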

6.
A robust feature extraction algorithm for speech recognition is proposed. Based on subband dominant-frequency information, the algorithm combines this information with subband energy so that the positions of subband spectral peaks are retained in the feature parameters. A noise-robust isolated-word recognition system is built with the algorithm and compared with conventional features at various SNRs under white Gaussian noise and background-speech (babble) noise. The results show that the recognition rate improves to varying degrees in both noise environments, demonstrating good noise robustness.
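An illustrative sketch of a feature that pairs each subband's energy with the position of its spectral peak, in the spirit of the dominant-frequency feature described above; the subband layout and the way the two quantities are combined are assumptions, not the paper's definition.

```python
import numpy as np

def subband_peak_features(frames, n_subbands=8):
    """Return (n_frames, 2*n_subbands): log energy and normalised peak position per subband."""
    spectra = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    edges = np.linspace(0, spectra.shape[1], n_subbands + 1, dtype=int)
    feats = []
    for i in range(n_subbands):
        band = spectra[:, edges[i]:edges[i + 1]]
        log_energy = np.log(band.sum(axis=1) + 1e-12)                # subband energy
        peak_pos = band.argmax(axis=1) / max(band.shape[1] - 1, 1)   # dominant-frequency location
        feats.extend([log_energy, peak_pos])
    return np.stack(feats, axis=1)
```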

7.
A noise-robust speech endpoint detection algorithm based on spectral variance (total citations: 1; self-citations: 0; citations by others: 1)
In a speech recognition system, endpoint detection plays an important role in recognition accuracy. For clean speech, traditional endpoint detection algorithms locate the start and end points of the speech well, but under noise interference their accuracy often degrades sharply. To improve endpoint detection in noisy environments, the difference between the frequency-domain distributions of speech and noise is exploited and the spectral variance is used to distinguish the two; an endpoint detection algorithm based on spectral variance is proposed and simulated in both noise-free and noisy environments, showing that it performs well even under strong noise. Comparative experiments against a traditional LPCC-based endpoint detector show that the new algorithm clearly improves detection accuracy in noise.
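A minimal sketch of the per-frame spectral variance statistic used above to separate speech from noise; the normalisation and the thresholding rule are assumptions.

```python
import numpy as np

def spectral_variance(frames):
    """Variance of each frame's normalised magnitude spectrum. Speech frames,
    with their peaky formant structure, tend to have much larger variance
    than broadband noise frames, whose spectrum is comparatively flat."""
    mag = np.abs(np.fft.rfft(frames, axis=1))
    mag = mag / (mag.sum(axis=1, keepdims=True) + 1e-12)   # normalise out frame energy
    return mag.var(axis=1)

# Endpoint decision: frames whose spectral variance exceeds a noise-floor-based
# threshold are marked as speech; runs of such frames give the start/end points.
```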

8.
To separate speech from non-speech (noise) in complex noise environments, an endpoint detection method for noisy speech based on wavelets and energy entropy is proposed. Exploiting the multi-resolution property of wavelets and their ability to represent local features of non-stationary signals, the method applies a wavelet transform to the noisy speech and uses the mean of the energy entropy values across decomposition levels to distinguish speech from non-speech segments effectively. Experiments with different background noises and SNRs show that the proposed algorithm achieves a high detection accuracy.
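A hedged sketch of the wavelet energy-entropy measure using the PyWavelets package; the wavelet family, decomposition depth, and the exact way the per-level entropies are averaged are assumptions.

```python
import numpy as np
import pywt

def wavelet_energy_entropy(frame, wavelet="db4", level=4):
    """Mean energy entropy over the wavelet decomposition levels of one frame."""
    coeffs = pywt.wavedec(frame, wavelet, level=level)
    entropies = []
    for c in coeffs:
        energy = c ** 2
        p = energy / (energy.sum() + 1e-12)
        entropies.append(-(p * np.log(p + 1e-12)).sum())
    return float(np.mean(entropies))

# Speech frames concentrate energy in a few coefficients (low entropy),
# while noise spreads it evenly (high entropy), so thresholding this
# average separates speech segments from non-speech segments.
```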

9.
An improved spectral subtraction speech enhancement algorithm based on codebook learning is proposed for non-stationary noise environments. The algorithm has a training stage and an enhancement stage. In training, autoregressive models are used to model the spectral shapes of speech and noise and to build speech and noise codebooks; in enhancement, a log-spectral minimization algorithm estimates the speech and noise spectra, and the noise is removed by spectral subtraction. Because the speech and noise spectra are estimated in every time frame, rapidly varying non-stationary noise can be tracked effectively even while speech is present; the autoregressive model yields a smooth estimate of the noise spectrum and reduces musical noise. Simulations show that, compared with conventional and multi-band spectral subtraction, the improved method gives better noise suppression with less speech distortion.
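A bare-bones sketch of the spectral subtraction step in the enhancement stage; the codebook-driven, AR-model-based estimation of the speech and noise spectra is the paper's actual contribution and is not reproduced here, so the noise spectrum is simply taken as a given array, and the over-subtraction and floor factors are illustrative.

```python
import numpy as np

def spectral_subtract(noisy_frames, noise_psd, over_sub=1.0, floor=0.02):
    """noisy_frames: (n_frames, frame_len); noise_psd: (n_bins,) estimated noise power."""
    spec = np.fft.rfft(noisy_frames, axis=1)
    power = np.abs(spec) ** 2
    clean_power = power - over_sub * noise_psd             # subtract the noise estimate
    clean_power = np.maximum(clean_power, floor * power)   # spectral floor limits musical noise
    gain = np.sqrt(clean_power / (power + 1e-12))
    return np.fft.irfft(gain * spec, n=noisy_frames.shape[1], axis=1)
```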

10.
To meet the practical need for speech recognition systems to resist environmental noise, a two-stage combined denoising technique is proposed, and a noise-robust embedded speech recognition system is designed with a digital signal processor (DSP) as the hardware platform and a hidden Markov model (HMM) as the recognition algorithm. The DSP is a TMS320VC5509A chip, which together with peripheral circuitry forms the hardware platform of the recognition system. The software is programmed around a discrete hidden Markov model (DHMM) and consists of four main modules: recognition, training, learning, and USB. Experimental results show that the recognition system based on the two-stage combined denoising technique offers better noise immunity.

11.
In this paper, we propose a speech enhancement method where the front-end decomposition of the input speech is performed by temporal processing with a filterbank. The proposed method incorporates a perceptually motivated stationary wavelet packet filterbank (PM-SWPFB) and an improved spectral over-subtraction (I-SOS) algorithm for the enhancement of speech in various noise environments. The stationary wavelet packet transform (SWPT) is a shift-invariant transform. The PM-SWPFB is obtained by selecting the stationary wavelet packet tree in such a manner that it closely matches the non-linear resolution of the critical band structure of the psychoacoustic model. After the decomposition of the input speech, the I-SOS algorithm is applied in each subband separately for the estimation of speech. The I-SOS uses a continuous noise estimation approach and estimates the noise power in each subband without the need for explicit speech/silence detection. The subband noise power is estimated and updated by adaptively smoothing the noisy signal power. The smoothing parameter in each subband is controlled by a function of the estimated signal-to-noise ratio (SNR). The performance of the proposed speech enhancement method is tested on speech signals degraded by various real-world noises. Using objective speech quality measures (SNR, segmental SNR (SegSNR), perceptual evaluation of speech quality (PESQ) score) and spectrograms with informal listening tests, we show that the proposed method outperforms spectral subtractive-type algorithms and improves the quality and intelligibility of the enhanced speech.
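A small sketch of the continuous subband noise tracking idea described in the abstract, where the smoothing parameter is driven by an estimated SNR; the specific mapping from SNR to smoothing factor is an assumption, not the paper's function.

```python
import numpy as np

def track_noise_power(subband_power, alpha_min=0.7, alpha_max=0.98):
    """subband_power: (n_frames,) noisy power of one subband over time.
    Returns a frame-by-frame noise power estimate for that subband."""
    noise = np.empty_like(subband_power)
    noise[0] = subband_power[0]
    for t in range(1, len(subband_power)):
        snr = max(subband_power[t] / (noise[t - 1] + 1e-12) - 1.0, 0.0)
        # High estimated SNR (likely speech) -> alpha near alpha_max, freeze the estimate;
        # low SNR (likely noise only) -> alpha near alpha_min, adapt quickly.
        alpha = alpha_min + (alpha_max - alpha_min) * (snr / (snr + 1.0))
        noise[t] = alpha * noise[t - 1] + (1 - alpha) * subband_power[t]
    return noise
```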

12.
刘金刚  周翊  马永保  刘宏清 《计算机应用》2016,36(12):3369-3373
To address the poor robustness of speech recognition systems in noisy environments, a switched speech power spectrum estimation algorithm is proposed. Assuming that the speech magnitude spectrum follows a Chi distribution, an improved minimum mean square error (MMSE) estimator of the speech power spectrum is derived and, combined with the speech presence probability (SPP), an improved SPP-based MMSE estimator is obtained. This improved MMSE estimator is then combined with the conventional Wiener filter: when noise interference is strong, the improved MMSE estimator is used to estimate the clean speech power spectrum; when noise interference is weak, the conventional Wiener filter is used instead to reduce computation, yielding the switched power spectrum estimator used by the recognition system. Experimental results show that, compared with the conventional MMSE estimator under a Rayleigh assumption, the proposed algorithm raises the recognition rate by about 8 percentage points on average across various noises, and that it removes noise interference and improves the robustness of the recognition system while reducing its power consumption.
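A skeleton of the switching logic only: the improved Chi-distribution MMSE estimator itself is not reproduced (its gain function is non-trivial), so it appears as a placeholder callable, and the SNR threshold used to switch is also an assumption.

```python
import numpy as np

def switched_power_estimate(noisy_power, noise_psd, mmse_gain, snr_switch_db=5.0):
    """noisy_power: (n_bins,) noisy power spectrum of one frame;
    mmse_gain: callable returning the improved SPP/MMSE gain (placeholder here)."""
    snr_post = noisy_power / (noise_psd + 1e-12)
    seg_snr_db = 10.0 * np.log10(np.maximum(snr_post.mean(), 1e-12))
    if seg_snr_db < snr_switch_db:
        gain = mmse_gain(noisy_power, noise_psd)     # heavy noise: improved MMSE estimator
    else:
        prio = np.maximum(snr_post - 1.0, 0.0)
        gain = prio / (prio + 1.0)                   # light noise: plain Wiener gain, cheaper
    return gain ** 2 * noisy_power                   # estimated clean speech power spectrum
```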

13.
Current automatic speech recognition (ASR) works in off-line mode and needs prior knowledge of the stationary or quasi-stationary test conditions for expected word recognition accuracy. These requirements limit the application of ASR to real-world applications where test conditions are highly non-stationary and are not known a priori. This paper presents an innovative frame-dynamic rapid adaptation and noise compensation technique for tracking highly non-stationary noises and its application to on-line ASR. The proposed algorithm is based on a soft computing model using Bayesian on-line inference for spectral change point detection (BOSCPD) in unknown non-stationary noises. BOSCPD is tested with the MCRA noise tracking technique for on-line rapid environmental change learning in different non-stationary noise scenarios. The test results show that the proposed BOSCPD technique reduces the delay in spectral change point detection significantly compared to the baseline MCRA and its derivatives. The proposed BOSCPD soft computing model is tested for joint additive and channel distortion compensation (JAC)-based on-line ASR in unknown test conditions using non-stationary noisy speech samples from the Aurora 2 speech database. The simulation results for on-line ASR show a significant improvement in recognition accuracy compared to the baseline Aurora 2 distributed speech recognition (DSR) in batch mode.

14.
This paper addresses the problem of acoustic noise reduction and speech enhancement by adaptive filtering algorithms. Most speech enhancement methods and algorithms that use an adaptive filtering structure are expressed in fullband form. One of these widespread structures is the forward blind source separation structure (FBSS). The FBSS structure is often used to separate speech from noise and thus enhance the speech signal at the processing output. In this paper, we propose a new subband implementation of the FBSS structure. To make the proposed structure more robust, we adapt and then apply to this subband structure a new combination of criteria based on minimizing the system mismatch and the smoothing filtering errors. Combining the proposed subband structure with these optimal criteria yields a new two-channel subband forward (2CSF) algorithm that improves the convergence speed of the cross adaptive filters used to separate speech from noise. Objective tests under various environments are presented, showing the good behavior of the proposed 2CSF algorithm.

15.
In this paper, we propose a novel front-end speech parameterization technique for automatic speech recognition (ASR) that is less sensitive towards ambient noise and pitch variations. First, using variational mode decomposition (VMD), we break up the short-time magnitude spectrum obtained by discrete Fourier transform into several components. In order to suppress the ill-effects of noise and pitch variations, the spectrum is then sufficiently smoothed. The desired spectral smoothing is achieved by discarding the higher-order variational mode functions and reconstructing the spectrum using the first two modes only. As a result, the smoothed spectrum closely resembles the spectral envelope. Next, the Mel-frequency cepstral coefficients (MFCC) are extracted using the VMD-based smoothed spectra. The proposed front-end acoustic features are observed to be more robust towards ambient noise and pitch variations than the conventional MFCC features, as demonstrated by the experimental evaluations presented in this study. For this purpose, we developed an ASR system using speech data from adult speakers collected under relatively clean recording conditions. State-of-the-art acoustic modeling techniques based on deep neural networks (DNN) and long short-term memory recurrent neural networks (LSTM-RNN) were employed. The ASR systems were then evaluated under noisy test conditions to assess the noise robustness of the proposed features. To assess robustness towards pitch variations, experimental evaluations were performed on another test set consisting of speech data from child speakers. Transcribing children's speech helps in simulating an ASR task where pitch differences between training and test data are significantly large. The signal domain analyses as well as the experimental evaluations presented in this paper support our claims.

16.
A speech endpoint detection algorithm based on the subband energy-to-entropy ratio (total citations: 1; self-citations: 0; citations by others: 1)
张毅  王可佳  席兵  颜博 《计算机科学》2017,44(5):304-307, 319
Accurately locating speech endpoints is an important step in speech recognition. To better separate speech from noise and raise the accuracy of endpoint detection at low SNR, conventional subband spectral entropy endpoint detection is analysed and combined with subband energy, and a speech endpoint detection algorithm based on the subband energy-to-entropy ratio is proposed. The algorithm uses the ratio of subband energy to subband spectral entropy as the key detection parameter and sets a threshold on it to detect speech endpoints. Experiments show that the algorithm is fast, efficient, and robust, and can detect speech endpoints accurately at relatively low SNR.
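A minimal sketch of an energy-to-entropy ratio parameter of the kind described above; the frame/subband layout and the exact form of the ratio are assumptions consistent with the abstract, not the paper's formula.

```python
import numpy as np

def energy_entropy_ratio(frames, n_subbands=8):
    """Ratio of subband energy to subband spectral entropy for each frame;
    speech frames have high energy and low entropy, so the ratio peaks on speech."""
    spectra = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    edges = np.linspace(0, spectra.shape[1], n_subbands + 1, dtype=int)
    band_energy = np.stack(
        [spectra[:, edges[i]:edges[i + 1]].sum(axis=1) for i in range(n_subbands)], axis=1)
    prob = band_energy / (band_energy.sum(axis=1, keepdims=True) + 1e-12)
    entropy = -(prob * np.log(prob + 1e-12)).sum(axis=1)
    total_energy = band_energy.sum(axis=1)
    return np.log1p(total_energy) / (entropy + 1e-12)   # threshold this to mark speech endpoints
```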

