1.
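王景芳 (Wang Jingfang), 《计算机工程与应用》 (Computer Engineering and Applications), 2011, 47(20): 147-150
An efficient real-time speech endpoint detection algorithm for complex environments is proposed, together with a method for estimating the noise power spectrum of each frame during filtering. The spectrum of each speech frame is first processed by iterative Wiener filtering and then divided into several sub-bands, and the spectral entropy of each sub-band is computed. The sub-band spectral entropies of several successive frames are then passed through a bank of median filters to obtain the spectral entropy of each frame, and the input speech is classified according to this value. Experimental results show that the algorithm distinguishes speech from noise effectively, can significantly improve the performance of speech recognition systems, and is robust under different noise conditions. It has low computational cost, is simple to implement, and is suitable for real-time speech recognition systems.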
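The sub-band entropy and median-filtering steps of this abstract can be illustrated with a small sketch. It is a simplification under stated assumptions, not the paper's implementation: the iterative Wiener filtering stage is omitted, the entropy is taken over equal-width sub-band energies, and the names `subband_entropy` and `median_smooth` and their default parameters are hypothetical.

```python
import math
from statistics import median


def subband_entropy(power_spectrum, num_bands=4):
    """Spectral entropy of one frame, computed over sub-band energies.

    Assumption: equal-width sub-bands; the abstract does not say how
    the bands are chosen.
    """
    band_size = len(power_spectrum) // num_bands
    energies = [
        sum(power_spectrum[b * band_size:(b + 1) * band_size])
        for b in range(num_bands)
    ]
    total = sum(energies)
    if total <= 0:
        return 0.0
    probs = [e / total for e in energies]
    # Flat (noise-like) spectra give high entropy, peaky spectra low entropy.
    return -sum(p * math.log(p) for p in probs if p > 0)


def median_smooth(values, width=3):
    """Median-filter a per-frame feature track; the window shrinks at the edges."""
    half = width // 2
    return [
        median(values[max(0, i - half):i + half + 1])
        for i in range(len(values))
    ]
```

A frame would then be labelled speech or noise by comparing its smoothed entropy with a threshold; how that threshold is chosen is not stated in the abstract.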
2.
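To improve speech enhancement performance and robustness at low signal-to-noise ratios, a new speech enhancement algorithm is proposed that combines Wiener filtering with a speech endpoint detection algorithm based on frequency-domain features. The endpoint detector uses an energy-to-entropy ratio built from the spectral entropy of wavelet-packet ERB sub-bands and an improved frequency-domain energy. The wavelet-packet ERB sub-band spectral entropy accounts for the auditory masking model of the human ear and for the different frequency distributions of speech and noise; the frequency-domain energy exploits the energy difference between speech and non-speech frames. The Wiener filter acquires speech data in real time, uses the new parameter to distinguish non-speech segments from speech segments, and smoothly updates the noise spectrum during non-speech segments. Experimental results show that the endpoint detector separates speech and non-speech segments well, which improves speech enhancement at low SNR while preserving robustness and real-time performance. In a comparison with two other algorithms, the proposed method gives better speech enhancement.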
3.
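Front-end noise processing directly affects the accuracy and stability of speech recognition. Because the signal separated by wavelet denoising is not the optimal estimate of the original signal, a bionic wavelet transform (BWT) denoising algorithm based on sub-band spectral entropy is proposed. It exploits the accuracy of sub-band spectral entropy endpoint detection to separate the noisy-speech portions from the noise-only portions, updates the BWT threshold in real time, and accurately identifies the wavelet coefficients of the noise, thereby enhancing the speech. Experimental results show that, compared with Wiener filtering, the proposed method improves the signal-to-noise ratio (SNR) by about 8% on average and significantly enhances speech in noisy environments.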
4.
5.
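An improved endpoint detection method based on adaptive spectral entropy. Total citations: 1, self-citations: 0, citations by others: 1
A speech endpoint detection method suited to low signal-to-noise ratio environments is proposed to improve discrimination between speech and noise. By improving the feature parameter used for endpoint detection, speech and noise are separated more reliably and detection accuracy at low SNR increases. Starting from sub-band spectral entropy, a positive constant is introduced to modify the basic spectral entropy, giving an improved negative spectral entropy feature; combined with adaptive sub-band selection, this yields a new feature, the adaptive sub-band constant negative spectral entropy. The feature is strongly noise-resistant at low SNR and detects speech endpoints accurately. Experimental results show that the method is fast, effective, and robust, and is well suited to endpoint detection at low SNR.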
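The abstract does not give the formula for the constant negative spectral entropy, so the following is only one plausible reading, offered as a sketch. It assumes the positive constant `k` is added to every sub-band energy before normalization and that the feature is the negated entropy; the adaptive sub-band selection step is not shown, and the function name is hypothetical.

```python
import math


def constant_negative_entropy(band_energies, k=0.5):
    """Negated spectral entropy with a positive constant added to each band.

    Assumption: p_i = (E_i + k) / sum_j (E_j + k), feature = sum_i p_i * log(p_i).
    The feature is <= 0 and moves toward 0 as the spectrum becomes peakier.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    shifted = [e + k for e in band_energies]
    total = sum(shifted)
    probs = [s / total for s in shifted]
    return sum(p * math.log(p) for p in probs)
```

Under this reading the constant damps the influence of low-energy bands, which is one way such a feature could become less sensitive to noise.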
6.
7.
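A noise-robust speech endpoint detection algorithm based on spectral variance. Total citations: 1, self-citations: 0, citations by others: 1
In a speech recognition system, endpoint detection has an important influence on recognition accuracy. For clean speech, traditional endpoint detection algorithms locate the start and end of speech well, but under noise their accuracy often drops sharply. To improve endpoint detection in noisy environments, the authors start from the difference between the frequency-domain distributions of speech and noise, use the spectral variance to distinguish the two, and propose a spectral-variance-based endpoint detection algorithm. Simulations in noise-free and noisy conditions show that the algorithm performs well even under strong noise. A comparison with a traditional LPCC-based endpoint detection algorithm shows that the new algorithm is clearly more accurate in noisy environments.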
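The idea of separating speech from noise by spectral variance can be sketched as follows. This is a minimal illustration rather than the paper's algorithm: it assumes each frame is already available as a list of magnitude-spectrum values, and the fixed `threshold` and the function names are hypothetical.

```python
from statistics import pvariance


def spectral_variance(magnitudes):
    """Population variance of one frame's magnitude spectrum."""
    return pvariance(magnitudes)


def detect_speech(frames, threshold):
    """Mark a frame as speech when its spectral variance exceeds the threshold.

    Rationale: a roughly flat noise spectrum has low variance across bins,
    while a speech spectrum with formant peaks has high variance.
    """
    return [spectral_variance(frame) > threshold for frame in frames]
```

For example, `detect_speech([[1, 1, 1, 1], [0, 4, 0, 0]], 0.5)` flags only the second, peaky frame.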
8.
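To distinguish speech from non-speech (noise) in complex noise environments, a noisy-speech endpoint detection method based on wavelets and energy entropy is proposed. The method exploits the multi-resolution property of wavelets and their ability to represent local features of non-stationary signals: the noisy speech is wavelet-transformed, and the mean of the energy entropy values across decomposition levels is used to separate speech segments from non-speech segments. Experiments with different background noises and signal-to-noise ratios show that the proposed method achieves high detection accuracy.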
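A rough sketch of averaging an energy entropy over wavelet decomposition levels is given below. The abstract names neither the wavelet nor the exact entropy definition, so the Haar wavelet, the entropy over normalized detail-coefficient energies, and all function names here are assumptions made for illustration.

```python
import math


def haar_step(signal):
    """One level of the Haar DWT: return (approximation, detail) coefficients."""
    s = 1 / math.sqrt(2)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def energy_entropy(coeffs):
    """Shannon entropy of the normalized coefficient energies."""
    energies = [c * c for c in coeffs]
    total = sum(energies)
    if total <= 0:
        return 0.0
    return -sum((e / total) * math.log(e / total) for e in energies if e > 0)


def mean_level_entropy(frame, levels=3):
    """Average the detail-coefficient energy entropy over the given levels."""
    entropies = []
    approx = list(frame)
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, detail = haar_step(approx)
        entropies.append(energy_entropy(detail))
    return sum(entropies) / len(entropies) if entropies else 0.0
```

A detector built on this feature would compare `mean_level_entropy` for each frame with a threshold estimated from leading noise-only frames; that thresholding step is not described in the abstract.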
9.
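An improved spectral subtraction speech enhancement algorithm based on codebook learning is proposed for non-stationary noise environments. The algorithm has a training stage and an enhancement stage. In training, autoregressive models describe the spectral shapes of speech and noise, and speech and noise codebooks are built. In enhancement, a log-spectral minimization algorithm estimates the speech and noise spectra, and noise is removed by spectral subtraction. Because the speech and noise spectra are estimated in every time frame, rapidly changing non-stationary noise can be tracked even while speech is present; the autoregressive model yields a smooth noise spectrum estimate, which reduces musical noise. Simulations show that, compared with conventional spectral subtraction and multi-band spectral subtraction, the improved method suppresses noise better with less speech distortion.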
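For orientation, the final subtraction step can be sketched in its conventional form. This is plain power spectral subtraction with a spectral floor, not the codebook-based estimator of the abstract: the noise estimate is simply passed in, and the over-subtraction factor `alpha`, the `floor` parameter, and the function name are assumptions.

```python
def spectral_subtract(noisy_power, noise_power, alpha=1.0, floor=0.01):
    """Subtract a noise power estimate from a noisy power spectrum, bin by bin.

    alpha scales the noise estimate (over-subtraction when > 1); floor keeps
    each bin at no less than a fraction of the noisy power, which limits the
    isolated spectral peaks heard as musical noise.
    """
    return [
        max(y - alpha * n, floor * y)
        for y, n in zip(noisy_power, noise_power)
    ]
```

In the abstract's method, `noise_power` would come from the per-frame codebook-driven estimate rather than from an average over silent frames.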
10.
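To meet the practical need for speech recognition systems to withstand environmental noise, a two-stage combined noise-reduction technique is proposed, and a noise-robust embedded speech recognition system is designed with a digital signal processor (DSP) as the hardware platform and the hidden Markov model (HMM) as the algorithm. The DSP is a TMS320VC5509A chip, which together with peripheral circuits forms the hardware platform of the system. The software implements a discrete hidden Markov model (DHMM) as the recognition algorithm and consists of four main modules: recognition, training, learning, and USB. Experimental results show that the speech recognition system based on the two-stage combined denoising technique is more robust to noise.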
11.
In this paper, we propose a speech enhancement method in which the front-end decomposition of the input speech is performed by temporal processing using a filterbank. The proposed method incorporates a perceptually motivated stationary wavelet packet filterbank (PM-SWPFB) and an improved spectral over-subtraction (I-SOS) algorithm for the enhancement of speech in various noise environments. The stationary wavelet packet transform (SWPT) is a shift-invariant transform. The PM-SWPFB is obtained by selecting the stationary wavelet packet tree so that it closely matches the non-linear resolution of the critical band structure of the psychoacoustic model. After the decomposition of the input speech, the I-SOS algorithm is applied in each subband separately to estimate the speech. The I-SOS uses a continuous noise estimation approach and estimates the noise power in each subband without the need for explicit speech silence detection. The subband noise power is estimated and updated by adaptively smoothing the noisy signal power. The smoothing parameter in each subband is controlled by a function of the estimated signal-to-noise ratio (SNR). The performance of the proposed speech enhancement method is tested on speech signals degraded by various real-world noises. Using objective speech quality measures (SNR, segmental SNR (SegSNR), perceptual evaluation of speech quality (PESQ) score) and spectrograms with informal listening tests, we show that the proposed speech enhancement method outperforms spectral subtractive-type algorithms and improves the quality and intelligibility of the enhanced speech.
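The continuous noise estimation described above, a recursive smoothing of the noisy power with an SNR-controlled smoothing parameter, can be sketched for a single subband. The abstract does not give the control function, so the sigmoid below, its `slope` and `midpoint` values, and the function name are assumptions chosen only to show the mechanism.

```python
import math


def update_noise_power(noise_prev, noisy_power, slope=2.0, midpoint=1.5):
    """One recursive update of a subband noise power estimate.

    The smoothing parameter alpha grows with the estimated SNR
    (noisy_power / noise_prev): at high SNR alpha approaches 1 and the
    estimate is held, at low SNR alpha is small and the estimate tracks
    the noisy power.
    """
    if noise_prev <= 0:
        return noisy_power
    snr = noisy_power / noise_prev
    alpha = 1.0 / (1.0 + math.exp(-slope * (snr - midpoint)))
    return alpha * noise_prev + (1.0 - alpha) * noisy_power
```

Because alpha depends only on the running estimate and the current frame, no explicit speech or silence decision is needed, which matches the behaviour the abstract describes.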
12.
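To address the poor robustness of speech recognition systems in noisy environments, a switching speech power spectrum estimation algorithm is proposed. Assuming that the speech magnitude spectrum follows a Chi distribution, an improved speech power spectrum estimator based on the minimum mean square error (MMSE) criterion is derived. Combining it with the speech presence probability (SPP) yields an improved SPP-based MMSE estimator. This improved MMSE estimator is then combined with the conventional Wiener filter: when noise interference is strong, the improved MMSE estimator is used to estimate the clean-speech power spectrum; when noise interference is weak, the conventional Wiener filter is used instead to reduce computation. The result is a switching speech power spectrum estimation algorithm for recognition systems. Experimental results show that, compared with the conventional MMSE estimator under the Rayleigh distribution, the proposed algorithm raises the recognition rate by about 8 percentage points on average across noise conditions and, while removing noise and improving the robustness of the recognition system, reduces its power consumption.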
13.
M. F. R. Chowdhury, S.-A. Selouani, D. O’Shaughnessy, 《International Journal of Speech Technology》, 2012, 15(1): 5-23
Current automatic speech recognition (ASR) works in off-line mode and needs prior knowledge of the stationary or quasi-stationary test conditions to reach the expected word recognition accuracy. These requirements limit the use of ASR in real-world applications where test conditions are highly non-stationary and are not known a priori. This paper presents an innovative frame dynamic rapid adaptation and noise compensation technique for tracking highly non-stationary noises and its application to on-line ASR. The proposed algorithm is based on a soft computing model using Bayesian on-line inference for spectral change point detection (BOSCPD) in unknown non-stationary noises. BOSCPD is tested with the MCRA noise tracking technique for on-line rapid environmental change learning in different non-stationary noise scenarios. The test results show that the proposed BOSCPD technique significantly reduces the delay in spectral change point detection compared to the baseline MCRA and its derivatives. The proposed BOSCPD soft computing model is tested for joint additive and channel distortions compensation (JAC)-based on-line ASR in unknown test conditions using non-stationary noisy speech samples from the Aurora 2 speech database. The simulation results for the on-line ASR show a significant improvement in recognition accuracy compared to the baseline Aurora 2 distributed speech recognition (DSR) in batch mode.
14.
This paper addresses the problem of acoustic noise reduction and speech enhancement by adaptive filtering algorithms. Most speech enhancement methods and algorithms that use an adaptive filtering structure are expressed in fullband form. One of these widespread structures is the Forward Blind Source Separation Structure (FBSS). The FBSS structure is often used to separate speech from noise and thereby enhance the speech signal at the processing output. In this paper, we propose a new subband implementation of the FBSS structure. To make the proposed structure more robust, we adapt and then apply to this subband structure a new combination of criteria based on minimizing the system mismatch and the smoothing filtering errors. Combining the proposed subband structure with these optimal criteria yields a new two-channel subband forward (2CSF) algorithm that improves the convergence speed of the cross adaptive filters used to separate speech from noise. Objective tests under various environments are presented, showing the good behavior of the proposed 2CSF algorithm.
15.
In this paper, we propose a novel front-end speech parameterization technique for automatic speech recognition (ASR) that is less sensitive to ambient noise and pitch variations. First, using variational mode decomposition (VMD), we break up the short-time magnitude spectrum obtained by the discrete Fourier transform into several components. To suppress the ill effects of noise and pitch variations, the spectrum is then sufficiently smoothed. The desired spectral smoothing is achieved by discarding the higher-order variational mode functions and reconstructing the spectrum using the first two modes only. As a result, the smoothed spectrum closely resembles the spectral envelope. Next, the Mel-frequency cepstral coefficients (MFCC) are extracted using the VMD-based smoothed spectra. The proposed front-end acoustic features are observed to be more robust to ambient noise and pitch variations than the conventional MFCC features, as demonstrated by the experimental evaluations presented in this study. For this purpose, we developed an ASR system using speech data from adult speakers collected under relatively clean recording conditions. State-of-the-art acoustic modeling techniques based on deep neural networks (DNN) and long short-term memory recurrent neural networks (LSTM-RNN) were employed. The ASR systems were then evaluated under noisy test conditions to assess the noise robustness of the proposed features. To assess robustness to pitch variations, experimental evaluations were performed on another test set consisting of speech data from child speakers. Transcribing children's speech helps in simulating an ASR task where pitch differences between training and test data are significantly large. The signal domain analyses as well as the experimental evaluations presented in this paper support our claims.