2.
To improve the performance of speech endpoint detection at low signal-to-noise ratios (SNRs), an improved endpoint detection method based on spectral subtraction and adaptive sub-band spectral entropy is proposed. The method first applies spectral subtraction to the noisy speech to remove additive noise, updating the background-noise estimate as it goes, and then performs endpoint detection on the enhanced signal using an improved adaptive sub-band spectral entropy. Experimental results show that the method has good detection performance, improves endpoint-detection accuracy over traditional methods, and still locates speech endpoints fairly accurately in low-SNR environments.
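A minimal numpy sketch of the spectral-subtraction front end described in this abstract; the over-subtraction factor `alpha`, the spectral floor `beta`, and all names are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def spectral_subtract(frames, noise_mag, alpha=2.0, beta=0.01):
    """Subtract an estimated noise magnitude spectrum from each frame.

    frames    : (n_frames, frame_len) array of windowed time-domain frames
    noise_mag : average |FFT| of leading noise-only frames
    alpha     : over-subtraction factor (assumed value)
    beta      : spectral floor that prevents negative magnitudes
    """
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    # floor the subtracted magnitude to avoid negative power
    clean = np.maximum(mag - alpha * noise_mag, beta * mag)
    return np.fft.irfft(clean * np.exp(1j * phase), n=frames.shape[1], axis=1)
```

The floor term keeps the subtracted spectrum from going negative, the usual source of "musical noise" artifacts in plain spectral subtraction.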
3.
An improved endpoint detection method based on adaptive spectral entropy (Cited by: 1; self-citations: 0; citations by others: 1)
To sharpen the distinction between speech and noise in low-SNR environments, a speech endpoint detection method suited to low SNR is proposed. By improving the feature parameter used for endpoint detection, speech is better separated from noise and detection accuracy at low SNR improves. Starting from sub-band spectral entropy, a positive constant is introduced into the basic spectral-entropy computation to obtain an improved negative spectral entropy feature; combined with an adaptive sub-band selection scheme, this yields a novel feature parameter, the adaptive sub-band constant negative spectral entropy. The feature has strong noise robustness at low SNR and can locate speech endpoints accurately. Experimental results show that the method is fast, effective, and robust, and is well suited to endpoint detection at low SNR.
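One plausible reading of the "constant negative spectral entropy" feature, sketched in numpy; the constant `K`, the band count, and the sign convention are assumptions for illustration, not the paper's exact definition:

```python
import numpy as np

def subband_neg_entropy(frame, n_subbands=4, K=0.5):
    """Sub-band spectral entropy with an additive positive constant.

    K is folded into each band's probability estimate before the entropy
    sum, flattening the distribution for noise frames and widening the
    speech/noise gap. Returns sum(p*log p), i.e. negative entropy:
    closer to 0 for peaked (speech-like) spectra, more negative for
    flat noise spectra.
    """
    power = np.abs(np.fft.rfft(frame)) ** 2
    bands = np.array_split(power, n_subbands)
    e = np.array([b.sum() for b in bands])
    p = (e + K) / (e.sum() + K * n_subbands)  # smoothed band probabilities
    return float(np.sum(p * np.log(p)))
```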
5.
For effective speech signal processing and to reduce redundancy in the speech signal, endpoint detection is commonly used to extract the informative part of the signal. Because the traditional spectral-entropy endpoint detection algorithm uses a fixed decision threshold, its performance degrades sharply at low SNR. A detection method based on a dynamically weighted threshold is therefore proposed: the spectral entropy of each frame judged to be noise is averaged, with weights, against the spectral entropy of the silent (noise-only) segment, and the resulting noise spectral entropy becomes the updated threshold. Spectral subtraction is also introduced during the decision process to raise the SNR and further suppress noise interference. Simulation results show that, compared with traditional spectral-entropy endpoint detection, the method still locates speech endpoints more accurately at low SNR.
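The dynamically weighted threshold update can be sketched as follows; the weight `w`, the decision margin, and the seeding from leading noise-only frames are assumed details, not values from the paper:

```python
import numpy as np

def detect_endpoints(frame_entropies, n_init_noise=10, w=0.9, margin=0.1):
    """Endpoint decision with a dynamically weighted threshold.

    The threshold is seeded from the leading noise-only frames; every
    frame later judged to be noise is folded back into the threshold by
    a weighted average, so the threshold tracks the noise floor.
    """
    thr = np.mean(frame_entropies[:n_init_noise])
    flags = []
    for h in frame_entropies:
        is_speech = h < thr - margin        # speech frames have lower spectral entropy
        flags.append(is_speech)
        if not is_speech:
            thr = w * thr + (1.0 - w) * h   # weighted update with the new noise frame
    return flags
```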
6.
《计算机应用与软件》 (Computer Applications and Software), 2017, (11)
Speech endpoint detection matters greatly for building practical speech recognition systems. To improve the performance of endpoint detection at low SNR, an endpoint detection algorithm based on the maximum-entropy spectrum and time-frequency characteristics is proposed. For each frame of the speech signal, the power spectrum is estimated by the maximum-entropy method, and features are captured according to the time-frequency characteristics of the noisy signal to perform endpoint detection. Experimental results show that at fairly low SNR (-9 to 0 dB) the method captures the characteristics of the speech signal quite accurately and markedly improves endpoint-detection accuracy.
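The maximum-entropy power spectrum coincides with an autoregressive (AR) model spectrum, and a common per-frame estimator is Burg's method. A sketch, with the model order and FFT size as illustrative choices:

```python
import numpy as np

def burg_ar(x, order):
    """Burg's method: AR coefficients whose spectrum is the
    maximum-entropy spectral estimate of x."""
    x = np.asarray(x, float)
    f, b = x.copy(), x.copy()          # forward / backward prediction errors
    a = np.array([1.0])
    for _ in range(order):
        fk, bk = f[1:], b[:-1]
        k = -2.0 * np.dot(fk, bk) / (np.dot(fk, fk) + np.dot(bk, bk) + 1e-12)
        a = np.concatenate([a, [0.0]]) + k * np.concatenate([[0.0], a[::-1]])
        f, b = fk + k * bk, bk + k * fk
    return a

def max_entropy_spectrum(a, n_fft=512):
    """Power spectrum 1/|A(e^jw)|^2 of the fitted AR model."""
    A = np.fft.rfft(a, n_fft)
    return 1.0 / (np.abs(A) ** 2 + 1e-12)
```

For a sinusoid, the estimated spectrum peaks sharply at the sinusoid's frequency, which is why maximum-entropy estimates resolve narrowband speech structure well from short frames.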
8.
Speech endpoint detection based on critical bands and energy entropy (Cited by: 1; self-citations: 0; citations by others: 1)
The accuracy of speech endpoint detection directly affects speech recognition, synthesis, enhancement, and other speech-processing tasks. To improve its effectiveness, a speech endpoint detection algorithm based on critical bands and energy entropy is proposed. Exploiting the frequency distribution of human auditory perception, the noisy speech signal is partitioned into critical bands, and the different distributions of the per-band energy entropy in speech and noise segments are used to detect endpoints under different background noises. Experimental results show that, compared with the traditional short-time energy method, the proposed algorithm's detection accuracy is on average 1.6 percentage points higher, and that it can detect speech endpoints in low-SNR environments under a range of noises.
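A sketch of a critical-band energy entropy feature using the standard Bark band edges up to 4 kHz; the exact banding and entropy form used by the paper may differ:

```python
import numpy as np

# Standard critical-band (Bark) edges in Hz up to 4 kHz
BARK_EDGES = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080,
              1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700]

def critical_band_energy_entropy(frame, fs=8000):
    """Shannon entropy of the energy distribution over critical bands.

    Speech concentrates energy in a few bands (low entropy), while broadband
    noise spreads it across bands (high entropy).
    """
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    energies = []
    for lo, hi in zip(BARK_EDGES[:-1], BARK_EDGES[1:]):
        energies.append(power[(freqs >= lo) & (freqs < hi)].sum())
    e = np.array(energies) + 1e-12
    p = e / e.sum()
    return float(-np.sum(p * np.log(p)))
```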
10.
Speech endpoint detection extracts the recorded speech signal from a complex noise background and determines where each speech segment begins and ends; it is the basis for all subsequent processing. To address the low accuracy of endpoint detection in complex, low-SNR noise environments, an algorithm is proposed that combines multi-taper spectral-estimation denoising with a sub-band energy-entropy ratio. The algorithm denoises the speech signal with an improved multi-taper spectral subtraction, then, building on conventional spectral-entropy endpoint detection and incorporating log energy, uses an improved sub-band energy-entropy ratio as the decision threshold. Experiments show that the algorithm achieves high accuracy and strong robustness in low-SNR conditions across different environments.
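The sub-band energy-entropy ratio can be sketched as below. The multi-taper spectral-subtraction denoising stage is omitted, and the sqrt(1 + |logE/H|) form is one common variant of the energy-entropy ratio, assumed here rather than taken from the paper:

```python
import numpy as np

def energy_entropy_ratio(frame, n_subbands=4):
    """Sub-band energy-entropy ratio: grows with frame log-energy and with
    spectral peakedness (low entropy), so it is larger for speech frames
    than for noise frames."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    bands = np.array_split(power, n_subbands)
    e = np.array([b.sum() for b in bands]) + 1e-12
    p = e / e.sum()
    H = -np.sum(p * np.log(p)) + 1e-12   # sub-band spectral entropy
    logE = np.log(1.0 + e.sum())          # log energy of the frame
    return float(np.sqrt(1.0 + abs(logE / H)))
```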
11.
Research on slag-carryover detection in continuous casting of molten steel based on vector quantization (Cited by: 1; self-citations: 0; citations by others: 1)
In the late stage of pouring molten steel, slag carryover must be detected in order to maintain steel quality. Vector quantization, a non-parametric pattern-recognition technique, has been applied successfully to speech coding, speech synthesis, speech recognition, and speaker recognition. Based on analysis of a large number of vibration signals from the casting mechanism, vector quantization is introduced, in a novel way, into slag-carryover detection during steel pouring. Experimental results show that the method is effective.
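A toy version of the codebook step: train one vector-quantization codebook per condition on vibration-signal features, then classify a new recording by average quantization distortion. The feature extraction, codebook size, and two-codebook setup are assumptions for illustration:

```python
import numpy as np

def train_codebook(features, n_codes=4, iters=20, seed=0):
    """Plain k-means codebook over feature vectors (rows)."""
    rng = np.random.default_rng(seed)
    codes = features[rng.choice(len(features), n_codes, replace=False)]
    for _ in range(iters):
        d = ((features[:, None, :] - codes[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for k in range(n_codes):
            pts = features[assign == k]
            if len(pts):                       # keep empty clusters unchanged
                codes[k] = pts.mean(0)
    return codes

def distortion(features, codes):
    """Average quantization distortion of features against a codebook."""
    d = ((features[:, None, :] - codes[None]) ** 2).sum(-1)
    return float(d.min(1).mean())
```

Classification then picks whichever codebook (normal pouring vs. slag carryover) yields the smaller distortion on the observed vibration features.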
12.
Endpoint detection of noisy signals based on independent component analysis feature extraction (Cited by: 2; self-citations: 0; citations by others: 2)
Using independent component analysis (ICA) to extract higher-order statistical features of the signal, a new speech/noise discrimination method based on the signal's own statistics is proposed. Because the ICA transform enlarges the statistical differences between speech and noise, the two can be separated effectively in the ICA domain. On this basis, the ICA energy (ICAE) and filtered ICAE (FICAE) features are proposed for endpoint detection. Experiments show that endpoint detection combining FICAE and ICAE is stable across SNRs and can detect speech endpoints effectively even at very low SNR, demonstrating good noise robustness and offering a new route to endpoint detection of weak signals under strong background noise.
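A compact stand-in for the ICA transform behind the ICAE feature: whiten the multichannel data and run symmetric FastICA iterations, then take the per-sample energy of the ICA-domain coefficients. This is generic FastICA, not the authors' exact pipeline:

```python
import numpy as np

def fastica(X, n_iter=100, seed=0):
    """Whiten X (channels x samples) and run symmetric FastICA with a
    tanh nonlinearity; returns the orthogonal unmixing matrix W and the
    whitened data Xw."""
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(X @ X.T / X.shape[1])
    Xw = E @ np.diag(1.0 / np.sqrt(d + 1e-12)) @ E.T @ X   # whitening
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[0], X.shape[0]))
    for _ in range(n_iter):
        WX = W @ Xw
        g = np.tanh(WX)
        W = (g @ Xw.T) / Xw.shape[1] - np.diag((1 - g**2).mean(axis=1)) @ W
        U, _, Vt = np.linalg.svd(W)        # symmetric decorrelation
        W = U @ Vt
    return W, Xw

def icae(W, Xw):
    """ICA energy: per-sample energy of the ICA-domain coefficients."""
    return ((W @ Xw) ** 2).sum(axis=0)
```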
16.
Ergun Erçelebi 《Computers & Electrical Engineering》2004, 30(2): 79-95
This paper presents a new approach to speech enhancement based on a modified least-mean-square multi-notch adaptive digital filter (MNADF). This approach differs from traditional speech enhancement methods in that no a priori knowledge of the noise statistics is required. Specifically, the proposed method addresses the case where speech quality and intelligibility deteriorate in the presence of background noise. Speech coders and automatic speech recognition systems are designed to act on clean speech signals, so speech corrupted by noise must be enhanced before processing. The proposed method uses a primary input containing the corrupted speech signal and a reference input containing noise only. A new computationally efficient algorithm is developed by tracking the significant frequencies of the noise and implementing the MNADF at those frequencies; a time-frequency analysis method, the short-time Fourier transform, is used to track the noise frequencies. Different types of noise from the Noisex-92 database are used to degrade real speech signals. Objective measures, study of the speech spectrograms, global signal-to-noise ratio (SNR) and segmental SNR (segSNR), as well as a subjective listening test, consistently demonstrate the superior enhancement performance of the proposed method over traditional speech enhancement methods such as spectral subtraction.
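The primary/reference structure described here is the classical two-input adaptive noise canceller. A plain time-domain LMS version is sketched below for orientation; the paper's MNADF instead adapts notches at tracked noise frequencies, which this sketch does not reproduce:

```python
import numpy as np

def lms_noise_cancel(primary, reference, order=16, mu=0.005):
    """Two-input adaptive noise canceller: an LMS filter shapes the
    noise-only reference input to match the noise component of the
    primary input; the error signal is the enhanced speech.
    order and mu are assumed values."""
    w = np.zeros(order)
    enhanced = np.zeros(len(primary))
    for n in range(order, len(primary)):
        x = reference[n - order:n][::-1]    # most recent reference samples
        e = primary[n] - w @ x              # error = speech estimate
        w = w + 2.0 * mu * e * x            # LMS weight update
        enhanced[n] = e
    return enhanced
```

Because the reference is uncorrelated with the speech, minimizing the error power removes only the noise component, leaving the speech in the error signal.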
17.
Emotion recognition in speech signals is currently a very active research topic that has attracted much attention in engineering applications. This paper presents a new approach to robust emotion recognition in speech signals in noisy environments. Using a weighted sparse representation model based on maximum likelihood estimation, an enhanced sparse representation classifier is proposed for robust emotion recognition in noisy speech. The effectiveness and robustness of the proposed method are investigated on clean and noisy emotional speech. The method is compared with six typical classifiers: linear discriminant classifier, K-nearest neighbor, C4.5 decision tree, radial basis function neural networks, support vector machines, and the plain sparse representation classifier. Experimental results on two publicly available emotional speech databases, the Berlin database and the Polish database, demonstrate the promising performance of the proposed method on robust emotion recognition in noisy speech, outperforming the other methods.
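Plain sparse-representation classification (SRC) with orthogonal matching pursuit, for orientation: a test sample is represented sparsely over the training dictionary and assigned to the class whose selected atoms reconstruct it best. The paper's method additionally weights the representation by a maximum-likelihood noise model, which is not reproduced here:

```python
import numpy as np

def omp(D, y, n_nonzero=5, tol=1e-10):
    """Orthogonal matching pursuit: greedily select dictionary atoms."""
    residual = y.astype(float).copy()
    idx, coef = [], np.zeros(0)
    for _ in range(n_nonzero):
        if np.linalg.norm(residual) < tol:
            break
        idx.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)
        residual = y - D[:, idx] @ coef
    return idx, coef

def src_classify(D, labels, y, n_nonzero=5):
    """Reconstruct y from each class's selected atoms and pick the class
    with the smallest residual."""
    idx, coef = omp(D, y, n_nonzero)
    best, best_r = None, np.inf
    for c in sorted(set(labels)):
        sel = [(i, w) for i, w in zip(idx, coef) if labels[i] == c]
        recon = sum(w * D[:, i] for i, w in sel) if sel else np.zeros_like(y)
        r = np.linalg.norm(y - recon)
        if r < best_r:
            best, best_r = c, r
    return best
```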
18.
《Expert Systems with Applications》2007, 32(2): 485-498
Speech and speaker recognition is an important task for computer systems. In this paper, an expert speaker recognition system based on optimum wavelet packet entropy is proposed, using real speech/voice signals. The study combines a new feature extraction approach with a classification approach, both built on optimum wavelet packet entropy parameter values. These values are obtained from real English speech/voice waveforms measured with a speech experimental set. A genetic-wavelet packet-neural network (GWPNN) model is developed, comprising three layers: a genetic algorithm, a wavelet packet layer, and a multi-layer perceptron. The genetic-algorithm layer selects the feature extraction method and finds the optimum wavelet entropy parameter values; one of four feature extraction methods is selected: wavelet packet decomposition alone, or wavelet packet decomposition combined with the short-time Fourier transform, the Born-Jordan time-frequency representation, or the Choi-Williams time-frequency representation. The wavelet packet layer performs optimum feature extraction in the time-frequency domain and consists of wavelet packet decomposition and wavelet packet entropies. The multi-layer perceptron, a feed-forward neural network, evaluates the fitness function of the genetic algorithm and classifies speakers. The performance of the developed system was evaluated on noisy English speech/voice signals. The test results showed that the system was effective in detecting real speech signals, with a correct classification rate of about 85% for speaker classification.
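The wavelet-packet-entropy feature can be illustrated with a Haar wavelet packet decomposition; the paper selects among richer bases and time-frequency variants via its genetic-algorithm layer, and Haar is used here only to keep the sketch self-contained:

```python
import numpy as np

def haar_wp_entropies(signal, depth=3):
    """Shannon entropy of each terminal node of a Haar wavelet packet
    decomposition, giving 2**depth entropy features per frame."""
    nodes = [np.asarray(signal, float)]
    for _ in range(depth):
        nxt = []
        for x in nodes:
            if len(x) % 2:
                x = x[:-1]
            nxt.append((x[0::2] + x[1::2]) / np.sqrt(2.0))  # approximation
            nxt.append((x[0::2] - x[1::2]) / np.sqrt(2.0))  # detail
        nodes = nxt
    feats = []
    for x in nodes:
        p = x ** 2
        p = p / (p.sum() + 1e-12) + 1e-12   # normalized node energies
        feats.append(float(-np.sum(p * np.log(p))))
    return feats
```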
19.
This paper describes a new algorithm for automatically detecting creak in speech signals. Detection uses two new acoustic parameters designed to characterise creaky excitations, following previous evidence in the literature combined with new observations in the current work. In particular, the method focuses on features of the Linear Prediction (LP) residual signal, including the presence of secondary peaks alongside prominent impulse-like excitation peaks. These parameters are used as input features to a decision tree classifier that identifies creaky regions. The algorithm was evaluated on a range of read and conversational speech databases and clearly outperformed the state of the art. Further experiments with degraded speech demonstrated robustness to both white and babble noise, providing better results than the state of the art down to at least 20 dB signal-to-noise ratio.
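The LP residual that the creak parameters are computed from can be obtained with the autocorrelation method (Levinson-Durbin) followed by inverse filtering; the peak-based creak parameters themselves are not reproduced, and the order and windowing are assumed details:

```python
import numpy as np

def lp_residual(frame, order=12):
    """LP residual via the autocorrelation method and inverse filtering.
    Creak detection then looks for prominent impulse-like peaks and
    secondary peaks in this residual."""
    x = np.asarray(frame, float) * np.hamming(len(frame))
    r = np.correlate(x, x, 'full')[len(x) - 1:len(x) + order]
    r[0] = r[0] * (1.0 + 1e-6) + 1e-12        # tiny ridge for stability
    a = np.zeros(order + 1)
    a[0] = 1.0
    e = r[0]
    for i in range(1, order + 1):             # Levinson-Durbin recursion
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / e
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        e *= (1.0 - k * k)
    return np.convolve(x, a)[:len(x)]         # inverse-filtered residual
```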