2.
To improve the performance of speech endpoint detection at low signal-to-noise ratios (SNRs), an improved endpoint detection method based on spectral subtraction and adaptive sub-band spectral entropy is proposed. The method first applies spectral subtraction to the noisy speech to remove additive noise, updating the background-noise estimate as it goes, and then performs endpoint detection on the enhanced signal using an improved adaptive sub-band spectral entropy. Experimental results show that the method has good detection performance, improves endpoint-detection accuracy over traditional methods, and still locates speech endpoints fairly accurately in low-SNR environments.
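A minimal numpy sketch of the spectral-subtraction front end described in this abstract; the over-subtraction factor `alpha`, the spectral floor `beta`, and all names are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def spectral_subtract(frames, noise_mag, alpha=2.0, beta=0.01):
    """Subtract an estimated noise magnitude spectrum from each frame.

    frames    : (n_frames, frame_len) array of windowed time-domain frames
    noise_mag : average |FFT| of leading noise-only frames
    alpha     : over-subtraction factor (assumed value)
    beta      : spectral floor that prevents negative magnitudes
    """
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    # floor the subtracted magnitude to avoid negative power
    clean = np.maximum(mag - alpha * noise_mag, beta * mag)
    return np.fft.irfft(clean * np.exp(1j * phase), n=frames.shape[1], axis=1)
```

The floor term keeps the subtracted spectrum from going negative, the usual source of "musical noise" artifacts in plain spectral subtraction.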
3.
An improved endpoint detection method based on adaptive spectral entropy (Cited by: 1; self-citations: 0; citations by others: 1)
To sharpen the distinction between speech and noise in low-SNR environments, a speech endpoint detection method suited to low SNR is proposed. By improving the feature parameter used for endpoint detection, speech is better separated from noise and detection accuracy at low SNR improves. Starting from sub-band spectral entropy, a positive constant is introduced into the basic spectral-entropy computation to obtain an improved negative spectral entropy feature; combined with an adaptive sub-band selection scheme, this yields a novel feature parameter, the adaptive sub-band constant negative spectral entropy. The feature has strong noise robustness at low SNR and can locate speech endpoints accurately. Experimental results show that the method is fast, effective, and robust, and is well suited to endpoint detection at low SNR.
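One plausible reading of the "constant negative spectral entropy" feature, sketched in numpy; the constant `K`, the band count, and the sign convention are assumptions for illustration, not the paper's exact definition:

```python
import numpy as np

def subband_neg_entropy(frame, n_subbands=4, K=0.5):
    """Sub-band spectral entropy with an additive positive constant.

    K is folded into each band's probability estimate before the entropy
    sum, flattening the distribution for noise frames and widening the
    speech/noise gap. Returns sum(p*log p), i.e. negative entropy:
    closer to 0 for peaked (speech-like) spectra, more negative for
    flat noise spectra.
    """
    power = np.abs(np.fft.rfft(frame)) ** 2
    bands = np.array_split(power, n_subbands)
    e = np.array([b.sum() for b in bands])
    p = (e + K) / (e.sum() + K * n_subbands)  # smoothed band probabilities
    return float(np.sum(p * np.log(p)))
```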
5.
For effective speech signal processing and to reduce redundancy in the speech signal, endpoint detection is commonly used to extract the informative part of the signal. Because the traditional spectral-entropy endpoint detection algorithm uses a fixed decision threshold, its performance degrades sharply at low SNR. A detection method based on a dynamically weighted threshold is therefore proposed: the spectral entropy of each frame judged to be noise is averaged, with weights, against the spectral entropy of the silent (noise-only) segment, and the resulting noise spectral entropy becomes the updated threshold. Spectral subtraction is also introduced during the decision process to raise the SNR and further suppress noise interference. Simulation results show that, compared with traditional spectral-entropy endpoint detection, the method still locates speech endpoints more accurately at low SNR.
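The dynamically weighted threshold update can be sketched as follows; the weight `w`, the decision margin, and the seeding from leading noise-only frames are assumed details, not values from the paper:

```python
import numpy as np

def detect_endpoints(frame_entropies, n_init_noise=10, w=0.9, margin=0.1):
    """Endpoint decision with a dynamically weighted threshold.

    The threshold is seeded from the leading noise-only frames; every
    frame later judged to be noise is folded back into the threshold by
    a weighted average, so the threshold tracks the noise floor.
    """
    thr = np.mean(frame_entropies[:n_init_noise])
    flags = []
    for h in frame_entropies:
        is_speech = h < thr - margin        # speech frames have lower spectral entropy
        flags.append(is_speech)
        if not is_speech:
            thr = w * thr + (1.0 - w) * h   # weighted update with the new noise frame
    return flags
```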
6.
《计算机应用与软件》 (Computer Applications and Software), 2017, (11)
Speech endpoint detection matters greatly for building practical speech recognition systems. To improve the performance of endpoint detection at low SNR, an endpoint detection algorithm based on the maximum-entropy spectrum and time-frequency characteristics is proposed. For each frame of the speech signal, the power spectrum is estimated by the maximum-entropy method, and features are captured according to the time-frequency characteristics of the noisy signal to perform endpoint detection. Experimental results show that at fairly low SNR (-9 to 0 dB) the method captures the characteristics of the speech signal quite accurately and markedly improves endpoint-detection accuracy.
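The maximum-entropy power spectrum coincides with an autoregressive (AR) model spectrum, and a common per-frame estimator is Burg's method. A sketch, with the model order and FFT size as illustrative choices:

```python
import numpy as np

def burg_ar(x, order):
    """Burg's method: AR coefficients whose spectrum is the
    maximum-entropy spectral estimate of x."""
    x = np.asarray(x, float)
    f, b = x.copy(), x.copy()          # forward / backward prediction errors
    a = np.array([1.0])
    for _ in range(order):
        fk, bk = f[1:], b[:-1]
        k = -2.0 * np.dot(fk, bk) / (np.dot(fk, fk) + np.dot(bk, bk) + 1e-12)
        a = np.concatenate([a, [0.0]]) + k * np.concatenate([[0.0], a[::-1]])
        f, b = fk + k * bk, bk + k * fk
    return a

def max_entropy_spectrum(a, n_fft=512):
    """Power spectrum 1/|A(e^jw)|^2 of the fitted AR model."""
    A = np.fft.rfft(a, n_fft)
    return 1.0 / (np.abs(A) ** 2 + 1e-12)
```

For a sinusoid, the estimated spectrum peaks sharply at the sinusoid's frequency, which is why maximum-entropy estimates resolve narrowband speech structure well from short frames.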
8.
Speech endpoint detection based on critical bands and energy entropy (Cited by: 1; self-citations: 0; citations by others: 1)
The accuracy of speech endpoint detection directly affects speech recognition, synthesis, enhancement, and other speech-processing tasks. To improve its effectiveness, a speech endpoint detection algorithm based on critical bands and energy entropy is proposed. Exploiting the frequency distribution of human auditory perception, the noisy speech signal is partitioned into critical bands, and the different distributions of the per-band energy entropy in speech and noise segments are used to detect endpoints under different background noises. Experimental results show that, compared with the traditional short-time energy method, the proposed algorithm's detection accuracy is on average 1.6 percentage points higher, and that it can detect speech endpoints in low-SNR environments under a range of noises.
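A sketch of a critical-band energy entropy feature using the standard Bark band edges up to 4 kHz; the exact banding and entropy form used by the paper may differ:

```python
import numpy as np

# Standard critical-band (Bark) edges in Hz up to 4 kHz
BARK_EDGES = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080,
              1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700]

def critical_band_energy_entropy(frame, fs=8000):
    """Shannon entropy of the energy distribution over critical bands.

    Speech concentrates energy in a few bands (low entropy), while broadband
    noise spreads it across bands (high entropy).
    """
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    energies = []
    for lo, hi in zip(BARK_EDGES[:-1], BARK_EDGES[1:]):
        energies.append(power[(freqs >= lo) & (freqs < hi)].sum())
    e = np.array(energies) + 1e-12
    p = e / e.sum()
    return float(-np.sum(p * np.log(p)))
```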
10.
Speech endpoint detection extracts the recorded speech signal from a complex noise background and determines where each speech segment begins and ends; it is the basis for all subsequent processing. To address the low accuracy of endpoint detection in complex, low-SNR noise environments, an algorithm is proposed that combines multi-taper spectral-estimation denoising with a sub-band energy-entropy ratio. The algorithm denoises the speech signal with an improved multi-taper spectral subtraction, then, building on conventional spectral-entropy endpoint detection and incorporating log energy, uses an improved sub-band energy-entropy ratio as the decision threshold. Experiments show that the algorithm achieves high accuracy and strong robustness in low-SNR conditions across different environments.
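The sub-band energy-entropy ratio can be sketched as below. The multi-taper spectral-subtraction denoising stage is omitted, and the sqrt(1 + |logE/H|) form is one common variant of the energy-entropy ratio, assumed here rather than taken from the paper:

```python
import numpy as np

def energy_entropy_ratio(frame, n_subbands=4):
    """Sub-band energy-entropy ratio: grows with frame log-energy and with
    spectral peakedness (low entropy), so it is larger for speech frames
    than for noise frames."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    bands = np.array_split(power, n_subbands)
    e = np.array([b.sum() for b in bands]) + 1e-12
    p = e / e.sum()
    H = -np.sum(p * np.log(p)) + 1e-12   # sub-band spectral entropy
    logE = np.log(1.0 + e.sum())          # log energy of the frame
    return float(np.sqrt(1.0 + abs(logE / H)))
```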
11.
Research on slag-carryover detection in continuous casting of molten steel based on vector quantization (Cited by: 1; self-citations: 0; citations by others: 1)
In the late stage of pouring molten steel, slag carryover must be detected in order to maintain steel quality. Vector quantization, a non-parametric pattern-recognition technique, has been applied successfully to speech coding, speech synthesis, speech recognition, and speaker recognition. Based on analysis of a large number of vibration signals from the casting mechanism, vector quantization is introduced, in a novel way, into slag-carryover detection during steel pouring. Experimental results show that the method is effective.
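A toy version of the codebook step: train one vector-quantization codebook per condition on vibration-signal features, then classify a new recording by average quantization distortion. The feature extraction, codebook size, and two-codebook setup are assumptions for illustration:

```python
import numpy as np

def train_codebook(features, n_codes=4, iters=20, seed=0):
    """Plain k-means codebook over feature vectors (rows)."""
    rng = np.random.default_rng(seed)
    codes = features[rng.choice(len(features), n_codes, replace=False)]
    for _ in range(iters):
        d = ((features[:, None, :] - codes[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for k in range(n_codes):
            pts = features[assign == k]
            if len(pts):                       # keep empty clusters unchanged
                codes[k] = pts.mean(0)
    return codes

def distortion(features, codes):
    """Average quantization distortion of features against a codebook."""
    d = ((features[:, None, :] - codes[None]) ** 2).sum(-1)
    return float(d.min(1).mean())
```

Classification then picks whichever codebook (normal pouring vs. slag carryover) yields the smaller distortion on the observed vibration features.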
12.
Endpoint detection of noisy signals based on independent component analysis feature extraction (Cited by: 2; self-citations: 0; citations by others: 2)
Using independent component analysis (ICA) to extract higher-order statistical features of the signal, a new speech/noise discrimination method based on the signal's own statistics is proposed. Because the ICA transform enlarges the statistical differences between speech and noise, the two can be separated effectively in the ICA domain. On this basis, the ICA energy (ICAE) and filtered ICAE (FICAE) features are proposed for endpoint detection. Experiments show that endpoint detection combining FICAE and ICAE is stable across SNRs and can detect speech endpoints effectively even at very low SNR, demonstrating good noise robustness and offering a new route to endpoint detection of weak signals under strong background noise.
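A compact stand-in for the ICA transform behind the ICAE feature: whiten the multichannel data and run symmetric FastICA iterations, then take the per-sample energy of the ICA-domain coefficients. This is generic FastICA, not the authors' exact pipeline:

```python
import numpy as np

def fastica(X, n_iter=100, seed=0):
    """Whiten X (channels x samples) and run symmetric FastICA with a
    tanh nonlinearity; returns the orthogonal unmixing matrix W and the
    whitened data Xw."""
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(X @ X.T / X.shape[1])
    Xw = E @ np.diag(1.0 / np.sqrt(d + 1e-12)) @ E.T @ X   # whitening
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[0], X.shape[0]))
    for _ in range(n_iter):
        WX = W @ Xw
        g = np.tanh(WX)
        W = (g @ Xw.T) / Xw.shape[1] - np.diag((1 - g**2).mean(axis=1)) @ W
        U, _, Vt = np.linalg.svd(W)        # symmetric decorrelation
        W = U @ Vt
    return W, Xw

def icae(W, Xw):
    """ICA energy: per-sample energy of the ICA-domain coefficients."""
    return ((W @ Xw) ** 2).sum(axis=0)
```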
16.
Ergun Erçelebi 《Computers & Electrical Engineering》2004, 30(2): 79-95
This paper presents a new approach to speech enhancement based on a modified least-mean-square multi-notch adaptive digital filter (MNADF). This approach differs from traditional speech enhancement methods in that no a priori knowledge of the noise statistics is required. Specifically, the proposed method addresses the case where speech quality and intelligibility deteriorate in the presence of background noise. Speech coders and automatic speech recognition systems are designed to act on clean speech signals, so speech corrupted by noise must be enhanced before processing. The proposed method uses a primary input containing the corrupted speech signal and a reference input containing noise only. A new computationally efficient algorithm is developed by tracking the significant frequencies of the noise and implementing the MNADF at those frequencies; a time-frequency analysis method, the short-time Fourier transform, is used to track the noise frequencies. Different types of noise from the Noisex-92 database are used to degrade real speech signals. Objective measures, study of the speech spectrograms, global signal-to-noise ratio (SNR) and segmental SNR (segSNR), as well as a subjective listening test, consistently demonstrate the superior enhancement performance of the proposed method over traditional speech enhancement methods such as spectral subtraction.
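The primary/reference structure described here is the classical two-input adaptive noise canceller. A plain time-domain LMS version is sketched below for orientation; the paper's MNADF instead adapts notches at tracked noise frequencies, which this sketch does not reproduce:

```python
import numpy as np

def lms_noise_cancel(primary, reference, order=16, mu=0.005):
    """Two-input adaptive noise canceller: an LMS filter shapes the
    noise-only reference input to match the noise component of the
    primary input; the error signal is the enhanced speech.
    order and mu are assumed values."""
    w = np.zeros(order)
    enhanced = np.zeros(len(primary))
    for n in range(order, len(primary)):
        x = reference[n - order:n][::-1]    # most recent reference samples
        e = primary[n] - w @ x              # error = speech estimate
        w = w + 2.0 * mu * e * x            # LMS weight update
        enhanced[n] = e
    return enhanced
```

Because the reference is uncorrelated with the speech, minimizing the error power removes only the noise component, leaving the speech in the error signal.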
17.
Emotion recognition in speech signals is currently a very active research topic that has attracted much attention in engineering applications. This paper presents a new approach to robust emotion recognition in speech signals in noisy environments. Using a weighted sparse representation model based on maximum likelihood estimation, an enhanced sparse representation classifier is proposed for robust emotion recognition in noisy speech. The effectiveness and robustness of the proposed method are investigated on clean and noisy emotional speech. The method is compared with six typical classifiers: linear discriminant classifier, K-nearest neighbor, C4.5 decision tree, radial basis function neural networks, support vector machines, and the plain sparse representation classifier. Experimental results on two publicly available emotional speech databases, the Berlin database and the Polish database, demonstrate the promising performance of the proposed method on robust emotion recognition in noisy speech, outperforming the other methods.
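Plain sparse-representation classification (SRC) with orthogonal matching pursuit, for orientation: a test sample is represented sparsely over the training dictionary and assigned to the class whose selected atoms reconstruct it best. The paper's method additionally weights the representation by a maximum-likelihood noise model, which is not reproduced here:

```python
import numpy as np

def omp(D, y, n_nonzero=5, tol=1e-10):
    """Orthogonal matching pursuit: greedily select dictionary atoms."""
    residual = y.astype(float).copy()
    idx, coef = [], np.zeros(0)
    for _ in range(n_nonzero):
        if np.linalg.norm(residual) < tol:
            break
        idx.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)
        residual = y - D[:, idx] @ coef
    return idx, coef

def src_classify(D, labels, y, n_nonzero=5):
    """Reconstruct y from each class's selected atoms and pick the class
    with the smallest residual."""
    idx, coef = omp(D, y, n_nonzero)
    best, best_r = None, np.inf
    for c in sorted(set(labels)):
        sel = [(i, w) for i, w in zip(idx, coef) if labels[i] == c]
        recon = sum(w * D[:, i] for i, w in sel) if sel else np.zeros_like(y)
        r = np.linalg.norm(y - recon)
        if r < best_r:
            best, best_r = c, r
    return best
```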
18.
《Expert Systems with Applications》2007, 32(2): 485-498
Speech and speaker recognition is an important task for computer systems. In this paper, an expert speaker recognition system based on optimum wavelet packet entropy is proposed, using real speech/voice signals. The study combines a new feature extraction approach with a classification approach, both built on optimum wavelet packet entropy parameter values. These values are obtained from real English speech/voice waveforms measured with a speech experimental set. A genetic-wavelet packet-neural network (GWPNN) model is developed, comprising three layers: a genetic algorithm, a wavelet packet layer, and a multi-layer perceptron. The genetic-algorithm layer selects the feature extraction method and finds the optimum wavelet entropy parameter values; one of four feature extraction methods is selected: wavelet packet decomposition alone, or wavelet packet decomposition combined with the short-time Fourier transform, the Born-Jordan time-frequency representation, or the Choi-Williams time-frequency representation. The wavelet packet layer performs optimum feature extraction in the time-frequency domain and consists of wavelet packet decomposition and wavelet packet entropies. The multi-layer perceptron, a feed-forward neural network, evaluates the fitness function of the genetic algorithm and classifies speakers. The performance of the developed system was evaluated on noisy English speech/voice signals. The test results showed that the system was effective in detecting real speech signals, with a correct classification rate of about 85% for speaker classification.
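The wavelet-packet-entropy feature can be illustrated with a Haar wavelet packet decomposition; the paper selects among richer bases and time-frequency variants via its genetic-algorithm layer, and Haar is used here only to keep the sketch self-contained:

```python
import numpy as np

def haar_wp_entropies(signal, depth=3):
    """Shannon entropy of each terminal node of a Haar wavelet packet
    decomposition, giving 2**depth entropy features per frame."""
    nodes = [np.asarray(signal, float)]
    for _ in range(depth):
        nxt = []
        for x in nodes:
            if len(x) % 2:
                x = x[:-1]
            nxt.append((x[0::2] + x[1::2]) / np.sqrt(2.0))  # approximation
            nxt.append((x[0::2] - x[1::2]) / np.sqrt(2.0))  # detail
        nodes = nxt
    feats = []
    for x in nodes:
        p = x ** 2
        p = p / (p.sum() + 1e-12) + 1e-12   # normalized node energies
        feats.append(float(-np.sum(p * np.log(p))))
    return feats
```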
19.
This paper describes a new algorithm for automatically detecting creak in speech signals. Detection uses two new acoustic parameters designed to characterise creaky excitations, following previous evidence in the literature combined with new observations in the current work. In particular, the method focuses on features of the Linear Prediction (LP) residual signal, including the presence of secondary peaks alongside prominent impulse-like excitation peaks. These parameters are used as input features to a decision tree classifier that identifies creaky regions. The algorithm was evaluated on a range of read and conversational speech databases and clearly outperformed the state of the art. Further experiments with degraded speech demonstrated robustness to both white and babble noise, providing better results than the state of the art down to at least 20 dB signal-to-noise ratio.
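The LP residual that the creak parameters are computed from can be obtained with the autocorrelation method (Levinson-Durbin) followed by inverse filtering; the peak-based creak parameters themselves are not reproduced, and the order and windowing are assumed details:

```python
import numpy as np

def lp_residual(frame, order=12):
    """LP residual via the autocorrelation method and inverse filtering.
    Creak detection then looks for prominent impulse-like peaks and
    secondary peaks in this residual."""
    x = np.asarray(frame, float) * np.hamming(len(frame))
    r = np.correlate(x, x, 'full')[len(x) - 1:len(x) + order]
    r[0] = r[0] * (1.0 + 1e-6) + 1e-12        # tiny ridge for stability
    a = np.zeros(order + 1)
    a[0] = 1.0
    e = r[0]
    for i in range(1, order + 1):             # Levinson-Durbin recursion
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / e
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        e *= (1.0 - k * k)
    return np.convolve(x, a)[:len(x)]         # inverse-filtered residual
```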