首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
基于发音特征的声效相关鲁棒语音识别算法   总被引:1,自引:0,他引:1  
晁浩  宋成  彭维平 《计算机应用》2015,35(1):257-261
针对声效(VE)相关的语音识别鲁棒性问题,提出了基于多模型框架的语音识别算法.首先,分析了不同声效模式下语音信号的声学特性以及声效变化对语音识别精度的影响;然后,提出了基于高斯混合模型(GMM)的声效模式检测方法;最后,根据声效检测的结果,训练专门的声学模型用于耳语音识别,而将发音特征与传统的谱特征一起用于其余4种声效模式的语音识别.基于孤立词识别的实验结果显示,采用所提方法后语音识别准确率有了明显的提高:与基线系统相比,所提方法5种声效的平均字错误率降低了26.69%;与声学模型混合语料训练方法相比,平均字错误率降低了14.51%;与最大似然线性回归(MLLR)自适应方法相比,平均字错误率降低了15.30%.实验结果表明:与传统谱特征相比发音特征对于声效变化更具鲁棒性,而多模型框架是解决声效相关的语音识别鲁棒性问题的有效方法.  相似文献   

2.
Speech recognizers achieve high recognition accuracy under quiet acoustic environments, but their performance degrades drastically when they are deployed in real environments, where the speech is degraded by additive ambient noise. This paper advocates a two phase approach for robust speech recognition in such environment. Firstly, a front end subband speech enhancement with adaptive noise estimation (ANE) approach is used to filter the noisy speech. The whole noisy speech spectrum is portioned into eighteen dissimilar subbands based on Bark scale and noise power from each subband is estimated by the ANE approach, which does not require the speech pause detection. Secondly, the filtered speech spectrum is processed by the non parametric frequency domain algorithm based on human perception along with the back end building a robust classifier to recognize the utterance. A suite of experiments is conducted to evaluate the performance of the speech recognizer in a variety of real environments, with and without the use of a front end speech enhancement stage. Recognition accuracy is evaluated at the word level, and at a wide range of signal to noise ratios for real world noises. Experimental evaluations show that the proposed algorithm attains good recognition performance when signal to noise ratio is lower than 5 dB.  相似文献   

3.
Nowadays, several computational techniques for speech recognition have been proposed. These techniques suppose an important improvement in real time applications where speaker interacts with speech recognition systems. Although researchers proposed many methods, none of them solve the high false alarm problem when far-field speakers interfere in a human-machine conversation. This paper presents a two-class (speech and non-speech classes) decision-tree based approach for combining new speech pulse features in a VAD (Voice Activity Detector) for rejecting far-field speech in speech recognition systems. This Decision Tree is applied over the speech pulses obtained by a baseline VAD composed of a frame feature extractor, a HMM-based (Hidden Markov Model) segmentation module and a pulse detector. The paper also presents a detailed analysis of a great amount of features for discriminating between close and far-field speech. The detection error obtained with the proposed VAD is the lowest compared to other well-known VADs.  相似文献   

4.
针对目前语音谎言检测识别效果差、特征提取不充分等问题,提出了一种基于注意力机制的欺骗语音识别网络。首先,将双向长短时记忆与帧级声学特征相结合,其中帧级声学特征的维数随语音长度的变化而变化,从而有效提取声学特征。其次,采用基于时间注意增强卷积双向长短时记忆模型作为分类算法,使分类器能够从输入中学习与任务相关的深层信息,提高识别性能。最后,采用跳跃连接机制将时间注意增强卷积双向长短时记忆模型的底层输出直接连接到全连接层,从而充分利用了学习到的特征,避免了消失梯度的问题。实验阶段,与LSTM以及其他基准模型进行对比,所提模型性能最优。仿真结果进一步验证了所提模型对语音谎言检测领域发展及提升识别率提供了一定借鉴作用。  相似文献   

5.
This paper shows an improved statistical test for voice activity detection in noise adverse environments. The method is based on a revised contextual likelihood ratio test (LRT) defined over a multiple observation window. The motivations for revising the original multiple observation LRT (MO-LRT) are found in its artificially added hangover mechanism that exhibits an incorrect behavior under different signal-to-noise ratio (SNR) conditions. The new approach defines a maximum a posteriori (MAP) statistical test in which all the global hypotheses on the multiple observation window containing up to one speech-to-nonspeech or nonspeech-to-speech transitions are considered. Thus, the implicit hangover mechanism artificially added by the original method was not found in the revised method so its design can be further improved. With these and other innovations, the proposed method showed a higher speech/nonspeech discrimination accuracy over a wide range of SNR conditions when compared to the original MO-LRT voice activity detector (VAD). Experiments conducted on the AURORA databases and tasks showed that the revised method yields significant improvements in speech recognition performance over standardized VADs such as ITU T G.729 and ETSI AMR for discontinuous voice transmission and the ETSI AFE for distributed speech recognition (DSR), as well as over recently reported methods.  相似文献   

6.
提出了一种适应复杂环境下的高效的实时语音端点检测算法,给出了每帧声信号在滤波中的噪声功率谱的推算方法。先将每帧语音的频谱进行迭代维纳滤波,再将它划分成若干个子带并计算出每个子带的频谱熵,然后把相继若干帧的子带频谱熵经过一组中值滤波器获得每帧的频谱熵,根据频谱熵的值对输入的语音进行分类。实验结果表明,该算法能够有效地区分语音和噪声,可以显著地提高语音识别系统的性能,在不同的噪声环境条件下具有鲁棒性。该算法计算代价小,简单易实现,适合实时语音识别系统的应用。  相似文献   

7.
低信噪比下基于功率谱熵的语音端点检测算法   总被引:2,自引:1,他引:2       下载免费PDF全文
为了解决短波通信中语音检测的问题,针对短波语音信噪比低,噪声复杂的特点,对幅度谱熵算法进行了修正,选取功率谱熵作为VAD特征,加入谱熵平滑和hangover设计,研究了基于功率谱熵的语音端点检测算法。实验证明,算法对几种典型的短波语音均有比较理想的性能。  相似文献   

8.
李宇  郭雷勇  谭洪舟 《计算机应用》2011,31(5):1447-1449
为了提高统计模型似然比测试的语音活动检测(VAD)的检测性能,利用前后语音帧间存在的统计相关特性,提出一种改进VAD算法。通过前帧语音频谱分量对先验信噪比进行递归估计,然后利用前一帧的语音检测状态来设计判决阈值,建立了双阈值隐马尔可夫模型语音活动判决规则。实验表明,此帧间相关性VAD算法的检测指标值优于Sohn算法。  相似文献   

9.
几种无语音检测噪音估计方法的比较研究   总被引:1,自引:0,他引:1  
噪音谱的估计是谱相减方法中关键的一环。传统的噪声谱的估计是通过对输入语音作语音检测,区分出纯噪声段,根据噪声段的频谱估计出噪声谱。该方法的准确性局限于语音检测算法的性能,在信噪比较低时,性能下降很快。近年来人们提出了多种不用语音检测的噪声估计方法,这些方法不区分语音和非语音段,在每一帧都进行噪声谱的更新。评估了几种无语音检测的噪音估计方法,比较了它们用于谱相减时在语音识别中的性能,提出了一种新的基于能量聚类的无语音检测噪音估计方法,通过实验验证了它的优良性能。  相似文献   

10.
This paper proposes an improved voice activity detection (VAD) algorithm using wavelet and support vector machine (SVM) for European Telecommunication Standards Institution (ETSI) adaptive multi-rate (AMR) narrow-band (NB) and wide-band (WB) speech codecs. First, based on the wavelet transform, the original IIR filter bank and pitch/tone detector are implemented, respectively, via the wavelet filter bank and the wavelet-based pitch/tone detection algorithm. The wavelet filter bank can divide input speech signal into several frequency bands so that the signal power level at each sub-band can be calculated. In addition, the background noise level can be estimated in each sub-band by using the wavelet de-noising method. The wavelet filter bank is also derived to detect correlated complex signals like music. Then the proposed algorithm can apply SVM to train an optimized non-linear VAD decision rule involving the sub-band power, noise level, pitch period, tone flag, and complex signals warning flag of input speech signals. By the use of the trained SVM, the proposed VAD algorithm can produce more accurate detection results. Various experimental results carried out from the Aurora speech database with different noise conditions show that the proposed algorithm gives considerable VAD performances superior to the AMR-NB VAD Options 1 and 2, and AMR-WB VAD.  相似文献   

11.
基于顺序统计滤波的实时语音端点检测算法   总被引:1,自引:0,他引:1  
针对嵌入式语音识别系统,提出了一种高效的实时语音端点检测算法. 算法以子带频谱熵为语音/噪声的区分特征, 首先将每帧语音的频谱划分成若干个子带, 计算出每个子带的频谱熵, 然后把相继若干帧的子带频谱熵经过一组顺序统计滤波器获得每帧的频谱熵, 根据频谱熵的值对输入的语音进行分类. 实验结果表明, 该算法能够有效地区分语音和噪声, 可以显著地提高语音识别系统的性能. 在不同的噪声环境和信噪比条件下具有鲁棒性. 此外, 本文提出的算法计算代价小, 简单易实现, 适合实时嵌入式语音识别系统的应用.  相似文献   

12.
This article considers the algorithm “Voice activity detection” and the using VAD algorithm in the system of Kazakh speech recognition. The paper presents a mathematical model VAD and methods for detecting voice data: pauses between sentences, words, individual sounds. VAD algorithm is adapted to the recognition of Kazakh speech counting the basic properties of Kazakh language. Voice activity detection researches in Kazakh speech are being conducted for the first time. The results of the spectral analysis are displayed on the picture.  相似文献   

13.
端点检测是语音数字信号处理中一个重要的环节.在前人研究的基础上提出了一种新的基于临界带特征矢量距离的端点检测方法,由计算得到的每帧各临界带中的功率谱之和作为特征矢量,并且通过计算各帧之间的矢量距离得到其距离轨迹,以此设定门限进行语音端点的检测.对比实验表明,相对于基于谱熵的算法及基于倒谱距离的算法,本方法具有更好的鲁棒性和较高的正确率.  相似文献   

14.
针对传统的似然比语音活动检测的计算语音与噪声统计模型复杂度高,提出结合倒谱阈值估计噪声频谱与瑞利统计模型的语音活动检测方法。该方法先用倒谱阈值估计噪声的频谱,再利用UMPT获得基于瑞利模型的语音判决阈值更新准则。评估了4种不同方法组合的语音活动检测(voice activity detection,VAD)。实验表明:在非平稳噪声环境下该方法的正确检测率优于其它组合的VAD方法。  相似文献   

15.
基于小波变分辨率频谱特征的静音检测   总被引:1,自引:0,他引:1       下载免费PDF全文
薛卫  都思丹  叶迎宪 《计算机工程》2009,35(13):232-233
针对静音检测提出基于小波变分辨率频谱特征的检测算法。算法采用多门限过零率对静音进行初判,并提取多个语音感觉特征与基于小波变分辨率频谱的Mel频率倒谱系数(MFCC)组合成语音特征,通过二分类支持向量机对该特征进行分类实现静音检测。测试结果表明,该算法在不同信噪比下语音识别正确率高于G.729b,MFCC特征静音检测算法,基于该算法的视频会议服务器运算量低于使用G.729b静音检测算法的视频系统。  相似文献   

16.
In last 10 years, several noise reduction (NR) algorithms have been proposed to be combined with the blind source separation techniques to separate speech and noise signals from blind noisy observations. More often, techniques use voice activity detector (VAD) systems for the optimal solution. In this paper, we propose a new backward blind source separation (BBSS) structure that uses the input correlation properties to provide: (i) high convergence rates and good tracking capabilities, since the acoustic environments imply long and time-variant noise paths, and (ii) low misalignment and robustness against different noise type variations and double-talk. The proposed algorithm has an automatic behavior to enhance noisy speech signals, and do not need any VAD systems to separate speech and noise signals. The obtained results in terms of several objective criteria show the good performance properties of the proposed algorithm in comparison with state-of-the-art algorithms.  相似文献   

17.
语音有声/无声检测是影响语音增强和识别性能的一个关键因素,提出一种鲁棒的基于四阶统计量的语音有声/无声检测技术。利用语音信号的振幅谱是超高斯分布的特性,对每一帧语音信号的振幅谱,计算其四阶统计量,用来度量其超高斯性。结合该帧语音信号的能量,使用一个简单的阈值分类器,实现语音“无声”和“有声”期的检测。所提出的语音有声/无声检测技术,经实验证明具有很好的效果。  相似文献   

18.
汉语连续语音识别系统与知识导引的搜索策略研究   总被引:1,自引:0,他引:1  
从整体上介绍了汉语连续语音识别系统的基本原理,并重点对声学和语言两个层面 的建模与搜索策略进行了分析.在对传统帧同步搜索算法进行研究的基础上,提出了基于统 计知识的帧同步搜索算法SKB-FSS.它包含了三个主要的功能层次:基于归并的音节切分自 动机产生确定的搜索边界点,由统计得到的差分状态驻留信息控制搜索过程中的状态转移, 利用词搜索树控制音节候选的扩展规模并根据动态前向预测的方法进行合理而及时的路径 剪枝.实验结果验证了该搜索策略的有效性.  相似文献   

19.
针对带噪面罩语音识别率低的问题,结合语音增强算法,对面罩语音进行噪声抑制处理,提高信噪比,在语音增强中提出了一种改进的维纳滤波法,通过谱熵法检测有话帧和无话帧来更新噪声功率谱,同时引入参数控制增益函数;提取面罩语音信号的Mel频率倒谱系数(MFCC)作为特征参数;通过卷积神经网络(CNN)进行训练和识别,并在每个池化层后经局部响应归一化(LRN)进行优化.实验结果表明:该识别系统能够在很大程度上提高带噪面罩语音的识别率.  相似文献   

20.
张玲  顾彦飞  何伟 《计算机应用》2010,30(5):1262-1265
为了降低噪声及决策导向(DD)参数估计算法的帧延迟特性对语音活动检测(VAD)算法鲁棒性的影响,首先采用两步降噪(TSNR)技术估计算法提高语音瞬变时刻参数估计准确性,并针对语音噪声的频率选择性,通过频带分割,将噪声污染限制到孤立子频带中,构建了由子频带特征与可靠性因子结合提供判别结果的子频带加权VAD算法。实验表明,此子频带加权算法优于Sohn算法、Cho算法以及G.729B等全频带算法。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号