1.
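A vocal-effort-robust speech recognition algorithm based on articulatory features. Total citations: 1, self-citations: 0, citations by others: 1
To address the robustness of speech recognition to vocal effort (VE), a speech recognition algorithm based on a multi-model framework is proposed. First, the acoustic characteristics of speech under different vocal effort modes, and the effect of vocal effort changes on recognition accuracy, are analyzed. Next, a vocal effort mode detection method based on Gaussian mixture models (GMMs) is proposed. Finally, depending on the detection result, a dedicated acoustic model is trained for whispered speech, while articulatory features are used together with conventional spectral features to recognize speech in the other four vocal effort modes. Isolated-word recognition experiments show a clear improvement in accuracy: the average word error rate over the five vocal effort modes is reduced by 26.69% compared with the baseline system, by 14.51% compared with training the acoustic model on mixed corpora, and by 15.30% compared with maximum likelihood linear regression (MLLR) adaptation. The results indicate that articulatory features are more robust to vocal effort changes than conventional spectral features, and that the multi-model framework is an effective way to address vocal-effort-related robustness problems in speech recognition.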
2.
Navneet Upadhyay Hamurabi Gamboa Rosales 《International Journal of Speech Technology》2016,19(4):869-880
Speech recognizers achieve high recognition accuracy in quiet acoustic environments, but their performance degrades drastically when they are deployed in real environments, where the speech is corrupted by additive ambient noise. This paper advocates a two-phase approach for robust speech recognition in such environments. First, a front-end subband speech enhancement stage with adaptive noise estimation (ANE) filters the noisy speech. The whole noisy speech spectrum is partitioned into eighteen non-uniform subbands based on the Bark scale, and the noise power in each subband is estimated by the ANE approach, which does not require speech pause detection. Second, the filtered speech spectrum is processed by a non-parametric frequency-domain algorithm based on human perception, and the back end builds a robust classifier to recognize the utterance. A suite of experiments evaluates the performance of the speech recognizer in a variety of real environments, with and without the front-end speech enhancement stage. Recognition accuracy is evaluated at the word level and over a wide range of signal-to-noise ratios for real-world noises. Experimental evaluations show that the proposed algorithm attains good recognition performance when the signal-to-noise ratio is lower than 5 dB.
3.
Combining pulse-based features for rejecting far-field speech in a HMM-based Voice Activity Detector
Óscar Varela Rubén San-Segundo Luís A. Hernández 《Computers & Electrical Engineering》2011,37(4):589-600
Several computational techniques for speech recognition have been proposed in recent years. These techniques represent an important improvement for real-time applications in which a speaker interacts with a speech recognition system. Although researchers have proposed many methods, none of them solves the high false-alarm problem that arises when far-field speakers interfere in a human-machine conversation. This paper presents a two-class (speech and non-speech) decision-tree-based approach for combining new speech pulse features in a voice activity detector (VAD) in order to reject far-field speech in speech recognition systems. The decision tree is applied to the speech pulses obtained by a baseline VAD composed of a frame feature extractor, a hidden Markov model (HMM) based segmentation module, and a pulse detector. The paper also presents a detailed analysis of a large number of features for discriminating between close-talk and far-field speech. The detection error obtained with the proposed VAD is the lowest among the well-known VADs compared.
4.
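To address the poor recognition performance and insufficient feature extraction of current speech-based lie detection, an attention-based deceptive speech recognition network is proposed. First, a bidirectional long short-term memory (BiLSTM) network is combined with frame-level acoustic features whose dimensionality varies with utterance length, so that acoustic features are extracted effectively. Second, a temporal-attention-enhanced convolutional BiLSTM model is used as the classifier, allowing it to learn task-relevant deep information from the input and improving recognition performance. Finally, skip connections feed the low-level outputs of the temporal-attention-enhanced convolutional BiLSTM model directly into the fully connected layer, making full use of the learned features and avoiding the vanishing gradient problem. In the experiments, the proposed model outperforms LSTM and the other baseline models. The simulation results further suggest that the proposed model offers a useful reference for advancing speech-based lie detection and improving recognition rates.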
5.
Ramirez J. Segura J.C. Gorriz J.M. Garcia L. 《IEEE transactions on audio, speech, and language processing》2007,15(8):2177-2189
This paper presents an improved statistical test for voice activity detection in noise-adverse environments. The method is based on a revised contextual likelihood ratio test (LRT) defined over a multiple observation window. The motivation for revising the original multiple observation LRT (MO-LRT) lies in its artificially added hangover mechanism, which behaves incorrectly under different signal-to-noise ratio (SNR) conditions. The new approach defines a maximum a posteriori (MAP) statistical test that considers all the global hypotheses on the multiple observation window containing at most one speech-to-nonspeech or nonspeech-to-speech transition. As a result, the implicit hangover mechanism artificially added by the original method is absent from the revised method, so its design can be further improved. With these and other innovations, the proposed method shows higher speech/nonspeech discrimination accuracy over a wide range of SNR conditions than the original MO-LRT voice activity detector (VAD). Experiments conducted on the AURORA databases and tasks show that the revised method yields significant improvements in speech recognition performance over standardized VADs such as ITU-T G.729 and ETSI AMR for discontinuous voice transmission and the ETSI AFE for distributed speech recognition (DSR), as well as over recently reported methods.
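As a rough illustration of the baseline that this abstract revises (not the revised MAP test itself), the sketch below sums per-frame log likelihood ratios over a symmetric window, in the spirit of a multiple-observation LRT. It assumes the common Gaussian statistical model, in which each frequency bin contributes gamma * xi / (1 + xi) - log(1 + xi), where xi is the a priori SNR and gamma the a posteriori SNR. The function names, the per-bin averaging, and the window half-width `m` are illustrative assumptions, not details taken from the paper.

```python
import math


def frame_log_lr(gamma, xi):
    """Log likelihood ratio of one frame, averaged over frequency bins.

    gamma: a posteriori SNR per bin, xi: a priori SNR per bin
    (Gaussian model assumed; both are plain lists of floats).
    """
    terms = [g * x / (1.0 + x) - math.log(1.0 + x) for g, x in zip(gamma, xi)]
    return sum(terms) / len(terms)


def multiple_observation_lrt(frame_llrs, m=2):
    """Sum the per-frame log LRs over the window [i - m, i + m] (clipped at the edges)."""
    n = len(frame_llrs)
    return [sum(frame_llrs[max(0, i - m):min(n, i + m + 1)]) for i in range(n)]


# A single high-likelihood frame raises the statistic of its neighbours too,
# which is the implicit hangover behaviour the abstract refers to.
scores = multiple_observation_lrt([0.0, 0.0, 6.0, 0.0, 0.0], m=1)
print(scores)
```

A frame would then be labelled as speech when its windowed score exceeds a threshold; how that threshold is chosen is outside the scope of this sketch.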
6.
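Wang Jingfang 《Computer Engineering and Applications》2011,47(20):147-150
An efficient real-time speech endpoint detection algorithm suited to complex environments is proposed, together with a method for estimating the noise power spectrum of each frame during filtering. The spectrum of each speech frame is first passed through an iterative Wiener filter. It is then divided into several sub-bands, and the spectral entropy of each sub-band is computed. The sub-band spectral entropies of several consecutive frames are passed through a bank of median filters to obtain the spectral entropy of each frame, and the input is classified according to that value. Experimental results show that the algorithm separates speech from noise effectively, significantly improves the performance of a speech recognition system, and is robust across different noise environments. The algorithm is computationally cheap and simple to implement, which makes it suitable for real-time speech recognition systems.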
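The sub-band entropy and median-filtering steps described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the iterative Wiener filter is omitted, bins are normalized by the total frame power, the filtered sub-band entropies are simply summed to give the frame entropy, and the names `subband_entropies` and `smoothed_frame_entropy` as well as the band count and window length are hypothetical.

```python
import cmath
import math
from statistics import median


def power_spectrum(frame):
    """Naive DFT power spectrum of a real frame (bins 0 .. N/2)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]


def subband_entropies(frame, num_bands=4):
    """Entropy contribution of each sub-band (assumption: bins normalized by total power)."""
    spec = power_spectrum(frame)
    band_size = len(spec) // num_bands
    spec = spec[:band_size * num_bands]  # drop leftover bins
    total = sum(spec)
    if total <= 0.0:
        return [0.0] * num_bands
    probs = [p / total for p in spec]
    return [
        -sum(p * math.log(p)
             for p in probs[b * band_size:(b + 1) * band_size] if p > 0.0)
        for b in range(num_bands)
    ]


def smoothed_frame_entropy(frames, num_bands=4, window=3):
    """Median-filter each sub-band entropy across neighbouring frames, then sum the bands."""
    per_frame = [subband_entropies(f, num_bands) for f in frames]
    half = window // 2
    out = []
    for i in range(len(per_frame)):
        lo, hi = max(0, i - half), min(len(per_frame), i + half + 1)
        out.append(sum(
            median([per_frame[j][b] for j in range(lo, hi)])
            for b in range(num_bands)
        ))
    return out
```

A spectrally flat frame gives a high entropy and a strongly peaked spectrum gives a low one, so a threshold on the smoothed value can separate the two classes. The median filter suppresses isolated outlier frames before that decision is made.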
7.
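To address speech detection in shortwave communication, where speech has a low signal-to-noise ratio and the noise is complex, the amplitude spectral entropy algorithm is modified: power spectral entropy is chosen as the VAD feature, and spectral entropy smoothing and a hangover scheme are added, giving a speech endpoint detection algorithm based on power spectral entropy. Experiments show that the algorithm performs reasonably well on several typical kinds of shortwave speech.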
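The hangover idea mentioned in this abstract can be illustrated with a short sketch: once a frame is classified as speech, the speech label is held for a few more frames so that weak word endings are not clipped. The function name `apply_hangover` and the hangover length are assumptions made for illustration; the abstract does not give the actual design.

```python
def apply_hangover(decisions, hangover=3):
    """Extend each speech decision by up to `hangover` following frames.

    decisions: list of booleans from a frame-level VAD (True = speech).
    """
    out = []
    remaining = 0
    for is_speech in decisions:
        if is_speech:
            remaining = hangover  # restart the hold counter on every speech frame
            out.append(True)
        elif remaining > 0:
            remaining -= 1        # still inside the hangover period
            out.append(True)
        else:
            out.append(False)
    return out


raw = [False, True, False, False, False, False]
print(apply_hangover(raw, hangover=2))
```

A longer hangover reduces clipping of low-energy speech tails at the cost of labelling more noise frames as speech, so the length is usually tuned to the noise conditions.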
8.
9.
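A comparative study of several noise estimation methods that do not require speech detection. Total citations: 1, self-citations: 0, citations by others: 1
Estimating the noise spectrum is a key step in spectral subtraction. Traditionally, the noise spectrum is estimated by running speech detection on the input, identifying the noise-only segments, and estimating the noise spectrum from them. The accuracy of this approach is limited by the performance of the speech detection algorithm and degrades quickly at low signal-to-noise ratios. In recent years, several noise estimation methods that do not rely on speech detection have been proposed; they do not distinguish between speech and non-speech segments and update the noise spectrum in every frame. This paper evaluates several such methods, compares their performance in speech recognition when used for spectral subtraction, and proposes a new noise estimation method based on energy clustering that also works without speech detection. Experiments confirm its good performance.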
10.
Shi-Huang Chen Rodrigo Capobianco Guido Trieu-Kien Truong Yaotsu Chang 《Computer Speech and Language》2010,24(3):531-543
This paper proposes an improved voice activity detection (VAD) algorithm using wavelets and a support vector machine (SVM) for the European Telecommunications Standards Institute (ETSI) adaptive multi-rate (AMR) narrow-band (NB) and wide-band (WB) speech codecs. First, based on the wavelet transform, the original IIR filter bank and pitch/tone detector are implemented via a wavelet filter bank and a wavelet-based pitch/tone detection algorithm, respectively. The wavelet filter bank divides the input speech signal into several frequency bands so that the signal power level in each sub-band can be calculated. In addition, the background noise level in each sub-band can be estimated using the wavelet de-noising method. The wavelet filter bank is also used to detect correlated complex signals such as music. The proposed algorithm then applies the SVM to train an optimized non-linear VAD decision rule involving the sub-band power, noise level, pitch period, tone flag, and complex-signal warning flag of the input speech. With the trained SVM, the proposed VAD algorithm produces more accurate detection results. Experiments carried out on the Aurora speech database under different noise conditions show that the proposed algorithm delivers VAD performance considerably superior to AMR-NB VAD Options 1 and 2 and to AMR-WB VAD.
11.
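A real-time speech endpoint detection algorithm based on order-statistics filtering. Total citations: 1, self-citations: 0, citations by others: 1
An efficient real-time speech endpoint detection algorithm is proposed for embedded speech recognition systems. The algorithm uses sub-band spectral entropy as the feature for distinguishing speech from noise. The spectrum of each speech frame is first divided into several sub-bands, and the spectral entropy of each sub-band is computed. The sub-band spectral entropies of several consecutive frames are then passed through a bank of order-statistics filters to obtain the spectral entropy of each frame, and the input is classified according to that value. Experimental results show that the algorithm separates speech from noise effectively and significantly improves the performance of a speech recognition system. It is robust across different noise environments and signal-to-noise ratios. In addition, the algorithm is computationally cheap and simple to implement, which makes it suitable for real-time embedded speech recognition systems.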
12.
Maxat N. Kalimoldayev Keylan Alimhan Orken J. Mamyrbayev 《International Journal of Speech Technology》2014,17(2):199-204
This article considers the voice activity detection (VAD) algorithm and its use in a Kazakh speech recognition system. The paper presents a mathematical model of VAD and methods for detecting voice data and the pauses between sentences, words, and individual sounds. The VAD algorithm is adapted to Kazakh speech recognition by taking into account the basic properties of the Kazakh language. Voice activity detection for Kazakh speech is studied here for the first time. The results of the spectral analysis are presented in a figure.
13.
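Endpoint detection is an important step in digital speech signal processing. Building on earlier work, a new endpoint detection method based on the distance between critical-band feature vectors is proposed. The sum of the power spectrum within each critical band of a frame forms the feature vector, the vector distances between frames are computed to obtain a distance trajectory, and a threshold on this trajectory is used to detect the speech endpoints. Comparative experiments show that the method is more robust and more accurate than algorithms based on spectral entropy or on cepstral distance.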
14.
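Because the speech and noise statistical models in conventional likelihood-ratio voice activity detection are computationally complex, a voice activity detection method is proposed that combines cepstral-threshold estimation of the noise spectrum with a Rayleigh statistical model. The method first estimates the noise spectrum using a cepstral threshold and then uses the UMPT to derive an update rule for the speech decision threshold based on the Rayleigh model. Voice activity detection (VAD) is evaluated for four different combinations of methods. Experiments show that, in non-stationary noise, the correct detection rate of the proposed method is higher than that of the other VAD combinations.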
15.
16.
In the last 10 years, several noise reduction (NR) algorithms have been proposed for use with blind source separation techniques to separate speech and noise signals from blind noisy observations. Most of these techniques rely on a voice activity detector (VAD) to reach the optimal solution. In this paper, we propose a new backward blind source separation (BBSS) structure that uses the input correlation properties to provide: (i) high convergence rates and good tracking capabilities, since acoustic environments imply long and time-variant noise paths, and (ii) low misalignment and robustness against variations in noise type and against double-talk. The proposed algorithm enhances noisy speech signals automatically and does not need a VAD to separate the speech and noise signals. Results on several objective criteria show that the proposed algorithm performs well in comparison with state-of-the-art algorithms.
17.
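Speech/silence detection is a key factor in the performance of speech enhancement and recognition. A robust speech/silence detection technique based on fourth-order statistics is proposed. Exploiting the fact that the amplitude spectrum of speech follows a super-Gaussian distribution, the fourth-order statistic of the amplitude spectrum of each frame is computed as a measure of its super-Gaussianity. Combined with the energy of the frame, a simple threshold classifier then detects the "silent" and "speech" periods. Experiments show that the proposed technique works well.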
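The combination of a fourth-order statistic with frame energy described in this abstract could look roughly like the sketch below. It assumes that the fourth-order statistic is the sample excess kurtosis of the amplitude spectrum and that both features are compared with fixed thresholds; the function names and the threshold values are illustrative assumptions rather than details from the paper.

```python
import cmath
import math


def amplitude_spectrum(frame):
    """Naive DFT amplitude spectrum of a real frame (bins 0 .. N/2)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def excess_kurtosis(values):
    """Sample excess kurtosis m4 / m2^2 - 3; positive values indicate a super-Gaussian shape."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0.0:
        return 0.0
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0


def is_speech_frame(frame, kurtosis_threshold=1.0, energy_threshold=0.01):
    """Threshold classifier on frame energy and spectral kurtosis (thresholds are assumptions)."""
    energy = sum(x * x for x in frame) / len(frame)
    kurtosis = excess_kurtosis(amplitude_spectrum(frame))
    return energy > energy_threshold and kurtosis > kurtosis_threshold
```

In practice the two thresholds would be tuned on labelled data or adapted to the background noise level; fixed constants are used here only to keep the sketch short.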
18.
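A study of Chinese continuous speech recognition systems and knowledge-guided search strategies. Total citations: 1, self-citations: 0, citations by others: 1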
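The basic principles of Chinese continuous speech recognition systems are introduced as a whole, with a focus on modeling and search strategies at the acoustic and language levels. Based on a study of the conventional frame-synchronous search algorithm, a frame-synchronous search algorithm based on statistical knowledge, SKB-FSS, is proposed. It has three main functional layers: a merging-based syllable segmentation automaton produces definite search boundary points; statistically derived differential state dwell information controls the state transitions during search; and a word search tree limits the expansion of syllable candidates, while dynamic forward prediction enables sound and timely path pruning. Experimental results confirm the effectiveness of the search strategy.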
19.