
1.

An adaptive speech endpoint detection method in low SNR environments

Linhui Sun Min Su Zhenzhen Yang 《International Journal of Speech Technology》2017,20(3):651-658

Endpoint detection of speech has proven valuable for speech recognition and speech enhancement. However, traditional endpoint detection methods lose effectiveness in low signal-to-noise ratio (SNR) or nonstationary noise environments. To improve the accuracy of speech endpoint detection in low SNR environments, this paper proposes an endpoint detection method based on an adaptive threshold adjustment algorithm. Spectral subtraction with multitaper spectrum estimation is performed to enhance the speech. During detection, the cepstral distance of Mel-frequency cepstral coefficients (MFCC) is used and the thresholds are adaptively adjusted to different environments. Simulation experiments indicate that, across different noise environments and SNRs, the algorithm achieves better endpoint detection accuracy than other detection algorithms. The algorithm also exhibits strong robustness in low SNR environments.

2.

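Design of an isolated-word speech recognition algorithm based on a system-on-chip Total citations: 1, self-citations: 0, citations by others: 1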

刘金伟 HUANG Zhangqin 侯义斌 《计算机工程》2007,33(13):25-27,55

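This paper introduces an isolated-word speech recognition system and selects speech recognition algorithms suited to a system-on-chip (SoC). It analyzes and designs a frame-based endpoint detection algorithm, the linear predictive coding cepstral coefficient (LPCC) algorithm, and the dynamic time warping (DTW) algorithm. The work has theoretical and practical significance for developing new speech recognition SoC chips and for advancing research on system-on-a-programmable-chip (SoPC).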
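
The DTW matching step named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's design: the names `dtw_distance` and `recognize`, the Euclidean frame distance, and the idea of one stored template per word are all hypothetical choices, and the feature vectors stand in for LPCC frames.

```python
def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    def frame_dist(u, v):
        # Euclidean distance between two frames (an assumed local cost).
        return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5

    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(a[i - 1], b[j - 1])
            # Allowed moves: advance in a only, in b only, or in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(features, templates):
    """Return the label whose template has the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))
```

A caller would pass the feature frames of an endpoint-trimmed utterance plus a dictionary mapping each vocabulary word to its template frames; the quadratic table is the usual reason DTW suits small isolated-word vocabularies.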

3.

A speech endpoint detection method based on singularity

吴跃前 杜明辉 《计算机工程与设计》2008,29(10):2591-2594

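Noise signals are relatively singular with respect to speech signals, and the wavelet transform is a powerful tool for analyzing signal singularity. Based on a wavelet analysis of noisy speech, a new endpoint detection method is proposed. The algorithm uses statistical features based on signal singularity together with the ratio of high-frequency to low-frequency energy. Experimental results show that the algorithm can still segment speech effectively at low SNR.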

4.

A simple noise-robust speech endpoint detection method

韦国刚 周萍 杨青 《测控技术》2015,34(2):31-34

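Speech endpoint detection is a very important component of a speech recognition system, and an ideal endpoint detection method must be robust in noisy environments. To improve robustness to noise, the proposed method builds on short-time energy and adds spectral flatness and the dominant frequency of the magnitude spectrum; each feature yields its own decision, and a voting mechanism then determines the endpoint detection result. Experimental results show that, compared with the traditional short-time energy method and the short-time TEO energy method, the algorithm is robust under a variety of additive noises and still distinguishes the useful signal from noise accurately at low SNR, which confirms its effectiveness.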
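
The per-feature decisions followed by a vote can be sketched roughly as below. The abstract does not give the decision rules, so everything specific here is an assumption: the function name `majority_vote`, the direction of each comparison, the thresholds, and the 100 to 4000 Hz band are hypothetical, and the three feature values are taken as already computed for each frame.

```python
def majority_vote(energy, flatness, dominant_hz,
                  energy_thr, flatness_thr, band=(100.0, 4000.0)):
    """Label each frame as speech (True) when at least 2 of 3 features agree."""
    decisions = []
    for e, sf, f0 in zip(energy, flatness, dominant_hz):
        votes = [
            e > energy_thr,            # louder than the assumed noise floor
            sf < flatness_thr,         # spectrum less flat than broadband noise
            band[0] <= f0 <= band[1],  # dominant peak inside the assumed band
        ]
        # True counts as 1, so the sum is the number of "speech" votes.
        decisions.append(sum(votes) >= 2)
    return decisions
```

With three binary voters a single unreliable feature cannot flip the result on its own, which is one plausible reason for preferring a vote over a single threshold.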

5.

Robust speech recognition based on joint model and feature space optimization of hidden Markov models

Seokyong Moon Jenq-Neng Hwang 《Neural Networks, IEEE Transactions on》1997,8(2):194-204

The hidden Markov model (HMM) inversion algorithm, based on either the gradient search or the Baum-Welch reestimation of input speech features, is proposed and applied to robust speech recognition tasks under general types of mismatch conditions. This algorithm stems from the gradient-based inversion algorithm of an artificial neural network (ANN) by viewing an HMM as a special type of ANN. Given input speech features s, the forward training of an HMM finds the model parameters lambda subject to an optimization criterion. On the other hand, the inversion of an HMM finds speech features, s, subject to an optimization criterion with given model parameters lambda. The gradient-based HMM inversion and the Baum-Welch HMM inversion algorithms can be successfully integrated with model space optimization techniques, such as the robust MINIMAX technique, to compensate for the mismatch in the joint model and feature space. The joint space mismatch compensation technique achieves better performance than the single space mismatch compensation techniques, i.e. those using either the model space or the feature space alone. It is also demonstrated that approximately 10-dB signal-to-noise ratio (SNR) gain is obtained in low SNR environments when the joint model and feature space mismatch compensation technique is used.

6.

Joint evaluation of multiple speech patterns for speech recognition and training

Nishanth Ulhas Nair T.V. Sreenivas 《Computer Speech and Language》2010,24(2):307-340

We address the novel problem of jointly evaluating multiple speech patterns for automatic speech recognition and training. We propose solutions based on both the non-parametric dynamic time warping (DTW) algorithm and the parametric hidden Markov model (HMM). We show that a hybrid approach is quite effective for noisy speech recognition. We extend the concept to HMM training wherein some patterns may be noisy or distorted. Utilizing the concept of “virtual pattern” developed for joint evaluation, we propose selective iterative training of HMMs. Evaluated on burst/transient noisy speech and isolated word recognition, the new algorithms yield a significant improvement in recognition accuracy over those that do not use the joint evaluation strategy.

7.

Research on endpoint detection methods for noisy speech Total citations: 2, self-citations: 0, citations by others: 2

朴春俊 马静霞 徐鹏 《计算机应用》2006,26(11):2685-2686

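The accuracy of endpoint detection is a key factor in speech recognition performance. In practical applications the SNR is often low, so some detection algorithms that perform well at high SNR stop working effectively, which lowers the system's recognition rate. A speech endpoint detection algorithm based on the sum of time-domain and frequency-domain variances is proposed. Experiments show that the algorithm detects speech signals accurately at low SNR. A comparison of three endpoint detection algorithms finds that the one based on the time-frequency variance sum achieves the higher detection accuracy.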

8.

A method for accurately detecting speech endpoints Total citations: 1, self-citations: 0, citations by others: 1

朱淑琴 裘雪红 《计算机仿真》2005,22(3):214-216

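Endpoint detection is a key technique in speech recognition, and its accuracy strongly affects recognition performance, especially for recognition algorithms that are sensitive to endpoint detection. This paper introduces an endpoint detection technique with a dynamically varying window length and combines it with the traditional dual-threshold endpoint detection algorithm. Extensive experiments show that the combined technique detects speech endpoints fairly accurately and is particularly strong at detecting the starting endpoint of speech. Using the improved endpoint detection technique effectively raises the speech recognition rate.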

9.

A speech endpoint detection algorithm based on MFCC distance in complex noise

韩云霄 邵清 符玉襄 郭庆 《计算机工程》2020,46(3):309-314

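To improve the accuracy of speech endpoint detection in complex noise environments, a multi-feature endpoint detection algorithm based on the Mel-frequency cepstral coefficient (MFCC) distance is proposed. The algorithm computes the MFCC distance of the speech signal, corrects this feature distance using short-time energy and the short-time zero-crossing rate, updates its threshold, and builds an adaptive noise model, achieving accurate endpoint detection in complex noise. Experimental results show that, at the same computational efficiency, the algorithm is more accurate than two classic detection algorithms, one based on dual-threshold energy and one based on cepstral distance.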
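
The idea of an MFCC distance measured against a noise model whose threshold is updated over time can be sketched as follows. This is only an illustration under assumptions: the name `adaptive_vad`, the exponential-smoothing update, and the parameters `n_init`, `alpha`, `scale`, and `floor` are hypothetical, the MFCC vectors are taken as precomputed, and the energy and zero-crossing corrections described in the abstract are left out.

```python
def euclid(u, v):
    """Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def adaptive_vad(mfcc_frames, n_init=3, alpha=0.9, scale=3.0, floor=1e-3):
    """Flag frames whose distance to a running noise estimate is large."""
    dim = len(mfcc_frames[0])
    lead = mfcc_frames[:n_init]
    # Assume the first n_init frames are noise only and average them.
    noise = [sum(f[d] for f in lead) / len(lead) for d in range(dim)]
    # Typical distance of a noise frame from the noise estimate.
    level = max(floor, sum(euclid(f, noise) for f in lead) / len(lead))
    flags = []
    for f in mfcc_frames:
        dist = euclid(f, noise)
        is_speech = dist > scale * level  # threshold tracks the noise level
        flags.append(is_speech)
        if not is_speech:
            # Update the noise model and its level only on noise frames.
            noise = [alpha * m + (1 - alpha) * x for m, x in zip(noise, f)]
            level = max(floor, alpha * level + (1 - alpha) * dist)
    return flags
```

Updating only on frames judged to be noise keeps speech from leaking into the noise estimate, at the cost of adapting slowly if the noise changes while someone is speaking.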

10.

Computer simulation of the reconstruction of single ion channel currents in cell membranes

乔晓艳 吴晋芝 耿晓勇 董有尔 《计算机工程与应用》2011,47(16):218-220

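Single-channel ion currents in cell membranes are extremely weak (pA level), and ion currents measured with the patch-clamp technique are often buried in strong background noise. At present, threshold detection is used to recover the channel current signal. However, the current thresholds for channel opening and closing must be set manually, and the threshold method fails at low SNR. Here a hidden Markov model (HMM) is used to reconstruct the single-channel current and estimate the model parameters. The ion-channel HMM is described and analyzed; the Baum-Welch iterative algorithm is used to train the HMM and estimate its parameters; and the Viterbi algorithm is used to reconstruct the optimal state sequence of the channel current. The HMM is compared with the threshold method, and the HMM recovery algorithm is simulated for different SNRs and different transition probabilities. The results show that the HMM is more robust to noise than the threshold method. At low SNR the model recovers the signal with high accuracy, its parameters converge quickly, and current reconstruction errors occur mainly at abrupt state transitions.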
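
The Viterbi reconstruction step can be sketched for a two-state (closed/open) channel as below. This is an illustrative sketch, not the paper's simulation: the Gaussian emission model with a shared `sigma`, the function name `viterbi`, and all parameter values in any call are assumptions, and the parameters are taken as given rather than trained with Baum-Welch.

```python
import math


def viterbi(obs, means, sigma, trans, init):
    """Most likely state sequence for an HMM with Gaussian emissions."""
    n_states = len(means)

    def log_emit(k, y):
        # Log density of observation y under state k's Gaussian.
        z = (y - means[k]) / sigma
        return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))

    # delta[k] = best log score of any path ending in state k.
    delta = [math.log(init[k]) + log_emit(k, obs[0]) for k in range(n_states)]
    back = []  # back[t][k] = best predecessor of state k at step t + 1
    for y in obs[1:]:
        prev, delta, ptr = delta, [], []
        for k in range(n_states):
            best = max(range(n_states),
                       key=lambda j: prev[j] + math.log(trans[j][k]))
            ptr.append(best)
            delta.append(prev[best] + math.log(trans[best][k]) + log_emit(k, y))
        back.append(ptr)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda k: delta[k])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

Mapping each decoded state back to its mean current gives the idealized trace. Because a state change costs a transition penalty, isolated noisy samples are less likely to be read as openings than under a fixed threshold.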

11.

Modular fuzzy-neuro controller driven by spoken language commands.

Koliya Pulasinghe Keigo Watanabe Kiyotaka Izumi Kazuo Kiguchi 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2004,34(1):293-302

We present a methodology for controlling machines using spoken language commands. Two major problems of speech interfaces for machines are investigated: the interpretation of words with fuzzy implications and out-of-vocabulary (OOV) words in natural conversation. The system proposed in this paper is designed to overcome these two problems. It consists of a hidden Markov model (HMM) based automatic speech recognizer (ASR), with a keyword spotting system to capture the machine-sensitive words from running utterances, and a fuzzy-neural network (FNN) based controller to represent the words with fuzzy implications in spoken language commands. The significance of the words, i.e., their contextual meaning according to the machine's current state, is introduced to the system so that the output more realistically matches what users want. Modularity of the system is also considered in order to generalize the methodology to systems with heterogeneous functions without diminishing performance. The proposed system is experimentally tested by navigating a mobile robot in real time using spoken language commands.

12.

Spoken query based word spotting in digitized Tamil documents

AN. Sigappi S. Palanivel 《AI & Society》2014,29(1):113-121

This paper presents an integrated approach to spotting spoken keywords in digitized Tamil documents by combining word image matching and spoken word recognition techniques. The work involves the segmentation of document images into words, the creation of an index of keywords, and the construction of a word image hidden Markov model (HMM) and a speech HMM for each keyword. The word image HMMs are constructed using seven-dimensional profile and statistical moment features and are used to recognize a segmented word image for possible inclusion of the keyword in the index. The spoken query word is recognized by maximum likelihood over the speech HMMs, using the 39-dimensional mel frequency cepstral coefficients derived from the speech samples of the keywords. The positional details of the search keyword, obtained from the automatically updated index, retrieve the relevant portion of text from the document during word spotting. Performance measures such as recall, precision, and F-measure are calculated for 40 test words from four groups of literary documents to illustrate the ability of the proposed scheme and highlight its value in the emerging multilingual information retrieval scenario.

13.

Application of short-time TEO energy to endpoint detection in noisy speech

李杰 周萍 杜志然 《计算机工程与应用》2013,49(12):144-147

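Speech endpoint detection is an important component of a speech recognition system; in noisy environments in particular, its accuracy directly affects the system's computational complexity and recognition performance. An endpoint detection method for noisy environments based on short-time TEO energy is proposed. It uses a dual-threshold, three-state transition decision mechanism to keep endpoint detection accurate in noise and robust to changes in the absolute amplitude of the signal. Experimental results show that, compared with the traditional short-time energy method and the spectral entropy method, the algorithm detects endpoints better at low SNR.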
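
A rough sketch of short-time TEO energy followed by a dual-threshold, three-state decision is given below. It assumes TEO denotes the Teager energy operator, psi[n] = x[n]^2 - x[n-1]*x[n+1]. The function names, the framing parameters, the use of the absolute value inside each frame, and the exact state-transition rules are hypothetical choices, since the abstract does not spell them out.

```python
def teager(x):
    """Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def frame_teo_energy(x, frame_len, hop):
    """Sum of |psi| over each frame of the TEO output."""
    psi = teager(x)
    return [sum(abs(v) for v in psi[s:s + frame_len])
            for s in range(0, len(psi) - frame_len + 1, hop)]


def detect(energies, low, high):
    """Return (start, end) frame indices of detected speech segments.

    States: 0 = silence, 1 = candidate (above low), 2 = speech (reached high).
    """
    segments = []
    state, start = 0, 0
    for i, e in enumerate(energies):
        if state == 0:
            if e > high:
                state, start = 2, i
            elif e > low:
                state, start = 1, i
        elif state == 1:
            if e > high:
                state = 2      # candidate confirmed, keep its earlier start
            elif e <= low:
                state = 0      # candidate rejected
        elif e <= low:         # state == 2 and energy fell below low
            segments.append((start, i - 1))
            state = 0
    if state == 2:
        segments.append((start, len(energies) - 1))
    return segments
```

The low threshold lets a segment start at the weak onset of a word, while the high threshold must be crossed before anything is reported, so brief low-level noise bursts are dropped.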

14.

An efficient speech recognition system in adverse conditions using the nonparametric regression

Abderrahmane Amrouche Mohamed Debyeche Abdelmalik Taleb-Ahmed Jean Michel Rouvaen Mustapha C.E. Yagoub 《Engineering Applications of Artificial Intelligence》2010,23(1):85-94

General Regression Neural Networks (GRNN) have been applied to phoneme identification and isolated word recognition in clean speech. In this paper, the authors extend this approach to Arabic spoken word recognition in adverse conditions. Noise robustness is one of the most challenging problems in Automatic Speech Recognition (ASR), and most existing recognition methods, which have been shown to be highly efficient under noise-free conditions, fail drastically in noisy environments. The proposed system was tested for Arabic digit recognition at different Signal-to-Noise Ratio (SNR) levels and under four noisy conditions from the NOISEX-92 database: multispeaker babble background, car production hall (factory), military vehicle (leopard tank) and fighter jet cockpit (buccaneer). The proposed scheme compared favorably with similar recognizers based on Multilayer Perceptrons (MLP), the Elman Recurrent Neural Network (RNN) and the discrete Hidden Markov Model (HMM). The experimental results showed that the use of nonparametric regression with an appropriate smoothing factor (spread) improved the generalization power of the neural network and the global performance of the speech recognizer in noisy environments.

15.

Stochastic automata for language modeling

Giuseppe Riccardi Roberto Pieraccini Enrico Bocchieri 《Computer Speech and Language》1996,10(4):265-293

16.

Using geometric spectral subtraction approach for feature extraction for DSR front-end Arabic system

Zied Sakka Elhem Techini MedSalim Bouhlel 《International Journal of Speech Technology》2017,20(3):645-650

Noise robustness and the Arabic language are still considered the main challenges for speech recognition over mobile environments. This paper addresses both by proposing a new robust Distributed Speech Recognition (DSR) system for Arabic. A speech enhancement algorithm is applied to the noisy speech as a robust front-end pre-processing stage to improve recognition performance, while an isolated Arabic word engine, designed and developed using HMMs, performs the recognition at the back-end. To test the engine, several conditions including clean, noisy and enhanced noisy speech were investigated, together with speaker-dependent and speaker-independent tasks. In the experiments carried out on the noisy database, multi-condition training outperforms the clean training mode for all noise types in terms of recognition rate. The results also indicate that the enhancement method increases the DSR accuracy of the system under severe noisy conditions, especially at low SNR down to 10 dB.

17.

A survey and outlook of endpoint detection methods for speech signals Total citations: 4, self-citations: 1, citations by others: 3

刘华平 李昕 徐柏龄 姜宁 《计算机应用研究》2008,25(8):2278-2283

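Endpoint detection is a very important step in speech signal processing, and its accuracy directly affects the speed and results of that processing. Research on endpoint detection, especially in noisy environments, has therefore long been an active topic in speech signal processing. This paper gives a fairly comprehensive review of the development of endpoint detection methods, covering approaches based on time-domain parameters, frequency-domain parameters, time-frequency parameters, and model matching. It compares the strengths and weaknesses of the methods, suggests improvements, and discusses future research directions for endpoint detection.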

18.

A recurrent neural fuzzy network for word boundary detection in variable noise-level environments

Gin-Der Wu Chin-Teng Lin 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2001,31(1):84-97

This paper discusses the problem of automatic word boundary detection in the presence of variable-level background noise. Commonly used robust word boundary detection algorithms assume that the background noise level is fixed. In fact, the background noise level may vary during recording. This is the major reason that most robust word boundary detection algorithms do not work well when the background noise level varies. To solve this problem, we first propose a refined time-frequency (RTF) parameter for extracting both the time and frequency features of noisy speech signals. The RTF parameter extends the time-frequency (TF) parameter proposed by Junqua et al. from single-band to multiband spectrum analysis, where the frequency bands help to make the distinction between speech signal and noise clear. The RTF parameter can extract useful frequency information. Based on this RTF parameter, we further propose a new word boundary detection algorithm that uses a recurrent self-organizing neural fuzzy inference network (RSONFIN). Since RSONFIN can process temporal relations, the proposed RTF-based RSONFIN algorithm can track the variation of the background noise level and detect correct word boundaries when that level varies. Compared with normal neural networks, the RSONFIN can always find itself an economic network size with high learning speed. Owing to the self-learning ability of RSONFIN, the RTF-based RSONFIN algorithm avoids the need to determine ambiguous decision rules empirically, as normal word boundary detection algorithms require. Experimental results show that this new algorithm achieves a higher recognition rate than the TF-based algorithm, which has been shown to outperform several commonly used word boundary detection algorithms by about 12% in variable background noise level conditions. It also reduces the recognition error rate due to endpoint detection to about 23%, compared to an average of 47% obtained by the TF-based algorithm in the same conditions.

19.

Research on isolated-word recognition for the Hengyang dialect

李荣华 赵征鹏 《计算机系统应用》2017,26(5):247-252

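Chinese speech recognition has achieved a certain amount of research success. However, because of regional differences in China, where pronunciation changes from one locality to the next, Chinese recognition systems show low recognition rates and poor performance on dialects. To address this weakness, an HTK-based isolated-word recognition system for the Hengyang dialect was built. The system uses the HTK 3.4.1 toolkit with phonemes as the basic recognition unit, extracts 39-dimensional Mel-frequency cepstral coefficient (MFCC) features, builds hidden Markov models (HMMs), and uses the Viterbi algorithm for model training and matching to recognize isolated Hengyang-dialect words. Comparative experiments evaluate system performance under different phoneme models and different numbers of Gaussian mixtures. The results show that combining 39-dimensional MFCCs and 5 Gaussian mixtures with the HMM greatly improves system performance.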

20.

Real-time lip reading system for isolated Korean word recognition

Jongju Shin Daijin Kim 《Pattern recognition》2011,44(3):559-571

This paper proposes a real-time lip reading system (consisting of a lip detector, lip tracker, lip activation detector, and word classifier), which can recognize isolated Korean words. Lip detection is performed in several stages: face detection, eye detection, mouth detection, mouth end-point detection, and active appearance model (AAM) fitting. Lip tracking is then undertaken via a novel two-stage lip tracking method, where the model-based Lucas-Kanade feature tracker is used to track the outer lip, and then a fast block matching algorithm is used to track the inner lip. Lip activation detection is undertaken through a neural network classifier whose input is a combination of the lip motion energy function and the first dominant shape feature. In the last step, input words are defined and recognized by three different classifiers: HMM, ANN, and K-NN. We combine the proposed lip reading system with an audio-only automatic speech recognition (ASR) system to improve word recognition performance in noisy environments. We then demonstrate the potential applicability of the combined system for use within hands-free in-vehicle navigation devices. Results from experiments undertaken on 30 isolated Korean words using the K-NN classifier at a speed of 15 fps demonstrate that the proposed lip reading system achieves a 92.67% word correct rate (WCR) for person-dependent tests, and a 46.50% WCR for person-independent tests. Also, the combined audio-visual ASR system increases the WCR from 0% to 60% in a noisy environment.