Found 20 similar documents (search time: 15 ms)
1.
Jong-Seok Lee Cheol Hoon Park 《Multimedia, IEEE Transactions on》2008,10(5):767-779
2.
3.
4.
5.
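Speech recognition based on combined dynamic and static feature parameters  Total citations: 1 (self-citations: 0, citations by others: 1)
Based on the time-varying nature of speech signals, this paper proposes a speech recognition method that combines dynamic and static feature parameters. First, the wavelet packet transform is introduced into feature extraction: following the MFCC (Mel-Frequency Cepstrum Coefficient) extraction procedure, the wavelet packet transform replaces the Fourier transform and the Mel filter bank, yielding a new static feature parameter, DWPTMFCC (Discrete Wavelet Packet Transform Mel-Frequency Coefficient). This parameter is then combined with its first-order difference into a single vector that serves as the feature of one speech frame. Experiments and simulations show that this feature achieves a high recognition rate and is an effective speech feature parameter. In addition, chaotic behavior is introduced into the neurons to form a chaotic neural network, which is applied to speech recognition and compared with the commonly used BP neural network. The experimental results show that the average recognition rate of the chaotic neural network is higher than that of the commonly used neural network method under the same conditions.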
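The combination of a static feature with its first-order difference can be sketched as follows. This is only an illustration: the abstract does not give the difference formula, so the backward difference c[t] - c[t-1] (with a zero difference for the first frame) and the function name `add_delta` are assumptions, and the static vectors stand in for DWPTMFCC coefficients that would be computed elsewhere.

```python
def add_delta(frames):
    """Append a first-order difference to each static feature vector.

    frames: list of equal-length lists of floats (one static vector per frame).
    Assumption: delta[t] = c[t] - c[t-1], and the first frame gets a zero delta.
    """
    combined = []
    previous = None
    for frame in frames:
        if previous is None:
            delta = [0.0] * len(frame)  # no earlier frame to difference against
        else:
            delta = [c - p for c, p in zip(frame, previous)]
        combined.append(list(frame) + delta)  # static part followed by dynamic part
        previous = frame
    return combined
```

Each output vector is twice as long as the static vector, so a classifier trained on it sees both the spectral shape of a frame and how that shape changes from the previous frame.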
6.
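GRBM speech recognition based on improved contrastive divergence  Total citations: 1 (self-citations: 0, citations by others: 1)
Contrastive divergence is one of the mainstream techniques for training restricted Boltzmann machines and performs well in experiments. By combining the exponential moving average indicator algorithm with the idea of parallel tempering, this paper proposes an improved contrastive divergence training algorithm, covering both the update of model parameters and the sampling of data, and applies it to a Gaussian-Bernoulli restricted Boltzmann machine (GRBM) to train the parameters of a speech recognition model. Experiments on the TI-Digits digit speech training and test database show that the GRBM trained with the improved contrastive divergence clearly outperforms traditional training algorithms: the recognition rate reaches about 80%, an improvement of up to about 7%. The recognition rates of the other GRBM comparison models trained with the improved algorithm also increase, showing good recognition performance.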
7.
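Remote command recognition and parsing is the foundation and key to remote management and control in embedded environments that use the terminal-console and host-slave computer modes. This paper analyzes the working process of an intelligent underwater detection terminal and proposes a remote command recognition and parsing method based on a finite-state automaton. Using an automaton model of its working states, the terminal can respond to remote commands quickly and accurately, avoiding complex computation and cumbersome decision procedures. Experiments show that the terminal promptly recognizes the management and control commands sent by the console and switches to the corresponding working state as required, and that the method effectively improves the terminal's working performance.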
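A table-driven finite-state automaton of the kind this abstract describes might look like the sketch below. The state names, command names, and functions (`TRANSITIONS`, `step`, `run`) are all hypothetical; the abstract does not list the terminal's actual states or command set.

```python
# Hypothetical working states and console commands, for illustration only.
TRANSITIONS = {
    ("IDLE", "START"): "DETECTING",
    ("DETECTING", "PAUSE"): "IDLE",
    ("DETECTING", "UPLOAD"): "UPLOADING",
    ("UPLOADING", "DONE"): "IDLE",
}


def parse_command(line):
    """Normalize a raw command line into an upper-case command token."""
    return line.strip().upper()


def step(state, line):
    """Apply one command; return (new_state, accepted).

    A command that is not valid in the current state is rejected and the
    state is left unchanged.
    """
    command = parse_command(line)
    new_state = TRANSITIONS.get((state, command))
    if new_state is None:
        return state, False
    return new_state, True


def run(lines, state="IDLE"):
    """Feed a sequence of command lines through the automaton."""
    for line in lines:
        state, _ = step(state, line)
    return state
```

Because each command is resolved by a single dictionary lookup keyed on the current state, the response needs no search or rule evaluation, which is the property the abstract credits for fast and accurate handling.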
8.
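In Japanese, predicate voices take different inflectional endings, but the passive and potential voices share the same ending, which makes them hard to distinguish and translate correctly in statistical machine translation. This paper therefore proposes a method that uses a maximum entropy model to classify Japanese potential and passive voice, and then integrates the potential and passive voice features into the log-linear model to improve the translation model, so that translation rules for the two voices are selected more accurately. Experimental results show that the method effectively improves the translation quality of Japanese sentences in the potential and passive voice: on a large-scale Japanese-Chinese corpus, the best BLEU score rises from 41.50 to 42.01, and in human evaluation the overall comprehensibility of the translations improves by 2.71%.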
9.
Computer speech recognition has been very successful in limited domains and for isolated word recognition. However, widespread use of large-vocabulary continuous-speech recognizers is limited by the speed of current recognizers, which cannot reach acceptable error rates while running in real time. This paper shows how to harness shared memory multiprocessors, which are becoming increasingly common, to increase the speed significantly, and therefore the accuracy or vocabulary size, of a speech recognizer. To cover the necessary background, we begin with a tutorial on speech recognition. We then describe the parallelization of an existing high-quality speech recognizer, achieving speedups of a factor of 3, 5, and 6 on 4, 8, and 12 processors respectively for the benchmark North American business news (NAB) recognition task.
10.
In some cases, to make a proper translation of an utterance in a dialogue, different pieces of contextual information are needed. Interpreting such utterances often requires dialogue analysis including speech acts and discourse analysis. In this paper, a statistical dialogue analysis model for Korean–English dialogue machine translation based on speech acts is proposed. The model uses syntactic patterns and n-grams of speech acts. The syntactic patterns include surface syntactic features which are related to the language-dependent expressions of speech acts. Speech-act n-grams are used to approximate the context of utterances. The key feature is the use of speech-act n-grams based on hierarchical recency. Experimental results with trigrams show that the proposed model achieves an accuracy of 66.87% for the top candidate and 82.35% for the top three candidates. This indicates that the proposed model based on hierarchical recency outperforms the model based on linear recency.
11.
Mark Seligman 《Machine Translation》2000,15(1-2):149-186
This paper sketches research in nine areas related to spoken language translation: interactive disambiguation (two demonstrations of highly interactive, broad-coverage speech translation are reported); system architecture; data structures; the interface between speech recognition and analysis; the use of natural pauses for segmenting utterances; example-based machine translation; dialogue acts; the tracking of lexical co-occurrences; and the resolution of translation mismatches.
12.
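Noise-robust speech recognition and the application of speech enhancement algorithms  Total citations: 1 (self-citations: 0, citations by others: 1)
Improving the robustness of speech recognition systems is an important research topic in speech recognition. Recognition performance often degrades because of a mismatch between data from the training environment and data from the recognition environment. To allow a speech recognition system to perform satisfactorily in noisy environments, this paper proposes a robust speech feature extraction method based on the characteristics of human hearing. Before MFCC feature extraction, the noisy speech features are processed according to auditory masking properties and further processed with a speech enhancement method, yielding the final robust speech features. Analysis of four different experiments shows that the method improves the system's noise robustness and that the feature processing adapts well to different noise types at different signal-to-noise ratios.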
13.
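This paper introduces and improves an unsupervised adaptation algorithm, lattice-based maximum likelihood linear regression (Lattice-MLLR). Lattice-MLLR estimates the MLLR transform parameters from the word lattice produced by decoding; because the potential error rate of the lattice is far lower than that of the recognition result, the parameter estimates are more accurate. A major drawback of Lattice-MLLR is its very high computational cost, which makes it hard to use in practice. Two improvements are proposed: (1) pruning the word lattice using posterior probabilities; (2) using word timing information to restrict the range over which state statistics are computed. In experiments, Lattice-MLLR reduces the error rate by a relative 3.5% compared with conventional MLLR, and the improvements reduce the computational cost of Lattice-MLLR by more than 87.9%.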
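For orientation, the standard MLLR mean transform that such an adaptation scheme estimates can be written as below. This is the textbook form, not a formula quoted from the abstract; the symbols (W, A, b, the extended mean vector, and the occupancy gamma) are notation introduced here.

```latex
% Standard MLLR adaptation of a Gaussian mean \mu_m (textbook form).
% W = [\, b \;\; A \,] is the transform shared by all Gaussians in a regression class.
\hat{\mu}_m = A\,\mu_m + b = W\,\xi_m ,
\qquad
\xi_m = \begin{bmatrix} 1 \\ \mu_m \end{bmatrix}
% W is chosen to maximize the likelihood of the adaptation data, with each
% frame o_t weighted by the occupancy \gamma_m(t) of Gaussian m at time t.
% In a lattice-based variant, \gamma_m(t) would be accumulated over the paths
% of the word lattice rather than over a single recognized transcription.
```

Read this way, the two improvements in the abstract both reduce the number of lattice paths and time frames over which the occupancies have to be accumulated.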
14.
15.
How much does knowledge regarding a certain spoken word or phrase help with its localization? This is a very fundamental question for speech processing, and will be partially addressed in this paper. In particular, this work will utilize prior information regarding the contents of a speech signal in order to improve its artificial localization using the time delay of arrival (TDOA) between two microphones. The prior information, which is used to develop a very simple frequency-selective phase transform (FPT), increases the effective SNR by only using a subset of the highest SNR frequencies in the Phase Transform. Simulations in a reverberant environment show that the proposed approach can more robustly and accurately localize speech sources. For 20 ms signal segments, it is shown that using a subset of 45 percent of available speech frequency bins is superior to using 30, 60, or 100 percent, where using 100 percent corresponds to the standard Phase Transform.
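The idea of restricting the Phase Transform to a subset of frequency bins can be sketched with NumPy as follows. This is not the paper's FPT: the abstract selects bins using prior knowledge of the speech content, whereas this sketch, as a stand-in assumption, keeps the bins with the largest cross-spectrum magnitude. The function name `gcc_phat_subset` and its parameters are hypothetical.

```python
import numpy as np


def gcc_phat_subset(x1, x2, fs, keep_fraction=1.0):
    """Estimate the delay of x2 relative to x1, in seconds.

    Uses a phase-transform (PHAT) weighted cross-correlation restricted to a
    fraction of the frequency bins. keep_fraction=1.0 is the standard PHAT.
    Assumption: bins are ranked by cross-spectrum magnitude, as a proxy for SNR.
    """
    n = len(x1) + len(x2)                       # zero-pad to avoid circular wrap-around
    spec1 = np.fft.rfft(x1, n=n)
    spec2 = np.fft.rfft(x2, n=n)
    cross = spec2 * np.conj(spec1)
    magnitude = np.abs(cross)
    phat = cross / np.maximum(magnitude, 1e-12)  # keep phase only

    # Keep only the strongest bins; zero out the rest.
    count = max(1, int(round(keep_fraction * len(magnitude))))
    mask = np.zeros(len(magnitude))
    mask[np.argsort(magnitude)[-count:]] = 1.0

    cc = np.fft.irfft(phat * mask, n=n)
    max_shift = n // 2
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs
```

A positive return value means the signal reaches the second microphone later than the first. With `keep_fraction=0.45` the call mirrors the 45 percent setting reported in the abstract, although the bin-selection rule here differs from the paper's.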
16.
17.
18.
De Wachter M. Matton M. Demuynck K. Wambacq P. Cools R. Van Compernolle D. 《IEEE transactions on audio, speech, and language processing》2007,15(4):1377-1390
Despite their known weaknesses, hidden Markov models (HMMs) have been the dominant technique for acoustic modeling in speech recognition for over two decades. Still, the advances in the HMM framework have not solved its key problems: it discards information about time dependencies and is prone to overgeneralization. In this paper, we attempt to overcome these problems by relying on straightforward template matching. The basis for the recognizer is the well-known DTW algorithm. However, classical DTW continuous speech recognition results in an explosion of the search space. The traditional top-down search is therefore complemented with a data-driven selection of candidates for DTW alignment. We also extend the DTW framework with a flexible subword unit mechanism and a class sensitive distance measure, two components suggested by state-of-the-art HMM systems. The added flexibility of the unit selection in the template-based framework leads to new approaches to speaker and environment adaptation. The template matching system reaches a performance somewhat worse than the best published HMM results for the Resource Management benchmark, but thanks to complementarity of errors between the HMM and DTW systems, the combination of both leads to a decrease in word error rate of 17% compared to the HMM results.
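The classical DTW alignment that this abstract builds on can be sketched in a few lines. This is the basic textbook algorithm on one-dimensional sequences with an absolute-difference local cost, not the paper's system, which adds candidate selection, subword units, and a class-sensitive distance. The names `dtw_distance` and `classify` are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of numbers.

    Local cost is the absolute difference; allowed steps are insertion,
    deletion, and match (the classical symmetric step pattern).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a[i-1] repeats a match to b[j-1]
                                     cost[i][j - 1],      # b[j-1] repeats a match to a[i-1]
                                     cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]


def classify(query, templates):
    """Return the label of the template closest to the query under DTW.

    templates: non-empty list of (label, sequence) pairs.
    """
    best_label, _ = min(templates, key=lambda item: dtw_distance(query, item[1]))
    return best_label
```

The table has one cell per pair of frames, so matching one query against every stored template costs time proportional to the total template length times the query length, which is the search-space growth the abstract addresses with data-driven candidate selection.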
19.