Found 20 similar documents (search time: 15 ms)
Audio-visual speech recognition (AVSR), which uses both the acoustic and the visual signals of speech, has received attention because of its robustness in noisy environments. In this paper, we present an AVSR system based on a late integration scheme whose robustness under various noise conditions is improved by enhancing the performance of the three parts composing the system. First, we improve the performance of the visual subsystem by using a stochastic optimization method for the hidden Markov models that serve as the speech recognizer. Second, we propose a new method of considering the dynamic characteristics of speech to improve the robustness of the acoustic subsystem. Third, the acoustic and visual subsystems are integrated by neural networks to produce robust final recognition results. We demonstrate the performance of the proposed methods via speaker-independent isolated word recognition experiments. The results show that the proposed system improves robustness over the conventional system under various noise conditions without a priori knowledge about the noise contained in the speech.

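Speech Recognition Method Based on Fuzzy Support Vector Machines   Total citations: 11; self-citations: 0; citations by others: 11
By computing the fuzzy membership degree of each input sample, this paper examines the principle of the fuzzy support vector machine (FSVM) and applies it to speech signal recognition. Its recognition performance is compared with that of an RBF neural network and a support vector machine (SVM). In simulation experiments, wavelet analysis is used to extract speech feature vectors. The recognition results show that SVM and FSVM generalize better than the RBF network and require much less training time. In addition, FSVM is more robust to noise than SVM.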

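To obtain better speech recognition performance, a speaker-independent isolated-word speech recognition system based on a support vector machine with a linear kernel was built, and it achieved a high recognition rate. The experimental results were compared with HMM-based recognition results, showing the advantage of support vector machines for speech recognition when training samples are limited.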

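Speech Recognition Based on Combined Dynamic and Static Feature Parameters   Total citations: 1; self-citations: 0; citations by others: 1
Based on the time-varying nature of speech signals, this paper proposes a speech recognition method that combines dynamic and static feature parameters. First, the wavelet packet transform is introduced into feature extraction: following the extraction procedure for MFCC (Mel-Frequency Cepstrum Coefficient) parameters, the wavelet packet transform replaces the Fourier transform and the Mel filter bank, yielding a new static feature parameter, DWPTMFCC (Discrete Wavelet Packet Transform Mel-Frequency Coefficient). This parameter is then combined with the first-order DWPTMFCC difference parameters into a single vector that serves as the parameter for one frame of speech. Experiments and simulations show that this parameter achieves a high recognition rate and is an effective speech feature. In addition, chaotic behavior is introduced into the neurons to form a chaotic neural network, which is applied to speech recognition and compared with the commonly used BP neural network. The experimental results show that the average recognition rate of the chaotic neural network is higher than that of the conventional neural network under the same conditions.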
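
The step of joining static parameters with their first-order differences can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not give the difference formula, so a two-point central difference with edge replication is assumed here, and the function names `delta` and `combine_static_dynamic` are hypothetical.

```python
from typing import List

Frame = List[float]


def delta(frames: List[Frame]) -> List[Frame]:
    """First-order difference per frame.

    Assumption: two-point central difference, with the first and last
    frames replicated at the edges. The abstract does not specify this.
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def combine_static_dynamic(frames: List[Frame]) -> List[Frame]:
    """Concatenate each static vector with its difference vector."""
    return [s + d for s, d in zip(frames, delta(frames))]


# Toy static features: 3 frames, 2 coefficients each (made-up values).
static = [[1.0, 2.0], [2.0, 4.0], [4.0, 8.0]]
combined = combine_static_dynamic(static)
print(combined[1])  # static part followed by dynamic part
```

Each output frame has twice the dimension of the static vector, which matches the abstract's description of one combined vector per frame.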

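GRBM Speech Recognition Based on Improved Contrastive Divergence   Total citations: 1; self-citations: 0; citations by others: 1
Contrastive divergence is one of the mainstream techniques for training restricted Boltzmann machine models and performs well in experimental training. By combining the exponential moving average indicator algorithm with the idea of parallel tempering, an improved contrastive divergence training algorithm is proposed that covers both the updating of model parameters and the sampling of sample data. The improved algorithm is applied to a Gaussian-Bernoulli restricted Boltzmann machine (GRBM) to train the parameters of a speech recognition model. Experiments on the TI-Digits digit speech training and test databases show that the GRBM trained with the improved contrastive divergence clearly outperforms traditional model training algorithms: the speech recognition rate reaches about 80%, an improvement of up to about 7%. The recognition rates of the other GRBM comparison models trained with the improved algorithm also increase, indicating good recognition performance.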

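Remote command recognition and parsing is the foundation of, and the key to, remote management and control in the terminal-console and host-slave modes used in embedded environments. This paper analyzes the working process of an underwater detection intelligent terminal and proposes a remote command recognition and parsing method based on finite state automata. Using an automaton model of its working states, the terminal can respond to remote commands quickly and accurately, avoiding complex computation and cumbersome decision procedures. Experiments show that the terminal promptly recognizes the control commands sent by the console and switches to the corresponding working state as required, and that the method effectively improves the terminal's working performance.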
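
A table-driven finite state automaton of the kind described can be sketched as below. The states (`IDLE`, `SAMPLING`, `TRANSMITTING`) and commands (`START`, `PAUSE`, `UPLOAD`, `DONE`) are hypothetical placeholders; the abstract does not list the terminal's actual states or command set.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical transition table: (current state, command) -> next state.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("IDLE", "START"): "SAMPLING",
    ("SAMPLING", "PAUSE"): "IDLE",
    ("SAMPLING", "UPLOAD"): "TRANSMITTING",
    ("TRANSMITTING", "DONE"): "IDLE",
}


def step(state: str, command: str) -> str:
    """Return the next state; a command that is not valid in the
    current state is ignored and the state is left unchanged."""
    return TRANSITIONS.get((state, command), state)


def run(commands: Iterable[str], state: str = "IDLE") -> str:
    """Feed a sequence of remote commands through the automaton."""
    for command in commands:
        state = step(state, command)
    return state


print(run(["START", "UPLOAD"]))  # ends in TRANSMITTING
```

Each command costs one dictionary lookup, which is consistent with the abstract's point that the automaton avoids complex computation and lengthy decision logic.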

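In Japanese, predicate voices take different inflectional endings, but the passive and potential voices share the same ending, which makes them difficult to distinguish and translate correctly in statistical machine translation. This paper therefore proposes a method that uses a maximum entropy model to classify the Japanese potential and passive voices and then integrates these voice features into the log-linear model to improve the translation model, so as to increase the accuracy of translation rule selection for the two voices. Experimental results show that the method effectively improves translation quality for Japanese sentences in the potential and passive voices: on a large-scale Japanese-Chinese corpus, the best BLEU score increases from 41.50 to 42.01, and in human evaluation the overall comprehensibility of the translations improves by 2.71%.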

Computer speech recognition has been very successful in limited domains and for isolated word recognition. However, widespread use of large-vocabulary continuous-speech recognizers is limited by the speed of current recognizers, which cannot reach acceptable error rates while running in real time. This paper shows how to harness shared-memory multiprocessors, which are becoming increasingly common, to increase significantly the speed, and therefore the accuracy or vocabulary size, of a speech recognizer. To cover the necessary background, we begin with a tutorial on speech recognition. We then describe the parallelization of an existing high-quality speech recognizer, achieving speedups of 3, 5, and 6 on 4, 8, and 12 processors, respectively, for the benchmark North American Business News (NAB) recognition task.
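
The reported speedups can be turned into parallel efficiency (speedup divided by processor count) with a few lines of arithmetic. The efficiency figures below are derived from the numbers in the abstract and are not stated in it.

```python
# Speedups reported in the abstract, keyed by processor count.
reported_speedup = {4: 3.0, 8: 5.0, 12: 6.0}

# Parallel efficiency = speedup / number of processors.
efficiency = {p: s / p for p, s in reported_speedup.items()}

for processors, eff in efficiency.items():
    print(f"{processors} processors: efficiency {eff:.3f}")
```

The efficiency falls from 0.75 on 4 processors to 0.5 on 12, so each added processor contributes less.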

In some cases, different pieces of contextual information are needed to translate an utterance in a dialogue properly. Interpreting such utterances often requires dialogue analysis, including speech acts and discourse analysis. In this paper, a statistical dialogue analysis model for Korean–English dialogue machine translation based on speech acts is proposed. The model uses syntactic patterns and n-grams of speech acts. The syntactic patterns include surface syntactic features that are related to the language-dependent expressions of speech acts. Speech-act n-grams are used to approximate the context of utterances. The key feature is the use of speech-act n-grams based on hierarchical recency. Experimental results with trigrams show that the proposed model achieves an accuracy of 66.87% for the top candidate and 82.35% for the top three candidates. This indicates that the proposed model based on hierarchical recency outperforms the model based on linear recency.

This paper sketches research in nine areas related to spoken language translation: interactive disambiguation (two demonstrations of highly interactive, broad-coverage speech translation are reported); system architecture; data structures; the interface between speech recognition and analysis; the use of natural pauses for segmenting utterances; example-based machine translation; dialogue acts; the tracking of lexical co-occurrences; and the resolution of translation mismatches.

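Noise-Robust Speech Recognition and the Application of Speech Enhancement Algorithms   Total citations: 1; self-citations: 0; citations by others: 1
Tang Ling (汤玲), Dai Bin (戴斌). Computer Simulation (计算机仿真), 2006, 23(9): 80-82, 143
Improving the robustness of speech recognition systems is an important research topic in speech recognition technology. Recognition performance often degrades because the data in the training environment do not match the data in the recognition environment. To allow a speech recognition system to perform satisfactorily in noisy environments, this paper proposes a robust speech feature extraction method based on the characteristics of human hearing. Before MFCC feature extraction, the noisy speech features are processed according to auditory masking properties and further processed with a speech enhancement method, yielding robust speech features. Analysis of four different experiments shows that applying this method improves the system's noise robustness, and that the feature processing adapts well to different noise types at different signal-to-noise ratios.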

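This paper introduces and improves an unsupervised adaptation algorithm, lattice-based maximum likelihood linear regression (Lattice-MLLR). Lattice-MLLR estimates the MLLR transform parameters from the word lattice produced by decoding. Because the potential error rate of the word lattice is much lower than that of the recognition result, the parameters can be estimated more accurately. A major drawback of Lattice-MLLR is its very high computational cost, which makes it hard to use in practice. Two improvements are proposed to address this: (1) compressing the word lattice using posterior probabilities; (2) using word timing information to restrict the range over which state statistics are computed. In experiments, Lattice-MLLR reduced the error rate by 3.5% relative to conventional MLLR, and the improvements reduced the computational cost of Lattice-MLLR by more than 87.9%.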

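Analysis and Prospects of Speech Recognition Technology   Total citations: 2; self-citations: 0; citations by others: 2
Through study and discussion of the principles of speech recognition, this paper summarizes current techniques in the field, analyzes the types of speech recognition products on the market, and looks ahead to the prospects for applying speech recognition in business.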

How much does knowledge regarding a certain spoken word or phrase help with its localization? This is a fundamental question for speech processing, and it is partially addressed in this paper. In particular, this work uses prior information regarding the contents of a speech signal to improve its artificial localization using the time delay of arrival (TDOA) between two microphones. The prior information, which is used to develop a very simple frequency-selective phase transform (FPT), increases the effective SNR by using only a subset of the highest-SNR frequencies in the phase transform. Simulations in a reverberant environment show that the proposed approach can localize speech sources more robustly and accurately. For 20 ms signal segments, it is shown that using a subset of 45 percent of the available speech frequency bins is superior to using 30, 60, or 100 percent, where 100 percent corresponds to the standard phase transform.
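
The idea of running the phase transform over only a subset of frequency bins can be sketched as below. This is an illustration under stated assumptions, not the paper's FPT: the paper selects bins from prior knowledge of the speech content, whereas this sketch simply keeps the bins with the largest cross-spectrum magnitude as a stand-in. It also assumes a noise-free circular delay and uses a naive DFT so that it needs only the standard library. The names `dft` and `estimate_delay` are hypothetical.

```python
import cmath
import math
import random
from typing import List


def dft(x: List[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (fine for short frames)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def estimate_delay(x: List[float], y: List[float],
                   keep_fraction: float = 0.45) -> int:
    """Estimate the circular delay of y relative to x, in samples.

    Only the `keep_fraction` of bins with the largest cross-spectrum
    magnitude are used (a stand-in for "highest SNR" bins); each kept
    bin is normalized to unit magnitude, as in the phase transform.
    """
    n = len(x)
    X, Y = dft(x), dft(y)
    cross = [Y[k] * X[k].conjugate() for k in range(n)]
    keep = max(1, math.ceil(keep_fraction * n))
    selected = sorted(range(n), key=lambda k: abs(cross[k]), reverse=True)[:keep]
    # Phase-transform weighting: keep phase, discard magnitude.
    phat = {k: cross[k] / abs(cross[k]) for k in selected if abs(cross[k]) > 0.0}
    # Inverse transform restricted to the kept bins; the peak marks the lag.
    corr = [
        sum((w * cmath.exp(2j * math.pi * k * tau / n)).real
            for k, w in phat.items())
        for tau in range(n)
    ]
    return max(range(n), key=lambda tau: corr[tau])


# Synthetic test signal: mic2 is mic1 circularly delayed by 5 samples.
rng = random.Random(0)
n, true_delay = 31, 5
mic1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
mic2 = [mic1[(t - true_delay) % n] for t in range(n)]
print(estimate_delay(mic1, mic2))
```

Setting `keep_fraction=1.0` uses every bin, which corresponds to the standard phase transform mentioned in the abstract. The returned lag is circular, in the range 0 to n-1.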

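Speech recognition makes sound "readable", allowing computers to "understand" human language and respond to it; it is one of the key technologies for human-computer interaction in artificial intelligence. This paper reviews the development history of speech recognition, explains its principles, concepts, and basic framework, and analyzes the research hotspots and difficulties in the field. Finally, it summarizes speech recognition technology and discusses directions for future research.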

Despite their known weaknesses, hidden Markov models (HMMs) have been the dominant technique for acoustic modeling in speech recognition for over two decades. Still, the advances in the HMM framework have not solved its key problems: it discards information about time dependencies and is prone to overgeneralization. In this paper, we attempt to overcome these problems by relying on straightforward template matching. The basis for the recognizer is the well-known DTW algorithm. However, classical DTW continuous speech recognition results in an explosion of the search space. The traditional top-down search is therefore complemented with a data-driven selection of candidates for DTW alignment. We also extend the DTW framework with a flexible subword unit mechanism and a class-sensitive distance measure, two components suggested by state-of-the-art HMM systems. The added flexibility of the unit selection in the template-based framework leads to new approaches to speaker and environment adaptation. The template matching system reaches a performance somewhat worse than the best published HMM results for the Resource Management benchmark, but thanks to the complementarity of errors between the HMM and DTW systems, the combination of both leads to a 17% decrease in word error rate compared to the HMM results.
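
The classical DTW alignment that the recognizer builds on can be sketched as follows. This is the textbook algorithm on scalar sequences with an absolute-difference local cost. It does not include the paper's data-driven candidate selection, subword units, or class-sensitive distance. The names `dtw_distance` and `classify` are hypothetical, and the templates are made-up toy data.

```python
from typing import Dict, List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classical dynamic time warping cost between two sequences,
    using |a_i - b_j| as the local distance."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def classify(query: Sequence[float],
             templates: Dict[str, List[float]]) -> str:
    """Template matching: return the label of the nearest template."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


templates = {"up": [1.0, 2.0, 3.0], "down": [3.0, 2.0, 1.0]}
print(classify([1.0, 2.0, 2.0, 3.0], templates))  # a stretched "up" pattern
```

The full cost table grows with the product of the two sequence lengths, which hints at why the abstract reports a search-space explosion when the same idea is applied to continuous speech.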

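Statistical machine translation can predict target words by statistical methods, but it does not fully capture the semantic relations of the source text, so the resulting translations are of low quality. To address this problem, a recurrent neural network architecture based on gated units is used to model a Mongolian-Chinese neural machine translation system. An attention mechanism is introduced to obtain alignment information between bilingual words, and part-of-speech tagging is applied to bilingual words during dictionary construction to strengthen semantics, thereby alleviating mistranslations caused by under-training. Experimental results show that the method improves the BLEU score to some extent compared with an RNN baseline system and with traditional statistical machine translation.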

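Telephone Speech Recognition System   Total citations: 5; self-citations: 1; citations by others: 5
This paper discusses the design and implementation of a typical telephone speech recognition system. It first presents the overall system model, then the overall architecture and the main modules. Speed optimization of acoustic-level recognition and telephone channel compensation are described in some detail, as are the fault-tolerant algorithm for the language model and a fault-tolerant understanding model based on semantic dependency relations. The performance of the modules at each level of the system was tested experimentally.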
