Similar Literature
20 similar documents found
1.
Research Progress in Large-Vocabulary Continuous Mandarin Speech Recognition Systems   Total citations: 34 (self: 3, other: 34)
刘加 《电子学报》2000,28(1):85-91
This paper surveys recent progress in large-vocabulary continuous speech recognition technology and describes design methods for large-vocabulary continuous Mandarin speech recognition systems. Several key techniques and principles of speech recognition systems are analyzed and discussed in detail, and open problems in the further development of speech recognition, together with recent research trends, are also discussed.

2.
One task of a Chinese speech understanding system is to convert the Mandarin monosyllables produced by a speech recognizer into the correct Chinese characters and words, and ultimately into Chinese phrases and sentences, forming, together with the recognizer, a complete speech-to-text system. This paper applies a closed-loop feedback scheme for Mandarin speech recognition and understanding: building on word-level recognition and understanding, it further recognizes and understands structured Chinese phrases, obtaining the expected results. The paper concludes with a discussion of the experimental results and of the feedback-based recognition and understanding scheme.

3.
马丽静  李红 《电子技术》2012,39(2):13-14,4
This paper studies the basic principles of small-vocabulary Mandarin speech recognition algorithms and proposes a robust two-stage endpoint-detection recognition technique. During signal acquisition, the data are extracted and compressed based on the zero-crossing rate and short-time energy, and recognition uses a multi-template matching algorithm. The hardware is an 8051-core microcontroller, which processes the speech data within limited memory and computation and requires no additional components. Tests on a 20-word small-vocabulary Mandarin system achieved a recognition rate above 90%, indicating that the algorithm outperforms commonly used alternatives.
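The abstract above combines short-time energy and zero-crossing rate for two-stage endpoint detection. The sketch below shows the general idea only; the frame sizes and thresholds are illustrative assumptions, not values from the paper:

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def short_time_energy(frames):
    return np.sum(frames.astype(float) ** 2, axis=1)

def zero_crossing_rate(frames):
    signs = np.sign(frames)
    return np.mean(np.abs(np.diff(signs, axis=1)) > 0, axis=1)

def detect_endpoints(x, energy_thresh=0.1, zcr_thresh=0.3):
    """Return (first, last) voiced frame indices, or None if all silence."""
    frames = frame_signal(x)
    e = short_time_energy(frames)
    e = e / (e.max() + 1e-12)                 # normalize energy to [0, 1]
    z = zero_crossing_rate(frames)
    # Stage 1: frames passing the energy threshold.
    # Stage 2: extend into low-energy frames with high ZCR (e.g. fricatives).
    voiced = (e > energy_thresh) | ((e > energy_thresh / 4) & (z > zcr_thresh))
    idx = np.flatnonzero(voiced)
    return (int(idx[0]), int(idx[-1])) if idx.size else None
```

On a signal with a tone embedded in silence, the detector returns the first and last frames where the combined energy/ZCR evidence exceeds the thresholds.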

4.
One task of a Chinese speech understanding system is to convert the Mandarin monosyllables produced by a speech recognizer into the correct Chinese characters and words, and ultimately into Chinese phrases and sentences, forming, together with the recognizer, a complete speech-to-text system. This paper applies a closed-loop feedback scheme for Mandarin speech recognition and understanding: building on word-level recognition and understanding, it further recognizes and understands structured Chinese phrases, obtaining the expected results. The paper concludes with a discussion of the experimental results and of the feedback-based recognition and understanding scheme.

5.
A Motion Control System Based on Small-Vocabulary Speech Recognition   Total citations: 1 (self: 1, other: 0)
This paper briefly introduces the principles of speech recognition and, on a 16-bit TMS320LF2407-series DSP, designs a speech endpoint-detection algorithm, extracts speech feature parameters with a cochlear model, and matches parameter templates using an improved dynamic time warping (DTW) algorithm, yielding a small-vocabulary, speaker-independent, isolated-word speech recognition system. The system was applied to motion control; experiments show a correct recognition rate of about 93%, demonstrating practical value.
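The template-matching step above uses an improved DTW algorithm whose specific refinements the abstract does not give, so the sketch below shows only the textbook DTW recurrence over generic feature sequences:

```python
import numpy as np

def dtw_distance(a, b):
    """a, b: (T, D) feature matrices. Returns the accumulated warp distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            D[i, j] = cost + min(D[i - 1, j],            # insertion
                                 D[i, j - 1],            # deletion
                                 D[i - 1, j - 1])        # match
    return D[n, m]

def recognize(utterance, templates):
    """Pick the template word with the smallest DTW distance."""
    return min(templates, key=lambda w: dtw_distance(utterance, templates[w]))
```

Because the warp path absorbs timing differences, a time-stretched utterance still matches its own word template more closely than a different word's template.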

6.
周利清 《电信科学》1997,13(11):28-31
This paper presents a small-vocabulary speech recognition system that can operate independently of a computer. A new neural network architecture is proposed and combined with fuzzy logic to implement the system, enabling speaker-independent real-time speech recognition in fairly demanding real-world environments with a recognition rate of 90%.

7.
A Real-Time Speech Recognition System Based on the TMS320C54x DSP   Total citations: 6 (self: 0, other: 6)
This paper describes a speaker-independent, small-vocabulary, isolated-word speech recognition system. It uses a hidden Markov model (HMM)-based speech endpoint-detection method and a self-learning VQ/HMM recognition algorithm, with hardware built around a high-speed TMS320C54x DSP to achieve real-time recognition.

8.
胡丹  曾庆宁  龙超  黄桂敏 《电视技术》2015,39(24):43-46
To address the low recognition rates of large-vocabulary continuous speech recognition, this paper cascades speech enhancement at the front end of the recognizer, combining spectral subtraction and the log minimum mean-square error algorithm (logMMSE) with improved minima-controlled recursive averaging (IMCRA) for noise estimation. The recognizer extracts Mel-frequency cepstral coefficient (MFCC) features and performs training and recognition with hidden Markov models (HMMs). Experiments show that the proposed method improves word recognition rates by up to 38.9% and sentence accuracy by 21.8%, demonstrating that it is feasible and effective for large-vocabulary continuous speech recognition.
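Spectral subtraction, the first stage of the enhancement front end above, can be sketched as follows. This toy version estimates the noise spectrum from a few leading noise-only frames and omits the logMMSE and IMCRA components described in the abstract:

```python
import numpy as np

def spectral_subtraction(x, frame_len=256, noise_frames=5, floor=0.01):
    """Subtract an estimated noise magnitude spectrum from each frame."""
    hop = frame_len // 2
    win = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    spectra = np.stack([np.fft.rfft(win * x[i * hop : i * hop + frame_len])
                        for i in range(n_frames)])
    # Assume the first few frames contain noise only.
    noise_mag = np.abs(spectra[:noise_frames]).mean(axis=0)
    # Subtract, with a spectral floor to avoid negative magnitudes.
    mag = np.maximum(np.abs(spectra) - noise_mag, floor * np.abs(spectra))
    clean = mag * np.exp(1j * np.angle(spectra))   # keep the noisy phase
    # Overlap-add resynthesis.
    y = np.zeros(len(x))
    for i, frame in enumerate(np.fft.irfft(clean, n=frame_len)):
        y[i * hop : i * hop + frame_len] += frame
    return y
```

The spectral floor (here 1% of the noisy magnitude) is the usual guard against "musical noise" from over-subtraction; the papers' IMCRA component would replace the fixed leading-frame noise estimate with a continuously updated one.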

9.
One task of a Chinese speech understanding system is to convert the Mandarin syllables produced by a speech recognizer into the correct Chinese characters, forming, together with the recognizer, a complete speech-to-text system. This paper applies a new feedback-based speech understanding scheme, distinct from the traditional linguistic [1] and statistical [2] approaches, to the recognition and understanding of Chinese place names, with very good experimental results. The paper concludes with a discussion of the results and of the feedback-based recognition and understanding scheme.

10.
Speech recognition takes speech as its object of study and uses speech signal processing and pattern recognition to let machines automatically recognize and understand spoken human language. Speech recognition technology enables machines to convert speech signals into corresponding text or commands through recognition and understanding. It is a broad interdisciplinary field, closely related to acoustics, phonetics, linguistics, information theory, pattern recognition theory, and neurobiology. Speech recognition is steadily becoming a key technology in computer information processing, and its applications have grown into a competitive emerging high-technology industry.

11.
欧智坚  王作英 《电子学报》2003,31(4):608-611
Although the HMM is currently the most popular speech recognition model, its assumption that state outputs are independent and identically distributed means it ignores the dynamic characteristics of speech trajectories. Based on a more flexible statistical framework for describing speech, the generalized DDBHMM, this paper proposes a concrete polynomial-fitting speech trajectory model together with new training and recognition algorithms, better characterizing real speech. An effective pruning algorithm is also given, yielding a practical model. Experiments on speaker-independent large-vocabulary continuous Mandarin speech recognition show that the pruned polynomial-fitting trajectory model markedly improves system performance at modest computational cost.
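The polynomial-fitting trajectory model above replaces the i.i.d. state-output assumption with a fitted trajectory per segment. Below is a toy rendering of that concept only (not the paper's DDBHMM formulation), using an assumed normalized time axis so that segments of different durations can share one trajectory:

```python
import numpy as np

def fit_trajectory(segment, order=2):
    """segment: (T, D) features. Fit one polynomial per dimension over time."""
    t = np.linspace(0.0, 1.0, len(segment))     # duration-normalized time
    return np.stack([np.polyfit(t, segment[:, d], order)
                     for d in range(segment.shape[1])])

def trajectory_residual(segment, coeffs):
    """Mean squared distance of a segment to a fitted trajectory.
    A low residual means the segment follows the modeled dynamics."""
    t = np.linspace(0.0, 1.0, len(segment))
    pred = np.stack([np.polyval(c, t) for c in coeffs], axis=1)
    return float(np.mean((segment - pred) ** 2))
```

Scoring a segment by its residual to a fitted trajectory rewards matching dynamics (e.g. a rising formant), which a frame-wise i.i.d. output density cannot express.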

12.
The author presents a study of large-vocabulary continuous Mandarin speech recognition based on a segmental probability model (SPM) approach. The SPM was found to be very suitable for recognition of isolated Mandarin syllables, especially considering the monosyllabic structure of the Chinese language. To extend the application of the model to continuous Mandarin speech recognition, a concatenated syllable matching (CSM) algorithm is first introduced in place of the conventional Viterbi search algorithm. Also, to utilise the available training material efficiently, a training procedure is proposed to re-estimate the SPM parameters using the maximum a posteriori (MAP) algorithm. A few special techniques integrating acoustic and linguistic knowledge are developed to improve the performance further, step by step. Preliminary experimental results show that the final achievable rate is as high as 91.62%, which represents an 18.48% error-rate reduction over, and more than three times the speed of, the well-studied subsyllable-based CHMM.

13.
Progress in dynamic programming search for LVCSR   Total citations: 2 (self: 0, other: 2)
Initially introduced in the late 1960s and early 1970s, dynamic programming algorithms have become increasingly popular in automatic speech recognition. There are two reasons why this has occurred. First, the dynamic programming strategy can be combined with a very efficient and practical pruning strategy so that very large search spaces can be handled. Second, the dynamic programming strategy has turned out to be extremely flexible in adapting to new requirements. Examples of such requirements are the lexical tree organization of the pronunciation lexicon and the generation of a word graph instead of the single best sentence. We attempt to review the use of dynamic programming search strategies for large-vocabulary continuous speech recognition (LVCSR). The following methods are described in detail: searching using a lexical tree, language-model look-ahead, and word-graph generation.
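The lexical tree organization mentioned above shares arcs among words with a common phoneme prefix, so the search expands each prefix only once. A minimal sketch with a made-up three-word lexicon:

```python
class TreeNode:
    def __init__(self):
        self.children = {}   # phoneme -> TreeNode
        self.word = None     # set when a word ends at this node

def build_lexical_tree(lexicon):
    """lexicon: dict mapping word -> phoneme sequence."""
    root = TreeNode()
    for word, phonemes in lexicon.items():
        node = root
        for p in phonemes:
            node = node.children.setdefault(p, TreeNode())
        node.word = word
    return root

def count_arcs(node):
    """Total number of phoneme arcs in the tree."""
    return len(node.children) + sum(count_arcs(c)
                                    for c in node.children.values())
```

For a lexicon where "speech", "speed", and "spin" share the /s p/ prefix, the tree stores far fewer arcs than a flat lexicon would, which is precisely why tree-organized search prunes so effectively. A consequence noted in the search literature: the word identity is known only at the leaves, which is what makes language-model look-ahead useful.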

14.
This paper introduces an enhanced phoneme-based myoelectric signal (MES) speech recognition system. The system can recognize new words without retraining the phoneme classifier, which is considered to be the main advantage of phoneme-based speech recognition. It is shown that previous systems experience severe performance degradation when new words are added to a testing dataset. To maintain high accuracy with new words, several improvements are proposed. In the proposed MES speech recognition approach, the raw MES is processed by class-specific rotation matrices to spatially decorrelate the data prior to feature extraction in a preprocessing stage. Then, an uncorrelated linear discriminant analysis is used for dimensionality reduction. The resulting data are classified through a hidden Markov model classifier to obtain the phonemic log likelihoods of the phonemes, which are mapped to corresponding words using a word classifier. An average word classification accuracy of 98.533% is achieved over six subjects. The system offers dramatically improved accuracy when expanding a vocabulary, offering promise for robust large-vocabulary myoelectric speech recognition.

15.
Considerable progress has been made in speech-recognition technology and nowhere has this progress been more evident than in the area of large-vocabulary recognition (LVR). Laboratory systems are capable of transcribing continuous speech from any speaker with average word-error rates between 5% and 10%. If speaker adaptation is allowed, then after 2 or 3 minutes of speech, the error rate will drop well below 5% for most speakers. LVR systems had been limited to dictation applications since the systems were speaker dependent and required words to be spoken with a short pause between them. However, the capability to recognize natural continuous-speech input from any speaker opens up many more applications. This article discusses the principles and architecture of LVR systems and identifies the key issues affecting their future deployment. To illustrate the various points raised, the Cambridge University HTK system is described. This system is a modern design that gives state-of-the-art performance, and it is typical of the current generation of recognition systems.

16.
This article describes LSM (latent semantic mapping), a data-driven framework for modeling globally meaningful relationships implicit in large volumes of data. LSM generalizes a paradigm originally developed to capture hidden word patterns in a text document corpus. Over the past decade, this paradigm has proven effective in an increasing variety of fields, gradually spreading from query-based information retrieval to word clustering, document/topic clustering, large-vocabulary speech recognition language modeling, automated call routing, semantic inference for spoken interface control, and several other speech processing applications.

17.
Recent advances in the automatic recognition of audiovisual speech   Total citations: 11 (self: 0, other: 11)
Visual speech information from the speaker's mouth region has been successfully shown to improve noise robustness of automatic speech recognizers, thus promising to extend their usability in the human computer interface. In this paper, we review the main components of audiovisual automatic speech recognition (ASR) and present novel contributions in two main areas: first, the visual front-end design, based on a cascade of linear image transforms of an appropriate video region of interest, and subsequently, audiovisual speech integration. On the latter topic, we discuss new work on feature and decision fusion combination, the modeling of audiovisual speech asynchrony, and incorporating modality reliability estimates into the bimodal recognition process. We also briefly touch upon the issue of audiovisual adaptation. We apply our algorithms to three multisubject bimodal databases, ranging from small- to large-vocabulary recognition tasks, recorded in both visually controlled and challenging environments. Our experiments demonstrate that the visual modality improves ASR over all conditions and data considered, though less so for visually challenging environments and large vocabulary tasks.

18.
The authors describe an architecture and search organization for continuous speech recognition. The recognition module is part of the Siemens-Philips-IPO continuous speech recognition and understanding (SPICOS) system for the understanding of database queries spoken in natural language. The goal of this project is a man-machine dialogue system that is able to understand fluently spoken German sentences and thus to provide voice access to a database. The recognition strategy is based on Bayes decision rule and attempts to find the best interpretation of the input speech data in terms of knowledge sources such as a language model, pronunciation lexicon, and inventory of subword units. The implementation of the search has been tested on a continuous speech database comprising up to 4000 words for each of several speakers. The efficiency and robustness of the search organization have been checked and evaluated along many dimensions, such as different speakers, phoneme models, and language models.

19.
The authors demonstrate the effectiveness of phonemic hidden Markov models with Gaussian mixture output densities (mixture HMMs) for speaker-dependent large-vocabulary word recognition. Speech recognition experiments show that for almost any reasonable amount of training data, recognizers using mixture HMMs consistently outperform those employing unimodal Gaussian HMMs. With a sufficiently large training set (e.g. more than 2500 words), use of HMMs with 25-component mixture distributions typically reduces recognition errors by about 40%. It is also found that the mixture HMMs outperform a set of unimodal generalized triphone models having the same number of parameters. Previous attempts to employ mixture HMMs for speech recognition proved discouraging because of the high complexity and computational cost in implementing the Baum-Welch training algorithm. It is shown how mixture HMMs can be implemented very simply in unimodal transition-based frameworks by allowing multiple transitions from one state to another.
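The abstract's central claim, that Gaussian mixture output densities outperform unimodal Gaussians, rests on the mixture's ability to model multimodal frame distributions. A small sketch of a diagonal-covariance mixture log-likelihood (the parameters below are illustrative, not from the paper):

```python
import numpy as np

def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def mixture_log_likelihood(x, weights, means, variances):
    """log sum_k w_k N(x; mu_k, var_k), computed stably via log-sum-exp."""
    logs = np.array([np.log(w) + log_gaussian(x, m, v)
                     for w, m, v in zip(weights, means, variances)])
    mx = logs.max()
    return mx + np.log(np.sum(np.exp(logs - mx)))
```

For bimodal data (say, two pronunciations of a phone producing feature clusters at -2 and +2), a two-component mixture scores points near either mode far higher than a single Gaussian spread over both, which is the effect behind the reported error reductions.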

20.
Identification of constituent components of each sign gesture can be beneficial to the improved performance of sign language recognition (SLR), especially for large-vocabulary SLR systems. Aiming at developing such a system using portable accelerometer (ACC) and surface electromyographic (sEMG) sensors, we propose a framework for automatic Chinese SLR at the component level. In the proposed framework, data segmentation, as an important preprocessing operation, is performed to divide a continuous sign language sentence into subword segments. Based on the features extracted from ACC and sEMG data, three basic components of sign subwords, namely the hand shape, orientation, and movement, are further modeled and the corresponding component classifiers are learned. At the decision level, a sequence of subwords can be recognized by fusing the likelihoods at the component level. The overall classification accuracies of 96.5% for a vocabulary of 120 signs and 86.7% for 200 sentences demonstrate the feasibility of interpreting sign components from ACC and sEMG data and clearly show the superior recognition performance of the proposed method compared with the previous SLR method at the subword level. The proposed method seems promising for implementing large-vocabulary portable SLR systems.
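The decision-level fusion described above combines per-component likelihoods into a subword decision. A minimal sketch, assuming the components are scored independently and fused by summing log-likelihoods (the component names come from the abstract; the scoring dictionaries are made up):

```python
def fuse_components(component_scores):
    """component_scores: list of dicts, one per component (e.g. hand shape,
    orientation, movement), each mapping subword -> log-likelihood.
    Returns the subword with the highest summed log-likelihood, which
    assumes the components are conditionally independent."""
    candidates = set.intersection(*(set(s) for s in component_scores))
    totals = {w: sum(s[w] for s in component_scores) for w in candidates}
    return max(totals, key=totals.get)
```

Summing log-likelihoods is the simplest fusion rule; a weighted sum would let more reliable components (e.g. ACC-derived movement) dominate the decision.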


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号