20 similar documents found (search time: 109 ms)
1.
Progress in large-vocabulary continuous Mandarin speech recognition systems  Total citations: 34 (self: 3, others: 34)
This paper reviews recent advances in large-vocabulary continuous speech recognition and describes design methods for large-vocabulary continuous Mandarin speech recognition systems. Several key techniques and principles of speech recognition systems are analyzed and discussed in detail, and remaining problems and recent research trends in the further development of speech recognition are also discussed.
2.
One task of a Mandarin speech understanding system is to convert the Chinese syllables output by a speech recognition system into the correct Chinese characters and words, and further into Chinese phrases and sentences, so that together with the recognizer it forms a complete speech-to-text conversion system. Using a closed-loop feedback scheme for Mandarin speech recognition and understanding, this paper builds on word-level recognition and understanding to further recognize and understand structural Chinese phrases, achieving the expected results. Finally, the experimental results and the feedback-based recognition and understanding scheme are discussed.
3.
This paper studies the basic principles of small-vocabulary Mandarin speech recognition algorithms and proposes a robust two-stage endpoint-detection technique. During speech signal acquisition, the data are extracted and compressed according to zero-crossing rate and short-time energy, and recognition is performed with a multi-template matching algorithm. The hardware is a microcontroller with an 8051 core, which processes the speech data with little storage and computation and requires no additional components. Tests on a 20-word small-vocabulary Mandarin system achieved a recognition rate above 90%, showing that the algorithm outperforms commonly used alternatives.
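The two-stage endpoint detection described in this abstract can be sketched roughly as follows. This is a minimal illustration only, assuming a fixed frame length and simple relative thresholds; the function name `detect_endpoints` and all parameter values are hypothetical, not taken from the paper:

```python
import numpy as np

def detect_endpoints(signal, frame_len=256, energy_ratio=0.1, zcr_ratio=0.3):
    """Two-stage endpoint detection: a coarse pass on short-time energy,
    refined with zero-crossing rate to catch weak unvoiced onsets/offsets."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.log1p((frames.astype(float) ** 2).sum(axis=1))        # short-time log energy
    zcr = (np.abs(np.diff(np.sign(frames), axis=1)) > 0).sum(axis=1)  # zero-crossing count
    e_thresh = energy.min() + energy_ratio * (energy.max() - energy.min())
    z_thresh = zcr_ratio * zcr.max()
    # Stage 1: frames above the energy threshold are candidate speech.
    voiced = energy > e_thresh
    if not voiced.any():
        return None
    start = int(np.argmax(voiced))
    end = int(n_frames - np.argmax(voiced[::-1]) - 1)
    # Stage 2: extend boundaries outward while ZCR stays high (unvoiced consonants).
    while start > 0 and zcr[start - 1] > z_thresh:
        start -= 1
    while end < n_frames - 1 and zcr[end + 1] > z_thresh:
        end += 1
    return start * frame_len, (end + 1) * frame_len
```

The returned sample range would then be passed to feature extraction and template matching.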
4.
One task of a Mandarin speech understanding system is to convert the Chinese syllables obtained by a speech recognition system into the correct Chinese characters and words, and further into Chinese phrases and sentences, so that together with the recognizer it forms a complete speech-to-text conversion system. Using a closed-loop feedback scheme for Mandarin speech recognition and understanding, this paper builds on word-level recognition and understanding to further recognize and understand structural Chinese phrases, achieving the expected results. Finally, the experimental results and the feedback-based recognition and understanding scheme are discussed.
5.
6.
This paper presents a small-vocabulary speech recognition system that can operate without a host computer. A new neural-network architecture is proposed and fuzzy logic is used in its implementation, enabling real-time, speaker-independent recognition in fairly demanding practical environments with a recognition rate of 90%.
7.
A real-time speech recognition system based on the TMS320C54x DSP  Total citations: 6 (self: 0, others: 6)
Presents a speaker-independent, small-vocabulary, isolated-word speech recognition system. It uses a hidden-Markov-model (HMM) based endpoint-detection method for the speech signal and a self-learning VQ/HMM recognition algorithm, with the hardware designed around a high-speed TMS320C54x DSP chip to achieve real-time recognition.
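The VQ front end of a VQ/HMM recognizer maps each feature frame to a discrete codebook symbol before HMM decoding. A minimal sketch, assuming a plain k-means codebook (the abstract does not describe the actual training procedure; `train_codebook` and `quantize` are illustrative names):

```python
import numpy as np

def train_codebook(vectors, k=4, iters=20, seed=0):
    """Train a VQ codebook with plain k-means (a simplified stand-in for LBG)."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), size=k, replace=False)].astype(float)
    labels = np.zeros(len(vectors), dtype=int)
    for _ in range(iters):
        # Assign each vector to its nearest codeword.
        dists = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Re-estimate each codeword as the centroid of its cluster.
        for j in range(k):
            if (labels == j).any():
                codebook[j] = vectors[labels == j].mean(axis=0)
    return codebook, labels

def quantize(vectors, codebook):
    """Map feature vectors to discrete symbol indices for a discrete-HMM back end."""
    dists = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)
```

The symbol sequence produced by `quantize` would feed a discrete HMM; the "self-learning" aspect mentioned in the abstract is not modeled here.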
8.
9.
10.
《卫星电视与宽带多媒体》2011,(7):36-39
Speech recognition takes speech as its object of study: through speech signal processing and pattern recognition, machines automatically recognize and understand human spoken language. Speech recognition technology lets a machine convert a speech signal into the corresponding text or command through a process of recognition and understanding. It is a broad interdisciplinary field, closely related to acoustics, phonetics, linguistics, information theory, pattern recognition theory, and neurobiology. Speech recognition is gradually becoming a key technology in computer information processing, and applications of speech technology have grown into a competitive emerging high-tech industry.
11.
12.
The author presents a study of large-vocabulary continuous Mandarin speech recognition based on a segmental probability model (SPM) approach. The SPM was found to be very well suited to recognition of isolated Mandarin syllables, especially considering the monosyllabic structure of the Chinese language. To extend the model to continuous Mandarin speech recognition, a concatenated syllable matching (CSM) algorithm is first introduced in place of the conventional Viterbi search algorithm. Also, to utilise the available training material efficiently, a training procedure is proposed to re-estimate the SPM parameters using the maximum a posteriori (MAP) algorithm. A few special techniques integrating acoustic and linguistic knowledge are developed to further improve the performance step by step. Preliminary experimental results show that the final achievable rate is as high as 91.62%, an 18.48% error-rate reduction over, and more than three times the speed of, the well-studied subsyllable-based CHMM.
13.
Progress in dynamic programming search for LVCSR  Total citations: 2 (self: 0, others: 2)
Ney H. Ortmanns S. 《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》2000,88(8):1224-1240
Initially introduced in the late 1960s and early 1970s, dynamic programming algorithms have become increasingly popular in automatic speech recognition. There are two reasons why this has occurred. First, the dynamic programming strategy can be combined with a very efficient and practical pruning strategy, so that very large search spaces can be handled. Second, the dynamic programming strategy has turned out to be extremely flexible in adapting to new requirements. Examples of such requirements are the lexical tree organization of the pronunciation lexicon and the generation of a word graph instead of the single best sentence. We attempt to review the use of dynamic programming search strategies for large-vocabulary continuous speech recognition (LVCSR). The following methods are described in detail: searching using a lexical tree, language-model look-ahead, and word-graph generation.
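The dynamic-programming recursion that these search strategies build on can be sketched for a flat HMM, without the lexical-tree organization, pruning, or language-model look-ahead the article describes; this is a textbook Viterbi recursion, not the authors' search implementation:

```python
import numpy as np

def viterbi(log_trans, log_emit, log_init):
    """Dynamic-programming (Viterbi) search for the single best state sequence.
    log_trans[i, j] = log P(state j | state i); log_emit[t, j] = log P(obs_t | state j)."""
    T, N = log_emit.shape
    score = log_init + log_emit[0]            # best log score ending in each state
    back = np.zeros((T, N), dtype=int)        # backpointers for path recovery
    for t in range(1, T):
        cand = score[:, None] + log_trans     # cand[i, j]: extend best path in i to j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    # Trace back the best path from the highest-scoring final state.
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(score.max())
```

Beam pruning would discard low-scoring entries of `score` at each time step, and word-graph generation would retain multiple backpointers per state instead of one.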
14.
Quan Zhou Ning Jiang Englehart K. Hudgins B. 《IEEE transactions on bio-medical engineering》2009,56(8):2016-2023
This paper introduces an enhanced phoneme-based myoelectric signal (MES) speech recognition system. The system can recognize new words without retraining the phoneme classifier, which is considered to be the main advantage of phoneme-based speech recognition. It is shown that previous systems experience severe performance degradation when new words are added to a testing dataset. To maintain high accuracy with new words, several improvements are proposed. In the proposed MES speech recognition approach, the raw MES is processed by class-specific rotation matrices to spatially decorrelate the data prior to feature extraction in a preprocessing stage. Then, an uncorrelated linear discriminant analysis is used for dimensionality reduction. The resulting data are classified through a hidden Markov model classifier to obtain the phonemic log likelihoods of the phonemes, which are mapped to corresponding words using a word classifier. An average word classification accuracy of 98.533% is achieved over six subjects. The system offers dramatically improved accuracy when expanding a vocabulary, offering promise for robust large-vocabulary myoelectric speech recognition.
15.
Steve Young 《Signal Processing Magazine, IEEE》1996,13(5):45
Considerable progress has been made in speech-recognition technology, and nowhere has this progress been more evident than in the area of large-vocabulary recognition (LVR). Laboratory systems are capable of transcribing continuous speech from any speaker with average word-error rates between 5% and 10%. If speaker adaptation is allowed, then after 2 or 3 minutes of speech, the error rate will drop well below 5% for most speakers. LVR systems had been limited to dictation applications, since the systems were speaker dependent and required words to be spoken with a short pause between them. However, the capability to recognize natural continuous-speech input from any speaker opens up many more applications. This article discusses the principles and architecture of LVR systems and identifies the key issues affecting their future deployment. To illustrate the various points raised, the Cambridge University HTK system is described. This system is a modern design that gives state-of-the-art performance, and it is typical of the current generation of recognition systems.
16.
《Signal Processing Magazine, IEEE》2005,22(5):70-80
This article has described LSM (latent semantic mapping), a data-driven framework for modeling globally meaningful relationships implicit in large volumes of data. LSM generalizes a paradigm originally developed to capture hidden word patterns in a text document corpus. Over the past decade, this paradigm has proven effective in an increasing variety of fields, gradually spreading from query-based information retrieval to word clustering, document/topic clustering, large-vocabulary speech recognition language modeling, automated call routing, semantic inference for spoken interface control, and several other speech processing applications.
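At its core, this paradigm maps words and documents into a shared low-rank latent space via a truncated SVD of a co-occurrence matrix. A minimal sketch under that assumption (function names are illustrative, and real systems apply entropy or tf-idf weighting before the SVD):

```python
import numpy as np

def lsm_space(counts, rank=2):
    """Project a word-by-document co-occurrence matrix into a rank-R latent
    space via truncated SVD, as in latent semantic mapping/analysis."""
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    word_vecs = U[:, :rank] * s[:rank]   # one latent vector per word (row)
    doc_vecs = Vt[:rank].T * s[:rank]    # one latent vector per document (column)
    return word_vecs, doc_vecs

def cosine(a, b):
    """Closeness in the latent space, the usual similarity measure in LSM."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

Words that co-occur in similar documents end up close in the latent space even if they never co-occur directly, which is what makes the mapping useful for clustering and language modeling.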
17.
Recent advances in the automatic recognition of audiovisual speech  Total citations: 11 (self: 0, others: 11)
Potamianos G. Neti C. Gravier G. Garg A. Senior A.W. 《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》2003,91(9):1306-1326
Visual speech information from the speaker's mouth region has been successfully shown to improve noise robustness of automatic speech recognizers, thus promising to extend their usability in the human-computer interface. In this paper, we review the main components of audiovisual automatic speech recognition (ASR) and present novel contributions in two main areas: first, the visual front-end design, based on a cascade of linear image transforms of an appropriate video region of interest, and subsequently, audiovisual speech integration. On the latter topic, we discuss new work on feature and decision fusion combination, the modeling of audiovisual speech asynchrony, and incorporating modality reliability estimates into the bimodal recognition process. We also briefly touch upon the issue of audiovisual adaptation. We apply our algorithms to three multisubject bimodal databases, ranging from small- to large-vocabulary recognition tasks, recorded in both visually controlled and challenging environments. Our experiments demonstrate that the visual modality improves ASR over all conditions and data considered, though less so for visually challenging environments and large-vocabulary tasks.
18.
The authors describe an architecture and search organization for continuous speech recognition. The recognition module is part of SPICOS, the Siemens-Philips-IPO project on continuous speech recognition and understanding, a system for understanding database queries spoken in natural language. The goal of this project is a man-machine dialogue system that is able to understand fluently spoken German sentences and thus to provide voice access to a database. The recognition strategy is based on Bayes decision rule and attempts to find the best interpretation of the input speech data in terms of knowledge sources such as a language model, pronunciation lexicon, and inventory of subword units. The implementation of the search has been tested on a continuous speech database comprising up to 4000 words for each of several speakers. The efficiency and robustness of the search organization have been checked and evaluated along many dimensions, such as different speakers, phoneme models, and language models.
19.
Deng L. Kenny P. Lennig M. Gupta V. Seitz F. Mermelstein P. 《Signal Processing, IEEE Transactions on》1991,39(7):1677-1681
The authors demonstrate the effectiveness of phonemic hidden Markov models with Gaussian mixture output densities (mixture HMMs) for speaker-dependent large-vocabulary word recognition. Speech recognition experiments show that for almost any reasonable amount of training data, recognizers using mixture HMMs consistently outperform those employing unimodal Gaussian HMMs. With a sufficiently large training set (e.g. more than 2500 words), use of HMMs with 25-component mixture distributions typically reduces recognition errors by about 40%. It is also found that the mixture HMMs outperform a set of unimodal generalized triphone models having the same number of parameters. Previous attempts to employ mixture HMMs for speech recognition proved discouraging because of the high complexity and computational cost of implementing the Baum-Welch training algorithm. It is shown how mixture HMMs can be implemented very simply in unimodal transition-based frameworks by allowing multiple transitions from one state to another.
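The Gaussian mixture output density attached to each state (or transition) scores a feature frame as a weighted sum of Gaussians, normally evaluated in the log domain. A minimal diagonal-covariance sketch (the function name and the log-sum-exp formulation are illustrative, not taken from the paper):

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of frame x under a diagonal-covariance Gaussian mixture:
    log sum_k w_k N(x; mu_k, diag(var_k))."""
    x, w = np.asarray(x, float), np.asarray(weights, float)
    mu, var = np.asarray(means, float), np.asarray(variances, float)
    # Per-component log-density of a diagonal Gaussian, one row per component.
    log_comp = -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var).sum(axis=1)
    # Weighted combination via log-sum-exp for numerical stability.
    a = np.log(w) + log_comp
    m = a.max()
    return float(m + np.log(np.exp(a - m).sum()))
```

A unimodal Gaussian HMM is the special case of a single component; the paper's trick of realizing mixtures as multiple parallel transitions leaves this per-component density computation unchanged.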
20.
Li Y Chen X Zhang X Wang K Wang ZJ 《IEEE transactions on bio-medical engineering》2012,59(10):2695-2704
Identification of the constituent components of each sign gesture can be beneficial to the improved performance of sign language recognition (SLR), especially for large-vocabulary SLR systems. Aiming at developing such a system using portable accelerometer (ACC) and surface electromyographic (sEMG) sensors, we propose a framework for automatic Chinese SLR at the component level. In the proposed framework, data segmentation, as an important preprocessing operation, is performed to divide a continuous sign language sentence into subword segments. Based on the features extracted from ACC and sEMG data, three basic components of sign subwords, namely the hand shape, orientation, and movement, are further modeled and the corresponding component classifiers are learned. At the decision level, a sequence of subwords can be recognized by fusing the likelihoods at the component level. The overall classification accuracies of 96.5% for a vocabulary of 120 signs and 86.7% for 200 sentences demonstrate the feasibility of interpreting sign components from ACC and sEMG data and clearly show the superior recognition performance of the proposed method when compared with the previous SLR method at the subword level. The proposed method seems promising for implementing large-vocabulary portable SLR systems.
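The decision-level fusion step can be sketched as a weighted sum of the three component classifiers' log-likelihoods over candidate subwords; the uniform weighting and the function name are assumptions for illustration, not the paper's exact fusion rule:

```python
import numpy as np

def fuse_components(loglik_shape, loglik_orient, loglik_move, weights=(1.0, 1.0, 1.0)):
    """Combine per-subword log-likelihoods from the hand-shape, orientation,
    and movement classifiers by weighted summation, then pick the best subword."""
    fused = (weights[0] * np.asarray(loglik_shape)
             + weights[1] * np.asarray(loglik_orient)
             + weights[2] * np.asarray(loglik_move))
    return int(fused.argmax()), fused
```

A sentence would then be decoded by applying this fusion to each segmented subword in turn.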