首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Most current attempts at automatic speech recognition are formulated in an artificial intelligence framework. In this paper we approach the problem from an information-theoretic point of view. We describe the overall structure of a linguistic statistical decoder (LSD) for the recognition of continuous speech. The input to the decoder is a string of phonetic symbols estimated by an acoustic processor (AP). For each phonetic string, the decoder finds the most likely input sentence. The decoder consists of four major subparts: 1) a statistical model of the language being recognized; 2) a phonemic dictionary and statistical phonological rules characterizing the speaker; 3) a phonetic matching algorithm that computes the similarity between phonetic strings, using the performance characteristics of the AP; 4) a word level search control. The details of each of the subparts and their interaction during the decoding process are discussed.  相似文献   

2.
就语音识别中所用到的语言模型进行了详细阐述,对语言模型中涉及到的N-gram模型进行了解析,以及对在训练语言模型过程中遇到的零概率问题相应的平滑处理方法进行了讲解。利用N-gram训练的语言模型运用到语音识别中,取得了相当好的效果。  相似文献   

3.
Dynamic programming search for continuous speech recognition   总被引:2,自引:0,他引:2  
The authors gives a unifying view of the dynamic programming approach to the search problem. They review the search problem from the statistical point-of-view and show how the search space results from the acoustic and language models required by the statistical approach. Starting from the baseline one-pass algorithm using a linear organization of the pronunciation lexicon, they have extended the baseline algorithm toward various dimensions. To handle a large vocabulary, they have shown how the search space can be structured in combination with a lexical prefix tree organization of the pronunciation lexicon. In addition, they have shown how this structure of the search space can be combined with a time-synchronous beam search concept and how the search space can be constructed dynamically during the recognition process. In particular, to increase the efficiency of the beam search concept, they have integrated the language model look-ahead into the pruning operation. To produce sentence alternatives rather than only the single best sentence, they have extended the search strategy to generate a word graph. Finally, they have reported experimental results on a 64 k-word task that demonstrate the efficiency of the various search concepts presented  相似文献   

4.
Continuous speech recognition by statistical methods   总被引:3,自引:0,他引:3  
Statistical methods useful in automatic recognition of continuous speech are described. They concern modeling of a speaker and of an acoustic processor, extraction of the models' statistical parameters and hypothesis search procedures and likelihood computations of linguistic decoding. Experimental results are presented that indicate the power of the methods.  相似文献   

5.
The authors describe an architecture and search organization for continuous speech recognition. The recognition module is part of the Siemens-Philips-Ipo project on continuous speech recognition and understanding (SPICOS) system for the understanding of database queries spoken in natural language. The goal of this project is a man-machine dialogue system that is able to understand fluently spoken German sentences and thus to provide voice access to a database. The recognition strategy is based on Bayes decision rule and attempts to find the best interpretation of the input speech data in terms of knowledge sources such as a language model, pronunciation lexicon, and inventory of subword units. The implementation of the search has been tested on a continuous speech database comprising up to 4000 words for each of several speakers. The efficiency and robustness of the search organization have been checked and evaluated along many dimensions, such as different speakers, phoneme models, and language models  相似文献   

6.
Neural networks for vector quantization of speech and images   总被引:6,自引:0,他引:6  
Using neural networks for vector quantization (VQ) is described. The authors show how a collection of neural units can be used efficiently for VQ encoding, with the units performing the bulk of the computation in parallel, and describe two unsupervised neural network learning algorithms for training the vector quantizer. A powerful feature of the new training algorithms is that the VQ codewords are determined in an adaptive manner, compared to the popular LBG training algorithm, which requires that all the training data be processed in a batch mode. The neural network approach allows for the possibility of training the vector quantizer online, thus adapting to the changing statistics of the input data. The authors compare the neural network VQ algorithms to the LBG algorithm for encoding a large database of speech signals and for encoding images  相似文献   

7.
8.
The past decade (1990-2000) has witnessed substantial advances in speech recognition technology, which when combined with the increase in computational power and storage capacity has resulted in a variety of commercial products already or soon to be on the market. The authors review the state of the art in core technology, large vocabulary continuous speech recognition, with a view toward highlighting recent advances. We then highlight issues in moving toward applications, discussing system efficiency, portability across languages and tasks, and enhancing the system output by adding tags and nonlinguistic information. Current performance in speech recognition and outstanding challenges for three classes of applications (dictation, audio indexation, and spoken language dialogue systems), are discussed  相似文献   

9.
In this paper, a Spatiotemporal Probabilistic Neural Network (SPNN) is proposed for spatiotemporal pattern recognition. This new model is developed by applying the concept of Gaussian density function to the network structure of the SPR (Spatiotemporal Pattern Recognition). The main advantages of this model include faster training and recalling process for patterns. In addition, the overall architecture is also simple, modular, regular, locally connected, and suitable for VLSI implementation. One set of independent speaker isolated (Mandarin digit) speech database is used as an example to demonstrate the superiority of the neural networks for spatiotemporal pattern recognition. The testing result with a reduced error rate of 7% shows that the SPNN is very attractive and effective for practical applications. p ]The CMOS current-mode IC technology is used to implement the SPNN to achieve the objective of minimum classification error in a more direct manner. In this design, neural computation is performed in analog circuits while template information is stored in digital circuits. The prototyping speech recognition processor for the 12th LPC calculation is designed by 1.2μm CMOS technology. The HSPICE simulation results are also presented, which verifies the function of the designed neural system.  相似文献   

10.
This paper describes a method for recognizing Chinese tones in continuous speech. The first and second order differentials of the fundamental frequency logarithmically converted are used as feature parameters. A left-to-right hidden Markov modeling with five states, each of which is modeled by a single Gaussian distribution, expresses each of Chinese tones. Non-voiced portions are coded by random values normally distributed to uniformly deal with all the time frames in an utterance. Speaker dependent tone recognition was conducted for ten speakers. The average rate of 81.8% was obtained for these speakers.  相似文献   

11.
详细介绍一种基于神经网络的自学习非特定人语音识别方法,首次介绍一种语音识别知识的自动检验方法——LVV法,给出系统原理图和知识库的自动完善原理;介绍一种LEA判别法,实现梯度牛顿有效结合神经网络快速学习方法,并给出了实验结果。  相似文献   

12.
The evaluation of the degree of speech impairment and the utility of computer recognition of impaired speech are separately and independently performed. Particular attention is paid to the question concerning whether or not there is a relationship between naive listeners' subjective judgments of impaired speech and the performance of a laboratory version of a speech recognition system. It is a difficult task to relate a speech impairment rating with speech recognition accuracy. Towards this end, a statistical causal model is proposed. This model is very appealing in its structure to support inference, and thus can be applied to perform various assessments such as the success of automatic recognition of dysarthric speech. The application of this model is illustrated with a case study of a dysarthric speaker compared against a normal speaker serving as a control  相似文献   

13.
Point process model keyword spotting (KWS) system has attracted considerable attentions in the areas of keyword spotting by its capacity that can generalize from a relatively small numbers of training examples. But unfortunately, the accuracy level of the point process model is not comparable with the state‐of‐the‐art KWS systems because of the poor modeling capacity of the phoneme detector, which are based on Gaussian Mixture Models. In this paper, focus on improving the performance of detector in point process model, we propose an enhanced version of point process model, which is based on hierarchical deep belief networks (DBNs). Hierarchical DBNs are used as the phoneme detector in this system, and they combine the advantages of both the DBN and the hierarchical architecture for capturing complex statistical patterns in speech while overcoming the inherent flaws of conventional hidden Markov models and multilayer layer perceptron. Experiments results on TIMIT database show that the proposed method can yield 2% improvement. Furthermore, in the case when training examples are extremely limited, it can achieve better results over state‐of‐the‐art KWS systems. Copyright © 2013 John Wiley & Sons, Ltd.  相似文献   

14.
Fallside  F. Brooks  S. 《Electronics letters》1976,12(20):515-516
As an alternative to the spectrograph technique for speech analysis, an areagraph technique is presented in which the instantaneous vocal-tract area function (derived from linear prediction analysis) is plotted against time with distance along the tract as the y-ordinate and area denoted by intensity modulation. Since the display is related to a physical quantity, it has a number of advantages over the spectrograph. An application to speech training is described.  相似文献   

15.
A wide variety of speech recognition distortion measures have been proposed and tested, including some especially effective ones. It is shown that there is a general framework, based on the concepts of information theory, linking most of these measures. The distortion measure between any two speech spectra can be defined in terms of the distortions between the associated probability distributions. This general framework defines three broad families of distortion measures for speech recognition and provides a consistent way of combining the energy and the spectral information of a phonetic event. In addition, the cepstral-domain representation for several distortion measures is derived, allowing comparison of these measures in a domain that also yields convenient equations for their practical implementation  相似文献   

16.
在智能人-机交互系统中,语音信号的情感分类是目前热点的研究领域,并且得到了广泛的应用.本文提出一种基于特征提取和借助支持向量机(support vector machine,SVM)分类器(classifier)的情感互相关性的方法,并应用于情感语音识别.利用这种方法对3种情感语音信号进行情感分类.SVM分类器是利用情感语音信号中情感互相关性的特征提取进行分类的.这种通过 SVM 分类器的情感互相关性的自动分类方法,可以将情感识别率大幅提高,并且在识别愤怒情感时的准确率可以达到95.04%.  相似文献   

17.
The possibility of enhancing speech-recognition efficiency by using the supplemented-vocabulary method is studied. The minimum-information-mismatch criterion is proposed for selecting one; two; or, in a general case, several realizations of recognition words to be added to a working vocabulary. By use a particular practical example, it is shown that the positive effect achieved does not substantially weight the vocabulary and enhance the computational complexity.  相似文献   

18.
Graphical model architectures for speech recognition   总被引:3,自引:0,他引:3  
This article discusses the foundations of the use of graphical models for speech recognition as presented in J. R. Deller et al. (1993), X. D. Huang et al. (2001), F. Jelinek (19970, L. R. Rabiner and B. -H. Juang (1993) and S. Young et al. (1990) giving detailed accounts of some of the more successful cases. Our discussion employs dynamic Bayesian networks (DBNs) and a DBN extension using the Graphical Model Toolkit's (GMTK's) basic template, a dynamic graphical model representation that is more suitable for speech and language systems. While this article concentrates on speech recognition, it should be noted that many of the ideas presented here are also applicable to natural language processing and general time-series analysis.  相似文献   

19.
20.
A multi-model approach for noisy speech recognition is proposed. This approach comprised an SVD-based preprocessing front-end and a multi-model HMM recognition structure. It can provide a high recognition rate over a large range of SNRs for speech recognition in wide-band additive noise  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号