1.

Design of a linguistic statistical decoder for the recognition of continuous speech

《IEEE transactions on information theory / Professional Technical Group on Information Theory》1975,21(3):250-256

Most current attempts at automatic speech recognition are formulated in an artificial intelligence framework. In this paper we approach the problem from an information-theoretic point of view. We describe the overall structure of a linguistic statistical decoder (LSD) for the recognition of continuous speech. The input to the decoder is a string of phonetic symbols estimated by an acoustic processor (AP). For each phonetic string, the decoder finds the most likely input sentence. The decoder consists of four major subparts: 1) a statistical model of the language being recognized; 2) a phonemic dictionary and statistical phonological rules characterizing the speaker; 3) a phonetic matching algorithm that computes the similarity between phonetic strings, using the performance characteristics of the AP; 4) a word level search control. The details of each of the subparts and their interaction during the decoding process are discussed.
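The decoding rule described above (choose the sentence that maximizes the language-model probability times the probability of the observed phonetic string) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's decoder: the names `phone_match_logprob` and `decode`, the toy lexicon, and the substitution/insertion/deletion log-probabilities are all hypothetical, and an exhaustive loop over a short candidate list stands in for the word-level search control.

```python
import math

def phone_match_logprob(ref, hyp, sub_lp, ins_lp, del_lp):
    """Best-alignment log-probability that the acoustic processor emits `hyp`
    when the true phone string is `ref` (edit-distance style DP).
    sub_lp(a, b) is a hypothetical log P(AP outputs b | true phone a)."""
    n, m = len(ref), len(hyp)
    d = [[float("-inf")] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # correct match or substitution
                d[i][j] = max(d[i][j], d[i - 1][j - 1] + sub_lp(ref[i - 1], hyp[j - 1]))
            if i > 0:  # reference phone deleted by the AP
                d[i][j] = max(d[i][j], d[i - 1][j] + del_lp)
            if j > 0:  # spurious phone inserted by the AP
                d[i][j] = max(d[i][j], d[i][j - 1] + ins_lp)
    return d[n][m]

def decode(hyp, candidates, lexicon, lm_logprob, sub_lp,
           ins_lp=math.log(0.05), del_lp=math.log(0.05)):
    """Return the candidate word sequence W maximizing log P(W) + log P(hyp | W)."""
    def score(words):
        ref = [p for w in words for p in lexicon[w]]  # concatenate pronunciations
        return lm_logprob(words) + phone_match_logprob(ref, hyp, sub_lp, ins_lp, del_lp)
    return max(candidates, key=score)

# Toy usage with a uniform language model (log P(W) = 0 for every W)
lexicon = {"the": ["DH", "AH"], "cat": ["K", "AE", "T"], "cap": ["K", "AE", "P"]}
sub = lambda a, b: math.log(0.8) if a == b else math.log(0.01)
print(decode(["DH", "AH", "K", "AE", "T"], [("the", "cap"), ("the", "cat")],
             lexicon, lambda w: 0.0, sub))  # ('the', 'cat')
```

Working in log-probabilities keeps long phone strings from underflowing, and taking the best alignment (a max rather than a sum over alignments) is a simplifying choice for the sketch.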

2.

A Study of Statistical Language Models in Speech Recognition

《Information Technology》2017,(1)

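The language models used in speech recognition are described in detail: the N-gram model they rely on is analyzed, and the smoothing methods that address the zero-probability problem encountered when training a language model are explained. Applying an N-gram-trained language model to speech recognition yielded quite good results.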
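As a concrete illustration of the zero-probability problem and its smoothing, here is a minimal sketch of a bigram model with add-one (Laplace) smoothing. The article covers smoothing methods in general; Laplace smoothing is chosen here only because it is the simplest, and the function name `train_bigram_laplace` and the toy corpus are hypothetical. The vocabulary is assumed closed (training words plus the end marker `</s>`).

```python
from collections import Counter

def train_bigram_laplace(sentences):
    """Train a bigram model with add-one (Laplace) smoothing.
    Returns (prob, vocab) where prob(prev, word) = P(word | prev)."""
    history, pairs, vocab = Counter(), Counter(), set()
    for words in sentences:
        toks = ["<s>"] + list(words) + ["</s>"]
        history.update(toks[:-1])               # how often each token serves as a history
        pairs.update(zip(toks[:-1], toks[1:]))  # bigram counts
        vocab.update(toks[1:])                  # every token that can be predicted
    V = len(vocab)

    def prob(prev, word):
        # Adding 1 to every bigram count gives unseen pairs a non-zero probability
        return (pairs[(prev, word)] + 1) / (history[prev] + V)

    return prob, vocab

prob, vocab = train_bigram_laplace([["the", "cat"], ["the", "dog"]])
print(prob("the", "cat"))  # (1 + 1) / (2 + 4)
print(prob("cat", "dog"))  # unseen bigram: (0 + 1) / (1 + 4)
```

Because the successors of each history sum to its history count, the smoothed probabilities over the closed vocabulary still sum to one.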

3.

Dynamic programming search for continuous speech recognition (total citations: 2; self-citations: 0; citations by others: 2)

《Signal Processing Magazine, IEEE》1999,16(5):64-83

The authors give a unifying view of the dynamic programming approach to the search problem. They review the search problem from the statistical point of view and show how the search space results from the acoustic and language models required by the statistical approach. Starting from the baseline one-pass algorithm using a linear organization of the pronunciation lexicon, they extend the baseline algorithm along various dimensions. To handle a large vocabulary, they show how the search space can be structured in combination with a lexical prefix tree organization of the pronunciation lexicon. In addition, they show how this structure of the search space can be combined with a time-synchronous beam search concept and how the search space can be constructed dynamically during the recognition process. In particular, to increase the efficiency of the beam search, they integrate language model look-ahead into the pruning operation. To produce sentence alternatives rather than only the single best sentence, they extend the search strategy to generate a word graph. Finally, they report experimental results on a 64k-word task that demonstrate the efficiency of the various search concepts presented.
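The core of time-synchronous beam search can be sketched at the HMM-state level: all hypotheses advance one frame at a time, paths reaching the same state are recombined (Viterbi), and hypotheses scoring too far below the frame's best are pruned. This minimal sketch assumes a small explicit state space; it does not reproduce the authors' lexical prefix tree, language model look-ahead, or word-graph generation, and the name `beam_viterbi` and its parameters are hypothetical.

```python
import math

def beam_viterbi(obs_logprob, trans, init, beam):
    """Time-synchronous Viterbi search with beam pruning.
    obs_logprob[t][s]: log p(x_t | state s)
    trans[s]: dict {next_state: log transition probability}
    init: dict {state: log initial probability}
    beam: hypotheses more than `beam` below the frame's best score are pruned.
    Returns (best state path, its log score)."""
    active = {s: (lp + obs_logprob[0][s], [s]) for s, lp in init.items()}
    for t in range(1, len(obs_logprob)):
        best = max(score for score, _ in active.values())
        active = {s: h for s, h in active.items() if h[0] >= best - beam}  # pruning
        new = {}
        for s, (score, path) in active.items():
            for s2, lp in trans[s].items():
                cand = score + lp + obs_logprob[t][s2]
                if s2 not in new or cand > new[s2][0]:  # Viterbi recombination
                    new[s2] = (cand, path + [s2])
        active = new
    score, path = max(active.values(), key=lambda h: h[0])
    return path, score

log = math.log
trans = {0: {0: log(0.5), 1: log(0.5)}, 1: {1: 0.0}}  # 2-state left-to-right model
obs = [{0: log(0.9), 1: log(0.1)},
       {0: log(0.9), 1: log(0.1)},
       {0: log(0.1), 1: log(0.9)}]
path, score = beam_viterbi(obs, trans, init={0: 0.0}, beam=10.0)
print(path)  # [0, 0, 1]
```

Copying whole paths keeps the sketch short; a practical decoder would store back-pointers instead. A smaller `beam` trades accuracy for speed.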

4.

Continuous speech recognition by statistical methods (total citations: 3; self-citations: 0; citations by others: 3)

《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》1976,64(4):532-556

Statistical methods useful in automatic recognition of continuous speech are described. They concern modeling of a speaker and of an acoustic processor, extraction of the models' statistical parameters and hypothesis search procedures and likelihood computations of linguistic decoding. Experimental results are presented that indicate the power of the methods.

5.

Data driven search organization for continuous speech recognition

Ney H., Mergel D., Noll A., Paeseler A. 《Signal Processing, IEEE Transactions on》1992,40(2):272-281

The authors describe an architecture and search organization for continuous speech recognition. The recognition module is part of the Siemens-Philips-Ipo project on continuous speech recognition and understanding (SPICOS) system for the understanding of database queries spoken in natural language. The goal of this project is a man-machine dialogue system that is able to understand fluently spoken German sentences and thus to provide voice access to a database. The recognition strategy is based on Bayes decision rule and attempts to find the best interpretation of the input speech data in terms of knowledge sources such as a language model, pronunciation lexicon, and inventory of subword units. The implementation of the search has been tested on a continuous speech database comprising up to 4000 words for each of several speakers. The efficiency and robustness of the search organization have been checked and evaluated along many dimensions, such as different speakers, phoneme models, and language models.

6.

Neural networks for vector quantization of speech and images (total citations: 6; self-citations: 0; citations by others: 6)

Krishnamurthy A.K., Ahalt S.C., Melton D.E., Chen P. 《Selected Areas in Communications, IEEE Journal on》1990,8(8):1449-1457

Using neural networks for vector quantization (VQ) is described. The authors show how a collection of neural units can be used efficiently for VQ encoding, with the units performing the bulk of the computation in parallel, and describe two unsupervised neural network learning algorithms for training the vector quantizer. A powerful feature of the new training algorithms is that the VQ codewords are determined adaptively, in contrast to the popular LBG training algorithm, which requires all the training data to be processed in batch mode. The neural network approach makes it possible to train the vector quantizer online, thus adapting to the changing statistics of the input data. The authors compare the neural network VQ algorithms with the LBG algorithm for encoding a large database of speech signals and for encoding images.
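The contrast between online (neural) and batch (LBG) codebook training can be illustrated with plain winner-take-all competitive learning, where each input vector immediately nudges only its nearest codeword. This is a minimal sketch, not either of the paper's two training algorithms; the names `train_competitive_vq`, `encode`, and `sqdist`, the learning rate, and the caller-supplied initial codebook are assumptions for illustration.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def encode(x, codebook):
    """VQ encoding: index of the nearest codeword."""
    return min(range(len(codebook)), key=lambda i: sqdist(x, codebook[i]))

def train_competitive_vq(data, init_codebook, lr=0.1, epochs=20):
    """Online winner-take-all competitive learning: after each input vector,
    only the nearest codeword moves a fraction `lr` of the way toward it."""
    codebook = [list(c) for c in init_codebook]
    for _ in range(epochs):
        for x in data:
            w = encode(x, codebook)
            codebook[w] = [c + lr * (xi - c) for c, xi in zip(codebook[w], x)]
    return codebook

data = [(0, 0), (0.1, 0), (0, 0.1), (10, 10), (10.1, 10), (10, 10.1)]
cb = train_competitive_vq(data, init_codebook=[(1, 1), (9, 9)])
print([encode(x, cb) for x in data])  # [0, 0, 0, 1, 1, 1]
```

Since each update uses one vector at a time, the same loop can keep running on a live stream. With plain winner-take-all, a poorly initialized codeword may never win and stay unused, which is why the initialization here is supplied explicitly.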

7.

Applications of neural networks to speech recognition

Morgan N., Franco H. 《Signal Processing Magazine, IEEE》1997,14(6):46-48

8.

Large-vocabulary continuous speech recognition: advances and applications

Gauvain J.-L., Lamel L. 《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》2000,88(8):1181-1200

The past decade (1990-2000) has witnessed substantial advances in speech recognition technology, which, combined with the increase in computational power and storage capacity, has resulted in a variety of commercial products already on the market or soon to be. The authors review the state of the art in the core technology, large-vocabulary continuous speech recognition, with a view toward highlighting recent advances. They then highlight issues in moving toward applications, discussing system efficiency, portability across languages and tasks, and enhancing the system output by adding tags and nonlinguistic information. Current performance in speech recognition and outstanding challenges for three classes of applications (dictation, audio indexation, and spoken language dialogue systems) are discussed.

9.

CMOS current-mode implementation of spatiotemporal probabilistic neural networks for speech recognition

Chung-Yu Wu, Ron-Yi Liu 《The Journal of VLSI Signal Processing》1995,10(1):67-84

In this paper, a Spatiotemporal Probabilistic Neural Network (SPNN) is proposed for spatiotemporal pattern recognition. The model is developed by applying the concept of the Gaussian density function to the network structure of SPR (Spatiotemporal Pattern Recognition). Its main advantages are faster training and faster recall of patterns. In addition, the overall architecture is simple, modular, regular, and locally connected, and is therefore suitable for VLSI implementation. A speaker-independent, isolated-word (Mandarin digit) speech database is used as an example to demonstrate the superiority of the network for spatiotemporal pattern recognition. The test results, showing a reduced error rate of 7%, indicate that the SPNN is attractive and effective for practical applications. CMOS current-mode IC technology is used to implement the SPNN to achieve the objective of minimum classification error in a more direct manner. In this design, neural computation is performed in analog circuits while template information is stored in digital circuits. The prototype speech recognition processor for 12th-order LPC computation is designed in 1.2 μm CMOS technology. HSPICE simulation results are also presented, which verify the function of the designed neural system.
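The Gaussian-density idea behind a probabilistic neural network can be shown with a static Parzen-window classifier: every stored template contributes a Gaussian kernel response, and the class with the largest average response wins. This sketch covers only that static classification step; it does not model the spatiotemporal (SPR) structure or the analog/digital circuit split, and the name `pnn_classify`, the smoothing width `sigma`, and the toy templates are assumptions.

```python
import math

def pnn_classify(x, templates, sigma=1.0):
    """Probabilistic neural network (Parzen-window) classifier.
    templates: dict {label: list of stored feature vectors}.
    Each class is scored by the mean Gaussian kernel response of its templates;
    the label with the largest estimated density is returned."""
    def kernel(t):
        d2 = sum((xi - ti) ** 2 for xi, ti in zip(x, t))
        return math.exp(-d2 / (2.0 * sigma ** 2))
    scores = {label: sum(kernel(t) for t in ts) / len(ts) for label, ts in templates.items()}
    return max(scores, key=scores.get)

templates = {"zero": [[0.0, 0.0], [0.0, 1.0]], "one": [[5.0, 5.0], [6.0, 5.0]]}
print(pnn_classify([0.5, 0.5], templates))  # zero
print(pnn_classify([5.5, 5.0], templates))  # one
```

Training amounts to storing templates, which is consistent with the fast training the abstract claims for Gaussian-density networks; the cost moves to recall, where every template's kernel must be evaluated.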

10.

HMM based recognition of Chinese tones in continuous speech

Zhao Li 《Journal of Electronics (China)》2000,17(1):9-14

This paper describes a method for recognizing Chinese tones in continuous speech. The first- and second-order differentials of the logarithm of the fundamental frequency are used as feature parameters. Each Chinese tone is represented by a left-to-right hidden Markov model with five states, each modeled by a single Gaussian distribution. Unvoiced portions are coded with normally distributed random values so that all time frames in an utterance can be handled uniformly. Speaker-dependent tone recognition was conducted for ten speakers, and an average recognition rate of 81.8% was obtained.
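The feature extraction described above can be sketched as follows: take the log of F0 on voiced frames, compute first and second differences, and give unvoiced frames normally distributed random values so that every frame carries a feature vector. The abstract does not specify the difference formula or the random-value distribution, so this sketch assumes centered differences computed within each voiced run (clamped at run edges) and N(0, `sd`²) noise; the names `centered_diff` and `tone_features` and the parameters `sd` and `seed` are hypothetical.

```python
import math
import random

def centered_diff(x):
    """Centered first difference with edge clamping."""
    n = len(x)
    return [(x[min(t + 1, n - 1)] - x[max(t - 1, 0)]) / 2 for t in range(n)]

def tone_features(f0, sd=0.01, seed=0):
    """f0: per-frame fundamental frequency in Hz, 0 (or less) for unvoiced frames.
    Returns one (delta log F0, delta-delta log F0) tuple per frame."""
    rng = random.Random(seed)
    feats = [None] * len(f0)
    t = 0
    while t < len(f0):
        if f0[t] <= 0:
            # Unvoiced frame: assumed N(0, sd^2) random values
            feats[t] = (rng.gauss(0.0, sd), rng.gauss(0.0, sd))
            t += 1
            continue
        end = t
        while end < len(f0) and f0[end] > 0:  # find the end of this voiced run
            end += 1
        logf = [math.log(f) for f in f0[t:end]]
        d = centered_diff(logf)
        dd = centered_diff(d)
        feats[t:end] = list(zip(d, dd))
        t = end
    return feats

feats = tone_features([100, 110, 121, 0, 0, 200, 200])
print(feats[5])  # (0.0, 0.0): flat F0 gives zero slope
```

Using the log of F0 makes the differentials reflect relative pitch changes, so a 10% rise gives the same delta regardless of the speaker's pitch range.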

11.

Research on Self-Learning Speaker-Independent Speech Recognition Based on Neural Networks

Xu Xiuping, Li Zhufeng 《Audio Engineering》2004,(6):30-32

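A self-learning, speaker-independent speech recognition method based on neural networks is described in detail. An automatic verification method for speech recognition knowledge, the LVV method, is introduced for the first time, and the system block diagram and the principle of automatic refinement of the knowledge base are given. An LEA discrimination method is also introduced, which achieves fast neural network learning by effectively combining the gradient and Newton methods. Experimental results are presented.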

12.

A statistical causal model for the assessment of dysarthric speech and the utility of computer-based speech recognition

Sy B.K., Horowitz D.M. 《IEEE transactions on bio-medical engineering》1993,40(12):1282-1298

The evaluation of the degree of speech impairment and the utility of computer recognition of impaired speech are performed separately and independently. Particular attention is paid to whether there is a relationship between naive listeners' subjective judgments of impaired speech and the performance of a laboratory version of a speech recognition system. Relating a speech impairment rating to speech recognition accuracy is difficult. To this end, a statistical causal model is proposed. The model's structure naturally supports inference, so it can be applied to various assessments, such as assessing the success of automatic recognition of dysarthric speech. The application of this model is illustrated with a case study of a dysarthric speaker compared against a normal speaker serving as a control.

13.

Hierarchical deep belief networks based point process model for keywords spotting in continuous speech

Yi Wang, Jun-an Yang, Jun Lu, Hui Liu, Lun-wu Wang 《International Journal of Communication Systems》2015,28(3):483-496

Point process model keyword spotting (KWS) systems have attracted considerable attention because they can generalize from a relatively small number of training examples. Unfortunately, the accuracy of the point process model is not comparable with that of state-of-the-art KWS systems because of the poor modeling capacity of its phoneme detector, which is based on Gaussian mixture models. Focusing on improving the detector, this paper proposes an enhanced point process model based on hierarchical deep belief networks (DBNs). Hierarchical DBNs serve as the phoneme detector in this system; they combine the advantages of the DBN and of the hierarchical architecture for capturing complex statistical patterns in speech, while overcoming the inherent flaws of conventional hidden Markov models and multilayer perceptrons. Experimental results on the TIMIT database show that the proposed method yields a 2% improvement. Furthermore, when training examples are extremely limited, it achieves better results than state-of-the-art KWS systems. Copyright © 2013 John Wiley & Sons, Ltd.

14.

Real-time areagraph of continuous speech for analysis and speech training

Fallside F., Brooks S. 《Electronics letters》1976,12(20):515-516

As an alternative to the spectrograph technique for speech analysis, an areagraph technique is presented in which the instantaneous vocal-tract area function (derived from linear prediction analysis) is plotted against time with distance along the tract as the y-ordinate and area denoted by intensity modulation. Since the display is related to a physical quantity, it has a number of advantages over the spectrograph. An application to speech training is described.
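One standard route from linear prediction to a vocal-tract area function is the Levinson-Durbin recursion, which yields reflection coefficients, followed by the lossless acoustic-tube relation between adjacent section areas. The sketch below assumes that route; it is not necessarily the exact procedure used for the areagraph. The names `levinson_durbin` and `area_function` and the reference area `first_area` are hypothetical, and since sign conventions for the reflection coefficients and the lips-to-glottis direction differ between texts, the areas should be read as relative values under one convention.

```python
def levinson_durbin(r, p):
    """Levinson-Durbin recursion on autocorrelation values r[0..p].
    Convention: A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    Returns (reflection coefficients k1..kp, predictor a[0..p], final error)."""
    a = [1.0] + [0.0] * p
    err = r[0]
    ks = []
    for i in range(1, p + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        ks.append(k)
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k  # prediction error shrinks at each order
    return ks, a, err

def area_function(ks, first_area=1.0):
    """Relative tube-section areas using A[i+1] = A[i] * (1 - k[i]) / (1 + k[i])."""
    areas = [first_area]
    for k in ks:
        areas.append(areas[-1] * (1.0 - k) / (1.0 + k))
    return areas

ks, a, err = levinson_durbin([1.0, 0.5, 0.25], p=2)  # autocorrelation of an AR(1)-like signal
print(area_function(ks))  # [1.0, 3.0, 3.0]
```

Computing one such area vector per analysis frame and stacking the vectors over time gives the data an areagraph displays, with area mapped to intensity.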

15.

Information-theoretic distortion measures for speech recognition

Lee Y.-T. 《Signal Processing, IEEE Transactions on》1991,39(2):330-335

A wide variety of speech recognition distortion measures have been proposed and tested, including some especially effective ones. It is shown that there is a general framework, based on the concepts of information theory, linking most of these measures. The distortion measure between any two speech spectra can be defined in terms of the distortions between the associated probability distributions. This general framework defines three broad families of distortion measures for speech recognition and provides a consistent way of combining the energy and the spectral information of a phonetic event. In addition, the cepstral-domain representation for several distortion measures is derived, allowing comparison of these measures in a domain that also yields convenient equations for their practical implementation.
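To make the idea of treating spectra as probability distributions concrete, here is a minimal sketch of one such measure: normalize two power spectra so each sums to one, then take the (optionally symmetrized) Kullback-Leibler divergence. This is an illustrative example, not the paper's three families or its energy-combination scheme; normalization discards overall energy, which would have to be handled separately. The function name `spectral_kl` is hypothetical, and all spectral values are assumed positive.

```python
import math

def spectral_kl(p_spec, q_spec, symmetric=True):
    """KL-type distortion between two power spectra (lists of positive values
    over the same frequency bins), each normalized to a probability distribution."""
    sp, sq = sum(p_spec), sum(q_spec)
    p = [x / sp for x in p_spec]
    q = [x / sq for x in q_spec]
    kl_pq = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    if not symmetric:
        return kl_pq
    kl_qp = sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))
    return kl_pq + kl_qp

print(spectral_kl([1, 2, 3], [2, 4, 6]))  # 0.0: same shape, different energy
print(spectral_kl([1, 1], [1, 3]) > 0)    # True
```

The symmetric form is convenient as a distortion because it does not depend on which spectrum is the reference; the directed form is available through `symmetric=False`.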

16.

Emotion Recognition of Speech Signals Based on SVM

Qin Yuqiang, Zhang Xueying 《Journal of Circuits and Systems》2012,(5):55-59

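In intelligent human-machine interaction systems, emotion classification of speech signals is currently an active research area with wide application. This paper proposes a method based on feature extraction and emotion cross-correlation using a support vector machine (SVM) classifier, and applies it to emotional speech recognition. The method is used to classify speech signals into three emotions. The SVM classifier performs classification using cross-correlation features of emotion extracted from the emotional speech signals. This automatic classification method, based on emotion cross-correlation with an SVM classifier, substantially improves the emotion recognition rate, reaching an accuracy of 95.04% when recognizing anger.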

17.

A supplemented-vocabulary method for speech recognition

V. V. Savchenko, P. G. Lukin 《Journal of Communications Technology and Electronics》2006,51(2):192-196

The possibility of enhancing speech recognition efficiency with the supplemented-vocabulary method is studied. The minimum-information-mismatch criterion is proposed for selecting one, two, or, in general, several realizations of recognition words to be added to a working vocabulary. A practical example shows that the resulting improvement is achieved without substantially burdening the vocabulary or increasing the computational complexity.

18.

Graphical model architectures for speech recognition (total citations: 3; self-citations: 0; citations by others: 3)

《Signal Processing Magazine, IEEE》2005,22(5):89-100

This article discusses the foundations of the use of graphical models for speech recognition as presented in J. R. Deller et al. (1993), X. D. Huang et al. (2001), F. Jelinek (1997), L. R. Rabiner and B.-H. Juang (1993), and S. Young et al. (1990), giving detailed accounts of some of the more successful cases. Our discussion employs dynamic Bayesian networks (DBNs) and a DBN extension using the basic template of the Graphical Model Toolkit (GMTK), a dynamic graphical model representation that is more suitable for speech and language systems. While this article concentrates on speech recognition, many of the ideas presented here are also applicable to natural language processing and general time-series analysis.
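A discrete hidden Markov model is the simplest dynamic Bayesian network: each time slice has one hidden state variable and one observation variable, and inference over it reduces to the forward algorithm. The sketch below shows that baseline case only; it does not use GMTK or its template language, and the name `hmm_forward` and the toy parameters are assumptions.

```python
def hmm_forward(init, trans, emit, obs):
    """Forward algorithm for a discrete HMM viewed as a two-variable-per-slice DBN.
    init[s] = P(q_1 = s); trans[s][s2] = P(q_{t+1} = s2 | q_t = s);
    emit[s][o] = P(o | s). Returns P(obs), summed over all hidden state paths."""
    n = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        # alpha_t(s2) = sum_s alpha_{t-1}(s) * P(s2 | s) * P(o | s2)
        alpha = [sum(alpha[s] * trans[s][s2] for s in range(n)) * emit[s2][o]
                 for s2 in range(n)]
    return sum(alpha)

init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.5], [0.1, 0.9]]
print(hmm_forward(init, trans, emit, [0, 1]))  # approximately 0.2156
```

Richer DBNs add more variables per slice (for example word position or articulatory features), but inference still proceeds slice by slice in the same spirit.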

19.

Continuous speech recognition

Morgan N., Bourlard H. 《Signal Processing Magazine, IEEE》1995,12(3):24-42

20.

Multi-model approach for noisy speech recognition

Cun-Tai Guan, Shu-Hung Leung, Wing-Hong Lan 《Electronics letters》1998,34(1):30-32

A multi-model approach to noisy speech recognition is proposed. The approach comprises an SVD-based preprocessing front-end and a multi-model HMM recognition structure. It provides a high recognition rate over a wide range of SNRs for speech recognition in wide-band additive noise.