Similar articles
20 similar articles found (search time: 15 ms)
1.
The possibility of enhancing speech-recognition efficiency by using the supplemented-vocabulary method is studied. The minimum-information-mismatch criterion is proposed for selecting one, two, or, in the general case, several realizations of recognition words to be added to a working vocabulary. Using a particular practical example, it is shown that the positive effect is achieved without substantially enlarging the vocabulary or increasing the computational complexity.

2.
A compiler for constructing optimized syntactic digraphs from easily written grammar specifications is described. These are written in a language called grammar specification language (GSL). The compiler has a preprocessing (macroexpansion) phase, a parse phase, graph code generation and compilation phases, and three optimization phases. Digraphs can also be linked together by a graph linker to form larger digraphs. Language complexity is analyzed in a statistics phase. It is demonstrated that the optimization phase yields graphs with even greater efficiency than previously achieved by hand. Some preliminary speech recognition results of applying these techniques to intermediate and large graphs are discussed. With the introduction of these tools it is now possible to provide a speech recognition user with the ability to define new task grammars in the field. GSL has been used by several untutored users with good results. Experience with GSL indicates that it is a viable medium for quickly and accurately defining grammars for use in connected speech recognition systems.

3.
A dynamic programming processor with parallel and pipeline architecture is described. A 2-μm CMOS technology was applied to the DP processor, which is composed of 127309 transistors on a 7.17 × 8.62 mm² die and is housed in an 84-pin PLCC (plastic leaded chip carrier) or PGA (pin grid array) package. The clock frequency is 20 MHz, and the instruction cycle time is 100 ns. Precise electrical simulations permitted the safe use of nonstandard logic and area and power reduction. Implementation of a direct access to all internal registers has proven useful for chip test and software development. A system using one DP processor has given very good results on a wide variety of applications and 0.48% error rate on tests with standard NATO tapes. These results are significantly better than those published for other systems on the same tests.

4.
The authors outline the science behind speech recognition technology and describe briefly the contributions of engineering, computer science, and mathematics to it. They discuss the state of the art in both technique and performance, including some examples of successful applications. This is followed by a critical evaluation of the technology with respect to technical, commercial, and societal criteria. They conclude that even with today's suboptimum technology, there are types of applications in which useful deployment is possible and desirable but that those applications that will transform our society must wait until speech recognizers have nearly the capabilities of humans.

5.
In this paper, a low-power, low-voltage speech processing system is presented. The system is intended to be used in remote speech recognition applications where feature extraction is performed on the terminal and high-complexity recognition tasks are moved to a remote server accessed through a radio link. The proposed system is based on a CMOS feature extraction chip for speech recognition that computes 15 cepstrum parameters every 8 ms and dissipates 30 μW at a 0.9-V supply. Single-cell battery operation is achieved. Processing relies on a novel feature extraction algorithm using 1-bit A/D conversion of the input speech signal. The chip has been implemented as a gate array in a standard 0.5-μm, three-metal CMOS technology. The average energy required to process a single word of the TI46 speech corpus is 10 μJ. It achieves recognition rates over 98% in isolated-word speech recognition tasks.

6.
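To improve the recognition rate for spontaneous spoken language and the efficiency of speech decoding, a new spontaneous speech recognition scheme is proposed. Experimental results show that the scheme not only improves the speech recognition rate but also segments audio effectively and accurately, improves the decoding efficiency of the evaluation system, and is fairly robust.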

7.
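A differential composite sub-band speech recognition method under noise   Cited by: 4 (self-citations: 0, citations by others: 4)
Jiang Wenjian (蒋文建), Wei Gang (韦岗). 《通信学报》 (Journal on Communications), 2002, 23(1): 18-24
Based on the fact that sub-band features reflect the local characteristics of a speech signal while full-band features reflect its global characteristics, this paper proposes a new differential composite sub-band speech recognition method. Spectral differencing is first used to reduce noise interference; the recognition probabilities of the multiple sub-band features are then combined with the recognition probability of the full-band feature in a joint decision that gives the final recognition result. The new method is applied to recognition experiments on the ten English digits 0-9 and the E-set from the TIMIT data set under white noise and F16 fighter noise from NoiseX92. The experimental results show that the new method greatly improves recognition performance over the traditional method.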
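
The joint decision described in this abstract can be illustrated with a minimal sketch. The abstract does not give the combination rule, so the weighted average of log-probabilities used here, the function name `fuse_and_decide`, and the `weight` parameter are all assumptions made for illustration; the spectral differencing step is not covered.

```python
def fuse_and_decide(subband_logps, fullband_logps, weight=0.5):
    """Combine sub-band and full-band log-probabilities and pick the best word.

    subband_logps: list of dicts, one per sub-band, mapping word -> log-probability
    fullband_logps: dict mapping word -> log-probability
    weight: share given to the averaged sub-band score (assumed, not from the paper)
    """
    scores = {}
    for word, full_lp in fullband_logps.items():
        # Average the evidence of the individual sub-band recognizers.
        sub_avg = sum(band[word] for band in subband_logps) / len(subband_logps)
        # Weighted combination of local (sub-band) and global (full-band) evidence.
        scores[word] = weight * sub_avg + (1.0 - weight) * full_lp
    best = max(scores, key=scores.get)
    return best, scores


subbands = [{"one": -2.0, "nine": -1.0}, {"one": -1.0, "nine": -3.0}]
fullband = {"one": -1.0, "nine": -2.5}
best, scores = fuse_and_decide(subbands, fullband)
print(best, scores)
```

In this toy input the first sub-band alone would prefer "nine", while the fused score prefers "one", which is the kind of disagreement a joint decision is meant to resolve.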

8.
A wide variety of speech recognition distortion measures have been proposed and tested, including some especially effective ones. It is shown that there is a general framework, based on the concepts of information theory, linking most of these measures. The distortion measure between any two speech spectra can be defined in terms of the distortions between the associated probability distributions. This general framework defines three broad families of distortion measures for speech recognition and provides a consistent way of combining the energy and the spectral information of a phonetic event. In addition, the cepstral-domain representation for several distortion measures is derived, allowing comparison of these measures in a domain that also yields convenient equations for their practical implementation.

9.
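In intelligent human-machine interaction systems, emotion classification of speech signals is currently a hot research area and has been widely applied. This paper proposes a method based on feature extraction that exploits emotional cross-correlation with the aid of a support vector machine (SVM) classifier, and applies it to emotional speech recognition. The method is used to classify three kinds of emotional speech signals. The SVM classifier performs the classification using features extracted from the emotional cross-correlation in the emotional speech signals. This automatic classification method greatly improves the emotion recognition rate and reaches an accuracy of 95.04% when recognizing anger.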

10.
11.
Graphical model architectures for speech recognition   Cited by: 3 (self-citations: 0, citations by others: 3)
This article discusses the foundations of the use of graphical models for speech recognition as presented in J. R. Deller et al. (1993), X. D. Huang et al. (2001), F. Jelinek (1997), L. R. Rabiner and B.-H. Juang (1993), and S. Young et al. (1990), giving detailed accounts of some of the more successful cases. Our discussion employs dynamic Bayesian networks (DBNs) and a DBN extension using the Graphical Model Toolkit's (GMTK's) basic template, a dynamic graphical model representation that is more suitable for speech and language systems. While this article concentrates on speech recognition, it should be noted that many of the ideas presented here are also applicable to natural language processing and general time-series analysis.

12.
A multi-model approach for noisy speech recognition is proposed. This approach comprises an SVD-based preprocessing front-end and a multi-model HMM recognition structure. It can provide a high recognition rate over a large range of SNRs for speech recognition in wide-band additive noise.

13.
Lipovac V. Electronics Letters, 1989, 25(2): 90-92
The satisfactory estimation of speech autocorrelation by means of generalised zero-crossings indicates that they can be used for efficient feature extraction in speech recognition. In addition, high consistency between the Itakura-Saito distances, calculated before and after clipping, allowed for only a moderate degradation of the related recognition performance, which was compensated by including the excitation distortion into the distance measure.

14.
Stochastic correlation model for speech recognition   Cited by: 1 (self-citations: 0, citations by others: 1)
Ming J., Smith F.J. Electronics Letters, 1996, 32(11): 970-971
A stochastic model, drawn from the upper bound of the joint probability distributions, is suggested for modelling the spectral correlation in speech. Experiments on a speaker-independent E-set database show the effectiveness of this new modelling approach.

15.
Jung H.Y., Kim D.Y., Un C.K. Electronics Letters, 1996, 32(13): 1163-1164
The authors propose a frame decorrelation method to cope with background noise in speech recognition. Since noise is modelled as a stationary perturbation in most cases, it is effective to reduce slowly varying components. One example of using this principle is the highpass scheme. The proposed method has the same property as the highpass scheme. It transforms feature vector sequences into decorrelated sequences and enhances transition regions. Simulation results show that this method is effective for speech with significant noise, and works better than other highpass methods.
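
The highpass principle this abstract refers to can be sketched as a first-order filter applied along time to each feature dimension. This illustrates the generic highpass scheme only, not the authors' frame decorrelation method; the filter form, the `pole` value of 0.97, and the zero initial output are assumptions chosen for the example.

```python
def highpass_features(frames, pole=0.97):
    """Filter each feature dimension along time with y[t] = x[t] - x[t-1] + pole * y[t-1].

    frames: list of equal-length feature vectors, one per frame
    pole: filter pole (assumed value); the first output frame is set to zero
    """
    if not frames:
        return []
    dim = len(frames[0])
    out = [[0.0] * dim]
    for t in range(1, len(frames)):
        out.append([
            frames[t][d] - frames[t - 1][d] + pole * out[t - 1][d]
            for d in range(dim)
        ])
    return out


# A constant (stationary) offset is removed entirely; a step change is kept.
print(highpass_features([[1.0, 2.0]] * 4))
print(highpass_features([[0.0], [1.0], [1.0]]))
```

A stationary offset in the features, which is how the abstract models noise, produces zero output, while a transition between frames passes through and then decays.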

16.
A simple preprocessor structure is presented that uses external ROM or microprocessor control to implement the Hadamard transform efficiently in CCD devices.
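
For readers unfamiliar with the transform named in this abstract, the following is a minimal software sketch of the fast Walsh-Hadamard transform in natural (Hadamard) ordering. It says nothing about the CCD preprocessor structure itself; the function name `fwht` and the unnormalized output are choices made for the example.

```python
def fwht(values):
    """Return the unnormalized Walsh-Hadamard transform of a power-of-two-length list."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        # Butterfly stage: combine elements that are h apart using sums and differences.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


print(fwht([1, 0, 1, 0, 0, 1, 1, 0]))  # prints [4, 2, 0, -2, 0, 2, 0, 2]
```

The transform needs only additions and subtractions, and applying it twice returns the input scaled by the length.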

17.
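Research progress in speech emotion recognition   Cited by: 11 (self-citations: 0, citations by others: 11)
Emotion plays an important role in human perception, decision-making, and other processes. For a long time, research on emotional intelligence existed only in psychology and cognitive science. In recent years, with the development of artificial intelligence, the combination of emotional intelligence and computer technology has given rise to the research topic of affective computing, which will greatly promote the development of computer technology. Automatic emotion recognition is the first step toward affective computing. Speech, as the most important medium of human communication, carries rich emotional information. How to automatically recognize a speaker's emotional state from speech has attracted wide attention from researchers in many fields in recent years. Starting from several important issues in speech emotion recognition, including emotion theory and emotion classification, emotional speech databases, emotional features in speech, and speech emotion recognition algorithms, this paper reviews current research progress and discusses several key issues for future research.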

18.
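Model compensation techniques have been applied successfully to speech recognition in noisy environments. Popular model compensation techniques such as the Log-Add and Log-Normal PMC (parallel model combination) methods can usually provide only approximate compensation for dynamic feature parameters, so their recognition rates become very low at low signal-to-noise ratios. This paper uses the derivative of the static features to derive a new compensation method for dynamic model parameters. The new method can be combined with any known static model compensation algorithm to produce new noisy-speech models for recognition. Experiments show that applying the new algorithm yields a clearly higher recognition rate than using the original model compensation algorithm alone, while its complexity is only slightly higher than that of the original algorithm.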

19.
A high-slew integrator for switched-capacitor circuits   Cited by: 1 (self-citations: 0, citations by others: 1)
A new method for improving the slew rate of a switched-capacitor integrator is introduced. A booster circuit is used to measure the integrator input voltage and then inject a proportionate amount of charge at the integrator output. The boosted integrator significantly reduces the settling time due to amplifier slewing. In addition, the booster has no adverse effect on the noise and stability performance of the integrator. The booster stage increases the total static integrator power by 36% and the total die area by 22%.

20.
Traditional acoustic speech recognition accuracies have been shown to deteriorate in highly noisy environments. A secondary information source is exploited using surface myoelectric signals (MES) collected from facial articulatory muscles during speech. Words are classified at the phoneme level using a hidden Markov model (HMM) classifier. Acoustic and MES data were collected while the words "zero" through "nine" were spoken. An acoustic expert classified the 18 formative phonemes in low noise levels [signal-to-noise ratio (SNR) of 17.5 dB] with an accuracy of 99%, but deteriorated to approximately 38% under simulations with SNR approaching 0 dB. A fused acoustic-myoelectric multiexpert system, without knowledge of SNR, improved on acoustic classification results at all noise levels. A multiexpert system, incorporating SNR information, obtained accuracies of 99% at low noise levels while maintaining accuracies above 94% during low SNR (0 dB) simulations. Results improve on previous full word MES speech recognition accuracies by almost 10%.
