Similar literature
 20 similar articles found (search time: 46 ms)
1.
A fully integrated speech recognition LSI has been developed. The speech recognition LSI can recognize a speaker-dependent vocabulary of about 200 isolated words with high accuracy in real time, using several memories: a phoneme template memory, a word dictionary memory, and a work memory. This LSI is designed to perform the total speech recognition processing, including the endpoint detection of the input utterance, in a self-contained manner. With the pipelined structure of the function blocks, highly efficient parallel operations are achieved. Furthermore, satisfactory testability is assured with a scan path technique. The speech recognition LSI is fabricated with 2 μm E/D NMOS process technology, employing two aluminium interconnection layers and a high resistivity poly-Si layer.

2.
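Speaker-dependent medium-vocabulary continuous speech recognition based on HMM/VQ   Total citations: 2 (self-citations: 2, citations by others: 0)
This paper discusses a continuous speech recognition method based on hidden Markov models (HMM) and vector quantization (VQ). In this method, an HMM is built for each word, and the state-transition network formed by combining multiple models is searched for the best state-transition path, so that continuous speech is recognized without prior word segmentation. Using finite-state grammar constraints and several other measures that improve recognition performance, the demonstration system recognizes 18 English sentence patterns and 150 words for a specific speaker. In a test on 312 utterances (2710 words in total), the recognition delay was 62% of the utterance duration, the average speaking rate was 2.32 words per second, and the word recognition accuracy was 97.3%.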
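The best-path search mentioned in this abstract can be illustrated with a generic Viterbi decoder over a discrete-observation HMM, where the observations stand for VQ codebook indices. This is a minimal sketch, not the paper's system: the two-state model, its probabilities, and the names `safe_log` and `viterbi` are assumptions made for illustration, and no finite-state grammar is modeled.

```python
import math


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def viterbi(obs, states, init, trans, emit):
    """Return (best_log_prob, best_state_path) for a discrete HMM.

    obs   : list of observation symbols (e.g. VQ codebook indices)
    init  : init[s] = probability of starting in state s
    trans : trans[p][s] = probability of moving from state p to state s
    emit  : emit[s][o] = probability of emitting symbol o in state s
    """
    # score[s] = best log-probability of any path that ends in state s
    score = {s: safe_log(init[s]) + safe_log(emit[s][obs[0]]) for s in states}
    backptrs = []
    for o in obs[1:]:
        new_score, ptr = {}, {}
        for s in states:
            # Best predecessor of s at this time step
            prev = max(states, key=lambda p: score[p] + safe_log(trans[p][s]))
            new_score[s] = score[prev] + safe_log(trans[prev][s]) + safe_log(emit[s][o])
            ptr[s] = prev
        backptrs.append(ptr)
        score = new_score
    # Trace the best path backwards from the best final state
    last = max(states, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], path


# Hypothetical two-state model; symbols 0 and 1 play the role of VQ codes.
states = ["a", "b"]
init = {"a": 0.6, "b": 0.4}
trans = {"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.1, "b": 0.9}}
emit = {"a": {0: 0.9, 1: 0.1}, "b": {0: 0.1, 1: 0.9}}
log_prob, path = viterbi([0, 0, 1, 1], states, init, trans, emit)
print(path)  # ['a', 'a', 'b', 'b']
```

In a connected-word recognizer the states would belong to word models linked into one network, so the decoded state path also yields the word boundaries.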

3.
A speech recognition processor CMOS LSI was developed as the processing element (PE) of a ring array processor previously proposed by the authors as an architecture to carry out highly parallel recognition processing with array size flexibility. The LSI has three key features: (1) a highly parallel I/O structure with a triple buffer and cyclical-mode transition control methods, which solves the serious problem of inter-PE data transfer overhead versus the array processing; (2) a control structure with two direct memory access (DMA) controllers to realize inter-PE data I/O processing and intra-PE processing in parallel; and (3) pipelined recognition processing at a high execution rate, realized by a pipelined structure and a balanced clock distribution design technique. These designs for the PE LSI allow high-speed recognition processing without any inter-PE data transfer overhead in the ring array processor. Combining the PE-LSI architecture with the proposed array architecture for highly parallel dynamic time warping (DTW) processing, a real-time continuous speech recognition system based on continuous dynamic programming matching using the SPLIT method for a 1000-word vocabulary can be constructed using a ring array processor consisting of 30 PEs.

4.
The authors develop a parallel structure for the time-delay neural network used in some speech recognition applications. The effectiveness of the design is illustrated by: (1) extracting a window computing model from the time-delay neural systems; (2) building its pipelined architecture with parallel or serial processing stages; and (3) applying this parallel window computing to some typical speech recognition systems. An analysis of the complexity of the proposed design shows a greatly reduced complexity while maintaining a high throughput rate.

5.
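Research on a speech recognition system based on a private branch exchange (PBX)   Total citations: 3 (self-citations: 0, citations by others: 3)
This paper develops a voice-controlled command switching system for a private branch exchange (PBX). The system performs speaker-independent automatic recognition of continuous spoken commands with a small-to-medium vocabulary. In the study, statistics were compiled on the command sentences in use, and a corresponding recognition grammar network was generated. The recognition system is given intensive training with composite models built from subword models, and recognition uses an improved token-passing Viterbi algorithm to raise recognition performance. The paper compares the effects of different speech feature parameters and of the number of hidden Markov model states on telephone speech recognition accuracy. A rejection subsystem was also developed for the recognizer; without rejection, …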

6.
The authors describe an architecture and search organization for continuous speech recognition. The recognition module is part of the Siemens-Philips-Ipo project on continuous speech recognition and understanding (SPICOS) system for the understanding of database queries spoken in natural language. The goal of this project is a man-machine dialogue system that is able to understand fluently spoken German sentences and thus to provide voice access to a database. The recognition strategy is based on Bayes decision rule and attempts to find the best interpretation of the input speech data in terms of knowledge sources such as a language model, pronunciation lexicon, and inventory of subword units. The implementation of the search has been tested on a continuous speech database comprising up to 4000 words for each of several speakers. The efficiency and robustness of the search organization have been checked and evaluated along many dimensions, such as different speakers, phoneme models, and language models.

7.
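Starting from the distribution of speech state dwell (duration) lengths, this paper establishes an inhomogeneous hidden Markov model for speech recognition. The model is closer to the physical reality of speech signals, and the time and space complexity of its training and recognition is greatly improved over the classical HMM. The paper describes the training and recognition algorithms of the new model and presents a real-time recognition and understanding system, designed on this model, for the full table of isolated Chinese characters.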

8.
Hidden control neural networks (HCN networks) are suitable for a variety of pattern recognition techniques. The speech recognizer described here is built for speaker-independent single-word recognition and is intended to implement user interfaces that control devices via simple word commands. To evaluate the speech recognizer, it has been applied to minimal pairs, in which two words differ in only a single phoneme. The recognition rate was increased by giving particular weight to the time periods found to contain the relevant difference.

9.
We have developed a memory access reduced VLSI chip for 5,000-word speaker-independent continuous speech recognition. This chip employs a context-dependent HMM (hidden Markov model) based speech recognition algorithm, and contains parallel and pipelined hardware units for emission probability computation and Viterbi beam search. To maximize the performance, we adopted several memory access reduction techniques such as sub-vector clustering and multi-block processing for the emission probability computation. We also employed a custom DRAM controller for efficient access of consecutive data. Moreover, we analyzed the access pattern of data to minimize the internal SRAM size while maintaining high performance. The experimental results show that the implemented system performs speech recognition 2.4 and 1.8 times faster than real-time utilizing 32-bit DDR SDRAM and SDR SDRAM, respectively.

10.
This paper introduces an enhanced phoneme-based myoelectric signal (MES) speech recognition system. The system can recognize new words without retraining the phoneme classifier, which is considered to be the main advantage of phoneme-based speech recognition. It is shown that previous systems experience severe performance degradation when new words are added to a testing dataset. To maintain high accuracy with new words, several improvements are proposed. In the proposed MES speech recognition approach, the raw MES is processed by class-specific rotation matrices to spatially decorrelate the data prior to feature extraction in a preprocessing stage. Then, an uncorrelated linear discriminant analysis is used for dimensionality reduction. The resulting data are classified through a hidden Markov model classifier to obtain the phonemic log likelihoods of the phonemes, which are mapped to corresponding words using a word classifier. An average word classification accuracy of 98.533% is achieved over six subjects. The system offers dramatically improved accuracy when expanding a vocabulary, offering promise for robust large-vocabulary myoelectric speech recognition.

11.
This paper presents results on whispered speech recognition using gammatone filterbank cepstral coefficients in speaker-dependent mode. The isolated words used for this experiment are taken from the Whi-Spe database. Whispered speech recognition is based on dynamic time warping and hidden Markov model methods. The experiments focus on the following modes: normal speech, whispered speech, and their combinations (normal/whispered and whispered/normal). The results demonstrate an important improvement in recognition after application of cepstral mean subtraction, especially in mixed train/test scenarios.
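Cepstral mean subtraction, which this abstract credits with the improvement, removes the per-coefficient mean over all frames of an utterance, which cancels any constant offset in the cepstral domain. The following is a minimal sketch of that operation only; the function name and the toy two-coefficient frames are assumptions, and the paper's gammatone front end is not reproduced.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean, taken over all frames of one utterance.

    frames: list of equal-length lists, one cepstral vector per frame.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    # Mean of each cepstral coefficient across time
    means = [sum(f[k] for f in frames) / n for k in range(dim)]
    return [[f[k] - means[k] for k in range(dim)] for f in frames]


# Toy utterance with two frames of two coefficients each (made-up values).
clean = [[1.0, 2.0], [3.0, 6.0]]
# The same utterance with a constant cepstral offset of 10 on every coefficient.
shifted = [[11.0, 12.0], [13.0, 16.0]]
print(cepstral_mean_subtraction(clean))    # [[-1.0, -2.0], [1.0, 2.0]]
print(cepstral_mean_subtraction(shifted))  # [[-1.0, -2.0], [1.0, 2.0]]
```

Both inputs normalize to the same vectors, which shows how a constant offset in the cepstral domain disappears after subtraction.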

12.
Talk to the machine   Total citations: 2 (self-citations: 0, citations by others: 2)
Spectrum, IEEE, 2002, 39(9): 60-64
With better chips and faster algorithms, device makers are putting voice interfaces in PDAs, cellphones, and cars. Philips has streamlined its standard speech recognition engine to run on the Compaq 3600 PDA. This Mandarin language recognizer prototype can distinguish 40 000 words. The basics of today's speech recognizers were first worked out in the early 1970s by researchers at IBM Corp. and Carnegie Mellon University. Since then, assorted companies and university groups have made incremental advances in the science and technology.

13.
A recognition system for connected digits, which uses a statistical classifier to identify words in speaker-independent continuous speech, is described. The system uses the multiple similarity method, a statistical pattern recognition technique. For evaluating word strings, the system uses a scoring method that is independent of the number of words in the strings. It is derived from the a posteriori probability that a subinterval corresponds to a correct word position, giving a word similarity value. The system evaluates a word string using dynamic programming and a parallel search procedure. Experiments for the contextual effect of the training data set, for validation of the search algorithm, and for a large quantity of unspecified speakers including 40 males and 40 females were performed. For connected digits (unknown word lengths test), the string recognition rates were 90.1%-95.1% for two, three, or four connected digits, where the equivalent word (digit) rates were 97.4%-98.4%.

14.
In stress speech recognition, a recognition model capable of processing multi-stress speech needs to be designed from the viewpoints of accuracy and addability. This paper proposes addable stress speech recognition with a multiplexing hidden Markov model (HMM). To handle multi-stress speech, we propose a multiplexing topology that combines multiple stress speech models. Since each stress affects speech in a different way, having a speech recognition model that is specifically trained to recognize words affected by that stress helps improve the recognition rates. However, since each stress speech model gives its own independent recognized word, an effective decision module is needed to choose the correct word. In each stress speech model, MFCC extraction is applied to the input speech. The result is fed into an HMM that is segmented into N parts. Each part of the segmentation provides its own tentative recognized word, which in turn is an input to the proposed non-training decision module. Based on these tentative recognized words from the segments of all stress speech models, the final recognized word is decided using a coarse-to-fine concept performed by a majority vote, a segment-weighted difference square score, and a next best score, respectively. Besides neutral speech, the proposed method was verified using three stresses: angry, loud, and Lombard. The results showed that the proposed method achieved a 94.7% recognition rate, compared with 94.2% for the training-based decision method.

15.
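Recognition of printed English words in images is an important branch of image recognition. It recognizes the non-editable characters in a picture and converts them into editable text characters that can be reused and read aloud, making reading more convenient. The processing pipeline studied in this paper comprises image preprocessing, image feature extraction based on grid partitioning, training on image samples, image recognition based on the minimum Euclidean distance, and reading the text aloud. Using the minimum-Euclidean-distance algorithm, the system implements recognition and read-aloud of printed English words; in experiments the overall word recognition rate exceeded 90%, which largely meets the needs of practical application.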
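The two central steps named in this abstract, grid-based feature extraction and minimum-Euclidean-distance recognition, can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: the images are tiny binary arrays, the feature of each grid cell is assumed to be its fraction of foreground pixels, and the names `grid_features`, `classify`, and the template labels are hypothetical.

```python
import math


def grid_features(image, rows, cols):
    """Fraction of foreground (1) pixels in each cell of a rows x cols grid.

    image: list of equal-length lists of 0/1 pixel values.
    """
    h, w = len(image), len(image[0])
    feats = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            cell = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feats.append(sum(cell) / len(cell) if cell else 0.0)
    return feats


def classify(feats, templates):
    """Return the label whose template vector is nearest in Euclidean distance."""
    return min(templates, key=lambda label: math.dist(feats, templates[label]))


# "Training": one template per hypothetical class, taken from a clean sample.
left_bar = [[1, 1, 0, 0]] * 4
top_bar = [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
templates = {
    "left": grid_features(left_bar, 2, 2),
    "top": grid_features(top_bar, 2, 2),
}

# A noisy left bar: one pixel missing, one spurious pixel.
noisy = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 1]]
print(classify(grid_features(noisy, 2, 2), templates))  # left
```

A real system would average the features of many training samples per class and segment the word image into characters first; both steps are omitted here.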

16.
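Chen Weihong. Modern Electronics Technique (《现代电子技术》), 2006, 29(14): 44-45, 48
Three methods for speaker-dependent isolated-word speech recognition under background noise are studied: front-end acoustic processing of speech, spectral-transformation compensation based on canonical correlation analysis, and a same-modulus pole addition method (同模极点增加法). The experimental results show that all three methods effectively improve the speech recognition rate in noisy environments. The better method achieves a recognition rate above 80% in strong noise (signal-to-noise ratio of 0 dB), which shows good prospects for automatic speech recognition in noisy environments with a low signal-to-noise ratio.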

17.
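An ordered clustering method and its application to neural-network speech recognition   Total citations: 3 (self-citations: 1, citations by others: 2)
This paper proposes a new network structure, which we call the ordered clustering network. The network performs feature extraction on the speech signal and handles well the time-normalization problem in neural-network speech recognition. The ordered clustering network extracts a fixed number of feature vectors from the feature vector sequence of the input speech signal and then feeds this set of feature vectors into a neural network classifier for recognition. Compared with other neural-network speech recognition methods, using this network for front-end processing shortens the training and recognition time of the back-end neural network classifier, simplifies the classifier's network … high recognition rate. Based on this … we built …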

18.
19.
Peckham, J. IEE Review, 1988, 34(10)
Research has been in progress now for over 40 years to develop machines that will recognise natural speech. The author discusses the problems of speech coding, recognition and synthesis. Research aimed at improving the analysis of the original speech and subsequent synthesis, and progress to date are discussed. The use of intelligent conversational dialogue and future developments are also discussed.

20.
By identifying lip movements and characterizing their associations with speech sounds, the performance of speech recognition systems can be improved, particularly when operating in noisy environments. In this paper, we present a geometrical-based automatic lip reading system that extracts the lip region from images using conventional techniques, but the contour itself is extracted using a novel application of a combination of border following and convex hull approaches. Classification is carried out using an enhanced dynamic time warping technique that has the ability to operate in multiple dimensions and a template probability technique that is able to compensate for differences in the way words are uttered in the training set. The performance of the new system has been assessed in recognition of the English digits 0 to 9 as available in the CUAVE database. The experimental results obtained from the new approach compared favorably with those of existing lip reading approaches, achieving a word recognition accuracy of up to 71% with the visual information being obtained from estimates of lip height, width and their ratio.
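As a point of reference for the classifier described in this abstract, textbook dynamic time warping extends to multi-dimensional features simply by using a vector distance between frames. The sketch below is plain DTW with a Euclidean frame distance, not the paper's enhanced variant or its template probability technique; the function name and the (height, width, ratio) toy sequences are assumptions.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Classic DTW cost between two sequences of equal-dimension feature vectors."""
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # seq_a advances alone
                                 cost[i][j - 1],      # seq_b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Hypothetical per-frame features: (lip height, lip width, height/width ratio).
template = [(1.0, 4.0, 0.25), (2.0, 4.0, 0.5), (1.0, 4.0, 0.25)]
slower = [(1.0, 4.0, 0.25), (2.0, 4.0, 0.5), (2.0, 4.0, 0.5), (1.0, 4.0, 0.25)]
other = [(1.0, 2.0, 0.5), (1.0, 2.0, 0.5), (1.0, 2.0, 0.5)]
print(dtw_distance(template, slower))                            # 0.0
print(dtw_distance(template, other) > dtw_distance(template, slower))  # True
```

The slower utterance of the same pattern aligns at zero cost, which is the property that makes DTW useful for comparing words spoken at different rates.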
