1.

Improving discriminability among acoustically similar words by modified distance metric

Kim H.S. Lin C.-K. 《Electronics letters》1988,24(3):161-163

In a template-matching-based speech recognition system, excessive weight given to perceptually unimportant spectral variations is undesirable for discriminating among acoustically similar words. By introducing a simple threshold-type nonlinearity applied to the distance metric, the word recognition performance can be improved for a vocabulary with similar-sounding words, without modifying the system structure.
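
As a rough illustration of the idea in this abstract, the sketch below applies a threshold-type nonlinearity to a per-coefficient spectral distance, so that small differences contribute nothing to the total. The abstract does not specify the exact nonlinearity, so the function name, the city-block form of the distance, and the threshold value are all assumptions made for illustration.

```python
def thresholded_distance(frame_a, frame_b, threshold=0.5):
    """City-block distance that ignores per-coefficient differences below `threshold`.

    Hypothetical form of a threshold-type nonlinearity; not taken from the paper.
    """
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of coefficients")
    total = 0.0
    for a, b in zip(frame_a, frame_b):
        diff = abs(a - b)
        # Differences below the threshold are assumed to be perceptually unimportant.
        if diff >= threshold:
            total += diff
    return total


# Two frames that differ slightly in three coefficients and strongly in one.
reference = [1.0, 2.0, 3.0, 4.0]
observed = [1.1, 2.2, 3.1, 6.0]
print(thresholded_distance(reference, observed))  # prints 2.0
```

With the threshold set to zero the function reduces to an ordinary city-block distance, which makes it easy to compare recognition results with and without the nonlinearity.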

2.

Sub‐word Based Offline Handwritten Farsi Word Recognition Using Recurrent Neural Network

Mohammad Fazel Younessy Ghadikolaie Ehsanolah Kabir Farbod Razzazi 《ETRI Journal》2016,38(4):703-713

3.

Adaptive Channel Normalization Based on Infomax Algorithm for Robust Speech Recognition

Ho‐Young Jung 《ETRI Journal》2007,29(3):300-304

This paper proposes a new data‐driven method for high‐pass approaches, which suppresses slow‐varying noise components. Conventional high‐pass approaches are based on the idea of decorrelating the feature vector sequence and aim for adaptability to various conditions. The proposed method is based on temporal local decorrelation using the information‐maximization theory for each utterance. This is performed on an utterance‐by‐utterance basis, which provides an adaptive channel normalization filter for each condition. The performance of the proposed method is evaluated by isolated‐word recognition experiments with channel distortion. Experimental results show that the proposed method yields outstanding improvement for channel‐distorted speech recognition.

4.

Cursive handwriting recognition using hidden Markov models and a lexicon-driven level building algorithm

Procter S. Illingworth J. Mokhtarian F. 《Vision, Image and Signal Processing, IEE Proceedings -》2000,147(4):332-339

5.

A single-chip self-contained speech recognizer

《Solid-State Circuits, IEEE Journal of》1983,18(3):344-349

A fully integrated speech recognition LSI has been developed. The speech recognition LSI can recognize a speaker-dependent vocabulary of about 200 isolated words with high accuracy in real time, using several memories, which are a phoneme template memory, word dictionary memory, and work memory. This LSI is designed to perform the total speech recognition processing, including the endpoint detection of the input utterance in a self-contained manner. With the pipelined structure of the function blocks, highly efficient parallel operations are achieved. Furthermore, satisfactory testability is assured with a scan path technique. The speech recognition LSI is fabricated with 2 μm E/D NMOS process technology, employing two aluminium interconnection layers and a high resistivity poly-Si layer.

6.

Event Recognition Based on Deep Belief Networks  Total citations: 2 (self-citations: 0, other citations: 2)

张亚军 刘宗田 周文 《电子学报》2017,45(6):1415

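Event recognition is an important foundation of information extraction. To overcome the shortcomings of existing event recognition methods, this paper proposes an event recognition model based on deep learning. First, candidate words are obtained with a word segmentation system and divided into five types. Next, six recognition features are selected, and corresponding feature representation rules are defined to convert words into vector samples. Finally, a deep belief network extracts deep semantic information from the words, and a back-propagation (BP) neural network identifies the events. Experiments show that the model reaches a maximum F-score of 85.17%. The paper also proposes a hybrid-supervised deep belief network that combines unsupervised and supervised learning; this network improves recognition (F-score of 89.2%) while keeping the training time under control (an increase of 27.50%).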

7.

An Audio Retrieval Method Based on an Inverted Index

张雪源* 贺前华 李艳雄 叶婉玲 《电子与信息学报》2012,34(11):2561-2567

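Traditional example-based audio retrieval algorithms use a sequential index, so each query must traverse the whole database, which leads to intolerable waiting times. To address this limitation, the paper proposes an audio retrieval algorithm based on an inverted index. The method first uses a supervector composed of multiple audio features, together with a multi-level audio segmentation method, to split a continuous audio stream into short audio segments whose feature values fluctuate little. A pre-trained audio dictionary then converts the sequence of short segments into a sequence of audio words that characterize the audio content, and an inverted index is built over these words. At retrieval time, the user's query is converted into audio words, and the inverted index locates candidate passages directly without traversing the database. The candidates are ranked by their content similarity to the query, and the ranked list is returned as the retrieval result. Simulation experiments use four evaluation criteria: rank of the matching item, proportion of same-class results, localization accuracy, and retrieval time. The results show that the algorithm achieves a retrieval accuracy of 92.58% in an average of 1.101 s.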
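
The retrieval step described in this abstract can be sketched roughly as follows: an inverted index maps each audio word to the passages that contain it, so a query only touches passages that share at least one word with it. The integer audio words, the passage names, and the overlap-based similarity are hypothetical stand-ins; the paper's segmentation, dictionary training, and similarity measure are not reproduced here.

```python
from collections import Counter, defaultdict


def build_inverted_index(passages):
    """Map each audio word to the set of passage ids that contain it."""
    index = defaultdict(set)
    for passage_id, words in passages.items():
        for word in words:
            index[word].add(passage_id)
    return index


def search(index, passages, query_words):
    """Return candidate passage ids ranked by a simple overlap similarity."""
    # Only passages sharing at least one audio word with the query are visited.
    candidates = set()
    for word in query_words:
        candidates |= index.get(word, set())

    query_counts = Counter(query_words)

    def similarity(passage_id):
        # Fraction of query words (with multiplicity) found in the passage.
        passage_counts = Counter(passages[passage_id])
        shared = sum((query_counts & passage_counts).values())
        return shared / max(len(query_words), 1)

    return sorted(candidates, key=lambda pid: (-similarity(pid), pid))


# Hypothetical audio-word sequences (integers stand in for dictionary entries).
passages = {
    "clip_a": [3, 7, 7, 12],
    "clip_b": [5, 9, 21],
    "clip_c": [7, 12, 12, 30],
}
index = build_inverted_index(passages)
print(search(index, passages, [7, 12, 12]))  # prints ['clip_c', 'clip_a']
```

Note that `clip_b` is never scored because it shares no audio word with the query, which is the point of replacing a sequential scan with an index lookup.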

8.

Modified Mean and Variance Normalization: Transforming to Utterance-Specific Estimates

Vikas Joshi N. Vishnu Prasad S. Umesh 《Circuits, Systems, and Signal Processing》2016,35(5):1593-1609

Cepstral mean and variance normalization (CMVN) is an efficient noise compensation technique popularly used in many speech applications. CMVN eliminates the mismatch between training and test utterances by transforming them to zero mean and unit variance. In this work, we argue that some amount of useful information is lost during normalization as every utterance is forced to have the same first- and second-order statistics, i.e., zero mean and unit variance. We propose to modify the CMVN methodology to retain the useful information and yet compensate for noise. The proposed normalization approach transforms every test utterance to an utterance-specific clean mean (i.e., the utterance mean if the noise were absent) and clean variance, instead of zero mean and unit variance. We derive expressions to estimate the clean mean and variance from a noisy utterance. The proposed normalization is effective in recognizing voice commands that are typically short (single words or short phrases), where more advanced methods [such as histogram equalization (HEQ)] are not effective. Recognition results show a relative improvement (RI) of 21% in word error rate over conventional CMVN on the Aurora-2 database and an RI of 20% and 11% over CMVN and HEQ, respectively, on short utterances of the Aurora-2 database.
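
To make the contrast in this abstract concrete, the sketch below shows conventional per-utterance CMVN next to a variant that maps the utterance onto given target statistics instead of zero mean and unit variance. The function names and the toy data are assumptions; in particular, the paper's expressions for estimating the clean mean and variance are not reproduced, so the targets are simply passed in as arguments.

```python
import math


def cmvn(frames, eps=1e-8):
    """Normalize each cepstral dimension of one utterance to zero mean, unit variance."""
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / num_frames)
        for d in range(dim)
    ]
    # eps guards against division by zero for constant dimensions.
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)]
        for f in frames
    ]


def normalize_to_targets(frames, target_means, target_stds, eps=1e-8):
    """Map an utterance onto given target statistics (hypothetical helper).

    The targets would be the estimated clean mean and standard deviation;
    how to estimate them is outside the scope of this sketch.
    """
    normalized = cmvn(frames, eps)
    return [
        [x * s + m for x, m, s in zip(frame, target_means, target_stds)]
        for frame in normalized
    ]


# Toy utterance: 3 frames of 2 cepstral coefficients each.
utterance = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
normalized = cmvn(utterance)
shifted = normalize_to_targets(
    utterance, target_means=[0.5, -1.0], target_stds=[2.0, 1.0]
)
```

Passing zero target means and unit target standard deviations to `normalize_to_targets` gives the same result as plain `cmvn`, so the second function can be read as a generalization of the first.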

9.

Lexical Paraphrasing Based on an Automatically Constructed Corpus

赵世奇 刘挺 李生 《电子学报》2009,37(5):975-980

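This paper proposes a new method for lexical paraphrasing. The method first uses a translation engine to automatically convert a bilingual parallel corpus into a monolingual parallel corpus, which serves as a paraphrase corpus for extracting candidate paraphrases. On this basis, the paper proposes a new statistical model that selects the most suitable paraphrase for a target word according to its specific context. Experimental results show that the automatically constructed paraphrase corpus is effective for extracting lexical paraphrases. The proposed model also clearly outperforms two conventional models, improving precision and recall by about 10% each.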

10.

A speaker-independent connected digit recognition system concatenating statistically discriminated words

Ukita T. Saito E. Nitta T. Watanabe S. 《Signal Processing, IEEE Transactions on》1992,40(10):2414-2424

A recognition system for connected digits, which uses a statistical classifier to identify words in speaker-independent continuous speech, is described. The system uses the multiple similarity method, a statistical pattern recognition technique. For evaluating word strings, the system uses a scoring method that is independent of the number of words in the strings. It is derived from the a posteriori probability that a subinterval corresponds to a correct word position, giving a word similarity value. The system evaluates a word string using dynamic programming and a parallel search procedure. Experiments for the contextual effect of the training data set, for validation of the search algorithm, and for a large quantity of unspecified speakers including 40 males and 40 females were performed. For connected digits (unknown word lengths test), the string recognition rates were 90.1%-95.1% for two, three, or four connected digits, where the equivalent word (digit) rates were 97.4%-98.4%.

11.

Automatic word recognition

Clapper G. L. 《Spectrum, IEEE》1971,8(8):57-69

Separately spoken individual words can be automatically recognized using a two-dimensional pattern of spectral density versus a nonlinear time base. The pattern for a given word differs from person to person and must be adaptively learned by the machine for each speaker. Simple circuitry is described that learns a word with a single utterance and recognizes it thereafter. The scheme is potentially economical for the spoken equivalent of key entry of data and data inquiry, and a limited vocabulary of commands to the equipment.

12.

Semantic Query Expansion Based on Word Sense Disambiguation

罗俊丽 李慧娜 路凯 《微电子学与计算机》2012,29(1):71-75

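To address the low precision of traditional query expansion, this paper proposes a semantic query expansion method, based on word sense disambiguation, that combines an expanded semantic tree with term co-occurrence frequency. To counter the query topic drift caused by ambiguous query terms, the method disambiguates query terms using the WordNet knowledge source and its domain information, then generates an expanded semantic tree from the WordNet hierarchy to produce candidate expansion terms. The final expansion terms and their weights are determined by the principle of maximum overall relevance between the candidate terms and the user query, so that the expansion terms fully express the user's request and improve query matching accuracy. Experiments show that the method achieves high precision while maintaining recall.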

13.

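Speaker-Dependent Medium-Vocabulary Continuous Speech Recognition Based on HMM/VQ  Total citations: 2 (self-citations: 2, other citations: 0)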

林道发 罗万伯 《电子学报》1992,20(7):59-65

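This paper discusses a continuous speech recognition method based on hidden Markov models (HMMs) and vector quantization (VQ). In this method, one HMM is built for each word, and the optimal state-transition path is searched in a state-transition network formed by combining multiple models, so that continuous speech is recognized without prior word segmentation. Using finite-state grammar constraints and several other measures that improve recognition performance, the demonstration system recognizes 18 English sentence patterns and 150 words for a specific speaker. In a test on 312 utterances (2,710 words in total), the recognition delay was 62% of the utterance duration, the average speaking rate was 2.32 words per second, and the word recognition accuracy was 97.3%.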

14.

Detecting the Presence of Speech Using ADPCM Coding

Schafer R. Jackson K. Dubnowski J. Rabiner L. 《Communications, IEEE Transactions on》1976,24(5):563-567

When speech is coded using a differential pulse-code modulation system with an adaptive quantizer, the digital code words exhibit considerable variation among all quantization levels during both voiced and unvoiced speech intervals. However, because of limits on the range of step sizes, during silent intervals the code words vary only slightly among the smallest quantization steps. Based on this principle, a simple algorithm for locating the beginning and end of a speech utterance has been developed. This algorithm has been tested in computer simulations and has been constructed with standard integrated circuit technology.
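
The principle stated in this abstract suggests a very simple endpoint detector, sketched below: a frame is treated as speech when enough of its code words leave the smallest quantization levels. This is only a guess at the general shape of such an algorithm; the frame length, the `quiet_level` and `min_active` parameters, and the representation of code words as magnitude indices are assumptions, not details from the paper.

```python
def find_endpoints(code_magnitudes, frame_len=4, quiet_level=1, min_active=2):
    """Return (start, end) sample indices of the detected utterance, or None.

    code_magnitudes: magnitude index of each ADPCM code word (0 = smallest level).
    A frame counts as speech when at least `min_active` of its code words
    exceed `quiet_level`.
    """
    speech_frames = []
    for start in range(0, len(code_magnitudes), frame_len):
        frame = code_magnitudes[start:start + frame_len]
        active = sum(1 for m in frame if m > quiet_level)
        if active >= min_active:
            speech_frames.append(start)
    if not speech_frames:
        return None
    first = speech_frames[0]
    # End index is exclusive and clipped to the signal length.
    last = min(speech_frames[-1] + frame_len, len(code_magnitudes))
    return first, last


# Hypothetical code-word magnitudes: silence, an utterance, then silence again.
codes = [0, 1, 0, 1, 0, 1, 1, 0, 3, 5, 2, 6, 7, 4, 3, 5, 1, 0, 1, 0]
print(find_endpoints(codes))  # prints (8, 16)
```

Because the decision uses only the code words themselves, a detector of this kind needs no access to the decoded waveform.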

15.

A Keyword Extraction Method for Science and Technology Project Applications

罗灏 徐小良 吕跃华 《电子科技》2013,26(7):7-10

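Keyword extraction is used in text similarity computation. Traditional keyword extraction methods ignore out-of-vocabulary words in the text and lack an understanding of word semantics. For science and technology project applications, this study proposes a keyword extraction method based on out-of-vocabulary word recognition and semantics. Word segmentation combines Lucene with statistical methods, and the out-of-vocabulary words that are identified become part of the application's keywords. A word semantic similarity network is then built according to social network theory, and word relatedness is computed to extract the remaining keywords. Experimental results show that, compared with traditional keyword extraction methods, the new method extracts more accurate keywords and gives better results in similarity checking of science and technology projects.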

16.

Use of a syntactic analyzer for continuous speech recognition

Patrice Quinton 《Annales des Télécommunications》1977,32(9-10):323-336

The author describes the syntactic analyzer used in the KEAL system for continuous speech recognition. After a lexical analyzer detects the words in an utterance, the syntactic analyzer builds all the possible syntactic structures according to a context-free grammar previously defined by means of a compiled metalanguage. In some cases, this analyzer can correct errors such as the omission or insertion of phonemes by the phonemic analyzer, or the non-detection of short words by the lexical analyzer. The program currently recognizes 65% of utterances in simple dialogs. A few seconds are enough to recognize a sentence.

17.

Improved Phoneme-Based Myoelectric Speech Recognition

Quan Zhou Ning Jiang Englehart K. Hudgins B. 《IEEE transactions on bio-medical engineering》2009,56(8):2016-2023

This paper introduces an enhanced phoneme-based myoelectric signal (MES) speech recognition system. The system can recognize new words without retraining the phoneme classifier, which is considered to be the main advantage of phoneme-based speech recognition. It is shown that previous systems experience severe performance degradation when new words are added to a testing dataset. To maintain high accuracy with new words, several improvements are proposed. In the proposed MES speech recognition approach, the raw MES is processed by class-specific rotation matrices to spatially decorrelate the data prior to feature extraction in a preprocessing stage. Then, an uncorrelated linear discriminant analysis is used for dimensionality reduction. The resulting data are classified through a hidden Markov model classifier to obtain the phonemic log likelihoods of the phonemes, which are mapped to corresponding words using a word classifier. An average word classification accuracy of 98.533% is achieved over six subjects. The system offers dramatically improved accuracy when expanding a vocabulary, offering promise for robust large-vocabulary myoelectric speech recognition.

18.

Addable Stress Speech Recognition with Multiplexing HMM: Training and Non-training Decision

Pakapong Amornkul Kosin Chamnongthai Punnarumol Temdee 《Wireless Personal Communications》2014,76(3):503-521

In stress speech recognition, a recognition model capable of processing multi-stress speech needs to be designed with both accuracy and addability in mind. This paper proposes addable stress speech recognition with a multiplexing hidden Markov model (HMM). To handle multi-stress speech, we propose a multiplexing topology that combines multiple stress speech models. Since each stress affects speech in a different way, a speech recognition model trained specifically to recognize words affected by that stress helps improve the recognition rate. However, since each stress speech model gives its own independent recognized word, an effective decision module is needed to choose the correct word. In each stress speech model, MFCC extraction is applied to the input speech. The result is fed into an HMM that is segmented into N parts. Each segment provides its own tentative recognized word, which in turn is an input to the proposed non-training decision module. Based on these tentative recognized words from the segments of all stress speech models, the final recognized word is decided using a coarse-to-fine concept performed by a majority vote, a segment-weighted difference square score, and a next-best score, respectively. Besides neutral speech, the proposed method was verified using three stresses: angry, loud, and Lombard. The results showed that the proposed method achieved a 94.7% recognition rate, compared with 94.2% for the training-based decision method.
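
The first, coarsest stage of the decision module described in this abstract is a majority vote over the tentative words. A minimal sketch of that stage alone is given below; the function name and the choice to return `None` on a tie are assumptions, and the finer stages (segment-weighted difference square score and next-best score) are not defined in the abstract and are therefore left out.

```python
from collections import Counter


def majority_vote(tentative_words):
    """Return the most frequent tentative word, or None when the vote is tied.

    A tie (or an empty input) signals that a finer criterion would be needed;
    those later stages are not implemented in this sketch.
    """
    if not tentative_words:
        return None
    ranked = Counter(tentative_words).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Hypothetical tentative words from the segments of several stress models.
votes = ["open", "open", "close", "open", "stop"]
print(majority_vote(votes))  # prints open
```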

19.

Speaker-independent isolated word recognition using multiple hidden Markov models

Zhang Y. Desilva C.J.S. Togneri A. Alder M. Attikiouzel Y. 《Vision, Image and Signal Processing, IEE Proceedings -》1994,141(3):197-202

A multi-HMM speaker-independent isolated word recognition system is described. In this system, three vector quantisation methods, the LBG algorithm, the EM algorithm, and a new MGC algorithm, are used for the classification of the speech space. These quantisations of the speech space are then used to produce three HMMs for each word in the vocabulary. In the recognition step, the Viterbi algorithm is used in the three subrecognisers. The log probabilities of the observation sequences matching the models are multiplied by the weights determined by the recognition accuracies of individual subrecognisers and summed to give the log probability that the utterance is of a particular word in the vocabulary. This multi-HMM system results in a reduction of about 50% in the error rate in comparison with the single model system.
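
The score combination described in this abstract, a weighted sum of per-subrecogniser log probabilities, can be sketched as follows. The Viterbi scoring itself is not shown; the log probabilities and weights below are made-up numbers, and the function name is hypothetical.

```python
def combine_subrecognisers(log_probs, weights):
    """Pick the word with the highest weighted sum of log probabilities.

    log_probs: one dict per subrecogniser, mapping word -> log probability.
    weights:   one weight per subrecogniser (e.g. derived from its accuracy).
    """
    if len(log_probs) != len(weights):
        raise ValueError("need exactly one weight per subrecogniser")
    vocabulary = list(log_probs[0])
    combined = {
        word: sum(w * scores[word] for w, scores in zip(weights, log_probs))
        for word in vocabulary
    }
    best_word = max(combined, key=combined.get)
    return best_word, combined


# Made-up Viterbi log probabilities from three subrecognisers (LBG, EM, MGC).
lbg = {"yes": -12.0, "no": -15.0}
em = {"yes": -14.0, "no": -13.0}
mgc = {"yes": -11.0, "no": -16.0}
best_word, combined = combine_subrecognisers([lbg, em, mgc], [0.3, 0.3, 0.4])
print(best_word)  # prints yes
```

In this toy case the EM-based subrecogniser alone would have chosen "no", and the weighted sum overrides it, which illustrates how the combination can correct an individual subrecogniser.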

20.

Phonetic encoding method in the isolated words recognition problem

A. V. Savchenko 《Journal of Communications Technology and Electronics》2014,59(4):310-315

A phonetic approach to the problem of automatic recognition of isolated words is investigated. The phonetic encoding method whereby each word from a vocabulary is associated with the code sequence of stable phonemes is proposed. The information-theoretical estimate of vocabulary confusability, the calculations of which rely on the phonetic database of a speaker and the communications channel SNR, is synthesized using the Kullback-Leibler divergence properties. In the experimental study of the proposed method, the mutual influence between the recognition quality and the proposed estimate of confusability is demonstrated by solving the problem of recognition of words in the Russian speech. It is established that the introduced requirement to isolated syllable pronunciation makes it possible to attain the 90–95% accuracy of recognition for vocabularies containing 2000 words.