1.

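An SDTS-Based HMM Training Algorithm  Total citations: 7, self-citations: 0, citations by others: 7

Wang Xinmin, Yao Tianren 《信号处理》 (Signal Processing) 2003,19(1):40-43

Training the HMMs of a speech recognition system with the conventional Baum-Welch (BW) algorithm requires a large amount of speech data. Assuming that the subspace tying structure (SDTS) of the acoustic model system is known, this paper proposes a new training algorithm that effectively reduces the system's need for training data. Theoretical analysis and simulation show that, compared with the conventional BW algorithm, the new training algorithm (IBW) reduces the number of model parameters by a factor of 15 and therefore greatly reduces the training data required. Although the new algorithm relies on prior knowledge of the system, it still shows many advantages.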

2.

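Forward and Backward Hidden Markov Models and Their Application to Continuous Speech Recognition  Total citations: 1, self-citations: 0, citations by others: 1

Wang Renhua, Jiang Hui 《电子学报》 (Acta Electronica Sinica) 1996,(10)

Addressing the forward and backward dependencies that objectively exist in speech signals, this paper explicitly proposes using conditional probabilities to quantify these forward and backward Markov dependencies, proposes forward and backward hidden Markov models (HMMs) to describe them, and shows experimentally that speech recognition using only the backward dependency can also achieve quite considerable recognition performance. The paper then studies, for both isolated-word and continuous speech recognition tasks, how to exploit both kinds of dependency information at the same time, and proposes a new search algorithm for continuous speech recognition: forward-backward half-and-half hybrid search. This method uses the intermediate results of a forward Viterbi search based on the forward HMM and a backward Viterbi search based on the backward HMM to combine the forward and backward dependency information effectively. Experiments show that it consistently outperforms unidirectional search that uses either kind of dependency information alone.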

3.

Addable Stress Speech Recognition with Multiplexing HMM: Training and Non-training Decision

Pakapong Amornkul, Kosin Chamnongthai, Punnarumol Temdee 《Wireless Personal Communications》2014,76(3):503-521

In stress speech recognition, a recognition model capable of processing multi-stress speech needs to be designed from the viewpoints of accuracy and addability. This paper proposes addable stress speech recognition with a multiplexing hidden Markov model (HMM). To handle multi-stress speech, we propose a multiplexing topology that combines multiple stress speech models. Since each stress affects speech in a different way, having a speech recognition model that is specifically trained to recognize words affected by that stress helps improve the recognition rate. However, since each stress speech model gives its own independent recognized word, an effective decision module is needed to choose the correct word. In each stress speech model, MFCC analysis is applied to the input speech. The result is fed into an HMM that is segmented into N parts. Each segment provides its own tentative recognized word, which in turn is an input to the proposed non-training decision module. Based on these tentative recognized words from the segments of all stress speech models, the final recognized word is decided using a coarse-to-fine concept performed by a majority vote, a segment-weighted difference square score, and a next best score, respectively. Besides neutral speech, the proposed method was verified using three stresses: angry, loud, and Lombard. The results showed that the proposed method achieved a 94.7% recognition rate, compared with 94.2% for the training-based decision method.

4.

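Recognition and Understanding of Chinese Phrases Using a Feedback Speech Recognition and Understanding Scheme

Fu Qiuliang, Yuan Baozong 《电子与信息学报》 (Journal of Electronics & Information Technology) 1998,20(2):194-198

One task of a Chinese speech understanding system is to convert the Chinese monosyllables produced by the speech recognition system into the correct Chinese characters and words, and further into Chinese phrases and sentences, so that together with the speech recognition system it forms a speech-to-text conversion system. Using a closed-loop feedback scheme for Chinese speech recognition and understanding, and building on the recognition and understanding of Chinese words, this paper goes on to recognize and understand Chinese structural phrases and obtains the expected results. Finally, the paper discusses the experimental results and the feedback scheme.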

5.

A perspective on speech recognition

Levinson S.E., Roe D.B. 《Communications Magazine, IEEE》1990,28(1):28-34

The authors outline the science behind speech recognition technology and describe briefly the contributions of engineering, computer science, and mathematics to it. They discuss the state of the art in both technique and performance, including some examples of successful applications. This is followed by a critical evaluation of the technology with respect to technical, commercial, and societal criteria. They conclude that even with today's suboptimum technology, there are types of applications in which useful deployment is possible and desirable, but that those applications that will transform our society must wait until speech recognizers have nearly the capabilities of humans.

6.

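Design and Implementation of an Endpoint Detection Accelerator for Isolated-Word Speech Recognition

Feng Guoyou, Dai Yang, Shen Haibin, Shi Xiaodong 《电子器件》 (Chinese Journal of Electron Devices) 2007,30(3):1098-1101

Conventional speech endpoint detection methods use simple features of the signal, such as short-time energy and zero-crossing rate, as decision parameters. In practical applications, especially when the signal-to-noise ratio is low, these methods cannot meet system requirements. This paper uses the difference of the zero-crossing-rate and energy product (零能积差) as the basis for deciding whether a sampled signal frame is speech, and implements it in hardware. The results show that, compared with conventional methods, the module maintains a high recognition rate while increasing speed and reducing area, and so has some practical value.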

7.

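Simulation and Analysis of an Isolated-Word Recognition System Based on an Improved DTW Algorithm  Total citations: 5, self-citations: 0, citations by others: 5

Lin Bo, Lü Ming 《信息技术》 (Information Technology) 2006,30(4):56-59

When applied to isolated-word speech recognition, the conventional DTW algorithm concentrates on time warping and on computing the speech distance measure, without analyzing the reliability and validity of the data. This paper proposes an improved endpoint detection algorithm, adopts an improved DTW algorithm, and simulates both on a computer. Experimental results show that the improved DTW algorithm effectively reduces recognition time and the amount of stored data, and improves system performance.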
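For readers unfamiliar with the baseline, the conventional DTW template matching that this abstract refers to can be sketched as follows. This is a minimal illustration of textbook DTW with a Euclidean frame distance, not the improved algorithm the paper proposes. The function names `dtw_distance` and `recognize` and the toy one-dimensional templates are assumptions made for the example.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated distance aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            # Allowed moves: advance in a, advance in b, or advance in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


# Hypothetical one-dimensional "feature" templates, for illustration only.
templates = {
    "yes": [(0.0,), (1.0,), (2.0,)],
    "no": [(5.0,), (5.0,), (4.0,)],
}
utterance = [(0.0,), (0.0,), (1.0,), (2.0,)]  # a time-stretched "yes"
print(recognize(utterance, templates))
```

The full cost matrix costs O(nm) time and memory per template, which is the kind of overhead that work on reducing recognition time and stored data targets.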

8.

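Influence of Inter-Frame Correlation Information in Speech on the Recognition Accuracy of HMM-Based Systems

Dai Jianing 《电子学报》 (Acta Electronica Sinica) 1997,25(7):75-77

This paper examines how the inter-frame correlation information of speech signals, after short-time signal processing, affects the recognition accuracy of speech recognition systems based on hidden Markov models (HMMs). Because the output-independence assumption of the HMM causes inter-frame correlation information to be lost, the paper proposes a statistical model that describes this information, the Markov chain model (MCM), to compensate for this deficiency of the HMM. Speaker-independent and multi-speaker isolated-word experiments show that using the MCM as an auxiliary model to the HMM raises the recognition rate of the original HMM system by about 1 to 6 percentage points.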

9.

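Speaker-Dependent Medium-Vocabulary Continuous Speech Recognition Based on HMM/VQ  Total citations: 2, self-citations: 2, citations by others: 0

Lin Daofa, Luo Wanbo 《电子学报》 (Acta Electronica Sinica) 1992,20(7):59-65

This paper discusses a continuous speech recognition method based on hidden Markov models (HMMs) and vector quantization (VQ). In this method, one HMM is built for each word, and the state-transition network formed by combining multiple models is searched for the best state-transition path, so that continuous speech is recognized without prior word segmentation. With finite-state grammar constraints and other measures to improve recognition performance, the demonstration system can recognize 18 English sentence patterns and 150 words for a specific speaker. In a test with 312 utterances (2710 words in total), the recognition delay was 62% of the utterance duration, the average speaking rate was 2.32 words per second, and the word accuracy was 97.3%.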
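The best-path search over HMM states mentioned in this abstract is commonly done with the Viterbi algorithm. The sketch below shows log-domain Viterbi decoding for a single discrete-observation HMM, where observations stand in for VQ codebook indices. It is an illustration under assumed toy parameters, not the paper's network search with grammar constraints. The function name `viterbi` and all probabilities are hypothetical.

```python
import math


def _log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def viterbi(obs, start_p, trans_p, emit_p):
    """Return (best log-probability, best state path) for a discrete HMM."""
    n_states = len(start_p)
    # delta[s] = best log-probability of any path ending in state s so far
    delta = [_log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in range(n_states)]
    backpointers = []
    for o in obs[1:]:
        new_delta, pointers = [], []
        for s in range(n_states):
            # Best predecessor state for arriving in state s at this step.
            prev = max(range(n_states), key=lambda r: delta[r] + _log(trans_p[r][s]))
            pointers.append(prev)
            new_delta.append(delta[prev] + _log(trans_p[prev][s]) + _log(emit_p[s][o]))
        delta = new_delta
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    last = max(range(n_states), key=lambda s: delta[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return delta[last], path


# Toy two-state left-to-right model; all numbers are illustrative assumptions.
start_p = [1.0, 0.0]
trans_p = [[0.5, 0.5], [0.0, 1.0]]
emit_p = [[0.9, 0.1], [0.2, 0.8]]  # emit_p[state][codebook index]
log_prob, path = viterbi([0, 0, 1, 1], start_p, trans_p, emit_p)
print(path, math.exp(log_prob))
```

Working in the log domain avoids numerical underflow on long utterances. A word-network decoder would run the same recursion over the concatenated states of all word models.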

10.

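A Helium Speech Recognition Method Using Machine Learning

Li Dongmei, Li Ming, Guo Lili, Zhang Shibing 《电讯技术》 (Telecommunication Engineering) 2022,(9)

To address the slow processing, computational complexity, and difficult operation of conventional helium speech processing techniques, a helium speech recognition method using machine learning is proposed. By learning high-dimensional information with a deep network and extracting multiple features, it not only resolves overfitting but also offers a low word error rate (WER) and fast convergence. First, databases of isolated helium-speech words and continuous helium speech are built and the helium speech data are preprocessed; the extracted speech features mainly comprise formant features, pitch period features, and FBank (Filter Bank) features. The features are then fed into an acoustic model composed of a deep convolutional neural network (DCNN) and connectionist temporal classification (CTC), which models the mapping from speech to pinyin, and finally a Transformer language model produces the Chinese character output. Compared with a model that extracts only FBank features, the isolated-word helium speech recognition model that extracts formant, pitch period, and FBank features lowers WER by 7.91%, and the continuous helium speech recognition model lowers WER by 14.95%. The best WER is 1.53% for the isolated-word model and 36.89% for the continuous model. The results show that the proposed method can recognize helium speech effectively.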
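The WER figures quoted in this abstract are usually computed from the edit distance between the reference and recognized token sequences. The sketch below shows that standard definition, under the assumption that the paper follows it; the abstract does not say how its WER was calculated. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference tokens."""
    ref, hyp = list(reference), list(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one token")
    # prev[j] = edit distance between the reference prefix so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against four reference tokens.
print(word_error_rate("a b c d".split(), "a x c".split()))
```

For Chinese output the tokens would typically be characters or pinyin syllables rather than space-separated words. The function accepts any sequence of tokens.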

11.

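Ordered Clustering Method and Its Application in Neural Network Speech Recognition  Total citations: 3, self-citations: 1, citations by others: 2

Shi Xiaoxing, Gu Mingliang, Wang Taijun, He Zhenya 《电路与系统学报》 (Journal of Circuits and Systems) 2000,5(2):99-103

This paper proposes a new network structure, which we call the ordered clustering network. The network can extract features from speech signals and handles the time-alignment problem in neural network speech recognition well. From the sequence of feature vectors of the input speech signal, the ordered clustering network extracts a fixed number of feature vectors, which are then fed into a neural network classifier for recognition. Compared with other neural network speech recognition methods, using this network for front-end processing shortens the training and recognition time of the back-end neural network classifier, simplifies the classifier network, and yields a high recognition rate. Based on this, we built …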

12.

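Program Implementation of Endpoint Detection for Speech Signals

Fan Yu 《电讯技术》 (Telecommunication Engineering) 1989,29(2):21-23

In isolated-word recognition, accurately determining the start and end points of the speech signal is very important. A scheme that determines the extent of the speech signal can be used to cut a large amount of computation in non-real-time systems and to improve recognition accuracy. Using certain characteristic parameters of speech, namely short-time average magnitude or energy and short-time average zero-crossing rate, this paper presents a program for the IBM PC that performs speech endpoint detection with these parameters.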
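The two parameters named in this abstract, short-time average magnitude and short-time zero-crossing rate, can be combined into a simple frame-based detector. The sketch below is one plausible arrangement, not the program described in the paper. The frame length, both thresholds, the decision rule, and the function names `frame_features` and `detect_endpoints` are all assumptions chosen for illustration.

```python
def frame_features(samples, frame_len):
    """Return (average magnitude, zero-crossing rate) for each non-overlapping frame."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        magnitude = sum(abs(x) for x in frame) / frame_len
        # Count sign changes between consecutive samples.
        crossings = sum(1 for x, y in zip(frame, frame[1:]) if (x >= 0) != (y >= 0))
        features.append((magnitude, crossings / (frame_len - 1)))
    return features


def detect_endpoints(samples, frame_len=80, mag_thresh=0.1, zcr_thresh=0.3):
    """Return (start_sample, end_sample) of the speech region, or None if none is found.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    active = []
    for index, (magnitude, zcr) in enumerate(frame_features(samples, frame_len)):
        loud = magnitude > mag_thresh
        # Weak but rapidly oscillating frames are treated as unvoiced speech.
        unvoiced = magnitude > mag_thresh / 2 and zcr > zcr_thresh
        if loud or unvoiced:
            active.append(index)
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len


# Synthetic signal: silence, a loud alternating burst, then silence again.
signal = [0.0] * 160 + [0.5, -0.5] * 80 + [0.0] * 160
print(detect_endpoints(signal))
```

In practice the thresholds would be estimated from a stretch of background noise at the start of the recording, and short gaps inside a word would be bridged before the endpoints are reported.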

13.

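Implementation of a CHMM-Based Speech Recognition Simulation System

Li Haoliang, Jin Shuangyan, Jia Weiwei 《电声技术》 (Audio Engineering) 2013,(12):75-78

This paper introduces a speaker-independent isolated-word speech recognition simulation system based on a hidden Markov model (HMM) with continuous M-component Gaussian mixture densities. By studying how the number of model states, the training time, and the choice of feature parameters affect the recognition rate, it finds that with 4 HMM states, 20 training iterations, and 48-dimensional combined LPCC and MFCC features, the system reaches a recognition rate of 90% for isolated Chinese words.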

14.

Improved Phoneme-Based Myoelectric Speech Recognition

Quan Zhou, Ning Jiang, Englehart K., Hudgins B. 《IEEE transactions on bio-medical engineering》2009,56(8):2016-2023

This paper introduces an enhanced phoneme-based myoelectric signal (MES) speech recognition system. The system can recognize new words without retraining the phoneme classifier, which is considered to be the main advantage of phoneme-based speech recognition. It is shown that previous systems experience severe performance degradation when new words are added to a testing dataset. To maintain high accuracy with new words, several improvements are proposed. In the proposed MES speech recognition approach, the raw MES is processed by class-specific rotation matrices to spatially decorrelate the data prior to feature extraction in a preprocessing stage. Then, an uncorrelated linear discriminant analysis is used for dimensionality reduction. The resulting data are classified through a hidden Markov model classifier to obtain the phonemic log likelihoods of the phonemes, which are mapped to corresponding words using a word classifier. An average word classification accuracy of 98.533% is achieved over six subjects. The system offers dramatically improved accuracy when expanding a vocabulary, offering promise for robust large-vocabulary myoelectric speech recognition.

15.

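Simulation and DSP Implementation of a Speech Recognition System Based on the DTW Algorithm

Chen Xiduan, Wang Rui, Xiao Xiong, Hong Tao 《电声技术》 (Audio Engineering) 2013,(12):66-69

The DTW (Dynamic Time Warping) algorithm is simple and effective to implement and is widely used in isolated-word speech recognition systems. Spectral subtraction is used for front-end denoising, the speech recognition system is simulated in Matlab, and an isolated-word speech recognition system built around the 16-bit digital signal processor TMS320VC5509 is designed. Experimental results show that the system meets real-time requirements and achieves good recognition results.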

16.

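Application of the ADSP-BF531 in an Embedded Speech Recognition System

Wang Weiqiang 《电子设计工程》 (Electronic Design Engineering) 2012,20(12):186-189

An embedded speech recognition system is designed. Its hardware platform is built around the ADSP-BF531, and it uses discrete hidden Markov model (DHMM) detection and recognition algorithms to perform speaker-independent isolated-word speech recognition. Test results show that the system's overall recognition rate for speaker-independent short words is above 90%. The system is compact, fast, reliable, and easily extended; it can be applied in many specific settings and has good market prospects. The paper describes the interface design between the DSP and the CODEC, off-chip RAM, ROM, and CPLD, as well as the algorithms used for speech recognition, including vector quantization, Mel cepstral parameters, and Viterbi decoding, and their results in practice.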

17.

Methods of Controlling the Word Rate of Recorded Speech

Emerson Foulke 《The Journal of communication》1970,20(3):305-314

Six methods for increasing speech rate are presented. They are as follows. 1. Speech at a rate that is faster than normal may be obtained by pacing an oral reader at a rate that is faster than his normal reading rate. 2. The word rate of recorded speech may be increased by reproducing a tape or record at a speed that is faster than the speed used during recording. 3. The word rate of recorded speech may be increased by an electromechanical device that reproduces consecutive samples of a recorded tape. 4. Consecutive sampling may also be accomplished by a computer. 5. The word rate of synthesized speech may be manipulated by instructions in the program followed by a speech synthesizer. 6. The harmonic compressor increases word rate by a method of frequency division without temporal alteration, and frequency restoration with temporal alteration.

18.

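A Self-Training Digital Voice Control System

Zhou Liqing 《数字通信》 (Digital Communication) 1999,26(3):6-7,10

This paper introduces a noise-robust, real-time speech recognition system that runs without a computer. Because the system lets users train it themselves, it can reach a very high recognition rate without being speaker-dependent. The system uses advanced intelligent algorithms and is built around a high-speed digital signal processor (DSP). It can be used in telephones for voice dialing, and in other equipment as a voice control device in practical applications.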

19.

Talking with Computers: Synthesis and Recognition of Speech by Machines

Flanagan James L. 《IEEE transactions on bio-medical engineering》1982,(4):223-232

Humans find speech a convenient and efficient means for communicating information. Machines, in contrast, prefer the symbols of assemblers and compilers, exchanged, typically, in printed form through a computer terminal. If computers could be given human-like abilities for voice communication, their value and ease of use for humans would increase. The ubiquitous telephone would take on more of the capabilities of a computer terminal. Making machines talk and listen to humans depends upon economical implementation of speech synthesis and speech recognition. Heretofore the complexities and costs of these functions have deterred wide application. But now, fueled by the advances in integrated electronics, opportunities for expanded and enhanced telephone services are emerging. This paper assesses the progress in synthesis and recognition of speech by computer techniques, and it outlines potential applications in voice-communication services.

20.

Scalable architecture for word HMM-based speech recognition and VLSI implementation in complete system

Yoshizawa S., Wada N., Hayasaka N., Miyanaga Y. 《IEEE transactions on circuits and systems. I, Regular papers》2006,53(1):70-77

This paper describes a scalable architecture for real-time speech recognizers based on word hidden Markov models (HMMs) that provide high recognition accuracy for word recognition tasks. However, the size of their recognition vocabulary is small because their extremely high computational costs cause long processing times. To achieve high-speed operations, we developed a VLSI system that has a scalable architecture. The architecture effectively uses parallel computations on the word HMM structure. It can reduce processing time and/or extend the word vocabulary. To explore the practicality of our architecture, we designed and evaluated a complete system recognizer, including speech analysis and noise robustness parts, on a 0.18-μm CMOS standard cell library and field-programmable gate array. In the CMOS standard-cell implementation, the total processing time is 56.9 μs/word at an operating frequency of 80 MHz in a single system. The recognizer gives a real-time response using an 800-word vocabulary.