1.

王作英《电信科学》1993,(4)

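Starting from the distribution of speech state dwell lengths, this paper builds an inhomogeneous hidden Markov model for speech recognition. The model is closer to the physical reality of speech signals, and the time and space complexity of training and recognition improve considerably over the classical HMM. The paper describes the training and recognition algorithms of the new model and introduces a real-time recognition and understanding system, designed on this model, that covers the full table of isolated Chinese characters.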

2.

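Speech Emotion Recognition Algorithm Using Multi-Task Learning and Recurrent Neural Networks

冯天艺, 杨震《信号处理》2019,35(7):1133-1140

With the rapid development of machine learning, many researchers use neural networks to address problems in speech recognition. However, because training data are limited, among other reasons, conventional neural network classifiers commonly suffer from generalization error. To address this, multi-task learning, a form of transfer learning, is introduced. This paper proposes a speech emotion recognition algorithm using multi-task learning and recurrent neural networks (MTL-RNN), with speaker emotion recognition as the main task and gender recognition and identity recognition as auxiliary tasks; the three tasks are trained in parallel in the network. The model shares network parameters and learns shared features through shared RNN layers, and learns task-specific features through attribute-dependent layers, to improve classification performance. Experiments show that the proposed MTL-RNN algorithm achieves good recognition performance on both Chinese and Arabic, and in scenarios with fewer and with more speakers.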

3.

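A Word-Fragment-Based Language Model and Its Application to Chinese Speech Retrieval

郑铁然, 韩纪庆, 李海洋《通信学报》2009,30(3):84-88

In research on Chinese speech retrieval, to make full use of linguistic knowledge about how syllables combine in Chinese, a new unit for constructing Chinese language models, the "word fragment", is proposed, and an algorithm for selecting the optimal word fragments is studied. Chinese speech recognition and speech retrieval experiments show that with the word-fragment-based language model, syllable accuracy improves and better speech retrieval performance is obtained.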

4.

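Noise-Robust Front-End Algorithm for Chinese Speech Recognition and Its Performance Analysis

林建臻, 孙甲松, 王作英《电声技术》2004,(3):45-48,52

This paper discusses the noise-robust front-end feature extraction algorithm for distributed speech recognition systems proposed by the European Telecommunications Standards Institute (ETSI), which combines several noise-robustness techniques. Taking the characteristics of Chinese speech into account, the algorithm is implemented within an overall Chinese speech recognition framework, and experiments and analysis are carried out. Recognition results in typical noise environments show that robustness improves considerably relative to the baseline MFCC feature extraction algorithm.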

5.

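CHMM Continuous Digit Speech Recognition Using Prosodic Information

张静亚, 俞一彪《电子工程师》2006,32(12):43-46

A high-performance Chinese continuous digit speech recognition algorithm incorporating prosodic information is proposed. The algorithm is based on the CHMM (continuous hidden Markov model), uses MFCCs (Mel-frequency cepstral coefficients) as the main speech features, and uses prosodic information to segment continuous digits accurately, so that confusable digits can be distinguished effectively. It adopts a two-stage recognition framework to raise the recognition rate: the first stage segments the continuous digits, performs digit recognition on that basis, and outputs candidate results; the second stage identifies confusable digit pairs among the candidates and uses prosodic information to select the correct result. Experiments show that the final recognition rate for Chinese continuous digits improves greatly.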

6.

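Research on an Evaluation Algorithm for Chinese Continuous Speech Recognition Results

刘刚, 陈伟, 郭军《中国通信》2010,7(2):132-138

In Chinese speech recognition, the characteristics of Chinese word formation make word-based evaluation of recognition results inaccurate. This paper improves the traditional evaluation algorithm for continuous speech recognition results and proposes a hybrid character-and-word evaluation algorithm for Chinese continuous speech recognition, which can effectively carry out word-based evaluation. It also extends the evaluation from four cases (correct, substitution, insertion, deletion) to six (adding insertion-type substitution and deletion-type substitution), providing more useful information for post-processing of speech recognition. Experiments show that the proposed algorithm effectively reduces the spurious errors introduced by the traditional evaluation algorithm…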
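For orientation, the traditional four-case evaluation mentioned in the abstract above (correct, substitution, insertion, deletion) is commonly obtained from a Levenshtein alignment between the reference and the recognized token sequence. The sketch below is an illustrative assumption of that baseline scheme only; the function name `align_counts` is hypothetical, and it does not implement the paper's six-case hybrid character-and-word method.

```python
def align_counts(ref, hyp):
    """Return (correct, substitutions, insertions, deletions) for hyp against ref."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i              # delete every reference token
    for j in range(1, m + 1):
        cost[0][j] = j              # insert every hypothesis token
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Backtrace from the bottom-right corner, classifying each alignment step.
    correct = sub = ins = dele = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] == hyp[j - 1]:
                correct += 1
            else:
                sub += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dele += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return correct, sub, ins, dele
```

With these counts, the usual error rate is (substitutions + insertions + deletions) divided by the reference length. The tokens may be characters or words, which is exactly the choice the abstract identifies as problematic for Chinese.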

7.

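A Study of a Polynomial-Fitting Speech Trajectory Model for Chinese Continuous Speech Recognition

欧智坚, 王作英《电子学报》2003,31(4):608-611

Although the HMM is currently the most popular speech recognition model, its assumption that state outputs are independent and identically distributed neglects the dynamic characteristics of the speech trajectory. Based on a more flexible statistical framework for describing speech, the generalized DDBHMM, this paper proposes a concrete polynomial-fitting speech trajectory model together with new training and recognition algorithms, which better capture the properties of real speech. An effective pruning algorithm is also given, yielding a practical model. Experiments on Chinese large-vocabulary speaker-independent continuous speech recognition show that the pruned polynomial-fitting speech trajectory model clearly improves recognition performance at a modest computational cost.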

8.

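An Improved Linear Discriminant Analysis Method and Its Application to Chinese Digit Speech Recognition (cited by: 1; self-citations: 0; citations by others: 1)

史媛媛, 刘加, 刘润生《电子学报》2002,30(7):959-963

Although Chinese digit speech recognition involves only ten digits, different digits share identical or similar initials or finals, so the spoken digits are highly confusable. With the usual hidden Markov model (HMM) as the recognition model, a high recognition rate is difficult to obtain. To resolve the confusion between Chinese digits and improve recognition performance, this paper applies linear discriminant analysis at the HMM state level: feature samples that are easily confused between different states are grouped into confusion pattern classes, and linear discriminant analysis is performed on each confusion pattern class. Through the linear discriminant transform, only those feature parameters that effectively discriminate the confusion class are retained in the transformed feature space. This state-based linear discriminant analysis effectively improves the model's ability to distinguish confusable digits. Experiments show that recognition performance improves substantially even with a coarse recognition model that has very few states; with the model optimized by the linear discriminant transform, the recognition rate for isolated Chinese digits reaches 99.32%.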

9.

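A New Neural Network for Speech Recognition (cited by: 1; self-citations: 0; citations by others: 1)

王晓明, 郑宝玉《南京邮电学院学报(自然科学版)》1998,18(4):11-13,18

A new neural network model is proposed, and its working principle, training method, and recognition algorithm are described. To overcome the weakness of neural networks in modeling temporal signals, nonlinear segmentation and representative-frame feature extraction are introduced. Finally, a Chinese speech recognition system designed on this model is presented; experiments show that the network has considerable potential for Chinese speech recognition.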

10.

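Design and Implementation of a DSP-Based High-Speed Real-Time Speech Recognition System

李邵梅, 陈鸿昶, 王凯《现代电子技术》2007,30(15):109-111

Recognition accuracy and noise robustness are certainly the main research focus in speech recognition, but response speed is also key to making a system practical. A hardware platform is built around the TMS320C6713, and the code is optimized for the hardware through mixed programming in efficient C and linear assembly. The result is a high-speed real-time speech recognition system that uses Mel-frequency cepstral coefficients as features and the dynamic time warping algorithm; its recognition speed reaches 0.29 times real time, which allows parallel recognition of multiple speech channels.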
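The dynamic time warping matcher named in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the names `dtw_distance` and `recognize` are hypothetical, frames are plain scalars with an absolute-difference cost (a real system would compare MFCC vectors), and none of the DSP-specific optimizations from the paper are reflected.

```python
import math

def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Accumulated cost of the best monotonic alignment between sequences a and b."""
    n, m = len(a), len(b)
    # D[i][j] = best accumulated cost aligning a[:i] with b[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = dist(a[i - 1], b[j - 1])
            # Allowed steps: advance in a only, in b only, or in both.
            D[i][j] = local + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def recognize(templates, utterance):
    """Return the label of the stored template closest to the utterance under DTW."""
    return min(templates, key=lambda label: dtw_distance(templates[label], utterance))
```

A template recognizer of this kind costs O(n·m) per template, which is why the per-frame distance loop is the natural target for the hand optimization the abstract describes.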

11.

A dynamic-time-warp integrated circuit for a 1000-word speech recognition system

《Solid-State Circuits, IEEE Journal of》1987,22(1):3-14

The design of a custom MOS-LSI chip capable of performing the pattern matching portion of a 1000-word speech recognition algorithm in real time is reported. The chip implements a dynamic-time-warp algorithm. The chip is part of a single-board speech recognition system that performs spectral analysis, dictionary storage and management, and speech recognition for both isolated and connected word applications of up to 1000 words. Speech recognition algorithms are normally refined to work well on general-purpose machines without the influence of future special-purpose hardware implementation. With general-purpose machines, chip implementation issues such as bit widths and parallelism cannot be utilized, so they are ignored in favor of increasing algorithmic complexity by techniques such as pruning. If developed together, the chip architecture and algorithm can be refined to fully use parallelism and increase throughput, while retaining efficient silicon area utilization. The resulting special-purpose architecture is sufficiently general that connected speech can be recognized without a speed penalty.

12.

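Research on End-to-End Multilingual Speech Recognition

胡文轩, 王秋林, 李松, 洪青阳, 李琳《信号处理》2021,37(10):1816-1824

End-to-end speech recognition models need no pronunciation dictionary for training, which greatly reduces the burden of developing a speech recognition system for a new language. Exploiting this advantage, this paper builds a language-independent end-to-end multilingual speech recognition system. The model is trained with character-based modeling, and a multilingual output symbol set is constructed to include all characters that occur in the target languages. Training yields a single model whose network parameters are shared by all languages. On the 10-language dataset provided by the OLR challenge, the proposed multilingual system outperforms monolingual speech recognition systems on every language.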

13.

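Embedded Implementation of HMM Speaker-Independent Continuous Speech Recognition

杜利民, 谢凌云, 刘斌《电子与信息学报》2005,27(1):60-63

Embedded systems are becoming the platform of choice for practical speech recognition applications. This paper studies the factors that determine the computational complexity of HMM continuous speech recognition on an embedded platform and proposes a slimmed-down computation method that combines feature-coefficient masking with comprehensive pruning, reducing computational complexity while maintaining the recognition rate. Experimental data from the embedded platform show that, compared with the baseline system, the slimmed HMM continuous speech recognition system cuts computation time from 100% to 27.91% of the baseline, while the recognition rate drops only from 89.65% to 89.41%.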
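The trade-off reported in the abstract above can be restated as a speedup factor and an absolute accuracy loss. The short calculation below uses only the four figures quoted there; the variable names are illustrative.

```python
# Figures quoted in the abstract: time as a percentage of baseline, recognition rate in percent.
baseline_time, slim_time = 100.0, 27.91
baseline_acc, slim_acc = 89.65, 89.41

speedup = baseline_time / slim_time       # how many times faster the slimmed system runs
time_saved = baseline_time - slim_time    # share of baseline computation time removed
acc_drop = baseline_acc - slim_acc        # absolute loss in recognition rate

print(f"speedup: {speedup:.2f}x, time saved: {time_saved:.2f}%, accuracy drop: {acc_drop:.2f} points")
```

That is roughly a 3.58-fold reduction in computation time for a loss of 0.24 percentage points in recognition rate.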

14.

Parallel system design for time-delay neural networks

Zhang D. Pal S.K. 《IEEE transactions on systems, man and cybernetics. Part C, Applications and reviews》2000,30(2):265-275

The authors develop a parallel structure for the time-delay neural network used in some speech recognition applications. The effectiveness of the design is illustrated by: (1) extracting a window computing model from the time-delay neural systems; (2) building its pipelined architecture with parallel or serial processing stages; and (3) applying this parallel window computing to some typical speech recognition systems. An analysis of the complexity of the proposed design shows a greatly reduced complexity while maintaining a high throughput rate.

15.

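Research on an HMM-Based Continuous Small-Vocabulary Speech Recognition System

高建《现代电子技术》2011,34(11):205-207

To improve speech recognition efficiency and reduce dependence on the environment, the algorithmic and hardware parts of speech recognition are analyzed and improved. An ARM S3C2410 microprocessor serves as the main control module and a UDA1314TS audio processing chip as the speech recognition module; an HMM acoustic model and the Viterbi algorithm are used for pattern training and recognition, and a continuous, small-vocabulary speech recognition system is designed. Experiments show that the system has a high recognition rate and a degree of robustness, with recognition rates of 95.6% in the laboratory and 92.3% outdoors.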
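The Viterbi decoding step referred to in the abstract above can be sketched for a small discrete HMM. This is a generic textbook formulation under stated assumptions, not the paper's embedded implementation: the function name `viterbi` and the dictionary-based model layout are hypothetical, and probabilities are supplied as logarithms to avoid underflow on long utterances.

```python
def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_state_path, best_log_probability) for an observation sequence."""
    # delta[s] = log-probability of the best path that ends in state s
    delta = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    backpointers = []
    for o in obs[1:]:
        new_delta, ptr = {}, {}
        for s in states:
            # Best predecessor of s at this time step.
            prev = max(states, key=lambda p: delta[p] + log_trans[p][s])
            new_delta[s] = delta[prev] + log_trans[prev][s] + log_emit[s][o]
            ptr[s] = prev
        delta = new_delta
        backpointers.append(ptr)

    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]
```

For recognition, each vocabulary entry (or a composite network of entries) is scored this way and the highest-scoring path determines the output.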

16.

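Research and Implementation of an Embedded Digit Voice Dialing System

李芬兰, 马小月《电声技术》2012,36(1):46-50

Speech recognition is applied to a dialing system, and a speaker-independent digit voice dialing system is implemented on an embedded platform. The recognition algorithm uses Mel-frequency cepstral coefficients as features and a continuous hidden Markov model as the model for training and recognition; a Qt interface controls the recognition process. The system is tested on speaker-independent digit speech recognition. The results show a speaker-independent recognition rate of 98% and a recognition time of 3.55 s; both the recognition rate and the real-time performance meet the needs of voice dialing.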

17.

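Research on a Speech Recognition System Based on a Private Branch Exchange (cited by: 3; self-citations: 0; citations by others: 3)

刘加, 胡凯军《电子学报》1999,27(1):5-7

This paper develops a voice-controlled spoken command switching system for a private branch exchange (PBX). The system performs speaker-independent automatic recognition of continuous spoken commands with a small-to-medium vocabulary. User command sentences were collected and analyzed statistically to generate the corresponding recognition grammar network. The recognizer is trained by reinforced training of composite models built from subword models, and recognition uses an improved token-passing Viterbi algorithm to improve performance. The paper compares the effects of different speech feature parameters and of the number of hidden Markov model states on telephone speech recognition accuracy. A rejection mechanism for the recognition system was also developed; without rejection…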

18.

Leandro Ezequiel Di Persia Diego Humberto Milone Masuzo Yanagida 《Journal of Signal Processing Systems》2011,63(3):333-344

In a recent publication the pseudoanechoic mixing model for closely spaced microphones was proposed and a blind audio source separation algorithm based on this model was developed. This method uses frequency-domain independent component analysis to identify the mixing parameters. These parameters are used to synthesize the separation matrices, and then a time-frequency Wiener postfilter is applied to improve the separation. In this contribution, key aspects of the separation algorithm are optimized with two novel methods. A deeper analysis of the working principles of the Wiener postfilter is presented, which gives insight into its reverberation reduction capabilities. A variation of this postfilter that improves performance using information from previous frames is also introduced. The basic method uses a fixed central frequency bin for the estimation of the mixture parameters. In this contribution an automatic selection of the central bin, based on information about the separability of the sources, is introduced. The improvements obtained through these methods are evaluated in an automatic speech recognition task and with the PESQ objective quality measure. The results show an increased robustness and stability of the proposed method, enhancing the separation quality and improving the speech recognition rate of an automatic speech recognition system.

19.

Application of Emotion Recognition and Modification for Emotional Telugu Speech Recognition

Vegesna Vishnu Vidyadhara Raju Gurugubelli Krishna Vuppala Anil Kumar 《Mobile Networks and Applications》2019,24(1):193-201

The majority of automatic speech recognition (ASR) systems are trained with neutral speech, and the performance of these systems is affected by the presence of emotional content in the speech. The recognition of these emotions in human speech is considered to be a crucial aspect of human-machine interaction. Combined spectral and differenced prosody features are considered for the task of emotion recognition in the first stage. The task of emotion recognition does not serve the sole purpose of improvement in the performance of an ASR system. Based on the emotions recognized from the input speech, the corresponding adapted emotive ASR model is selected for evaluation in the second stage. This adapted emotive ASR model is built from the existing neutral speech and emotive speech generated synthetically with a prosody modification method. In this work, the importance of an emotion recognition block at the front end, along with emotive speech adaptation of the ASR system models, was studied. Speech samples from the IIIT-H Telugu speech corpus were used to build the large-vocabulary ASR systems. Emotional speech samples from the IITKGP-SESC Telugu corpus were used for evaluation. The adapted emotive speech models yielded better performance than the existing neutral speech models.

20.

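A Speech Recognition Method Based on a Multi-Scale Forward Attention Model

唐海桃, 薛嘉宾, 韩纪庆《电子学报》2020,48(7):1255-1260

Attention models are the mainstream models in current speech recognition, but they have a drawback: the attention model at the current time step may produce abnormal scores. To address this, the paper first proposes a forward attention model, which uses the normal attention scores of the previous time step to smooth abnormal scores at the current step. Next, the forward attention model is optimized by adding a constraint factor to the previous step's attention scores, which achieves adaptive smoothing. Finally, building on the optimized model, a multi-scale forward attention model is proposed: it introduces multi-scale modeling of speech units at different levels and fuses the resulting target vectors from the different levels, in order to resolve abnormal attention scores. With SwitchBoard as the training set and Hub5'00 as the test set, the multi-scale forward attention model achieves a 14.28% relative reduction in word error rate (WER) compared with the baseline system.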