
1.

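赵晖 顾亚强 唐朝京 《计算机工程》2010,36(8):7-9

For speech recognition in noisy environments, a product hidden Markov model (HMM) for bimodal speech recognition is proposed. Building on independently trained audio and video HMMs, a two-dimensional training model is constructed to capture the asynchrony between the audio stream and the video stream. Weight coefficients are introduced so that the weights of the audio and video streams adapt to different noise conditions. Experimental results show that the method achieves higher recognition performance than other bimodal speech recognition methods.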
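The abstract gives no formula for the stream weights. A common way to realize them, assumed here, is a weighted sum of the per-class log-likelihoods of the two streams. The sketch below illustrates only that idea; the function names and the example scores are hypothetical, and choosing the weight from the noise level is left to the caller.

```python
def fuse_stream_scores(audio_loglik, video_loglik, audio_weight):
    """Weighted sum of per-class log-likelihoods from the two streams.

    audio_weight lies in [0, 1]; the video stream gets 1 - audio_weight.
    (Assumed fusion rule, not taken from the paper.)
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    video_weight = 1.0 - audio_weight
    return {
        label: audio_weight * audio_loglik[label] + video_weight * video_loglik[label]
        for label in audio_loglik
    }


def recognize(audio_loglik, video_loglik, audio_weight):
    """Return the class label with the highest fused score."""
    fused = fuse_stream_scores(audio_loglik, video_loglik, audio_weight)
    return max(fused, key=fused.get)


# Hypothetical scores: the audio stream favors "yes", the video stream favors "no".
audio = {"yes": -10.0, "no": -12.0}
video = {"yes": -9.0, "no": -5.0}
clean_decision = recognize(audio, video, audio_weight=0.9)  # trust audio
noisy_decision = recognize(audio, video, audio_weight=0.2)  # trust video
```

With a high audio weight the decision follows the audio stream; lowering the weight, as one would in heavy acoustic noise, lets the video stream dominate.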

2.

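HMM-based Chinese personal name recognition using variable frame rate training (total citations: 1; self-citations: 1; citations by others: 0)

刘刚 张洪刚 郭军 《中文信息学报》2001,15(1):40-45

HMMs in speech recognition need extensive training, yet some practical applications cannot afford repeated training. To address this, the paper proposes a variable frame rate training method based on a cosine shaping transform and evaluates it in a voice-dialing-by-name system. With only a single training pass, the recognition rate of the system improves by 4.2%. The experiments show that the method is clearly effective when a speech recognition system has little training data.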

3.

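Channel robustness of a speech recognition method based on the fusion of multi-band HMMs and neural networks (total citations: 1; self-citations: 0; citations by others: 1)

姚志强 戴蓓倩 李辉 黄伟 《计算机工程与应用》2004,40(1):71-73,82

In HMM-based speech recognition systems, a mismatch between the training and test environments (background noise, speech transmission channel, microphone, etc.) severely degrades recognition performance. Motivated by the mechanism of human auditory perception, this paper addresses transmission channel mismatch with a multi-band HMM architecture that consists of several sub-band subsystems and one full-band subsystem, and uses a neural network to fuse the outputs of the subsystems at the back end and make the final decision. Experiments show that the method effectively improves the channel robustness of the recognition system.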

4.

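Speaker verification based on adaptive Gaussian mixture model feature mapping

杨世清 戴蓓蒨 许敏强 刘青松 《模式识别与人工智能》2009,22(3):417-421

To address the performance degradation caused by nonlinear channel distortion in telephone-speech speaker verification systems, a feature mapping method that removes channel effects is proposed. A Gaussian mixture model is used as the speech model, and a channel-specific speech model is obtained from it by maximum a posteriori (MAP) adaptation; the differences between corresponding Gaussian components of the two models describe how that channel affects different speech sounds. Channel mapping rules derived from these differences are used for parameter compensation, removing the mismatch between training and test speech. Experiments on the male-speaker data of the NIST 1999 and 2004 databases show that the method improves the equal error rate of the system by 14.7% and 15.18%, respectively.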

5.

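Speech recognition model based on recurrent neural networks (total citations: 5; self-citations: 1; citations by others: 4)

朱小燕 王昱 徐伟 《计算机学报》2001,24(2):213-218

In recent years, speech recognition based on hidden Markov models (HMMs) has advanced considerably. However, HMMs have certain limitations, and how to overcome the problems caused by their first-order assumption and independence assumption has long been an active research topic. Introducing neural networks into speech recognition is one way to overcome these limitations. This paper applies recurrent neural networks to Chinese speech recognition, modifies the original network model, and proposes a corresponding training method. Experimental results show that the model handles continuous signals well and performs comparably to conventional HMMs, and that the new training strategy speeds up training while markedly improving the classification performance of the model.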

6.

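Chinese spoken password recognition in noisy environments based on VQ/CDHMM (total citations: 2; self-citations: 0; citations by others: 2)

黄玲 潘孟贤 《计算机工程与应用》2003,39(28):106-108,161

This paper studies a speech recognition method based on an improved VQ/HMM model and designs and implements a Chinese spoken password recognition system built on that model. It examines robust feature parameters and proposes several new high-dimensional dynamic parameters based on MFCC and LPCC. Recognition experiments are carried out on clean speech and on speech at different signal-to-noise ratios, and the effects of feature type, number of training states, and number of Gaussian mixtures on recognition performance are analyzed and compared. The conclusions are as follows: under additive white noise, high-dimensional dynamic parameters clearly improve the robustness of the system; for short two-character Chinese utterances (passwords), 4 states and 3 mixtures give the better results; and fusing information to exploit the strengths of different feature parameters is a good way to improve system performance.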

7.

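Recognition of speech variability under stress based on the MAP adaptation algorithm

钱芳 韩纪庆 张磊 《计算机工程与应用》2004,40(5):42-44

The effect of variability on speech is one cause of degraded performance in speech recognition systems. Speech produced under such conditions is generally hard to collect, so little training data is available, and recognition performance is unsatisfactory even when the test and training environments are the same. Adaptation algorithms can address this problem: they train on a small amount of data from the test environment so that the trained model matches the test data, preserving good recognition performance. MAP is a commonly used adaptation algorithm, mostly applied to speaker adaptation. This paper applies it to a recognition system for speech variability and obtains good recognition results after modifying the model accordingly. In small-vocabulary, speaker-dependent recognition experiments on stress-induced variability, a speaker-independent model and an improved speaker-dependent model are each used as the initial model for MAP adaptation, and the recognition rate rises clearly in both cases. Compared with the baseline system, and given 10 passes of adaptation data, the recognition rate rises by 15.84% and 15.97%, respectively, and the best recognition rates reach 85.56% and 90.42%.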

8.

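Automatic language identification based on supervector subspace analysis (total citations: 2; self-citations: 0; citations by others: 2)

宋彦 戴礼荣 王仁华 《模式识别与人工智能》2010,23(2):165-170

In automatic language identification systems for telephone speech, interference caused by differences in channel, spoken content, and speaker is an important factor affecting recognition performance. To address this, the paper proposes an automatic language identification method based on supervector subspace analysis. A supervector space representing each training utterance is first constructed, and an SVM is trained on it discriminatively. Subspace analysis is then used to estimate the noise subspace, and its influence is removed from the distance measure. On the 30 s and 10 s tasks of the NIST 07 language recognition evaluation, the method clearly outperforms the baseline system, with a relative reduction in equal error rate of about 20%.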

9.

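Isolated word recognition for the Hengyang dialect

李荣华 赵征鹏 《计算机系统应用》2017,26(5):247-252

Chinese speech recognition has already produced a body of research results. However, because of regional differences within China, where pronunciation changes every few miles, Chinese recognition systems show low recognition rates and poor performance on dialects. To address this shortcoming, an HTK-based isolated word recognition system for the Hengyang dialect was built. The system uses the HTK 3.4.1 toolkit, takes phonemes as the basic recognition unit, extracts 39-dimensional Mel-frequency cepstral coefficient (MFCC) features, builds hidden Markov models (HMMs), and uses the Viterbi algorithm for model training and matching, achieving isolated word recognition for the Hengyang dialect. Comparative experiments evaluate the system under different phoneme models and different numbers of Gaussian mixtures. The results show that system performance improves greatly when 39-dimensional MFCCs and 5 Gaussian mixtures are combined with the HMM.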
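The abstract does not say how the 39 dimensions are composed. A widely used layout, assumed here, is 13 static coefficients plus their first-order (delta) and second-order (acceleration) regression coefficients. The sketch below uses the usual regression formula with a window of 2 frames and edge-frame replication; the function names are hypothetical and the input is a toy ramp rather than real MFCCs.

```python
def deltas(frames, window=2):
    """Regression deltas over +/- `window` frames, replicating the edge frames.

    d_t = sum_k k * (c_{t+k} - c_{t-k}) / (2 * sum_k k^2), k = 1..window
    """
    n = len(frames)
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        row = []
        for d in range(len(frames[t])):
            num = 0.0
            for k in range(1, window + 1):
                ahead = frames[min(t + k, n - 1)][d]   # clamp at the last frame
                behind = frames[max(t - k, 0)][d]      # clamp at the first frame
                num += k * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out


def add_dynamic_features(static):
    """Append delta and delta-delta coefficients to each static frame."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Toy input: 10 frames of 13 coefficients that rise by 1 per frame.
static = [[float(t)] * 13 for t in range(10)]
features = add_dynamic_features(static)
```

Each output frame has 13 + 13 + 13 = 39 values. For the linear ramp used here, the delta of an interior frame equals the slope and its delta-delta is zero.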

10.

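Recognizing reverberant speech with a GMM and complex cepstrum peak filtering

孔荣 吴迪 廖启鹏 朱俊杰 周强 陶智 《计算机工程与应用》2014,(15):191-193,203

To address the sharp drop in speech recognition performance in reverberant environments, a method for recognizing reverberant speech using a GMM with complex cepstrum peak filtering is proposed. A Gaussian mixture model is built from the MFCC features of clean speech, and a complex cepstrum peak filter is applied before reverberant speech is recognized, reducing the distortion caused by reverberation and thereby raising the recognition rate in reverberant environments. Experiments confirm that the method avoids the difficulty of accurately estimating the room impulse response under real conditions, lowers the computational complexity, and raises the recognition rate in reverberant environments by at least 4%.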

11.

A unified framework of HMM adaptation with joint compensation of additive and convolutive distortions

Jinyu Li Li Deng Dong Yu Yifan Gong Alex Acero 《Computer Speech and Language》2009,23(3):389-405

In this paper, we present our recent development of a model-domain environment robust adaptation algorithm, which demonstrates high performance in the standard Aurora 2 speech recognition task. The algorithm consists of two main steps. First, the noise and channel parameters are estimated using multiple sources of information including a nonlinear environment-distortion model in the cepstral domain, the posterior probabilities of all the Gaussians in the speech recognizer, and truncated vector Taylor series (VTS) approximation. Second, the estimated noise and channel parameters are used to adapt the static and dynamic portions (delta and delta-delta) of the HMM means and variances. This two-step algorithm enables joint compensation of both additive and convolutive distortions (JAC). The hallmark of our new approach is the use of a nonlinear, phase-sensitive model of acoustic distortion that captures phase asynchrony between clean speech and the mixing noise. In the experimental evaluation using the standard Aurora 2 task, the proposed Phase-JAC/VTS algorithm achieves 93.32% word accuracy using the clean-trained complex HMM backend as the baseline system for the unsupervised model adaptation. This represents high recognition performance on this task without discriminative training of the HMM system. The experimental results show that the phase term, which was missing in all previous HMM adaptation work, contributes significantly to the achieved high recognition accuracy.
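For orientation, the widely used phase-free form of the cepstral-domain distortion model and its first-order VTS expansion can be written as below. This is the textbook formulation, given here as an assumption; the abstract does not state its equations, and the phase-sensitive model it describes adds a phase term that is not shown. Here x, y, n, and h denote the cepstra of clean speech, distorted speech, additive noise, and channel, C is the DCT matrix, and log, exp, and the division act elementwise.

```latex
% Phase-free mismatch function (assumed textbook form)
y = x + h + C \log\!\left( 1 + \exp\!\left( C^{-1} (n - x - h) \right) \right)

% First-order expansion around the means (\mu_x, \mu_n, \mu_h)
G = C \,\mathrm{diag}\!\left( \frac{1}{1 + \exp\!\left( C^{-1} (\mu_n - \mu_x - \mu_h) \right)} \right) C^{-1}

% Adapted static mean and covariance of one Gaussian
\mu_y \approx \mu_x + \mu_h + C \log\!\left( 1 + \exp\!\left( C^{-1} (\mu_n - \mu_x - \mu_h) \right) \right)
\Sigma_y \approx G \,\Sigma_x\, G^{\top} + (I - G) \,\Sigma_n\, (I - G)^{\top}
```

G is the Jacobian of y with respect to x (and h), and I - G is the Jacobian with respect to n, which is why the two covariances are weighted in this way.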

12.

Multi-environment model adaptation based on vector Taylor series for robust speech recognition

Yong Lü Lin Zhou 《Pattern recognition》2010,43(9):3093-3099

In this paper, we propose a multi-environment model adaptation method based on vector Taylor series (VTS) for robust speech recognition. In the training phase, the clean speech is contaminated with noise at different signal-to-noise ratio (SNR) levels to produce several types of noisy training speech and each type is used to obtain a noisy hidden Markov model (HMM) set. In the recognition phase, the HMM set which best matches the testing environment is selected, and further adjusted to reduce the environmental mismatch by the VTS-based model adaptation method. In the proposed method, the VTS approximation based on noisy training speech is given and the testing noise parameters are estimated from the noisy testing speech using the expectation-maximization (EM) algorithm. The experimental results indicate that the proposed multi-environment model adaptation method can significantly improve the performance of speech recognizers and outperforms the traditional model adaptation method and the linear regression-based multi-environment method.

13.

Robust speech recognition method based on discriminative environment feature extraction

韩纪庆 高文 《计算机科学技术学报》2001,16(5):458-464

It is an effective approach to learn the influence of environmental parameters, such as additive noise and channel distortions, from training data for robust speech recognition. Most of the previous methods are based on the maximum likelihood estimation criterion. However, these methods do not lead to a minimum error rate result. In this paper, a novel discriminative learning method for environmental parameters, based on the Minimum Classification Error (MCE) criterion, is proposed. In the method, a simple classifier and the Generalized Probabilistic Descent (GPD) algorithm are adopted to iteratively learn the environmental parameters. The clean speech features are then estimated from the noisy speech features with the estimated environmental parameters, and these estimates are used in the back-end HMM classifier. Experiments show that the best error rate reduction of 32.1% is obtained, tested on a task of 18 isolated confusable Korean words, relative to a conventional HMM system.
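The abstract names the MCE criterion without giving its form. In its common formulation, assumed here, a misclassification measure compares the score of the correct class with a soft maximum over the competing classes, and a sigmoid turns that measure into a smooth, differentiable approximation of the 0/1 error that GPD can minimize by gradient descent. The function names, `eta`, and `gamma` below are illustrative.

```python
import math


def misclassification_measure(scores, correct, eta=1.0):
    """d = -g_correct + (1/eta) * log(mean over rivals of exp(eta * g_j)).

    d < 0 means the correct class wins; d > 0 means a rival wins.
    """
    rivals = [g for j, g in enumerate(scores) if j != correct]
    m = max(rivals)  # subtract the maximum for numerical stability
    mean_exp = sum(math.exp(eta * (g - m)) for g in rivals) / len(rivals)
    return -scores[correct] + m + math.log(mean_exp) / eta


def mce_loss(scores, correct, eta=1.0, gamma=1.0):
    """Sigmoid of the misclassification measure: a smooth 0/1 error."""
    d = misclassification_measure(scores, correct, eta)
    return 1.0 / (1.0 + math.exp(-gamma * d))


# Hypothetical discriminant scores for three classes.
scores = [5.0, 1.0, 0.0]
loss_when_right = mce_loss(scores, correct=0)  # correct class scores highest
loss_when_wrong = mce_loss(scores, correct=1)  # a rival scores higher
```

The loss stays below 0.5 when the correct class has the highest score and rises above 0.5 otherwise; larger `gamma` makes it closer to a hard error count.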

14.

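Design and simulation of a hybrid speech recognition model

宋志章 马丽 刘省非 李奇楠 《计算机仿真》2012,29(5):152-155

This paper studies speech recognition accuracy. Speech is a non-stationary signal that contains a large amount of noise. Most current recognition algorithms are based on linear theory and have difficulty tracking the nonlinear variation of speech signals, so their recognition accuracy is low. A hidden Markov model (HMM) and an SVM are combined into a hybrid noise-robust speech recognition model (HMM-SVM). The HMM models the temporal structure of the speech signal and produces output probabilities for the signal to be recognized; these output probabilities are then fed to the SVM as input for learning, yielding speech classification information; finally, the recognition decision is made from the HMM-SVM result. Simulation results show that HMM-SVM raises recognition accuracy and markedly improves the performance of the speech recognition system, especially at low signal-to-noise ratios.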
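As a rough illustration of the first stage, the sketch below computes the output probability of an observation sequence under each HMM with the forward algorithm in the log domain and collects the results into one score vector, which is the kind of input a downstream SVM could be trained on. It assumes discrete-observation HMMs for brevity; the paper's features, model topology, and SVM setup are not given in the abstract, and all names and numbers here are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_loglik(obs, log_pi, log_a, log_b):
    """Log P(obs | HMM) for a discrete-observation HMM (forward algorithm).

    log_pi[s]   : log initial probability of state s
    log_a[r][s] : log transition probability from state r to state s
    log_b[s][o] : log probability of emitting symbol o in state s
    """
    n_states = len(log_pi)
    alpha = [log_pi[s] + log_b[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            logsumexp([alpha[r] + log_a[r][s] for r in range(n_states)]) + log_b[s][o]
            for s in range(n_states)
        ]
    return logsumexp(alpha)


def hmm_score_vector(obs, models):
    """One log-likelihood per HMM; this vector would be the SVM input."""
    return [forward_loglik(obs, *model) for model in models]


def to_log(matrix):
    return [[math.log(v) for v in row] for row in matrix]


# Hypothetical two-state model with two output symbols.
model = (
    [math.log(0.6), math.log(0.4)],
    to_log([[0.7, 0.3], [0.4, 0.6]]),
    to_log([[0.5, 0.5], [0.1, 0.9]]),
)
score_vector = hmm_score_vector([0, 1], [model])
```

For this toy model the sequence probability works out to 0.113 + 0.1026 = 0.2156, so the single entry of `score_vector` is log(0.2156).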

15.

Stereo hidden Markov modeling for noise robust speech recognition

Xiaodong Cui Mohamed Afify Yuqing Gao Bowen Zhou 《Computer Speech and Language》2013,27(2):407-419

This paper investigates a noise robust technique for automatic speech recognition which exploits hidden Markov modeling of stereo speech features from clean and noisy channels. The HMM trained this way, referred to as stereo HMM, has in each state a Gaussian mixture model (GMM) with a joint distribution of both clean and noisy speech features. Given the noisy speech input, the stereo HMM gives rise to a two-pass compensation and decoding process where MMSE denoising based on N-best hypotheses is first performed and followed by decoding the denoised speech in a reduced search space on lattice. Compared to the feature space GMM-based denoising approaches, the stereo HMM is advantageous as it has finer-grained noise compensation and makes use of information of the whole noisy feature sequence for the prediction of each individual clean feature. Experiments on large vocabulary spontaneous speech from speech-to-speech translation applications show that the proposed technique outperforms its feature space counterpart in noisy conditions while still maintaining decent performance in clean conditions.

16.

Training Wideband Acoustic Models Using Mixed-Bandwidth Training Data for Speech Recognition

Michael L. Seltzer Alex Acero 《IEEE transactions on audio, speech, and language processing》2007,15(1):235-245

One serious difficulty in the deployment of wideband speech recognition systems for new tasks is the expense in both time and cost of obtaining sufficient training data. A more economical approach is to collect telephone speech and then restrict the application to operate at the telephone bandwidth. However, this generally results in suboptimal performance compared to a wideband recognition system. In this paper, we propose a novel expectation-maximization (EM) algorithm in which wideband acoustic models are trained using a small amount of wideband speech and a larger amount of narrowband speech. We show how this algorithm can be incorporated into the existing training schemes of hidden Markov model (HMM) speech recognizers. Experiments performed using wideband speech and telephone speech demonstrate that the proposed mixed-bandwidth training algorithm results in significant improvements in recognition accuracy over conventional training strategies when the amount of wideband data is limited.

17.

Environmental Independent ASR Model Adaptation/Compensation by Bayesian Parametric Representation

Wang X. O'Shaughnessy D. 《IEEE transactions on audio, speech, and language processing》2007,15(4):1204-1217

The mismatch between system training and operating conditions can seriously deteriorate the performance of automatic speech recognition (ASR) systems. Various techniques have been proposed to solve this problem in a specified speech environment. Employment of these techniques often involves modification on the ASR system structure. In this paper, we propose an environment-independent (EI) ASR model parameter adaptation approach based on Bayesian parametric representation (BPR), which is able to adapt ASR models to new environments without changing the structure of an ASR system. The parameter set of BPR is optimized by a maximum joint likelihood criterion which is consistent with that of the hidden Markov model (HMM)-based ASR model through an independent expectation-maximization (EM) procedure. Variations of the proposed approach are investigated in the experiments designed in two different speech environments: one is the noisy environment provided by the AURORA 2 database, and the other is the network environment provided by the NTIMIT database. Performances of the proposed EI ASR model compensation approach are compared to those of the cepstral mean normalization (CMN) approach, which is one of the standard techniques for additive noise compensation. The experimental results show that performances of ASR models in different speech environments are significantly improved after being adapted by the proposed BPR model compensation approach.
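The CMN baseline mentioned above is simple enough to sketch: the mean of each cepstral coefficient over an utterance is subtracted from every frame. The version below is a minimal per-utterance sketch with a hypothetical function name and toy two-dimensional frames; practical front ends often use sliding-window or variance-normalizing variants instead.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-utterance mean of each cepstral coefficient."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[d] for frame in frames) / n for d in range(dim)]
    return [[frame[d] - mean[d] for d in range(dim)] for frame in frames]


# Toy utterance of two frames with two coefficients each.
utterance = [[1.0, 2.0], [3.0, 6.0]]
normalized = cepstral_mean_normalize(utterance)

# The same utterance with a constant offset added to every frame.
shifted = [[value + 5.0 for value in frame] for frame in utterance]
normalized_shifted = cepstral_mean_normalize(shifted)
```

Because any offset that is constant across the utterance is part of the mean, `normalized` and `normalized_shifted` are identical.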

18.

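Hidden Markov model based on factor analysis and its training algorithm

王新民 姚天任 《计算机工程与应用》2004,40(15):79-81

Although the hidden Markov model based on Gaussian distributions with diagonal covariance matrices (Hidden Markov Model Based on Diagonal Gaussian distributions, HMM-DG) is widely used in modern large-vocabulary continuous speech recognition systems, HMM-DG is deficient in modeling intra-frame feature correlation. This paper combines factor analysis with the Gaussian mixture modeling of HMM-DG and proposes a flexible hidden Markov model framework for intra-frame feature correlation, the hidden Markov model based on factor analysis (Hidden Markov Model Based on Factor Analysis, HMM-FA), and derives a training algorithm for HMM-FA. Simulation experiments show that, under the same conditions, HMM-FA outperforms HMM-DG.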
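The abstract does not give the parametrization of HMM-FA. The standard factor analysis model, shown below as an assumption, explains why such a framework sits between diagonal and full covariance Gaussians. Here x is a d-dimensional feature frame, z is a q-dimensional latent factor with q < d, Λ is the d × q loading matrix, and Ψ is diagonal.

```latex
% Standard factor analysis model for one Gaussian component (assumed form)
x = \mu + \Lambda z + \varepsilon, \qquad
z \sim \mathcal{N}(0, I_q), \qquad
\varepsilon \sim \mathcal{N}(0, \Psi), \quad \Psi \text{ diagonal}

% Implied marginal distribution of x
x \sim \mathcal{N}\!\left( \mu,\; \Lambda \Lambda^{\top} + \Psi \right)

% Covariance parameters per Gaussian
\text{diagonal: } d, \qquad
\text{factor analysis: } dq + d, \qquad
\text{full: } \tfrac{d(d+1)}{2}
```

The off-diagonal entries of ΛΛᵀ carry the intra-frame correlations that a diagonal covariance cannot represent, and setting q = 0 recovers the diagonal model.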

19.

Using geometric spectral subtraction approach for feature extraction for DSR front-end Arabic system

Zied Sakka Elhem Techini MedSalim Bouhlel 《International Journal of Speech Technology》2017,20(3):645-650

Noise robustness and the Arabic language are still considered the main challenges for speech recognition in mobile environments. This paper contributes to these trends by proposing a new robust Distributed Speech Recognition (DSR) system for Arabic. A speech enhancement algorithm is applied to the noisy speech as a robust front-end pre-processing stage to improve recognition performance, while an isolated Arabic word engine, designed and developed using an HMM, performs recognition at the back end. To test the engine, several conditions including clean, noisy, and enhanced noisy speech were investigated, together with speaker-dependent and speaker-independent tasks. In the experiments on the noisy database, multi-condition training outperforms clean training for all noise types in terms of recognition rate. The results also indicate that the enhancement method increases the DSR accuracy of the system under severe noise, especially at low SNR down to 10 dB.

20.

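Application of an improved hidden Markov model to speech recognition (total citations: 1; self-citations: 0; citations by others: 1)

胡磊 卢珞先 黄涛 《信息与控制》2007,36(6):0-780

A new Markov model, the asynchronous hidden Markov model, is proposed. To handle frames lost during speech recognition in noisy environments, the model adds a new hidden time-index variable Ck and estimates the state sequence that corresponds to the actual observations, which makes it possible to model irregularly or incompletely sampled data. The forward-backward algorithm suited to the asynchronous HMM and the EM algorithm used for training are described in detail, and the computation of the transition matrix is optimized. Finally, in simulation experiments, a classical HMM and the asynchronous HMM are each used to recognize the same speech data with randomly removed frames. The results show that, for the same removed frames, the asynchronous HMM has a lower recognition error rate than the classical HMM.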