
1.

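吕勇 吴镇扬 《电子与信息学报》2010,32(1):107-111

In real environments, mismatch between the test and training environments causes the performance of speech recognition systems to degrade sharply. Model adaptation is one effective way to reduce the impact of this mismatch: it uses a small amount of adaptation data from the test environment to transform the HMM parameters to that environment. This paper applies the vector Taylor series (VTS) to model adaptation, transforming both the mean vectors and the covariance matrices of the HMM so that they match the actual environment. Experiments show that the proposed algorithm outperforms the MLLR algorithm and VTS-based feature compensation, with especially large gains in low-SNR environments.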
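As a rough illustration of how a vector Taylor series can move both the mean and the variance of a Gaussian toward a noisy environment, the sketch below applies a first-order expansion of the standard additive-noise relation y = x + log(1 + exp(n - x)). It rests on assumptions that the abstract does not state: the model is in the log-spectral domain, covariances are diagonal, there is no channel term, and the noise mean and variance are already known (the paper estimates the environment from adaptation data). The function name `vts_adapt` is hypothetical.

```python
import math

def vts_adapt(mu_x, var_x, mu_n, var_n):
    """First-order VTS update of one diagonal Gaussian, dimension by dimension.

    Assumed mismatch function (log-spectral domain, additive noise only):
        y = x + log(1 + exp(n - x))
    expanded around the clean-speech mean mu_x and the noise mean mu_n.
    """
    mu_y, var_y = [], []
    for mx, vx, mn, vn in zip(mu_x, var_x, mu_n, var_n):
        g = 1.0 / (1.0 + math.exp(mn - mx))        # dy/dx at the expansion point
        mu_y.append(mx + math.log1p(math.exp(mn - mx)))
        # dy/dn = 1 - g, so the two variances are mixed with squared Jacobians
        var_y.append(g * g * vx + (1.0 - g) ** 2 * vn)
    return mu_y, var_y
```

When the noise mean is far below the speech mean, g approaches 1 and the Gaussian is left almost unchanged; when noise dominates, g approaches 0 and the adapted Gaussian approaches the noise model.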

2.

Use of Microphone Array and Model Adaptation for Hands-Free Speech Acquisition and Recognition

Jen-Tzung Chien Jain-Ray Lai 《Journal of Signal Processing Systems》2004,36(2-3):141-151

This paper presents a combined microphone array and model adaptation algorithm for hands-free speech recognition. Our purpose is to remove the inconvenience of using a head-mounted or hand-held microphone with a conventional speech recognizer. To improve speech quality under car noise interference, a linear microphone array is used as a robust acquisition system. A time-domain coherence measure (TDCM) is applied to reliably estimate the time delay between speech signals collected by different microphones. The estimated delay is used in a delay-and-sum beamformer for speech enhancement. Further, we adapt the speech hidden Markov models to bring them closer to the acoustic conditions of the enhanced test speech for robust speech recognition. In acquisition and recognition experiments using connected Chinese digits, we found that TDCM can effectively estimate the time delay. Increasing the speech sampling rate helps in determining the time delay. Incorporating the model adaptation scheme significantly reduces recognition errors with moderate computational overhead.
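The delay-and-sum step described above can be sketched as follows. This is only an illustration under simplifying assumptions: delays are whole samples, and the delay is found from the peak of a plain cross-correlation, which stands in for the paper's TDCM (the abstract does not define that measure). The function names `estimate_delay` and `delay_and_sum` are hypothetical.

```python
def estimate_delay(ref, sig, max_lag):
    """Return the integer lag d (in samples) that maximizes sum_t ref[t] * sig[t + d].

    Plain cross-correlation peak picking; a stand-in for TDCM, not TDCM itself.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for t, r in enumerate(ref):
            u = t + lag
            if 0 <= u < len(sig):
                score += r * sig[u]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def delay_and_sum(channels, delays):
    """Undo each channel's delay relative to channel 0, then average the channels.

    Samples shifted in from outside a channel are treated as zero.
    """
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        for t in range(n):
            u = t + d
            if 0 <= u < len(ch):
                out[t] += ch[u]
    return [v / len(channels) for v in out]
```

Averaging time-aligned channels keeps the speech component intact while uncorrelated noise partially cancels, which is why the accuracy of the delay estimate matters.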

3.

Bayesian channel equalisation and robust features for speech recognition

Milner P. Vaseghi S.V. 《Vision, Image and Signal Processing, IEE Proceedings -》1996,143(4):223-231

The use of a speech recognition system with telephone channel environments, or different microphones, requires channel equalisation. In speech recognition, the speech model provides a bank of statistical information that can be used in the channel identification and equalisation process. The authors consider HMM-based channel equalisation, and present results demonstrating that substantial improvement can be obtained through the equalisation process. An alternative method, for speech recognition, is to use a feature set which is more robust to channel distortion. Channel distortions result in an amplitude tilt of the speech cepstrum, and therefore differential cepstral features provide a measure of immunity to channel distortions. In particular the cepstral-time feature matrix, in addition to providing a framework for representing speech dynamics, can be made robust to channel distortions. The authors present results demonstrating that a major advantage of cepstral-time matrices is their channel-insensitive character.
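The robustness of differential cepstral features can be illustrated with a small sketch. It assumes the channel is time-invariant, so that it adds the same offset vector to every cepstral frame; under that assumption, a frame-to-frame difference cancels the offset. The function name `delta` and the two-frame central difference are illustrative choices, not the feature definition used in the paper.

```python
def delta(frames):
    """Central difference (c[t+1] - c[t-1]) / 2 for every interior frame.

    `frames` is a list of cepstral vectors, one per time step.
    """
    return [
        [(nxt - prev) / 2.0 for prev, nxt in zip(frames[t - 1], frames[t + 1])]
        for t in range(1, len(frames) - 1)
    ]

# A fixed channel offset added to every frame leaves the deltas unchanged.
clean = [[1.0, 2.0], [2.0, 0.0], [4.0, 1.0], [3.0, 5.0]]
offset = [0.5, -1.5]
distorted = [[c + o for c, o in zip(frame, offset)] for frame in clean]
```

Comparing `delta(clean)` with `delta(distorted)` shows identical values, whereas the static frames differ by the offset.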

4.

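Robust Voice Activity Detection Using Sub-band Long-Term Signal Variability Features

蔡铁 唐飞 龙志军 《电视技术》2014,38(19)

To improve the accuracy of voice activity detection (VAD) at low signal-to-noise ratios, a VAD algorithm based on sub-band long-term signal variability features is proposed. The speech signal is transformed to the frequency domain and decomposed into several non-overlapping sub-bands; long-term signal variability features are extracted from each sub-band signal; speech and non-speech models are then built online with GMMs, and the VAD decision is made from the likelihood ratio of the two models. Experimental results show that the algorithm significantly improves VAD accuracy at low SNRs and is robust across a variety of noise types and SNR conditions. Experiments with a speech recognition system show that the algorithm effectively improves recognition rates in noisy environments.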

5.

Cepstral behaviour due to additive noise and a compensation scheme for noisy speech recognition

Hwang T.-H. Lee L.-M. Wang H.-C. 《Vision, Image and Signal Processing, IEE Proceedings -》1998,145(5):316-321

The speech cepstral coefficients affected by additive noise are investigated. The cepstral vector changes as the level of additive noise increases. The behaviour of this change shows that the cepstral vector shrinks in its norm and converges to the cepstral vector of the noise. This nonlinear behaviour of the cepstral vector can be approximated by a simple linear expression. Based on this representation, a model adaptation method is developed using deviation vectors. For every model state mean, a deviation vector is calculated according to the extracted noise spectrum and a pre-defined noise-to-signal ratio. During the pattern matching, an optimal scaling factor for the deviation vector is determined frame by frame, and the scaled deviation vector is added to the state mean of speech models so that the clean speech models are adapted to the noisy environment. Experimental results show that the proposed method is effective for white noise and coloured noise. It also outperforms the weighted projection measure method in experiments.
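The frame-by-frame scaling of a deviation vector can be sketched as a one-dimensional least-squares problem. The abstract does not say which criterion defines the "optimal" scaling factor, so the squared Euclidean distance used here is an assumption, as are the non-negativity clip and the function name `adapt_mean`.

```python
def adapt_mean(obs, mean, dev):
    """Scale the deviation vector `dev` so that mean + alpha * dev is closest to `obs`.

    Assumed criterion: squared Euclidean distance, which gives the closed form
        alpha = <obs - mean, dev> / <dev, dev>
    alpha is clipped at zero so the mean only moves along the deviation direction.
    """
    diff = [o - m for o, m in zip(obs, mean)]
    denom = sum(d * d for d in dev)
    alpha = sum(x * d for x, d in zip(diff, dev)) / denom if denom > 0 else 0.0
    alpha = max(alpha, 0.0)
    adapted = [m + alpha * d for m, d in zip(mean, dev)]
    return alpha, adapted
```

For example, with `mean = [1.0, 2.0]`, `dev = [2.0, 0.0]`, and an observed frame `[4.0, 5.0]`, the call returns a scaling factor of 1.5 and an adapted mean of `[4.0, 2.0]`.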

6.

Performance Evaluation of Silence-Feature Normalization Model using Cepstrum Features of Noise Signals

SangYeob Oh Kyungyong Chung 《Wireless Personal Communications》2018,98(4):3287-3297

Speech enhancement algorithms play an important role in speech signal processing, and many have been studied over the past several decades. A speech enhancement algorithm uses a noise removal method and a statistical model filter to analyze the speech signal in the frequency domain. Spectral subtraction and Wiener filters are representative algorithms. They offer excellent speech enhancement performance, but their performance deteriorates under specific noise types or in low signal-to-noise ratio (SNR) environments. In addition, when the noise is estimated incorrectly, noise remains in the voice signal, so that the spectrum corresponding to the voice signal is distorted or a frame corresponding to the voice signal cannot be retrieved, and voice recognition performance deteriorates. This deterioration in speech recognition performance arises from the difference between the speech being recognized and the training model. We use a silence-feature normalization model to improve the recognition rate affected by this difference in noisy environments. Conventional silence-feature normalization has the problem that the energy of the silent part increases, which affects recognition performance because the boundaries used to categorize the voice become unclear. In this study, we use the cepstrum features of the noise signals in the silence-feature normalization model to improve its performance on low-SNR signals by setting a reference value for voiced and unvoiced classification. The resulting recognition rates are higher than those of the other methods compared.

7.

Multimedia Corpus of In-Car Speech Communication

Nobuo Kawaguchi Kazuya Takeda Fumitada Itakura 《Journal of Signal Processing Systems》2004,36(2-3):153-159

An ongoing project for constructing a multimedia corpus of dialogues under driving conditions is reported. More than 500 subjects have been enrolled in this corpus development and more than 2 gigabytes of signals have been collected during approximately 60 minutes of driving per subject. Twelve microphones and three video cameras are installed in a car to obtain audio and video data. In addition, five signals regarding car control and the location of the car provided by the Global Positioning System (GPS) are recorded. All signals are simultaneously recorded directly onto the hard disk of the PCs onboard the specially designed data collection vehicle (DCV). The in-car dialogues are initiated by a human operator, an automatic speech recognition (ASR) system and a Wizard of Oz (WOZ) system so as to collect as many speech disfluencies as possible. In addition to the details of data collection, this paper describes preliminary results on intermedia signal conversion as an example of corpus-based in-car speech signal processing research.

8.

Comparison of some noise-compensation methods for speech recognition in adverse environments

Milner B.P. Vaseghi S.V. 《Vision, Image and Signal Processing, IEE Proceedings -》1994,141(5):280-288

A comparative study is presented of three noise-compensation schemes, namely spectral subtraction, Wiener filters, and noise adaptation, for hidden-Markov-model-based speech recognition in adverse environments. The noise-compensation methods are evaluated on a spoken-digit database, in the presence of car noise and helicopter noise at different signal-to-noise ratios. Experimental results demonstrate that the noise-compensation methods achieve a substantial improvement in recognition accuracy across a wide range of signal-to-noise ratios. At a signal-to-noise ratio of -6 dB the recognition accuracy is improved from 11% to 83%. The use of cepstral-time matrices as an improved speech representation is also considered, and their combination with the noise-compensation methods is shown. Experiments show that the cepstral-time matrix is a more robust feature than a vector of identical size, composed of a combination of cepstral and differential cepstral features.
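Of the three schemes compared, spectral subtraction is the simplest to illustrate. The sketch below shows a common textbook form, power-spectrum subtraction with a spectral floor; it is not necessarily the variant evaluated in the paper, and the `floor` parameter and the function name `spectral_subtract` are assumptions made for this example.

```python
def spectral_subtract(noisy_power, noise_power, floor=0.01):
    """Subtract a noise power estimate from each bin of a noisy power spectrum.

    A bin whose difference falls below `floor` times the noisy power is set to
    that floor instead, which prevents negative power estimates.
    """
    return [max(y - n, floor * y) for y, n in zip(noisy_power, noise_power)]
```

With `noisy_power = [4.0, 1.0, 2.0]`, `noise_power = [1.0, 2.0, 2.0]`, and `floor=0.1`, the first bin becomes 3.0 and the other two fall back to the floor values 0.1 and 0.2.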

9.

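Research on an Eigenvoice-Based Speaker Adaptation Algorithm

朴春俊 李玉萍 韩永成 《信息技术》2007,31(8):101-103

This paper introduces the eigenvoice (EV) method for speaker adaptation. The linear combination coefficients are computed with maximum a posteriori eigen-decomposition (MAPED) instead of the conventional maximum likelihood eigen-decomposition (MLED). Experiments compare the performance of the two methods. The results show that MAPED yields a somewhat lower recognition error rate than MLED and makes the system more robust.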
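The difference between a maximum likelihood and a maximum a posteriori estimate of eigenvoice weights can be sketched in a deliberately simplified setting. The assumptions here go beyond the abstract: the eigenvoices are orthonormal, the observation is a complete speaker supervector with isotropic Gaussian noise of variance `noise_var`, and each weight has a zero-mean Gaussian prior of variance `prior_var`. Under those assumptions the MAP weights are the ML weights shrunk toward zero. The function names are hypothetical, and actual MLED and MAPED operate on HMM statistics rather than on a supervector directly.

```python
def ml_weights(eigenvoices, mean_sv, obs_sv):
    """Least-squares (ML) weights: project obs - mean onto each orthonormal eigenvoice."""
    diff = [o - m for o, m in zip(obs_sv, mean_sv)]
    return [sum(e_i * d_i for e_i, d_i in zip(e, diff)) for e in eigenvoices]

def map_weights(eigenvoices, mean_sv, obs_sv, prior_var, noise_var):
    """MAP weights under a zero-mean Gaussian prior: shrink the ML weights."""
    shrink = prior_var / (prior_var + noise_var)
    return [shrink * w for w in ml_weights(eigenvoices, mean_sv, obs_sv)]

def adapted_supervector(eigenvoices, mean_sv, weights):
    """Speaker model = mean supervector + weighted sum of eigenvoices."""
    out = list(mean_sv)
    for w, e in zip(weights, eigenvoices):
        out = [o + w * e_i for o, e_i in zip(out, e)]
    return out
```

The shrinkage keeps the weights close to the prior when little adaptation data is available, which is the usual motivation for preferring a MAP estimate over an ML one.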

10.

Adaptation scheme for hidden Markov models in noisy speech recognition

Tai-Hwei Hwang Hsiao-Chuan Wang 《Electronics letters》1997,33(4):257-258

Shrinkage of the mean vectors and the variances in HMM due to additive white noise is an important issue for the speech recogniser. By giving an assumed relation between the adaptation factors for mean vector and variances, an optimal adaptation factor can be found by using the maximum likelihood method.

11.

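High-Performance Mandarin Digit String Speech Recognition

李虎生 刘加 刘润生 《电子学报》2001,29(5):595-599

This paper presents a high-performance speaker-independent continuous speech recognition system for Mandarin digit strings. Its acoustic models are based on Mel cepstral coefficients and continuous HMMs; recognition uses a multi-candidate frame-synchronous search algorithm, and the MCE algorithm is used in training to improve the system's discriminative power. Experiments show recognition rates of 94.8% (variable-length digit strings) and 96.8% (fixed-length digit strings). To make the system more practical, the paper also studies a MAP-based speaker adaptation algorithm and a confidence-based rejection algorithm. After adaptation, the error rate falls by more than 40% in relative terms; when 5% of correct utterances are rejected, the recognition rate rises to 96.9% (variable-length digit strings) and 98.7% (fixed-length digit strings).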
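MAP speaker adaptation of a Gaussian mean is commonly written as an interpolation between the prior (speaker-independent) mean and the adaptation data. The sketch below shows that textbook form for a single Gaussian with every frame assigned to it; the abstract does not give the paper's exact update, and the prior weight `tau` and the function name `map_mean` are assumptions.

```python
def map_mean(prior_mean, frames, tau=10.0):
    """MAP estimate of a Gaussian mean from adaptation frames.

    mu_hat = (tau * prior_mean + sum of frames) / (tau + number of frames)
    A larger `tau` trusts the prior more; with no frames the prior is returned.
    """
    n = len(frames)
    dim = len(prior_mean)
    sums = [sum(f[i] for f in frames) for i in range(dim)]
    return [(tau * prior_mean[i] + sums[i]) / (tau + n) for i in range(dim)]
```

As more adaptation frames arrive, the estimate moves from the prior mean toward the sample mean of the speaker's data, which is why MAP adaptation behaves sensibly with very little data.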

12.

Adaptation of hidden Markov model for telephone speech recognition and speaker adaptation

Chien J.-T. Wang H.-C. 《Vision, Image and Signal Processing, IEE Proceedings -》1997,144(3):129-135

The authors propose a channel compensation method for the hidden Markov model (HMM) parameters in automatic speech recognition. The proposed approach is to adapt the existing reference models to a new channel environment by using a small amount of adaptation data. The concept of HMM parameter adaptation by incorporating the corresponding phone-dependent channel compensation (PDCC) vectors is applied to improve the performance of speech recognition. Two extended PDCC techniques are presented. One is based on the refinement of PDCC using vector quantisation. The other is based on the interpolation of compensation vectors. Both techniques are evaluated in experiments on telephone speech recognition and speaker adaptation. The experimental results show that the performance can be significantly improved.

13.

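An Improved Kernelized Correlation Filter Tracking Algorithm for In-Vehicle Video

黄立勤 朱飘 《电子与信息学报》2018,40(8):1887-1894

Correlation filter trackers tend to fail on in-vehicle video because of complex environments and changes in target scale. To address this, the paper proposes a scale-adaptive correlation filter tracking algorithm based on background information. First, a background-aware correlation filter tracker fused with histogram of oriented gradients (HOG) features predicts the target's position in the next frame; next, an image patch is selected at the predicted position for detection; finally, a dynamic scale-ratio pyramid model is used to estimate the target's scale. The algorithm was tested on 23 in-vehicle video sequences from the KITTI dataset and 4 annotated in-vehicle videos recorded in China. The results show that it effectively reduces interference from the complex backgrounds and target scale changes found in in-vehicle environments, that its overall performance exceeds that of mainstream correlation filter algorithms such as KCF, DSST, SAMF, and STAPLE, and that it tracks targets robustly under complex backgrounds and scale changes.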

14.

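An Improved Log-Spectral Amplitude Estimation Enhancement Algorithm Based on Speech Absence Probability

赵晓群 黄小珊 宫云梅 《信号处理》2008,24(6)

A key problem in spectral-subtraction speech enhancement is balancing the degree of noise removal, residual musical noise, and speech distortion. To address it, this paper proposes an improved log-spectral estimation enhancement algorithm based on speech absence probability. Drawing on the idea of speech absence probability, the algorithm distinguishes between two states, pure-noise frames and noisy-speech frames, and updates the minimum mean-square error log-spectral gain of the speech differently for each in real time. It also uses the speech absence probability (SAP) parameter to adjust the smoothing coefficient adaptively, so that as the noise environment changes the algorithm adaptively trades off noise removal, residual "musical noise", and speech distortion. Experiments show that, at the same degree of noise removal, both speech distortion and musical noise are reduced relative to other spectral subtraction methods, with a more pronounced advantage in low-SNR environments. Moreover, because the smoothing parameter reuses the SAP parameter, no extra computation is needed, which suits real-time processing.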

15.

Noise‐Robust Speaker Recognition Using Subband Likelihoods and Reliable‐Feature Selection

Sungtak Kim Mikyong Ji Hoirin Kim 《ETRI Journal》2008,30(1):89-100

We consider the feature recombination technique in a multiband approach to speaker identification and verification. To overcome the ineffectiveness of conventional feature recombination in broadband noisy environments, we propose a new subband feature recombination which uses subband likelihoods and a subband reliable‐feature selection technique with an adaptive noise model. In the decision step of speaker recognition, a few very low unreliable feature likelihood scores can cause a speaker recognition system to make an incorrect decision. To overcome this problem, reliable‐feature selection adjusts the likelihood scores of an unreliable feature by comparison with those of an adaptive noise model, which is estimated by the maximum a posteriori adaptation technique using noise features directly obtained from noisy test speech. To evaluate the effectiveness of the proposed methods in noisy environments, we use the TIMIT database and the NTIMIT database, which is the corresponding telephone version of TIMIT database. The proposed subband feature recombination with subband reliable‐feature selection achieves better performance than the conventional feature recombination system with reliable‐feature selection.

16.

Nonlinear cepstral equalisation method for noisy speech recognition

Lee L.-M. Chen J.-K. Wang H.-C. 《Vision, Image and Signal Processing, IEE Proceedings -》1994,141(6):397-402

The authors deal with the problem of automatic speech recognition in the presence of additive white noise. The effect of noise is modelled as an additive term to the power spectrum of the original clean speech. The cepstral coefficients of the noisy speech are then derived from this model. The reference cepstral vectors trained from clean speech are adapted to their appropriate noisy version to best fit the testing speech cepstral vector. The LPC coefficients, LPC-derived cepstral coefficients, and the distance between test and reference are all regarded as functions of the noise ratio (the spectral power ratio of noise to noisy speech). A gradient-based algorithm is proposed to find the optimal noise ratio as well as the minimum distance between the test cepstral vector and the noise-adapted reference. A recursive algorithm based on Levinson-Durbin recursion is proposed to simultaneously calculate the LPC coefficients and the derivatives of the LPC coefficients with respect to the noise ratio. The stability of the proposed adaptation algorithm is also addressed. Experiments on multispeaker (50 males and 50 females) isolated Mandarin digit recognition demonstrate remarkable performance improvements over the uncompensated method in noisy environments. The results are also compared to the projection-based approach, and experiments show that the proposed method is superior to the projection approach in severely noisy environments.

17.

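An Interpolation Adaptation Algorithm Based on Gaussian Similarity Analysis

吕萍 王作英 陆大 《电子学报》2001,29(Z1):1759-1761

Fast speaker adaptation is important in speaker-independent continuous speech recognition. Most popular adaptation algorithms adapt only the means. The algorithm proposed here adapts the covariance matrices quickly. It uses Gaussian similarity to measure the distance between covariance matrices and, from this measure, builds a binary decision tree that reflects the structural relationships among the covariance matrices. Each intermediate node of the tree contains a class centroid. On the basis of the decision tree, multiple class centroids associated with speaker-dependent models are trained. At adaptation time, the adapted covariance matrices are obtained by linearly interpolating these class centroids. Experimental results show that, with only a single sentence of adaptation data, the method lowers the system error rate from 29.49% to 27.55%.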

18.

Design and Implementation of Subspace-Based Speech Enhancement Under In-Car Noisy Environments

Chung-Hsien Yang Jia-Ching Wang Jhing-Fa Wang Chung-Hsien Wu Kai-Hsing Chang 《Vehicular Technology, IEEE Transactions on》2008,57(3):1466-1479

In this paper, a new subspace-based speech enhancement model is presented for in-car speech enhancement. To effectively suppress background noise, this model incorporates a perceptual filterbank and an auditory gain adaptation derived from a psychoacoustic model into a signal subspace approach. The projection approximation subspace tracking deflation (PASTd) algorithm is used to track the signal subspace. For real-time processing, a system-on-a-programmable-chip architecture and a very large scale integration design of the PASTd algorithm are proposed. To realize a pipeline computation, this paper presents a pipelined PASTd architecture without data-dependent hazards. The maximum clock rate is 9.7 MHz, and the typical clock rate, which achieves the real-time requirement, is 4.6 MHz. The corresponding architecture was experimentally verified via an ALTERA EPXA10 development board.

19.

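Adaptation Algorithms for Mandarin Digit Speech Recognition

李虎生 杨明杰 《电路与系统学报》1999,4(2):1-6

Speaker adaptation is one effective way to improve the performance of speaker-independent speech recognition. This paper applies the MAP algorithm to Mandarin digit speech recognition and discusses several ways to speed up adaptation, as well as the effect of adaptation on speakers other than the one adapted to. Experiments show that the MAP algorithm effectively lowers the Mandarin digit error rate for the adapted speaker while having very little effect on performance for other speakers.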

20.

A Prior Model of Structural SVMs for Domain Adaptation

Changki Lee Myung‐Gil Jang 《ETRI Journal》2011,33(5):712-719

In this paper, we study the problem of domain adaptation for structural support vector machines (SVMs). We consider a number of domain adaptation approaches for structural SVMs and evaluate them on named entity recognition, part‐of‐speech tagging, and sentiment classification problems. Finally, we show that a prior model for structural SVMs outperforms other domain adaptation approaches in most cases. Moreover, the training time for this prior model is reduced compared to other domain adaptation methods with improvements in performance.