1.
A. S. Kolokolov 《Automation and Remote Control》2002,63(3):494-501
Consideration was given to the transformations of speech in the frequency domain which precede extraction of the informative attributes of phonemes. A processing of the speech spectrum ensuring stability of recognition in the presence of frequency distortions and additive noise was proposed. It is based on linear bandpass filtering of the logarithmic amplitude spectrum and subsequent nonlinear transformation that models the effect of lateral inhibition in the auditory analyzer.
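The two stages named in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: the band-pass filter is assumed to be a difference of two moving averages along the frequency axis, half-wave rectification stands in for the lateral-inhibition nonlinearity, and the names `moving_average`, `process_spectrum`, `narrow`, and `wide` are hypothetical.

```python
import math

def moving_average(values, half_width):
    """Average each bin with its neighbours; edge bins use only the bins that exist."""
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def process_spectrum(amplitudes, narrow=1, wide=4):
    """Band-pass filter the log amplitude spectrum, then rectify (assumed nonlinearity)."""
    log_spec = [math.log(a) for a in amplitudes]   # amplitudes must be positive
    smooth = moving_average(log_spec, narrow)      # suppresses fine ripple
    trend = moving_average(log_spec, wide)         # captures slow tilt and overall gain
    band = [s - t for s, t in zip(smooth, trend)]  # difference acts as a band-pass filter
    return [max(0.0, b) for b in band]             # keep peaks, discard valleys

spectrum = [1.0, 2.0, 8.0, 2.0, 1.0, 1.0, 4.0, 1.0, 1.0, 2.0]
features = process_spectrum(spectrum)
louder = process_spectrum([10.0 * a for a in spectrum])
```

Because a constant gain becomes an additive constant in the log domain and both moving averages pass a constant unchanged, `features` and `louder` agree up to rounding, which is one simple form of the robustness to frequency distortion that the abstract describes.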
2.
《IEEE transactions on audio, speech, and language processing》2006,14(6):2109-2121
In this paper, we introduce Subband LIkelihood-MAximizing BEAMforming (S-LIMABEAM), a new microphone-array processing algorithm specifically designed for speech recognition applications. The proposed algorithm is an extension of the previously developed LIMABEAM array processing algorithm. Unlike most array processing algorithms which operate according to some waveform-level objective function, the goal of LIMABEAM is to find the set of array parameters that maximizes the likelihood of the correct recognition hypothesis. Optimizing the array parameters in this manner results in significant improvements in recognition accuracy over conventional array processing methods when speech is corrupted by additive noise and moderate levels of reverberation. Despite the success of the LIMABEAM algorithm in such environments, little improvement was achieved in highly reverberant environments. In such situations where the noise is highly correlated to the speech signal and the number of filter parameters to estimate is large, subband processing has been used to improve the performance of LMS-type adaptive filtering algorithms. We use subband processing principles to design a novel array processing architecture in which select groups of subbands are processed jointly to maximize the likelihood of the resulting speech recognition features, as measured by the recognizer itself. By creating a subband filtering architecture that explicitly accounts for the manner in which recognition features are computed, we can effectively apply the LIMABEAM framework to highly reverberant environments. By doing so, we are able to achieve improvements in word error rate of over 20% compared to conventional methods in highly reverberant environments.
3.
A. S. Kolokolov 《Automation and Remote Control》2003,64(6):985-994
Preprocessing of the speech signal before recognition of phonemes was considered. Methods of processing the spectrum and segmenting the speech signal for stable speech recognition in the presence of frequency distortions were proposed. They are based on a procedure of linear filtering of the logarithmic spectrum envelope.
4.
Furuya K. Kataoka A. 《IEEE transactions on audio, speech, and language processing》2007,15(5):1579-1591
A robust dereverberation method is presented for speech enhancement in a situation requiring adaptation where a speaker shifts his/her head under reverberant conditions causing the impulse responses to change frequently. We combine correlation-based blind deconvolution with modified spectral subtraction to improve the quality of inverse-filtered speech degraded by the estimation error of inverse filters obtained in practice. Our method computes inverse filters by using the correlation matrix between input signals that can be observed without measuring room impulse responses. Inverse filtering reduces early reflection, which has most of the power of the reverberation, and then, spectral subtraction suppresses the tail of the inverse-filtered reverberation. The performance of our method in adaptation is demonstrated by experiments using measured room impulse responses. The subjective results indicated that this method provides superior speech quality to each of the individual methods: blind deconvolution and spectral subtraction.
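For readers unfamiliar with the second stage mentioned here, the following is a minimal sketch of plain power spectral subtraction applied to one frame. It is not the modified variant used by the authors; the function name, the over-subtraction factor `alpha`, the spectral `floor`, and the assumption that an estimate of the late-reverberation power is already available are all illustrative.

```python
def spectral_subtraction(power, interference_power, alpha=1.0, floor=0.01):
    """Subtract an interference power estimate from each frequency bin of one frame.

    power              -- power spectrum of the observed frame
    interference_power -- assumed estimate of the reverberation-tail (or noise) power
    alpha              -- over-subtraction factor (hypothetical parameter)
    floor              -- fraction of the input power kept as a lower bound
    """
    return [
        max(p - alpha * n, floor * p)  # the floor prevents negative power values
        for p, n in zip(power, interference_power)
    ]

enhanced = spectral_subtraction([4.0, 1.0, 0.5], [1.0, 1.0, 1.0])
```

Bins where the estimate exceeds the observed power are clamped to the floor rather than set to zero, a common way to limit the "musical noise" that hard clipping tends to produce.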
5.
Tomohiro Nakatani Keisuke Kinoshita Masato Miyoshi 《IEEE transactions on audio, speech, and language processing》2007,15(1):80-95
The distant acquisition of acoustic signals in an enclosed space often produces reverberant artifacts due to the room impulse response. Speech dereverberation is desirable in situations where the distant acquisition of acoustic signals is involved. These situations include hands-free speech recognition, teleconferencing, and meeting recording, to name a few. This paper proposes a processing method, named Harmonicity-based dEReverBeration (HERB), to reduce the amount of reverberation in the signal picked up by a single microphone. The method makes extensive use of harmonicity, a unique characteristic of speech, in the design of a dereverberation filter. In particular, harmonicity enhancement is proposed and demonstrated as an effective way of estimating a filter that approximates an inverse filter corresponding to the room impulse response. Two specific harmonicity enhancement techniques are presented and compared: one based on an average transfer function and the other on the minimization of a mean squared error function. Prototype HERB systems are implemented by introducing several techniques to improve the accuracy of dereverberation filter estimation, including time warping analysis. Experimental results show that the proposed methods can achieve high-quality speech dereverberation, when the reverberation time is between 0.1 and 1.0 s, in terms of reverberation energy decay curves and automatic speech recognition accuracy.
6.
《IEEE transactions on audio, speech, and language processing》2008,16(8):1512-1527
7.
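A template-based dynamic compensation scheme (PDC) is proposed to improve the robustness of automatic speech recognition (ASR) in mobile environments. In PDC, a fixed template with a bias is defined to correct for the environmental variables present at training time, under the assumption that the training data were collected in a set of predefined application scenarios. At recognition time, the instantaneous bias is obtained as a linear weighting of several candidate templates. To estimate the weights quickly, a Bayesian learning procedure based on speech-dependent prior templates (PDC-SPE) is proposed. Experiments in outdoor environments show that the PDC-SPE learning procedure outperforms conventional compensation and adaptation methods, with a relative reduction of 20-25% in the recognition error rate after training.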
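The linear weighting step described in this abstract can be written out as a short sketch. It only shows how a bias would be combined from per-template biases once the weights are known; the Bayesian weight estimation (PDC-SPE) is not reproduced, and the function names and the choice to subtract the bias from the feature vector are assumptions.

```python
def combine_biases(weights, template_biases):
    """Return the weighted sum of per-template bias vectors."""
    dim = len(template_biases[0])
    return [
        sum(w * bias[d] for w, bias in zip(weights, template_biases))
        for d in range(dim)
    ]

def compensate(feature, weights, template_biases):
    """Remove the combined bias from one feature vector (assumed additive bias)."""
    bias = combine_biases(weights, template_biases)
    return [x - b for x, b in zip(feature, bias)]

# Two hypothetical environment templates, weighted equally.
biases = [[1.0, 2.0], [3.0, 4.0]]
instant_bias = combine_biases([0.5, 0.5], biases)
clean = compensate([5.0, 5.0], [0.5, 0.5], biases)
```

With equal weights the combined bias is the mean of the two template biases, so the compensation interpolates between the predefined scenarios instead of selecting a single one.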
8.
Obtaining training material for rarely used English words and common given names from countries where English is not spoken is difficult due to excessive time, storage and cost factors. By considering pe...
9.
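Noise-robust speech feature techniques and MFCC-based model compensation techniques both yield low recognition rates at low signal-to-noise ratios (SNRs). To address this, noise-robust features and model compensation are combined, and a model compensation method for noisy speech recognition based on MFCC features of the one-sided autocorrelation (OSA) sequence is proposed to improve recognition performance at low SNR. Recognition experiments on the ten English digits 0-9 with white noise, F16 noise, and factory noise from NOISEX92 show that the proposed method effectively improves the recognition rate of the OSA-MFCC recognizer in noisy environments, and that at low SNR it clearly outperforms an MFCC recognizer given the same compensation.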
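As a minimal sketch of the front end mentioned in this abstract, the one-sided autocorrelation sequence of a frame keeps only the non-negative lags; OSA-based features are then computed from this sequence instead of from the raw frame. The unnormalized estimator and the function name below are assumptions, and the MFCC and model compensation stages are not shown.

```python
def one_sided_autocorrelation(frame, max_lag):
    """Return r[0..max_lag], where r[k] = sum over i of frame[i] * frame[i + k]."""
    n = len(frame)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the frame length")
    return [
        sum(frame[i] * frame[i + k] for i in range(n - k))
        for k in range(max_lag + 1)
    ]

osa = one_sided_autocorrelation([1, 2, 3], 2)
```

For the three-sample frame above, lag 0 gives the frame energy (1 + 4 + 9) and the higher lags give the shifted products (2 + 6, then 3).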
10.
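This paper studies image-based hand gesture recognition and augmented reality, and designs a system that performs static gesture recognition and dynamic tracking. Different gestures are recorded in advance; the image is segmented by skin color using Otsu adaptive thresholding to build a binary image, which is then matched against the known gestures to obtain the recognition result. Experimental results show an accuracy of 96.8% and a recognition time of 0.55 s. Dynamic tracking locates and captures the hand by detecting its position in each frame, ...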
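The thresholding step named here can be sketched as follows. This is a generic version of Otsu's method applied to a flat list of 8-bit intensities; how the paper derives its skin-color channel is not stated, so the input representation and the function names are assumptions.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold that maximizes the between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of those pixel values
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        score = w0 * w1 * (mean0 - mean1) ** 2  # proportional to between-class variance
        if score > best_score:
            best_score, best_t = score, t
    return best_t

def binarize(pixels, threshold):
    """Map each pixel to 1 (above the threshold) or 0 (at or below it)."""
    return [1 if p > threshold else 0 for p in pixels]

samples = [10, 12, 11, 200, 205, 198]
t = otsu_threshold(samples)
mask = binarize(samples, t)
```

In this toy input the dark and bright values form two well-separated groups, so the chosen threshold falls between them and the mask separates the two groups.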
11.
12.
Krishnamoorthy P. Prasanna S. 《IEEE transactions on audio, speech, and language processing》2009,17(2):253-266
This paper presents an approach for the enhancement of reverberant speech by temporal and spectral processing. Temporal processing involves identification and enhancement of high signal-to-reverberation ratio (SRR) regions in the temporal domain. Spectral processing involves removal of late reverberant components in the spectral domain. First, the spectral subtraction-based processing is performed to eliminate the late reverberant components, and then the spectrally processed speech is further subjected to the excitation source information-based temporal processing to enhance the high SRR regions. The objective measures segmental SRR and log spectral distance are computed for different cases, namely, reverberant, spectral processed, temporal processed, and combined temporal and spectral processed speech signals. The quality of the speech signal that is processed by the temporal and spectral processing is significantly enhanced compared to the reverberant speech as well as the signals that are processed by the individual temporal and spectral processing methods.
13.
14.
15.
Joe Frankel Simon King 《IEEE transactions on audio, speech, and language processing》2007,15(1):246-256
The majority of automatic speech recognition systems rely on hidden Markov models, in which Gaussian mixtures model the output distributions associated with sub-phone states. This approach, whilst successful, models consecutive feature vectors (augmented to include derivative information) as statistically independent. Furthermore, spatial correlations present in speech parameters are frequently ignored through the use of diagonal covariance matrices. This paper continues the work of Digalakis and others who proposed instead a first-order linear state-space model which has the capacity to model underlying dynamics, and furthermore give a model of spatial correlations. This paper examines the assumptions made in applying such a model and shows that the addition of a hidden dynamic state leads to increases in accuracy over otherwise equivalent static models. We also propose a time-asynchronous decoding strategy suited to recognition with segment models. We describe implementation of decoding for linear dynamic models and present TIMIT phone recognition results.
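For orientation, a first-order linear state-space model of the kind referred to in this abstract is usually written in the generic form below. The symbols are assumptions chosen for illustration and may differ from the paper's notation: $x_t$ is the hidden state, $y_t$ the observed feature vector, $F$ the state transition matrix, $H$ the observation matrix, and $w_t$, $v_t$ Gaussian noise terms.

```latex
\begin{aligned}
x_t &= F\,x_{t-1} + w_t, & w_t &\sim \mathcal{N}(\mu_w, Q),\\
y_t &= H\,x_t + v_t,     & v_t &\sim \mathcal{N}(\mu_v, R).
\end{aligned}
```

The recursion on $x_t$ couples consecutive frames, which is what a model with frame-wise independent outputs lacks, and the covariance of $y_t$ is in general a full matrix even when $R$ is diagonal, because $H$ mixes the state dimensions into the observation dimensions.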
16.
17.
18.
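Current speech recognition systems achieve high recognition rates when the training and test environments match, but their performance degrades sharply when the environments are mismatched. The authors find that bandwidth mismatch, i.e., a difference in bandwidth between the training and test speech, is also one of the main causes of environmental mismatch. When the test speech has a narrower bandwidth than the training speech, the lost frequency band cannot be recovered, and its effect is time-varying in the cepstral or log-spectral domain, so it cannot be compensated with current channel compensation methods. The paper...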
19.
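As one of the supporting technologies of speech processing, speech recognition aims to recognize speech signals and convert them into text, and it has broad application prospects in intelligent human-computer interaction, dialogue systems, multimedia content analysis, and other fields. After decades of development, current speech recognition technology achieves high recognition rates under ideal conditions. During acquisition and transmission, however, speech signals are inevitably corrupted by various channel effects and additive noise, which makes the training and recognition environments inconsistent (environmental mismatch) and in turn causes recognition performance to degrade sharply. This mismatch seriously hinders the move of speech recognition into real-world applications and has become one of the most pressing problems in the field. The paper first describes the environmental mismatch problem and then systematically reviews the compensation methods for each issue, organized into additive noise, channel distortion, and joint compensation.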
20.
Yoshioka T. Nakatani T. Miyoshi M. 《IEEE transactions on audio, speech, and language processing》2009,17(2):231-246
This paper proposes a method for enhancing speech signals contaminated by room reverberation and additive stationary noise. The following conditions are assumed. 1) Short-time spectral components of speech and noise are statistically independent Gaussian random variables. 2) A room's convolutive system is modeled as an autoregressive system in each frequency band. 3) A short-time power spectral density of speech is modeled as an all-pole spectrum, while that of noise is assumed to be time-invariant and known in advance. Under these conditions, the proposed method estimates the parameters of the convolutive system and those of the all-pole speech model based on the maximum likelihood estimation method. The estimated parameters are then used to calculate the minimum mean square error estimates of the speech spectral components. The proposed method has two significant features. 1) The parameter estimation part performs noise suppression and dereverberation alternately. 2) Noise-free reverberant speech spectrum estimates, which are transferred by the noise suppression process to the dereverberation process, are represented in the form of a probability distribution. This paper reports the experimental results of 1500 trials conducted using 500 different utterances. The reverberation time RT60 was 0.6 s, and the reverberant signal to noise ratio was 20, 15, or 10 dB. The experimental results show the superiority of the proposed method over the sequential performance of the noise suppression and dereverberation processes.