1.

Signal Preprocessing for Speech Recognition

A. S. Kolokolov 《Automation and Remote Control》2002,63(3):494-501

This paper considers the frequency-domain transformations of speech that precede extraction of the informative attributes of phonemes. It proposes a method of processing the speech spectrum that keeps recognition stable in the presence of frequency distortions and additive noise. The method is based on linear bandpass filtering of the logarithmic amplitude spectrum followed by a nonlinear transformation that models the effect of lateral inhibition in the auditory analyzer.
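The two stages named in this abstract can be illustrated with a rough sketch. Everything specific here is an assumption rather than the author's design: the band-pass filter is taken to be a difference of two moving averages along the frequency axis, the lateral-inhibition stage is replaced by simple half-wave rectification, and the function names and window widths (`narrow`, `wide`) are hypothetical.

```python
import math

def moving_average(values, width):
    """Centered moving average with edge clamping; width is assumed odd."""
    half = width // 2
    n = len(values)
    out = []
    for i in range(n):
        # Clamp indices at the edges so every window holds exactly `width` samples.
        window = [values[min(max(j, 0), n - 1)] for j in range(i - half, i + half + 1)]
        out.append(sum(window) / width)
    return out

def preprocess_log_spectrum(amplitudes, narrow=3, wide=9):
    """Band-pass the log amplitude spectrum along frequency, then rectify."""
    log_spec = [math.log(max(a, 1e-12)) for a in amplitudes]
    # Difference of two smoothers acts as a linear band-pass filter over frequency.
    smooth = moving_average(log_spec, narrow)
    trend = moving_average(log_spec, wide)
    band = [s - t for s, t in zip(smooth, trend)]
    # Half-wave rectification: a crude stand-in for lateral inhibition (assumption).
    return [max(b, 0.0) for b in band]

spectrum = [1, 2, 8, 2, 1, 1, 4, 1, 1, 1]
features = preprocess_log_spectrum(spectrum)
louder = preprocess_log_spectrum([5 * a for a in spectrum])
```

A constant gain is the simplest frequency distortion: it adds the same offset to every bin of the log spectrum, and the band-pass stage removes that offset, so `features` and `louder` agree up to rounding.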

2.

Subband Likelihood-Maximizing Beamforming for Speech Recognition in Reverberant Environments

《IEEE transactions on audio, speech, and language processing》2006,14(6):2109-2121

In this paper, we introduce Subband LIkelihood-MAximizing BEAMforming (S-LIMABEAM), a new microphone-array processing algorithm specifically designed for speech recognition applications. The proposed algorithm is an extension of the previously developed LIMABEAM array processing algorithm. Unlike most array processing algorithms, which operate according to some waveform-level objective function, LIMABEAM seeks the set of array parameters that maximizes the likelihood of the correct recognition hypothesis. Optimizing the array parameters in this manner results in significant improvements in recognition accuracy over conventional array processing methods when speech is corrupted by additive noise and moderate levels of reverberation. Despite the success of the LIMABEAM algorithm in such environments, little improvement was achieved in highly reverberant environments. In such situations, where the noise is highly correlated with the speech signal and the number of filter parameters to estimate is large, subband processing has been used to improve the performance of LMS-type adaptive filtering algorithms. We use subband processing principles to design a novel array processing architecture in which select groups of subbands are processed jointly to maximize the likelihood of the resulting speech recognition features, as measured by the recognizer itself. By creating a subband filtering architecture that explicitly accounts for the manner in which recognition features are computed, we can effectively apply the LIMABEAM framework to highly reverberant environments. By doing so, we are able to achieve improvements in word error rate of over 20% compared to conventional methods in highly reverberant environments.

3.

Preprocessing and Segmentation of the Speech Signal in the Frequency Domain for Speech Recognition

A. S. Kolokolov 《Automation and Remote Control》2003,64(6):985-994

Preprocessing of the speech signal before recognition of phonemes was considered. Methods of processing the spectrum and segmenting the speech signal for stable speech recognition in the presence of frequency distortions were proposed. They are based on a procedure of linear filtering of the logarithmic spectrum envelope.

4.

Robust Speech Dereverberation Using Multichannel Blind Deconvolution With Spectral Subtraction

Furuya K. Kataoka A. 《IEEE transactions on audio, speech, and language processing》2007,15(5):1579-1591

A robust dereverberation method is presented for speech enhancement in a situation requiring adaptation where a speaker shifts his/her head under reverberant conditions causing the impulse responses to change frequently. We combine correlation-based blind deconvolution with modified spectral subtraction to improve the quality of inverse-filtered speech degraded by the estimation error of inverse filters obtained in practice. Our method computes inverse filters by using the correlation matrix between input signals that can be observed without measuring room impulse responses. Inverse filtering reduces early reflection, which has most of the power of the reverberation, and then, spectral subtraction suppresses the tail of the inverse-filtered reverberation. The performance of our method in adaptation is demonstrated by experiments using measured room impulse responses. The subjective results indicated that this method provides superior speech quality to each of the individual methods: blind deconvolution and spectral subtraction.
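For readers unfamiliar with the second stage, here is a minimal sketch of plain power spectral subtraction. It is not the modified variant the authors use, whose details the abstract does not give. The estimate of the residual reverberation power is assumed to be supplied by the caller, and the function name and `floor` parameter are hypothetical.

```python
def spectral_subtract(power_spectrum, interference_estimate, floor=0.01):
    """Subtract an interference power estimate bin by bin, with a spectral floor."""
    cleaned = []
    for observed, interference in zip(power_spectrum, interference_estimate):
        residual = observed - interference
        # The floor keeps each bin positive when the estimate exceeds the observation.
        cleaned.append(max(residual, floor * observed))
    return cleaned

frame = spectral_subtract([4.0, 1.0, 9.0], [1.0, 2.0, 3.0])
```

The floor is a common guard against negative power values, which would otherwise appear whenever the interference is overestimated in a bin.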

5.

Harmonicity-Based Blind Dereverberation for Single-Channel Speech Signals

Tomohiro Nakatani Keisuke Kinoshita Masato Miyoshi 《IEEE transactions on audio, speech, and language processing》2007,15(1):80-95

The distant acquisition of acoustic signals in an enclosed space often produces reverberant artifacts due to the room impulse response. Speech dereverberation is desirable in situations where the distant acquisition of acoustic signals is involved. These situations include hands-free speech recognition, teleconferencing, and meeting recording, to name a few. This paper proposes a processing method, named Harmonicity-based dEReverBeration (HERB), to reduce the amount of reverberation in the signal picked up by a single microphone. The method makes extensive use of harmonicity, a unique characteristic of speech, in the design of a dereverberation filter. In particular, harmonicity enhancement is proposed and demonstrated as an effective way of estimating a filter that approximates an inverse filter corresponding to the room impulse response. Two specific harmonicity enhancement techniques are presented and compared: one based on an average transfer function and the other on the minimization of a mean squared error function. Prototype HERB systems are implemented by introducing several techniques to improve the accuracy of dereverberation filter estimation, including time warping analysis. Experimental results show that the proposed methods can achieve high-quality speech dereverberation, when the reverberation time is between 0.1 and 1.0 s, in terms of reverberation energy decay curves and automatic speech recognition accuracy.

6.

Speech Dereverberation Based on Maximum-Likelihood Estimation With Time-Varying Gaussian Source Model

《IEEE transactions on audio, speech, and language processing》2008,16(8):1512-1527

Distant acquisition of acoustic signals in an enclosed space often produces reverberant components due to acoustic reflections in the room. Speech dereverberation is in general desirable when the signal is acquired through distant microphones in such applications as hands-free speech recognition, teleconferencing, and meeting recording. This paper proposes a new speech dereverberation approach based on a statistical speech model. A time-varying Gaussian source model (TVGSM) is introduced as a model that represents the dynamic short time characteristics of nonreverberant speech segments, including the time and frequency structures of the speech spectrum. With this model, dereverberation of the speech signal is formulated as a maximum-likelihood (ML) problem based on multichannel linear prediction, in which the speech signal is recovered by transforming the observed signal into one that is probabilistically more like nonreverberant speech. We first present a general ML solution based on TVGSM, and derive several dereverberation algorithms based on various source models. Specifically, we present a source model consisting of a finite number of states, each of which is manifested by a short time speech spectrum, defined by a corresponding autocorrelation (AC) vector. The dereverberation algorithm based on this model involves a finite collection of spectral patterns that form a codebook. We confirm experimentally that both the time and frequency characteristics represented in the source models are very important for speech dereverberation, and that the prior knowledge represented by the codebook allows us to further improve the dereverberated speech quality. We also confirm that the quality of reverberant speech signals can be greatly improved in terms of the spectral shape and energy time-pattern distortions from simply a short speech signal using a speaker-independent codebook.

7.

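Template-Based Dynamic Compensation for Improving the Robustness of Speech Recognition in Mobile Environments

王亚迅 许军 《现代计算机》2008,(8)

A template-based dynamic compensation scheme (PDC) is proposed to improve the robustness of automatic speech recognition (ASR) in mobile environments. In PDC, fixed templates with biases are defined to correct for environmental variables at training time, on the assumption that the training data were collected under a set of predefined application scenarios. At recognition time, the instantaneous bias is obtained as a linear weighting of several candidate templates. To estimate the weights quickly, a Bayesian learning procedure based on speech-dependent prior templates (PDC-SPE) is proposed. Outdoor experiments show that PDC-SPE learning outperforms conventional compensation and adaptation methods, giving a relative reduction of 20-25% in the recognition error rate of the trained system.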

8.

Merge-Weighted Dynamic Time Warping for Speech Recognition

张湘莉兰 骆志刚 李明 《计算机科学技术学报》2014,29(6):1072-1082

Obtaining training material for rarely used English words and common given names from countries where English is not spoken is difficult due to excessive time, storage and cost factors. By considering pe...
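As background for the title, the following is a minimal sketch of classic dynamic time warping (DTW). It is not the merge-weighted variant the paper proposes, which the truncated abstract does not describe. The absolute difference between scalar samples is an assumed local cost; a real recognizer would compare feature vectors.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two numeric sequences (absolute difference as local cost)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the best of the three predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([1, 2, 3], [2, 3, 4])
```

The first pair differs only by a repeated sample, which warping absorbs at no cost. The second pair is offset in value, which warping cannot remove.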

9.

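A Model Compensation Method for Noisy Speech Recognition Based on Robust Features

张军 韦岗 《数据采集与处理》2003,18(3):249-252

Noise-robust speech features and MFCC-based model compensation both give low recognition rates at low signal-to-noise ratios (SNRs). To address this shortcoming, the paper combines noise-robust features with model compensation and proposes a model compensation method for noisy speech recognition based on MFCC features derived from the one-sided autocorrelation (OSA) sequence, with the aim of improving recognition performance at low SNRs. Recognition experiments on the ten English digits 0 to 9 with the white noise, F16 noise, and factory noise from NOISEX92 show that the proposed method effectively improves the recognition rate of the OSA-MFCC recognizer in noisy environments, and that at low SNRs it clearly outperforms an MFCC recognizer given the same compensation.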
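The one-sided autocorrelation sequence that the OSA-MFCC feature starts from can be sketched as below. The unnormalized form and the function name are assumptions; the abstract does not say which normalization the authors use, nor how the MFCC stage is applied afterwards.

```python
def one_sided_autocorrelation(frame, max_lag):
    """Return r[0..max_lag], where r[k] = sum over n of frame[n] * frame[n + k]."""
    n = len(frame)
    # Only non-negative lags are kept, hence "one-sided".
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

osa = one_sided_autocorrelation([1, 2, 3], 2)
```

Uncorrelated noise contributes mainly to the lag-0 term, which is one common motivation for building features on the autocorrelation domain rather than on the raw frame.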

10.

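Design of an Image-Recognition-Based Static Gesture Recognition and Dynamic Tracking System

李天真 宋齐顺 贾岚絮 何刚强 《单片机与嵌入式系统应用》2021,21(4):34-37

This paper studies image-based gesture recognition and augmented reality, and designs a system capable of static gesture recognition and dynamic tracking. Different gestures are recorded in advance; the image is then segmented by skin color using Otsu adaptive thresholding to produce a binary image, which is matched against the known gestures to obtain the gesture result. Experimental results show an accuracy of 96.8% and a recognition time of 0.55 s. Dynamic tracking locates and captures the hand by detecting its position in each frame, ...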
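The thresholding step can be illustrated with a small sketch of Otsu's method, which picks the threshold that maximizes the between-class variance of the histogram. The input is assumed to be a flat list of integer intensities in the range 0 to 255. How the authors derive that channel from skin color is not stated in the abstract, and the function names are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes between-class variance (class 0: values <= t)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in class 0
    sum0 = 0  # intensity sum of class 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # class 0 is still empty
        w1 = total - w0
        if w1 == 0:
            break     # class 1 is empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t

def binarize(pixels, threshold):
    """Map each pixel to 1 if it is above the threshold, else 0."""
    return [1 if p > threshold else 0 for p in pixels]

image = [10] * 5 + [12] * 5 + [200] * 5 + [210] * 5
t = otsu_threshold(image)
mask = binarize(image, t)
```

On this toy image the dark and bright clusters are well separated, so the threshold falls between them and `mask` splits the pixels into the two groups.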

11.

Multichannel Eigenspace Beamforming in a Reverberant Noisy Environment With Multiple Interfering Speech Signals

《IEEE transactions on audio, speech, and language processing》2009,17(6):1071-1086

12.

Reverberant Speech Enhancement by Temporal and Spectral Processing

Krishnamoorthy P. Prasanna S. 《IEEE transactions on audio, speech, and language processing》2009,17(2):253-266

This paper presents an approach for the enhancement of reverberant speech by temporal and spectral processing. Temporal processing involves identification and enhancement of high signal-to-reverberation ratio (SRR) regions in the temporal domain. Spectral processing involves removal of late reverberant components in the spectral domain. First, the spectral subtraction-based processing is performed to eliminate the late reverberant components, and then the spectrally processed speech is further subjected to the excitation source information-based temporal processing to enhance the high SRR regions. The objective measures segmental SRR and log spectral distance are computed for different cases, namely, reverberant, spectral processed, temporal processed, and combined temporal and spectral processed speech signals. The quality of the speech signal that is processed by the temporal and spectral processing is significantly enhanced compared to the reverberant speech as well as the signals that are processed by the individual temporal and spectral processing methods.

13.

A Unified Compensation Algorithm for Additive and Convolutional Noise in Telephone Speech Recognition

《自动化学报》2004,30(2)

14.

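In-Vehicle Speech Recognition Based on Noise Classification and Compensation

《计算机工程》2017,(3):220-224

Existing in-vehicle speech recognition systems have poor noise robustness in real operating environments. To address this problem, a noise classification and compensation method based on support vector machines (SVMs) is proposed. Noise collected in each application scenario is used to build an SVM noise classifier, and the SVM classifies the noise in the silent segments of the speech under test. The noisy training template that matches the detected noise type is then selected for noise compensation, and differential-spectrum cepstral coefficients are used as feature parameters to further suppress noise in the speech segments, thereby achieving in-vehicle speech recognition. Experimental results show that the method effectively improves the noise robustness of in-vehicle speech recognition systems and achieves a higher recognition rate than sparse-coding speech enhancement and power-normalized cepstral coefficient feature enhancement.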

15.

Speech Recognition Using Linear Dynamic Models

Joe Frankel Simon King 《IEEE transactions on audio, speech, and language processing》2007,15(1):246-256

The majority of automatic speech recognition systems rely on hidden Markov models, in which Gaussian mixtures model the output distributions associated with sub-phone states. This approach, whilst successful, models consecutive feature vectors (augmented to include derivative information) as statistically independent. Furthermore, spatial correlations present in speech parameters are frequently ignored through the use of diagonal covariance matrices. This paper continues the work of Digalakis and others who proposed instead a first-order linear state-space model which has the capacity to model underlying dynamics, and furthermore gives a model of spatial correlations. This paper examines the assumptions made in applying such a model and shows that the addition of a hidden dynamic state leads to increases in accuracy over otherwise equivalent static models. We also propose a time-asynchronous decoding strategy suited to recognition with segment models. We describe implementation of decoding for linear dynamic models and present TIMIT phone recognition results.
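To make the first-order linear state-space model concrete, here is a scalar sketch that scores an observation sequence with a Kalman filter. The paper's models are multivariate, so this is a simplification. The parameter names (`F`, `H`, `Q`, `R`, `x0`, `P0`) are conventional state-space notation chosen for illustration, not taken from the source.

```python
import math

def ldm_log_likelihood(observations, F, H, Q, R, x0, P0):
    """Log-likelihood of a scalar sequence under x_t = F*x_(t-1) + w_t, y_t = H*x_t + v_t."""
    x, P = x0, P0  # prior mean and variance of the first state
    log_lik = 0.0
    for y in observations:
        # Innovation and its variance under the current prediction.
        e = y - H * x
        S = H * P * H + R
        log_lik += -0.5 * (math.log(2.0 * math.pi * S) + e * e / S)
        # Measurement update.
        K = P * H / S
        x = x + K * e
        P = (1.0 - K * H) * P
        # Time update: propagate the state through the dynamics.
        x = F * x
        P = F * P * F + Q
    return log_lik

decay = [1.0, 0.9, 0.81, 0.729]
matched = ldm_log_likelihood(decay, 0.9, 1.0, 0.01, 0.01, 1.0, 0.01)
mismatched = ldm_log_likelihood(decay, -0.9, 1.0, 0.01, 0.01, 1.0, 0.01)
single = ldm_log_likelihood([0.0], 0.9, 1.0, 0.1, 1.0, 0.0, 1.0)
```

A sequence that follows the assumed dynamics scores higher under the matching transition coefficient than under a mismatched one. In a segment-model recognizer, likelihood comparisons of this kind decide between competing phone models.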

16.

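Dynamic Image Recognition Against a Static Background

刘振安 颜廷荣 张蕊 《计算机技术与发展》2001,11(1)

This paper discusses how to process CCD images in order to recognize changes in the liquid level inside a thin glass tube.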

17.

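Dynamic Image Recognition Against a Static Background (cited 1 time in total: 0 self-citations, 1 citation by others)

刘振安 颜廷荣 et al. 《微机发展》2001,11(1):52-53

This paper discusses how to process CCD images in order to recognize changes in the liquid level inside a thin glass tube.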

18.

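Compensation for Bandwidth Mismatch in Speech Recognition

何勇军 韩纪庆 《计算机学报》2011,34(9):1629-1637

Current speech recognition systems achieve high recognition rates when the training and test environments match, but their performance degrades sharply when the environments are mismatched. The authors find that bandwidth mismatch, that is, a difference in bandwidth between the training and test corpora, is also one of the main causes of environmental mismatch. When the test speech has a narrower bandwidth than the training speech, the lost frequency bands cannot be recovered, and their effect is time-varying in the cepstral or log-spectral domain, so existing channel compensation methods cannot compensate for it. The paper...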

19.

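A Survey of Environmental Mismatch Compensation in Speech Recognition

何勇军 韩纪庆 《电脑学习》2012,2(6)

As one of the core technologies of speech processing, speech recognition aims to recognize speech signals and convert them into text, and it has broad application prospects in intelligent human-computer interaction, dialogue systems, multimedia content analysis, and related fields. After decades of development, current speech recognition technology achieves high recognition rates under ideal conditions. During acquisition and transmission, however, speech signals are inevitably corrupted by various channel effects and additive noise. This makes the training and recognition environments inconsistent (environmental mismatch) and in turn causes a sharp drop in recognition performance. The mismatch seriously hinders the practical deployment of speech recognition and has become one of the most pressing problems in the field. The paper first describes the environmental mismatch problem and then systematically surveys the compensation methods for each issue, organized into additive noise, channel distortion, and joint compensation.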

20.

Integrated Speech Enhancement Method Using Noise Suppression and Dereverberation

Yoshioka T. Nakatani T. Miyoshi M. 《IEEE transactions on audio, speech, and language processing》2009,17(2):231-246

This paper proposes a method for enhancing speech signals contaminated by room reverberation and additive stationary noise. The following conditions are assumed. 1) Short-time spectral components of speech and noise are statistically independent Gaussian random variables. 2) A room's convolutive system is modeled as an autoregressive system in each frequency band. 3) A short-time power spectral density of speech is modeled as an all-pole spectrum, while that of noise is assumed to be time-invariant and known in advance. Under these conditions, the proposed method estimates the parameters of the convolutive system and those of the all-pole speech model based on the maximum likelihood estimation method. The estimated parameters are then used to calculate the minimum mean square error estimates of the speech spectral components. The proposed method has two significant features. 1) The parameter estimation part performs noise suppression and dereverberation alternately. 2) Noise-free reverberant speech spectrum estimates, which are transferred by the noise suppression process to the dereverberation process, are represented in the form of a probability distribution. This paper reports the experimental results of 1500 trials conducted using 500 different utterances. The reverberation time RT₆₀ was 0.6 s, and the reverberant signal-to-noise ratio was 20, 15, or 10 dB. The experimental results show the superiority of the proposed method over the sequential performance of the noise suppression and dereverberation processes.