This paper considers the frequency-domain transformations of speech that precede extraction of the informative attributes of phonemes. It proposes a speech-spectrum processing scheme that keeps recognition stable in the presence of frequency distortions and additive noise. The scheme is based on linear bandpass filtering of the logarithmic amplitude spectrum followed by a nonlinear transformation that models the effect of lateral inhibition in the auditory analyzer.

In this paper, we introduce Subband LIkelihood-MAximizing BEAMforming (S-LIMABEAM), a new microphone-array processing algorithm specifically designed for speech recognition applications. The proposed algorithm is an extension of the previously developed LIMABEAM array processing algorithm. Unlike most array processing algorithms which operate according to some waveform-level objective function, the goal of LIMABEAM is to find the set of array parameters that maximizes the likelihood of the correct recognition hypothesis. Optimizing the array parameters in this manner results in significant improvements in recognition accuracy over conventional array processing methods when speech is corrupted by additive noise and moderate levels of reverberation. Despite the success of the LIMABEAM algorithm in such environments, little improvement was achieved in highly reverberant environments. In such situations where the noise is highly correlated to the speech signal and the number of filter parameters to estimate is large, subband processing has been used to improve the performance of LMS-type adaptive filtering algorithms. We use subband processing principles to design a novel array processing architecture in which select groups of subbands are processed jointly to maximize the likelihood of the resulting speech recognition features, as measured by the recognizer itself. By creating a subband filtering architecture that explicitly accounts for the manner in which recognition features are computed, we can effectively apply the LIMABEAM framework to highly reverberant environments. By doing so, we are able to achieve improvements in word error rate of over 20% compared to conventional methods in highly reverberant environments.

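Speech recognition based on combined dynamic and static feature parameters (cited by: 1; self-citations: 0; citations by others: 1)
Based on the time-varying nature of speech signals, this paper proposes a speech recognition method that combines dynamic and static feature parameters. First, the wavelet packet transform is introduced into feature extraction: following the MFCC (Mel-Frequency Cepstrum Coefficient) extraction procedure, the wavelet packet transform replaces the Fourier transform and the Mel filter bank, yielding a new static feature parameter, DWPTMFCC (Discrete Wavelet Packet Transform Mel-Frequency Coefficient). This parameter is then combined with the first-order DWPTMFCC difference parameters into a single vector that serves as the parameter of one speech frame. Experiments and simulations show that this parameter achieves a high recognition rate and is a good speech feature. In addition, chaotic characteristics are introduced into the neuron to form a chaotic neural network, which is applied to speech recognition and compared with the commonly used BP neural network method. The experimental results show that the average recognition rate of the chaotic neural network is higher than that of the commonly used neural network method under the same conditions.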
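The step of joining static coefficients with their first-order differences into one per-frame vector can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the difference formula, so a symmetric two-frame difference with replicated edge frames is assumed here, and the function names `first_order_delta` and `combine_static_dynamic` are hypothetical. The static coefficients are taken as already computed (the wavelet packet stage is not reproduced).

```python
from typing import List


def first_order_delta(frames: List[List[float]]) -> List[List[float]]:
    """Assumed difference rule: delta[t] = (c[t+1] - c[t-1]) / 2.

    Edge frames are replicated so the output has one delta per input frame.
    """
    n = len(frames)
    deltas = []
    for t in range(n):
        prev_frame = frames[max(t - 1, 0)]
        next_frame = frames[min(t + 1, n - 1)]
        deltas.append([(b - a) / 2.0 for a, b in zip(prev_frame, next_frame)])
    return deltas


def combine_static_dynamic(frames: List[List[float]]) -> List[List[float]]:
    """Concatenate each static vector with its first-order delta vector."""
    deltas = first_order_delta(frames)
    return [static + delta for static, delta in zip(frames, deltas)]


# Three frames of two (made-up) static coefficients each.
static_frames = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]]
combined = combine_static_dynamic(static_frames)
```

Each combined frame has twice the dimension of the static frame, which is the usual cost of adding first-order dynamic information.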

Preprocessing of the speech signal before phoneme recognition is considered. Methods of processing the spectrum and segmenting the speech signal are proposed for stable speech recognition in the presence of frequency distortions. They are based on linear filtering of the logarithmic spectrum envelope.

A robust dereverberation method is presented for speech enhancement in a situation requiring adaptation where a speaker shifts his/her head under reverberant conditions causing the impulse responses to change frequently. We combine correlation-based blind deconvolution with modified spectral subtraction to improve the quality of inverse-filtered speech degraded by the estimation error of inverse filters obtained in practice. Our method computes inverse filters by using the correlation matrix between input signals that can be observed without measuring room impulse responses. Inverse filtering reduces early reflection, which has most of the power of the reverberation, and then, spectral subtraction suppresses the tail of the inverse-filtered reverberation. The performance of our method in adaptation is demonstrated by experiments using measured room impulse responses. The subjective results indicated that this method provides superior speech quality to each of the individual methods: blind deconvolution and spectral subtraction.

The distant acquisition of acoustic signals in an enclosed space often produces reverberant artifacts due to the room impulse response. Speech dereverberation is desirable in situations where the distant acquisition of acoustic signals is involved. These situations include hands-free speech recognition, teleconferencing, and meeting recording, to name a few. This paper proposes a processing method, named Harmonicity-based dEReverBeration (HERB), to reduce the amount of reverberation in the signal picked up by a single microphone. The method makes extensive use of harmonicity, a unique characteristic of speech, in the design of a dereverberation filter. In particular, harmonicity enhancement is proposed and demonstrated as an effective way of estimating a filter that approximates an inverse filter corresponding to the room impulse response. Two specific harmonicity enhancement techniques are presented and compared: one based on an average transfer function and the other on the minimization of a mean squared error function. Prototype HERB systems are implemented by introducing several techniques to improve the accuracy of dereverberation filter estimation, including time warping analysis. Experimental results show that the proposed methods can achieve high-quality speech dereverberation, when the reverberation time is between 0.1 and 1.0 s, in terms of reverberation energy decay curves and automatic speech recognition accuracy.

Distant acquisition of acoustic signals in an enclosed space often produces reverberant components due to acoustic reflections in the room. Speech dereverberation is in general desirable when the signal is acquired through distant microphones in such applications as hands-free speech recognition, teleconferencing, and meeting recording. This paper proposes a new speech dereverberation approach based on a statistical speech model. A time-varying Gaussian source model (TVGSM) is introduced as a model that represents the dynamic short time characteristics of nonreverberant speech segments, including the time and frequency structures of the speech spectrum. With this model, dereverberation of the speech signal is formulated as a maximum-likelihood (ML) problem based on multichannel linear prediction, in which the speech signal is recovered by transforming the observed signal into one that is probabilistically more like nonreverberant speech. We first present a general ML solution based on TVGSM, and derive several dereverberation algorithms based on various source models. Specifically, we present a source model consisting of a finite number of states, each of which is manifested by a short time speech spectrum, defined by a corresponding autocorrelation (AC) vector. The dereverberation algorithm based on this model involves a finite collection of spectral patterns that form a codebook. We confirm experimentally that both the time and frequency characteristics represented in the source models are very important for speech dereverberation, and that the prior knowledge represented by the codebook allows us to further improve the dereverberated speech quality. We also confirm that the quality of reverberant speech signals can be greatly improved in terms of the spectral shape and energy time-pattern distortions from simply a short speech signal using a speaker-independent codebook.

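Voice interaction technology is becoming increasingly widespread in practical voice-driven applications. However, when the sound source is far from the microphone, the performance of voice interaction is still far from satisfactory because of reverberation in real environments. Researchers have studied the reverberation problem extensively for decades and have proposed many practical methods. In particular, deep learning, which has recently emerged and largely reshaped speech processing, has also achieved many remarkable results in single-channel dereverberation. However, work that systematically summarizes and analyzes the relationship between deep-learning-based dereverberation methods and classical algorithms is still scarce. This paper therefore systematically reviews and summarizes the development of single-channel speech dereverberation and discusses open problems that require further study.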

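A template-based dynamic compensation scheme (PDC) is proposed to improve the robustness of automatic speech recognition (ASR) in mobile environments. In PDC, a fixed template with a bias is defined to correct for environmental variables at training time, under the assumption that the training data were obtained from a set of predefined application scenarios; at recognition time, the instantaneous bias is obtained as a linear weighting of several candidate templates. To estimate the weights quickly, a Bayesian learning procedure based on speech-dependent prior templates (PDC-SPE) is proposed. Experiments in outdoor environments show that the PDC-SPE learning procedure outperforms conventional compensation and adaptation methods, with a relative reduction of 20-25% in the recognition error rate after training.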

This paper presents an approach for the enhancement of reverberant speech by temporal and spectral processing. Temporal processing involves identification and enhancement of high signal-to-reverberation ratio (SRR) regions in the temporal domain. Spectral processing involves removal of late reverberant components in the spectral domain. First, the spectral subtraction-based processing is performed to eliminate the late reverberant components, and then the spectrally processed speech is further subjected to the excitation source information-based temporal processing to enhance the high SRR regions. The objective measures segmental SRR and log spectral distance are computed for different cases, namely, reverberant, spectral processed, temporal processed, and combined temporal and spectral processed speech signals. The quality of the speech signal that is processed by the temporal and spectral processing is significantly enhanced compared to the reverberant speech as well as the signals that are processed by the individual temporal and spectral processing methods.
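The spectral-subtraction stage described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the late reverberant power is estimated, so the sketch assumes a deliberately crude model: the late reverberant power in a frame is a scaled copy of the observed power `delay` frames earlier. The function name `subtract_late_reverb` and the parameters `delay`, `scale`, and `floor` are hypothetical and chosen only for illustration.

```python
from typing import List


def subtract_late_reverb(
    power: List[List[float]],
    delay: int = 2,
    scale: float = 0.5,
    floor: float = 0.1,
) -> List[List[float]]:
    """Power-domain spectral subtraction with a spectral floor.

    power[t][k] is the short-time power in frame t and frequency bin k.
    Assumed model: late reverberant power = scale * power[t - delay][k].
    """
    cleaned = []
    for t, frame in enumerate(power):
        if t < delay:
            # No earlier frame to estimate from; pass the frame through.
            cleaned.append(list(frame))
            continue
        past = power[t - delay]
        out_frame = []
        for observed, earlier in zip(frame, past):
            estimate = scale * earlier
            # Clamp to a fraction of the observed power to avoid negatives.
            out_frame.append(max(observed - estimate, floor * observed))
        cleaned.append(out_frame)
    return cleaned


# One frequency bin, four frames of made-up decaying power.
result = subtract_late_reverb([[4.0], [4.0], [3.0], [1.0]])
```

The spectral floor keeps a small fraction of the observed power whenever the estimate exceeds it, which is a common way to limit the artifacts of over-subtraction.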

Obtaining training material for rarely used English words and common given names from countries where English is not spoken is difficult due to excessive time, storage and cost factors. By considering pe...

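To address the low recognition rates of noise-robust speech feature techniques and of MFCC-based model compensation techniques at low signal-to-noise ratios (SNR), noise-robust features and model compensation are combined, and a noisy speech recognition method using model compensation on one-sided autocorrelation (OSA) MFCC features is proposed to improve the performance of speech recognition systems at low SNR. Recognition experiments on the ten English digits 0 to 9 with white noise, F16 noise, and factory noise from NOISEX92 show that the proposed method effectively improves the recognition rate of the OSA-MFCC recognizer in noisy environments, and that at low SNR its performance is clearly better than that of an MFCC recognizer given the same compensation.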

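Gait recognition using dynamic body-part variation (cited by: 3; self-citations: 1; citations by others: 2)
To address the effect of clothing changes and carried objects on gait recognition, a gait recognition method based on dynamic body-part variation is proposed. First, background subtraction and shadow removal are applied to obtain the human gait silhouette, and the silhouette is centered in position and normalized in size. Next, the dynamic parts of each frame are delineated using the gait energy image and threshold segmentation, and features are extracted from the dynamic parts using a sector-region distance transform. Finally, a maximum-entropy Markov model is used to model each person's gait, completing recognition based on a probabilistic graphical model. The method was tested on the CASIA gait database and achieved a high correct recognition rate; the results show that it is robust for gait recognition under clothing changes and carried objects.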

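Speech enhancement is a key technology for improving speech quality and intelligibility, with broad application prospects and significant research value in speech recognition, voice calls, teleconferencing, hearing assistance, and other fields. This paper surveys and analyzes the state of research on monaural speech enhancement in terms of models and methods, datasets, features, and evaluation metrics. 1) Existing work on traditional and machine-learning-based monaural speech denoising and speech dereverberation is organized and classified, the ideas behind typical methods are briefly introduced, and the experimental results of different methods are compared. 2) Commonly used datasets, common features, learning targets, and evaluation metrics involved in experiments and evaluation are collated and introduced. 3) The main problems and challenges that monaural speech enhancement still faces are summarized.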

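This paper studies image-based gesture recognition and augmented reality, and designs a system capable of static gesture recognition and dynamic tracking. Different gestures are recorded in advance; the image is segmented by skin color using Otsu adaptive thresholding to build a binary image, which is matched against the known gestures to obtain the gesture result. Experimental results show an accuracy of 96.8% and a recognition time of 0.55 s. Dynamic tracking locates and captures the hand by detecting its position in each image frame; the image...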
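The adaptive thresholding step in this abstract can be sketched with the standard Otsu criterion, which picks the threshold that maximizes the between-class variance of the histogram. The abstract does not describe the skin-color representation, so the sketch assumes the input has already been reduced to one 8-bit value per pixel; the function names `otsu_threshold` and `binarize` are hypothetical.

```python
from typing import List


def otsu_threshold(pixels: List[int], levels: int = 256) -> int:
    """Return the threshold t maximizing between-class variance.

    Class 0 holds pixels <= t, class 1 holds pixels > t.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in class 0
    sum0 = 0    # sum of intensities in class 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels: List[int], threshold: int) -> List[int]:
    """Map each pixel to 1 if it exceeds the threshold, else 0."""
    return [1 if p > threshold else 0 for p in pixels]


# Made-up bimodal data: five dark pixels and five bright pixels.
sample = [10] * 5 + [200] * 5
t = otsu_threshold(sample)
mask = binarize(sample, t)
```

When several thresholds give the same variance, as in this bimodal sample, the sketch keeps the lowest one because it updates only on a strict improvement.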

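Noise-robust speech recognition based on compensation of speech-enhancement distortion (cited by: 1; self-citations: 0; citations by others: 1)
This paper proposes a noise-robust speech recognition algorithm based on compensation of speech-enhancement distortion. At the front end, speech enhancement effectively suppresses background noise. The spectral distortion and residual noise introduced by speech enhancement are detrimental to recognition; their effect is compensated by parallel model combination at the recognition stage or by cepstral mean normalization at the feature-extraction stage. Experimental results show that the algorithm significantly improves the recognition accuracy of a speech recognition system in noisy environments over a very wide SNR range, with especially clear gains at low SNR: for white noise at -5 dB, for example, the algorithm reduces the error rate by 67.4% relative to the baseline recognizer.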
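Of the two compensation options named in this abstract, cepstral mean normalization is simple enough to sketch. The version below is the textbook per-utterance form, which subtracts the mean of each cepstral coefficient over all frames; whether the paper uses exactly this variant is an assumption, and the function name `cepstral_mean_normalize` is hypothetical.

```python
from typing import List


def cepstral_mean_normalize(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-coefficient mean computed over the whole utterance."""
    n = len(frames)
    if n == 0:
        return []
    dim = len(frames[0])
    means = [sum(frame[k] for frame in frames) / n for k in range(dim)]
    return [[frame[k] - means[k] for k in range(dim)] for frame in frames]


# Two frames of two made-up cepstral coefficients.
cepstra = [[1.0, 2.0], [3.0, 6.0]]
normalized = cepstral_mean_normalize(cepstra)

# A constant offset in the cepstral domain is removed by the normalization.
shifted = [[c + 5.0 for c in frame] for frame in cepstra]
normalized_shifted = cepstral_mean_normalize(shifted)
```

Because a fixed distortion that is convolutive in time appears as an additive constant in the cepstral domain, removing the mean cancels it, which is why the shifted and unshifted inputs normalize to the same values here.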

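To compensate in a unified way for the effects of additive noise and convolutional channel response on telephone speech, this paper proposes the vector piecewise polynomial (VPP) approximation algorithm and applies it successfully to stationary and non-stationary noise environments. For stationary noise, a batch EM (B EM) method is used in the log-spectral domain; for non-stationary noise, a recursive EM (REM) method is used in the cepstral domain. Both methods perform feature compensation under the minimum mean square error (MMSE) estimation criterion. Experimental results show that, for large-vocabulary continuous speech recognition affected by background noise and telephone channels (both fixed-line and GSM), applying this algorithm reduces the error rate by about 18%.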

This paper proposes a method for enhancing speech signals contaminated by room reverberation and additive stationary noise. The following conditions are assumed. 1) Short-time spectral components of speech and noise are statistically independent Gaussian random variables. 2) A room's convolutive system is modeled as an autoregressive system in each frequency band. 3) A short-time power spectral density of speech is modeled as an all-pole spectrum, while that of noise is assumed to be time-invariant and known in advance. Under these conditions, the proposed method estimates the parameters of the convolutive system and those of the all-pole speech model based on the maximum likelihood estimation method. The estimated parameters are then used to calculate the minimum mean square error estimates of the speech spectral components. The proposed method has two significant features. 1) The parameter estimation part performs noise suppression and dereverberation alternately. 2) Noise-free reverberant speech spectrum estimates, which are transferred by the noise suppression process to the dereverberation process, are represented in the form of a probability distribution. This paper reports the experimental results of 1500 trials conducted using 500 different utterances. The reverberation time RT60 was 0.6 s, and the reverberant signal to noise ratio was 20, 15, or 10 dB. The experimental results show the superiority of the proposed method over the sequential performance of the noise suppression and dereverberation processes.
