首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
A new class‐based histogram equalization method is proposed for robust speech recognition. The proposed method aims at not only compensating the acoustic mismatch between training and test environments, but also at reducing the discrepancy between the phonetic distributions of training and test speech data. The algorithm utilizes multiple class‐specific reference and test cumulative distribution functions, classifies the noisy test features into their corresponding classes, and equalizes the features by using their corresponding class‐specific reference and test distributions. Experiments on the Aurora 2 database proved the effectiveness of the proposed method by reducing relative errors by 18.74%, 17.52%, and 23.45% over the conventional histogram equalization method and by 59.43%, 66.00%, and 50.50% over mel‐cepstral‐based features for test sets A, B, and C, respectively.  相似文献   

2.
We consider the feature recombination technique in a multiband approach to speaker identification and verification. To overcome the ineffectiveness of conventional feature recombination in broadband noisy environments, we propose a new subband feature recombination which uses subband likelihoods and a subband reliable‐feature selection technique with an adaptive noise model. In the decision step of speaker recognition, a few very low unreliable feature likelihood scores can cause a speaker recognition system to make an incorrect decision. To overcome this problem, reliable‐feature selection adjusts the likelihood scores of an unreliable feature by comparison with those of an adaptive noise model, which is estimated by the maximum a posteriori adaptation technique using noise features directly obtained from noisy test speech. To evaluate the effectiveness of the proposed methods in noisy environments, we use the TIMIT database and the NTIMIT database, which is the corresponding telephone version of TIMIT database. The proposed subband feature recombination with subband reliable‐feature selection achieves better performance than the conventional feature recombination system with reliable‐feature selection.  相似文献   

3.
基于加权特征值补偿的说话人识别   总被引:3,自引:0,他引:3  
于鹏  徐义芳  曹志刚 《信号处理》2002,18(6):513-517
背景噪声的存在,使得说话人识别系统的训练环境和测试环境发生失配,导致系统性能发生急剧下降。本论文提出一种加权特征值补偿算法,把由噪声引起的使带噪语音信号特征值与纯净语音特征值发生偏差的部分去除,从而使进入识别器的特征值接近纯净语音的特征值。在特征值补偿过程中引入了信噪比加权的方法。实验表明,这种方法能够有效的提高说话人识别系统的性能。  相似文献   

4.
We propose a novel feature processing technique which can provide a cepstral liftering effect in the log‐spectral domain. Cepstral liftering aims at the equalization of variance of cepstral coefficients for the distance‐based speech recognizer, and as a result, provides the robustness for additive noise and speaker variability. However, in the popular hidden Markov model based framework, cepstral liftering has no effect in recognition performance. We derive a filtering method in log‐spectral domain corresponding to the cepstral liftering. The proposed method performs a high‐pass filtering based on the decorrelation of filter‐bank energies. We show that in noisy speech recognition, the proposed method reduces the error rate by 52.7% to conventional feature.  相似文献   

5.
在信噪比依赖的非均匀谱压缩(SNSC)鲁棒语音特征提取技术和VTS算法的基础上,该文提出了一种新的MC-SNSC模型补偿算法。SNSC技术是一种根据人类听觉对声音强度-响度感知转化关系的谱幅度变化操作和噪声抑制技术。基于对数谱域的噪声以及SNSC特征提取对语音信号特征所产生的失配函数,推导出了MC-SNSC模型补偿算法。实验证明使用这一新算法,识别率比当前较理想的VTS和PMC算法有很明显的提升,算法的复杂度较VTS等算法仅有轻微的增加。  相似文献   

6.
本文在丢失数据技术与声学后退技术的基础上,提出了一种基于模糊规则的鲁棒语音识别方法,首先根据先验知识或假定建立特征分量的可靠程度与其概率分布之间的模糊规则,识别时观察矢量的输出概率由一个基于规则的模糊逻辑系统来得到,并针对倒谱识别系统给出了一种具体的实现方法.实验结果表明,所提识别方法的性能显著优于丢失数据技术和声学后退技术.  相似文献   

7.
Speaker adaptation techniques are generally used to reduce speaker differences in speech recognition. In this work, we focus on the features fitted to a linear regression‐based speaker adaptation. These are obtained by feature transformation based on independent component analysis (ICA), and the feature transformation matrices are estimated from the training data and adaptation data. Since the adaptation data is not sufficient to reliably estimate the ICA‐based feature transformation matrix, it is necessary to adjust the ICA‐based feature transformation matrix estimated from a new speaker utterance. To cope with this problem, we propose a smoothing method through a linear interpolation between the speaker‐independent (SI) feature transformation matrix and the speaker‐dependent (SD) feature transformation matrix. From our experiments, we observed that the proposed method is more effective in the mismatched case. In the mismatched case, the adaptation performance is improved because the smoothed feature transformation matrix makes speaker adaptation using noisy speech more robust.  相似文献   

8.
通过对纯净语音及含噪语音短时谱的分析比较,提出了一种基于基音频率及其谐波结构的新的语音特征参数。实验表明,与传统的倒谱特征相比,新特征对加性白噪声相对较不敏感,在闭集文本无关说话人识别中,新特征可以在加性白高斯噪声环境下提高系统的说话人识别率。  相似文献   

9.
采用特征分类直方图均衡化的鲁棒性语音识别   总被引:1,自引:0,他引:1  
姜莹  俞一彪 《信号处理》2011,27(6):896-900
大部分噪声会引起语音倒谱域特征参数的非线性失真,导致识别系统性能下降。直方图均衡化方法是一种非线性补偿变换技术,较传统的基于线性变换技术的抗噪声方法进一步提高了系统的鲁棒性。但实际识别系统中,除了噪声引起语音特征的非线性失真外,还存在训练和测试数据的语音特征类分布不一致问题,从而难以保证传统的直方图均衡化方法发挥其优势。本文提出一种基于特征分类的直方图均衡化方法,首先对初步均衡化后的含噪语音特征矢量进行K均值分类,然后对各类别下的特征矢量再进行直方图均衡变换。实验结果表明,低信噪比时无论在平稳噪声还是非平稳噪声环境下,与传统的直方图均衡化方法相比都进一步增强了识别系统的鲁棒性。   相似文献   

10.
陈伟红 《现代电子技术》2006,29(14):44-45,48
研究了3种背景噪声下与说话人有关的孤立词语音识别方法。即语音前端声学处理法、正则相关分析的谱变换补偿方法和同模极点增加法。实验结果表明,这3种方法都有效地提高了噪声环境中语音识别率,其中较好的方法在强噪声环境中(信噪比为0 dB)的语音识别率达到80%以上,为信噪比较低的噪声环境中自动语音识别展现了美好前景。  相似文献   

11.
改进的直方图均衡化算法在图像增强中的应用   总被引:8,自引:0,他引:8  
在数字图像处理过程中,针对直方图均衡化算法增强图像,提出了一种改进的直方图均衡化算法,应用于图像增强。该算法结合全局直方图均衡化、局部直方图均衡化特性,利用增量式直方图均衡化,从处理方法上对直方图均衡化算法进行了改进。实验结果表明,文中提出的方法能更有效的增强图像,去除图像噪声,为后续有效的识别特征提供了基础,同时提高图像目标识别的效果。  相似文献   

12.
Cepstral mean and variance normalization (CMVN) is an efficient noise compensation technique popularly used in many speech applications. CMVN eliminates the mismatch between training and test utterances by transforming them to zero mean and unit variance. In this work, we argue that some amount of useful information is lost during normalization as every utterance is forced to have the same first- and second-order statistics, i.e., zero mean and unit variance. We propose to modify CMVN methodology to retain the useful information and yet compensate for noise. The proposed normalization approach transforms every test utterance to utterance-specific clean mean (i.e., utterance mean if the noise was absent) and clean variance, instead of zero mean and unit variance. We derive expressions to estimate the clean mean and variance from a noisy utterance. The proposed normalization is effective in the recognizing voice commands that are typically short (single words or short phrases), where more advanced methods [such as histogram equalization (HEQ)] are not effective. Recognition results show a relative improvement (RI) of \(21\,\%\) in word error rate over conventional CMVN on the Aurora-2 database and a RI of 20 and \(11\,\%\) over CMVN and HEQ on short utterances of the Aurora-2 database.  相似文献   

13.
Subband-based blind signal separation for noisy speech recognition   总被引:1,自引:0,他引:1  
A method for directly extracting clean speech features from noisy speech is proposed. This process is based on independent component analysis (ICA) and a new feature analysis technique for reducing the computational complexity of the frequency domain ICA. For noisy speech signals recorded in real environments, this method yielded a considerable performance improvement  相似文献   

14.
为了提高单通道语音分离性能,该文提出基于深度学习特征融合和联合约束的单通道语音分离方法。传统基于深度学习的分离算法的损失函数只考虑了预测值和真实值的误差,这使得分离后的语音与纯净语音之间误差较大。该文提出一种新的联合约束损失函数,该损失函数不仅约束了理想比值掩蔽的预测值和真实值的误差,还惩罚了相应幅度谱的误差。另外,为了充分利用多种特征的互补性,提出一种含特征融合层的卷积神经网络(CNN)结构。利用该CNN提取多通道输入特征的深度特征,并在融合层中将深度特征与声学特征融合用来训练分离模型。由于融合构成的特征含有丰富的语音信息,具有强的语音信号表征能力,使得分离模型预测的掩蔽更加准确。实验结果表明,从信号失真比(SDR)、主观语音质量评估(PESQ)和短时客观可懂度(STOI)3个方面评价,相比其他优秀的基于深度学习的语音分离方法,该方法能够更有效地分离目标语音。  相似文献   

15.
Breast cancer detection and segmentation of cytological images is the standard clinical practice for the diagnosis and prognosis of breast cancer. This paper presents a fully automated method for cell nuclei detection and segmentation in breast cytological images. The images are enhanced with histogram stretching and contrast-limited adaptive histogram equalization (CLAHE). The locations of the cell nuclei in the image are detected with circular Hough transform (CHT) and local maximum filtering. The elimination of false positive findings (noisy circles and blood cells) is achieved using Otsu’s thresholding method and fuzzy C-means clustering technique. The segmentation of the nuclei boundaries is accomplished with the application of the marker controlled watershed transform in the gradient image, using the nuclei markers extracted in the detection step. The proposed method is evaluated using 92 breast cytological images containing 11,502 cell nuclei. Experimental evidence shows that the proposed method has very effective results even in the case of images with high degree of blood cells, noisy circles.  相似文献   

16.
With the remarkable growth in rich media in recent years, people are increasingly exposed to visual information from the environment. Visual information continues to play a vital role in rich media because people's real interests lie in dynamic information. This paper proposes a novel discrete dynamic swarm optimization (DDSO) algorithm for video object tracking using invariant features. The proposed approach is designed to track objects more robustly than other traditional algorithms in terms of illumination changes, background noise, and occlusions. DDSO is integrated with a matching procedure to eliminate inappropriate feature points geographically. The proposed novel fitness function can aid in excluding the influence of some noisy mismatched feature points. The test results showed that our approach can overcome changes in illumination, background noise, and occlusions more effectively than other traditional methods, including color‐tracking and invariant feature‐tracking methods.  相似文献   

17.
视觉特征提取是听视觉语音识别研究的热点问题。文章引入了一种稳健的基于Visemic LDA的口形动态特征,这种特征充分考虑了发音时口形轮廓的变化及视觉Viseme划分。文章同时提出了一利利用语音识别结果进行LDA训练数据自动标注的方法。这种方法免去了繁重的人工标注工作,避免了标注错误。实验表明,将'VisemicLDA视觉特征引入到听视觉语音识别中,可以大大地提高噪声条件下语音识别系统的识别率;将这种视觉特征与多数据流HMM结合之后,在信噪比为10dB的强噪声情况下,识别率仍可以达到80%以上。  相似文献   

18.
This paper concerns a robust real‐time voice activity detection (VAD) approach which is easy to understand and implement. The proposed approach employs several short‐term speech/nonspeech discriminating features in a voting paradigm to achieve a reliable performance in different environments. This paper mainly focuses on the performance improvement of a recently proposed approach which uses spectral peak valley difference (SPVD) as a feature for silence detection. The main issue of this paper is to apply a set of features with SPVD to improve the VAD robustness. The proposed approach uses a weighted voting scheme in order to take the discriminative power of the employed feature set into account. The experiments show that the proposed approach is more robust than the baseline approach from different points of view, including channel distortion and threshold selection. The proposed approach is also compared with some other VAD techniques for better confirmation of its achievements. Using the proposed weighted voting approach, the average VAD performance is increased to 89.29% for 5 different noise types and 8 SNR levels. The resulting performance is 13.79% higher than the approach based only on SPVD and even 2.25% higher than the not‐weighted voting scheme.  相似文献   

19.
简志华  杨震 《信号处理》2007,23(3):383-387
本文提出了一种改进的倒谱域特征参数补偿算法GMCSM。根据语音信号的时变特性,GMCSM算法使用广义自回归条件异方差(Generalized Auto-Regressive Conditional Heteroscedasticity,GARCH)模型对语音信号的方差进行建模。实验数据表明,与常规倒谱相减法CSM和MEMCSM相比,GMCSM能够更有效地补偿因加性噪声引起的倒谱特征参数失真,减少识别的错误率,特别是在信噪比较低的情况下,GMCSM的性能更为显著。  相似文献   

20.
In this paper, a feature extraction (FE) method is proposed that is comparable to the traditional FE methods used in automatic speech recognition systems. Unlike the conventional spectral‐based FE methods, the proposed method evaluates the similarities between an embedded speech signal and a set of predefined speech attractor models in the reconstructed phase space (RPS) domain. In the first step, a set of Gaussian mixture models is trained to represent the speech attractors in the RPS. Next, for a new input speech frame, a posterior‐probability‐based feature vector is evaluated, which represents the similarity between the embedded frame and the learned speech attractors. We conduct experiments for a speech recognition task utilizing a toolkit based on hidden Markov models, over FARSDAT, a well‐known Persian speech corpus. Through the proposed FE method, we gain 3.11% absolute phoneme error rate improvement in comparison to the baseline system, which exploits the mel‐frequency cepstral coefficient FE method.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号