Found 20 similar articles (search time: 15 ms)
1.
A new class‐based histogram equalization method is proposed for robust speech recognition. The proposed method aims not only to compensate for the acoustic mismatch between training and test environments but also to reduce the discrepancy between the phonetic distributions of training and test speech data. The algorithm utilizes multiple class‐specific reference and test cumulative distribution functions, classifies the noisy test features into their corresponding classes, and equalizes the features by using their corresponding class‐specific reference and test distributions. Experiments on the Aurora 2 database demonstrated the effectiveness of the proposed method, with relative error reductions of 18.74%, 17.52%, and 23.45% over the conventional histogram equalization method and of 59.43%, 66.00%, and 50.50% over mel‐cepstral‐based features for test sets A, B, and C, respectively.
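The class‐specific equalization step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes one scalar feature stream, assumes the class labels of the test frames are already available (the abstract does not say how the classifier works), and uses empirical CDFs built from raw samples. The function names `equalize` and `class_based_equalize` are hypothetical.

```python
from bisect import bisect_right


def equalize(test_values, reference_values):
    """Map each test value to the reference quantile at the same CDF level."""
    test_sorted = sorted(test_values)
    ref_sorted = sorted(reference_values)
    n_test, n_ref = len(test_sorted), len(ref_sorted)
    equalized = []
    for x in test_values:
        rank = bisect_right(test_sorted, x)  # number of test samples <= x
        # ceil(rank * n_ref / n_test) - 1, computed with integers only
        ref_index = (rank * n_ref + n_test - 1) // n_test - 1
        equalized.append(ref_sorted[ref_index])
    return equalized


def class_based_equalize(test_values, test_labels, reference_by_class):
    """Equalize each class separately against its own reference samples.

    Assumption: test_labels come from some external classifier.
    """
    output = list(test_values)
    for label, reference_values in reference_by_class.items():
        indices = [i for i, lab in enumerate(test_labels) if lab == label]
        if not indices:
            continue
        mapped = equalize([test_values[i] for i in indices], reference_values)
        for i, value in zip(indices, mapped):
            output[i] = value
    return output
```

Conventional histogram equalization corresponds to calling `equalize` once on all frames; the class‐based variant simply restricts both the test and the reference CDF to one class at a time.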
2.
We consider the feature recombination technique in a multiband approach to speaker identification and verification. To overcome the ineffectiveness of conventional feature recombination in broadband noisy environments, we propose a new subband feature recombination which uses subband likelihoods and a subband reliable‐feature selection technique with an adaptive noise model. In the decision step of speaker recognition, a few very low likelihood scores from unreliable features can cause a speaker recognition system to make an incorrect decision. To overcome this problem, reliable‐feature selection adjusts the likelihood scores of an unreliable feature by comparison with those of an adaptive noise model, which is estimated by the maximum a posteriori adaptation technique using noise features directly obtained from noisy test speech. To evaluate the effectiveness of the proposed methods in noisy environments, we use the TIMIT database and the NTIMIT database, which is the corresponding telephone version of the TIMIT database. The proposed subband feature recombination with subband reliable‐feature selection achieves better performance than the conventional feature recombination system with reliable‐feature selection.
3.
4.
Ho‐Young Jung 《ETRI Journal》2004,26(3):273-276
We propose a novel feature processing technique which can provide a cepstral liftering effect in the log‐spectral domain. Cepstral liftering aims to equalize the variance of cepstral coefficients for distance‐based speech recognizers and, as a result, provides robustness to additive noise and speaker variability. However, in the popular hidden Markov model based framework, cepstral liftering has no effect on recognition performance. We derive a filtering method in the log‐spectral domain corresponding to cepstral liftering. The proposed method performs high‐pass filtering based on the decorrelation of filter‐bank energies. We show that in noisy speech recognition, the proposed method reduces the error rate by 52.7% compared with the conventional feature.
5.
6.
7.
Speaker adaptation techniques are generally used to reduce speaker differences in speech recognition. In this work, we focus on features suited to linear regression‐based speaker adaptation. These are obtained by feature transformation based on independent component analysis (ICA), and the feature transformation matrices are estimated from the training data and adaptation data. Since the adaptation data is not sufficient to reliably estimate the ICA‐based feature transformation matrix, it is necessary to adjust the ICA‐based feature transformation matrix estimated from a new speaker's utterance. To cope with this problem, we propose a smoothing method based on linear interpolation between the speaker‐independent (SI) feature transformation matrix and the speaker‐dependent (SD) feature transformation matrix. From our experiments, we observed that the proposed method is more effective in the mismatched case. In the mismatched case, the adaptation performance is improved because the smoothed feature transformation matrix makes speaker adaptation using noisy speech more robust.
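The smoothing step in this abstract, a linear interpolation between the SI and SD transformation matrices, can be sketched as below. The interpolation weight and the function name `smooth_transform` are assumptions for illustration; the abstract does not state how the weight is chosen.

```python
def smooth_transform(si_matrix, sd_matrix, weight):
    """Return (1 - weight) * SI + weight * SD, element by element.

    weight = 0 keeps the speaker-independent matrix unchanged;
    weight = 1 trusts the speaker-dependent estimate completely.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [
        [(1.0 - weight) * si + weight * sd for si, sd in zip(si_row, sd_row)]
        for si_row, sd_row in zip(si_matrix, sd_matrix)
    ]
```

With little adaptation data, a small weight keeps the result close to the SI matrix, which is the situation the abstract describes.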
8.
9.
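Robust speech recognition using feature-classification histogram equalization (Total citations: 1; self-citations: 0; citations by others: 1)
Most noise causes nonlinear distortion of cepstral-domain speech feature parameters, which degrades recognition performance. Histogram equalization is a nonlinear compensation transform that improves system robustness beyond traditional noise-robust methods based on linear transforms. In practical recognition systems, however, besides the nonlinear feature distortion caused by noise, the class distributions of the speech features in the training and test data are also mismatched, which makes it difficult for conventional histogram equalization to realize its advantages. This paper proposes a histogram equalization method based on feature classification: the noisy speech feature vectors are first preliminarily equalized and then clustered with K-means, after which histogram equalization is applied again to the feature vectors within each class. Experimental results show that at low signal-to-noise ratios, in both stationary and non-stationary noise, the method further improves the robustness of the recognition system compared with conventional histogram equalization.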
10.
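Three methods for speaker-dependent isolated-word speech recognition under background noise are studied: front-end acoustic processing of speech, spectral transformation compensation based on canonical correlation analysis, and a same-modulus pole-addition method. Experimental results show that all three methods effectively improve the speech recognition rate in noisy environments. The better-performing methods achieve a recognition rate above 80% in strong noise (signal-to-noise ratio of 0 dB), which is promising for automatic speech recognition in low-SNR environments.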
11.
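Application of an improved histogram equalization algorithm to image enhancement (Total citations: 8; self-citations: 0; citations by others: 8)
For image enhancement by histogram equalization in digital image processing, an improved histogram equalization algorithm is proposed. The algorithm combines the properties of global and local histogram equalization and uses incremental histogram equalization, improving on the standard algorithm in terms of the processing procedure. Experimental results show that the proposed method enhances images more effectively and removes image noise, which provides a basis for the subsequent extraction of effective recognition features and improves image target recognition.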
12.
Vikas Joshi N. Vishnu Prasad S. Umesh 《Circuits, Systems, and Signal Processing》2016,35(5):1593-1609
Cepstral mean and variance normalization (CMVN) is an efficient noise compensation technique popularly used in many speech applications. CMVN eliminates the mismatch between training and test utterances by transforming them to zero mean and unit variance. In this work, we argue that some amount of useful information is lost during normalization as every utterance is forced to have the same first- and second-order statistics, i.e., zero mean and unit variance. We propose to modify the CMVN methodology to retain the useful information and yet compensate for noise. The proposed normalization approach transforms every test utterance to an utterance-specific clean mean (i.e., the utterance mean if the noise were absent) and clean variance, instead of zero mean and unit variance. We derive expressions to estimate the clean mean and variance from a noisy utterance. The proposed normalization is effective in recognizing voice commands that are typically short (single words or short phrases), where more advanced methods [such as histogram equalization (HEQ)] are not effective. Recognition results show a relative improvement (RI) of 21% in word error rate over conventional CMVN on the Aurora-2 database and an RI of 20% and 11% over CMVN and HEQ on short utterances of the Aurora-2 database.
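The difference between conventional CMVN and the modified normalization described here can be sketched for a single cepstral coefficient track. The function `normalize` and its parameters are hypothetical; in particular, the abstract's expressions for estimating the clean mean and variance from a noisy utterance are not reproduced, so the targets are simply passed in as arguments.

```python
from statistics import fmean, pstdev


def normalize(frames, target_mean=0.0, target_std=1.0):
    """Shift and scale one coefficient track to the given mean and std.

    The defaults (0, 1) give conventional CMVN. Passing an estimated
    clean mean and clean standard deviation gives the modified form;
    how those estimates are obtained is outside this sketch.
    """
    mean = fmean(frames)
    std = pstdev(frames)  # population standard deviation of the utterance
    if std == 0.0:
        raise ValueError("constant track cannot be variance-normalized")
    return [target_mean + target_std * (x - mean) / std for x in frames]
```

For example, `normalize(track)` yields zero mean and unit variance, while `normalize(track, 5.0, 2.0)` maps the same track to mean 5 and standard deviation 2.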
13.
Hyung-Min Park Ho-Young Jung Te-Won Lee Soo-Young Lee 《Electronics letters》1999,35(23):2011-2012
A method for directly extracting clean speech features from noisy speech is proposed. This process is based on independent component analysis (ICA) and a new feature analysis technique for reducing the computational complexity of the frequency domain ICA. For noisy speech signals recorded in real environments, this method yielded a considerable performance improvement.
14.
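To improve single-channel speech separation performance, this paper proposes a single-channel speech separation method based on deep-learning feature fusion and joint constraints. The loss functions of conventional deep-learning-based separation algorithms consider only the error between predicted and true values, which leaves a large error between the separated speech and the clean speech. This paper proposes a new joint-constraint loss function that not only constrains the error between the predicted and true ideal ratio mask but also penalizes the error of the corresponding magnitude spectrum. In addition, to fully exploit the complementarity of multiple features, a convolutional neural network (CNN) architecture with a feature fusion layer is proposed. The CNN extracts deep features from multi-channel input features, and the fusion layer combines the deep features with acoustic features to train the separation model. Because the fused features contain rich speech information and represent speech signals well, the separation model predicts masks more accurately. Experimental results show that, evaluated in terms of signal-to-distortion ratio (SDR), perceptual evaluation of speech quality (PESQ), and short-time objective intelligibility (STOI), the method separates target speech more effectively than other strong deep-learning-based speech separation methods.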
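One way to read the joint-constraint loss in this abstract is as a mask error term plus a weighted magnitude-spectrum error term. The sketch below is an assumption-laden illustration: the abstract gives no formula, so the use of mean squared error, the weighting factor `weight`, and the function names are all hypothetical.

```python
def mean_squared_error(predicted, target):
    """Average squared difference between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def joint_loss(pred_mask, true_mask, mixture_mag, clean_mag, weight=1.0):
    """Mask error plus a penalty on the resulting magnitude spectrum.

    The estimated magnitude is the predicted mask applied to the
    mixture magnitude, bin by bin (an assumed formulation).
    """
    mask_term = mean_squared_error(pred_mask, true_mask)
    estimated_mag = [m * y for m, y in zip(pred_mask, mixture_mag)]
    spectrum_term = mean_squared_error(estimated_mag, clean_mag)
    return mask_term + weight * spectrum_term
```

The second term makes mask errors in high-energy bins cost more than the same errors in low-energy bins, which a mask-only loss cannot express.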
15.
Yasmeen M. George Bassant M. Bagoury Hala H. Zayed Mohamed I. Roushdy 《Signal processing》2013,93(10):2804-2816
Breast cancer detection and segmentation of cytological images is the standard clinical practice for the diagnosis and prognosis of breast cancer. This paper presents a fully automated method for cell nuclei detection and segmentation in breast cytological images. The images are enhanced with histogram stretching and contrast-limited adaptive histogram equalization (CLAHE). The locations of the cell nuclei in the image are detected with the circular Hough transform (CHT) and local maximum filtering. False positive findings (noisy circles and blood cells) are eliminated using Otsu’s thresholding method and the fuzzy C-means clustering technique. The nuclei boundaries are segmented by applying the marker-controlled watershed transform to the gradient image, using the nuclei markers extracted in the detection step. The proposed method is evaluated using 92 breast cytological images containing 11,502 cell nuclei. Experimental evidence shows that the proposed method is very effective even for images with a high proportion of blood cells and noisy circles.
16.
Kyuchang Kang Changseok Bae Jinyoung Moon Jongyoul Park Yuk Ying Chung Feng Sha Ximeng Zhao 《ETRI Journal》2017,39(2):151-162
With the remarkable growth in rich media in recent years, people are increasingly exposed to visual information from the environment. Visual information continues to play a vital role in rich media because people's real interests lie in dynamic information. This paper proposes a novel discrete dynamic swarm optimization (DDSO) algorithm for video object tracking using invariant features. The proposed approach is designed to track objects more robustly than other traditional algorithms in terms of illumination changes, background noise, and occlusions. DDSO is integrated with a matching procedure to eliminate inappropriate feature points geographically. The proposed novel fitness function can aid in excluding the influence of some noisy mismatched feature points. The test results showed that our approach can overcome changes in illumination, background noise, and occlusions more effectively than other traditional methods, including color‐tracking and invariant feature‐tracking methods.
17.
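A robust Visemic LDA-based dynamic mouth-shape feature and audio-visual speech recognition (Total citations: 4; self-citations: 0; citations by others: 4)
Visual feature extraction is a hot topic in audio-visual speech recognition research. This paper introduces a robust dynamic mouth-shape feature based on Visemic LDA, which fully accounts for the changes in the mouth contour during articulation and for the visual viseme partition. The paper also proposes a method that uses speech recognition results to label the LDA training data automatically, which removes the heavy manual labeling effort and avoids labeling errors. Experiments show that introducing the Visemic LDA visual feature into audio-visual speech recognition greatly improves the recognition rate of the speech recognition system under noisy conditions. When this visual feature is combined with a multi-stream HMM, the recognition rate still exceeds 80% under strong noise at a signal-to-noise ratio of 10 dB.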
18.
This paper concerns a robust real‐time voice activity detection (VAD) approach which is easy to understand and implement. The proposed approach employs several short‐term speech/nonspeech discriminating features in a voting paradigm to achieve reliable performance in different environments. This paper mainly focuses on improving the performance of a recently proposed approach which uses the spectral peak valley difference (SPVD) as a feature for silence detection. The main contribution of this paper is to apply a set of features together with SPVD to improve VAD robustness. The proposed approach uses a weighted voting scheme in order to take the discriminative power of the employed feature set into account. The experiments show that the proposed approach is more robust than the baseline approach from different points of view, including channel distortion and threshold selection. The proposed approach is also compared with some other VAD techniques to further confirm its achievements. Using the proposed weighted voting approach, the average VAD performance is increased to 89.29% for 5 different noise types and 8 SNR levels. The resulting performance is 13.79% higher than that of the approach based only on SPVD and 2.25% higher than that of the unweighted voting scheme.
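The weighted voting idea in this abstract can be sketched as follows. The per-feature thresholds, the weights, the decision threshold, and the function name `weighted_vote` are all assumptions for illustration; the abstract does not list the features other than SPVD or say how the weights are derived.

```python
def weighted_vote(feature_values, thresholds, weights, decision_threshold=0.5):
    """Return True (speech) if the weighted share of speech votes is high enough.

    Each feature votes "speech" when its value exceeds its own threshold;
    a vote counts in proportion to that feature's weight.
    """
    total_weight = sum(weights)
    speech_weight = sum(
        weight
        for value, threshold, weight in zip(feature_values, thresholds, weights)
        if value > threshold
    )
    return speech_weight / total_weight >= decision_threshold
```

With equal weights this reduces to plain majority voting, so the two schemes compared in the abstract differ only in the `weights` argument.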
19.
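This paper proposes GMCSM, an improved compensation algorithm for cepstral-domain feature parameters. Based on the time-varying nature of speech signals, GMCSM models the variance of the speech signal with a generalized autoregressive conditional heteroscedasticity (GARCH) model. Experimental data show that, compared with the conventional cepstral subtraction methods CSM and MEMCSM, GMCSM compensates more effectively for the distortion of cepstral feature parameters caused by additive noise and reduces the recognition error rate. Its advantage is especially pronounced at low signal-to-noise ratios.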
20.
In this paper, a feature extraction (FE) method is proposed that is comparable to the traditional FE methods used in automatic speech recognition systems. Unlike the conventional spectral‐based FE methods, the proposed method evaluates the similarities between an embedded speech signal and a set of predefined speech attractor models in the reconstructed phase space (RPS) domain. In the first step, a set of Gaussian mixture models is trained to represent the speech attractors in the RPS. Next, for a new input speech frame, a posterior‐probability‐based feature vector is evaluated, which represents the similarity between the embedded frame and the learned speech attractors. We conduct experiments for a speech recognition task utilizing a toolkit based on hidden Markov models, over FARSDAT, a well‐known Persian speech corpus. Through the proposed FE method, we gain a 3.11% absolute phoneme error rate improvement in comparison to the baseline system, which exploits the mel‐frequency cepstral coefficient FE method.