Similar literature
 20 similar documents found (search time: 15 ms)
1.
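A differential composite subband speech recognition method under noise   Total citations: 4, self-citations: 0, citations by others: 4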
蒋文建  韦岗 《通信学报》2002,23(1):18-24
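Based on the observation that subband features reflect the local characteristics of a speech signal while full-band features reflect its overall characteristics, this paper proposes a new differential composite subband speech recognition method. Spectral differencing is first used to reduce noise interference; the recognition probabilities of the multiple subband features are then combined with the full-band recognition probability in a joint decision that yields the final recognition result. The new method is applied to recognition experiments on the ten English digits 0-9 and the E-Set from the TIMIT data package, under white noise and F16 fighter-jet noise from NoiseX92. The experimental results show that the new method greatly improves recognition performance over the traditional method.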

2.
In this letter, we propose a new histogram equalization technique for feature compensation in speech recognition under noisy environments. The proposed approach combines a signal‐to‐noise‐ratio–dependent feature reconstruction method and the class histogram equalization technique to effectively reduce the acoustic mismatch present in noisy speech features. Experimental results from the Aurora 2 task confirm the superiority of the proposed approach for acoustic feature compensation.

3.
We propose a novel feature processing technique which can provide a cepstral liftering effect in the log‐spectral domain. Cepstral liftering aims at equalizing the variance of cepstral coefficients for the distance‐based speech recognizer and, as a result, provides robustness against additive noise and speaker variability. However, in the popular hidden Markov model based framework, cepstral liftering has no effect on recognition performance. We derive a filtering method in the log‐spectral domain corresponding to cepstral liftering. The proposed method performs high‐pass filtering based on the decorrelation of filter‐bank energies. We show that in noisy speech recognition, the proposed method reduces the error rate by 52.7% relative to the conventional feature.

4.
Robust speaker recognition based on compressed sensing   Total citations: 1, self-citations: 1, citations by others: 0
单进  芮贤义 《电声技术》2011,35(2):61-63
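This paper describes how missing-data reconstruction based on compressed sensing theory can be applied to the front end of a speaker recognition system under noisy conditions. First, a Mel filter bank converts the noisy speech signal into a Mel spectrum; next, the reliable data in the noisy Mel spectrum are used to reconstruct the unreliable data; finally, Mel cepstral feature parameters are extracted from the reconstructed Mel spectrum for speaker recognition. Robustness experiments show that the method improves the recognition rate of the speaker recognition system in noisy environments.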

5.
李昕  郑宇  费敏锐 《信号处理》2003,19(3):256-261
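The performance of speaker recognition systems often degrades substantially in real environments. This paper proposes a new feature mapper based on an EBF neural network to address this problem. An EBF neural network is trained to build the mapper, with distorted speech features as its input and the corresponding undistorted speech features as its ideal output. In other words, given a distorted cepstrum as input, the network outputs the undistorted cepstrum. In the feature recovery stage, distorted speech features are passed through the feature mapper to restore them to undistorted speech features. The restored features can then be treated as undistorted speech when testing the speaker models. The approach was tested on the TIMIT and NTIMIT speech corpora, containing 258 speakers, and the experiments show that the feature mapper significantly improves recognition performance.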

6.
This paper concerns robust and reliable speaker model training for text‐independent speaker verification. The baseline speaker modeling approach is the Gaussian mixture model (GMM). In text‐independent speaker verification, the amount of speech data may differ between speakers. However, we still wish the modeling approach to perform equally well for all speakers. Besides, the modeling technique must be minimally vulnerable to unseen data. A traditional approach for GMM training is the expectation maximization (EM) method, which is known for its overfitting problem and its weakness in handling insufficient training data. To tackle these problems, variational approximation is proposed. Variational approaches are known to be robust against overtraining and data insufficiency. We evaluated the proposed approach on two different databases, namely KING and TFarsdat. The experiments show that the proposed approach improves the performance on the TFarsdat and KING databases by 0.56% and 4.81%, respectively. The experiments also show that the variationally optimized GMM is more robust against noise, and that the verification error rate in noisy environments for the TFarsdat dataset decreases by 1.52%.

7.
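Speaker recognition methods based on the Gaussian mixture model (GMM) usually use the log-likelihood score as the basis for deciding on the target speaker at test time. After analyzing the characteristics of the log-likelihood score, this paper proposes an improved method that increases the relative difference between the scores that test speech frames obtain against the target model and against non-target models. Experiments on the TIMIT database show that a speaker recognition system using the transformed likelihood score achieves better recognition performance and noise robustness than one using the log-likelihood score.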

8.
Speaker adaptation techniques are generally used to reduce speaker differences in speech recognition. In this work, we focus on the features fitted to a linear regression‐based speaker adaptation. These are obtained by feature transformation based on independent component analysis (ICA), and the feature transformation matrices are estimated from the training data and adaptation data. Since the adaptation data is not sufficient to reliably estimate the ICA‐based feature transformation matrix, it is necessary to adjust the ICA‐based feature transformation matrix estimated from a new speaker utterance. To cope with this problem, we propose a smoothing method through a linear interpolation between the speaker‐independent (SI) feature transformation matrix and the speaker‐dependent (SD) feature transformation matrix. From our experiments, we observed that the proposed method is more effective in the mismatched case. In the mismatched case, the adaptation performance is improved because the smoothed feature transformation matrix makes speaker adaptation using noisy speech more robust.
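The smoothing step in this abstract, a linear interpolation between the SI and SD transformation matrices, can be sketched as follows. This is a minimal illustration only: the function name `smooth_transform` and the interpolation weight `weight` are assumptions, and the abstract does not say how the weight is chosen.

```python
def smooth_transform(si_matrix, sd_matrix, weight):
    """Interpolate element-wise between two equally sized matrices.

    weight = 0.0 returns the speaker-independent (SI) matrix,
    weight = 1.0 returns the speaker-dependent (SD) matrix.
    The weight is a hypothetical tuning parameter, not taken from the paper.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [
        [weight * sd + (1.0 - weight) * si for si, sd in zip(si_row, sd_row)]
        for si_row, sd_row in zip(si_matrix, sd_matrix)
    ]


si = [[1.0, 0.0], [0.0, 1.0]]   # toy SI transform
sd = [[3.0, 2.0], [2.0, 3.0]]   # toy SD transform from little adaptation data
smoothed = smooth_transform(si, sd, 0.5)
```

With little adaptation data one would presumably keep the weight small so that the result stays close to the SI matrix, but that choice is outside what the abstract specifies.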

9.
Speaker recognition based on weighted feature compensation   Total citations: 3, self-citations: 0, citations by others: 3
于鹏  徐义芳  曹志刚 《信号处理》2002,18(6):513-517
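Background noise creates a mismatch between the training and test environments of a speaker recognition system, causing a sharp drop in system performance. This paper proposes a weighted feature compensation algorithm that removes the noise-induced deviation between the features of the noisy speech signal and those of clean speech, so that the features entering the recognizer are close to those of clean speech. SNR weighting is introduced into the feature compensation process. Experiments show that the method effectively improves the performance of the speaker recognition system.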

10.
Currently, many speaker recognition applications must handle speech corrupted by environmental additive noise without having a priori knowledge about the characteristics of the noise. Some previous works in speaker recognition have used the missing feature (MF) approach to compensate for noise. In most of those applications, the spectral reliability decision step is performed using the signal to noise ratio (SNR) criterion, which attempts to directly measure the relative signal to noise energy at each frequency. An alternative approach to spectral data reliability has been used with some success in the MF approach to speech recognition. Here, we compare the use of this new criterion with the SNR criterion for MF mask estimation in speaker recognition. The new reliability decision is based on the extraction and analysis of several spectro-temporal features from across the entire speech frame, but not across time, which highlight the differences between spectral regions dominated by speech and by noise. We call it the feature classification (FC) criterion. It uses several spectral features to establish spectrogram reliability, unlike the SNR criterion, which relies on only one feature: SNR. We evaluated our proposal through speaker verification experiments on the Ahumada speech database corrupted by different types of noise at various SNR levels. Experiments demonstrated that the FC criterion achieves considerably better recognition accuracy than the SNR criterion in the speaker verification tasks tested.
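The baseline SNR criterion mentioned in this abstract can be sketched as a per-bin reliability mask. The sketch below rests on assumptions that are not in the abstract: the noise power per frequency bin is taken as already estimated, the speech power is approximated by spectral subtraction, and the name `snr_mask` and the default 0 dB threshold are illustrative.

```python
def snr_mask(noisy_power, noise_power, threshold_db=0.0):
    """Mark each time-frequency cell as reliable (True) or unreliable (False).

    noisy_power: list of frames, each a list of power values per frequency bin.
    noise_power: assumed per-bin noise power estimate (hypothetical input).
    A cell is reliable when the estimated local SNR exceeds threshold_db.
    """
    ratio = 10.0 ** (threshold_db / 10.0)  # dB threshold as a linear power ratio
    mask = []
    for frame in noisy_power:
        row = []
        for observed, noise in zip(frame, noise_power):
            speech = max(observed - noise, 0.0)  # spectral-subtraction estimate
            row.append(speech > ratio * noise)
        mask.append(row)
    return mask


noisy = [[10.0, 1.0], [3.0, 4.5]]   # two frames, two frequency bins
noise = [1.0, 2.0]
mask = snr_mask(noisy, noise)        # [[True, False], [True, True]]
```

The FC criterion described in the abstract would replace the single SNR test with a classifier over several spectral features; that classifier is not specified here and is not sketched.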

11.
Speaker recognition based on robust auditory features   Total citations: 3, self-citations: 0, citations by others: 3
林琳  陈虹  陈建 《电子学报》2013,41(3):619-624
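To improve the performance of speaker recognition systems in noisy environments, this paper proposes a robust auditory feature extraction algorithm and applies it to a speaker recognition system. An adaptive compressive Gammachirp filter bank is used to simulate the auditory characteristics of the human cochlea: the input speech signal is filtered into frequency-domain subbands, and the resulting log subband energies serve as the auditory feature parameters. The discrete cosine transform and kernel principal component analysis are each applied to transform the extracted feature parameters, reducing their dimensionality and improving their noise robustness and their ability to represent speaker individuality. Experimental results show that, when applied to a speaker recognition system, the new auditory feature parameters outperform Mel cepstral coefficients and Gammatone-based auditory feature parameters in both robustness and recognition performance.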

12.
全刚  肖熙 《电声技术》2010,34(6):45-47
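Digit speech recognition achieves high recognition rates and has considerable practical value. To build a digit speech recognition system that reaches a high recognition rate in real noisy environments, speaker-dependent and speaker-independent digit speech recognition experiments were carried out in quiet and noisy environments using a duration-distribution-based hidden Markov model (DDBHMM). The results show that DDBHMM-based digit speech recognition achieves high recognition rates for both speaker-dependent and speaker-independent speech recorded in real non-stationary noise environments.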

13.
Long‐term electroencephalography (EEG) monitoring is time‐consuming, and requires experts to interpret EEG signals to detect seizures in patients. In this paper, we propose a novel automated method called adaptive slope of wavelet coefficient counts over various thresholds (ASCOT) to classify patient episodes as seizure waveforms. ASCOT involves extracting the feature matrix by calculating the mean slope of wavelet coefficient counts over various thresholds in each frequency subband. We validated our method using our own database and a public database to avoid overtuning. The experimental results show that the proposed method achieved a reliable and promising accuracy in both our own database (98.93%) and the public database (99.78%). Finally, we evaluated the performance of the method considering various window sizes. In conclusion, the proposed method achieved a reliable seizure detection performance with a short‐term window size. Therefore, our method can be utilized to interpret long‐term EEG results and detect momentary seizure waveforms in diagnostic systems.

14.
A pattern recognition approach is proposed for tone detection. Three basic tone features are extracted from the signal in the form of power, mean frequency, and spectral concentration. These three features are calculated for each signal sample taken during the decision interval and are represented by points in a three-dimensional space. The actual tone detection function is then performed by partitioning the feature space into two decision volumes corresponding to the two alternatives (tone present and absent, respectively) and by identifying the presence of associated clusters. A reject option is available when the decision volumes are not complementary, and allows the system to be insensitive to very noisy samples (e.g. impulsive noise). A non-linear classification method is presented which provides adaptive and robust detection in the presence of non-Gaussian noise. Moreover, global performance may be optimized on-line for unknown or time-varying environments. Hardware and software simulation results are presented and show good performance in the presence of impulsive and interference noise.

15.
An effective and robust speech feature extraction method is presented. Based on the time-frequency multiresolution property of the wavelet transform, the input speech signal is decomposed into various frequency channels. For capturing the characteristics of an individual speaker, the linear predictive cepstral coefficients of the approximation channel and entropy value of the detail channel for each decomposition process are calculated. In addition, an adaptive thresholding technique for each lower resolution is also applied to remove the influence of noise interference. Experimental results show that using this mechanism not only effectively reduces the influence of noise interference but also improves the recognition performance. Finally, the proposed method is evaluated on the MAT telephone speech database for text-independent speaker identification using the group vector quantisation identifier. Some popular existing methods are also evaluated for comparison, and the results show that the proposed feature extraction algorithm is more effective and robust than the other existing methods. In addition, the performance of the proposed method is very satisfactory even in a low SNR environment corrupted by Gaussian white noise.

16.
In this paper, we propose a rank‐weighted reconstruction feature to improve the robustness of a feed‐forward deep neural network (FFDNN)‐based acoustic model. In the FFDNN‐based acoustic model, an input feature is constructed by vectorizing a submatrix that is created by slicing the feature vectors of frames within a context window. In this type of feature construction, the appropriate context window size is important because it determines the amount of trivial or discriminative information, such as redundancy or temporal context, in the input features. However, we examined whether a single parameter is sufficient to control the quantity of information. Therefore, we investigated the input feature construction from the perspectives of rank and nullity, and propose a rank‐weighted reconstruction feature that retains speech information components and reduces trivial components. The proposed method was evaluated in the TIMIT phone recognition and Wall Street Journal (WSJ) domains. The proposed method reduced the phone error rate of the TIMIT domain from 18.4% to 18.0%, and the word error rate of the WSJ domain from 4.70% to 4.43%.
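The conventional input construction described in this abstract, slicing the frames inside a context window and vectorizing the submatrix, can be sketched as below. The function name `splice_frames` and the edge handling (repeating the first and last frame) are assumptions; the rank-weighted reconstruction itself is not specified in the abstract and is not shown.

```python
def splice_frames(frames, context):
    """Build one input vector per frame from its context window.

    frames: list of per-frame feature vectors (lists of equal length).
    context: number of neighbouring frames taken on each side.
    Edge frames are padded by repeating the first/last frame, which is an
    assumed convention rather than something stated in the abstract.
    """
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        vector = []
        for offset in range(-context, context + 1):
            index = min(max(t + offset, 0), last)  # clamp to valid frame range
            vector.extend(frames[index])
        spliced.append(vector)
    return spliced


features = [[1, 2], [3, 4], [5, 6]]           # three frames, two dimensions each
inputs = splice_frames(features, context=1)   # each input has 3 * 2 = 6 values
```

Here the window size is the single parameter the abstract refers to: a larger `context` adds temporal context but also more redundant values per input vector.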

17.
A new class‐based histogram equalization method is proposed for robust speech recognition. The proposed method aims at not only compensating the acoustic mismatch between training and test environments, but also at reducing the discrepancy between the phonetic distributions of training and test speech data. The algorithm utilizes multiple class‐specific reference and test cumulative distribution functions, classifies the noisy test features into their corresponding classes, and equalizes the features by using their corresponding class‐specific reference and test distributions. Experiments on the Aurora 2 database proved the effectiveness of the proposed method by reducing relative errors by 18.74%, 17.52%, and 23.45% over the conventional histogram equalization method and by 59.43%, 66.00%, and 50.50% over mel‐cepstral‐based features for test sets A, B, and C, respectively.
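The equalization step described in this abstract can be sketched with empirical cumulative distribution functions. This is a simplified one-dimensional illustration under stated assumptions: class labels for the test features are taken as already available (the classification step is not shown), and the names `equalize` and `class_based_equalize` are hypothetical.

```python
from bisect import bisect_right


def _map_rank(rank, n_test, n_ref):
    # Index of the reference sample at the same CDF level:
    # ceil(rank / n_test * n_ref) - 1, computed in integer arithmetic.
    return max(0, (rank * n_ref + n_test - 1) // n_test - 1)


def equalize(test_values, reference_values):
    """Map each test value through the test CDF and the inverse reference CDF."""
    test_sorted = sorted(test_values)
    ref_sorted = sorted(reference_values)
    n_test, n_ref = len(test_sorted), len(ref_sorted)
    result = []
    for value in test_values:
        rank = bisect_right(test_sorted, value)  # samples <= value
        result.append(ref_sorted[_map_rank(rank, n_test, n_ref)])
    return result


def class_based_equalize(test_values, test_labels, reference_by_class):
    """Equalize each class separately with its own test and reference CDFs."""
    grouped = {}
    for value, label in zip(test_values, test_labels):
        grouped.setdefault(label, []).append(value)
    # equalize() preserves input order, so per-class iterators restore
    # the original interleaving of the test sequence.
    equalized = {
        label: iter(equalize(values, reference_by_class[label]))
        for label, values in grouped.items()
    }
    return [next(equalized[label]) for label in test_labels]


out = class_based_equalize(
    [10, 30, 100, 300], ["a", "a", "b", "b"],
    {"a": [0.0, 1.0], "b": [5.0, 6.0]},
)  # [0.0, 1.0, 5.0, 6.0]
```

Conventional histogram equalization corresponds to calling `equalize` once on all features with a single reference distribution; the class-based variant differs only in using one CDF pair per class.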

18.
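By analyzing and comparing the short-time spectra of clean and noisy speech, a new speech feature parameter based on the pitch frequency and its harmonic structure is proposed. Experiments show that, compared with traditional cepstral features, the new feature is relatively insensitive to additive white noise, and that in closed-set text-independent speaker recognition it improves the system's speaker recognition rate in additive white Gaussian noise.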

19.
In this paper, a new robust auto‐adaptive approach for pseudo‐noise (PN) code acquisition is proposed. It is applied to generalized multi‐carrier direct‐sequence code‐division multiple‐access (MC DS‐CDMA) systems communicating over frequency‐selective multipath Rayleigh fading channels. This new approach is based on the constant false alarm rate (CFAR) detection algorithm, referred to here as automatic selection partial sum ordered statistics (ASPSOS)‐CFAR. The proposed approach does not require any prior information about the background environment and uses the maximum likelihood estimation (MLE) method to detect the group of interfering signals in the ranked cells for the full reference window. Once this group is identified and censored, the remaining smaller ranked cells are combined to form an estimate of the background noise level to compute the adaptive threshold. Through simulations, the performance of the proposed detector is analyzed and compared with traditional CFAR detectors based on fixed or automatic censoring algorithms. The obtained results show that the proposed detector eliminates the drawbacks of the previously related detectors and offers a robust detection performance to enhance the acquisition process in heterogeneous background environments.
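The abstract does not give the ASPSOS‐CFAR procedure itself, so the sketch below shows a plain ordered-statistics CFAR (OS‐CFAR) detector instead, one of the traditional fixed-censoring detectors it is compared against. The function name `os_cfar` and all parameters (`num_reference`, `num_guard`, `rank`, `scale`) are illustrative assumptions; in practice the scale factor would be derived from the desired false alarm rate.

```python
def os_cfar(cells, num_reference, num_guard, rank, scale):
    """Return indices of cells whose value exceeds an adaptive threshold.

    For each cell under test, reference cells are taken on both sides
    (skipping num_guard guard cells), sorted, and the rank-th smallest value
    is used as the noise estimate. Cells near the edges that have fewer than
    `rank` reference cells are skipped.
    """
    half = num_reference // 2
    detections = []
    for i, value in enumerate(cells):
        left = cells[max(0, i - num_guard - half):max(0, i - num_guard)]
        right = cells[i + num_guard + 1:i + num_guard + 1 + half]
        window = sorted(left + right)
        if len(window) < rank:
            continue
        noise_level = window[rank - 1]  # rank-th ordered statistic
        if value > scale * noise_level:
            detections.append(i)
    return detections


cells = [1.0] * 5 + [20.0] + [1.0] * 5   # one strong cell in flat noise
hits = os_cfar(cells, num_reference=4, num_guard=1, rank=3, scale=5.0)  # [5]
```

Choosing a rank below the window size discards the largest reference cells, which is the fixed-censoring idea; the approach in the abstract instead identifies the interfering group automatically before forming the noise estimate.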

20.
Recently, several noise‐robust adaptive multichannel LMS algorithms have been proposed based on the spectral flatness of the estimated channel coefficients in the presence of additive noise. In this work, we propose a general form for the algorithms that integrates the existing algorithms into a common framework. Computer simulation results are presented and demonstrate that a new proposed algorithm gives better performance compared to existing algorithms in noisy environments.
