Similar Documents
20 similar documents found.
1.
Additive noise in speech reduces the robustness of MFCC parameters and degrades the performance of speaker verification systems. Multitaper MFCC, which introduces multitaper spectral estimation, improves the noise robustness of MFCC features to some extent, but the gain is limited. To make MFCC parameters more robust to noise, an improved multitaper MFCC extraction algorithm is proposed. The improved algorithm incorporates spectral subtraction (SS) on top of multitaper MFCC; spectral subtraction enhances the speech and reduces noise interference. The combined Multitaper+SS algorithm therefore merges the strengths of both techniques and achieves better performance. Simulation results show that when the test speech contains additive noise, a speaker verification system using the improved multitaper MFCC outperforms one using plain multitaper MFCC on both evaluation metrics: equal error rate (EER) and minimum detection cost function (minDCF).
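As an illustration of the Multitaper+SS combination this abstract describes, here is a minimal numpy sketch (not the paper's code; the sine-taper family, taper count, and over-subtraction constants are illustrative assumptions, and a noise-only segment is assumed to be available for the noise estimate):

```python
import numpy as np

def sine_tapers(n, k):
    """k orthonormal sine tapers of length n, a simple multitaper family."""
    j = np.arange(1, n + 1)
    return np.array([np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * (i + 1) * j / (n + 1))
                     for i in range(k)])

def multitaper_psd(frame, k=6):
    """Average the periodograms of k tapered copies of the frame."""
    tapered = sine_tapers(len(frame), k) * frame
    return (np.abs(np.fft.rfft(tapered, axis=1)) ** 2).mean(axis=0)

def spectral_subtract(psd, noise_psd, alpha=2.0, beta=0.01):
    """Over-subtract the noise PSD and floor the result to avoid negatives."""
    return np.maximum(psd - alpha * noise_psd, beta * psd)

# toy example: a sinusoid buried in white noise
rng = np.random.default_rng(0)
t = np.arange(256)
noise = 0.5 * rng.standard_normal(256)              # assumed noise-only segment
noisy = np.sin(2 * np.pi * 0.1 * t) + noise
enhanced_psd = spectral_subtract(multitaper_psd(noisy), multitaper_psd(noise))
```

The enhanced PSD would then feed the usual mel filterbank and DCT to produce the MFCC vector.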

2.

The speaker recognition revolution has led to the inclusion of speaker recognition modules in several commercial products. Most published algorithms for speaker recognition focus on text-dependent speaker recognition. In contrast, text-independent speaker recognition is more advantageous, as the client can talk freely to the system. In this paper, text-independent speaker recognition is considered in the presence of degradation effects such as noise and reverberation. Mel-Frequency Cepstral Coefficients (MFCCs), the spectrum, and the log-spectrum are used for feature extraction from the speech signals. These features are processed with a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) as the classifier to complete the speaker recognition task. The network learns to recognize speakers efficiently in a text-independent manner when the recording circumstances are the same. The recognition rate reaches 95.33% using MFCCs and increases to 98.7% using the spectrum or log-spectrum. However, the system struggles to recognize speakers across different recording environments. Hence, different speech enhancement techniques, such as spectral subtraction and wavelet denoising, are used to improve the recognition performance to some extent. The proposed approach shows superiority when compared to the algorithm of R. Togneri and D. Pullella (2011).


3.
Speaker verification techniques neglect the short-time variation in the feature space even though it contains speaker-related attributes. We propose a simple method to capture and characterize this spectral variation through the eigenstructure of the sample covariance matrix. This covariance is computed using a sliding window over spectral features. The newly formulated feature vectors representing local spectral variations are used with classical and state-of-the-art speaker recognition systems. Results on multiple speaker recognition evaluation corpora reveal that eigenvectors weighted with their normalized singular values are useful in representing local covariance information. We also show that local variability features can be extracted using mel-frequency cepstral coefficients (MFCCs) as well as three recently developed features: frequency domain linear prediction (FDLP), mean Hilbert envelope coefficients (MHECs) and power-normalized cepstral coefficients (PNCCs). Since the information conveyed in the proposed feature is complementary to the standard short-term features, we apply different fusion techniques. We observe considerable relative improvements in speaker verification accuracy in combined mode on text-independent (NIST SRE) and text-dependent (RSR2015) speech corpora: up to 12.28% relative improvement in speaker recognition accuracy on the text-independent corpora, and up to 40% relative reduction in EER on the text-dependent corpora. To sum up, combining local covariance information with the traditional cepstral features holds promise as an additional speaker cue in both text-independent and text-dependent recognition.
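A minimal sketch of the local-covariance feature described above: for each sliding window of feature frames, eigendecompose the sample covariance and weight each eigenvector by its normalized eigenvalue (the window length and the flattening into one vector are illustrative choices, not taken from the paper):

```python
import numpy as np

def local_covariance_features(feats, win=9):
    """Eigenvectors of the sliding-window sample covariance, each weighted by
    its normalized eigenvalue, flattened into one vector per window position."""
    n_frames, dim = feats.shape
    out = []
    for start in range(n_frames - win + 1):
        seg = feats[start:start + win]
        cov = np.cov(seg, rowvar=False)            # (dim, dim) sample covariance
        vals, vecs = np.linalg.eigh(cov)           # ascending eigenvalues
        weights = vals / max(vals.sum(), 1e-12)    # normalize weights to sum to 1
        out.append((vecs * weights).ravel())       # scale each eigenvector column
    return np.array(out)

rng = np.random.default_rng(1)
mfcc_like = rng.standard_normal((50, 13))          # stand-in for 13-dim MFCC frames
local_feats = local_covariance_features(mfcc_like)
```

Each output row (dimension 13 x 13 = 169 here) would be fused with the standard short-term features, as the abstract describes.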

4.
This paper describes a speaker verification system based on multi-resolution classifiers to cope with performance degradation due to natural variations of the excitation source and of the vocal tract. The different-resolution representations of the speaker are obtained by considering multiple frame lengths in the feature extraction process, and from these representations a single Pseudo-Multi Parallel Branch (P-MPB) Hidden Markov Model is obtained. In the verification process, different-resolution representations of the speech signal are classified by multiple P-MPB systems, and the final decision is obtained by means of different combination techniques. The system based on the Weighted Majority Vote technique considerably outperforms baseline systems, with improvements between 15% and 38%. The execution time of the verification process is also evaluated and proves acceptable, allowing the approach to be used in real-time applications.

5.
Our initial speaker verification study exploring the impact of mismatch in training and test conditions found that mismatch in sensor and acoustic environment results in significant performance degradation compared to other mismatches like language and style (Haris et al. in Int. J. Speech Technol., 2012). In this work we present a method to suppress the mismatch between training and test speech, specifically that due to sensor and acoustic environment. The method is based on identifying and emphasizing vowel-like regions (VLRs), which are more speaker specific and less affected by mismatch than the other speech regions. VLRs are separated from the speech regions (detected using voice activity detection (VAD)) using VLR onset points (VLROPs) and are processed independently during training and testing of the speaker verification system. Finally, the scores are combined with more weight given to those generated by VLRs, as these are relatively more speaker specific and less mismatch affected. Speaker verification studies are conducted using mel-frequency cepstral coefficients (MFCCs) as feature vectors. Speaker modeling is done using the Gaussian mixture model-universal background model and the state-of-the-art i-vector based approach. The experimental results show that for both systems the proposed approach provides consistent performance improvement over the conventional approach, with and without different channel compensation techniques. For instance, on the IITG-MV Phase-II dataset with headphone-trained and voice-recorder test speech, the proposed approach provides a relative improvement of 25.08% (in EER) for the i-vector based speaker verification system with LDA and WCCN compared to the conventional approach.
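The final score-combination step above can be sketched in a few lines (the weight value is an illustrative assumption; the paper does not state one here):

```python
import numpy as np

def fuse_scores(vlr_scores, other_scores, w=0.7):
    """Weighted sum of per-trial verification scores; w > 0.5 favors the
    vowel-like-region stream, which is more speaker specific."""
    return w * np.asarray(vlr_scores, float) + (1.0 - w) * np.asarray(other_scores, float)

fused = fuse_scores([1.0, 0.0], [0.0, 1.0])
```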

6.
Robust speaker recognition in noisy environments
In practical applications, noise and channel distortion cause a sharp drop in speaker recognition (SR) performance. To address this problem, this paper analyzes the strengths and weaknesses of conventional methods and proposes a system-level solution: Wiener filtering is applied as a front-end to the speech signal, and MFCC vocal-tract features are combined with fundamental-frequency (F0) prosodic features to improve the robustness of the recognition system. Experimental results show that Wiener filtering effectively removes the influence of noise, and that after Wiener filtering the joint F0-MFCC model discriminates speakers better. The overall performance of the system in noisy environments is thus greatly improved.
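A minimal sketch of the Wiener-filter front end mentioned above, assuming a per-frame noise PSD estimate is available (the gain floor and frame length are illustrative):

```python
import numpy as np

def wiener_gain(noisy_psd, noise_psd, floor=1e-3):
    """Wiener gain H = S_hat / (S_hat + N), with the speech PSD S_hat
    estimated by subtracting the noise PSD from the noisy PSD."""
    speech_psd = np.maximum(noisy_psd - noise_psd, 0.0)
    return np.maximum(speech_psd / np.maximum(noisy_psd, 1e-12), floor)

def wiener_filter_frame(frame, noise_psd):
    """Apply the Wiener gain in the frequency domain and resynthesize."""
    spec = np.fft.rfft(frame)
    gain = wiener_gain(np.abs(spec) ** 2, noise_psd)
    return np.fft.irfft(gain * spec, n=len(frame))

rng = np.random.default_rng(2)
noise_frame = rng.standard_normal(256)
# a noise-only frame whose PSD matches the estimate is almost fully suppressed
suppressed = wiener_filter_frame(noise_frame, np.abs(np.fft.rfft(noise_frame)) ** 2)
```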

7.
陈迪  龚卫国  杨利平 《计算机应用》2007,27(5):1217-1219
A method for extracting MFCC parameters from speech using a variable window length based on the pitch period is proposed to improve speaker recognition. The basic idea is to split the original frame into the portion within an integer multiple of the current pitch period and the remainder, keeping the former and discarding the latter, thereby reducing the spectral mismatch between training and test speech. Text-independent speaker verification experiments confirm that the method effectively raises the verification rate and improves the stability of short-duration speech.
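The keep-whole-periods idea above reduces to a one-line truncation once the pitch period is known (the frame and period values here are illustrative; pitch estimation itself is not shown):

```python
import numpy as np

def pitch_truncated_frame(frame, pitch_period):
    """Keep the largest prefix spanning a whole number of pitch periods and
    discard the fractional remainder, reducing spectral smearing."""
    n_periods = len(frame) // pitch_period
    return frame[:n_periods * pitch_period]

frame = np.arange(400, dtype=float)        # stand-in for a 400-sample analysis frame
trimmed = pitch_truncated_frame(frame, pitch_period=93)
```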

8.
The ETSI has recently published a front-end processing standard for distributed speech recognition systems. The key idea of the standard is to extract the spectral features of speech signals at the front-end terminals so that acoustic distortion caused by communication channels can be avoided. This paper investigates the effect of extracting spectral features from different stages of the front-end processing on the performance of distributed speaker verification systems. A technique that combines handset selectors with stochastic feature transformation is also employed in a back-end speaker verification system to reduce the acoustic mismatch between different handsets. Because the feature vectors received at the back-end server are vector quantized, the paper proposes two approaches to adding Gaussian noise to the quantized feature vectors for training the Gaussian mixture speaker models. In one approach, the variances of the Gaussian noise are made dependent on the codeword distance. In another approach, the variances are a function of the distance between some unquantized training vectors and their closest code vector. The HTIMIT corpus was used in the experiments, and results based on 150 speakers show that stochastic feature transformation can be added to the back-end server to compensate for transducer distortion. It is also found that better verification performance can be achieved when the LMS-based blind equalization in the standard is replaced by stochastic feature transformation.
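A sketch of the first variance rule above (noise variance dependent on codeword distance), interpreting "codeword distance" as the distance to the nearest neighbouring codeword; the codebook, scale factor, and that interpretation are all assumptions for illustration:

```python
import numpy as np

def jitter_quantized_features(codewords, indices, rng, scale=0.1):
    """Add zero-mean Gaussian noise to vector-quantized feature vectors, with
    each codeword's noise std proportional to the distance to its nearest
    neighbouring codeword."""
    dist = np.linalg.norm(codewords[:, None, :] - codewords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)                # ignore self-distance
    sigma = scale * dist.min(axis=1)              # per-codeword noise std
    feats = codewords[indices]
    return feats + sigma[indices, None] * rng.standard_normal(feats.shape)

rng = np.random.default_rng(4)
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
jittered = jitter_quantized_features(codebook, np.array([0, 1, 2, 1]), rng)
```

The jittered vectors would then train the Gaussian mixture speaker models in place of the raw quantized vectors.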

9.
In speaker-dependent speech recognition systems, noise severely disrupts feature extraction and markedly lowers the recognition rate. To address the low recognition rate in noisy environments, spectral subtraction is used to denoise the speech signal, and, exploiting the visual nature of the speech spectrogram, a pulse-coupled neural network extracts an entropy sequence from the spectrogram as the feature parameters for recognition. Experimental results show that the method removes noise from the speech signal well and gives a speaker-dependent recognition system good accuracy in noisy environments.

10.
Spectral-subtraction speech enhancement based on a Gaussian distribution leaves residual noise and speech distortion in the enhanced signal. To address this, a minimum mean-square error (MMSE) spectral-subtraction algorithm based on a Laplacian distribution is proposed. First, the noisy speech is framed and windowed, and each frame is Fourier transformed to obtain the short-time discrete Fourier transform (DFT) coefficients. Next, the log spectral energy and spectral flatness of each frame are computed to detect noise frames and update the noise estimate. Then, under the assumption that the speech DFT coefficients follow a Laplacian distribution, the optimal spectral-subtraction coefficients are solved for under the MMSE criterion and used to perform the subtraction, yielding the enhanced spectrum. Finally, the enhanced spectrum is inverse Fourier transformed and the frames are reassembled to obtain the enhanced speech. Experimental results show that the proposed algorithm raises the signal-to-noise ratio (SNR) of the enhanced speech by 4.3 dB on average, 2 dB more than over-subtraction, and improves the mean Perceptual Evaluation of Speech Quality (PESQ) score by 10% relative to over-subtraction. The algorithm suppresses noise better with less speech distortion, giving clear gains on both the SNR and PESQ measures.
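The noise-frame detection step above (log spectral energy plus spectral flatness) can be sketched as follows; the thresholds and the exact decision rule are illustrative assumptions, not the paper's values:

```python
import numpy as np

def spectral_flatness(psd, eps=1e-12):
    """Geometric over arithmetic mean of the power spectrum; near 1 for
    noise-like frames, near 0 for tonal (speech-like) frames."""
    p = psd + eps
    return np.exp(np.mean(np.log(p))) / np.mean(p)

def is_noise_frame(frame, energy_thresh, flatness_thresh=0.5):
    """Flag a frame as noise when its log spectral energy is low or its
    spectrum is flat, so the noise estimate can be updated from it."""
    psd = np.abs(np.fft.rfft(frame)) ** 2
    log_energy = np.log(np.sum(psd) + 1e-12)
    return log_energy < energy_thresh or spectral_flatness(psd) > flatness_thresh

rng = np.random.default_rng(5)
noise_frame = rng.standard_normal(256)                      # flat spectrum
tone_frame = np.sin(2 * np.pi * 0.1 * np.arange(256))       # peaked spectrum
```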

11.
Spectral-subtraction speech enhancement based on beamforming
To reduce the musical noise produced by spectral subtraction and improve enhancement quality, this paper builds on a study of beamforming and spectral subtraction to propose a speech enhancement scheme that cascades a spectral-subtraction stage after a beamformer, and analyzes and verifies the feasibility of this structure. Results on speech and noise recorded in real environments show that the cascaded enhancement system markedly reduces background noise with little speech distortion, is easy to implement in real time, and achieves an SNR gain of 16.5 dB.

12.
Extraction of robust features from noisy speech signals is one of the challenging problems in speaker recognition. Since the bispectrum and all higher-order spectra of a Gaussian process are identically zero, bispectral processing removes additive white Gaussian noise while preserving the magnitude and phase information of the original signal. The spectrum of the original signal can be recovered from its noisy version using this property. Robust Mel Frequency Cepstral Coefficients (MFCC) are extracted from the estimated spectral magnitude (denoted Bispectral-MFCC (BMFCC)). The effectiveness of BMFCC has been tested on the TIMIT and SGGS databases in noisy environments. The proposed BMFCC features yield 95.30%, 97.26% and 94.22% speaker recognition rates on the TIMIT, SGGS and SGGS2 databases, respectively, at 20 dB SNR, whereas these values at 0 dB SNR are 45.84%, 50.79% and 44.98%. The experimental results show the superiority of the proposed technique over conventional methods on all databases.
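A sketch of the property the abstract relies on: a direct single-segment bispectrum estimate, whose expectation vanishes for Gaussian noise, so averaging over frames suppresses the noise term (the segment length and averaging count are illustrative; the paper's full spectrum-recovery procedure is not shown):

```python
import numpy as np

def bispectrum(frame):
    """Single-segment bispectrum estimate B(f1, f2) = X(f1) X(f2) X*(f1 + f2)."""
    X = np.fft.fft(frame)
    k = np.arange(len(frame) // 2)
    return X[k[:, None]] * X[k[None, :]] * np.conj(X[(k[:, None] + k[None, :]) % len(frame)])

rng = np.random.default_rng(3)
single = bispectrum(rng.standard_normal(64))
# averaging over many independent noise frames drives the estimate toward zero
averaged = np.mean([bispectrum(rng.standard_normal(64)) for _ in range(200)], axis=0)
```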

13.
Speech enhancement algorithm based on speech presence probability and auditory masking properties
宫云梅  赵晓群  史仍辉 《计算机应用》2008,28(11):2981-2983
At low SNR, the long-standing trade-off in spectral-subtraction enhancement among the amount of denoising, residual musical noise, and speech distortion becomes especially acute. To reduce the interference of noise with speech communication, a speech enhancement algorithm suited to low SNR is proposed. Building on conventional spectral subtraction, the subtraction parameters are adapted according to the auditory masking threshold of the noise, and the speech presence probability is used to estimate the speech and noise signals, avoiding the inaccuracy of voice activity detection (VAD) at low SNR and giving stronger robustness. Objective and subjective tests show that, compared with conventional spectral subtraction, the algorithm suppresses residual and background noise better with almost no loss of speech intelligibility, and the benefit is most pronounced for speech corrupted by low-SNR and non-stationary noise.

14.
An improvement to MFCC parameter extraction for speaker recognition
Mel-frequency cepstral coefficients (MFCC) are the most widely used speech features in speaker recognition. An improved MFCC extraction method is proposed that reconstructs the spectrum at the FFT step of the conventional pipeline, performing noise-compensated reconstruction so that the spectrum is noise-robust and approximates that of clean speech. Experiments show that MFCC parameters extracted with this improvement markedly raise the recognition rate of a speaker recognition system, especially in low-SNR environments.
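For reference, here is a compact sketch of the standard MFCC pipeline, with a hook at the FFT step where a noise-compensation scheme like the one above could plug in (the filter count, coefficient count, sample rate, and the simple magnitude-subtraction compensation are all illustrative assumptions):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel-spaced filters spanning 0 .. sr/2."""
    pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):
            fb[i, k] = (k - l) / max(c - l, 1)   # rising edge
        for k in range(c, r):
            fb[i, k] = (r - k) / max(r - c, 1)   # falling edge
    return fb

def mfcc(frame, sr=8000, n_filters=26, n_ceps=13, noise_mag=None):
    """MFCC of one frame; optionally subtract an estimated noise magnitude
    spectrum before mel filtering (the compensation point described above)."""
    mag = np.abs(np.fft.rfft(frame))
    if noise_mag is not None:
        mag = np.maximum(mag - noise_mag, 1e-10)
    energies = mel_filterbank(n_filters, len(frame), sr) @ (mag ** 2)
    log_e = np.log(energies + 1e-10)
    # DCT-II of the log filterbank energies keeps the first n_ceps coefficients
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), np.arange(n_filters) + 0.5) / n_filters)
    return dct @ log_e

ceps = mfcc(np.hanning(256) * np.sin(2 * np.pi * 0.1 * np.arange(256)))
```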

15.
We present a new method that improves the accuracy of text-dependent speaker verification systems. The method exploits a set of novel speech features derived from a principal component analysis of pitch-synchronous voiced speech segments. We use the term principal pitch components (PPCs), or optimal pitch bases (OPBs), for the new feature set. Utterance distances computed from these PPC features are only loosely correlated with utterance distances computed from cepstral features. A distance measure that combines both cepstral and PPC features provides a discriminative power that cannot be achieved with cepstral features alone. By augmenting the feature space of a cepstral baseline system with PPC features we achieve a significant reduction of the equal error probability of incorrect customer rejection versus incorrect impostor acceptance. The proposed method delivers robust performance in various noise conditions.

16.
To address the susceptibility of speaker recognition to environmental noise, a new voiceprint feature extraction method is proposed that borrows the spectro-temporal filtering mechanism of spectro-temporal receptive fields (STRF) of neurons in the auditory cortex. In this method, secondary features are extracted from the auditory scale-rate representation obtained via the STRF and combined with conventional mel-frequency cepstral coefficients (MFCC), yielding voiceprint features that are highly tolerant of environmental noise. With a support vector machine (SVM) as the classifier, tests on speech at different signal-to-noise ratios (SNR) show that the STRF-based features are generally more noise-robust than MFCC but give a lower recognition rate, while the combined features raise the recognition rate and retain good robustness to environmental noise. These results indicate that the proposed method is effective for speaker recognition in strong noise.

17.
Mel-cepstral parameters based on pitch period and voiced/unvoiced information
A mel-cepstrum extraction method with non-fixed frame lengths in voiced segments is proposed. Since voiced and unvoiced speech carry different amounts of information, voiced frames are given double weight, fusing pitch and voicing information into the mel-cepstral parameters. Applied to speaker verification with Gaussian mixture models (GMM), these dynamic mel-cepstral parameters achieve a higher recognition rate than the commonly used mel-frequency cepstral coefficients (MFCC): on the NIST 2002 evaluation set, the equal error rate (EER) drops from 9.4% to 8.3% with 512 Gaussian mixtures and from 7.8% to 6.9% with 2,048 mixtures.

18.
The cascading appearance-based (CAB) feature extraction technique has established itself as the state of the art in extracting dynamic visual speech features for speech recognition. In this paper, we focus on investigating the effectiveness of this technique for the related speaker verification application. By investigating the speaker verification ability of each stage of the cascade, we demonstrate that the same steps taken to reduce static speaker and environmental information for the visual speech recognition application also provide similar improvements for visual speaker recognition. A further study compares synchronous HMM (SHMM) based fusion of CAB visual features and traditional perceptual linear predictive (PLP) acoustic features, showing that the higher complexity inherent in the SHMM approach does not appear to provide any improvement in the final audio-visual speaker verification system over simpler utterance-level score fusion.

19.
Speaker verification based on GMM statistical parameters and SVM
For text-independent speaker verification with large amounts of training data, this paper proposes a system based on GMM statistical parameters and support vector machines: the speaker's GMM statistical parameters serve as the features with which an SVM model of the target speaker is trained. This extracts speaker information effectively, solves the problem of training SVMs on large sample sets, and combines the robustness of statistical models with the discriminative power of discriminative models, improving both the verification performance and the robustness of the system. Experiments on a Microsoft microphone speech database and the NIST'01 cellular telephone speech database demonstrate the effectiveness of the method.

20.
Humans are quite adept at communicating in the presence of noise. However, most speech processing systems, like automatic speech and speaker recognition systems, suffer a significant drop in performance when speech signals are corrupted with unseen background distortions. The proposed work explores the use of a biologically motivated multi-resolution spectral analysis for speech representation. This approach focuses on the information-rich spectral attributes of speech and presents an intricate yet computationally efficient analysis of the speech signal through careful choice of model parameters. Further, the approach takes advantage of an information-theoretic analysis of the message-dominant and speaker-dominant regions in the speech signal, and defines feature representations to address two diverse tasks: speech recognition and speaker recognition. The proposed analysis surpasses standard Mel-Frequency Cepstral Coefficients (MFCC) and their enhanced variants (via mean subtraction, variance normalization and time-sequence filtering), and yields significant improvements over a state-of-the-art noise-robust feature scheme on both speech and speaker recognition tasks.
