This study investigates speaker verification in real-world noisy environments by comparing a novel feature extraction process, suited to suppressing time-varying noise, with a fine-tuned spectral subtraction method. The proposed feature extraction process approximates the clean speech and noise spectral magnitudes with a mixture of Gaussian probability density functions (pdfs), fitted with the Expectation-Maximization (EM) algorithm. The Bayesian inference framework is then applied to the degraded spectral coefficients, and Minimum Mean Square Error (MMSE) estimation yields a closed-form solution for the spectral magnitude estimation task. Finally, the estimated spectral magnitude is incorporated into the Mel-Frequency Cepstral Coefficients (MFCCs) front-end of a baseline text-independent speaker verification system based on Probabilistic Neural Networks, which participated successfully in the 2002 NIST (National Institute of Standards and Technology of USA) Speaker Recognition Evaluation. A comparative study on real-world noise types demonstrates a significant performance gain over both the baseline speech features and the spectral subtraction enhancement method. In a passing-aircraft scenario, absolute speaker verification performance improved by more than 27% at 0 dB signal-to-noise ratio (SNR) compared to the MFCCs, and by more than 13% at –5 dB SNR compared to the spectral subtraction version.
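
For orientation, the spectral subtraction comparator mentioned in this abstract can be sketched as follows. This is a generic magnitude-domain version, not the fine-tuned variant the authors evaluated; the over-subtraction factor `alpha`, the spectral floor `beta`, and the function name are illustrative assumptions.

```python
def spectral_subtract(noisy_mag, noise_mag, alpha=2.0, beta=0.01):
    """Enhance one frame of spectral magnitudes by spectral subtraction.

    noisy_mag: magnitudes of the noisy speech frame, one per frequency bin.
    noise_mag: estimated noise magnitudes for the same bins.
    alpha:     over-subtraction factor (assumed value, not from the paper).
    beta:      spectral floor as a fraction of the noisy magnitude (assumed).
    """
    enhanced = []
    for y, n in zip(noisy_mag, noise_mag):
        floor = beta * y  # keeps the result positive and limits musical noise
        enhanced.append(max(y - alpha * n, floor))
    return enhanced
```

Bins where the scaled noise estimate exceeds the observed magnitude fall back to the floor, which is the case a time-varying noise source such as a passing aircraft makes frequent when the noise estimate lags behind.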

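In speaker verification systems, a mismatch between the training and test acoustic environments causes a sharp drop in performance. This paper proposes compensation at two levels: feature normalization and score normalization. First, segment-based cepstral mean and variance normalization (SCMVN) is improved so that all cepstral coefficients are normalized to the same within-segment Gaussian distribution, which improves feature matching across environments. Second, to handle the shifts in output score distribution caused by different speakers and different test environments, a two-stage score normalization is proposed, with two score transformations: zero normalization followed by test normalization (TZnorm), and test normalization followed by zero normalization (ZTnorm). These make the speaker-independent decision threshold more robust under mismatched conditions. Experiments on the NIST 2002 Speaker Recognition Evaluation corpus show that SCMVN feature normalization combined with ZTnorm score normalization clearly improves system performance. Compared with a baseline system using cepstral mean subtraction and zero normalization, the equal error rate and the minimum detection cost are reduced by 20.3% and 18.1%, respectively.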
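
The two-stage score normalization described in this abstract can be illustrated with a minimal sketch of the "zero normalization, then test normalization" order. It assumes all raw scores have already been computed; the function names and argument layout are hypothetical, and the paper's exact procedure may differ.

```python
from statistics import mean, pstdev

def normalize(score, reference_scores):
    """Shift and scale a score by the mean and std of a reference score set."""
    mu = mean(reference_scores)
    sigma = pstdev(reference_scores)
    return (score - mu) / sigma if sigma > 0 else score - mu

def z_then_t(raw, target_impostor_scores, cohort_raw, cohort_impostor_scores):
    """Zero normalization followed by test normalization.

    raw:                    test utterance scored against the claimed model.
    target_impostor_scores: impostor utterances scored against the claimed model.
    cohort_raw:             test utterance scored against each cohort model.
    cohort_impostor_scores: per cohort model, impostor utterances scored against it.
    """
    # Stage 1: Z-norm every score with the statistics of its own model.
    z_target = normalize(raw, target_impostor_scores)
    z_cohort = [normalize(s, imp)
                for s, imp in zip(cohort_raw, cohort_impostor_scores)]
    # Stage 2: T-norm the claimed-model score against the cohort scores.
    return normalize(z_target, z_cohort)
```

Swapping the two stages gives the opposite ordering; either way, the final score is expressed in standard deviations, which is what allows a single speaker-independent threshold.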

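In text-independent speaker recognition, prosodic features are used because they are insensitive to channel and environmental noise. This paper models prosodic parameters with a support vector machine based on Gaussian mixture model supervectors and applies within-class covariance feature mapping to the model supervectors. As a single system, this improves on the conventional Gaussian mixture model-universal background model (GMM-UBM) baseline by 40.19%. When fused with the paper's verification system based on acoustic cepstral parameters, it improves the recognition performance of the overall system by 9.25%. On the NIST (National Institute of Standards and Technology) 2006 speaker recognition test database, the fused system achieves an equal error rate of 4.9%.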

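This paper presents a DSP-based speaker verification system. Its verification algorithm is built on the Gaussian mixture model-universal background model (GMM-UBM) and uses a new feature fusion algorithm in the feature space, based on information entropy. Experiments show that, without affecting the recognition rate, the algorithm reduces computation by more than [...] compared with conventional feature-association fusion and by [...] compared with normalization fusion. The hardware uses the high-speed TMS320C6701 DSP chip, which ensures that the verification algorithm can run in real time.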

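The x-vector system maps a variable-length speech segment through a neural network to a fixed-dimensional vector that represents speaker information, and it has achieved excellent performance on text-independent speaker verification (SV). This paper applies it to text-dependent SV. For the x-vector model, a residual neural network is used to obtain more discriminative x-vectors. For utterances containing multiple words, a separate residual network is trained for each word. At extraction time, an x-vector is extracted for each word and scored on its own, and the resulting scores are fused to give the final decision. Experiments on the RSR2015 Part Ⅲ database show that the proposed method reduces the equal error rate by 15.34% and 19.7% on the male and female test sets, respectively.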
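
The per-word scoring and fusion step can be sketched as below. The abstract does not state the scoring back-end or the fusion rule, so cosine scoring and equal-weight averaging are assumptions here, and the function names and dictionary layout are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two x-vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def fused_score(enroll_by_word, test_by_word):
    """Score each word separately, then average the per-word scores.

    Both arguments map a word label to the x-vector extracted for that word
    by the network trained for it (assumed to be computed elsewhere).
    """
    scores = [cosine(enroll_by_word[w], test_by_word[w]) for w in test_by_word]
    return sum(scores) / len(scores)
```

The fused score is then compared with a decision threshold as in any verification system; weighted fusion would be a natural variation if some words prove more discriminative than others.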

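Support vector machines are widely used in text-independent speaker verification. In practical systems, however, the ratio of target-speaker samples to impostor samples is typically on the order of one to several thousand, so the classes are severely imbalanced and the quality of impostor sample selection directly affects the performance of the whole system. This paper proposes two methods for selecting impostor samples. Experiments show that they address this problem effectively and outperform random impostor selection: on the 2004 NIST speaker recognition database, the equal error rate falls from 9.3% to 6.8%, a relative reduction of 26.9%.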

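This paper presents a method based on a multiple derivative kernel (MDK) that combines Gaussian mixture models (GMM) with support vector machines (SVM), and applies it to SVM-based text-independent speaker verification. Starting from the GMM probability distribution of a speaker's speech features, the method describes that distribution with multi-order derivatives, turning the problem of combining GMM and SVM into one of building SVM speaker models from multi-order derivatives. First, factor-analysis-based mismatch compensation is applied in the parameter domain to the speaker's speech, and a GMM describes the probability distribution of the compensated features. Next, the multi-order derivatives of the GMM are computed. Finally, the multiple derivative kernel is constructed and multiple SVM speaker models are built. Experiments on the NIST'01 2min-1min speaker verification database show that the MDK-based SVM system outperforms the mismatch-compensated GMM system, and that it also improves considerably on the mismatch-compensated Fisher-kernel SVM system and the mismatch-compensated Kullback-Leibler (KL) distance SVM system.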

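A text-independent method for extracting suprasegmental information from small amounts of speech data is proposed. Suprasegmental information is extracted by dynamically segmenting the pitch and energy trajectories, and heteroscedastic linear discriminant analysis (HLDA) is used to optimize the parameters, which overcomes the dependence of suprasegmental extraction on the amount of data. A Gaussian mixture model-universal background model (GMM-UBM) structure is used to build the text-independent speaker recognition system. Experiments on the NIST'01 database show that the system outperforms one based on short-time-frame source information parameters and, more importantly, does not need large amounts of data. When it is fused with a speaker recognition system based on short-time-frame cepstral parameters, recognition performance improves markedly, with a relative reduction of 10% in equal error rate.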

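This paper proposes using the angle between inter-model Mahalanobis distances as the test algorithm in text-independent speaker verification. With Gaussian Mixture Models, the algorithm keeps the recognition rate close to that of the conventional log-likelihood algorithm while greatly reducing computation, which is a considerable help for real-time speaker verification or identification. In addition, the results of the proposed algorithm can be fused with those of the conventional log-likelihood algorithm, lowering the equal error rate of speaker verification by 12 to 15%.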

Threshold Selection for Speaker Verification Based on TZ Normalization
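In speaker verification, the output score distributions of different target speaker models are inconsistent, which makes it difficult to set the system's verification threshold. This paper determines the threshold for the minimum detection cost function (DCF) through score normalization. After analyzing two existing score normalization methods, Z normalization and T normalization, it proposes a combined method, TZ normalization, that brings together the advantages of both, and derives from it a dynamic threshold correction method that effectively improves system performance and the robustness of threshold selection. Experiments on NIST evaluation corpora (cellular telephone speech) from several years demonstrate the effectiveness of the method.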
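
To make the threshold selection target concrete, here is a minimal sketch of choosing the threshold that minimizes the detection cost function over a set of labeled scores. The cost parameters (miss cost 10, false-alarm cost 1, target prior 0.01) are those commonly used in NIST evaluations of that period and are an assumption here, since the abstract does not state them; the function names are hypothetical, and the paper's dynamic threshold correction is not reproduced.

```python
def detection_cost(threshold, target_scores, impostor_scores,
                   c_miss=10.0, c_fa=1.0, p_target=0.01):
    """DCF = C_miss * P_miss * P_target + C_fa * P_fa * (1 - P_target)."""
    # A trial is accepted when its score is at or above the threshold.
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)

def min_dcf_threshold(target_scores, impostor_scores):
    """Return the candidate threshold with the lowest detection cost."""
    candidates = sorted(set(target_scores) | set(impostor_scores))
    candidates.append(candidates[-1] + 1.0)  # threshold that rejects every trial
    return min(candidates,
               key=lambda t: detection_cost(t, target_scores, impostor_scores))
```

A threshold found this way on development data only transfers to new speakers if their score distributions are aligned, which is the role score normalization plays in the abstract above.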

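Conventional feature mapping needs a large amount of channel-labeled data, and the unsupervised channel clustering methods of recent years still require several utterances per speaker. This paper therefore discusses a new speaker verification method based on clustering mean supervectors, which relaxes the data requirements while preserving performance: the clustering corpus is a small one with only one utterance per speaker. Taking the female UBM as the reference, the offsets of the mean supervectors of all female training utterances from that UBM are clustered. The cluster to which a male utterance to be mapped belongs is then identified and feature mapping is applied, removing in the feature parameter domain both the matched channel information and part of the female speaker information. Experiments show that the method has advantages over other methods in terms of both performance and data requirements.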

The evolution of robust speech recognition systems that maintain a high level of recognition accuracy in difficult and dynamically varying acoustical environments is becoming increasingly important as speech recognition technology becomes a more integral part of mobile applications. In a distributed speech recognition (DSR) architecture, the recogniser's front-end is located in the terminal and is connected over a data network to a remote back-end recognition server. The terminal performs the feature parameter extraction, that is, the front-end of the speech recognition system. These features are transmitted over a data channel to the remote back-end recogniser. DSR provides particular benefits for mobile device applications, such as improved recognition performance compared to using the voice channel, and ubiquitous access from different networks with a guaranteed level of recognition performance. A feature extraction algorithm integrated into the DSR system is required to operate in real time and at the lowest possible computational cost. In this paper, two innovative front-end processing techniques for noise-robust speech recognition are presented and compared: time-domain based frame-attenuation (TD-FrAtt) and frequency-domain based frame-attenuation (FD-FrAtt). These techniques include different forms of frame-attenuation, an improvement of spectral subtraction based on minimum statistics, and a mel-cepstrum feature extraction procedure. Tests are performed using the Slovenian SpeechDat II fixed telephone database and the Aurora 2 database together with the HTK speech recognition toolkit. The results obtained are especially encouraging for mobile DSR systems with limited memory and processing power.

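In support vector machine (SVM) based text-independent speaker verification, two Gaussian mixture model (GMM) based impostor selection methods are proposed to improve the training efficiency and discriminative power of SVM speaker models. Using GMM probability scores, the speakers closest to each target speaker are selected as impostors for training that speaker's SVM model. This improves both the training efficiency and the discriminability of the SVM models, and so effectively improves system performance. Experiments on the NIST'04 1side-1side database demonstrate the effectiveness of the method.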

This paper presents the use of distance normalization techniques to improve speaker verification system performance. These techniques provide a dynamic threshold that compensates for trial-to-trial variations and replaces the fixed threshold used in the classical speaker verification approach. Two methods are described: cohort model normalization and a new hybrid cohort-world model normalization. The methods are compared in terms of storage space requirements and computational effort. Two algorithms are proposed: one uses existing user models, and the other creates new models. The algorithms were evaluated using the YOHO database and a proprietary database. The results showed that with these methods, false rejection errors are significantly reduced at a constant false acceptance error as the cohort size increases. The algorithms also require fewer computational resources than other algorithms, making them more suitable for commercial application.

This document describes the implementation of a free-speech speaker verification (SV) solution in the call center of the Israel First Direct Bank (FDI). This implementation is, as far as we know, the world's first commercial free-speech SV implementation. Before implementing the system, FDI used a manual speaker authentication method consisting of several personal questions that the calling customer had to answer correctly (e.g. "What is the first letter of your mother's maiden name?"). Most call centers still use similar methods for authenticating callers. The FDI realized that this process was expensive and time-consuming, and that it was also vulnerable to impostors, since the answers to the authentication questions could be obtained quite easily. In this document we describe how free-speech SV was implemented and integrated with the FDI systems in order to reduce the cost of customer authentication and to increase security. We discuss the expectations the FDI had of this system and how expectations should be adjusted to realistic levels. We also describe some of the major obstacles encountered when implementing the system in a real-world environment, and finally we present the current performance level of the system in terms of false acceptance and false rejection of callers.

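This paper proposes a text-independent speaker verification method based on the sequence information produced by segment-level speech identification. It focuses on how changes in the key factors, including the number of clusters, the threshold, and the decision criterion, affect verification performance. Experiments show that the frequency information of the segment identification sequence is a very effective cue for speaker verification and usefully supports the verification result. The paper also points out the shortcomings of the new method and directions for future improvement.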

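Web pages on the Internet contain a large amount of specialized knowledge, and how to acquire knowledge from these resources has been an important research topic for more than ten years. Concepts and the relations between them are the basic components of knowledge, so acquiring and validating concepts is an important step on the way from text to knowledge. This paper proposes and implements a method for automatically acquiring concepts from Web corpora, which draws on rules, statistics, contextual information, and other methods and information sources. Experimental results show that the method achieves good results.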

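To address variable-length feature vectors and cross-channel interference in speaker recognition systems, a supervector-based nuisance attribute projection (NAP) kernel is proposed. It is a new type of sequence kernel that lets the support vector machine classify whole speech sequences and removes from the kernel space the channel subspace information that is irrelevant to speaker recognition. Simulation results show that the kernel effectively improves both the classification performance of the support vector machine and the recognition accuracy of the speaker recognition system.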
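
The projection idea behind a NAP kernel can be sketched as follows: each supervector has its component in the channel (nuisance) subspace removed before the inner product is taken. The sketch assumes a linear kernel and an orthonormal nuisance basis that has already been estimated; how the basis is estimated, and the function names used, are outside what the abstract states.

```python
def dot(a, b):
    """Inner product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))

def remove_nuisance(vector, nuisance_basis):
    """Apply P = I - U U^T, where the rows of nuisance_basis are the
    orthonormal columns of U (assumed to be given)."""
    projected = list(vector)
    for u in nuisance_basis:
        weight = dot(vector, u)  # component of the vector along direction u
        projected = [p - weight * ui for p, ui in zip(projected, u)]
    return projected

def nap_kernel(a, b, nuisance_basis):
    """Linear kernel between two supervectors after nuisance removal."""
    return dot(remove_nuisance(a, nuisance_basis),
               remove_nuisance(b, nuisance_basis))
```

Because each utterance is represented by one fixed-length supervector, the kernel compares whole sequences regardless of their duration, which is how the variable-length problem is sidestepped.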

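This paper is the first to propose using the Pulse Coupled Neural Network (PCNN), an artificial neural network grounded in biological vision, to extract spectrogram features for speaker recognition. The spectrogram is fed into the PCNN, and the time series of the output images, together with its entropy sequence, is taken as the feature of the speaker's speech; the invariance of this feature is used to perform speaker recognition. Experimental results show that the method recognizes speakers quickly and effectively. By bringing PCNN into applied speech recognition research, the paper opens up a new area that combines two very important parts of signal processing, speech signal processing and image signal processing, and it is also of practical significance for the theoretical study and application of PCNN.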

