Found 15 similar documents; search time: 156 ms
1.
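To address the degradation of speaker verification performance caused by channel mismatch and the limited discriminative power of statistical models, this paper proposes a text-independent speaker verification method that combines factor analysis channel mismatch compensation with a support vector machine (SVM) model. At the front end of the SVM speaker model, the Gaussian mixture model-universal background model (GMM-UBM) method is used to cluster the speech feature parameters and raise their dimensionality; factor analysis (FA) is then applied to channel-compensate the resulting supervectors, which serve as the input features for SVM-based speaker verification. This effectively addresses the large-sample and dimension-raising problems of applying SVMs to text-independent speaker verification, as well as the impact of channel mismatch on performance. Experiments on the NIST 06 database show that the proposed method improves the equal error rate by more than 50% over the GMM-UBM and GMM-SVM systems without mismatch compensation, and by 15.8% over the GMM-UBM system with FA mismatch compensation.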
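The GMM-UBM front end described above turns a variable-length utterance into one fixed-length supervector. A minimal sketch of that step follows, assuming a diagonal-covariance UBM and the common relevance-MAP adaptation of the means only; the function names and the `relevance` value are illustrative assumptions, and the factor analysis compensation and SVM training from the abstract are not shown.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def map_adapt_supervector(frames, weights, means, variances, relevance=16.0):
    """Relevance-MAP adapt the UBM means to `frames`, then stack them.

    `relevance` is an assumed value; the abstract does not give one.
    """
    n_comp, dim = len(means), len(means[0])
    counts = [0.0] * n_comp                      # soft frame counts per component
    sums = [[0.0] * dim for _ in range(n_comp)]  # posterior-weighted frame sums
    for x in frames:
        logs = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)                          # subtract max for numerical stability
        post = [math.exp(l - top) for l in logs]
        total = sum(post)
        for k in range(n_comp):
            g = post[k] / total                  # posterior of component k for x
            counts[k] += g
            for d in range(dim):
                sums[k][d] += g * x[d]
    supervector = []
    for k in range(n_comp):
        alpha = counts[k] / (counts[k] + relevance)  # data-dependent mixing factor
        for d in range(dim):
            data_mean = sums[k][d] / counts[k] if counts[k] > 0 else means[k][d]
            supervector.append(alpha * data_mean + (1.0 - alpha) * means[k][d])
    return supervector
```

The supervector length is the number of mixture components times the feature dimension, regardless of how many frames the utterance contains, which is what makes it usable as a single SVM input.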
2.
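Speaker Verification Based on GMM Statistical Parameters and SVM  Total citations: 1, self-citations: 0, citations by others: 1
For text-independent speaker verification with large amounts of training data, this paper proposes a text-independent speaker verification system based on GMM statistical parameters and a support vector machine. The speaker's GMM statistical parameters are used as feature parameters to train the target speaker's SVM model. This extracts speaker-specific information effectively, solves the problem of SVM training on large-sample data, and combines the robustness of statistical models with the discriminative power of discriminative models, improving both the verification performance and the robustness of the system. Experiments on a Microsoft microphone speech database and the NIST'01 cellular telephone speech database demonstrate the effectiveness of the method.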
3.
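This paper proposes a speaker verification method that combines the advantages of statistical and discriminative models: speaker verification with an SVM speaker model based on multi-dimensional GMM probability outputs. The probability outputs of the target speaker's GMM for the different feature components of an utterance are used as feature parameters to build the target speaker's SVM model. Experiments on the NIST'05 8conv4w-1conv4w database demonstrate the effectiveness of the method.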
4.
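SVM input parameters often fail to make full use of the speaker information carried by the means, variances, and weights of a Gaussian mixture model (GMM), which degrades text-independent speaker verification performance. To address this, this paper proposes a new method for extracting SVM input features from the adapted GMM that combines its means, variances, and weights. Experiments on the NIST 06 speech database show that the method reduces the equal error rate (EER) from 8.49% for the Gaussian mixture model-universal background model (GMM-UBM) system to 4.16% for the GMM-SVM system based on the discrete cosine transform (DCT) and 3.3% for the GMM-SVM system based on principal component analysis (PCA).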
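Several of these abstracts report results as an equal error rate. As a rough illustration of that metric, the sketch below sweeps a decision threshold over the observed scores and reports the point where the false acceptance and false rejection rates are closest. The function name and the "accept when score >= threshold" convention are assumptions, not taken from any of the listed papers.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Return (eer, threshold) where FAR and FRR are closest.

    A trial is accepted when its score is >= the threshold (assumed convention).
    """
    best = None
    for thr in sorted(set(target_scores) | set(impostor_scores)):
        # False rejection: a genuine target trial scored below the threshold.
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= thr for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, thr)
    return best[1], best[2]
```

With only a handful of trials the two error curves may not cross exactly, so this sketch reports the average of FAR and FRR at the closest point; evaluation toolkits typically interpolate instead.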
5.
6.
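This paper proposes a text-independent speaker verification system based on prosodic features and SVM. Wavelet analysis is used to extract suprasegmental prosodic features from the MFCC, F0, and energy trajectories of the speech signal. The best complementary feature-level fusion of the three kinds of prosodic features is investigated experimentally, yielding the prosodic feature PMFCCFE. GMM mean supervectors of the prosodic features are then used as parameters to train the target speaker's SVM model, so as to distinguish target speakers from impostors more effectively. Experiments on the NIST06 8side-1side database show that, relative to a baseline GMM-UBM system using short-time cepstral parameters, the GMM-SVM system with suprasegmental prosodic features reduces the EER by 57.9% and the MinDCF by 41.4% in relative terms.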
7.
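When support vector machines are used as the speaker modeling method for text-independent speaker verification, how feature parameters suitable for SVM training and testing are extracted directly affects the performance and efficiency of the verification system. Exploiting the strong clustering ability of the Gaussian mixture model (GMM), this paper proposes a speaker feature extraction method based on adaptive GMM clustering: adaptive GMM clustering condenses the large-sample, heavily overlapping MFCC feature parameters into a small set of feature parameters that represent the speaker's individual characteristics, which are then used for text-independent SVM speaker verification. Experiments on the NIST'04 1side-1side database demonstrate the effectiveness of the method.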
8.
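Text-Independent Speaker Recognition Based on the Fusion of Classified Gaussian Mixture Models and Neural Networks  Total citations: 1, self-citations: 0, citations by others: 1
This paper proposes a speaker recognition system based on the fusion of classified Gaussian mixture models and neural networks. The speech frames of each speaker are divided into two classes according to an energy threshold; two classified speaker models (GMMs) are built for each speaker in the corresponding subspaces, and a neural network is built for each speaker to fuse the data from these two models. The recognition result is obtained by making a decision on the outputs of the speakers' neural networks. In text-independent speaker recognition experiments with 100 male speakers, the strategy based on classified speaker models outperforms the conventional GMM speaker recognition system in both recognition performance and noise robustness, and back-end fusion with neural networks in turn outperforms direct fusion. As a result, good recognition performance and noise robustness can be obtained with a lower speaker model mixture order and shorter test utterances.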
9.
10.
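Relevance vector machine (RVM) classification uses probabilistic outputs to overcome the low recognition speed of the support vector machine (SVM), and it offers better sparsity. However, in text-independent speaker identification, the large amount of training data exposes the RVM's drawback of excessive computation and memory requirements during model training. To address this, a text-independent speaker identification system that fuses GMM statistical feature parameters with the RVM is proposed. It extracts speaker-specific information effectively, solves the problem of RVM training on large-sample data, and combines the robustness of statistical models with the discriminative power of discriminative models. Experimental results show that the system achieves a better identification error rate than the basic GMM system and higher sparsity than the GMM/SVM system.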
11.
Gaussian mixture model (GMM) based approaches have been commonly used for speaker recognition tasks. Methods for estimation of parameters of GMMs include the expectation-maximization method which is a non-discriminative learning based method. Discriminative classifier based approaches to speaker recognition include support vector machine (SVM) based classifiers using dynamic kernels such as generalized linear discriminant sequence kernel, probabilistic sequence kernel, GMM supervector kernel, GMM-UBM mean interval kernel (GUMI) and intermediate matching kernel. Recently, the pyramid match kernel (PMK) using grids in the feature space as histogram bins and vocabulary-guided PMK (VGPMK) using clusters in the feature space as histogram bins have been proposed for recognition of objects in an image represented as a set of local feature vectors. In PMK, a set of feature vectors is mapped onto a multi-resolution histogram pyramid. The kernel is computed between a pair of examples by comparing the pyramids using a weighted histogram intersection function at each level of pyramid. We propose to use the PMK-based SVM classifier for speaker identification and verification from the speech signal of an utterance represented as a set of local feature vectors. The main issue in building the PMK-based SVM classifier is construction of a pyramid of histograms. We first propose to form hard clusters, using k-means clustering method, with increasing number of clusters at different levels of pyramid to design the codebook-based PMK (CBPMK). Then we propose the GMM-based PMK (GMMPMK) that uses soft clustering. We compare the performance of the GMM-based approaches, and the PMK and other dynamic kernel SVM-based approaches to speaker identification and verification. The 2002 and 2003 NIST speaker recognition corpora are used in evaluation of different approaches to speaker identification and verification. 
The results of our studies show that the dynamic kernel SVM-based approaches perform significantly better than the state-of-the-art GMM-based approaches. For the speaker recognition task, the GMMPMK-based SVM performs better than SVMs using many other dynamic kernels and comparably to SVMs using the state-of-the-art dynamic kernel, the GUMI kernel. The storage requirements of the GMMPMK-based SVMs are lower than those of SVMs using any other dynamic kernel.
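The pyramid comparison described in this abstract can be sketched roughly as follows. This assumes each example is already given as a list of histograms ordered from the finest level to the coarsest. The default weights of 1/2^i and the counting of only newly formed matches at each level follow the original image-domain pyramid match kernel as commonly presented, and are assumptions rather than details confirmed for the CBPMK or GMMPMK variants. The function names are illustrative.

```python
def histogram_intersection(h1, h2):
    """Number of matches between two histograms with the same bins."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def pyramid_match(pyr_x, pyr_y, weights=None):
    """Weighted histogram-intersection score between two histogram pyramids.

    pyr_x, pyr_y: lists of histograms, finest level first (assumed ordering).
    weights: one weight per level; defaults to 1 / 2**i (an assumption).
    """
    if weights is None:
        weights = [1.0 / (2 ** i) for i in range(len(pyr_x))]
    score, prev = 0.0, 0.0
    for w, hx, hy in zip(weights, pyr_x, pyr_y):
        inter = histogram_intersection(hx, hy)
        score += w * (inter - prev)  # count only matches that are new at this level
        prev = inter
    return score
```

Matches found at finer levels receive larger weights, so feature vectors that fall into the same small bin contribute more to the kernel value than those that only meet in a coarse bin.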
12.
13.
This paper presents a principled SVM-based speaker verification system. We propose a new framework and a new sequence kernel that can make use of any Mercer kernel at the frame level. An extension of the sequence kernel based on the Max operator is also proposed. The new system is compared with state-of-the-art GMM systems and other SVM-based systems from the literature on the Banca and Polyvar databases. In most cases it outperforms the other systems by a statistically significant margin. Finally, the proposed framework clarifies previous SVM-based systems and suggests interesting directions for future research.
14.
15.
Mitchell McLaren, Driss Matrouf, Robbie Vogt, Jean-Francois Bonastre. Computer Speech and Language, 2011, 25(2): 327-340
This paper presents an extended study on the implementation of support vector machine (SVM) based speaker verification in systems that employ continuous progressive model adaptation using the weight-based factor analysis model. The weight-based factor analysis model compensates for session variations in unsupervised scenarios by incorporating trial confidence measures in the general statistics used in the inter-session variability modelling process. Employing weight-based factor analysis in Gaussian mixture models (GMMs) was recently found to provide significant performance gains to unsupervised classification. Further improvements in performance were found through the integration of SVM-based classification in the system by means of GMM supervectors. This study focuses particularly on the way in which a client is represented in the SVM kernel space using single and multiple target supervectors. Experimental results indicate that training client SVMs using a single target supervector maximises performance while exhibiting a certain robustness to the inclusion of impostor training data in the model. Furthermore, the inclusion of low-scoring target trials in the adaptation process is investigated, and these trials were found to aid performance significantly.