
1.

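Gu Bin, Guo Wu 《数据采集与处理》 (Journal of Data Acquisition and Processing) 2019,34(5):837-843

In speaker verification, score normalization effectively adjusts the distribution of test scores so that every speaker's scores approach a common distribution, which improves overall system performance. A large number of impostor scores for each target speaker are obtained directly from the development set, screened by unsupervised clustering, and fitted with a Gaussian mixture model; the Gaussian component with the largest mean supplies the normalization parameters, which are then applied to that speaker's scores. Results on the NIST SRE 2016 test set show that, compared with other score normalization algorithms, the unsupervised-clustering approach effectively improves system performance.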

2.

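Score normalization based on unsupervised clustering in speaker verification

Gu Bin 《数据采集与处理》 (Journal of Data Acquisition and Processing) 2019,34(5)

In speaker verification, score normalization effectively adjusts the distribution of test scores so that every speaker's scores approach a common distribution, which improves overall system performance. In this paper, a large number of impostor scores for each target speaker are obtained directly from the development set, screened by unsupervised clustering, and fitted with a Gaussian mixture model; the Gaussian component with the largest mean supplies the normalization parameters, which are then applied to that speaker's scores. On the NIST SRE 2016 test set, compared with other score normalization algorithms, the unsupervised-clustering approach effectively improves system performance.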
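
The normalization step described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the clustering-based screening of impostor scores is omitted because the abstract does not say which clustering method is used, the normalized score is assumed to take the common form (s - mean) / std, and the names `fit_gmm_1d` and `normalise` are hypothetical.

```python
import math
import random


def fit_gmm_1d(scores, k=2, iters=100):
    """Fit a k-component 1-D Gaussian mixture to scores with plain EM."""
    ordered = sorted(scores)
    n = len(ordered)
    # Initialise means at evenly spaced quantiles, with a shared variance.
    means = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    mean_all = sum(ordered) / n
    var_all = sum((s - mean_all) ** 2 for s in ordered) / n
    variances = [var_all] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each score.
        resp = []
        for s in scores:
            dens = [
                w * math.exp(-0.5 * (s - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * s for r, s in zip(resp, scores)) / nj
            var = sum(r[j] * (s - means[j]) ** 2 for r, s in zip(resp, scores)) / nj
            variances[j] = max(var, 1e-6)  # variance floor
    return weights, means, variances


def normalise(raw_score, impostor_scores, k=2):
    """Normalise a score with the statistics of the highest-mean component.

    Assumption: a Z-norm-like form (s - mean) / std; the abstract does not
    give the exact formula.
    """
    _, means, variances = fit_gmm_1d(impostor_scores, k)
    top = max(range(k), key=lambda j: means[j])
    return (raw_score - means[top]) / math.sqrt(variances[top])


# Synthetic impostor scores: many easy impostors plus a harder, closer group.
rng = random.Random(0)
impostors = [rng.gauss(-5.0, 1.0) for _ in range(300)]
impostors += [rng.gauss(0.0, 1.0) for _ in range(100)]
print(normalise(3.0, impostors))
```

Picking the component with the largest mean means the trial is normalized against the impostors that score closest to the target, rather than against the full impostor population.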

3.

Speaker Verification Using Adapted Gaussian Mixture Models

《Digital Signal Processing》2000,10(1-3):19-41

Reynolds, Douglas A., Quatieri, Thomas F., and Dunn, Robert B., Speaker Verification Using Adapted Gaussian Mixture Models, Digital Signal Processing 10 (2000), 19–41. In this paper we describe the major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs). The system is built around the likelihood ratio test for verification, using simple but effective GMMs for likelihood functions, a universal background model (UBM) for alternative speaker representation, and a form of Bayesian adaptation to derive speaker models from the UBM. The development and use of a handset detector and score normalization to greatly improve verification performance are also described and discussed. Finally, representative performance benchmarks and system behavior experiments on NIST SRE corpora are presented.
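
The pipeline this abstract outlines (a UBM, Bayesian adaptation to obtain the speaker model, and a likelihood ratio test) can be sketched on scalar features as below. This is a toy illustration, not the Lincoln Laboratory system: it adapts means only, uses a relevance factor of 16 as an assumed default, and works on 1-D frames; the function names are hypothetical.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of one scalar frame under a 1-D GMM (log-sum-exp)."""
    terms = [math.log(w) + log_gauss(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of the UBM towards the enrollment frames."""
    k = len(means)
    counts = [0.0] * k
    sums = [0.0] * k
    for x in frames:
        logs = [math.log(w) + log_gauss(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        post = [math.exp(l - top) for l in logs]
        total = sum(post)
        for j in range(k):
            gamma = post[j] / total  # posterior of component j for this frame
            counts[j] += gamma
            sums[j] += gamma * x
    adapted = []
    for j in range(k):
        alpha = counts[j] / (counts[j] + relevance)  # data-dependent mixing
        data_mean = sums[j] / counts[j] if counts[j] > 0 else means[j]
        adapted.append(alpha * data_mean + (1 - alpha) * means[j])
    return adapted


def llr_score(test_frames, speaker_means, weights, ubm_means, variances):
    """Average per-frame log-likelihood ratio: speaker model vs. UBM."""
    total = 0.0
    for x in test_frames:
        total += gmm_loglik(x, weights, speaker_means, variances)
        total -= gmm_loglik(x, weights, ubm_means, variances)
    return total / len(test_frames)


# Toy UBM and enrollment data (all values are made up for illustration).
ubm_w, ubm_m, ubm_v = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
enroll = [2.0] * 40 + [-0.5] * 40
spk_m = map_adapt_means(enroll, ubm_w, ubm_m, ubm_v)
target_llr = llr_score([2.0, -0.5, 1.9, -0.4], spk_m, ubm_w, ubm_m, ubm_v)
impostor_llr = llr_score([-2.0, 0.5, -1.8, 0.3], spk_m, ubm_w, ubm_m, ubm_v)
print(spk_m, target_llr, impostor_llr)
```

A trial is accepted when the score exceeds a threshold; because the speaker model shares weights and variances with the UBM here, only frames near the adapted means move the ratio away from zero.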

4.

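Speaker verification based on feature mapping with adapted Gaussian mixture models

Yang Shiqing, Dai Beiqian, Xu Minqiang, Liu Qingsong 《模式识别与人工智能》 (Pattern Recognition and Artificial Intelligence) 2009,22(3):417-421

To address the performance degradation that nonlinear channel distortion causes in telephone-speech speaker verification systems, a feature mapping method that removes channel effects is proposed. A Gaussian mixture model serves as the speech model, and maximum a posteriori (MAP) adaptation derives a channel-specific speech model from it; the differences between corresponding Gaussian components of the two models describe how that channel affects different speech sounds. Channel mapping rules derived from these differences are used for parameter compensation, removing the mismatch between training and test speech. Experiments on the male-speaker data of NIST 1999 and 2004 show that the method improves the equal error rate by 14.7% and 15.18%, respectively.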

5.

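Factor analysis and space concatenation in speaker recognition  Cited by: 1 (self-citations: 0, citations by others: 1)

Guo Wu, Li Yijie, Dai Lirong, Wang Renhua 《自动化学报》 (Acta Automatica Sinica) 2009,35(9):1193-1198

Joint factor analysis can effectively model speaker and channel variability in Gaussian mixture models and is widely used in speaker recognition. Usually, when the speaker and channel loading matrices are estimated jointly, the speaker residual matrix has no effect and the number of factors in the channel loading matrix cannot be increased. This paper proposes training the speaker loading matrix and the speaker residual loading matrix serially, and using matrix concatenation when training the channel loading matrix, which effectively improves recognition accuracy; on the five parts of the NIST SRE 2008 core test data, the equal error rates are 3.3%, 5.1%, 5.0%, 5.3% and 5.0%, respectively.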

6.

Speaker recognition using pyramid match kernel based support vector machines

A. D. Dileep, C. Chandra Sekhar 《International Journal of Speech Technology》 2012,15(3):365-379

Gaussian mixture model (GMM) based approaches have been commonly used for speaker recognition tasks. Methods for estimation of parameters of GMMs include the expectation-maximization method which is a non-discriminative learning based method. Discriminative classifier based approaches to speaker recognition include support vector machine (SVM) based classifiers using dynamic kernels such as generalized linear discriminant sequence kernel, probabilistic sequence kernel, GMM supervector kernel, GMM-UBM mean interval kernel (GUMI) and intermediate matching kernel. Recently, the pyramid match kernel (PMK) using grids in the feature space as histogram bins and vocabulary-guided PMK (VGPMK) using clusters in the feature space as histogram bins have been proposed for recognition of objects in an image represented as a set of local feature vectors. In PMK, a set of feature vectors is mapped onto a multi-resolution histogram pyramid. The kernel is computed between a pair of examples by comparing the pyramids using a weighted histogram intersection function at each level of pyramid. We propose to use the PMK-based SVM classifier for speaker identification and verification from the speech signal of an utterance represented as a set of local feature vectors. The main issue in building the PMK-based SVM classifier is construction of a pyramid of histograms. We first propose to form hard clusters, using k-means clustering method, with increasing number of clusters at different levels of pyramid to design the codebook-based PMK (CBPMK). Then we propose the GMM-based PMK (GMMPMK) that uses soft clustering. We compare the performance of the GMM-based approaches, and the PMK and other dynamic kernel SVM-based approaches to speaker identification and verification. The 2002 and 2003 NIST speaker recognition corpora are used in evaluation of different approaches to speaker identification and verification. 
Results of our studies show that the dynamic kernel SVM-based approaches give a significantly better performance than the state-of-the-art GMM-based approaches. For the speaker recognition task, the GMMPMK-based SVM gives a performance that is better than that of SVMs using many other dynamic kernels and comparable to that of SVMs using the state-of-the-art GUMI kernel. The storage requirements of the GMMPMK-based SVMs are lower than those of SVMs using any other dynamic kernel.
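
The weighted histogram intersection at the heart of the pyramid match kernel can be sketched as follows. This is a simplified, grid-based illustration on scalar features, assuming bin widths that double at each level and weights that halve, as in the original grid PMK; it is not the codebook-based or GMM-based variant the abstract proposes, and the function names are hypothetical.

```python
from collections import Counter


def histogram(points, bin_width):
    """Count how many points fall into each bin of the given width."""
    return Counter(int(p // bin_width) for p in points)


def intersection(h1, h2):
    """Histogram intersection: number of points matched bin by bin."""
    return sum(min(count, h2[b]) for b, count in h1.items())


def pyramid_match(xs, ys, levels=4, finest=1.0):
    """Weighted sum of the new matches found at each pyramid level."""
    score = 0.0
    previous = 0
    for level in range(levels):
        width = finest * (2 ** level)  # coarser bins at higher levels
        matched = intersection(histogram(xs, width), histogram(ys, width))
        # Only matches that first appear at this level are counted,
        # and coarser matches are weighted less.
        score += (matched - previous) / (2 ** level)
        previous = matched
    return score


xs = [0.2, 1.5, 3.7]
ys = [0.4, 1.9, 6.1]
print(pyramid_match(xs, ys))  # 2 matches at level 0, 1 at level 3 -> 2.125
print(pyramid_match(xs, xs))  # every point matches itself at level 0 -> 3.0
```

In an SVM classifier this value would be used as the kernel between two utterances, each represented as a set of local feature vectors.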

7.

Efficient Speaker Recognition Using Approximated Cross Entropy (ACE)

Aronowitz H., Burshtein D. 《IEEE transactions on audio, speech, and language processing》 2007,15(7):2033-2043

Techniques for efficient speaker recognition are presented. These techniques are based on approximating Gaussian mixture modeling (GMM) likelihood scoring using approximated cross entropy (ACE). Gaussian mixture modeling is used for representing both training and test sessions and is shown to perform speaker recognition and retrieval extremely efficiently without any notable degradation in accuracy compared to classic GMM-based recognition. In addition, a GMM compression algorithm is presented. This algorithm decreases considerably the storage needed for speaker retrieval.

8.

Robust speaker recognition in cross-channel condition based on Gaussian mixture model

Yuxiang Shan, Jia Liu 《Multimedia Tools and Applications》 2011,52(1):159-173

One of the most difficult challenges for speaker recognition is dealing with channel variability. In this paper, several new cross-channel compensation techniques are introduced for a Gaussian mixture model-universal background model (GMM-UBM) speaker verification system. These new techniques include wideband noise reduction, echo cancellation, a simplified feature-domain latent factor analysis (LFA) and data-driven score normalization. A novel dynamic Gaussian selection algorithm is developed to reduce the feature compensation time by more than 60% without any performance loss. The performance of different techniques across varying channel train/test conditions is presented and discussed, finding that speech enhancement, which used to be neglected for telephone speech, is essential for cross-channel tasks, and that the channel compensation techniques developed for telephone channel speech also perform effectively. The per-microphone performance analysis further shows that speech enhancement can boost the effects of other techniques greatly, especially on channels with larger signal-to-noise ratio (SNR) variance. All results are presented on NIST SRE 2006 and 2008 data, showing a promising performance gain compared to the baseline. The developed system is also compared with other state-of-the-art speaker verification systems. The result shows that the developed system can obtain comparable or even better performance but consumes much less CPU time, making it more suitable for practical use.

9.

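Feature mapping using principal component analysis  Cited by: 1 (self-citations: 0, citations by others: 1)

Guo Wu, Dai Lirong, Wang Renhua 《自动化学报》 (Acta Automatica Sinica) 2008,34(8):876-879

In text-independent speaker recognition, feature mapping can effectively reduce channel effects. This paper first uses principal component analysis in the model domain to estimate the subspace in which the channel factors lie, and then subtracts the channel-factor contribution in the feature domain through a mapping. The method requires data labeled with channel information, but no channel decision is needed when the features are mapped. On the NIST SRE 2006 1conv4w-1conv4w data, the system using the proposed method reduces the equal error rate by 19% relative to the baseline system.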

10.

Quality measures for speaker verification with short utterances

《Digital Signal Processing》2019

The performance of automatic speaker verification (ASV) systems degrades as the amount of speech used for enrollment and verification is reduced. Combining multiple systems based on different features and classifiers considerably reduces speaker verification error rate with short utterances. This work attempts to incorporate supplementary information during the system combination process. We use quality of the estimated model parameters as supplementary information. We introduce a class of novel quality measures formulated using the zero-order sufficient statistics used during the i-vector extraction process. We have used the proposed quality measures as side information for combining ASV systems based on Gaussian mixture model–universal background model (GMM–UBM) and i-vector. The proposed methods demonstrate considerable improvement in speaker recognition performance on NIST SRE corpora, especially in short duration conditions. We have also observed improvement over existing systems based on different duration-based quality measures.

11.

A fast and noise resilient cluster-based anomaly detection

Elnaz Bigdeli, Mahdi Mohammadi, Bijan Raahemi, Stan Matwin 《Pattern Analysis & Applications》 2017,20(1):183-199

Clustering, while systematically applied in anomaly detection, has a direct impact on the accuracy of the detection methods. Existing cluster-based anomaly detection methods are mainly based on spherical shape clustering. In this paper, we focus on arbitrary shape clustering methods to increase the accuracy of the anomaly detection. However, since the main drawback of arbitrary shape clustering is its high memory complexity, we propose to summarize clusters first. For this, we design an algorithm, called Summarization based on Gaussian Mixture Model (SGMM), to summarize clusters and represent them as Gaussian Mixture Models (GMMs). After GMMs are constructed, incoming new samples are presented to the GMMs, and their membership values are calculated, based on which the new samples are labeled as “normal” or “anomaly.” Additionally, to address the issue of noise in the data, instead of labeling samples individually, they are clustered first, and then each cluster is labeled collectively. For this, we present a new approach, called Collective Probabilistic Anomaly Detection (CPAD), in which the distance between the incoming new samples and the existing SGMMs is calculated, and then the new cluster is given the same label as the closest cluster. To measure the distance between two GMM-based clusters, we propose a modified version of the Kullback–Leibler measure. We run several experiments to evaluate the performances of the proposed SGMM and CPAD methods and compare them against some of the well-known algorithms including ABACUS, local outlier factor (LOF), and one-class support vector machine (SVM). The performance of SGMM is compared with ABACUS using Dunn and DB metrics, and the results indicate that the SGMM performs better in terms of summarizing clusters. Moreover, the proposed CPAD method is compared with the LOF and one-class SVM considering the performance criteria of (a) false alarm rate, (b) detection rate, and (c) memory efficiency. 
The experimental results show that the CPAD method is noise resilient and memory efficient, and that its accuracy is higher than that of the other methods.

12.

Fusion of Heterogeneous Speaker Recognition Systems in the STBU Submission for the NIST Speaker Recognition Evaluation 2006

Brummer N., Burget L., Cernocky J.H., Glembek O., Grezl F., Karafiat M., van Leeuwen D.A., Matejka P., Schwarz P., Strasheim A. 《IEEE transactions on audio, speech, and language processing》 2007,15(7):2072-2084

This paper describes and discusses the "STBU" speaker recognition system, which performed well in the NIST Speaker Recognition Evaluation 2006 (SRE). STBU is a consortium of four partners: Spescom DataVoice (Stellenbosch, South Africa), TNO (Soesterberg, The Netherlands), BUT (Brno, Czech Republic), and the University of Stellenbosch (Stellenbosch, South Africa). The STBU system was a combination of three main kinds of subsystems: 1) GMM, with short-time Mel frequency cepstral coefficient (MFCC) or perceptual linear prediction (PLP) features, 2) Gaussian mixture model-support vector machine (GMM-SVM), using GMM mean supervectors as input to an SVM, and 3) maximum-likelihood linear regression-support vector machine (MLLR-SVM), using MLLR speaker adaptation coefficients derived from an English large vocabulary continuous speech recognition (LVCSR) system. All subsystems made use of supervector subspace channel compensation methods, either eigenchannel adaptation or nuisance attribute projection. We document the design and performance of all subsystems, as well as their fusion and calibration via logistic regression. Finally, we also present a cross-site fusion that was done with several additional systems from other NIST SRE-2006 participants.

13.

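Speaker recognition based on a linear log-likelihood kernel

He Liang, Liu Jia 《计算机应用》 (Journal of Computer Applications) 2011,31(8):2083-2086

To improve the performance of text-independent speaker recognition, a speaker recognition system based on a linear log-likelihood kernel is proposed. The kernel uses a Gaussian mixture model to compress the spectral feature sequence; converts the similarity between spectral feature sequences into a distance between Gaussian mixture model parameters; derives, from the distance expression and the polarization identity, a mapping from spectral feature sequences to a high-dimensional vector space; and finally models the target speaker in that space with a support vector machine (SVM). Experiments on the speaker recognition databases released by the U.S. National Institute of Standards and Technology (NIST) show that the proposed kernel achieves excellent recognition performance.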

14.

Sub-vector based biometric speaker verification using MLLR super-vector

A. K. Sarkar, J. F. Bonastre 《International Journal of Speech Technology》 2016,19(1):41-54

In this paper, we propose a sub-vector based speaker characterization method for biometric speaker verification, where speakers are represented by uniform segmentation of their maximum likelihood linear regression (MLLR) super-vectors called m-vectors. The MLLR transformation is estimated with respect to the universal background model (UBM) without any speech/phonetic information. We introduce two strategies for segmentation of the MLLR super-vector: one is called disjoint and the other is an overlapped window technique. During the test phase, m-vectors of the test utterance are scored against the claimant speaker. Before scoring, m-vectors are post-processed to compensate for session variability. In addition, we propose a clustering algorithm for multiple-class wise MLLR transformation, where Gaussian components of the UBM are clustered into different groups using the concept of expectation maximization (EM) and maximum likelihood (ML). In this case, MLLR transformations are estimated with respect to each class using the sufficient statistics accumulated from the Gaussian components belonging to the particular class, which are then used for the m-vector system. The proposed method needs only one alignment of the data with respect to the UBM for multiple MLLR transformations. We first show that the proposed multi-class m-vector system shows promising speaker verification performance when compared to the conventional i-vector based speaker verification system. Secondly, the proposed EM based clustering technique is robust to random initialization, in contrast to the conventional K-means algorithm, and yields system performance better than or equal to the best obtained with K-means. Finally, we show that the fusion of the m-vector with the i-vector further improves the performance of speaker verification in both the score and the feature domain. The experimental results are shown on various tasks of the NIST 2008 speaker recognition evaluation (SRE) core condition.

15.

Multistage speaker diarization of broadcast news

Barras C., Xuan Zhu, Meignier S., Gauvain J.-L. 《IEEE transactions on audio, speech, and language processing》 2006,14(5):1505-1512


16.

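SVM speaker verification with factor-analysis channel mismatch compensation

Wu Dehui, Li Hui, Liu Qingsong, Dai Beiqian 《模式识别与人工智能》 (Pattern Recognition and Artificial Intelligence) 2010,23(1):59-64

To address the drop in speaker verification performance caused by channel mismatch and by the limited discriminative power of statistical models, this paper proposes a text-independent speaker verification method that combines factor-analysis channel mismatch compensation with a support vector machine model. In the front end of the SVM speaker model, the Gaussian mixture model-universal background model (GMM-UBM) method clusters the speech feature parameters and raises their dimensionality, and factor analysis (FA) applies channel compensation to the resulting supervectors, which then serve as the input features for SVM-based speaker verification. This effectively handles the large-sample and dimension-raising problems of applying SVMs to text-independent speaker verification, as well as the effect of channel mismatch on performance. Experiments on the NIST 06 database show that the method improves the equal error rate by more than 50% over GMM-UBM and GMM-SVM systems without mismatch compensation, and by 15.8% over a GMM-UBM system with FA mismatch compensation.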

17.

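A Gaussian mixture model based on variable-factor integration for speaker recognition

Li Jie, Liu Heping 《模式识别与人工智能》 (Pattern Recognition and Artificial Intelligence) 2012,25(6):937-942

To address the marked drop in recognition rate of the conventional Gaussian mixture model in noisy environments, a Gaussian mixture model based on integration with a variable factor α is proposed, drawing on the α-factor fusion mechanism between stochastic probability distribution models. By introducing the variable factor, the model adjusts once more the proportions of the different components in the mixture. Experimental results show that, after re-estimating the model parameters, the recognition rate is higher than that of the conventional Gaussian model on both the TIMIT and NTIMIT corpora and across different sample sets. In particular, in noisy conditions with the optimal value of α, the recognition rate improves by 8%; the recognition rate is also higher than that of a GMM-UBM system on a NIST evaluation data set.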

18.

A multi-manifold semi-supervised Gaussian mixture model for pattern classification

Xianglei Xing, Yao Yu, Hua Jiang, Sidan Du 《Pattern recognition letters》 2013,34(16):2118-2125

Semi-supervised Gaussian mixture model (SGMM) has been successfully applied to a wide range of engineering and scientific fields, including text classification, image retrieval, and biometric identification. Recently, many studies have shown that naturally occurring data may reside on or near manifold structures in ambient space. In this paper, we study the use of SGMM for data sets containing multiple separated or intersecting manifold structures. We propose a new multi-manifold regularized, semi-supervised Gaussian mixture model (M2SGMM) for classifying multiple manifolds. Specifically, we model the data manifold using a similarity graph with local and geometrical consistency properties. The geometrical similarity is measured by a novel application of local tangent space. We regularize the model parameters of the SGMM by incorporating the enhanced Laplacian of the graph. Experiments demonstrate the effectiveness of the proposed approach.

19.

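Short-duration speaker recognition with nonparametric modeling

Jin Yuhong 《小型微型计算机系统》 (Journal of Chinese Computer Systems) 2012,33(5):1131-1134

In text-independent voiceprint recognition, the better-performing and more mature systems to date have all been obtained with long training and test data; for example, in the core test condition of the NIST evaluations, training and test utterances are about 5 minutes long. In practical applications, however, because of the particular nature of voiceprint recognition, users are generally uncooperative and it is usually hard to collect enough training speech, which limits classical speaker recognition systems and greatly degrades their performance. This paper targets short-duration speaker recognition, which is directly relevant to practical applications, and proposes a nonparametric estimation method using Parzen windows to model the target speaker's short-duration data, with the aim of improving the generalization ability of the speaker model. On the NIST SRE 2006 short-duration task with 10 s training and test utterances, fusing the method's scores with those of a conventional GMM-UBM system reduces the equal error rate (EER) by a relative 10.76% compared with the baseline system.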
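
A Parzen-window density estimate of the kind this abstract refers to can be sketched as follows. This is an illustration under assumptions rather than the paper's method: it uses a Gaussian kernel on scalar features with a hand-picked bandwidth, and the function name `parzen_log_density` is hypothetical.

```python
import math


def parzen_log_density(x, samples, bandwidth):
    """Log of a Gaussian-kernel Parzen-window density estimate at x.

    Every enrollment sample contributes one kernel centred on itself,
    so no parametric model has to be fitted to the short utterance.
    """
    logs = [-0.5 * ((x - s) / bandwidth) ** 2 for s in samples]
    top = max(logs)
    log_sum = top + math.log(sum(math.exp(l - top) for l in logs))
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)
    return log_sum - math.log(norm)


# Made-up scalar "features" from a short enrollment utterance.
enrollment = [0.9, 1.1, 1.0, 1.3, 0.8]
near = parzen_log_density(1.0, enrollment, bandwidth=0.3)
far = parzen_log_density(4.0, enrollment, bandwidth=0.3)
print(near, far)
```

The bandwidth controls how far each sample's influence spreads; with very little enrollment data a wider window trades resolution for smoother generalization.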

20.

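Speaker verification based on GMM statistical parameters and SVM  Cited by: 1 (self-citations: 0, citations by others: 1)

Huang Wei, Dai Beiqian 《数据采集与处理》 (Journal of Data Acquisition and Processing) 2004,19(4):365-370

For text-independent speaker verification with large amounts of training data, this paper proposes a text-independent speaker verification system based on GMM statistical parameters and a support vector machine: the statistical parameters of each speaker's GMM serve as the feature parameters for training the target speaker's SVM model. This extracts speaker-specific information effectively, solves the problem of SVM training on large sample sets, and combines the robustness of statistical models with the discriminative power of discriminative models, improving both the verification performance and the robustness of the system. Experiments on a Microsoft microphone speech database and the NIST'01 cellular telephone speech database demonstrate the effectiveness of the method.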