Similar Articles
1.
In the i-vector/probabilistic linear discriminant analysis (PLDA) technique, the PLDA backend classifier is modelled on i-vectors. PLDA defines an i-vector subspace that compensates for unwanted variability and helps discriminate among speaker-phrase pairs. The channel or session variability manifested in i-vectors is known to be nonlinear in nature. PLDA training, however, assumes the variability to be linearly separable, which causes a loss of important discriminating information. Moreover, i-vector estimation itself is known to be poor for short utterances. This paper attempts to address these issues using a simple hierarchy-based system. A modified fuzzy-clustering technique is employed to divide the feature space into more characteristic feature subspaces using vocal source features. Thereafter, a separate i-vector/PLDA model is trained for each subspace. The sparser alignment owing to the subspace-specific universal background model and the relatively reduced dimensions of variability in individual subspaces help train more effective i-vector/PLDA models. Also, vocal source features are complementary to mel frequency cepstral coefficients (MFCCs), which are transformed into i-vectors using a mixture-model technique. As a consequence, vocal source features and i-vectors tend to carry complementary information. Thus, using vocal source features for classification in a hierarchy tree may help differentiate some speaker-phrase classes that are otherwise not easily discriminable based on i-vectors. The proposed technique has been validated on Part 1 of the RSR2015 database, where it shows a relative equal error rate reduction of up to 37.41% with respect to the baseline i-vector/PLDA system.
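The abstract says a *modified* fuzzy-clustering step splits the feature space into subspaces, with one i-vector/PLDA model trained per subspace. The modification is not described, so the sketch below shows only plain fuzzy c-means (FCM) on vocal-source feature vectors. The function name `fuzzy_cmeans`, the fuzzifier `m=2` and the use of argmax membership as a hard subspace assignment are all assumptions.

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means (not the paper's modified variant).
    X: (n, d) feature vectors. Returns (centers (c, d), memberships U (n, c))."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)              # each row of memberships sums to 1
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # squared distance of every point to every center, floored to avoid division by zero
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + 1e-12
        # u_ik proportional to d_ik^(-2/(m-1)); with squared distances the exponent is -1/(m-1)
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U
```

In a hierarchy like the one described, `U.argmax(axis=1)` would route each vector to one subspace, and a separate i-vector/PLDA model would then be trained on each subset.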

2.
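When the i-vector/PLDA model is used for speaker verification on utterances of variable duration, transferring length-normalized i-vectors into the PLDA model introduces uncertainty-related distortion and scaling, which degrades recognition accuracy. Instead of length-normalizing the i-vectors fed to the PLDA model, this paper normalizes the column vectors of the total variability matrix T, which avoids the undesirable distortion that i-vector length normalization causes in the PLDA model. Experimental results show that the method performs similarly to length normalization and in some cases outperforms it.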
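The contrast in this abstract is between normalizing each extracted i-vector and normalizing the columns of T once, before extraction. A minimal NumPy sketch of the two operations follows. It assumes T is stored as a `(supervector_dim, ivector_dim)` array and that "normalizing the column vectors" means scaling each column to unit L2 norm; the abstract does not state which norm is used, and both function names are hypothetical.

```python
import numpy as np

def length_normalize(ivectors):
    """Conventional length normalization: scale each i-vector (row) to unit L2 norm."""
    norms = np.linalg.norm(ivectors, axis=1, keepdims=True)
    return ivectors / np.maximum(norms, 1e-12)

def normalize_T_columns(T):
    """Alternative from the abstract (L2 norm assumed): scale each column of T to unit norm."""
    norms = np.linalg.norm(T, axis=0, keepdims=True)
    return T / np.maximum(norms, 1e-12)
```

The design difference is where the rescaling happens. Column normalization is a one-off change to the extractor, so no per-utterance rescaling is applied before PLDA.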

3.
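I-vector speaker recognition systems commonly use a distance to measure the similarity between speakers' utterances. The weighted pairwise constrained metric learning (WPCML) algorithm uses weighted constraint information from pairs of training samples to train a metric matrix for computing the Mahalanobis distance. In the sample space defined by this metric matrix, samples of the same class are closer together and samples of different classes are farther apart. Experiments on the US National Institute of Standards and Technology (NIST) 2008 Speaker Recognition Evaluation database (SRE08) show that Mahalanobis distance scoring with the WPCML-trained metric matrix outperforms cosine distance scoring. Using a training-pair selection method to construct the metric-learning training set further improves performance, which then exceeds that of the currently most popular classifier, PLDA.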
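The abstract compares two scoring rules for a pair of i-vectors. The sketch below shows both. Here `A` stands in for a metric matrix that WPCML would learn; the WPCML training itself is not reproduced, so any positive semidefinite matrix can be passed. Negating the squared distance so that higher means more similar is an assumption made for illustration.

```python
import numpy as np

def cosine_score(w1, w2):
    """Cosine similarity between two i-vectors."""
    return float(w1 @ w2 / (np.linalg.norm(w1) * np.linalg.norm(w2)))

def mahalanobis_score(w1, w2, A):
    """Negative squared Mahalanobis distance under a PSD metric A (higher = more similar)."""
    diff = w1 - w2
    return float(-(diff @ A @ diff))
```

With `A` set to the identity this reduces to negative squared Euclidean distance. A learned `A` can shrink directions that carry mostly session variability, which cosine scoring cannot do.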

4.
5.
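Confidence measures determine how well speech data matches a model. They can detect recognition errors in voice-command systems and thus improve reliability. In recent years, methods based on the identity vector (i-vector) and probabilistic linear discriminant analysis (PLDA) have achieved notable results in speaker recognition. This paper applies the i-vector and PLDA model as a confidence analysis method for command-word recognition results. The method requires no acoustic model or language model, and experiments show that it performs well. Because the i-vector captures temporal information poorly, the system is further fused with dynamic time warping (DTW), which effectively improves its ability to discriminate the temporal structure of audio.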
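DTW is the component added here to recover temporal order, which an i-vector discards. Below is a minimal sketch of classic DTW between two feature sequences, using Euclidean frame distances and the standard three-way recursion. How the paper fuses the DTW distance with the PLDA score (for example, the fusion weight) is not stated, so no fusion is shown.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW between feature sequences a (n, d) and b (m, d), Euclidean frame cost."""
    n, m = len(a), len(b)
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)   # (n, m) frame distances
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # extend the cheapest of: insertion, deletion, match
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```

A time-stretched copy of a command scores zero, while a temporally reversed one does not. That is exactly the ordering information a single utterance-level i-vector cannot express.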

6.
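Owing to its excellent classification performance, sparse representation has become a focus of speaker verification research. Constructing the overcomplete dictionary is the key step and directly affects performance. To improve the robustness of speaker verification, and to address noise and channel interference in the overcomplete dictionary, an i-vector-based principal-component dictionary learning algorithm for sparse representation is proposed. The algorithm extracts speakers' i-vectors on top of a Gaussian universal background model and applies within-class covariance normalization (WCCN) for channel compensation. It then estimates a channel offset space from the mean vectors of the channel-compensated speaker i-vectors, extracts low-dimensional principal channel-offset components in that space by principal component analysis, and uses them to recompute the speaker i-vectors, further suppressing channel interference. The new i-vectors serve as dictionary atoms for a highly robust overcomplete sparse-representation dictionary. In the test phase, the sparse coefficient vector of the test utterance's i-vector over this dictionary is computed, and the target speaker is determined from the reconstruction error of the test i-vector under those coefficients. Simulation experiments show that the algorithm achieves good recognition performance.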
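The test-phase decision rule in this abstract, "pick the speaker whose atoms reconstruct the test i-vector best", can be sketched as follows. The sketch assumes the dictionary `D` already holds the compensated speaker i-vectors as unit-norm columns, with `labels` giving each column's speaker. The WCCN and PCA steps are not reproduced. Orthogonal matching pursuit (OMP) is used as the sparse solver by assumption, since the abstract does not name one.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: k-sparse x with y ~ D @ x (columns of D unit-norm)."""
    residual, support = y.copy(), []
    x = np.zeros(D.shape[1])
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:                 # residual already orthogonal to the chosen atoms
            break
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    if support:
        x[support] = coef
    return x

def src_identify(D, labels, y, k=5):
    """Return the speaker whose coefficients alone give the smallest reconstruction error."""
    x = omp(D, y, k)
    errors = {}
    for spk in np.unique(labels):
        x_spk = np.where(labels == spk, x, 0.0)    # keep only this speaker's coefficients
        errors[spk] = float(np.linalg.norm(y - D @ x_spk))
    return min(errors, key=errors.get), errors
```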

7.
Speaker verification (SV) using the i-vector concept has become the state of the art. In this technique, speakers are projected onto the total variability space and represented by vectors called i-vectors. During testing, the i-vectors of the test speech segment and the claimant are conditioned to compensate for session variability before scoring. An i-vector system can therefore be viewed as two processing blocks: the total variability space and a post-processing module. Several questions arise: (i) which part of the i-vector system plays the major role in speaker verification, the total variability space or the post-processing task; and (ii) is the post-processing module intrinsic to the total variability space? This paper aims to partially answer these questions by proposing several simpler speaker characterization systems for speaker verification, in which speakers are represented by speaker characterization vectors (SCVs). The SCVs are obtained by uniform segmentation of the speakers' Gaussian mixture model (GMM) and maximum likelihood linear regression (MLLR) super-vectors. We consider two adaptation approaches for the GMM super-vector: maximum a posteriori (MAP) and MLLR. As with i-vectors, SCVs are post-processed for session variability compensation during testing. The proposed system shows promising performance compared with the classical i-vector system. This indicates that the post-processing task plays a major role in i-vector based SV systems and is not intrinsic to the total variability space. All experimental results are reported on the NIST 2008 SRE core condition.
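The abstract says SCVs, like i-vectors, are post-processed for session variability before scoring, but it does not name the technique used here. One common choice in i-vector systems is within-class covariance normalization (WCCN), which also appears in the sparse-representation abstract above. The sketch below assumes WCCN followed by cosine scoring. `train_wccn` and `wccn_cosine` are hypothetical names.

```python
import numpy as np

def train_wccn(X, labels):
    """WCCN projection B with B @ B.T equal to the inverse average within-speaker covariance."""
    speakers = np.unique(labels)
    d = X.shape[1]
    W = np.zeros((d, d))
    for spk in speakers:
        Xs = X[labels == spk]
        Xc = Xs - Xs.mean(axis=0)
        W += Xc.T @ Xc / len(Xs)          # within-speaker covariance of this speaker
    W /= len(speakers)
    return np.linalg.cholesky(np.linalg.inv(W))   # lower-triangular B

def wccn_cosine(B, w1, w2):
    """Cosine score after projecting both vectors with B.T (equivalent to a W^{-1} inner product)."""
    a, b = B.T @ w1, B.T @ w2
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Because the projection only reweights directions, the same code applies unchanged to i-vectors or to SCVs. That interchangeability is the point the abstract makes about the post-processing module.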

8.
The performance of automatic speaker verification (ASV) systems degrades as the amount of speech used for enrollment and verification decreases. Combining multiple systems based on different features and classifiers considerably reduces the speaker verification error rate for short utterances. This work attempts to incorporate supplementary information during the system combination process, using the quality of the estimated model parameters as that information. We introduce a class of novel quality measures formulated using the zero-order sufficient statistics computed during i-vector extraction. We use the proposed quality measures as side information for combining ASV systems based on the Gaussian mixture model–universal background model (GMM–UBM) and on i-vectors. The proposed methods demonstrate considerable improvement in speaker recognition performance on NIST SRE corpora, especially in short-duration conditions. We also observe improvement over existing systems based on various duration-based quality measures.
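The quality measures here are built from the zero-order statistics N_c = Σ_t P(c | x_t) collected during i-vector extraction. The exact formulas are not given in the abstract. The sketch computes N_c for a diagonal-covariance UBM and then shows one *hypothetical* quality measure, the fraction of components that receive at least one soft frame, purely to illustrate the idea that short utterances leave many components unobserved.

```python
import numpy as np

def zero_order_stats(frames, weights, means, variances):
    """N_c = sum_t P(c | x_t) for a diagonal-covariance UBM.
    frames: (T, d); weights: (C,); means, variances: (C, d)."""
    diff = frames[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)           # responsibilities; each row sums to 1
    return post.sum(axis=0)

def coverage_quality(N, threshold=1.0):
    """Hypothetical quality measure: fraction of components with at least `threshold` soft frames."""
    return float(np.mean(N >= threshold))
```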

9.
Wang Kang, Dong Yuanfei. Journal of Computer Applications, 2019, 39(10): 2937-2941
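The conventional voiceprint recognition model that combines the identity vector (i-vector) with probabilistic linear discriminant analysis (PLDA) involves a cumbersome pipeline and generalizes weakly. To address this, an end-to-end model based on angular-margin embedding features is constructed. The model uses a purpose-designed deep convolutional neural network to extract deep speaker embeddings from the acoustic features of speech data. It adopts the angle-based A-Softmax as its loss function, so that in angular space the features it learns for different classes are always separated by an angular margin, while features of the same class cluster more tightly. Tests on the public VoxCeleb2 dataset show that, compared with i-vector plus PLDA, the model improves Top-1 and Top-5 speaker identification accuracy by 58.9% and 30%, respectively, and reduces the minimum detection cost and equal error rate in speaker verification by 47.9% and 45.3%, respectively. These results indicate that the end-to-end model is better suited to learning class-discriminative features on multi-channel, large-scale speech datasets.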
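A-Softmax (as introduced in SphereFace) normalizes the class weights, drops the bias, and replaces the target-class logit ‖x‖cos θ with ‖x‖ψ(θ), where ψ(θ) = (-1)^k cos(mθ) - 2k for θ in [kπ/m, (k+1)π/m]. The sketch below computes that loss for one embedding in NumPy. The CNN, batching and any annealing schedule used in practice are omitted, and the margin `m=4` is an assumed default.

```python
import numpy as np

def psi(theta, m):
    """Target-logit function psi(theta) = (-1)^k cos(m*theta) - 2k, monotonically decreasing on [0, pi]."""
    k = np.minimum(np.floor(m * theta / np.pi), m - 1)
    return (-1.0) ** k * np.cos(m * theta) - 2.0 * k

def a_softmax_loss(x, W, y, m=4):
    """A-Softmax loss for one embedding x (d,), class weights W (d, C), target class index y."""
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)   # unit-norm class weights, no bias
    x_norm = np.linalg.norm(x)
    cos = np.clip(Wn.T @ x / x_norm, -1.0, 1.0)
    logits = x_norm * cos
    logits[y] = x_norm * psi(np.arccos(cos[y]), m)      # angular margin on the target class only
    logits -= logits.max()                              # stable log-sum-exp
    return float(-logits[y] + np.log(np.exp(logits).sum()))
```

With `m=1`, ψ(θ) reduces to cos θ and the loss becomes ordinary softmax cross-entropy with normalized weights. Larger `m` lowers the target logit, so the same embedding is penalized more until it sits well inside its class's angular region.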

10.
This paper presents a simplified and supervised i-vector modeling approach with applications to robust and efficient language identification and speaker verification. First, by concatenating the label vector and the linear regression matrix at the end of the mean supervector and the i-vector factor loading matrix, respectively, the traditional i-vectors are extended to label-regularized supervised i-vectors. These supervised i-vectors are optimized not only to reconstruct the mean supervectors well but also to minimize the mean square error between the original and the reconstructed label vectors, which makes them more discriminative with respect to the label information. Second, factor analysis (FA) is performed on the pre-normalized, centered GMM first-order statistics supervector so that each Gaussian component's statistics sub-vector is treated equally in the FA. This reduces the computational cost by a factor of 25 in the simplified i-vector framework. Third, since the entire matrix inversion term in the simplified i-vector extraction depends on only a single variable (the total frame number), we build a global table of the resulting matrices indexed by the log of the frame number. Using this lookup table, each utterance's simplified i-vector extraction is sped up by a further factor of 4, at the cost of only a small quantization error. Finally, a simplified version of the supervised i-vector modeling is proposed to enhance both robustness and efficiency. The proposed methods are evaluated on the DARPA RATS dev2 task, the NIST LRE 2007 general task and the NIST SRE 2010 female condition 5 task, for noisy-channel language identification, clean-channel language identification and clean-channel speaker verification, respectively.
For language identification on DARPA RATS, the simplified supervised i-vector modeling achieved 2%, 16%, and 7% relative equal error rate (EER) reductions on three different feature sets and ran more than 100 times faster than the baseline i-vector method on the 120 s task. Similar results were observed on the NIST LRE 2007 30 s task, with a 7% relative reduction in average cost. Results also show that using Gammatone frequency cepstral coefficients, Mel-frequency cepstral coefficients and spectro-temporal Gabor features in conjunction with shifted-delta-cepstral features significantly improves overall language identification performance. For speaker verification, the proposed supervised i-vector approach outperforms the i-vector baseline by 12% and 7% relative in terms of EER and normalized old minDCF, respectively.
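The lookup-table trick relies on the inverse in the extraction depending only on the frame count n, so it can be precomputed on a grid of log n values. The sketch assumes the inverted matrix has the form (I + n·TᵀT), which depends on the utterance only through n. The paper's exact scaling may differ, and the grid size and maximum frame count are hypothetical parameters.

```python
import numpy as np

def build_inverse_table(T, max_frames=100000, num_bins=200):
    """Precompute (I + n T^T T)^{-1} on an evenly spaced grid of log frame counts."""
    R = T.shape[1]
    TtT = T.T @ T
    log_grid = np.linspace(0.0, np.log(max_frames), num_bins)
    table = np.stack([np.linalg.inv(np.eye(R) + np.exp(g) * TtT) for g in log_grid])
    return log_grid, table

def lookup_inverse(log_grid, table, n_frames):
    """Nearest-neighbour lookup in log(n); the quantization error is the price of the speed-up."""
    idx = int(np.argmin(np.abs(log_grid - np.log(n_frames))))
    return table[idx]
```

A log-spaced grid is used because the inverse changes quickly for short utterances and slowly for long ones, so equal steps in log n spread the quantization error more evenly than equal steps in n.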

11.
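The i-vector is an important feature reflecting acoustic differences between speakers and has proven effective in current speaker recognition and speaker verification. This work applies i-vectors to speaker-level acoustic feature normalization in speech recognition. I-vectors are extracted from the training data and clustered without supervision using the LBG algorithm. A maximum likelihood linear transform is then trained for each cluster, and speaker adaptive training is used to achieve speaker normalization. The transformed features are used for both training and recognition. Experiments show that the method improves speech recognition performance.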
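The unsupervised clustering step here uses the Linde-Buzo-Gray (LBG) algorithm: start from one centroid, split every codeword in two, refine with k-means, and repeat. The sketch below perturbs each codeword by ±eps times the per-dimension standard deviation rather than scaling by (1 ± eps). This variant is an assumption chosen because i-vectors are roughly zero-mean, so a multiplicative split of a zero centroid would produce two identical codewords.

```python
import numpy as np

def lbg(X, codebook_size, eps=0.01, iters=20):
    """LBG clustering with additive splitting. codebook_size is assumed to be a power of two
    (otherwise the final codebook overshoots to the next power of two)."""
    delta = eps * X.std(axis=0)
    codebook = X.mean(axis=0, keepdims=True)
    while len(codebook) < codebook_size:
        codebook = np.vstack([codebook + delta, codebook - delta])   # split every codeword
        for _ in range(iters):                                       # k-means refinement
            d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            assign = d2.argmin(axis=1)
            for k in range(len(codebook)):
                members = X[assign == k]
                if len(members):                                     # leave empty cells in place
                    codebook[k] = members.mean(axis=0)
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return codebook, d2.argmin(axis=1)
```

In the pipeline described, each cluster index would then select which maximum likelihood linear transform is trained and applied.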

12.
This paper explores the robustness of supervector-based speaker modeling approaches for speaker verification (SV) in noisy environments. Speaker modeling is carried out in two different frameworks: (i) the combined Gaussian mixture model-support vector machine (GMM-SVM) method and (ii) the total variability modeling method. In the GMM-SVM method, supervectors obtained by concatenating the means of adapted speaker GMMs are used to train speaker-specific SVMs during the training/enrollment phase of SV. During the evaluation/testing phase, noisy test utterances are transformed into supervectors and subjected to SVM-based pattern matching and classification. In the total variability modeling method, the large supervectors are reduced to low-dimensional, channel-robust vectors (i-vectors) prior to SVM training and subsequent evaluation. Special emphasis is laid on the significance of an utterance partitioning technique for mitigating data imbalance and utterance duration mismatch. An adaptive boosting algorithm is proposed in the total variability modeling framework to enhance the accuracy of the SVM classifiers. Experiments on the NIST-SRE-2003 database, with training and test utterances corrupted by additive noise, indicate that these modeling methods outperform the standard GMM-universal background model (GMM-UBM) framework for SV. Utterance partitioning and adaptive boosting are observed to yield substantial performance improvements under degraded conditions.
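Utterance partitioning addresses data imbalance by turning one enrollment utterance into several, each of which can be MAP-adapted into its own supervector and used as an extra positive example for the speaker's SVM. The abstract gives no details, so shuffling the frame order before splitting is an assumption here, as is the function name.

```python
import numpy as np

def partition_utterance(frames, num_parts, seed=0):
    """Shuffle frame order, then split the (T, d) frame matrix into num_parts
    nearly equal partitions; each would later become its own supervector."""
    rng = np.random.default_rng(seed)
    shuffled = frames[rng.permutation(len(frames))]
    return np.array_split(shuffled, num_parts)
```

Shuffling makes each partition a rough sample of the whole utterance instead of one contiguous stretch of it, which is one plausible way to keep partition supervectors comparable. Running it with several seeds would give more partitions still.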

13.
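A method is proposed that applies a regression model based on deep neural network (DNN) feature mapping to an identity vector (i-vector)/probabilistic linear discriminant analysis (PLDA) speaker recognition system. The DNN fits the nonlinear mapping from noisy-speech i-vectors to clean-speech i-vectors, producing an approximate representation of the clean-speech i-vector and thereby reducing the impact of noise on system performance. Experiments on the TIMIT dataset verify the feasibility and effectiveness of the method.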
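The core of this abstract is a regression network trained on pairs of (noisy i-vector, clean i-vector). The sketch uses a single tanh hidden layer trained by full-batch gradient descent on mean squared error. The network depth, width, learning rate and epoch count are all hypothetical, since the abstract gives no architecture.

```python
import numpy as np

def train_ivector_mapper(X_noisy, Y_clean, hidden=32, lr=0.01, epochs=300, seed=0):
    """One-hidden-layer tanh MLP mapping noisy i-vectors to clean ones.
    Loss = mean over samples of 0.5 * ||prediction - clean||^2. Returns (predict, history)."""
    rng = np.random.default_rng(seed)
    n, d_in = X_noisy.shape
    d_out = Y_clean.shape[1]
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d_in), (d_in, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, d_out))
    b2 = np.zeros(d_out)
    history = []
    for _ in range(epochs):
        H = np.tanh(X_noisy @ W1 + b1)
        err = H @ W2 + b2 - Y_clean
        history.append(0.5 * float(np.sum(err ** 2)) / n)
        g_out = err / n                              # dLoss / dOutput
        g_H = (g_out @ W2.T) * (1.0 - H ** 2)        # back-propagate through tanh (uses old W2)
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X_noisy.T @ g_H)
        b1 -= lr * g_H.sum(axis=0)

    def predict(X):
        """Estimate clean i-vectors for new noisy i-vectors."""
        return np.tanh(X @ W1 + b1) @ W2 + b2

    return predict, history
```

At test time, `predict` would be applied to the noisy i-vector before PLDA scoring, so the PLDA backend itself stays unchanged.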

14.
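In speaker verification systems based on total variability factors (i-vectors), speaker-related discriminative information must be further extracted from a speech segment's i-vector representation to improve system performance. Drawing on the idea of anchor models, this paper proposes a modeling method based on deep belief networks. The method analyzes and models the complex variability contained in i-vectors layer by layer, extracting speaker-related information through nonlinear transformations. On the NIST SRE 2008 core test (telephone training, telephone testing) condition, the equal error rates for male and female speakers are 4.96% and 6.18%, respectively. Fusion with a system based on linear discriminant analysis further reduces the equal error rates to 4.74% and 5.35%.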
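The anchor-model idea this work builds on represents an utterance by how similar it is to a fixed set of reference ("anchor") speakers rather than by its raw coordinates. The sketch shows that classic representation using cosine similarity to anchor i-vectors. The deep belief network that the paper stacks on top is not reproduced, and both the similarity function and the anchor set are assumptions.

```python
import numpy as np

def anchor_representation(ivector, anchors):
    """Map an i-vector to its cosine similarities against anchor-speaker i-vectors (rows of anchors)."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    w = ivector / np.linalg.norm(ivector)
    return a @ w
```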

15.
Systems based on the i-vector framework are among the most popular in speaker identification (SID). In such a system, session compensation is usually applied first, followed by the classifier. Every session-compensated representation of an i-vector yields a corresponding identification result, so the two stages are related. In current SID systems, however, session compensation and the classifier are usually optimized independently, and optimizing session compensation without knowledge of the identification task may introduce uncertainties. In this paper, we propose a bilevel framework that jointly optimizes session compensation and the classifier to strengthen the relationship between the two stages. In this framework, we use sparse coding (SC) to obtain the session-compensated feature by learning an overcomplete dictionary, and we employ a softmax classifier and a support vector machine (SVM), respectively, for classification. We also present a joint optimization of the dictionary and classifier parameters under a discriminative classification criterion subject to the SC conditions. The proposed methods are evaluated on the King-ASR-010, VoxCeleb and RSR2015 databases. Compared with typical session compensation techniques such as linear discriminant analysis (LDA) and nonparametric discriminant analysis (NDA), our methods can be more robust to complex session variability. Moreover, compared with the typical classifiers in the i-vector framework, namely cosine distance scoring (CDS) and probabilistic linear discriminant analysis (PLDA), our methods can be better suited to SID, which is a multiclass task.
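LDA is named as a typical session-compensation baseline that is optimized independently of the classifier. This is the decoupling the bilevel framework is meant to remove. The sketch below shows that baseline: projection directions that maximize between-speaker scatter relative to within-speaker scatter. The paper's joint dictionary/classifier optimization is not reproduced.

```python
import numpy as np

def train_lda(X, labels, out_dim):
    """LDA projection (d, out_dim) maximizing between- over within-speaker scatter."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for spk in np.unique(labels):
        Xs = X[labels == spk]
        ms = Xs.mean(axis=0)
        Sw += (Xs - ms).T @ (Xs - ms)                  # within-speaker scatter
        Sb += len(Xs) * np.outer(ms - mu, ms - mu)     # between-speaker scatter
    # generalized eigenproblem Sb v = lambda Sw v, solved as eig(Sw^{-1} Sb)
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1][:out_dim]
    return evecs[:, order].real
```

Note that `train_lda` never sees the classifier. Whatever scorer follows it (CDS, PLDA, SVM) has to live with the projection as learned.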

16.
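Starting from the background model used in speaker recognition, a speaker feature space is constructed from the model's individual Gaussian components, so that utterances of different lengths are mapped to vectors of the same size in this space. After normalization with a correlation matrix, a linear support vector machine performs speaker recognition. Drawing on several common feature normalization schemes and applying them to the mapped utterance vectors, four normalization methods are proposed: mean/variance normalization, weight normalization, WLOG normalization and spherical normalization. These are compared with the probabilistic sequence kernel. A transition probabilistic sequence kernel is then constructed from the proposed probabilistic sequence kernel by exploiting the transitions between adjacent feature vectors in the speech feature sequence. Experiments on the NIST 2001 corpus show that the recognition performance of the probabilistic sequence kernel model approaches that of the classical UBM-MAP model. Fusing the scores of these two models markedly improves recognition performance, and additionally fusing the transition probabilistic sequence kernel improves performance by a further 19.1%.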
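This abstract turns variable-length utterances into fixed-size vectors via the background model's Gaussian components and then normalizes them before a linear SVM. The exact mapping is not given. The sketch assumes each utterance is represented by the average posterior probability of each diagonal-covariance UBM component, and it shows only the first of the four normalizations (mean/variance, with statistics from a background set).

```python
import numpy as np

def map_utterance(frames, weights, means, variances):
    """Map a (T, d) utterance to a fixed C-dim vector: average posterior of each UBM component."""
    diff = frames[:, None, :] - means[None, :, :]
    log_p = np.log(weights) - 0.5 * (np.sum(diff ** 2 / variances, axis=2)
                                     + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_p -= log_p.max(axis=1, keepdims=True)        # numerical stability
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)          # per-frame posteriors
    return post.mean(axis=0)

def mean_variance_normalize(vectors, ref_mean, ref_std):
    """Mean/variance normalization using statistics of a background set of mapped vectors."""
    return (vectors - ref_mean) / np.maximum(ref_std, 1e-12)
```

Whatever the utterance length T, the output has one entry per Gaussian component, which is what lets a fixed-dimensional linear SVM be trained on it.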

17.
In this paper, speaker adaptive acoustic modeling is investigated using a novel method for speaker normalization and a well-known vocal tract length normalization method. With the novel normalization method, acoustic observations of training and testing speakers are mapped into a normalized acoustic space through speaker-specific transformations, with the aim of reducing inter-speaker acoustic variability. For each speaker, an affine transformation is estimated with the goal of reducing the mismatch between the speaker's acoustic data and a set of target hidden Markov models. This transformation is estimated through constrained maximum likelihood linear regression and then applied to map the speaker's acoustic observations into the normalized acoustic space. Recognition experiments used two corpora, the first consisting of adults' speech and the second of children's speech. Training and recognition with normalized data consistently reduced the word error rate relative to baseline systems trained on unnormalized data. In addition, the novel method always performed better than the reference vocal tract length normalization method adopted in this work. When unsupervised static speaker adaptation was applied in combination with each of the two speaker normalization methods, different behavior was observed on the two corpora: in one case performance became very similar, while in the other the difference remained significant.

18.
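In the NIST SRE 2012 evaluation and in practical applications, a speaker model can be enrolled with multiple speech samples from the speaker, and these samples come from a wide variety of channels. Building on PLDA, this paper tries several scoring methods and proposes a new score normalization technique. On the NIST SRE 2012 core test set, EER improves by 26.0% on average and MinCost by 12.4% on average.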
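The paper's new score normalization is not described in the abstract, so the sketch below shows only widely used building blocks that such work starts from. First, two simple ways to use several enrollment samples (averaging per-sample scores, or scoring against the averaged enrollment vector). Second, symmetric score normalization (S-norm) over cohort scores. `score_fn` stands in for a PLDA scorer and is a hypothetical parameter. This is not the paper's method.

```python
import numpy as np

def multi_enroll_score(score_fn, enroll_vecs, test_vec, mode="mean_score"):
    """Score a test vector against a multi-sample enrollment: average the per-sample
    scores ("mean_score") or score against the averaged enrollment vector (otherwise)."""
    if mode == "mean_score":
        return float(np.mean([score_fn(e, test_vec) for e in enroll_vecs]))
    return float(score_fn(np.mean(enroll_vecs, axis=0), test_vec))

def s_norm(raw, enroll_cohort, test_cohort):
    """Symmetric normalization: average of z-norm (enrollment model vs. cohort)
    and t-norm (test segment vs. cohort) applied to the same raw score."""
    z = (raw - np.mean(enroll_cohort)) / np.std(enroll_cohort)
    t = (raw - np.mean(test_cohort)) / np.std(test_cohort)
    return 0.5 * (z + t)
```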

19.
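In recent years, speaker recognition based on total variability factors has become the mainstream approach in the field, and probabilistic linear discriminant analysis (PLDA) has attracted wide attention for its excellent performance. However, when the PLDA model is estimated, the conventional factor analysis approach updates only the model subspace, so the model mean is not well coupled with the updated subspace. A joint estimation method is proposed that estimates the model mean and the model subspace simultaneously, yielding stricter expectation-maximization update formulas. On the NIST Speaker Recognition Evaluation 2010 extended test set and the 2012 core test set, the equal error rate improves to some extent.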
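The joint estimation idea is to treat the mean as one more column of the loading matrix, paired with a latent variable fixed to 1, so that a single M-step re-estimates [V, μ] together instead of updating V with μ held fixed. The sketch illustrates this on plain factor analysis (one latent vector per observation, diagonal noise). It omits PLDA's speaker-shared latent variable, so it demonstrates the joint update only and is not the paper's exact formulas.

```python
import numpy as np

def fa_loglik(X, V, mu, psi):
    """Log-likelihood of rows of X under x ~ N(mu, V V^T + diag(psi))."""
    n, d = X.shape
    C = V @ V.T + np.diag(psi)
    _, logdet = np.linalg.slogdet(C)
    Xc = X - mu
    quad = np.sum(Xc * np.linalg.solve(C, Xc.T).T)
    return -0.5 * (n * d * np.log(2 * np.pi) + n * logdet + quad)

def fa_em_joint(X, q, iters=60, seed=0):
    """EM for x = V y + mu + eps, re-estimating W = [V, mu] jointly against z = [y; 1]."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    V = 0.1 * rng.normal(size=(d, q))
    mu = np.zeros(d)                       # deliberately poor start; re-estimated jointly
    psi = X.var(axis=0) + 1e-6
    history = []
    for _ in range(iters):
        history.append(fa_loglik(X, V, mu, psi))
        # E-step: y | x ~ N(Minv V^T Psi^{-1} (x - mu), Minv)
        PsiinvV = V / psi[:, None]
        Minv = np.linalg.inv(np.eye(q) + V.T @ PsiinvV)
        Ey = (X - mu) @ PsiinvV @ Minv
        Ez = np.hstack([Ey, np.ones((n, 1))])          # augmented posterior means
        Szz = Ez.T @ Ez
        Szz[:q, :q] += n * Minv                        # add posterior covariance of y
        # M-step: joint update of [V, mu], then the diagonal noise
        XtEz = X.T @ Ez
        W = XtEz @ np.linalg.inv(Szz)
        V, mu = W[:, :q], W[:, q]
        psi = np.maximum(np.mean(X ** 2, axis=0) - np.sum(W * XtEz, axis=1) / n, 1e-6)
    return V, mu, psi, history
```

Because this is exact EM on the augmented model, the recorded log-likelihood should never decrease. That property is a useful sanity check when deriving the corresponding PLDA updates.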

20.
In this paper, we propose a sub-vector based speaker characterization method for biometric speaker verification, in which speakers are represented by uniform segments of their maximum likelihood linear regression (MLLR) super-vectors, called m-vectors. The MLLR transformation is estimated with respect to the universal background model (UBM) without any speech/phonetic information. We introduce two strategies for segmenting the MLLR super-vector: a disjoint window technique and an overlapped window technique. During the test phase, the m-vectors of the test utterance are scored against the claimant speaker. Before scoring, m-vectors are post-processed to compensate for session variability. In addition, we propose a clustering algorithm for multiple-class-wise MLLR transformation, in which the Gaussian components of the UBM are clustered into groups using expectation maximization (EM) and maximum likelihood (ML). In this case, an MLLR transformation is estimated for each class from the sufficient statistics accumulated over the Gaussian components belonging to that class, and these transformations are then used in the m-vector system. The proposed method requires aligning the data with the UBM only once for multiple MLLR transformations. First, we show that the proposed multi-class m-vector system achieves promising speaker verification performance compared with the conventional i-vector based system. Second, unlike the conventional K-means algorithm, the proposed EM-based clustering technique is robust to random initialization, and it yields performance better than or equal to the best obtained with K-means. Finally, we show that fusing the m-vector with the i-vector further improves speaker verification performance in both the score and the feature domain. Experimental results are reported on various tasks of the NIST 2008 speaker recognition evaluation (SRE) core condition.
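The two segmentation strategies (disjoint and overlapped windows) differ only in the shift between consecutive windows. The sketch below cuts a super-vector into m-vectors. It assumes the MLLR super-vector is a 1-D array, for example the row-wise flattening of the transform matrix or matrices, which the abstract does not specify. Dropping a trailing remainder shorter than one window is likewise an assumption.

```python
import numpy as np

def segment_supervector(sv, seg_len, shift=None):
    """Split a 1-D super-vector into m-vectors of length seg_len.
    shift == seg_len (the default) gives disjoint windows; shift < seg_len gives
    overlapping windows. A trailing remainder shorter than seg_len is dropped."""
    shift = seg_len if shift is None else shift
    starts = range(0, len(sv) - seg_len + 1, shift)
    return np.stack([sv[s:s + seg_len] for s in starts])
```

Overlapping windows produce more m-vectors from the same super-vector at the cost of redundancy between neighbours. Each m-vector is then post-processed and scored separately before the scores are combined.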

