Similar Documents
18 similar documents found (search time: 187 ms)
1.
毛鹏飞  刘加 《电声技术》2009,33(11):56-59
A high-performance, low-cost, low-power voiceprint verification system-on-chip (SoC) is implemented. The core algorithm is a speaker verification method based on Gaussian mixture model / universal background model (GMM-UBM) modeling, with Mel-frequency cepstral coefficients (MFCC) as the speaker features. The SoC not only performs voiceprint verification but also trains speaker models, so the number of enrolled speakers and their models can be updated in real time. The system achieves an average equal error rate (EER) of 0.0342.
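A minimal sketch of the GMM-UBM verification scoring described above, using librosa and scikit-learn (library choice, mixture size, file names, and the decision threshold are assumptions for illustration, not taken from the paper; the paper adapts the speaker model from the UBM, which is only approximated here):

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

background_wavs = ["bg_0.wav", "bg_1.wav"]       # hypothetical background data
enroll_wav, test_wav = "enroll.wav", "test.wav"  # hypothetical utterances
threshold = 0.0                                  # decision threshold, tuned on dev data

def mfcc_features(wav_path, sr=16000, n_mfcc=13):
    """Extract MFCC feature vectors, shape (frames, coefficients)."""
    y, sr = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

# Universal background model trained on pooled features from many speakers.
ubm = GaussianMixture(n_components=64, covariance_type="diag")
ubm.fit(np.vstack([mfcc_features(p) for p in background_wavs]))

# Speaker model: re-estimated from enrollment data, initialised at the UBM means
# (a simplification of the MAP adaptation normally used in GMM-UBM systems).
spk = GaussianMixture(n_components=64, covariance_type="diag",
                      means_init=ubm.means_)
spk.fit(mfcc_features(enroll_wav))

# Verification: average log-likelihood ratio against the threshold.
test = mfcc_features(test_wav)
llr = spk.score(test) - ubm.score(test)
accept = llr > threshold
```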

2.
To address the lack of prior knowledge in speaker segmentation and clustering (diarization), a feature-level fusion algorithm is proposed that exploits the complementarity between the information bottleneck (IB) criterion and the hidden Markov model (HMM) / Gaussian mixture model (GMM) approach. The algorithm applies a logarithmic transform and dimensionality reduction to the output of the IB-based method; speaker GMMs are then trained separately on the transformed features and on conventional Mel-frequency cepstral coefficient (MFCC) features, and the per-speaker scores are fused with a weighted combination in the score domain; HMM/GMM-based segmentation and clustering is finally performed on the fused scores. Experiments show that the fused features provide the system with more prior information and reduce the mismatch rate by 1.2% compared with the conventional method.
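The score-level fusion step might look like the following sketch (the weight, the normalisation, and the score-matrix shapes are assumptions; the paper only states that a weighted combination is used):

```python
import numpy as np

def fuse_scores(scores_ib, scores_mfcc, alpha=0.6):
    """Weighted score-level fusion of two per-cluster score matrices.

    scores_ib, scores_mfcc: arrays of shape (n_segments, n_speakers),
    e.g. GMM log-likelihoods from the IB-derived features and from MFCC.
    """
    # Normalise each stream so the weighted sum is on a comparable scale.
    a = (scores_ib - scores_ib.mean()) / (scores_ib.std() + 1e-8)
    b = (scores_mfcc - scores_mfcc.mean()) / (scores_mfcc.std() + 1e-8)
    fused = alpha * a + (1.0 - alpha) * b
    return fused.argmax(axis=1)   # speaker label per segment
```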

3.
In text-independent speaker recognition, support vector machine systems that use GMM mean supervectors as feature vectors have surpassed the conventional GMM-UBM system, but channel effects still remain in the mean supervector. This paper modifies the factor analysis algorithm to compensate for the channel variability of the mean supervector, achieving performance better than nuisance attribute projection; more importantly, the stability of the factor-analysis-based system is guaranteed. On the NIST 2006 speaker evaluation corpus, the proposed method achieves an equal error rate of 6.0%.
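A simplified sketch of building a mean supervector and projecting out a channel subspace (the scaling and the orthogonal-projection compensation are common conventions assumed here, not the paper's exact factor-analysis formulation):

```python
import numpy as np

def mean_supervector(adapted_means, ubm_weights, ubm_covars):
    """Stack MAP-adapted component means into one supervector.

    adapted_means: (C, D), ubm_weights: (C,), ubm_covars: (C, D) diagonal.
    The sqrt(weight)/sigma scaling is an assumed, commonly used normalisation.
    """
    scaled = np.sqrt(ubm_weights)[:, None] * adapted_means / np.sqrt(ubm_covars)
    return scaled.reshape(-1)

def remove_channel(supervector, U):
    """Project out the channel subspace U (columns = channel directions),
    a simplified stand-in for the factor-analysis compensation in the paper."""
    x = U.T @ supervector           # channel factor estimate (assumes U orthonormal)
    return supervector - U @ x      # compensated supervector
```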

4.
The design and implementation of a real-time GMM-based speaker recognition system is presented, supporting both real-time speaker identification and real-time speaker verification. Under laboratory conditions, different numbers of Gaussian mixture components and different sampling rates are tested, and the adaptation capability of the models is evaluated. Experiments show that the system achieves good recognition accuracy.

5.
For complex background environments captured by a fixed camera, this paper proposes a spatio-temporal adaptive Gaussian mixture background modeling method that overcomes a limitation of the classic Gaussian Mixture Model (GMM): treating each pixel independently and ignoring the spatial correlation between neighboring pixels. Each pixel is first learned over time with a Gaussian mixture model; the self-information of neighboring pixels is then used to re-cluster the background and foreground and correct erroneous decisions. Experimental results show that, compared with the classic GMM background algorithm, the proposed method produces more complete object detections and is more robust, with good prospects for application.
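The per-pixel temporal GMM stage can be illustrated with OpenCV's standard mixture-of-Gaussians background subtractor (a sketch only: the video path is hypothetical, and the paper's neighborhood-based second clustering is replaced here by a simple median filter):

```python
import cv2

cap = cv2.VideoCapture("surveillance.avi")   # hypothetical input video
bg = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                        detectShadows=True)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = bg.apply(frame)                # per-pixel temporal GMM decision
    fg_mask = cv2.medianBlur(fg_mask, 5)     # crude spatial clean-up stand-in
cap.release()
```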

6.
In text-independent speaker recognition with Gaussian mixture models (GMM), the covariance matrices are usually taken to be diagonal to limit the number of model parameters and the computational cost, which assumes that the dimensions of the observation vectors are uncorrelated. This assumption rarely holds in practice. To make the observation space better suited to fitting with diagonal-covariance GMMs, decorrelating transforms are usually applied in either the parameter space or the model space. This paper proposes an improved PCA method for model-space decorrelation: principal component analysis is applied directly to the covariance of each Gaussian component of the GMM, so that the parameter space better matches a mixture of Gaussians with diagonal covariances, and a shared PCA transform matrix is used to reduce the number of parameters and the computation. Speaker recognition experiments on the Microsoft speech corpus show a relative 35% reduction in error rate over the best result of a conventional diagonal-covariance GMM system.
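A rough sketch of a shared decorrelating rotation estimated from the component covariances (pooling by simple averaging is an assumption; the paper's shared-PCA estimation may differ):

```python
import numpy as np

def shared_pca_transform(component_covars):
    """Estimate one rotation shared by all GMM components.

    component_covars: (n_components, dim, dim) full covariance matrices.
    """
    avg_cov = component_covars.mean(axis=0)      # pool the component covariances
    eigvals, eigvecs = np.linalg.eigh(avg_cov)   # principal axes (ascending order)
    return eigvecs[:, ::-1]                      # columns sorted by variance

def decorrelate(features, rotation):
    """Rotate features so a diagonal-covariance GMM fits them better."""
    return features @ rotation
```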

7.
The key to speaker recognition is building, for each speaker in the set, an acoustic model that captures that speaker's individual characteristics; the modeling approach strongly affects system performance. Building on the dominant model for text-independent speaker recognition, the Gaussian Mixture Model (GMM), this paper analyzes the acoustic differences between male and female speech and, with the aim of increasing inter-speaker discrimination, introduces a competitive strategy together with a Universal Background Model (UBM) to form a discriminative GMM modeling method. This overcomes both the traditional GMM's need for large amounts of training data and the UBM's weakness of forcing all speakers into a single shared distribution. Comparative experiments show that the discriminative GMM improves the recognition rate over the traditional GMM.

8.
A Gaussian Mixture Background Modeling and Object Detection Algorithm with Adaptive Adjustment of K and r
For complex scenes with non-stationary backgrounds, this paper proposes a Gaussian mixture background modeling and object detection algorithm that adaptively adjusts K and r. The method uses a Gaussian mixture model (GMM) to learn the temporal distribution of each pixel and adaptively adjusts the number of Gaussian components K, adding, deleting, or merging the components that describe a pixel as circumstances require. On this basis, two new parameters are introduced into the model update equations so that the value of r can be adjusted adaptively, letting background modeling and object detection track pixel changes accurately in real time; this reduces the loss of moving-object information and improves the robustness and convergence of the algorithm. Experiments show that, on video sequences with many sources of uncertainty, the algorithm responds quickly to changes in the scene and achieves adaptive background modeling and accurate object detection.
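A heavily simplified per-pixel update with a variable number of components (scalar grayscale pixel, fixed learning rate instead of the paper's adaptive r, and simple pruning instead of merging; all thresholds are illustrative assumptions):

```python
import numpy as np

def update_pixel_model(x, means, varis, weights, lr=0.01,
                       match_thresh=2.5, max_k=5):
    """One adaptive-K mixture update for a single pixel value x."""
    d = np.abs(x - means) / np.sqrt(varis)
    matched = np.where(d < match_thresh)[0]
    if matched.size:                               # update the best-matching component
        k = matched[np.argmin(d[matched])]
        weights *= (1 - lr); weights[k] += lr
        means[k] += lr * (x - means[k])
        varis[k] += lr * ((x - means[k]) ** 2 - varis[k])
    elif len(means) < max_k:                       # grow: add a new component at x
        means = np.append(means, x)
        varis = np.append(varis, varis.mean())
        weights = np.append(weights * (1 - lr), lr)
    keep = weights > 1e-3                          # shrink: drop negligible components
    means, varis, weights = means[keep], varis[keep], weights[keep]
    return means, varis, weights / weights.sum()
```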

9.
陈存宝  赵力 《信号处理》2010,26(4):563-568
This paper proposes a speaker verification method that embeds a time-delay neural network (TDNN) into the GMM-UBM framework, combining the respective strengths of the discriminative TDNN and the generative Gaussian mixture model. The TDNN exploits the temporal structure of the feature vector sequence and passes this temporal information on to the GMM; its transformation also makes the maximum-likelihood (ML) criterion, which assumes independent variables, more reasonable. Maximum likelihood is used as the training criterion, and the GMM and the neural network are trained jointly as a whole, with their parameters updated alternately during training. Experimental results show that the proposed method combined with T-norm reduces the equal error rate (EER) by 28% relative to the baseline system.

10.
The conventional continuously adaptive mean-shift (CAMShift) tracker tends to absorb a large amount of background color information when the target color model is built, which degrades tracking; this paper proposes an improved algorithm. A Gaussian mixture model (GMM) background method splits the original image into a superposition of foreground and background; hue histograms are built over the moving-object region in both the original image and the background image, and the background hue histogram is used to weight the corresponding hue bins of the original image, suppressing hues in the original image that match the background and enlarging the color difference between foreground and background. By suppressing the background hues in the original color model, the saliency of the target color model is increased and tracking accuracy and stability are improved; the maximum center error of target localization is below 20%, and the target is tracked accurately without loss.
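A sketch of the hue-histogram suppression step with OpenCV (the particular weighting formula is an assumption; the paper's weights may be computed differently):

```python
import cv2
import numpy as np

def suppressed_hue_hist(frame_bgr, bg_bgr, roi):
    """Down-weight hue bins that are also strong in the background image."""
    x, y, w, h = roi
    hsv_obj = cv2.cvtColor(frame_bgr[y:y+h, x:x+w], cv2.COLOR_BGR2HSV)
    hsv_bg = cv2.cvtColor(bg_bgr[y:y+h, x:x+w], cv2.COLOR_BGR2HSV)
    h_obj = cv2.calcHist([hsv_obj], [0], None, [180], [0, 180]).ravel()
    h_bg = cv2.calcHist([hsv_bg], [0], None, [180], [0, 180]).ravel()
    weight = 1.0 - h_bg / (h_bg.max() + 1e-8)  # strong background hues -> small weight
    hist = h_obj * weight
    return cv2.normalize(hist, None, 0, 255, cv2.NORM_MINMAX)
```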

11.
Most speaker verification systems use a background model to describe impostor characteristics. This paper proposes a new background model for speaker verification: a single global universal background model (UBM) is shared by all speakers, and in addition a cohort model and a c-cohort model are built for each speaker. When the global background model cannot make a confident decision, the cohort or c-cohort model is invoked to make a second decision. The model fully exploits the characteristics of both close-speaker (cohort) and distant-speaker (c-cohort) models. Experiments show that the new background model clearly improves system performance.
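The two-stage decision logic could be sketched as follows (the margin, threshold, and the way the cohort scores are combined are assumptions; the models are assumed to expose a scikit-learn-style `.score()` average log-likelihood):

```python
import numpy as np

def verify(test_feats, spk, ubm, cohort, c_cohort,
           ubm_margin=0.5, threshold=0.0):
    """UBM decision when it is confident; otherwise fall back to cohorts."""
    llr_ubm = spk.score(test_feats) - ubm.score(test_feats)
    if abs(llr_ubm - threshold) > ubm_margin:
        return llr_ubm > threshold               # global UBM decision is decisive
    # Borderline case: re-score against the speaker-specific cohort models.
    competitor_ll = np.mean([m.score(test_feats) for m in cohort + c_cohort])
    return (spk.score(test_feats) - competitor_ll) > threshold
```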

12.
张涛涛  陈丽萍  戴礼荣 《信号处理》2016,32(10):1213-1219
In speaker verification, acoustic factor analysis (AFA) uses the mixtures of probabilistic principal component analyzers (MPPCA) algorithm to reduce the feature dimensionality separately on each Gaussian of the universal background model (UBM), removing interference from text, channel, and noise in the speech features and yielding enhanced speaker information that improves verification performance. However, the UBM is an unsupervised clustering method: its Gaussian components have no clear physical meaning and cannot distinguish different speakers producing different phonemes. To address this, the paper replaces the traditional UBM with a deep neural network (DNN) acoustic model from speech recognition, combines it with acoustic factor analysis to reduce dimensionality and extract speaker information separately for each phoneme, and then extracts DNN i-vectors for speaker verification. Experiments on Part III of the RSR2015 database show that, compared with the UBM-based acoustic factor analysis method, the equal error rate (EER) drops by 13.49% and 22.43% on the male and female test sets, respectively.
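In DNN i-vector systems the UBM occupancies are replaced by DNN senone posteriors when collecting Baum-Welch statistics; a minimal sketch of that step (the i-vector extractor and the AFA dimensionality reduction themselves are not shown):

```python
import numpy as np

def baum_welch_stats(feats, posteriors):
    """Zero- and first-order statistics with DNN senone posteriors.

    feats:      (n_frames, dim) acoustic features
    posteriors: (n_frames, n_senones) frame-level DNN outputs
    """
    N = posteriors.sum(axis=0)     # zero-order stats, one per senone
    F = posteriors.T @ feats       # first-order stats, (n_senones, dim)
    return N, F
```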

13.
Speaker adaptive test normalization (ATnorm) is the most effective of the widely used score normalization approaches in text-independent speaker verification; it selects speaker-adaptive impostor cohorts with an extra development corpus in order to enhance recognition performance. In this paper, an improved implementation of ATnorm is presented that offers significant overall advantages over the original ATnorm. The method adopts a novel cross-similarity measurement for speaker-adaptive cohort model selection that requires no extra development corpus; it achieves performance comparable to the original ATnorm while moderately reducing computational complexity. By making full use of the saved extra development corpus, overall system performance can be improved significantly. Results are reported on the NIST 2006 Speaker Recognition Evaluation corpora, where the method provides significant improvements in system performance, with relative gains of 14.4% in equal error rate (EER) and 14.6% in decision cost function (DCF) overall.
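A rough sketch of adaptive-cohort T-norm scoring (selecting the cohort by how competitively each impostor model scores the test utterance is only a stand-in for the paper's cross-similarity measurement; cohort size and the epsilon guard are assumptions):

```python
import numpy as np

def atnorm_score(raw_score, test_feats, impostor_models, top_n=20):
    """Normalise a raw verification score by an adaptively selected cohort."""
    imp_scores = np.array([m.score(test_feats) for m in impostor_models])
    cohort = np.sort(imp_scores)[-top_n:]          # most competitive impostors
    return (raw_score - cohort.mean()) / (cohort.std() + 1e-8)
```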

14.
朱唯鑫  郭武 《信号处理》2016,32(7):859-865
This paper is the first to propose a length-normalized maximum a posteriori (MAP) estimation method and applies it to the cross likelihood ratio (CLR) and T-Test distance measures used in speaker diarization. Conventional MAP computes sufficient statistics against a universal background model (UBM) and then adaptively shifts the model parameters, so the amount of shift is positively correlated with the length of the speech segment. When measuring the similarity of two segments of different lengths, conventional MAP therefore characterizes the speaker models inaccurately and degrades the distance measure. In the proposed method, the relevance factor is normalized according to the speech length before the model parameters are adapted, making the model parameters independent of segment length and better reflecting the speaker's identity. On a diarization evaluation task over Chinese multi-speaker TV talk-show data, length-normalized MAP clearly outperforms the conventional method, with the diarization error rate dropping by 35% relative under the CLR measure and by 107% relative under the T-Test measure.
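A sketch of MAP mean adaptation with a length-scaled relevance factor (the specific scaling rule and constants are assumptions, not the paper's formula):

```python
import numpy as np

def map_adapt_means(ubm_means, N, F, duration_frames,
                    base_tau=16.0, ref_frames=1000.0):
    """MAP adaptation of UBM means with a length-regularised relevance factor.

    N: (n_components,) zero-order stats; F: (n_components, dim) first-order stats.
    Scaling tau with segment length keeps the adaptation strength comparable
    for segments of different duration.
    """
    tau = base_tau * duration_frames / ref_frames   # length-regularised relevance
    alpha = (N / (N + tau))[:, None]                # per-component adaptation weight
    ml_means = F / np.maximum(N, 1e-8)[:, None]
    return alpha * ml_means + (1.0 - alpha) * ubm_means
```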

15.
In this letter, we introduce confusion-based confidence measures for detecting an impostor in speaker recognition, which do not require an alternative hypothesis. Most traditional speaker verification methods are based on a hypothesis test, and their performance depends on the robustness of the alternative hypothesis. Compared with the conventional Gaussian mixture model–universal background model (GMM-UBM) scheme, our confusion-based measures show better performance on noise-corrupted speech. The additional computational requirements of our methods are negligible when used to detect or reject impostors.

16.
This paper presents a generalized i-vector representation framework with phonetic tokenization and tandem features for text-independent as well as text-dependent speaker verification. In the conventional i-vector framework, the tokens for calculating the zero-order and first-order Baum-Welch statistics are Gaussian Mixture Model (GMM) components trained from acoustic-level MFCC features. Yet besides MFCC, we believe that phonetic information offers another direction that can benefit system performance. Our contribution in this paper lies in integrating phonetic information into the i-vector representation through several extensions, forming a more generalized i-vector framework. First, the tokens for calculating the zero-order statistics are extended from the MFCC-trained GMM components to phonetic phonemes, trigrams, and tandem-feature-trained GMM components, using phoneme posterior probabilities. Second, given the zero-order statistics (posterior probabilities on tokens), the feature used to calculate the first-order statistics is also extended from MFCC to tandem features, and need not be the same feature employed by the tokenizer. Third, the zero-order and first-order statistics vectors are concatenated and represented by the simplified supervised i-vector approach followed by the standard Probabilistic Linear Discriminant Analysis (PLDA) back-end. We study different token and feature combinations, and we show that feature-level fusion of acoustic-level MFCC features and phonetic-level tandem features with GMM-based i-vector representation achieves the best performance for text-independent speaker verification. Furthermore, we demonstrate that the phonetic-level phoneme constraints introduced by the tandem features help the text-dependent speaker verification system to reject wrong-password trials and improve performance dramatically. Experimental results are reported on the NIST SRE 2010 common condition 5 female task and the RSR 2015 part 1 female task for text-independent and text-dependent speaker verification, respectively. For the text-independent task, the proposed generalized i-vector representation outperforms the i-vector baseline by 53% relative in terms of equal error rate (EER) and norm minDCF values. For the text-dependent task, our proposed approach also reduces the EER significantly, by 23% to 90% relative, for different types of trials.

17.
This study presents investigations into the effectiveness of state-of-the-art speaker verification techniques (i.e. GMM-UBM and GMM-SVM) in mismatched noise conditions. Based on experiments using white and real-world noise, it is shown that the verification performance offered by these methods is severely affected when the level of degradation in the test material differs from that in the training utterances. To address this problem, a modified realisation of the parallel model combination (PMC) method is introduced and a new form of test normalisation (T-norm), termed condition-adjusted T-norm, is proposed. It is experimentally demonstrated that the use of these techniques with GMM-UBM can significantly enhance accuracy in mismatched noise conditions. Based on the experimental results, it is observed that the resultant relative improvement achieved for GMM-UBM (under the most severe mismatch condition considered) is in excess of 70%. Additionally, it is shown that the improvement in verification accuracy achieved in this way is higher than that obtainable with the direct use of PMC with GMM-UBM. Moreover, it is found that while the accuracy of GMM-SVM can also benefit considerably from these techniques, the extensive computational cost involved severely limits the use of such a combined approach in practice.

18.
For text-independent speaker verification systems based on Gaussian mixture models (GMM) that use a single global threshold, differences in the output score distributions of the different speaker models degrade verification performance, so the output scores must be normalized. This paper proposes a new score normalization method, overall normalization, which accounts for the differences between test utterances and between speaker models at the same time and adjusts the scores in the score domain so that the output scores of all utterances follow a similar distribution, preserving the overall classification capability of the system. Experiments on the NIST'03 telephone speech corpus show that the system with overall normalization clearly outperforms conventional score normalization methods.
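One possible reading of joint normalization over both test utterances and speaker models is sketched below (this sequential per-test then per-model standardization is an assumption for illustration, not the paper's exact formulation):

```python
import numpy as np

def overall_normalize(score_matrix):
    """Jointly normalise a score matrix (rows = test segments, cols = speaker
    models) so all scores share a comparable distribution before thresholding."""
    z = (score_matrix - score_matrix.mean(axis=1, keepdims=True)) \
        / (score_matrix.std(axis=1, keepdims=True) + 1e-8)        # per-test (T-norm-like)
    z = (z - z.mean(axis=0, keepdims=True)) \
        / (z.std(axis=0, keepdims=True) + 1e-8)                   # per-model (Z-norm-like)
    return z
```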
