Factor Analysis and Space Assembling in Speaker Recognition


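Cite this article:	GUO Wu, LI Yi-Jie, DAI Li-Rong, WANG Ren-Hua. Factor Analysis and Space Assembling in Speaker Recognition[J]. Acta Automatica Sinica, 2009, 35(9): 1193-1198.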

Authors:	GUO Wu, LI Yi-Jie, DAI Li-Rong, WANG Ren-Hua

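Affiliation:	1. Ministry of Education-Microsoft Key Laboratory of Multimedia Computing and Communication, University of Science and Technology of China, Hefei 230027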

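Funding:	National Natural Science Foundation of China (60970161); Research Fund of the Ministry of Education-Microsoft Key Laboratory of Multimedia Computing and Communication (07122803)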

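Abstract (translated from the Chinese):	Joint factor analysis can effectively model speaker and channel variability in Gaussian mixture models and is widely used in speaker recognition. In general, when the algorithm estimates the speaker and channel loading matrices jointly, the speaker residual matrix has no effect and the number of factors in the channel loading matrix cannot be increased. This paper proposes training the speaker loading matrix and the speaker residual loading matrix serially, and using matrix concatenation when training the channel loading matrix, which effectively improves recognition accuracy. On the five parts of the NIST SRE 2008 core test database, the equal error rates are 3.3%, 5.1%, 5.0%, 5.3%, and 5.0%, respectively.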
Keywords:	Speaker recognition; joint factor analysis; eigenvoice factors; speaker verification; expectation maximization
Received:	2008-6-11
Revised:	2009-1-3
Factor Analysis and Space Assembling in Speaker Recognition

GUO Wu, LI Yi-Jie, DAI Li-Rong, WANG Ren-Hua (Ministry of Education-Microsoft Key Laboratory of Multimedia Computing and Communication, University of Science and Technology of China, Hefei). Factor Analysis and Space Assembling in Speaker Recognition[J]. Acta Automatica Sinica, 2009, 35(9): 1193-1198.

Authors:	GUO Wu, LI Yi-Jie, DAI Li-Rong, WANG Ren-Hua

Affiliation:	1.Ministry of Education-Microsoft Key Laboratory of Multimedia Computing and Communication, University of Science and Technology of China, Hefei 230027

Abstract:	Factor analysis models speaker and session variability in Gaussian mixture models and is widely used in text-independent speaker recognition. Two issues arise when the eigenvoice and eigenchannel loading matrices are estimated jointly. First, the speaker diagonal (residual) matrix has no effect; second, the number of channel factors cannot be made very large. In this paper, the eigenvoice loading matrix and the diagonal matrix are estimated serially, and different eigenchannel matrices are assembled to form one large channel loading matrix. The proposed algorithm improves performance: on the NIST speaker recognition evaluation (SRE) 2008 core test corpus, the equal error rates (EERs) of the five subsets were 3.3%, 5.1%, 5.0%, 5.3%, and 5.0%.

Keywords:	Speaker recognition; joint factor analysis; eigenvoice; speaker verification; expectation maximization
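
The abstract gives no formulas, so the following is only a minimal sketch of what "assembling" eigenchannel matrices could look like, not the authors' implementation. It assumes the commonly used joint factor analysis decomposition of a speaker- and session-dependent supervector, M = m + Vy + Ux + Dz, and it assumes that assembling means column-wise concatenation of separately trained eigenchannel matrices. The function names, dimensions, and random values are hypothetical and chosen purely for illustration.

```python
import numpy as np


def assemble_channel_matrices(matrices):
    """Concatenate eigenchannel loading matrices column-wise (assumed meaning of 'assembling')."""
    rows = {u.shape[0] for u in matrices}
    if len(rows) != 1:
        raise ValueError("all loading matrices must share the supervector dimension")
    return np.hstack(matrices)


def jfa_supervector(m, V, y, U, x, D_diag, z):
    """Evaluate M = m + V y + U x + D z, with the diagonal matrix D stored as a vector."""
    return m + V @ y + U @ x + D_diag * z


# Toy dimensions (hypothetical): a 12-dimensional supervector,
# 4 speaker factors, and two eigenchannel matrices with 3 and 2 factors.
rng = np.random.default_rng(0)
sv_dim = 12
U_a = rng.standard_normal((sv_dim, 3))  # e.g. trained on one channel condition
U_b = rng.standard_normal((sv_dim, 2))  # e.g. trained on another condition
U = assemble_channel_matrices([U_a, U_b])  # 3 + 2 = 5 channel factors

V = rng.standard_normal((sv_dim, 4))    # eigenvoice loading matrix
D_diag = rng.random(sv_dim)             # diagonal of the residual matrix
m = rng.standard_normal(sv_dim)         # speaker-independent mean supervector

y = rng.standard_normal(V.shape[1])     # speaker factors
x = rng.standard_normal(U.shape[1])     # channel factors
z = rng.standard_normal(sv_dim)         # residual factors

M = jfa_supervector(m, V, y, U, x, D_diag, z)
```

Under this assumption, the assembled matrix simply contributes the sum of the individual channel subspaces: `U @ x` equals `U_a @ x[:3] + U_b @ x[3:]`, so the total number of channel factors grows without re-estimating one large matrix jointly.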
This article is indexed in CNKI and other databases.