Vocal Tract Spectrum Conversion Using a Two-factor Gaussian Process Dynamic Model


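Cite this article:	SUN Xin-Jian, ZHANG Xiong-Wei, YANG Ji-Bin, CAO Tie-Yong, ZHONG Xin-Yi. Vocal tract spectrum conversion using a two-factor Gaussian process dynamic model[J]. Acta Automatica Sinica, 2014, 40(6): 1198-1207.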

Authors:	SUN Xin-Jian, ZHANG Xiong-Wei, YANG Ji-Bin, CAO Tie-Yong, ZHONG Xin-Yi

Affiliation:	1. College of Communication Engineering, PLA University of Science and Technology, Nanjing 210007;

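Funding:	Supported by the National Natural Science Foundation of China (61072042), the Natural Science Foundation of Jiangsu Province (BK2012510), and the Pre-research Foundation of PLA University of Science and Technology (20110205, 20110211)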

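Abstract:	The two-factor Gaussian process latent variable model (TF-GPLVM) previously proposed by the authors has two problems when applied to voice conversion: it ignores the dynamic characteristics of speech, and it requires many parameters to be estimated during training. To address these problems, a hidden Markov model (HMM) is introduced to model the speech dynamics, and the HMM hidden states are used to perform a probabilistic soft classification of each speech frame with respect to its phonetic content. This yields a two-factor Gaussian process dynamic model (TF-GPDM) with higher separation accuracy and lower computational load. Based on this model, a new vocal tract spectrum conversion scheme based on speaker characteristics replacement is designed. Subjective and objective experimental results show that, compared with both the traditional statistical mapping and frequency warping conversion methods and the TF-GPLVM method, the proposed method improves speech quality and conversion similarity and achieves a better balance between the two.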
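Keywords:	vocal tract spectrum conversion; Gaussian process latent variable model; two-factor model; hidden Markov model; speech dynamic characteristics
Received:	2012-12-12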
Vocal Tract Spectrum Conversion Using a Two-factor Gaussian Process Dynamic Model

SUN Xin-Jian, ZHANG Xiong-Wei, YANG Ji-Bin, CAO Tie-Yong, ZHONG Xin-Yi. Vocal Tract Spectrum Conversion Using a Two-factor Gaussian Process Dynamic Model[J]. Acta Automatica Sinica, 2014, 40(6): 1198-1207.

Authors:	SUN Xin-Jian, ZHANG Xiong-Wei, YANG Ji-Bin, CAO Tie-Yong, ZHONG Xin-Yi

Affiliation:	1. College of Communication Engineering, PLA University of Science and Technology, Nanjing 210007; 2. College of Command Information Systems, PLA University of Science and Technology, Nanjing 210007

Abstract:	In previous work, we developed a two-factor Gaussian process latent variable model (TF-GPLVM) that performs spectral conversion by replacing speaker characteristics. Although it outperforms traditional mapping-based methods, the model has two drawbacks: 1) it cannot capture the dynamic characteristics of speech, and 2) it has a large number of parameters to estimate. To overcome these drawbacks, this paper combines TF-GPLVM with a hidden Markov model (HMM) to develop an enhanced two-factor Gaussian process dynamic model (TF-GPDM). In this model, speech dynamics are modeled by the state transition probabilities of the HMM, while speech frames are categorized into a limited number of phonetic content classes using the HMM states. Both subjective and objective evaluations show that, compared with traditional mapping-based methods such as the Gaussian mixture model (GMM) and frequency warping (FW), as well as with the TF-GPLVM-based method, the proposed TF-GPDM not only improves speech quality and identity similarity but also reaches a better compromise between the two.

Keywords:	Vocal tract spectrum conversion; Gaussian process latent variable model (GPLVM); two-factor model; hidden Markov model (HMM); speech dynamical characteristics
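
Illustration:	The abstract describes classifying speech frames into phonetic content classes using HMM states, with dynamics carried by the state transition probabilities. A minimal sketch of that general idea, not the authors' implementation, is the forward-backward computation of per-frame state posteriors shown below. The function name `state_posteriors`, the toy one-dimensional Gaussian emissions, and all numeric values are assumptions chosen only for illustration.

```python
import numpy as np


def state_posteriors(log_b, A, pi):
    """Scaled forward-backward pass for a discrete-state HMM.

    log_b: (T, K) log-likelihood of each frame under each state's emission model
    A:     (K, K) transition matrix, A[i, j] = P(state j at t+1 | state i at t)
    pi:    (K,)   initial state distribution
    Returns gamma of shape (T, K), where gamma[t, k] is the posterior
    probability that frame t belongs to state k given all frames.
    """
    T, K = log_b.shape
    # Subtract the per-frame maximum before exponentiating for stability;
    # the per-frame constant cancels when gamma is normalized.
    b = np.exp(log_b - log_b.max(axis=1, keepdims=True))

    # Forward pass, renormalized at every frame.
    alpha = np.zeros((T, K))
    alpha[0] = pi * b[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * b[t]
        alpha[t] /= alpha[t].sum()

    # Backward pass, renormalized at every frame.
    beta = np.ones((T, K))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (b[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()

    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)


# Toy example (assumed values): two states with unit-variance Gaussian
# emissions centred at -2 and +2, and "sticky" transitions.
frames = np.array([-2.1, -1.9, 0.0, 2.0, 2.2])
means = np.array([-2.0, 2.0])
log_b = -0.5 * (frames[:, None] - means[None, :]) ** 2
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])
pi = np.array([0.5, 0.5])

gamma = state_posteriors(log_b, A, pi)  # soft class membership per frame
```

Each row of `gamma` is a probability vector over states, so a frame can belong partly to several classes rather than being hard-assigned to one. How TF-GPDM uses such weights inside the two-factor model is specified in the full paper, not here.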
This article is indexed in CNKI and other databases.