Voice Conversion Based on Convolutive Nonnegative Matrix Factorization
Cite this article: Sun Jian, Zhang Xiongwei, Cao Tieyong, Sun Xinjian. Voice Conversion Based on Convolutive Nonnegative Matrix Factorization[J]. Journal of Data Acquisition & Processing (数据采集与处理), 2013, 28(2): 141.
Authors: Sun Jian, Zhang Xiongwei, Cao Tieyong, Sun Xinjian
Affiliation: 1. Institute of Communication Engineering, PLA University of Science and Technology, Nanjing 210007
2. Institute of Command Automation, PLA University of Science and Technology, Nanjing 210007
Funding: National Natural Science Foundation of China; Pre-research Foundation of PLA University of Science and Technology
Received: 2011/11/17
Revised: 2012/2/17

Abstract: To take full account of inter-frame correlation in voice conversion, a method based on convolutive nonnegative matrix factorization (CNMF) is proposed. The time-frequency bases obtained by CNMF preserve both the speaker-specific characteristics of the speech signal and its inter-frame correlation. Exploiting this property, the training phase uses CNMF to extract matched time-frequency bases for the source and target speakers from the training data. In the conversion phase, the source speaker's speech is converted by time-frequency basis substitution. Compared with traditional methods, the proposed method better preserves and converts the inter-frame correlation of speech. Simulation experiments with subjective and objective evaluations show that the proposed method yields better converted-speech quality and higher similarity to the target speech than methods based on the Gaussian mixture model (GMM) and the state-space model (SSM).
Keywords: voice conversion; convolutive nonnegative matrix factorization; time-frequency bases
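
The training and conversion phases described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the cost function, the update rules, or how source and target bases are matched. The sketch assumes magnitude spectrograms of time-aligned parallel utterances, a Euclidean cost with multiplicative updates, and matching of bases through shared activations. The names `shift`, `reconstruct`, `cnmf`, `train_bases`, and `convert` are hypothetical.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero


def shift(X, t):
    """Shift columns of X by t frames (right if t > 0, left if t < 0), zero-filled."""
    out = np.zeros_like(X)
    if t == 0:
        out[:] = X
    elif t > 0:
        out[:, t:] = X[:, :-t]
    else:
        out[:, :t] = X[:, -t:]
    return out


def reconstruct(W, H):
    """CNMF model: V_hat = sum_t W[t] @ shift(H, t), with W of shape (T, F, R)."""
    return sum(W[t] @ shift(H, t) for t in range(W.shape[0]))


def cnmf(V, R, T, n_iter=100, W=None, H=None,
         update_W=True, update_H=True, seed=0):
    """Fit V (F x N, nonnegative) with R time-frequency bases spanning T frames.

    Assumption: Euclidean cost with multiplicative updates. Passing W or H
    together with update_W=False or update_H=False keeps that factor fixed.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((T, F, R)) + EPS if W is None else W.copy()
    H = rng.random((R, N)) + EPS if H is None else H.copy()
    for _ in range(n_iter):
        if update_W:
            for t in range(T):
                V_hat = reconstruct(W, H)
                Ht = shift(H, t)
                W[t] *= (V @ Ht.T) / (V_hat @ Ht.T + EPS)
        if update_H:
            V_hat = reconstruct(W, H)
            num = sum(W[t].T @ shift(V, -t) for t in range(T))
            den = sum(W[t].T @ shift(V_hat, -t) for t in range(T))
            H *= num / (den + EPS)
    return W, H


def train_bases(V_src, V_tgt, R, T, n_iter=100):
    """Training phase (assumed scheme): learn source bases and activations,
    then learn target bases against the same activations so the r-th source
    basis is paired with the r-th target basis."""
    W_src, H = cnmf(V_src, R, T, n_iter)
    W_tgt, _ = cnmf(V_tgt, R, T, n_iter, H=H, update_H=False)
    return W_src, W_tgt


def convert(V_src, W_src, W_tgt, n_iter=100):
    """Conversion phase: estimate activations with the source bases fixed,
    then rebuild the spectrogram with the target bases substituted."""
    T, _, R = W_src.shape
    _, H = cnmf(V_src, R, T, n_iter, W=W_src, update_W=False)
    return reconstruct(W_tgt, H)
```

Because each basis `W[:, :, r]` covers `T` consecutive frames, swapping bases replaces whole time-frequency patches rather than single frames, which is how this kind of model can carry inter-frame correlation through conversion. In practice `V_src` and `V_tgt` would need to be the same shape, for example after dynamic time warping, which is outside this sketch.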
This article is indexed in Wanfang Data (万方数据) and other databases.