

Speaker conversion using kernel non-negative matrix factorization
Authors: Xu Qinyu, Lu Guanming, Yan Jingjie, Li Haibo, Cheng Xiao
Affiliation: 1. College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; 2. Jiangsu Province Key Laboratory on Image Processing and Image Communication, Nanjing 210003, China
Abstract: Voice conversion (VC) based on the Gaussian mixture model (GMM) is the most classic and widely used method for converting the source spectrum into the target spectrum. However, because it converts frame by frame, this method is prone to over-fitting. This paper presents VC based on non-negative matrix factorization (NMF), which prevents the converted spectrum from over-fitting by adjusting the size of the set of basis vectors (the dictionary). To better model the non-linear mapping, kernel NMF (KNMF) is adopted for spectrum mapping. In addition, to improve conversion accuracy, KNMF combined with GMM (GKNMF) is introduced into VC. Finally, KNMF, GKNMF, GMM, principal component regression (PCR), PCR combined with GMM (GPCR), partial least squares regression (PLSR), NMF correlation-based frequency warping (NMF-CFW) and deep neural network (DNN) methods are compared with one another. The proposed GKNMF achieves better performance in both objective and subjective evaluations.
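The abstract describes dictionary-based NMF conversion and its kernel extension only at a high level. The following is a minimal sketch of one common formulation, not the authors' implementation. It assumes parallel source and target dictionaries (row j of each holds time-aligned source and target spectral exemplars) and estimates non-negative activations h for a source frame x by minimizing ||φ(x) − Σ_j h_j φ(a_j)||² in kernel feature space with the multiplicative update h ← h ⊙ k(A, x) / (K(A, A) h). The converted frame is then taken as the linear combination Σ_j h_j b_j of target exemplars. The function names `estimate_activations`, `convert_frame` and `rbf_kernel`, the RBF kernel choice and the `gamma` parameter are all hypothetical illustrations. The update assumes non-negative kernel values, which holds for the RBF kernel and for the linear kernel on non-negative spectra. The GMM combination (GKNMF) is not shown.

```python
import math
from typing import Callable, List

Vector = List[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(x: Vector, y: Vector) -> float:
    # With this kernel the method reduces to ordinary Euclidean NMF activation estimation.
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(gamma: float = 1.0) -> Kernel:
    # Hypothetical non-linear kernel choice; its values are always in (0, 1].
    def k(x: Vector, y: Vector) -> float:
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
    return k


def estimate_activations(x: Vector, source_dict: List[Vector], kernel: Kernel,
                         n_iter: int = 200, eps: float = 1e-12) -> Vector:
    """Non-negative h minimizing ||phi(x) - sum_j h_j phi(a_j)||^2 (assumes kernel >= 0)."""
    n = len(source_dict)  # dictionary size: fewer exemplars means a more constrained mapping
    kxa = [kernel(a, x) for a in source_dict]                              # b = k(A, x)
    kaa = [[kernel(ai, aj) for aj in source_dict] for ai in source_dict]  # Q = K(A, A)
    h = [1.0] * n  # positive start; multiplicative updates keep h non-negative
    for _ in range(n_iter):
        qh = [sum(kaa[i][j] * h[j] for j in range(n)) for i in range(n)]
        h = [h[i] * kxa[i] / (qh[i] + eps) for i in range(n)]
    return h


def convert_frame(x: Vector, source_dict: List[Vector], target_dict: List[Vector],
                  kernel: Kernel, n_iter: int = 200) -> Vector:
    # Assumption: the activations found on the source side are reused with the
    # parallel target exemplars to synthesize the converted spectral frame.
    h = estimate_activations(x, source_dict, kernel, n_iter)
    dim = len(target_dict[0])
    return [sum(h[j] * target_dict[j][d] for j in range(len(h))) for d in range(dim)]
```

For example, with the linear kernel, source exemplars `[[1, 1], [0, 1]]` and target exemplars `[[1, 0], [0, 2]]`, the frame `[2, 3]` decomposes as 2·a₁ + 1·a₂ and converts to approximately `[2, 2]`. Swapping in `rbf_kernel(gamma)` changes only how similarity to the source exemplars is measured, which is where a kernel method gains its non-linearity.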
Keywords: VC; kernel; NMF; spectrum mapping
This article is indexed in ScienceDirect and other databases.