Comparing ANN and GMM in a voice conversion framework
Authors: R.H. Laskar, D. Chakrabarty, F.A. Talukdar, K. Sreenivasa Rao, K. Banerjee
Affiliations: 1. Department of Electronics & Communication Engineering, National Institute of Technology Silchar, Silchar 788010, Assam, India; 2. Department of Electronics & Communication Engineering, Indian Institute of Technology Guwahati, Guwahati 781039, Assam, India; 3. School of Information Technology, IIT Kharagpur, Kharagpur 721302, West Bengal, India
Abstract: In this paper, we present a comparative analysis of artificial neural networks (ANNs) and Gaussian mixture models (GMMs) for the design of a voice conversion system that uses line spectral frequencies (LSFs) as feature vectors. Both ANN-based and GMM-based models are explored to capture the nonlinear mapping functions that modify the vocal tract characteristics of a source speaker to match those of a desired target speaker. The LSFs represent the vocal tract transfer function of a particular speaker. The intonation patterns (pitch contours) are mapped at the segmental level using a codebook-based model. The energy profile of the signal is modified at the segmental level using a fixed scaling factor defined between the source and target speakers. Two residual modification methods, residual copying and residual selection, are used to generate the target residual signal. The performance of the ANN-based and GMM-based voice conversion (VC) systems is evaluated using subjective and objective measures. The results indicate that the proposed ANN-based model using the LSF feature set may be used as an alternative to the state-of-the-art GMM-based models for designing a voice conversion system.
Keywords: Artificial neural networks; Gaussian mixture models; Prosody; Pitch contour; Intonation patterns; Duration patterns; Energy profiles; Residual modification
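
The abstract mentions modifying the energy profile with a fixed scaling factor defined between the source and target speakers, but does not say how that factor is computed. The following is a minimal sketch under one plausible assumption: the factor is the ratio of the target speaker's mean per-sample energy to the source speaker's, estimated from training frames, and is applied to a segment as an amplitude gain equal to its square root. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import math


def mean_energy(frames):
    """Mean per-sample energy over a list of frames (each a list of floats)."""
    total = sum(x * x for frame in frames for x in frame)
    count = sum(len(frame) for frame in frames)
    if count == 0:
        raise ValueError("no samples provided")
    return total / count


def energy_scale_factor(source_frames, target_frames):
    """Assumed definition: ratio of target to source mean energy."""
    source_energy = mean_energy(source_frames)
    if source_energy == 0.0:
        raise ValueError("source energy is zero; factor is undefined")
    return mean_energy(target_frames) / source_energy


def scale_segment(segment, factor):
    """Scale amplitudes by sqrt(factor) so the segment energy scales by factor."""
    gain = math.sqrt(factor)
    return [gain * x for x in segment]


# Toy training data: the target frames are the source frames at twice the amplitude.
source_train = [[0.1, -0.2, 0.1], [0.2, 0.0, -0.1]]
target_train = [[0.2, -0.4, 0.2], [0.4, 0.0, -0.2]]

factor = energy_scale_factor(source_train, target_train)
segment = [0.1, -0.1, 0.05]
converted = scale_segment(segment, factor)
```

Because the factor is fixed, it is estimated once from training data and then reused for every segment of the source speaker. A segment-dependent energy mapping would need a different model than the one sketched here.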
This article is indexed in ScienceDirect and other databases.