
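Vector Quantization Regularized Variational Autoencoders for Non-parallel Voice Conversion
Cite this article: Wang Chao, Yu Yibiao. Vector Quantization Regularized Variational Autoencoders for Non-parallel Voice Conversion[J]. 信号处理 (Journal of Signal Processing), 2021, 37(7): 1339-1345.
Authors: Wang Chao, Yu Yibiao
Affiliation: School of Electronic and Information Engineering, Soochow University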
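Abstract: Voice conversion systems based on the vector quantized variational autoencoder (VQVAE) are an active research topic in voice conversion both in China and abroad, but their poor converted speech quality limits the model's application. Building on VQVAE, this paper proposes an improved vector quantization regularized variational autoencoder (VQ-REG-VAE). During training, vector quantization degenerates into a regularization term: its regularizing constraint drives the encoder to learn speaker-independent linguistic features, while the decoder learns to fuse speaker features into those linguistic features. During conversion, the vector quantization regularization term can be dropped, and the encoder and decoder alone perform voice conversion. Because no vector quantization is applied at conversion time, the linguistic feature information is better preserved. Objective and subjective experiments both show that speech converted by the VQ-REG-VAE model has significantly better quality than that of the VQVAE model, with no loss of similarity.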

Keywords: voice conversion; vector quantization; vector quantization regularized variational autoencoder
Received: 2021-01-19

Vector Quantization Regularized Variational Autoencoders for Non-parallel Voice Conversion
Affiliation:School of Electronic and Information Engineering, Soochow University
Abstract: Voice conversion systems based on the vector quantized variational autoencoder (VQVAE) are an active research topic in voice conversion, but the poor quality of the converted speech limits their wider use. To address this problem, this paper proposes an improved model called the vector quantization regularized variational autoencoder (VQ-REG-VAE). During training, vector quantization acts as a regularization term: under this constraint, the encoder learns to generate speaker-independent linguistic features while the decoder learns to fuse speaker features into those linguistic features. During conversion, the vector quantization term is dropped and voice conversion is carried out by the encoder and decoder alone. Because vector quantization is not applied during conversion, more linguistic information is preserved. Objective and subjective experiments show that, compared with the VQVAE model, the VQ-REG-VAE model achieves a significant improvement in speech quality with comparable speaker similarity.
Keywords: voice conversion; vector quantization; vector quantization regularized variational autoencoder
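
The abstract gives no equations, so the following is only a rough sketch of the idea it describes: vector quantization contributes a penalty term during training, and conversion skips quantization entirely. The toy linear `encode` and `decode` functions, the additive speaker fusion, the squared-distance penalty, and the weight `beta` are all assumptions made for illustration, not the authors' model or loss.

```python
# Illustrative sketch only. Every function, name, and constant here is
# hypothetical; the paper's actual networks and loss are not given in the abstract.

def nearest_codeword(z, codebook):
    """Return the codebook vector closest to z in squared Euclidean distance."""
    return min(codebook, key=lambda c: sum((zi - ci) ** 2 for zi, ci in zip(z, c)))

def vq_regularizer(z, codebook):
    """Assumed penalty: squared distance from z to its nearest codeword."""
    q = nearest_codeword(z, codebook)
    return sum((zi - qi) ** 2 for zi, qi in zip(z, q))

def encode(frame):
    """Toy stand-in for the encoder that produces linguistic features."""
    return [0.5 * x for x in frame]

def decode(z, speaker_embedding):
    """Toy stand-in for the decoder: fuses speaker features by simple addition."""
    return [zi + si for zi, si in zip(z, speaker_embedding)]

def training_loss(frame, speaker_embedding, codebook, beta=0.25):
    """Reconstruction error plus the VQ penalty (beta is an assumed weight).

    Assumption: the decoder receives the continuous encoder output, so
    quantization influences training only through the penalty term.
    """
    z = encode(frame)
    recon = decode(z, speaker_embedding)
    recon_err = sum((r - x) ** 2 for r, x in zip(recon, frame))
    return recon_err + beta * vq_regularizer(z, codebook)

def convert(frame, target_speaker_embedding):
    """Conversion: no quantization, only encoder then decoder with the target speaker."""
    return decode(encode(frame), target_speaker_embedding)
```

In this sketch the codebook appears only in `training_loss`; `convert` never touches it, which mirrors the abstract's point that the continuous linguistic features reach the decoder unquantized at conversion time.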