A Real-time Robust Speech Synthesis Method Based on Improved Attention Mechanism

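Cite this article:	Tang Jun (唐君), Zhang Lianhai (张连海), Li Jiaxin (李嘉欣). A Real-time Robust Speech Synthesis Method Based on Improved Attention Mechanism[J]. Journal of Signal Processing (信号处理), 2022, 38(3): 527-535.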

Authors:	Tang Jun (唐君), Zhang Lianhai (张连海), Li Jiaxin (李嘉欣)

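Author affiliation:	School of Information System Engineering, PLA Strategic Support Force Information Engineering University, Zhengzhou, Henan 450001, China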

Funding:	National Natural Science Foundation of China (Grant No. 61673395)

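Abstract (Chinese original, truncated):	To address problems in the existing speech synthesis system Tacotron 2, namely that the attention model learns slowly, the synthesized speech is not robust enough, and synthesis is slow, three improvements are proposed: 1. phoneme embeddings are used as input to reduce some mispronunciations; 2. an attention loss is introduced to guide the learning of the attention model so that it learns quickly and accurately; 3. the WaveGlow model is used as the vocoder to speed up speech generation....
Keywords:	speech synthesis; attention loss mechanism; Tacotron 2; WaveGlow; sequence-to-sequence
Received:	2021-06-11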
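
The abstract does not spell out the form of the attention loss. As an illustration only, the sketch below uses the guided attention loss known from earlier text-to-speech work, which penalizes attention mass that lies far from the diagonal of the alignment matrix. This is a stand-in, not necessarily the loss used in the paper; the function names and the width parameter `g` are assumptions made for this example.

```python
import math


def guided_attention_weights(n_text, n_frames, g=0.2):
    """Penalty matrix W[t][n] = 1 - exp(-((n/N - t/T)^2) / (2 g^2)).

    The penalty is near 0 on the diagonal (monotonic alignment) and
    grows towards 1 far from it. `g` (assumed value) sets the width.
    """
    return [
        [
            1.0 - math.exp(-((n / n_text - t / n_frames) ** 2) / (2.0 * g * g))
            for n in range(n_text)
        ]
        for t in range(n_frames)
    ]


def guided_attention_loss(attention, g=0.2):
    """Mean of attention[t][n] * W[t][n] over a (frames x text) matrix."""
    n_frames = len(attention)
    n_text = len(attention[0])
    weights = guided_attention_weights(n_text, n_frames, g)
    total = sum(
        a * w
        for a_row, w_row in zip(attention, weights)
        for a, w in zip(a_row, w_row)
    )
    return total / (n_frames * n_text)


# Toy alignments: a perfectly diagonal one and a reversed (anti-diagonal) one.
diagonal = [[1.0 if n == t else 0.0 for n in range(4)] for t in range(4)]
anti_diagonal = [[1.0 if n == 3 - t else 0.0 for n in range(4)] for t in range(4)]

loss_diagonal = guided_attention_loss(diagonal)
loss_anti = guided_attention_loss(anti_diagonal)
```

In this toy case the diagonal alignment incurs no penalty, while the reversed alignment is penalized, which is the kind of signal that can steer an attention model towards a monotonic alignment early in training.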
A Real-time Robust Speech Synthesis Method Based on Improved Attention Mechanism

Affiliation:	School of Information System Engineering, PLA Strategic Support Force Information Engineering University, Zhengzhou, Henan 450001, China

Abstract:	To address problems in the existing speech synthesis system Tacotron 2, namely that the attention model learns slowly, the synthesized speech is not robust enough, and synthesis is slow, three improvements are proposed: 1. use phoneme embeddings as input to reduce some mispronunciations; 2. introduce an attention loss to guide the learning of the attention model so that it learns quickly and accurately; 3. use the WaveGlow model as the vocoder to accelerate speech generation. Experiments on the LJSpeech dataset show that the improved network learns attention faster and more accurately, and that the error rate of its synthesized speech is 33.4% lower than that of the baseline. At the same time, the synthesis speed of the entire network is increased approximately 523-fold and the Real-Time Factor (RTF) is 0.96, which meets the real-time requirement. In terms of voice quality, the Mean Opinion Score (MOS) of the synthesized speech reaches 3.88.
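
For readers unfamiliar with the Real-Time Factor quoted above, the following minimal sketch assumes the common definition: synthesis time divided by the duration of the generated audio, so that a value below 1 means synthesis runs faster than playback. The helper names and the `synthesize` callable are hypothetical and are not taken from the paper.

```python
import time


def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


def measure_rtf(synthesize, text, sample_rate):
    """Time one call to `synthesize` (a hypothetical function returning a
    sequence of audio samples) and return its RTF."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)


# Example: 0.96 s of compute for 1.0 s of audio gives the RTF in the abstract.
rtf_example = real_time_factor(0.96, 1.0)
```

Under this definition, an RTF of 0.96 means that one second of speech takes about 0.96 seconds to generate, just inside the real-time bound.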
