Synthesizing English emphatic speech for multimodal corrective feedback in computer-aided pronunciation training期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

Synthesizing English emphatic speech for multimodal corrective feedback in computer-aided pronunciation training

Authors:	Fanbo Meng Zhiyong Wu Jia Jia Helen Meng Lianhong Cai

Affiliation:	1. Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China 2. Tsinghua National Laboratory for Information Science and Technology (TNList), Beijing, China 3. Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong SAR, China 4. Tsinghua-CUHK Joint Research Center for Media Sciences, Technologies and Systems, Graduate School at Shenzhen, Tsinghua University, Shenzhen, 518055, China

Abstract:	Emphasis plays an important role in expressive speech synthesis in highlighting the focus of an utterance to draw the attention of the listener. We present a hidden Markov model (HMM)-based emphatic speech synthesis model. The ultimate objective is to synthesize corrective feedback in a computer-aided pronunciation training (CAPT) system. We first analyze contrastive (neutral versus emphatic) speech recording. The changes of the acoustic features of emphasis at different prosody locations and the local prominences of emphasis are analyzed. Based on the analysis, we develop a perturbation model that predicts the changes of the acoustic features from neutral to emphatic speech with high accuracy. Further based on the perturbation model we develop an HMM-based emphatic speech synthesis model. Different from the previous work, the HMM model is trained with neutral corpus, but the context features and additional acoustic-feature-related features are used during the growing of the decision tree. Then the output of the perturbation model can be used to supervise the HMM model to synthesize emphatic speeches instead of applying the perturbation model at the backend of a neutral speech synthesis model directly. In this way, the demand of emphasis corpus is reduced and the speech quality decreased by speech modification algorithm is avoided. The experiments indicate that the proposed emphatic speech synthesis model improves the emphasis quality of synthesized speech while keeping a high degree of the naturalness.

Keywords:
本文献已被 SpringerLink 等数据库收录！

设为首页 | 免责声明 | 关于勤云 | 加入收藏