Using LSTM neural networks for cross-lingual phonetic speech segmentation with an iterative correction procedure

Authors:	Zdeněk Hanzlíček, Jindřich Matoušek, Jakub Vít

Affiliation:	NTIS–New Technologies for the Information Society, Faculty of Applied Sciences, University of West Bohemia, Pilsen, Czech Republic

Abstract:	This article describes experiments on speech segmentation using long short-term memory (LSTM) recurrent neural networks. The main part of the paper deals with multi-lingual and cross-lingual segmentation, where cross-lingual means that segmentation is performed on a language different from the one on which the model was trained. The experimental data comprise large Czech, English, German, and Russian speech corpora intended for speech synthesis. For optimal multi-lingual modeling, a compact phonetic alphabet was proposed by sharing and clustering the phones of the individual languages. Many experiments were performed to explore various experimental conditions and data combinations. We proposed a simple procedure that iteratively adapts an inaccurate default model to a new voice or language. Segmentation accuracy was evaluated by comparison with a reference segmentation created by a well-tuned hidden Markov model-based framework with additional manual corrections. The resulting segmentation was also employed in a unit selection text-to-speech system, and the quality of the generated speech was compared with that of speech generated using the reference segmentation in a preference listening test.

Keywords:	LSTM neural networks; multi-lingual and cross-lingual modeling; speech segmentation
