Using LSTM neural networks for cross-lingual phonetic speech segmentation with an iterative correction procedure
Authors:Zdeněk Hanzlíček  Jindřich Matoušek  Jakub Vít
Affiliation:NTIS–New Technologies for the Information Society, Faculty of Applied Sciences, University of West Bohemia, Pilsen, Czech Republic
Abstract:This article describes experiments on speech segmentation using long short-term memory recurrent neural networks. The main part of the paper deals with multi-lingual and cross-lingual segmentation, where cross-lingual means that segmentation is performed on a language different from the one on which the model was trained. The experimental data comprise large Czech, English, German, and Russian speech corpora intended for speech synthesis. For optimal multi-lingual modeling, a compact phonetic alphabet was proposed by sharing and clustering phones of the particular languages. Many experiments were performed, exploring various experimental conditions and data combinations. We proposed a simple procedure that iteratively adapts an inaccurate default model to a new voice or language. Segmentation accuracy was evaluated by comparison with a reference segmentation created by a well-tuned hidden Markov model-based framework with additional manual corrections. The resulting segmentation was also employed in a unit selection text-to-speech system, and the quality of the generated speech was compared with that obtained using the reference segmentation in a preference listening test.
Keywords:LSTM neural networks  multi-lingual and cross-lingual modeling  speech segmentation