A Mixture of Recurrent Neural Networks for Speaker Normalisation

Authors: Edmondo Trentin, Diego Giuliani

Affiliation: ITC-irst, Centro per la Ricerca Scientifica e Tecnologica, Povo (Trento), Italy
Abstract: In spite of recent advances in automatic speech recognition, the performance of state-of-the-art speech recognisers fluctuates depending on the speaker. Speaker normalisation aims at reducing the differences between the acoustic space of a new speaker and the training acoustic space of a given speech recogniser, thereby improving recognition performance. Normalisation is based on an acoustic feature transformation, to be estimated from a small amount of speech signal. This paper introduces a mixture of recurrent neural networks as an effective regression technique for this problem. A suitable Viterbi-based time-alignment procedure is proposed for generating the adaptation set. The mixture is compared with linear regression and single-model connectionist approaches. Speaker-dependent and speaker-independent continuous speech recognition experiments on a large-vocabulary task, using Hidden Markov Models, are presented. Results show that the mixture improves recognition performance, yielding a 21% relative reduction of the word error rate, comparable with that obtained with model-adaptation approaches.
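
The record above is abstract-only, so the paper's exact architecture is not given here. The following is a minimal, hypothetical PyTorch sketch of the general idea the abstract describes: a small mixture of recurrent networks used as a frame-level regressor that maps a new speaker's acoustic feature vectors towards the recogniser's training acoustic space, fitted on an adaptation set of (speaker frame, aligned target frame) pairs such as those a Viterbi alignment would provide. The layer sizes, the frame-level gating scheme, and the training loop are assumptions for illustration, not the authors' method.

    # Hypothetical sketch of a mixture of recurrent regressors for
    # feature-space speaker normalisation (not the paper's exact model).
    import torch
    import torch.nn as nn


    class RNNExpert(nn.Module):
        """One recurrent regressor: speaker frames -> normalised frames."""

        def __init__(self, feat_dim: int, hidden_dim: int):
            super().__init__()
            self.rnn = nn.RNN(feat_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, feat_dim)

        def forward(self, x):                  # x: (batch, time, feat_dim)
            h, _ = self.rnn(x)
            return self.out(h)                 # (batch, time, feat_dim)


    class MixtureOfRNNs(nn.Module):
        """Gated combination of recurrent experts (gating scheme assumed)."""

        def __init__(self, feat_dim: int, hidden_dim: int = 32, n_experts: int = 4):
            super().__init__()
            self.experts = nn.ModuleList(
                RNNExpert(feat_dim, hidden_dim) for _ in range(n_experts)
            )
            self.gate = nn.Linear(feat_dim, n_experts)   # frame-level gating

        def forward(self, x):
            w = torch.softmax(self.gate(x), dim=-1)      # (batch, time, n_experts)
            y = torch.stack([e(x) for e in self.experts], dim=-1)
            return (y * w.unsqueeze(-2)).sum(dim=-1)     # weighted expert sum


    if __name__ == "__main__":
        # Toy adaptation set: the targets stand in for frames aligned against
        # the speaker-independent models via a Viterbi-based procedure.
        feat_dim, frames = 13, 200
        x = torch.randn(1, frames, feat_dim)             # new speaker's features
        target = torch.randn(1, frames, feat_dim)        # aligned reference frames

        model = MixtureOfRNNs(feat_dim)
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        for _ in range(50):                              # brief adaptation loop
            opt.zero_grad()
            loss = nn.functional.mse_loss(model(x), target)
            loss.backward()
            opt.step()

At recognition time, such a transformation would be applied to the new speaker's feature vectors before they are passed to the unchanged Hidden Markov Model recogniser.
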
Keywords: Mixture of neural networks; Multivariate regression; Recurrent neural network; Speaker adaptation; Speaker normalisation; Speech recognition