Continuous space language models

Affiliation:	1. Department of Software and Computing Systems, University of Alicante, Apdo. de correos 99, E-03080 Alicante, Spain; 2. Laboratoire Cedric, CNAM, 292 rue Saint Martin, 75141 Paris cedex 3, France

Abstract:	This paper describes the use of a neural network language model for large vocabulary continuous speech recognition. The underlying idea of this approach is to attack the data sparseness problem by performing the language model probability estimation in a continuous space. Highly efficient learning algorithms are described that enable the use of training corpora of several hundred million words. It is also shown that this approach can be incorporated into a large vocabulary continuous speech recognizer using a lattice rescoring framework with very little additional processing time. The neural network language model was thoroughly evaluated in a state-of-the-art large vocabulary continuous speech recognizer on several international benchmark tasks, in particular the NIST evaluations on broadcast news and conversational speech recognition. The new approach is compared to four-gram back-off language models trained with modified Kneser–Ney smoothing, which has often been reported to be the best known smoothing method. Usually the neural network language model is interpolated with the back-off language model. In that way, consistent word error rate reductions were achieved for all considered tasks and languages, ranging from 0.4% to almost 1% absolute.
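
The two ideas in the abstract, estimating word probabilities in a continuous space and interpolating the result with a back-off model, can be illustrated with a minimal sketch. It is not the paper's implementation: the class `TinyContinuousLM`, the layer sizes, the random untrained weights, the five-word vocabulary, the back-off probability 0.2 and the interpolation weight `lam` are all hypothetical values chosen for illustration. The context is three words long, as a four-gram model would use.

```python
import math
import random


def softmax(scores):
    """Turn arbitrary real scores into a probability distribution."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TinyContinuousLM:
    """Toy feed-forward LM (hypothetical): embeddings, one tanh layer, softmax."""

    def __init__(self, vocab_size, context_size=3, embed_dim=4, hidden_dim=8, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # Untrained random weights: the outputs are illustrative only.
        self.embeddings = rand_matrix(vocab_size, embed_dim)
        self.hidden = rand_matrix(hidden_dim, context_size * embed_dim)
        self.output = rand_matrix(vocab_size, hidden_dim)
        self.context_size = context_size

    def distribution(self, context_ids):
        """Return P(w | context) for every word w in the vocabulary."""
        assert len(context_ids) == self.context_size
        # Project each context word into the continuous space and concatenate.
        x = [v for i in context_ids for v in self.embeddings[i]]
        h = [math.tanh(a) for a in matvec(self.hidden, x)]
        return softmax(matvec(self.output, h))


def interpolate(p_neural, p_backoff, lam=0.5):
    """Linear interpolation of two probabilities; lam is an assumed weight."""
    return lam * p_neural + (1.0 - lam) * p_backoff


vocab = ["<s>", "the", "cat", "sat", "</s>"]
model = TinyContinuousLM(vocab_size=len(vocab))
context = [vocab.index(w) for w in ["<s>", "the", "cat"]]
probs = model.distribution(context)

p_backoff_sat = 0.2  # made-up back-off probability for "sat"
p_mixed = interpolate(probs[vocab.index("sat")], p_backoff_sat, lam=0.5)
```

Because every context word is mapped to a dense vector before the probabilities are computed, contexts never seen in training still yield a smooth distribution, which is the sense in which the approach addresses data sparseness. In practice the interpolation weight would be tuned on held-out data rather than fixed at 0.5.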

Keywords:
This article is indexed in ScienceDirect and other databases.
