Backoff hierarchical class n-gram language models: effectiveness to model unseen events in speech recognition
Abstract: In this paper, we introduce backoff hierarchical class n-gram language models to better estimate the likelihood of unseen n-gram events. This multi-level class hierarchy language modeling approach generalizes the well-known backoff n-gram language modeling technique. It uses a class hierarchy to define word contexts: each node in the hierarchy is a class that contains all the words of its descendant nodes, and the closer a node is to the root, the more general the class (and context) is. We investigate the effectiveness of the approach for modeling unseen events in speech recognition. Our results show that the proposed technique outperforms backoff n-gram language models. We also study the effect of the vocabulary size and the depth of the class hierarchy on the performance of the approach. Results are presented on the Wall Street Journal (WSJ) corpus using two vocabulary sets: 5000 words and 20,000 words. Experiments with the 5000-word vocabulary, whose test set contains a small number of unseen events, show up to a 10% improvement in unseen-event perplexity when using the hierarchical class n-gram language models. With a vocabulary of 20,000 words, characterized by a larger number of unseen events, the perplexity of unseen events decreases by 26%, while the word error rate (WER) decreases by 12% when using the hierarchical approach. Our results suggest that the largest gains in performance are obtained when the test set contains a large number of unseen events.
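The core mechanism described in the abstract is that an unseen word context is re-estimated by backing off to progressively more general class contexts along the hierarchy. The following is a minimal illustrative bigram sketch in Python, not the authors' implementation: the `class_hierarchy` mapping and the constant `backoff_weight` discount are hypothetical simplifications standing in for the paper's actual backoff smoothing.

```python
from collections import defaultdict

class HierarchicalClassBigramLM:
    """Sketch of a backoff hierarchical class bigram model (illustrative only).

    `class_hierarchy` maps each word to its ancestor classes, ordered from the
    most specific class up to the root; this structure is assumed here, not
    taken from the paper.
    """

    def __init__(self, class_hierarchy, backoff_weight=0.4):
        self.class_hierarchy = class_hierarchy
        self.backoff_weight = backoff_weight        # simplified constant discount per backoff step
        self.bigram_counts = defaultdict(int)       # (context, word) -> count
        self.context_counts = defaultdict(int)      # context -> count

    def train(self, sentences):
        for sent in sentences:
            for w1, w2 in zip(sent, sent[1:]):
                # Count the word context and every generalized class context above it.
                for ctx in [w1] + self.class_hierarchy.get(w1, []):
                    self.bigram_counts[(ctx, w2)] += 1
                    self.context_counts[ctx] += 1

    def prob(self, w1, w2):
        """P(w2 | w1): back off from the word context to increasingly general classes."""
        weight = 1.0
        for ctx in [w1] + self.class_hierarchy.get(w1, []):
            if self.bigram_counts[(ctx, w2)] > 0:
                return weight * self.bigram_counts[(ctx, w2)] / self.context_counts[ctx]
            weight *= self.backoff_weight            # penalize each step up the hierarchy
        return weight / max(len(self.context_counts), 1)   # crude uniform fallback at the root


# Usage: the bigram "Tuesday morning" is unseen, but is scored through the
# shared WEEKDAY class context learned from "Monday morning".
hierarchy = {"Monday": ["WEEKDAY", "TIME_EXPR"], "Tuesday": ["WEEKDAY", "TIME_EXPR"]}
lm = HierarchicalClassBigramLM(hierarchy)
lm.train([["on", "Monday", "morning"]])
print(lm.prob("Tuesday", "morning"))
```

The point of the sketch is the backoff order: the word itself first, then its classes from specific to general, so that an unseen word-level event still receives a non-zero, context-sensitive estimate instead of collapsing directly to a unigram.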
Keywords:
This article is indexed in ScienceDirect and other databases.