Paraphrastic language models
Affiliation: 1. Department of Radiation Oncology, Shandong Cancer Hospital and Institute, Shandong University, Jinan, Shandong, China; 2. Key Laboratory of Radiation Oncology of Shandong Province, Shandong Cancer Hospital and Institute, Jinan, Shandong, China; 3. Department of Radiation Oncology, Duke University Medical Center, Durham, NC, USA; 4. Department of Nuclear Medicine, Shandong Cancer Hospital and Institute, Jinan, Shandong, China; 5. Department of Radiology, Shandong Cancer Hospital and Institute, Jinan, Shandong, China
Abstract: Natural languages are known for their expressive richness. Many sentences can be used to represent the same underlying meaning. Modelling only the observed surface word sequence, as n-gram language models (LMs) do, can result in poor context coverage and generalization. This paper proposes a novel form of language model, the paraphrastic LM, that addresses these issues. A phrase-level paraphrase model, statistically learned from standard text data with no semantic annotation, is used to generate multiple paraphrase variants. LM probabilities are then estimated by maximizing their marginal probability. Multi-level language models estimated at both the word level and the phrase level are combined. An efficient weighted finite state transducer (WFST) based paraphrase generation approach is also presented. Significant error rate reductions of 0.5–0.6% absolute were obtained over the baseline n-gram LMs on two state-of-the-art recognition tasks, for English conversational telephone speech and Mandarin Chinese broadcast speech, using a paraphrastic multi-level LM modelling both word and phrase sequences. When this model is further combined with word-level and phrase-level feed-forward neural network LMs, significant error rate reductions of 0.9% absolute (9% relative) and 0.5% absolute (5% relative) were obtained over the baseline n-gram and neural network LMs, respectively.
Keywords: Language modelling; Paraphrase; Speech recognition
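
The idea of scoring a sentence through its paraphrase variants can be illustrated with a small sketch. This is not the paper's implementation: the toy corpus, the `PARAPHRASES` table and its probabilities, the add-one smoothed bigram model, and all function names are assumptions made for illustration. The sketch only computes a marginal score, summing the LM probability of each variant weighted by its paraphrase probability; it does not perform the estimation step or the WFST-based generation described in the abstract.

```python
import math
from collections import Counter

# Toy training text (assumption); stands in for "standard text data".
CORPUS = [
    "thanks a lot for your help",
    "thank you very much for your help",
    "thanks a lot for the call",
]

# Hypothetical phrase-level paraphrase table: P(replacement | source phrase).
# Each row sums to 1 and includes the identity mapping.
PARAPHRASES = {
    ("thanks", "a", "lot"): {
        ("thanks", "a", "lot"): 0.6,
        ("thank", "you", "very", "much"): 0.4,
    },
}


def train_bigram(sentences):
    """Collect history and bigram counts with sentence boundary markers."""
    histories, bigrams = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        histories.update(toks[:-1])
        bigrams.update(zip(toks[:-1], toks[1:]))
    vocab = {t for s in sentences for t in s.split()} | {"</s>"}
    return histories, bigrams, len(vocab)


def sentence_prob(words, model):
    """Add-one smoothed bigram probability of a word sequence."""
    histories, bigrams, vocab_size = model
    toks = ["<s>"] + list(words) + ["</s>"]
    logp = 0.0
    for h, w in zip(toks[:-1], toks[1:]):
        logp += math.log((bigrams[(h, w)] + 1) / (histories[h] + vocab_size))
    return math.exp(logp)


def variants(words):
    """Yield (variant, weight) pairs by rewriting phrases left to right.

    Weights sum to 1 as long as at most one table entry matches at any
    position, which holds for the toy table above.
    """
    words = tuple(words)
    if not words:
        yield (), 1.0
        return
    matched = False
    for src, targets in PARAPHRASES.items():
        if words[:len(src)] == src:
            matched = True
            for tgt, p in targets.items():
                for rest, q in variants(words[len(src):]):
                    yield tgt + rest, p * q
    if not matched:
        # No paraphrase starts here: keep the word and move on.
        for rest, q in variants(words[1:]):
            yield words[:1] + rest, q


def paraphrastic_prob(sentence, model):
    """Marginal score: sum over variants of weight * LM probability."""
    return sum(w * sentence_prob(v, model) for v, w in variants(sentence.split()))


if __name__ == "__main__":
    model = train_bigram(CORPUS)
    for variant, weight in variants("thanks a lot for your help".split()):
        print(weight, " ".join(variant))
    print(paraphrastic_prob("thanks a lot for your help", model))
```

A sentence containing no phrase from the table has a single variant with weight 1, so its marginal score reduces to the plain bigram probability.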
This article is indexed in ScienceDirect and other databases.