Batch-mode semi-supervised active learning for statistical machine translation
Authors: Sankaranarayanan Ananthakrishnan, Rohit Prasad, David Stallard, Prem Natarajan
Affiliation: BBN Technologies, Speech & Language Processing Unit, 10 Moulton Street, Cambridge, MA, USA
Abstract: The development of high-performance statistical machine translation (SMT) systems is contingent on the availability of substantial in-domain parallel training corpora. Such corpora, however, are expensive to produce because manual translation is labor-intensive. We propose to alleviate this problem with a novel, semi-supervised, batch-mode active learning strategy that attempts to maximize in-domain coverage by selecting sentences that balance domain match, translation difficulty, and batch diversity. Simulation experiments on an English-to-Pashto translation task show that the proposed strategy outperforms not only the random selection baseline but also traditional active selection techniques based on dissimilarity to existing training data.
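The abstract names the three criteria but not how they are scored or combined, so the following is only a hypothetical sketch of greedy batch-mode selection, not the authors' method. Everything in it is an assumption for illustration: the function names `ngrams` and `select_batch`, the weights `w_domain`, `w_difficulty` and `w_diversity`, the use of n-gram overlap for all three terms, and estimating domain statistics from the unlabeled pool itself.

```python
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return the set of all n-grams (as tuples) of length 1..n_max."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    }


def select_batch(pool, train, batch_size,
                 w_domain=1.0, w_difficulty=1.0, w_diversity=1.0):
    """Greedily pick up to `batch_size` pool sentences for manual translation.

    Hypothetical scoring (not from the paper); each term lies in [0, 1]:
      domain     - mean number of pool sentences containing each of the
                   sentence's n-grams, divided by the largest such count
                   (the pool stands in for in-domain data: an assumption)
      difficulty - fraction of its n-grams unseen in the existing training data
      redundancy - fraction of its n-grams already covered by the current batch
    """
    grams = [ngrams(s.split()) for s in pool]
    # Document frequency: each sentence contributes each n-gram at most once.
    domain_counts = Counter(g for gs in grams for g in gs)
    max_count = max(domain_counts.values(), default=1)
    train_grams = set().union(*(ngrams(s.split()) for s in train))

    batch_grams = set()
    selected = []
    # Skip empty sentences: they have no n-grams to score.
    remaining = [i for i, gs in enumerate(grams) if gs]

    def score(i):
        gs = grams[i]
        domain = sum(domain_counts[g] for g in gs) / (max_count * len(gs))
        difficulty = len(gs - train_grams) / len(gs)
        redundancy = len(gs & batch_grams) / len(gs)
        return (w_domain * domain + w_difficulty * difficulty
                - w_diversity * redundancy)

    while remaining and len(selected) < batch_size:
        best = max(remaining, key=score)  # ties go to the earliest sentence
        remaining.remove(best)
        selected.append(pool[best])
        batch_grams |= grams[best]  # later picks are penalized for overlap
    return selected


pool = ["the cat sat", "the cat sat", "a dog ran"]
print(select_batch(pool, [], 2))               # ['the cat sat', 'a dog ran']
print(select_batch(pool, ["the cat sat"], 1))  # ['a dog ran']
```

In the first call the diversity penalty keeps the duplicate sentence out of the batch; in the second, the difficulty term favors the sentence whose n-grams are absent from the training data. Setting `w_domain` and `w_diversity` to zero leaves only the difficulty term, which loosely resembles the dissimilarity-to-training-data baseline mentioned above.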
This article is indexed in ScienceDirect and other databases.