

Machine translation evaluation versus quality estimation
Authors:Lucia Specia  Dhwaj Raj  Marco Turchi
Abstract: Most evaluation metrics for machine translation (MT) require reference translations for each sentence in order to produce a score reflecting certain aspects of its quality. The de facto standard metrics, BLEU and NIST, are known to correlate well with human evaluation at the corpus level, but not at the segment level. To overcome both of these limitations (the dependence on reference translations and the poor segment-level correlation), we address the evaluation of MT quality as a prediction task: reference-independent features are extracted from the input sentences and their translations, and a quality score is obtained from models trained on human-annotated data. We show that this approach correlates better with human evaluation than commonly used metrics, even when the models are trained on different MT systems, language pairs, and text domains.
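The prediction setup described in the abstract can be sketched as a small supervised-learning pipeline: extract reference-independent features from each source/translation pair, then fit a regression model that maps those features to a quality score. The features, sentences, and scores below are illustrative assumptions, not the paper's actual feature set or data.

```python
# Minimal sketch of quality estimation as a prediction task.
# The features and toy data are illustrative, not from the paper.
from sklearn.linear_model import LinearRegression

def extract_features(source, translation):
    """Reference-independent features from a source sentence and its MT output."""
    src_tokens = source.split()
    tgt_tokens = translation.split()
    return [
        len(src_tokens),                                            # source length
        len(tgt_tokens),                                            # translation length
        len(tgt_tokens) / max(len(src_tokens), 1),                  # length ratio
        sum(len(t) for t in tgt_tokens) / max(len(tgt_tokens), 1),  # avg. target token length
    ]

# Toy training data: (source, MT output, human quality score in [1, 5]).
train = [
    ("the cat sat on the mat", "le chat était assis sur le tapis", 4.5),
    ("he runs", "il court vite vite vite", 2.0),
    ("good morning", "bonjour", 4.0),
]

X = [extract_features(src, tgt) for src, tgt, _ in train]
y = [score for _, _, score in train]

# Train a regressor on the features; at test time, no reference is needed.
model = LinearRegression().fit(X, y)
pred = model.predict([extract_features("the dog barks", "le chien aboie")])
```

Because no feature looks at a reference translation, the trained model can score unseen MT output directly, which is what allows it to transfer (as the abstract claims) across MT systems, language pairs, and domains.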
This article is indexed in SpringerLink and other databases.