Metrics for MT evaluation: evaluating reordering
Authors: Alexandra Birch, Miles Osborne, Phil Blunsom
Affiliation:
Abstract: Translating between dissimilar languages requires an account of the divergent word orders used to express the same semantic content. Reordering poses a serious problem for statistical machine translation systems and has generated a considerable body of research aimed at meeting its challenges. Direct evaluation of reordering requires automatic metrics that explicitly measure the quality of word order choices in translations. Current metrics, such as BLEU, evaluate reordering only indirectly. We analyse the ability of current metrics to capture reordering performance, and then introduce permutation distance metrics as a direct method for measuring word order similarity between translations and reference sentences. By correlating all metrics with a novel method for eliciting human judgements of reordering quality, we show that current metrics are largely influenced by lexical choice and cannot distinguish between different reordering scenarios. We also show that permutation distance metrics correlate very well with human judgements and are impervious to lexical differences.
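One permutation distance metric studied in this line of work is Kendall's tau distance, computed over a permutation that maps reference word positions to their positions in the hypothesis. The Python sketch below is illustrative only and not taken from the paper: it assumes the permutation has already been derived from a word alignment, and the normalisation by the number of word pairs is one common choice.

    from itertools import combinations

    def kendall_tau_distance(perm):
        """Normalised Kendall's tau distance between a permutation and the
        identity (monotone) ordering: the fraction of word pairs whose
        relative order is inverted. 0.0 = same order, 1.0 = fully reversed."""
        n = len(perm)
        if n < 2:
            return 0.0
        inversions = sum(1 for i, j in combinations(range(n), 2) if perm[i] > perm[j])
        return inversions / (n * (n - 1) / 2)

    # Hypothetical example: the hypothesis moves the third reference word first.
    print(kendall_tau_distance([2, 0, 1, 3]))   # 0.333...
    print(kendall_tau_distance([0, 1, 2, 3]))   # 0.0 (monotone order)
    print(kendall_tau_distance([3, 2, 1, 0]))   # 1.0 (completely reversed)

Because the score is computed purely over word positions, it is unaffected by lexical choice, which is the property the abstract highlights; a similarity score can be obtained as one minus the distance.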
Keywords:
This article is indexed in SpringerLink and other databases.