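Construction of and Reflections on a "Fine-Grained Error Analysis Corpus of English-Chinese Machine Translation"
Cite this article: QIU Bailian, WANG Mingwen, LI Maoxi, CHEN Cong, XU Fan. Construction of and Reflections on a "Fine-Grained Error Analysis Corpus of English-Chinese Machine Translation" [J]. Journal of Chinese Information Processing, 2022, 36(1): 47-55.
Authors: QIU Bailian  WANG Mingwen  LI Maoxi  CHEN Cong  XU Fan
Affiliations: 1. School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi 330022;
2. School of Foreign Languages, East China Jiaotong University, Nanchang, Jiangxi 330013
Funding: National Natural Science Foundation of China (61876074, 61662031, 61772246); National Social Science Fund of China (19BYY121); Humanities and Social Sciences Fund of the Ministry of Education (21YJC740040)
Abstract: Machine translation error analysis aims to identify the errors in machine translation output, including their types and distribution, and plays an important role in machine translation research and application. This paper combines human post-editing with error analysis: post-editing operations are annotated with error labels, using a combination of automatic and manual annotation, to build a fine-grained error analysis corpus of English-Chinese machine translation. Each annotated sample contains the source sentence, the machine translation, the human reference translation, the post-edited translation, the word error rate, and error type annotations. The annotated error types include addition, omission, lexical error, word order error, untranslated word, and named entity translation error, among others. An annotation consistency test demonstrates the validity of the annotation, and statistical analysis of the annotated corpus can effectively guide both the development of machine translation systems and post-editing by human translators.

Keywords: machine translation  error analysis  error annotation  post-editing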
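
The sample structure and word error rate described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `AnnotatedSample` field names, the `align` helper, the label strings, and the choice to compute the word error rate against the post-edited translation are not taken from the paper. The sketch only covers the three error types that fall directly out of a word-level edit alignment (addition, omission, lexical error); word order, untranslated-word, and named entity errors would need further analysis or manual annotation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AnnotatedSample:
    """One corpus entry. Field names are hypothetical, not the corpus schema."""
    source: str
    mt_output: List[str]      # machine translation, already word-segmented
    reference: List[str]      # human reference translation
    post_edit: List[str]      # post-edited machine translation
    wer: float = 0.0
    error_types: List[Tuple[str, str]] = field(default_factory=list)


def align(hyp: List[str], ref: List[str]) -> Tuple[int, List[Tuple[str, str]]]:
    """Word-level Levenshtein alignment; returns (edit distance, labeled edits)."""
    n, m = len(hyp), len(ref)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # word deleted by the editor
                             dist[i][j - 1] + 1,         # word inserted by the editor
                             dist[i - 1][j - 1] + cost)  # match or substitution
    # Trace back from the bottom-right cell to recover one optimal edit sequence.
    ops: List[Tuple[str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and hyp[i - 1] == ref[j - 1] and dist[i][j] == dist[i - 1][j - 1]:
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            ops.append(("lexical_error", hyp[i - 1] + "->" + ref[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("addition", hyp[i - 1]))   # MT produced an extra word
            i -= 1
        else:
            ops.append(("omission", ref[j - 1]))   # MT left a word out
            j -= 1
    ops.reverse()
    return dist[n][m], ops


# Toy example with pre-segmented Chinese tokens (invented, not from the corpus).
sample = AnnotatedSample(
    source="I love eating apples",
    mt_output=["我", "喜欢", "吃", "红", "苹果"],
    reference=["我", "爱", "吃", "苹果"],
    post_edit=["我", "爱", "吃", "苹果"],
)
distance, sample.error_types = align(sample.mt_output, sample.post_edit)
sample.wer = distance / max(len(sample.post_edit), 1)
print(sample.wer, sample.error_types)
```

In this toy case the editor replaced one word and deleted another, so the alignment yields one lexical error and one addition, and the word error rate is 2 edits over 4 post-edited words.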

Construction of Fine-Grained Error Analysis Corpus of English-Chinese Machine Translation
QIU Bailian, WANG Mingwen, LI Maoxi, CHEN Cong, XU Fan. Construction of Fine-Grained Error Analysis Corpus of English-Chinese Machine Translation[J]. Journal of Chinese Information Processing, 2022, 36(1): 47-55.
Authors:QIU Bailian  WANG Mingwen  LI Maoxi  CHEN Cong  XU Fan
Affiliation:1.School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi 330022, China;
2.School of Foreign Languages, East China Jiaotong University, Nanchang, Jiangxi 330013, China
Abstract: Machine translation error analysis aims to identify the errors in machine translation output, including error types and error distribution, and plays an important role in the research and application of machine translation. This paper combines post-editing with error analysis by annotating post-editing operations with error labels. Automatic and manual annotation are combined to build a Fine-grained Error Analysis Corpus of English-Chinese Machine Translation (ErrAC), in which every annotated sample includes a source sentence, the MT output, a reference translation, a post-edited translation, the word error rate (WER), and error type labels. The annotated error types include addition, omission, lexical error, word order error, untranslated word, and named entity translation error, among others. Annotator agreement analysis shows the effectiveness of the annotation. Statistics and analysis based on the corpus provide effective guidance for the development of machine translation systems and for post-editing practice.
Keywords:machine translation  error analysis  error annotation  post-editing  
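
The abstract reports an annotator agreement analysis but does not say which statistic was used. As an illustration only, the sketch below computes Cohen's kappa, one common agreement measure for two annotators, over error type labels. The choice of Cohen's kappa, the function name `cohen_kappa`, and the label values are all assumptions for this example, not the paper's method or data.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented labels for six post-editing operations (not data from the corpus).
annotator_1 = ["omission", "addition", "lexical", "lexical", "word_order", "omission"]
annotator_2 = ["omission", "addition", "lexical", "omission", "word_order", "omission"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))  # 0.769
```

Here the annotators agree on 5 of 6 items and chance agreement is 10/36, giving a kappa of 10/13. Unlike raw percentage agreement, kappa discounts the agreement that frequent labels would produce by chance.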