Problems and Countermeasures in Natural Language Processing Evaluation
Cite this article: DONG Qingxiu, SUI Zhifang, ZHAN Weidong, CHANG Baobao. Problems and Countermeasures in Natural Language Processing Evaluation[J]. Journal of Chinese Information Processing, 2021, 35(6): 1-15.
Authors: DONG Qingxiu  SUI Zhifang  ZHAN Weidong  CHANG Baobao
Affiliation: 1. MOE Key Laboratory of Computational Linguistics, Peking University, Beijing 100871, China;
2. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China;
3. Department of Chinese Language and Literature, Peking University, Beijing 100871, China
Funding: National Science and Technology Innovation 2030 Major Project "New Generation Artificial Intelligence" (2020AAA0067067000); National Natural Science Foundation of China (U19A2065)
Abstract: Evaluation tasks in natural language processing guide and drive research on techniques, models, and methods. In recent years, new evaluation datasets and tasks have been proposed continuously; at the same time, a series of problems exposed by existing evaluations has limited the progress of natural language processing technology. Starting from the concept, composition, development, and significance of natural language processing evaluation, this paper classifies and surveys the tasks and characteristics of mainstream evaluations, and then summarizes the problems in natural language processing evaluation and their causes. Finally, with reference to the standards used to evaluate human language ability, the paper puts forward the concept of human-like machine language ability evaluation, proposes a series of basic principles and implementation ideas for it from three aspects (reliability, difficulty, and validity), and offers an outlook on the future development of evaluation techniques.

Keywords: natural language processing evaluation  data set bias  evaluation metric
Received: 2020-12-14