Problems and Countermeasures in Natural Language Processing Evaluation (自然语言处理评测中的问题与对策)

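Cite this article:	DONG Qingxiu, SUI Zhifang, ZHAN Weidong, CHANG Baobao. Problems and Countermeasures in Natural Language Processing Evaluation[J]. Journal of Chinese Information Processing (中文信息学报), 2021, 35(6): 1-15.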

Authors:	DONG Qingxiu (董青秀), SUI Zhifang (穗志方), ZHAN Weidong (詹卫东), CHANG Baobao (常宝宝)

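Affiliations:	1. MOE Key Laboratory of Computational Linguistics, Peking University, Beijing 100871, China; 2. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China; 3. Department of Chinese Language and Literature, Peking University, Beijing 100871, China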

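Funding:	National Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2020AAA0067067000); National Natural Science Foundation of China (U19A2065)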

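Abstract:	Evaluation tasks in natural language processing guide and drive research on techniques, models, and methods. In recent years, new evaluation datasets and tasks have been proposed continually; at the same time, a series of problems exposed by existing evaluations has limited the progress of natural language processing technology. Starting from the concept, composition, development, and significance of natural language processing evaluation, this paper surveys the tasks and characteristics of mainstream evaluations by category, and then summarizes the problems in natural language processing evaluation and their causes. Finally, with reference to the norms of human language ability assessment, the paper proposes the concept of human-like machine language ability evaluation, puts forward a series of basic principles and implementation ideas from three aspects (reliability, difficulty, and validity), and looks ahead to the future development of evaluation technology.
Keywords:	natural language processing evaluation; dataset bias; evaluation metrics
Received:	2020-12-14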
Problems and Countermeasures in Natural Language Processing Evaluation

DONG Qingxiu, SUI Zhifang, ZHAN Weidong, CHANG Baobao. Problems and Countermeasures in Natural Language Processing Evaluation[J]. Journal of Chinese Information Processing, 2021, 35(6): 1-15.

Authors:	DONG Qingxiu, SUI Zhifang, ZHAN Weidong, CHANG Baobao

Affiliation:	1. MOE Key Laboratory of Computational Linguistics, Peking University, Beijing 100871, China; 2. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China; 3. Department of Chinese Language and Literature, Peking University, Beijing 100871, China

Abstract:	Evaluation in natural language processing drives and promotes research on models and methods. In recent years, new evaluation datasets and evaluation tasks have been continuously proposed. At the same time, a series of problems exposed by such evaluations seems to restrict the progress of natural language processing technology. Starting from the concept, composition, development and significance of natural language processing evaluation, this article classifies and summarizes the tasks and characteristics of mainstream natural language processing evaluation, and then reveals the problems and their possible causes. Drawing on standards for evaluating human language ability, this paper puts forward the concept of human-like machine language ability evaluation, and proposes a series of basic principles and implementation ideas for such evaluation from three aspects: reliability, difficulty and validity.

Keywords:	natural language processing evaluation; dataset bias; evaluation metric
