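Research on Sentence Difficulty Assessment for Chinese-Language Textbooks Based on Crowdsourced Annotation
Cite this article: YU Dong, WU Siyuan, GENG Zhaoyang, TANG Yuling. Research on Sentence Difficulty Assessment for Chinese-Language Textbooks Based on Crowdsourced Annotation[J]. Journal of Chinese Information Processing, 2020, 34(2): 16-26.
Authors: YU Dong, WU Siyuan, GENG Zhaoyang, TANG Yuling
Affiliations: 1. College of Information Science, Beijing Language and Culture University, Beijing 100083;
2. Research Institute of International Chinese Language Education, Beijing Language and Culture University, Beijing 100083
Funding: National Social Science Fund of China (17ZDA305); Youth Fund Project of Humanities and Social Sciences Research, Ministry of Education (19YJCZH230); Support Program for Young and Middle-aged Academic Backbones, Beijing Language and Culture University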
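Abstract: This paper proposes a crowdsourced annotation method based on pairwise comparison, which obtains sentence difficulty labels with a uniform standard from simple judgements made by non-experts. Using this method, we build a Chinese sentence difficulty corpus of 18,411 sentences drawn from Chinese-language textbooks. For two basic sentence difficulty assessment tasks, namely absolute difficulty assessment of single sentences and relative difficulty assessment of sentence pairs, we train Chinese sentence difficulty assessment models with machine learning and examine how linguistic features at different levels affect model performance. Experimental results show that machine-learning classifiers can effectively predict both the absolute and the relative difficulty of sentences, with best accuracies of 63.37% and 67.95%, respectively. Linguistic features improve model performance: compared with lexical- and syntactic-level features, the models with added character-level features achieve the highest accuracy on both tasks.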

Keywords: sentence difficulty assessment  readability research  crowdsourced annotation  Chinese-language textbook corpus
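The abstract does not state how the non-experts' pairwise judgements are turned into per-sentence difficulty labels. One common way to do this, shown here only as an assumption and not as the authors' method, is to fit a Bradley-Terry model, in which each sentence i has a latent strength p_i and P(i is judged harder than j) = p_i / (p_i + p_j). The sketch below fits the model with the standard minorization-maximization (MM) update, gives every sentence `prior` virtual wins and losses against a reference sentence of fixed strength 1.0 so that sentences never judged harder still receive a finite score, and then cuts the scores into equal-sized difficulty levels. The function names, the smoothing scheme and the equal-frequency binning are all hypothetical choices.

```python
import math
from collections import defaultdict


def bradley_terry_difficulty(judgements, prior=1.0, n_iter=500, tol=1e-10):
    """Fit Bradley-Terry strengths to pairwise difficulty judgements (MM algorithm).

    judgements: iterable of (harder_id, easier_id) pairs, one per annotator decision.
    prior: virtual wins and virtual losses each sentence gets against a reference
           sentence whose strength is fixed at 1.0 (a hypothetical smoothing choice).
    Returns {sentence_id: log-strength}; larger values mean "judged harder".
    """
    wins = defaultdict(float)                          # W_i: times i was judged harder
    counts = defaultdict(lambda: defaultdict(float))   # counts[i][j]: comparisons of i vs j
    for harder, easier in judgements:
        if harder == easier:
            continue                                   # ignore degenerate self-pairs
        wins[harder] += 1.0
        counts[harder][easier] += 1.0
        counts[easier][harder] += 1.0

    strength = {sid: 1.0 for sid in counts}
    if not strength:
        return {}
    for _ in range(n_iter):
        updated = {}
        for i, opponents in counts.items():
            denom = sum(n_ij / (strength[i] + strength[j]) for j, n_ij in opponents.items())
            denom += 2.0 * prior / (strength[i] + 1.0)  # virtual games vs. the reference
            updated[i] = (wins[i] + prior) / denom      # MM update using the old strengths
        delta = max(abs(updated[i] - strength[i]) for i in strength)
        strength = updated
        if delta < tol:
            break
    return {sid: math.log(p) for sid, p in strength.items()}


def scores_to_levels(scores, n_levels):
    """Cut scores into n_levels equal-sized bins; 0 is the easiest level."""
    ordered = sorted(scores, key=scores.get)
    return {sid: rank * n_levels // len(ordered) for rank, sid in enumerate(ordered)}
```

Equal-frequency binning keeps the classes of an absolute-difficulty task balanced; fixed score thresholds would be the alternative if each level is meant to have a stable meaning across corpora.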

Assessing Sentence Difficulty in Chinese Textbooks Based on Crowdsourcing
YU Dong, WU Siyuan, GENG Zhaoyang, TANG Yuling. Assessing Sentence Difficulty in Chinese Textbooks Based on Crowdsourcing[J]. Journal of Chinese Information Processing, 2020, 34(2): 16-26.
Authors: YU Dong, WU Siyuan, GENG Zhaoyang, TANG Yuling
Affiliation: 1. College of Information Science, Beijing Language and Culture University, Beijing 100083, China;
2. Research Institute of International Chinese Language Education, Beijing Language and Culture University, Beijing 100083, China
Abstract: We propose a crowdsourcing annotation approach based on pairwise comparison. With this approach, non-expert annotators' comparative judgements yield labelled data with a uniform standard. We construct a textbook-based corpus of 18,411 Chinese sentences and use it to train machine learning models that predict the difficulty of single sentences and the relative difficulty of sentence pairs. We also explore the impact of multi-level linguistic features on the two difficulty prediction tasks, in which our models achieve accuracies of 63.37% and 67.95%, respectively. The results show that Chinese character-level features have the greatest predictive power among all the features in both tasks.
Keywords: sentence difficulty assessment  readability research  crowdsourcing  textbook corpus
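The relative-difficulty task can be framed as binary classification over sentence pairs. The sketch below assumes a logistic-regression model trained on the difference of two sentences' feature vectors, using three hypothetical character-level features (log character count, character type/token ratio, and the share of characters outside a caller-supplied common-character set). The abstract specifies neither the classifier nor the exact character-level features, so all names and features here are illustrative. Leaving out the bias term makes the model antisymmetric: swapping the two sentences flips the predicted probability.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def char_features(sentence, common_chars):
    """Hypothetical character-level features; not the paper's feature set."""
    chars = [c for c in sentence if not c.isspace()]
    n = len(chars)
    rare = sum(1 for c in chars if c not in common_chars)
    return [
        math.log1p(n),                      # log of sentence length in characters
        len(set(chars)) / n if n else 0.0,  # character type/token ratio
        rare / n if n else 0.0,             # share of characters outside common_chars
    ]


def pair_vector(a, b, common_chars):
    """Feature difference f(a) - f(b); swapping a and b negates it."""
    fa = char_features(a, common_chars)
    fb = char_features(b, common_chars)
    return [x - y for x, y in zip(fa, fb)]


def train_pairwise(pairs, labels, common_chars, lr=0.1, epochs=500):
    """Batch gradient descent on logistic loss; label 1 means pairs[k][0] is harder.

    No bias term, so P(a harder than b) = 1 - P(b harder than a).
    """
    xs = [pair_vector(a, b, common_chars) for a, b in pairs]
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, labels):
            p = sigmoid(sum(wk * xk for wk, xk in zip(w, x)))
            for k, xk in enumerate(x):
                grad[k] += (p - y) * xk
        w = [wk - lr * g / len(xs) for wk, g in zip(w, grad)]
    return w


def prob_harder(w, a, b, common_chars):
    """Predicted probability that sentence a is harder than sentence b."""
    x = pair_vector(a, b, common_chars)
    return sigmoid(sum(wk * xk for wk, xk in zip(w, x)))
```

The per-sentence vectors from `char_features` could equally feed a separate multi-class model for the absolute-difficulty task; the pairwise form is shown because it matches how the labels were collected.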
This article is indexed in VIP (Weipu) and other databases.