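中文问句的形式分类和资源建设 (Formal Classification of Chinese Question Sentence and Resource Construction)
Cite this article: 黎江涛, 饶高琦. 中文问句的形式分类和资源建设[J]. 中文信息学报, 2022, 36(7): 69-76.
Authors: 黎江涛 (LI Jiangtao), 饶高琦 (RAO Gaoqi)
Affiliation: Research Institute of International Chinese Language Education, Beijing Language and Culture University, Beijing 100083
Funding: Humanities and Social Sciences Fund of the Ministry of Education (20YJC740050)
Abstract: This paper summarizes the role that the form of a question plays in filtering question corpora and explores the formal features that question classification requires. It builds a Chinese question classification corpus through manual annotation and, on that basis, runs rule-based and statistical classification experiments. Over several rounds of experiments, feature combinations are iteratively refined into a feature rule set, which provides a form-based classification foundation for current question answering. In the experiments, a finite-state automaton built on the optimized feature rule set achieves a macro-averaged F1 of 0.94; among the statistical machine learning models, random forest performs well, reaching a macro-averaged F1 of 0.98.

Keywords: interrogative sentences; classification; formal features; corpus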
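The abstract does not list the question classes or the contents of the feature rule set, so the following is only a minimal sketch of how a finite-state classifier over formal cues can work. It assumes the four question types of traditional Chinese grammar (yes-no 是非问, wh 特指问, alternative 选择问, A-not-A 正反问), input that is already word-segmented, and a small hypothetical cue lexicon. The names `to_symbols`, `step`, `classify`, and the state ordering are illustrative assumptions, not the authors' design. Each token is first mapped to a cue symbol. A deterministic automaton then moves to the stronger of its current state and the state that symbol points to, so that a 吗 overrides an indefinite 什么 or an adverbial 还是.

```python
from enum import IntEnum

# Hypothetical cue lexicon; the paper's actual feature rule set is not given here.
WH_WORDS = {"什么", "谁", "哪", "哪里", "哪儿", "怎么", "怎样", "为什么", "几", "多少"}
NEGATORS = {"不", "没"}
QUESTION_MARKS = {"?", "？"}


class State(IntEnum):
    """Automaton states, ordered by the (assumed) strength of the cue."""
    NONE = 0        # no interrogative cue seen yet
    INTONATION = 1  # only a question mark seen
    WH = 2          # interrogative pronoun or adverb
    ALT = 3         # 还是 ("or")
    A_NOT_A = 4     # X不X / X没X
    MA = 5          # the particle 吗


# Which state each cue symbol points to.
SYMBOL_TARGET = {
    "O": State.NONE,
    "Q": State.INTONATION,
    "WH": State.WH,
    "ALT": State.ALT,
    "ANOTA": State.A_NOT_A,
    "MA": State.MA,
}

# Output label for each final state (hypothetical label names).
LABELS = {
    State.NONE: "non_question",
    State.INTONATION: "yes_no",
    State.WH: "wh",
    State.ALT: "alternative",
    State.A_NOT_A: "a_not_a",
    State.MA: "yes_no",
}


def to_symbols(tokens):
    """Map word-segmented tokens to cue symbols."""
    symbols = []
    for i, tok in enumerate(tokens):
        if tok == "吗":
            symbols.append("MA")
        elif tok == "还是":
            symbols.append("ALT")
        elif tok in WH_WORDS:
            symbols.append("WH")
        elif tok in QUESTION_MARKS:
            symbols.append("Q")
        elif (len(tok) == 3 and tok[1] in NEGATORS and tok[0] == tok[2]) or (
            i + 2 < len(tokens) and tokens[i + 1] in NEGATORS and tokens[i + 2] == tok
        ):
            # A-not-A either inside one token (是不是) or across three (去 不 去).
            symbols.append("ANOTA")
        else:
            symbols.append("O")
    return symbols


def step(state, symbol):
    """Deterministic transition: keep the stronger of the two states."""
    return max(state, SYMBOL_TARGET[symbol])


def classify(tokens):
    """Run the automaton over the cue symbols and label the final state."""
    state = State.NONE
    for symbol in to_symbols(tokens):
        state = step(state, symbol)
    return LABELS[state]
```

For example, `classify(["你", "还是", "学生", "吗", "？"])` returns `"yes_no"` because 吗 outranks the alternative cue 还是. The monotone ordering keeps the automaton at six states, but it still mislabels declaratives such as 我还是去吧 and embedded questions such as 我知道他是谁; handling cases like these requires richer rules than this sketch provides.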

Formal Classification of Chinese Question Sentence and Resource Construction
LI Jiangtao,RAO Gaoqi.Formal Classification of Chinese Question Sentence and Resource Construction[J].Journal of Chinese Information Processing,2022,36(7):69-76.
Authors:LI Jiangtao  RAO Gaoqi
Affiliation:Research Institute of International Chinese Language Education, Beijing Language and Culture University, Beijing 100083, China
Abstract: This paper explores the formal features needed for question classification and summarizes the role of question form in question corpus filtering. Based on a manually annotated Chinese question classification corpus, it conducts rule-based and statistical experiments on Chinese question sentence classification. In the experiments, the finite-state machine based on the optimized feature set achieves a macro-averaged F1-score of 0.94, and the random forest model reaches 0.98.
Keywords: interrogative sentences; classification; formal features; corpus
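Both reported scores (0.94 for the finite-state machine, 0.98 for random forest) are macro-averaged F1: F1 is computed separately for each question class and the per-class scores are averaged without weighting, so a rare question type counts as much as a frequent one. The sketch below is a plain-Python illustration of that metric. The function name `macro_f1` and the toy labels are hypothetical and are not taken from the paper's corpus. Treating an undefined precision or recall as 0 is an assumption; as far as we know, it matches the default behaviour of scikit-learn's `f1_score(..., average="macro")`.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # the predicted class gets a false positive
            fn[gold] += 1  # the gold class gets a false negative
    scores = []
    for label in set(y_true) | set(y_pred):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0  # undefined -> 0 (assumption)
        recall = tp[label] / r_den if r_den else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only; not data from the paper.
gold = ["yes_no", "yes_no", "wh", "wh", "alternative", "a_not_a"]
pred = ["yes_no", "wh", "wh", "wh", "alternative", "yes_no"]
print(round(macro_f1(gold, pred), 3))  # → 0.575
```

Accuracy on the same toy data is 4/6, about 0.667. The macro average is lower because the a_not_a class, which has only one example, is never predicted correctly and contributes an F1 of 0 with the same weight as the other three classes.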