Chinese-Oriented Rhetorical Structure Relation Taxonomy and Unambiguous Annotation Method (面向中文的修辞结构关系分类体系及无歧义标注方法)
Cite this article: HOU Shengluan, FEI Chaoqun, ZHANG Shuhan. 面向中文的修辞结构关系分类体系及无歧义标注方法[J]. 中文信息学报 (Journal of Chinese Information Processing), 2019, 33(7): 20-30
Authors: HOU Shengluan (侯圣峦)  FEI Chaoqun (费超群)  ZHANG Shuhan (张书涵)
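Affiliation: 1. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
2. University of Chinese Academy of Sciences, Beijing 100190, China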
Funding: National Key Research and Development Program of China (2016YFB1000902); National Natural Science Foundation of China (61232015, 61472412, 61621003)
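Abstract: Rhetorical Structure Theory (RST) is an important theory of discourse structure, and its core is the rhetorical structure relation. Based on RST and the characteristics of Chinese text, this paper proposes a hierarchical, Chinese-oriented taxonomy of rhetorical structure relations together with multiple definitions. To address the ambiguity that annotators encounter, it also proposes an unambiguous annotation method. To make annotation easier, the authors designed and implemented RSTTagger, an annotation tool with a Java graphical interface. The tool takes tuples of keywords drawn from each sentence's subject-predicate structure as its elementary annotation unit and annotates bottom-up, level by level, until a complete rhetorical structure relation tree is produced. To verify the consistency of the annotation results, 160 Chinese texts from the foreign trade domain were selected for annotation; different annotators annotated 50 of them in parallel, and annotation agreement reached 76.63%. The annotation framework can be applied to corpora in other domains, and the 160 annotated texts can serve as a base corpus for research on discourse structure theory.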

Keywords: natural language processing  rhetorical structure theory  rhetorical structure relation  discourse structure analysis

Chinese-Oriented Rhetorical Structure Relation Taxonomy and Unambiguous Annotation Method
HOU Shengluan,FEI Chaoqun,ZHANG Shuhan. Chinese-Oriented Rhetorical Structure Relation Taxonomy and Unambiguous Annotation Method[J]. Journal of Chinese Information Processing, 2019, 33(7): 20-30
Authors:HOU Shengluan  FEI Chaoqun  ZHANG Shuhan
Affiliation:1.Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
2.University of Chinese Academy of Sciences, Beijing 100190, China
Abstract: Rhetorical Structure Theory (RST) is an important theory of discourse structure that centers on the rhetorical structure relation (RSR). Based on English-oriented RST and the characteristics of Chinese text, this paper presents a hierarchical taxonomy and multiple definitions of Chinese-oriented RSRs. Moreover, an unambiguous annotation method is proposed to deal with the ambiguity that annotators encounter. A Java GUI-based tagging tool called RSTTagger is designed and implemented as a bottom-up tagger, whose elementary tagging unit is a tuple of keywords from a sentence's subject-predicate structure and whose tagging result is a full discourse structure tree. To validate the proposed tagging framework, we selected 160 Chinese foreign trade texts as the tagging corpus, from which 50 texts were randomly selected to be tagged by different annotators. Inter-annotator agreement reached 76.63%.
Keywords: natural language processing    rhetorical structure theory    rhetorical structure relation    discourse parsing