Annotation Guideline of Chinese Dependency Treebank from Multi-domain and Multi-source Texts
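Cite this article: GUO Lijuan, LI Zhenghua, PENG Xue, ZHANG Min. Annotation Guideline of Chinese Dependency Treebank from Multi-domain and Multi-source Texts[J]. Journal of Chinese Information Processing, 2018, 32(10): 28.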
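Authors: GUO Lijuan, LI Zhenghua, PENG Xue, ZHANG Min
Affiliation: School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
Funding: National Natural Science Foundation of China (61502325, 61432013, 61525205)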
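Abstract: Over the past decade, dependency parsing has attracted wide attention from the research community because its representation is simple and flexible and its analysis is efficient. To support research on Chinese dependency parsing, researchers in China have annotated several Chinese dependency treebanks. However, there is still no public, complete, and systematic annotation guideline for Chinese dependency syntax, and existing treebank annotation efforts give little consideration to the special linguistic phenomena found in web text. To address this, this paper draws extensively on existing annotation work and on problems encountered during actual annotation to formulate a new annotation guideline for Chinese dependency syntax that is adapted to multi-domain, multi-source text. The goal of the guideline is to accurately characterize the syntactic structure of various linguistic phenomena while ensuring annotation consistency. Using this guideline, we have annotated a Chinese dependency treebank of about 30,000 sentences.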

Keywords: dependency syntax; annotation guideline

Annotation Guideline of Chinese Dependency Treebank from Multi-domain and Multi-source Texts
GUO Lijuan, LI Zhenghua, PENG Xue, ZHANG Min. Annotation Guideline of Chinese Dependency Treebank from Multi-domain and Multi-source Texts[J]. Journal of Chinese Information Processing, 2018, 32(10): 28.
Authors:GUO Lijuan  LI Zhenghua  PENG Xue  ZHANG Min
Affiliation:School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
Abstract: Dependency parsing has attracted much attention in the research community. However, there is no public, integrated, and systematic annotation guideline for Chinese dependency treebanks. Considering the special linguistic phenomena in web texts, this paper proposes a new annotation guideline for Chinese dependency treebanks that is adapted to multi-domain and multi-source texts. The guideline aims to accurately depict the syntactic structures of various linguistic phenomena while ensuring annotation consistency. Based on the proposed guideline, we have annotated about 30,000 Chinese sentences with their dependency structures.
Keywords:dependency  annotation guideline  