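An Automatic Annotation Method for Emergency-Event Text Corpora
Citation: LIU Wei, WANG Xu, ZHANG Yujia, LIU Zongtian. An Automatic Annotation Method for Emergency-Event Text Corpora[J]. Journal of Chinese Information Processing, 2017, 31(2): 76-85
Authors: LIU Wei, WANG Xu, ZHANG Yujia, LIU Zongtian
Affiliation: School of Computer Engineering and Science, Shanghai University, Shanghai 200444
Funding: National Natural Science Foundation of China (61305053, 61273328)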
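Abstract: An event corpus is one of the foundations and key technologies for research on extracting, representing, reasoning about, and mining event knowledge in the Semantic Web. Taking events as the units of textual knowledge, this paper proposes an automatic annotation method for building large-scale emergency-event corpora. Starting from LTP (Language Technology Platform) analysis, the method uses the sequential pattern mining algorithm PrefixSpan to mine part-of-speech rules and similar patterns for event elements from an existing small-scale corpus, and expands the trigger-word list with Tongyici Cilin (Extended). Combined with a custom event-element dictionary, it annotates text in multiple filtering passes, each pass refining the result of the previous one. In the experiments, the method is compared both with manual annotation and with Stanford CoreNLP NER, and the results are satisfactory.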
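The abstract does not describe the mining setup in detail. As a minimal sketch, assuming each training example is the POS-tag sequence of one annotated event element, textbook PrefixSpan over single-item sequences looks like the code below. The tag names (`ns` place name, `nd` direction noun, `n` noun, `v` verb) follow LTP-style naming, but the toy data, the function name `prefixspan`, and the `min_support` threshold are illustrative assumptions, not the authors' code.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Pattern = Tuple[str, ...]


def prefixspan(db: List[List[str]], min_support: int) -> List[Tuple[Pattern, int]]:
    """Mine frequent sequential patterns (one item per position) from db.

    Support is the number of sequences that contain the pattern as a
    (not necessarily contiguous) subsequence.
    """
    results: List[Tuple[Pattern, int]] = []

    def mine(prefix: Pattern, projected: List[Tuple[int, int]]) -> None:
        # projected holds (sequence index, start offset) of each suffix
        # left over after matching `prefix`.
        occurs: Dict[str, Set[int]] = defaultdict(set)
        for sid, start in projected:
            for item in db[sid][start:]:
                occurs[item].add(sid)
        for item in sorted(occurs):
            support = len(occurs[item])
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results.append((pattern, support))
            # Project: keep only the suffix after the first occurrence of item.
            next_projected = []
            for sid, start in projected:
                seq = db[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        next_projected.append((sid, pos + 1))
                        break
            mine(pattern, next_projected)

    mine((), [(sid, 0) for sid in range(len(db))])
    return results


# Toy POS-tag sequences for location elements (invented for illustration).
location_pos = [
    ["ns", "n", "nd"],
    ["ns", "nd"],
    ["ns", "n", "n", "nd"],
    ["v", "ns"],
]

for pattern, support in prefixspan(location_pos, min_support=3):
    print(" ".join(pattern), support)
# prints:
# nd 3
# ns 4
# ns nd 3
```

With a threshold of 3, the frequent pattern `ns nd` ("place name, later followed by a direction noun") is the kind of POS rule that could then be matched against new LTP output; raising or lowering `min_support` trades rule coverage against noise.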

Keywords: emergencies; corpus; automatic annotation

An Automatic-Annotation Method for Emergency Text Corpus
LIU Wei, WANG Xu, ZHANG Yujia, LIU Zongtian. An Automatic-Annotation Method for Emergency Text Corpus[J]. Journal of Chinese Information Processing, 2017, 31(2): 76-85
Authors: LIU Wei, WANG Xu, ZHANG Yujia, LIU Zongtian
Affiliation: School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
Abstract: An event-based text corpus is the foundation for research on the detection, representation, reasoning and exploitation of events in the Semantic Web. This paper proposes an automatic annotation method for event-based texts in order to construct a large-scale emergency news corpus. First, it presents an event structure model that serves as the event-based knowledge unit. Second, on the basis of text processing by LTP, it applies PrefixSpan to mine rules for event elements from an available small-scale corpus. Third, it expands the denoters (event trigger words) with Tongyici Cilin (Extended) and combines them with a customized dictionary of event elements. In the experiments, the automatic annotation method is compared with manual tagging and with Stanford CoreNLP NER, showing that it can effectively improve the efficiency of event-based text annotation.
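The abstract does not say which Cilin entries are used for expansion. A minimal sketch, assuming only strict-synonym groups are used and that a word joins the denoter list when it shares a group with a seed trigger word, is shown below. It also assumes the usual Cilin (Extended) line layout: an 8-character code whose last character is `=` for synonyms, `#` for related words of the same category, or `@` for a single-word entry. The functions `parse_cilin` and `expand_denoters` and the sample lines are hypothetical, and the group codes are placeholders rather than real Cilin codes.

```python
from typing import Dict, Iterable, List, Set


def parse_cilin(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each synonym-group code ('=' flag) to its member words."""
    groups: Dict[str, List[str]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        code, words = parts[0], parts[1:]
        # Assumed layout: last code char is '=' (synonyms), '#' (related)
        # or '@' (single word); only '=' groups are kept here.
        if code.endswith("="):
            groups[code] = words
    return groups


def expand_denoters(seeds: Iterable[str], groups: Dict[str, List[str]]) -> Set[str]:
    """Add every word that shares a synonym group with a seed trigger word."""
    seed_set = set(seeds)
    expanded = set(seed_set)
    for words in groups.values():
        # Match against the original seeds only, so expansion does not chain.
        if seed_set.intersection(words):
            expanded.update(words)
    return expanded


# Hypothetical entries: the codes are placeholders, not real Cilin codes.
CILIN_SAMPLE = [
    "Zz01A01= 爆炸 爆裂 炸",
    "Zz01A02# 爆炸 引爆",
    "Zz02B01= 起火 着火 失火",
    "Zz03C01@ 塌方",
]

denoters = expand_denoters(["爆炸", "起火"], parse_cilin(CILIN_SAMPLE))
print(sorted(denoters))
```

Keeping only `=` groups favors precision: the `#` group would also pull in 引爆 ("detonate"), a related but not synonymous word, which raises recall at the cost of looser trigger matches.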
Keywords: emergency events; corpus; automatic annotation