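Extraction and Labelling of Knowledge-Entity Types for Professional Literature
Cite this article: WEN Wen, WU Sijie, CAI Ruichu, HAO Zhifeng. 面向专业文献知识实体类型的抽取和标注[J]. 中文信息学报 (Journal of Chinese Information Processing), 2018, 32(1): 102-115.
Authors: WEN Wen, WU Sijie, CAI Ruichu, HAO Zhifeng
Affiliations: 1. Computer School, Guangdong University of Technology, Guangzhou, Guangdong 510006; 2. Foshan University, Foshan, Guangdong 528000
Funding: National Natural Science Foundation of China (61202269); Doctoral Program Foundation project (20134420110010)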
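Abstract: Type labelling of knowledge entities is an important problem in the structured management of professional literature and in mining its knowledge lineage. However, because knowledge entities are highly specialized and varied in type, traditional entity-extraction methods do not label their types well. To address this, the paper identifies and summarizes characteristics specific to knowledge-entity types from the data. Based on these characteristics, it first proposes a heuristic-rule-based type extraction method that labels the types of some knowledge entities, and then uses a multi-label weighted label propagation method to label the types of all knowledge entities. Compared with traditional methods, this approach obtains the most likely type labels from the data and produces effective knowledge-entity type labels without manual annotation. Experimental results show that the proposed method is flexible and better suited to type labelling of knowledge entities in professional literature.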

Keywords: type extraction; type labelling; knowledge entity; multi-label weighting; label propagation

Type Extraction and Labelling of Knowledge Entities in the Field of Professional Literature
WEN Wen,WU Sijie,CAI Ruichu,HAO Zhifeng.Type Extraction and Labelling of Knowledge Entities in the Field of Professional Literature[J].Journal of Chinese Information Processing,2018,32(1):102-115.
Authors:WEN Wen  WU Sijie  CAI Ruichu  HAO Zhifeng
Affiliation:1. Computer School, Guangdong University of Technology, Guangzhou, Guangdong 510006, China; 2. Foshan University, Foshan, Guangdong 528000, China
Abstract: Knowledge-entity type labeling is important for the structured management of literature data. However, because knowledge entities are highly specialized and diverse in type, traditional entity-extraction and labeling methods perform poorly on literature data. To address this problem, we summarize several characteristics of knowledge-entity types by exploring literature data. Based on these characteristics, we propose a method that combines an unsupervised step with a semi-supervised step: heuristic rules first label the types of some entities, and multi-label weighted label propagation (LPA) then labels the remaining entities. The method extracts candidate labels from the data and labels knowledge entities without manual annotation. Experimental results demonstrate that the proposed method is flexible and better suited to literature data.
Keywords:type extraction  type labelling  knowledge entity  multi-label weighting  label propagation  
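
The abstract names two stages (heuristic rules that label some entities, then multi-label weighted label propagation for the rest) without giving their details. The following is only a minimal sketch of that general idea, not the authors' algorithm. Everything in it is an assumption made for illustration: the `TYPE_WORDS` vocabulary, the "trailing head word" rule in `seed_type`, the edge weights, and the clamped weighted-average update in `propagate`.

```python
from collections import defaultdict

# Hypothetical type vocabulary (assumption); the paper derives types from data.
TYPE_WORDS = ("algorithm", "model", "method")


def seed_type(entity):
    """Toy heuristic rule: treat a known trailing head word as the type."""
    words = entity.lower().split()
    if words and words[-1] in TYPE_WORDS:
        return words[-1]
    return None


def propagate(edges, seeds, iterations=20):
    """Generic weighted multi-label propagation on an undirected graph.

    edges: dict mapping (u, v) -> positive edge weight
    seeds: dict mapping node -> {label: weight}; seed nodes stay clamped
    Returns a dict mapping node -> {label: normalized weight}.
    """
    neighbors = defaultdict(dict)
    for (u, v), w in edges.items():
        neighbors[u][v] = w
        neighbors[v][u] = w

    nodes = set(neighbors) | set(seeds)
    scores = {n: dict(seeds.get(n, {})) for n in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            if node in seeds:
                # Seed labels from the rule stage are never overwritten.
                updated[node] = dict(seeds[node])
                continue
            # Sum each neighbor's label weights, scaled by the edge weight.
            acc = defaultdict(float)
            for nb, w in neighbors[node].items():
                for label, s in scores[nb].items():
                    acc[label] += w * s
            total = sum(acc.values())
            updated[node] = (
                {label: s / total for label, s in acc.items()} if total else {}
            )
        scores = updated
    return scores


# Made-up entities and co-occurrence weights, for illustration only.
entities = ["label propagation algorithm", "topic model", "LPA", "LDA"]
seeds = {e: {seed_type(e): 1.0} for e in entities if seed_type(e)}
edges = {
    ("LPA", "label propagation algorithm"): 2.0,
    ("LPA", "topic model"): 0.5,
    ("LDA", "topic model"): 1.0,
}
result = propagate(edges, seeds)
for entity in ("LPA", "LDA"):
    print(entity, result[entity])
```

In this toy graph the unlabeled entity `LPA` keeps two weighted labels, with "algorithm" dominating because its edge to the seeded entity is heavier, which is the sense in which the propagation is multi-label and weighted. How the paper actually builds the graph and weights the labels is not stated in the abstract.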