
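Acquisition and Organization of Domain Entity Hyponymy Relations Combining Word Embeddings and Bootstrapping
Cite this article: MA Xiao-jun, GUO Jian-yi, XIAN Yan-tuan, MAO Cun-li, YAN Xin, YU Zheng-tao. Acquisition and Organization of Domain Entity Hyponymy Relations Combining Word Embeddings and Bootstrapping[J]. Computer Science, 2018, 45(1): 67-72.
Authors: MA Xiao-jun  GUO Jian-yi  XIAN Yan-tuan  MAO Cun-li  YAN Xin  YU Zheng-tao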
Affiliation: School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500 (all authors); Key Laboratory of Intelligent Information Processing, Kunming University of Science and Technology, Kunming 650500 (all authors except MA Xiao-jun)
Funding: This work was supported by the National Natural Science Foundation of China (61562052, 61363044, 61472168)
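Abstract: Entity hyponymy is an indispensable semantic relation for building domain knowledge graphs, yet most traditional hyponymy extraction methods do not consider how the extracted relations are organized. This paper proposes a method that combines word embeddings with Bootstrapping to acquire and organize domain entity hyponymy relations. First, a seed corpus is selected from the tourism domain. Then, the hyponymy patterns contained in the seed set are clustered using a word-embedding-based similarity measure; high-confidence patterns are selected and used to identify hyponymy relations in the unlabeled corpus, which yields candidate relation instances. High-confidence instances are added to the seed set for the next iteration, and the process repeats until all relation instances have been obtained. Finally, based on the vector offsets of domain entity hyponymy pairs and the characteristics of domain entity hierarchies, a mapping (projection) learning method is used to organize the entities into a hierarchy. Experimental results show that the proposed method improves the F-value by nearly 10% compared with traditional methods.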

Keywords: Hyponymy relation  Relation extraction  Bootstrapping method  Word embedding  Projection learning  Hierarchical relation organization
Received: 2017/3/3
Revised: 2017/6/16
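
The iterative pattern-and-instance loop described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the corpus is pre-segmented into hypothetical `(left entity, connecting text, right entity)` triples, a pattern's confidence is approximated by how many distinct known pairs it links (`min_support`), and the word-embedding-based pattern clustering step is omitted entirely. It is not the authors' implementation.

```python
from collections import Counter

# Hypothetical toy corpus: (left entity, connecting text, right entity).
CORPUS = [
    ("West Lake", "is a", "scenic spot"),
    ("Dali", "is a", "city"),
    ("Kunming", "is a", "city"),
    ("Lijiang", "is a", "city"),
    ("Kunming", "belongs to the category", "city"),
    ("Lijiang", "belongs to the category", "city"),
    ("Stone Forest", "belongs to the category", "scenic spot"),
    ("Kunming", "is located near", "Dianchi Lake"),  # not a hyponymy relation
]

# Hypothetical seed (hyponym, hypernym) pairs.
SEEDS = {("West Lake", "scenic spot"), ("Dali", "city")}


def bootstrap(corpus, seeds, min_support=2, max_rounds=10):
    """Grow a set of (hyponym, hypernym) pairs from seed pairs."""
    known = set(seeds)
    for _ in range(max_rounds):
        # Count how many distinct known pairs each connecting text links.
        support = Counter(
            mid for (left, mid, right) in set(corpus) if (left, right) in known
        )
        # Keep only patterns with enough support (a crude confidence filter).
        patterns = {p for p, n in support.items() if n >= min_support}
        # Apply the accepted patterns to the whole corpus.
        found = {(left, right) for (left, mid, right) in corpus if mid in patterns}
        new_pairs = found - known
        if not new_pairs:
            break  # no new instances: stop iterating
        known |= new_pairs
    return known


pairs = bootstrap(CORPUS, SEEDS)
```

In this toy run the pattern "is a" is accepted in the first round, the pairs it adds make "belongs to the category" acceptable in the second round, and "is located near" never gains support, so the unrelated pair is not extracted.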

Entity Hyponymy Acquisition and Organization Combining Word Embedding and Bootstrapping in Special Domain
MA Xiao-jun,GUO Jian-yi,XIAN Yan-tuan,MAO Cun-li,YAN Xin and YU Zheng-tao.Entity Hyponymy Acquisition and Organization Combining Word Embedding and Bootstrapping in Special Domain[J].Computer Science,2018,45(1):67-72.
Authors:MA Xiao-jun  GUO Jian-yi  XIAN Yan-tuan  MAO Cun-li  YAN Xin and YU Zheng-tao
Affiliation: School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China (all authors); The Key Laboratory of Intelligent Information Processing, Kunming University of Science and Technology, Kunming 650500, China (all authors except MA Xiao-jun)
Abstract: Entity hyponymy is an important semantic relation for building domain knowledge graphs, but traditional hyponymy extraction methods do not consider how hierarchical relations are organized. This paper proposes a method for extracting and organizing entity hyponymy in a specific domain that combines word embeddings with bootstrapping. First, a tourism corpus is selected as the seed corpus. The hyponymy patterns in the seed corpus are then clustered by word-embedding similarity, and high-confidence patterns are selected and used to identify hyponymy relations in the unlabeled corpus. High-confidence relation instances obtained in this way are added to the seed set, and the next iteration is performed until all relation instances have been obtained. Finally, a mapping learning method is applied to organize the domain entities into a hierarchy, based on the characteristics of domain entity hierarchical relations and the vector offsets of entity hyponymy pairs. Experimental results show that the proposed method improves the F-value by nearly 10% compared with the traditional method.
Keywords:Hyponymy relation  Relation extraction  Bootstrapping method  Word embedding  Projection learning  Hierarchical relation organization
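
The abstract's use of vector offsets between hyponymy pairs to organize entities into a hierarchy can be illustrated with a deliberately simplified sketch. It assumes a single mean offset (hypernym vector minus hyponym vector) stands in for the learned mapping, which the abstract does not specify; the 2-D vectors and entity names are made up for illustration, and the function names are hypothetical.

```python
import math

# Made-up 2-D "embeddings" for illustration only.
VECTORS = {
    "Kunming": [0.0, 0.0],
    "Dali": [0.2, 0.0],
    "city": [0.0, 1.0],
    "West Lake": [5.0, 0.0],
    "scenic spot": [5.0, 1.0],
}

# Known (hyponym, hypernym) pairs used to estimate the offset.
TRAIN_PAIRS = [("Kunming", "city"), ("West Lake", "scenic spot")]


def mean_offset(pairs, vectors):
    """Average of (hypernym vector - hyponym vector) over known pairs."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for hypo, hyper in pairs:
        for i in range(dim):
            total[i] += vectors[hyper][i] - vectors[hypo][i]
    return [t / len(pairs) for t in total]


def offset_distance(hypo, hyper, offset, vectors):
    """Euclidean distance between (hyponym vector + offset) and the hypernym vector."""
    return math.sqrt(
        sum(
            (vectors[hypo][i] + offset[i] - vectors[hyper][i]) ** 2
            for i in range(len(offset))
        )
    )


def choose_parent(entity, candidates, offset, vectors):
    """Pick the candidate hypernym closest to the shifted entity vector."""
    return min(candidates, key=lambda c: offset_distance(entity, c, offset, vectors))


offset = mean_offset(TRAIN_PAIRS, VECTORS)
parent = choose_parent("Dali", ["city", "scenic spot"], offset, VECTORS)
```

A single shared offset is the crudest possible mapping; a learned projection matrix, possibly one per cluster of pairs, would replace `mean_offset` in a more faithful version.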