
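Chinese entity relation extraction based on entity semantic similarity
Cite this article: XU Qing, DUAN Liguo, LI Aiping, YIN Guimei. Chinese entity relation extraction based on entity semantic similarity[J]. Journal of Shandong University (Engineering Science), 2015, 45(6): 7-15.
Authors: XU Qing, DUAN Liguo, LI Aiping, YIN Guimei
Affiliation: 1. College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030024, Shanxi, China; 2. State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, Hubei, China; 3. Department of Computer Science and Technology, Taiyuan Normal University, Taiyuan 030600, Shanxi, China
Funding: Open Project of the State Key Laboratory of Software Engineering, Wuhan University (SKLSE2012-09-30); Natural Science Foundation of Shanxi Province (2013011015-2); Shanxi Province Basic Conditions Platform Project (2014091004-0104)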
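Abstract: To explore the role of semantic similarity in Chinese entity relation extraction, two new features are proposed. The first is a "TongYiCi Cilin" code tree, built from the five-level codes of the entity words in "TongYiCi Cilin". The second is an entity semantic similarity tree, built from averages: for each relation category, the similarity between an entity word in the relation instance and every entity word of that category is computed, and these values are averaged. The effect on extraction performance of these two features, together with two existing ones (the "TongYiCi Cilin" code and entity type information), is then investigated. Among single features, the entity type feature performs best, with F values of 84.9 on subtypes and 83.2 on types. Among feature combinations, entity type plus the "TongYiCi Cilin" code tree performs best, raising the F value by 2.5 over the entity type feature alone on both subtypes and types, whereas combinations of three features lower performance instead of improving it. The results show that the "TongYiCi Cilin" code tree is an effective complement to entity type information, but too many features introduce information redundancy and degrade extraction performance.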
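The abstract only outlines how the two new features are built. The Python sketch below illustrates one possible reading, under assumptions that are not taken from the paper: codes follow the eight-character extended Cilin format (for example `Aa01A01=`, whose last character is a flag), the code tree is a plain bracketed path over the five levels, and word similarity is a simplified shared-prefix score (shared leading levels divided by 5) rather than whatever Cilin similarity measure the authors actually used. All function names, category labels, and example codes are hypothetical.

```python
from statistics import mean

# Hypothetical sketch: the level slicing, tree layout and similarity score below are
# illustrative assumptions, not the authors' published method.

# Character spans of the five levels in an extended Cilin code such as "Aa01A01="
# (the 8th character is a flag and is ignored here).
LEVEL_SPANS = [(0, 1), (1, 2), (2, 4), (4, 5), (5, 7)]


def cilin_levels(code: str) -> list[str]:
    """Split a Cilin code into its five level labels, coarsest first."""
    if len(code) < 7:
        raise ValueError(f"unexpected Cilin code: {code!r}")
    return [code[start:end] for start, end in LEVEL_SPANS]


def code_tree(code: str) -> str:
    """Render the five levels as a bracketed path: one tree node per level."""
    levels = cilin_levels(code)
    tree = f"({levels[-1]})"
    for label in reversed(levels[:-1]):
        tree = f"({label} {tree})"
    return tree


def prefix_similarity(code1: str, code2: str) -> float:
    """Simplified similarity (assumption): leading levels shared, divided by 5."""
    shared = 0
    for a, b in zip(cilin_levels(code1), cilin_levels(code2)):
        if a != b:
            break
        shared += 1
    return shared / len(LEVEL_SPANS)


def category_similarity(entity_code: str, categories: dict[str, list[str]]) -> dict[str, float]:
    """Average similarity between one entity word and all entity words of each category."""
    return {
        name: mean(prefix_similarity(entity_code, other) for other in codes)
        for name, codes in categories.items()
        if codes  # skip empty categories; mean() would raise on them
    }


def similarity_tree(sims: dict[str, float]) -> str:
    """Render the per-category averages as a flat bracketed tree."""
    nodes = " ".join(f"({name} {value:.2f})" for name, value in sims.items())
    return f"(SIM {nodes})"


# Hypothetical category labels and Cilin codes, for illustration only.
EXAMPLE_CATEGORIES = {
    "category_A": ["Aa01A02=", "Aa01B01="],
    "category_B": ["Di02A01=", "Aa02A01="],
}

if __name__ == "__main__":
    print(code_tree("Aa01A01="))  # (A (a (01 (A (01)))))
    sims = category_similarity("Aa01A01=", EXAMPLE_CATEGORIES)
    print(similarity_tree(sims))  # (SIM (category_A 0.70) (category_B 0.20))
```

In a tree-kernel setting, bracketed strings like these would presumably be attached as extra nodes to the relation instance's syntax tree so that the kernel can compare them; the abstract does not say exactly how the authors integrated them.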

Keywords: syntax tree  semantic similarity  tree kernel  TongYiCi Cilin  Chinese entity relation extraction
Received: 2015-05-18

This article is indexed in Wanfang Data and other databases.