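Automatic extraction of domain entity relations based on self-expansion and maximum entropy
Citation:LEI Chun-ya, GUO Jian-yi, YU Zheng-tao, MAO Cun-li, ZHANG Shao-min, HUANG Pu. Automatic extraction of domain entity relations based on self-expansion and maximum entropy[J]. Journal of Shandong University (Engineering Science), 2010, 40(5): 141-145.
Authors:LEI Chun-ya (雷春雅)  GUO Jian-yi (郭剑毅)  YU Zheng-tao (余正涛)  MAO Cun-li (毛存礼)  ZHANG Shao-min (张少敏)  HUANG Pu (黄甫)
Affiliation:1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650051, Yunnan, China; 2. Institute of Intelligent Information Processing, Key Laboratory of Computer Technology Application of Yunnan Province, Kunming 650051, Yunnan, China
Funding:Supported by the National Natural Science Foundation of China, the Key Program of the Natural Science Foundation of Yunnan Province, and the Yunnan Province Reserve Talent Program for Young and Middle-aged Academic and Technical Leaders
Abstract:Automatic acquisition of entity relations is one of the hard problems in information extraction. This paper proposes a method that combines a self-expansion algorithm with the maximum entropy machine learning algorithm, and applies it to automatic entity relation extraction in the tourism domain. First, the self-expansion algorithm automatically acquires semantic words that express the major relation type holding between an entity pair; these words are added as a feature to the feature set of the maximum entropy algorithm, and a threshold is set so that the training corpus is labeled automatically. Then the maximum entropy algorithm learns from the training corpus to build a classifier for entity relation extraction, so that entity relations are acquired automatically. Experiments on a collected corpus of 600 tourism-domain documents gave good results for four major types of entity relation; the F values for the geographical location relation and the time-season relation were 82.56% and 81.17%, respectively. The results show that, with little manual intervention, adding the semantic words between entity pairs effectively improves extraction performance.

Keywords:entity relation extraction  maximum entropy  self-expansion  feature
Received:2010-03-15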

Domain of automatic entity relation extraction based on seed self-expansion and maximum entropy machine learning
LEI Chun-ya, GUO Jian-yi, YU Zheng-tao, MAO Cun-li, ZHANG Shao-min, HUANG Pu. Domain of automatic entity relation extraction based on seed self-expansion and maximum entropy machine learning[J]. Journal of Shandong University (Engineering Science), 2010, 40(5): 141-145.
Authors:LEI Chun-ya  GUO Jian-yi  YU Zheng-tao  MAO Cun-li  ZHANG Shao-min  HUANG Pu
Affiliation:1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650051, China; 2. Institute of Intelligent Information Processing, Computer Technology Application Key Laboratory of Yunnan Province, Kunming 650051, China
Abstract:Entity relation extraction is one of the difficult problems in information extraction. This paper proposes a method that combines seed self-expansion with maximum entropy machine learning to extract entity relations in the field of tourism. First, seed self-expansion is used to obtain the semantic words that express the major relation type between entity pairs; these semantic words are added as a feature to the feature set, and a threshold is set to tag the training corpus automatically. Then the maximum entropy algorithm learns from the tagged corpus to build a classifier for entity relation extraction. Experiments on a manually collected corpus of 600 documents gave good results for four major types of entity relation: the F values for the geographical location and date-season relations reached 82.56% and 81.17%, respectively. The results show that, with little manual participation, adding the semantic words of entity pairs effectively improves classifier performance.
Keywords:entity relation extraction  maximum entropy  self-expansion  feature
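
The pipeline summarized in the abstract (expand a seed lexicon, label the corpus by threshold, then train a maximum entropy classifier) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the co-occurrence scoring rule in expand_seeds, the thresholds, the HAS_SEMANTIC_WORD feature name, the toy contexts, and the plain gradient-ascent trainer (a maximum entropy classifier with binary features is equivalent to multinomial logistic regression). The paper's actual features, thresholds and training procedure may differ.

```python
import math
from collections import Counter

def expand_seeds(contexts, seeds, threshold=0.5, min_count=2, max_rounds=10):
    """Grow a seed lexicon from the words found between entity pairs.

    Hypothetical scoring rule: a candidate joins the lexicon when at least
    `threshold` of the contexts containing it also contain a known word.
    """
    lexicon = set(seeds)
    for _ in range(max_rounds):
        total, with_known = Counter(), Counter()
        for words in contexts:
            uniq = set(words)
            hit = bool(uniq & lexicon)
            for w in uniq - lexicon:
                total[w] += 1
                if hit:
                    with_known[w] += 1
        new = {w for w, n in total.items()
               if n >= min_count and with_known[w] / n >= threshold}
        if not new:          # nothing passed the threshold: stop expanding
            break
        lexicon |= new
    return lexicon

def auto_label(words, lexicon, relation, threshold=0.5):
    """Tag a context with `relation` when enough of its words are in the lexicon."""
    uniq = set(words)
    if not uniq:
        return "other"
    share = len(uniq & lexicon) / len(uniq)
    return relation if share >= threshold else "other"

def features(words, lexicon):
    """Context words plus one flag marking the presence of a lexicon word."""
    feats = set(words)
    if feats & lexicon:
        feats.add("HAS_SEMANTIC_WORD")
    return feats

def predict_proba(weights, feats):
    """Softmax over per-class sums of the weights of the active features."""
    scores = {c: sum(w.get(f, 0.0) for f in feats) for c, w in weights.items()}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}

def train_maxent(samples, labels, epochs=100, lr=0.5):
    """Stochastic gradient ascent on the conditional log-likelihood."""
    weights = {c: {} for c in set(labels)}
    for _ in range(epochs):
        for feats, gold in zip(samples, labels):
            probs = predict_proba(weights, feats)
            for c, w in weights.items():
                grad = (1.0 if c == gold else 0.0) - probs[c]
                for f in feats:
                    w[f] = w.get(f, 0.0) + lr * grad
    return weights

# Toy contexts: the words between two entities (invented data).
contexts = [
    ["located", "north"], ["located", "north", "bank"],
    ["situated", "north"], ["situated", "north", "bank"],
    ["blooms", "spring"], ["blooms", "summer"],
]
lexicon = expand_seeds(contexts, {"located"})
labels = [auto_label(c, lexicon, "location") for c in contexts]
model = train_maxent([features(c, lexicon) for c in contexts], labels)
probs = predict_proba(model, features(["north"], lexicon))
prediction = max(probs, key=probs.get)
```

In this toy run the single seed "located" pulls in the words that co-occur with it, the first four contexts are tagged "location" without manual annotation, and the classifier is trained on those automatic labels. The lexicon flag is the only place where the expanded words enter the feature set, which mirrors the role the abstract gives them.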
This article is indexed in CNKI, Wanfang Data (万方数据), and other databases.