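Personal title and career attributes extraction based on distant supervision and pattern matching
Cite this article: YU Dong, LIU Chunhua, TIAN Yue. Personal title and career attributes extraction based on distant supervision and pattern matching[J]. Journal of Computer Applications, 2016, 36(2): 455-459. DOI: 10.11772/j.issn.1001-9081.2016.02.0455
Authors: YU Dong  LIU Chunhua  TIAN Yue
Affiliations: 1. Institute of Big Data and Language Education, Beijing Language and Culture University, Beijing 100083; 2. School of Information Science, Beijing Language and Culture University, Beijing 100083
Funding: National Natural Science Foundation of China (61300081); Fundamental Research Funds for the Central Universities (Beijing Language and Culture University research project: 15YJ030006).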
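Abstract: To address the problem of extracting the title and career attributes of a specified person from unstructured text, an attribute extraction method based on distant supervision and pattern matching is proposed. The method describes a person's title and career features at two levels, string patterns and dependency patterns, and divides the problem into two stages. In the first stage, distant supervision knowledge and manually annotated knowledge are used to mine a high-coverage pattern base, which is used to discover title and career attributes and to extract a candidate set. In the second stage, the textual adjacency relations among attributes such as titles and organizations, together with the dependency relations between the specified person and the candidate attributes, are used to design filtering rules that screen the candidates and achieve high-precision attribute extraction. Experimental results show that the proposed method achieves an F-score of 55.37% on the CLP2014-PAE test set, significantly higher than both the best result in the evaluation (F-score 34.38%) and a supervised sequence labeling method based on Conditional Random Fields (CRF) (F-score 43.79%), indicating that the method can mine and extract title and career attributes from unstructured documents with high coverage.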

Keywords: personal attribute extraction  title and career information  distant supervision  pattern matching  rule filtering
Received: 2015-09-15
Revised: 2015-09-22

Personal title and career attributes extraction based on distant supervision and pattern matching
YU Dong,LIU Chunhua,TIAN Yue. Personal title and career attributes extraction based on distant supervision and pattern matching[J]. Journal of Computer Applications, 2016, 36(2): 455-459. DOI: 10.11772/j.issn.1001-9081.2016.02.0455
Authors: YU Dong  LIU Chunhua  TIAN Yue
Affiliation: 1. Institute of Big Data and Language Education, Beijing Language and Culture University, Beijing 100083, China; 2. School of Information Science, Beijing Language and Culture University, Beijing 100083, China
Abstract: To address the problem of extracting title and career attributes for a specific person from unstructured text, a method based on distant supervision and pattern matching was proposed. Features of personal attributes were described at two levels: string patterns and dependency patterns. Title and career attributes were extracted in two stages. First, both distant supervision knowledge and human-annotated knowledge were used to build a high-coverage pattern base for discovering attributes and extracting a candidate attribute set. Then, the literal connections among multiple attributes and the dependency relations between the specific person and the candidate attributes were used to design a set of filtering rules for screening the candidates. Tests on the CLP-2014 PAE shared task show that the F-score of the proposed method reaches 55.37%, which is significantly higher than the best result of the evaluation (F-score 34.38%) and also higher than a supervised sequence tagging method based on Conditional Random Fields (CRF), which reaches an F-score of 43.79%. The experimental results show that, with the filtering step applied, the proposed method can mine and extract title and career attributes from unstructured documents with high coverage.
Keywords: personal attributes extraction   title and career information   distant supervision   pattern matching   rule filtering
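
The two-stage idea in the abstract (high-coverage candidate extraction followed by rule-based filtering) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the title lexicon, the function names, and the same-clause co-occurrence rule are all hypothetical. In particular, the clause rule is only a crude stand-in for the dependency-based filter, and the hand-written lexicon stands in for a pattern base mined with distant supervision.

```python
import re

# Hypothetical title lexicon; the paper mines its pattern base from
# distant supervision and manual annotation instead of a fixed list.
TITLE_LEXICON = ["professor", "director", "chairman"]


def extract_candidates(text, titles=TITLE_LEXICON):
    """Stage 1: high-coverage string matching.

    Returns (title, start_offset) pairs for every lexicon hit in the text.
    """
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, titles)) + r")\b", re.IGNORECASE
    )
    return [(m.group(1).lower(), m.start()) for m in pattern.finditer(text)]


def clause_spans(text):
    """Split the text into (start, end) clause spans on ',', ';' and '.'."""
    spans, start = [], 0
    for m in re.finditer(r"[,;.]", text):
        spans.append((start, m.start()))
        start = m.end()
    if start < len(text):
        spans.append((start, len(text)))
    return spans


def filter_candidates(text, person, candidates):
    """Stage 2: keep a candidate only if the person is in the same clause.

    Assumption: clause co-occurrence approximates the person-attribute
    dependency relation used by the real filtering rules.
    """
    kept = []
    for title, offset in candidates:
        for lo, hi in clause_spans(text):
            if lo <= offset < hi and person in text[lo:hi]:
                kept.append(title)
    return kept


def extract_titles(text, person):
    """Run both stages for one target person."""
    return filter_candidates(text, person, extract_candidates(text))


if __name__ == "__main__":
    sample = (
        "Zhang Wei is a professor at Example University, "
        "and Li Na is the director of the lab."
    )
    print(extract_candidates(sample))
    print(extract_titles(sample, "Zhang Wei"))
```

Stage 1 deliberately over-generates (both titles in the sample sentence are returned), and stage 2 trades that coverage for precision by discarding candidates not tied to the target person.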