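Relation Extraction Method Combining Clause Level Distant Supervision and Semi-supervised Ensemble Learning
Cite this article: YU Xiaokang, CHEN Ling, GUO Jing, CAI Yaya, WU Yong, WANG Jingchang. Relation Extraction Method Combining Clause Level Distant Supervision and Semi-supervised Ensemble Learning[J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2017, 30(1): 54-63.
Authors: YU Xiaokang  CHEN Ling  GUO Jing  CAI Yaya  WU Yong  WANG Jingchang
Affiliation: 1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027
2. Zhejiang Hongcheng Computer System Co., Ltd., Hangzhou 310053
Funding: Supported by the National Natural Science Foundation of China (No. 61332017, 60703040), the Major Science and Technology Projects of Zhejiang Province (No. 2015C33002, 2013C01046, 2011C13042), and the China Knowledge Centre for Engineering Sciences and Technology project (No. CKCEST-2014-1-5)
Abstract: Traditional relation extraction methods based on distant supervision suffer from noise and make insufficient use of negative examples. To address both problems, a relation extraction method combining clause-level distant supervision and semi-supervised ensemble learning is proposed. First, a relation instance set is built through distant supervision, and a denoising algorithm based on clause identification removes noise from that set. Next, the lexical features of the relation instances are extracted and converted into distributed representation vectors to build a feature dataset. Finally, all positive examples and some of the negative examples in the feature dataset form the labeled dataset, the remaining negative examples form the unlabeled dataset, and a relation classifier is trained with an improved semi-supervised ensemble learning algorithm. Experiments show that, compared with baseline methods, the proposed method achieves higher classification accuracy and recall.

Keywords: relation extraction  distant supervision  clause identification  denoising  semi-supervised ensemble learning
Received: 2016-09-19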

Relation Extraction Method Combining Clause Level Distant Supervision and Semi-supervised Ensemble Learning
YU Xiaokang, CHEN Ling, GUO Jing, CAI Yaya, WU Yong, WANG Jingchang. Relation Extraction Method Combining Clause Level Distant Supervision and Semi-supervised Ensemble Learning[J]. Pattern Recognition and Artificial Intelligence, 2017, 30(1): 54-63.
Authors:YU Xiaokang  CHEN Ling  GUO Jing  CAI Yaya  WU Yong  WANG Jingchang
Affiliation:1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027
2. Zhejiang Hongcheng Computer System Co., Ltd., Hangzhou 310053
Abstract: To address noisy training data and the insufficient use of negative instances in traditional distant supervision relation extraction methods, a relation extraction method combining clause-level distant supervision and semi-supervised ensemble learning is proposed. First, a relation instance set is generated by distant supervision. Second, a denoising algorithm based on clause identification reduces the wrongly labeled data in the relation instance set. Third, lexical features are extracted from the relation instances and transformed into distributed vectors to build a feature dataset. Finally, all positive data and part of the negative data in the feature dataset form the labeled dataset, and the remaining negative data form the unlabeled dataset; a relation classifier is then trained with an improved semi-supervised ensemble learning algorithm. Experiments show that, compared with baseline methods, the proposed method achieves higher classification accuracy and recall.
Keywords: Relation Extraction  Distant Supervision  Clause Identification  Noise Reduction  Semi-supervised Ensemble Learning
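
The abstract gives the pipeline only in outline, so the following is a rough Python sketch of two of its steps, not the authors' implementation. It assumes that clause-level denoising keeps a distantly labeled sentence only when both entities fall in the same clause, and it approximates clause identification by splitting on commas and semicolons. The toy knowledge base `KB`, the function names, the `"NA"` negative label, and the `labeled_negative_ratio` parameter are all hypothetical.

```python
import random
import re

# Hypothetical toy knowledge base: (head entity, tail entity) -> relation.
KB = {("Hangzhou", "Zhejiang"): "located_in"}


def split_clauses(sentence):
    """Crude stand-in for clause identification: split on commas and semicolons."""
    return [c.strip() for c in re.split(r"[,;]", sentence) if c.strip()]


def label_sentence(sentence, head, tail):
    """Distant supervision with an assumed clause-level filter.

    Returns the KB relation when both entities share a clause, None when the
    pair is in the KB but the entities sit in different clauses (treated as
    likely noise and dropped), and "NA" (negative) when the pair is not in the KB.
    """
    relation = KB.get((head, tail))
    if relation is None:
        return "NA"
    for clause in split_clauses(sentence):
        if head in clause and tail in clause:
            return relation
    return None


def split_dataset(instances, labeled_negative_ratio=0.5, seed=0):
    """Build the labeled and unlabeled sets from (features, label) pairs.

    All positives plus a share of the negatives stay labeled; the remaining
    negatives lose their label and form the unlabeled set.
    """
    positives = [x for x in instances if x[1] != "NA"]
    negatives = [x for x in instances if x[1] == "NA"]
    rng = random.Random(seed)
    rng.shuffle(negatives)
    k = int(len(negatives) * labeled_negative_ratio)
    labeled = positives + negatives[:k]
    unlabeled = [features for features, _ in negatives[k:]]
    return labeled, unlabeled
```

The unlabeled set returned by `split_dataset` is what a semi-supervised ensemble learner would consume alongside the labeled set; the improved training algorithm itself is not described in the abstract and is left out here.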