Weakly Supervised Relation Extraction Based on Tri-training and Noise Filtering
Cite this article: JIA Zhen, YE Zhonglin, YIN Hongfeng, HE Dake. Weakly Supervised Relation Extraction Based on Tri-training and Noise Filtering[J]. Journal of Chinese Information Processing, 2016, 30(4): 142-149
Authors: JIA Zhen  YE Zhonglin  YIN Hongfeng  HE Dake
Affiliation: 1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu, Sichuan 610031, China;
2. DOCOMO Innovations Inc., Palo Alto 94304, USA
Funding: National Natural Science Foundation of China (61170111, 61202043, 61262058)
Abstract: Weakly supervised relation extraction uses entity pairs with known relations to obtain training data from a text collection automatically, which effectively addresses the shortage of training data. However, weakly supervised training data suffers from noise, insufficient features, and class imbalance, which lowers relation extraction performance. This paper proposes NF-Tri-training (Tri-training with Noise Filtering), a weakly supervised relation extraction algorithm. NF-Tri-training uses under-sampling to address sample imbalance, learns new samples iteratively from unlabeled data with Tri-training to improve the generalization ability of the classifiers, and applies a data editing technique to identify and remove mislabeled samples both in the initial training data and among the new samples generated at each iteration. Experiments on a dataset collected from the Hudong encyclopedia (互动百科) show that NF-Tri-training effectively improves the performance of relation classifiers.
Keywords: relation extraction   weakly supervised learning   Tri-training   data editing
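
The three stages named in the abstract (under-sampling, Tri-training, data editing) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the base classifier, the features, or the editing rule, so this sketch assumes a toy nearest-centroid learner over numeric vectors and a Wilson-style k-nearest-neighbour editing rule, and it omits the error-rate checks of the original Tri-training procedure. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def undersample(data, rng):
    """Randomly shrink every class to the size of the smallest class."""
    by_label = defaultdict(list)
    for x, y in data:
        by_label[y].append((x, y))
    n = min(len(v) for v in by_label.values())
    return [s for v in by_label.values() for s in rng.sample(v, n)]

def edit_noise(data, k=3):
    """Assumed editing rule: keep a sample only if its label matches
    the majority label of its k nearest neighbours."""
    if len(data) <= k:
        return list(data)
    kept = []
    for i, (x, y) in enumerate(data):
        others = [s for j, s in enumerate(data) if j != i]
        nearest = sorted(others, key=lambda s: dist2(x, s[0]))[:k]
        vote = Counter(lbl for _, lbl in nearest).most_common(1)[0][0]
        if vote == y:
            kept.append((x, y))
    return kept

def fit_centroids(data):
    """Toy base learner (an assumption): one mean vector per class."""
    sums, counts = {}, Counter()
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(model, x):
    """Return the class whose centroid is closest to x."""
    return min(model, key=lambda y: dist2(model[y], x))

def stratified_bootstrap(data, rng):
    """Resample with replacement inside each class so no class vanishes."""
    by_label = defaultdict(list)
    for s in data:
        by_label[s[1]].append(s)
    return [s for v in by_label.values() for s in rng.choices(v, k=len(v))]

def nf_tri_training(labeled, unlabeled, rounds=3, seed=0):
    """Simplified loop: balance, filter, then let each classifier learn
    from unlabeled samples on which the other two classifiers agree."""
    rng = random.Random(seed)
    base = edit_noise(undersample(labeled, rng))   # filter initial data
    boots = [stratified_bootstrap(base, rng) for _ in range(3)]
    train_sets = list(boots)
    for _ in range(rounds):
        models = [fit_centroids(t) for t in train_sets]
        train_sets = []
        for i in range(3):
            a, b = (m for m in range(3) if m != i)
            pseudo = [(x, predict(models[a], x)) for x in unlabeled
                      if predict(models[a], x) == predict(models[b], x)]
            # filter the newly labeled samples in every iteration
            train_sets.append(edit_noise(boots[i] + pseudo))
    models = [fit_centroids(t) for t in train_sets]
    return lambda x: Counter(predict(m, x) for m in models).most_common(1)[0][0]

# Toy, made-up data: 4 "pos" and 8 "neg" samples (imbalanced).
POS = [((0, 0), "pos"), ((0, 1), "pos"), ((1, 0), "pos"), ((1, 1), "pos")]
NEG = [((x, y), "neg") for x, y in
       [(10, 10), (10, 11), (11, 10), (11, 11), (9, 10), (10, 9), (9, 9), (11, 9)]]
UNLABELED = [(0.5, 0.5), (10.5, 10.5), (1, 2), (9, 11)]
classify = nf_tri_training(POS + NEG, UNLABELED)
```

Here `classify` is a majority vote over the three final classifiers. In a real relation extraction setting the numeric tuples would be replaced by feature vectors extracted from sentences that mention an entity pair, and the base learner by whatever classifier the full paper uses.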