Protein-Protein Interaction Extraction Based on Transfer Learning
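Citation: LI Lishuang, GUO Rui, HUANG Degen, ZHOU Huiwei. Protein-Protein Interaction Extraction Based on Transfer Learning[J]. Journal of Chinese Information Processing, 2016, 30(2): 160-167.
Authors: LI Lishuang, GUO Rui, HUANG Degen, ZHOU Huiwei
Affiliation: School of Computer Science, Dalian University of Technology, Dalian, Liaoning 116023
Funding: National Natural Science Foundation of China (61173101, 61173100, 61272375)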
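Abstract: As an important branch of biomedical information extraction, protein-protein interaction (PPI) extraction is of considerable research significance. Most current work uses statistical machine learning methods, which need large annotated corpora for training. Too little training data lowers the performance of a relation extraction system, yet manual annotation is very costly. This paper applies transfer learning: a large amount of annotated source-domain (out-of-domain) data is used to assist PPI extraction on a small amount of annotated target-domain (in-domain) data. Because data distributions differ across domains, transfer can easily turn negative; the paper therefore adjusts instance weights according to each instance's relative distribution, which avoids negative transfer. In experiments on the public AIMed corpus, both transfer learning methods performed clearly better than the baseline algorithm. When the same methods were applied to the IEPA corpus, the TrAdaboost algorithm suffered negative transfer, whereas the improved DisTrAdaboost algorithm still transferred well.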

Keywords: protein-protein interaction extraction; transfer learning; negative transfer
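The abstract does not spell out either algorithm. To illustrate the instance-reweighting idea behind them, here is a minimal Python sketch of standard TrAdaBoost as published by Dai et al. (ICML 2007). It is not the authors' DisTrAdaboost, whose relative-distribution weighting is not described here. Several choices are assumptions of this sketch: the decision-stump weak learner, the clipping of the target error, the toy numeric features standing in for real PPI feature vectors, and the names `fit_stump` and `tradaboost`.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def fit_stump(xs: List[Vector], ys: List[int], p: List[float]) -> Callable[[Vector], int]:
    """Weighted decision stump (the weak learner is an assumption of this sketch)."""
    best = None  # (weighted error, feature index, threshold, polarity)
    for j in range(len(xs[0])):
        for thr in sorted({x[j] for x in xs}):
            for pol in (1, 0):
                err = 0.0
                for x, y, w in zip(xs, ys, p):
                    pred = int(x[j] > thr) if pol == 1 else int(x[j] <= thr)
                    if pred != y:
                        err += w
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    _, j, thr, pol = best

    def h(x: Vector) -> int:
        return int(x[j] > thr) if pol == 1 else int(x[j] <= thr)

    return h


def tradaboost(src_x, src_y, tgt_x, tgt_y, n_iter: int = 10):
    """Standard TrAdaBoost with 0/1 labels; returns (predict, final instance weights)."""
    n, m = len(src_x), len(tgt_x)
    xs = list(src_x) + list(tgt_x)
    ys = list(src_y) + list(tgt_y)
    w = [1.0] * (n + m)  # uniform initial weights over source + target
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_iter))  # fixed source factor
    learners = []  # list of (h_t, beta_t)
    for _ in range(n_iter):
        total = sum(w)
        p = [wi / total for wi in w]
        h = fit_stump(xs, ys, p)
        miss = [abs(h(x) - y) for x, y in zip(xs, ys)]
        # Weighted error is measured on the target-domain instances only.
        eps = sum(w[i] * miss[i] for i in range(n, n + m)) / sum(w[n:])
        eps = min(max(eps, 1e-10), 0.499)  # clipping is an assumption of this sketch
        beta_t = eps / (1.0 - eps)
        for i in range(n):
            w[i] *= beta ** miss[i]  # misclassified source instances lose weight
        for i in range(n, n + m):
            w[i] *= beta_t ** (-miss[i])  # misclassified target instances gain weight
        learners.append((h, beta_t))

    voters = learners[math.ceil(n_iter / 2) - 1:]  # iterations ceil(N/2)..N vote

    def predict(x: Vector) -> int:
        # Log form of: prod beta_t^(-h_t(x)) >= prod beta_t^(-1/2)
        score = sum(ht(x) * math.log(1.0 / bt) for ht, bt in voters)
        threshold = 0.5 * sum(math.log(1.0 / bt) for _, bt in voters)
        return int(score >= threshold)

    return predict, w


if __name__ == "__main__":
    # Hypothetical toy data: the target label is 1 iff x > 5; two noisy source points disagree.
    src_x = [[1], [2], [3], [4], [6], [7], [8], [9], [2.5], [7.5]]
    src_y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
    tgt_x = [[1], [2], [3], [4], [6], [7], [8], [9]]
    tgt_y = [0, 0, 0, 0, 1, 1, 1, 1]
    predict, weights = tradaboost(src_x, src_y, tgt_x, tgt_y)
    print([predict(x) for x in tgt_x])  # → [0, 0, 0, 0, 1, 1, 1, 1]
```

Each round multiplies repeatedly misclassified source instances by β, so their weight shrinks, while hard target instances are boosted. This is how TrAdaBoost limits negative transfer. It does not always prevent it, as the IEPA result in the abstract shows, and that gap is what DisTrAdaboost's distribution-aware weighting is meant to address.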

Protein-Protein Interaction Extraction Based on Transfer Learning
LI Lishuang,GUO Rui,HUANG Degen,ZHOU Huiwei.Protein-Protein Interaction Extraction Based on Transfer Learning[J].Journal of Chinese Information Processing,2016,30(2):160-167.
Authors:LI Lishuang  GUO Rui  HUANG Degen  ZHOU Huiwei
Affiliation: School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116023, China
Abstract: As an important branch of biomedical information extraction, Protein-Protein Interaction (PPI) extraction is of great research significance. Current PPI research mainly relies on traditional machine learning, which requires large annotated corpora for training, and annotating new data is costly. This paper employs transfer learning to extract PPI from a small amount of labeled target-domain (in-domain) data, drawing support from annotated source-domain (out-of-domain) data. To avoid the negative transfer caused by large differences between the distributions of different domains, we adjust the weight of each source-domain instance according to its relative distribution. Experiments on the AIMed and IEPA corpora demonstrate the effectiveness of our algorithms. On AIMed, both transfer learning methods clearly outperform the baseline. On IEPA, TrAdaboost suffers negative transfer, while the improved DisTrAdaboost algorithm still transfers well.
Keywords: PPI; transfer learning; negative transfer