
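Protein-protein interaction extraction method based on co-training*
Cite this article: QIAN Wei-zhong, DENG Wei, FU Chong, QIN Zhi-guang. Protein-protein interaction extraction method based on co-training*[J]. Application Research of Computers (计算机应用研究), 2011, 28(5): 1738-1741. DOI: 10.3969/j.issn.1001-3695.2011.05.041
Authors: QIAN Wei-zhong, DENG Wei, FU Chong, QIN Zhi-guang
Affiliation: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China
Funding: National High Technology Research and Development Program of China ("863" Program)
Abstract: To address the shortage of manually annotated samples in biological literature databases, a semi-supervised method based on co-training is proposed. After sample preprocessing, a machine learning method based on word features and a method based on pattern learning each select a different feature subset of the samples, and the two are combined in a co-training procedure. During training, each method learns from a small number of initially labeled samples and a large number of unlabeled samples, and the labeled sample set is expanded with the learning results of the other method. The method achieves an F1 score of 63.9% on the AIMED corpus. Comparative experiments show that it outperforms supervised methods and makes effective use of unlabeled samples, which suits it to practical extraction tasks.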

Keywords: protein-protein interaction; semi-supervised; co-training; word features; pattern learning
Received: 2010-10-18
Revised: 2010-11-11

Protein-protein interaction extraction method using co-training
QIAN Wei-zhong,DENG Wei,FU Chong,QIN Zhi-guang. Protein-protein interaction extraction method using co-training[J]. Application Research of Computers, 2011, 28(5): 1738-1741. DOI: 10.3969/j.issn.1001-3695.2011.05.041
Authors:QIAN Wei-zhong  DENG Wei  FU Chong  QIN Zhi-guang
Affiliation:(School of Computer Science & Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China)
Abstract: To address the shortage of manually labeled samples in biological literature databases, a semi-supervised method based on co-training is proposed. After sample preprocessing, a bag-of-words machine learning method and a pattern learning method each select a different subset of sample features, and both are incorporated into co-training. During training, each method learns from a small set of initial labeled samples and a large set of unlabeled samples, and the labeled sample set is enlarged with the results of the other method. Tested on the AIMED corpus, the method achieved an F1 score of 63.9%. Comparative experiments showed that it outperforms supervised methods and uses unlabeled samples effectively, which makes it adaptable to real extraction tasks.
Keywords: protein-protein interaction; semi-supervised; co-training; bag of words feature; pattern learning
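
The co-training loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the learners, so a small naive Bayes classifier over tokens stands in for the word-feature method and a pattern frequency counter stands in for the pattern learner. The sample layout (`words`, `pattern`), the function `co_train`, and the parameters `rounds` and `threshold` are all hypothetical names chosen for illustration.

```python
import math
from collections import Counter


class WordView:
    """Naive Bayes over tokens (assumed stand-in for the word-feature learner)."""

    def fit(self, samples, labels):
        self.prior = Counter(labels)
        self.counts = {0: Counter(), 1: Counter()}
        for s, y in zip(samples, labels):
            self.counts[y].update(s["words"])
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def predict(self, s):
        """Return (label, posterior probability of that label)."""
        n = sum(self.prior.values())
        logp = {}
        for y in (0, 1):
            total = sum(self.counts[y].values()) + len(self.vocab) + 1
            logp[y] = math.log((self.prior[y] + 1) / (n + 2))
            for w in s["words"]:
                logp[y] += math.log((self.counts[y][w] + 1) / total)
        label = max(logp, key=logp.get)
        z = sum(math.exp(v - logp[label]) for v in logp.values())
        return label, 1.0 / z


class PatternView:
    """Label frequencies per pattern (assumed stand-in for the pattern learner)."""

    def fit(self, samples, labels):
        self.stats = {}
        for s, y in zip(samples, labels):
            self.stats.setdefault(s["pattern"], Counter())[y] += 1
        return self

    def predict(self, s):
        c = self.stats.get(s["pattern"], Counter())
        p1 = (c[1] + 1) / (c[0] + c[1] + 2)  # Laplace-smoothed P(interaction)
        return (1, p1) if p1 >= 0.5 else (0, 1 - p1)


def co_train(labeled, unlabeled, rounds=5, threshold=0.6):
    """labeled: list of (sample, label); unlabeled: list of samples."""
    views = {"word": WordView(), "pattern": PatternView()}
    # Each view keeps its own training set, extended by the other view.
    train = {"word": list(labeled), "pattern": list(labeled)}
    pool = list(unlabeled)

    def refit():
        for name, view in views.items():
            xs, ys = zip(*train[name])
            view.fit(xs, ys)

    for _ in range(rounds):
        refit()
        remaining = []
        for s in pool:
            w_label, w_conf = views["word"].predict(s)
            p_label, p_conf = views["pattern"].predict(s)
            used = False
            if w_conf >= threshold:  # word view teaches the pattern view
                train["pattern"].append((s, w_label))
                used = True
            if p_conf >= threshold:  # pattern view teaches the word view
                train["word"].append((s, p_label))
                used = True
            if not used:
                remaining.append(s)
        if len(remaining) == len(pool):  # nothing confident this round
            break
        pool = remaining
    refit()
    return views


# Toy data, invented for illustration only.
labeled = [
    ({"words": ["binds", "to"], "pattern": "A binds B"}, 1),
    ({"words": ["and", "not"], "pattern": "A and B"}, 0),
]
unlabeled = [
    {"words": ["binds", "strongly"], "pattern": "A interacts with B"},
    {"words": ["is", "expressed"], "pattern": "A and B"},
]
views = co_train(labeled, unlabeled)
```

In this toy run each view labels one unlabeled sample that the other view could not classify confidently, which is the mutual-expansion behavior the abstract describes. A real system would replace both stand-in learners and tune the confidence threshold on held-out data.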
This article is indexed in CNKI, Wanfang Data (万方数据), and other databases.