
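Protein complex identification algorithm based on XGBoost and topological structural information
Cite this article: XU Zhoubo, YANG Jian, LIU Huadong, HUANG Wenwen. Protein complex identification algorithm based on XGBoost and topological structural information[J]. Journal of Computer Applications (计算机应用), 2020, 40(5): 1510-1514.
Authors: XU Zhoubo  YANG Jian  LIU Huadong  HUANG Wenwen
Affiliation: 1. Guangxi Key Laboratory of Trusted Software (Guilin University of Electronic Technology), Guilin, Guangxi 541004; 2. School of Mechanical and Electrical Engineering, Guilin University of Electronic Technology, Guilin, Guangxi 541004
Funding: National Natural Science Foundation of China (61762027); Natural Science Foundation of Guangxi (2017GXNSFAA198172).
Abstract: Protein-protein interaction (PPI) networks contain a large amount of uncertainty, and the data on known protein complexes are incomplete. As a result, methods that search using structural information alone, or that apply supervised learning to known complexes alone, fall short in the accuracy of protein complex identification. To address this, a search method that combines an XGBoost model with the topological structural information of complexes (XGBP) is proposed. First, features are extracted from the topological structural information of complexes. Then, an XGBoost model is trained on the extracted features. Finally, topological structural information is combined with the supervised learning method to build a mapping between features and complexes, improving the accuracy of protein complex prediction. The algorithm is compared with eight popular unsupervised learning algorithms: Markov Clustering (MCL), Clustering based on Maximal Cliques (CMC), the core-attachment based method (COACH), the fast hierarchical clustering algorithm (HC-PIN), Clustering with Overlapping Neighborhood Expansion (ClusterONE), Molecular Complex Detection (MCODE), the uncertain-graph-model based protein complex detection method (DCU) and Weighted COACH (WCOACH); and with three supervised learning methods: Bayesian Network (BN), Support Vector Machine (SVM) and Regression Model (RM). The proposed method shows good performance in precision, sensitivity and F-measure.

Keywords: protein complex; XGBoost model; protein-protein interaction network; graph data mining; machine learning
Received: 2019-11-25
Revised: 2020-01-19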
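
The feature-extraction step named in the abstract can be illustrated with a minimal sketch. The abstract does not list which topological features the authors use, so the ones below (size, density, degree statistics, clustering coefficient) are assumptions chosen only because they are common descriptors of a candidate subgraph; the function name `topological_features` and the adjacency-dict input format are likewise hypothetical.

```python
from itertools import combinations


def topological_features(adj, nodes):
    """Describe a candidate complex by simple subgraph statistics.

    adj   -- dict mapping each protein to the set of its interaction
             partners (assumed undirected, i.e. symmetric)
    nodes -- iterable of proteins forming the candidate complex
    The feature set is an illustrative assumption, not the paper's.
    """
    members = set(nodes)
    n = len(members)

    def inner_neighbours(v):
        # Neighbours of v that are also inside the candidate, no self-loop.
        return (adj.get(v, set()) & members) - {v}

    degrees = [len(inner_neighbours(v)) for v in members]
    edge_count = sum(degrees) // 2          # each edge is counted twice
    possible = n * (n - 1) // 2
    density = edge_count / possible if possible else 0.0

    def clustering(v):
        # Fraction of neighbour pairs of v that are themselves connected.
        nbrs = inner_neighbours(v)
        k = len(nbrs)
        if k < 2:
            return 0.0
        links = sum(1 for a, b in combinations(nbrs, 2)
                    if b in adj.get(a, set()))
        return links / (k * (k - 1) / 2)

    return {
        "size": n,
        "density": density,
        "mean_degree": sum(degrees) / n if n else 0.0,
        "max_degree": max(degrees, default=0),
        "mean_clustering": (sum(clustering(v) for v in members) / n
                            if n else 0.0),
    }
```

Feature vectors of this kind, computed for known complexes and for non-complex subgraphs, are what a gradient-boosted classifier such as XGBoost would be trained on; the training setup itself is not described in the abstract and is left out here.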

Protein complex identification algorithm based on XGBoost and topological structural information
XU Zhoubo, YANG Jian, LIU Huadong, HUANG Wenwen. Protein complex identification algorithm based on XGBoost and topological structural information[J]. Journal of Computer Applications, 2020, 40(5): 1510-1514.
Authors:XU Zhoubo  YANG Jian  LIU Huadong  HUANG Wenwen
Affiliation: 1. Guangxi Key Laboratory of Trusted Software (Guilin University of Electronic Technology), Guilin, Guangxi 541004, China
2. School of Mechanical and Electrical Engineering, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
Abstract: Protein-Protein Interaction (PPI) networks contain a large amount of uncertainty, and the data on known protein complexes are incomplete. Consequently, methods that rely only on topological structural information for searching, or only on supervised learning from known complexes, identify protein complexes with limited accuracy. To address this problem, a search method called XGBoost model for Predicting protein complex (XGBP) is proposed. First, features are extracted from the topological structural information of complexes. Then, an XGBoost model is trained on the extracted features. Finally, topological structural information is combined with the supervised learning method to construct a mapping between features and protein complexes, in order to improve the accuracy of protein complex prediction. The method is compared with eight popular unsupervised algorithms: Markov CLustering (MCL), Clustering based on Maximal Clique (CMC), Core-Attachment based method (COACH), Fast Hierarchical clustering algorithm for functional modules discovery in Protein Interaction (HC-PIN), Cluster with Overlapping Neighborhood Expansion (ClusterONE), Molecular COmplex DEtection (MCODE), Detecting Complex based on Uncertain graph model (DCU) and Weighted COACH (WCOACH); and with three supervised methods: Bayesian Network (BN), Support Vector Machine (SVM) and Regression Model (RM). The results show that the proposed algorithm performs well in terms of precision, sensitivity and F-measure.
Keywords: protein complex; XGBoost model; Protein-Protein Interaction (PPI) network; graph data mining; machine learning
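
The three evaluation measures named in the abstract can be sketched as follows. The abstract does not say how a predicted complex is matched to a reference complex, so the overlap score |A∩B|²/(|A|·|B|) and the 0.2 threshold below are assumptions (a convention often seen in complex-detection studies), and the function names are hypothetical. Only the final step, F-measure as the harmonic mean of precision and sensitivity, is the standard definition.

```python
def overlap_score(a, b):
    """Overlap between two protein sets: |A & B|^2 / (|A| * |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    common = len(a & b)
    return common * common / (len(a) * len(b))


def evaluate(predicted, reference, threshold=0.2):
    """Return (precision, sensitivity, f_measure).

    predicted, reference -- lists of protein sets
    threshold            -- minimum overlap score for a match
                            (assumed value, not taken from the paper)
    """
    # Predicted complexes that match at least one reference complex.
    matched_pred = sum(
        1 for p in predicted
        if any(overlap_score(p, r) >= threshold for r in reference)
    )
    # Reference complexes recovered by at least one prediction.
    matched_ref = sum(
        1 for r in reference
        if any(overlap_score(p, r) >= threshold for p in predicted)
    )
    precision = matched_pred / len(predicted) if predicted else 0.0
    sensitivity = matched_ref / len(reference) if reference else 0.0
    total = precision + sensitivity
    f_measure = 2 * precision * sensitivity / total if total else 0.0
    return precision, sensitivity, f_measure
```

With two predictions of which one matches, and three reference complexes of which one is recovered, this gives precision 1/2, sensitivity 1/3 and F-measure 0.4.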
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).