
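Functional module mining in uncertain protein-protein interaction network based on fuzzy spectral clustering
Cite this article: MAO Yimin, LIU Yinping, LIANG Tian, MAO Dinghui. Functional module mining in uncertain protein-protein interaction network based on fuzzy spectral clustering[J]. Journal of Computer Applications, 2019, 39(4): 1032-1040. DOI: 10.11772/j.issn.1001-9081.2018091880
Authors: MAO Yimin  LIU Yinping  LIANG Tian  MAO Dinghui
Affiliations: School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000; College of Applied Science, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000; 中陕核工业集团二一一大队有限公司 (Sino Shaanxi Nuclear Industry Group 211 Brigade Co., Ltd.), Xi'an 710000
Funding: National Natural Science Foundation of China (41562019); Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ161566).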
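Abstract: Existing methods that mine functional modules from protein-protein interaction (PPI) networks by combining spectral clustering with fuzzy C-means (FCM) clustering suffer from low accuracy, low execution efficiency, and sensitivity to false positives. To address these problems, a method for functional module mining in uncertain PPI networks based on fuzzy spectral clustering (FSC-FM) is proposed. First, an uncertain PPI network model is constructed in which the edge clustering coefficient assigns each protein interaction an existence probability measure, overcoming the effect of false positives on the experimental results. Second, a strategy based on the flow distance of the edge clustering coefficient (FEC) is used to improve the similarity calculation in spectral clustering, resolving the algorithm's sensitivity to the scaling parameter; the spectral clustering algorithm is then used to preprocess the uncertain PPI network data, reducing its dimensionality and improving clustering accuracy. Third, a density-based probability center selection (DPCS) strategy is designed to resolve the sensitivity of FCM to the initial cluster centers and to the number of clusters, and the preprocessed PPI data is clustered with FCM, improving the execution efficiency and sensitivity of the clustering. Finally, an improved edge expected density (EED) is used to filter the mined protein functional modules. Running the algorithms on the yeast DIP dataset shows that, compared with the DCU algorithm (detecting protein complexes based on an uncertain graph model), FSC-FM raises the F-measure by 27.92% and the execution efficiency by 27.92%; compared with CDUN (a method for identifying complexes in dynamic protein interaction networks), the evolutionary algorithm (EA), and the medical gene or protein prediction algorithm (MGPPA), it also achieves a higher F-measure and execution efficiency. The experimental results show that FSC-FM is suitable for mining functional modules in uncertain PPI networks.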

Keywords: uncertain data  protein-protein interaction  spectral clustering algorithm  fuzzy C-means  functional module  expected density
Received: 2018-09-10
Revised: 2018-11-04
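
The FCM clustering step named in the abstract can be illustrated with a minimal sketch of plain fuzzy C-means. This is not the authors' implementation: the DPCS center selection and the spectral preprocessing are not reproduced, so the initial centers are simply passed in by the caller, and the function name `fcm`, the fuzzifier `m = 2`, and the toy points are assumptions made for illustration.

```python
import math

def fcm(points, centers, m=2.0, iterations=50, eps=1e-9):
    """Plain fuzzy C-means on equal-length tuples, starting from given centers.

    Assumption: initial centers come from the caller; the paper's DPCS
    strategy for choosing them is not reproduced here.
    """
    centers = [tuple(c) for c in centers]
    exponent = 2.0 / (m - 1.0)
    dims = len(points[0])
    memberships = []
    for _ in range(iterations):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        memberships = []
        for x in points:
            dists = [max(math.dist(x, c), eps) for c in centers]
            memberships.append(
                [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]
            )
        # Center update: mean of the points weighted by u_ij^m.
        new_centers = []
        for j in range(len(centers)):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            new_centers.append(tuple(
                sum(w * x[d] for w, x in zip(weights, points)) / total
                for d in range(dims)
            ))
        centers = new_centers
    return memberships, centers

# Toy data (hypothetical): two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
memberships, centers = fcm(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```

Each row of `memberships` holds one point's degrees of membership across the clusters and sums to 1, which is what allows a protein to belong to more than one functional module.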

Functional module mining in uncertain protein-protein interaction network based on fuzzy spectral clustering
MAO Yimin,LIU Yinping,LIANG Tian,MAO Dinghui. Functional module mining in uncertain protein-protein interaction network based on fuzzy spectral clustering[J]. Journal of Computer Applications, 2019, 39(4): 1032-1040. DOI: 10.11772/j.issn.1001-9081.2018091880
Authors:MAO Yimin  LIU Yinping  LIANG Tian  MAO Dinghui
Affiliation:1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou Jiangxi, 341000, China;2. College of Applied Science, Jiangxi University of Science and Technology, Ganzhou Jiangxi, 341000, China;3
Abstract: Existing methods that mine functional modules from Protein-Protein Interaction (PPI) networks by combining spectral clustering with Fuzzy C-Means (FCM) clustering have low accuracy and low running efficiency, and are susceptible to false positives. To address these problems, a method for Functional Module mining in uncertain PPI networks based on Fuzzy Spectral Clustering (FSC-FM) was proposed. Firstly, to overcome the effect of false positives, an uncertain PPI network was constructed in which every protein-protein interaction was assigned an existence probability measure by using the edge clustering coefficient. Secondly, the similarity calculation of spectral clustering was modified using the Flow distance of Edge Clustering coefficient (FEC) strategy to overcome the sensitivity of spectral clustering to the scaling parameter. The spectral clustering algorithm was then used to preprocess the uncertain PPI network data, reducing the dimension of the data and improving the accuracy of clustering. Thirdly, a Density-based Probability Center Selection (DPCS) strategy was designed to solve the problem that the FCM algorithm is sensitive to the initial cluster centers and to the number of clusters, and the preprocessed PPI data was clustered with the FCM algorithm to improve the running efficiency and sensitivity of the clustering. Finally, the mined functional modules were filtered by the Edge-Expected Density (EED) strategy. Experiments on the yeast DIP dataset show that, compared with the Detecting protein Complexes based on Uncertain graph model (DCU) algorithm, FSC-FM increases the F-measure by 27.92% and the running efficiency by 27.92%; compared with an uncertain model-based approach for identifying Dynamic protein Complexes in Uncertain protein-protein interaction Networks (CDUN), Evolutionary Algorithm (EA) and Medical Gene or Protein Prediction Algorithm (MGPPA), FSC-FM also has a higher F-measure and running efficiency. The experimental results show that FSC-FM is suitable for functional module mining in uncertain PPI networks.
Keywords:uncertain data  Protein-Protein Interaction (PPI)  spectral clustering algorithm  Fuzzy C-Means (FCM)  functional module  expected density  
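
The first and last steps of the abstract (edge probabilities from the edge clustering coefficient, then filtering by expected density) can be sketched as below. The abstract does not give the formulas, so this sketch assumes a common definition of the edge clustering coefficient (triangles through an edge divided by the largest possible number of such triangles) and the plain expected density of an uncertain graph (sum of edge probabilities over the number of node pairs). The paper's improved EED is not reproduced, and all function names, the toy graph, and the threshold are hypothetical.

```python
from itertools import combinations

def edge_clustering_coefficients(edges):
    """Map each undirected edge to an assumed ECC value in [0, 1].

    Assumed definition: triangles containing the edge divided by
    min(deg(u) - 1, deg(v) - 1); the paper may use a different variant.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    ecc = {}
    for u, v in edges:
        triangles = len(neighbors[u] & neighbors[v])
        max_possible = min(len(neighbors[u]) - 1, len(neighbors[v]) - 1)
        ecc[frozenset((u, v))] = triangles / max_possible if max_possible > 0 else 0.0
    return ecc

def expected_density(nodes, prob):
    """Sum of edge existence probabilities inside `nodes` over the number of node pairs."""
    nodes = list(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = sum(prob.get(frozenset(pair), 0.0) for pair in combinations(nodes, 2))
    return total / (n * (n - 1) / 2)

def filter_modules(modules, prob, threshold):
    """Keep only the candidate modules whose expected density reaches the threshold."""
    return [m for m in modules if expected_density(m, prob) >= threshold]

# Toy PPI graph (hypothetical): a triangle A-B-C with a pendant protein D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
prob = edge_clustering_coefficients(edges)
kept = filter_modules([{"A", "B", "C"}, {"C", "D"}], prob, threshold=0.5)
```

In this toy graph the pendant edge C-D lies in no triangle, so it receives probability 0 and the candidate module {C, D} is filtered out, which mirrors how a poorly supported interaction is treated as a likely false positive.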
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).