
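CL-RBF: A Multi-Site Protein Subcellular Localization Prediction Algorithm Based on an Improved ML-RBF
Cite this article: XUE Wei, HONG Xiaoyu, HU Xuejiao, CHEN Xingjian, ZHANG Liang. CL-RBF: A Multi-Site Protein Subcellular Localization Prediction Algorithm Based on an Improved ML-RBF [J]. Journal of Food Science and Biotechnology, 2020, 39(2): 66-73.
Authors: XUE Wei  HONG Xiaoyu  HU Xuejiao  CHEN Xingjian  ZHANG Liang
Affiliation: School of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, Jiangsu, China; National Engineering Laboratory for Cereal Fermentation Technology, Jiangnan University, Wuxi 214122, Jiangsu, China
Funding: National Key R&D Program of China (2017YFD0800204); National Science and Technology Support Program of the 12th Five-Year Plan (2015BAK36B05); Natural Science Foundation of Jiangsu Province (BK2012363); Fundamental Research Funds for the Central Universities (Y0201600175).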
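Abstract: Taking into account how both within-label and between-label clustering results affect the RBF Neural Networks for Multi-Label Learning (ML-RBF) algorithm, this paper proposes the CL-RBF algorithm and applies it to multi-site protein subcellular localization prediction. The silhouette coefficient is introduced to optimize the number of hidden-layer centers in ML-RBF. By analyzing the relationship between the clustering results of different labels, the cluster centers of labels that fall below a certain threshold are re-clustered, and the parameters are tuned by gradient descent. Finally, the prediction is adjusted according to the difference between two Euclidean distances: from the test sample to the hidden-layer centers of label L, and from the test sample to the cluster centers generated from samples that do not belong to label L. Under 10-fold cross-validation, feature vectors are extracted by combining the bag-of-words model (Bag of Words) with amino acid composition (AAC), and four other multi-label learning algorithms are used for comparison. Judged across several evaluation metrics, CL-RBF achieves the best overall performance on four multi-label datasets. The prediction algorithm is available through the website https://njau.applinzi.com/homepage_final.jsp.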

Keywords: ML-RBF  subcellular localization  silhouette coefficient  bag-of-words model
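
The abstract states that the silhouette coefficient is used to choose the number of hidden-layer centers. As a minimal illustration of that idea only, the sketch below scores several candidate cluster counts on toy 2-D points and keeps the best one. Everything in it is an assumption made for illustration: the function names (`kmeans`, `silhouette_score`, `best_k`), the plain Lloyd's k-means with farthest-point initialization, and the toy data are not taken from the paper, which applies the criterion per label inside ML-RBF.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means (illustrative); returns a cluster index per point."""
    # Deterministic farthest-point initialization (an assumption of this sketch).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(euclidean(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: euclidean(p, centers[c])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return labels


def silhouette_score(points, labels):
    """Mean silhouette s = (b - a) / max(a, b); singleton clusters contribute 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(euclidean(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, candidates):
    """Return the candidate cluster count with the highest mean silhouette."""
    scores = {k: silhouette_score(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


# Toy data: three tight, well-separated groups of four points each.
toy = [(x + dx, y + dy)
       for (x, y) in [(0, 0), (10, 0), (0, 10)]
       for (dx, dy) in [(0, 0), (1, 0), (0, 1), (1, 1)]]
chosen_k, all_scores = best_k(toy, range(2, 6))
```

On the toy data the three-group structure gives the highest score, so `chosen_k` is 3. In a per-label setting one would presumably run this selection separately on the samples of each label, but how the paper does so is not described in this abstract.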

CL-RBF: An Improved ML-RBF Method for Prediction of Protein Subcellular Location
XUE Wei, HONG Xiaoyu, HU Xuejiao, CHEN Xingjian, ZHANG Liang. CL-RBF: An Improved ML-RBF Method for Prediction of Protein Subcellular Location [J]. Journal of Food Science and Biotechnology, 2020, 39(2): 66-73.
Authors: XUE Wei  HONG Xiaoyu  HU Xuejiao  CHEN Xingjian  ZHANG Liang
Affiliation: (School of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, China; National Engineering Laboratory for Cereal Fermentation Technology, Jiangnan University, Wuxi 214122, China)
Abstract: The CL-RBF algorithm is proposed for predicting protein subcellular localization. It accounts for the clustering results both within a single label and between different labels in the ML-RBF method. The silhouette coefficient is introduced to determine the optimal number of centroids in the hidden layer. The previous approach only optimized clustering within the same label. In this paper, a larger distance between two centroids generated from two different labels is also taken into account when few samples carry both labels. In addition, a gradient descent algorithm is used to adjust the parameters. The final adjustment is made by analyzing the distances from test samples to the hidden centers obtained for label L and to the clustering centers of samples not belonging to label L. Bag of Words and amino acid composition (AAC) are used to extract features from protein sequences. In a 10-fold cross-validation test against methods previously introduced for bacterial protein subcellular localization prediction, the new predictor was more powerful and flexible on four different multi-label datasets. The prediction server is available at https://njau.applinzi.com/homepage_final.jsp.
Keywords: ML-RBF  protein subcellular localization  silhouette coefficient  Bag of Words
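
The abstract says features are extracted by combining Bag of Words with AAC but does not say how the "words" are defined. The sketch below is one plausible reading, offered as an assumption rather than the authors' procedure: AAC is the relative frequency of each of the 20 standard amino acids, and the bag of words is taken here to be normalized counts of overlapping k-mers. The names `aac`, `kmer_bag`, and `features`, and the choice k = 2, are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def aac(seq):
    """Amino acid composition: 20 relative frequencies that sum to 1."""
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def kmer_bag(seq, k=2):
    """Bag of overlapping k-mers (assumed word definition), normalized counts."""
    seq = seq.upper()
    vocab = ["".join(w) for w in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in vocab)  # ignores k-mers with non-standard letters
    return [counts[w] / total if total else 0.0 for w in vocab]


def features(seq, k=2):
    """Concatenate AAC (20 values) with the k-mer bag (20**k values)."""
    return aac(seq) + kmer_bag(seq, k)


vector = features("MKTAYIAKQR")  # an arbitrary example sequence
```

With k = 2 the combined vector has 20 + 400 = 420 entries. A learned vocabulary, for example one built by clustering sequence fragments, would be an equally valid interpretation of "Bag of Words" and would change only `kmer_bag`.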
This article is indexed in CNKI, VIP, and other databases.