Protein function prediction based on coevolutionary information and deep learning
Citation: Wang Jinlei, Ding Xueming, Qin Qiqi, Peng Boya. Protein function prediction based on coevolutionary information and deep learning[J]. Application Research of Computers, 2023, 40(12): 3572-3577.
Authors: Wang Jinlei, Ding Xueming, Qin Qiqi, Peng Boya
Affiliation: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology
Funding: National Natural Science Foundation of China (11502145)
Abstract: Protein function is crucial for understanding the mechanisms of cellular and organismal activity and for studying disease mechanisms. As sequence databases grow rapidly, traditional experimental and sequence-alignment methods cannot support large-scale protein function annotation. To address this, the paper proposes EGNet (evolutionary graph network), a model that obtains protein sequence encodings from the pre-trained protein language model ESM2 and one-hot encoding. Through sequence self-attention and physical calculations, it derives two kinds of coevolutionary information between residues: PI (paired interaction) and SPI (strong paired interaction). Both are then fed, together with the sequence encoding, into a multi-layer cascaded graph convolutional network that learns node features from the sequence encoding and performs end-to-end protein function prediction. Compared with earlier methods, EGNet performs better on the EC (Enzyme Commission) class labels of the ENZYME database, reaching an F-score of 0.89 and an AUPR of 0.91. The results indicate that EGNet achieves good performance from a single sequence alone, providing fast and effective protein function annotation.

Keywords: protein function; deep learning; coevolutionary information; language model; graph convolutional neural network
Received: 2023/4/3
Revised: 2023/11/12
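
The pipeline outlined in the abstract (sequence encoding, two residue-residue graphs, stacked graph convolutions, function labels) can be illustrated with a minimal sketch. This is not the authors' EGNet implementation. It uses only one-hot encoding and leaves out ESM2. The pair-score matrix is random stand-in data. The threshold used to derive a "strong" graph, the alternation between the two graphs across layers, the mean pooling, and the sigmoid output are all assumptions made for illustration. All function names are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_encode(seq):
    """Return an L x 20 one-hot matrix; unknown residues become all-zero rows."""
    return [[1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS] for aa in seq]


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, as in a standard GCN."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[a_hat[i][j] * inv_sqrt[i] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]


def gcn_layer(a_norm, h, w):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    return [[max(v, 0.0) for v in row] for row in matmul(matmul(a_norm, h), w)]


def predict(seq, pair_scores, threshold, layer_weights, w_out):
    """Assumed flow: encode, build two graphs, stack GCN layers, pool, classify."""
    h = one_hot_encode(seq)
    # Weighted graph from all pair scores (stand-in for PI).
    pi_graph = normalize_adjacency(pair_scores)
    # Binary graph keeping only high-scoring pairs (stand-in for SPI).
    spi_graph = normalize_adjacency(
        [[1.0 if s >= threshold else 0.0 for s in row] for row in pair_scores])
    for depth, w in enumerate(layer_weights):
        # Alternating the two graphs per layer is an assumption, not the paper's design.
        graph = pi_graph if depth % 2 == 0 else spi_graph
        h = gcn_layer(graph, h, w)
    pooled = [sum(col) / len(h) for col in zip(*h)]  # mean over residues
    logits = matmul([pooled], w_out)[0]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]  # per-label probability


def random_matrix(rows, cols, rng):
    """Small random weights standing in for trained parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
seq = "MKTAYIAKQR"  # toy sequence, not from the ENZYME database
n = len(seq)
raw = [[rng.random() for _ in range(n)] for _ in range(n)]
# Symmetric, non-negative stand-in for residue pair scores.
pair_scores = [[(raw[i][j] + raw[j][i]) / 2 for j in range(n)] for i in range(n)]
layer_weights = [random_matrix(20, 16, rng), random_matrix(16, 16, rng)]
w_out = random_matrix(16, 4, rng)  # 4 hypothetical function labels
probs = predict(seq, pair_scores, 0.7, layer_weights, w_out)
print(len(probs))
```

In practice the node features would come from a trained language model and the weights from supervised training on labeled sequences; the sketch only shows how a sequence encoding and two residue graphs can pass through stacked graph convolutions to yield per-label scores.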