
Prediction model of lncRNA-encoded short peptides based on representation learning and deep forest
Citation: JI Tengqi, MENG Jun, ZHAO Siyuan, HU Hehuan. Prediction model of lncRNA-encoded short peptides based on representation learning and deep forest[J]. Journal of Computer Applications, 2021, 41(12): 3614-3619.
Authors: JI Tengqi, MENG Jun, ZHAO Siyuan, HU Hehuan
Affiliation: School of Computer Science and Technology, Dalian University of Technology, Dalian Liaoning 116024, China
Funding: National Natural Science Foundation of China (61872055)
Abstract: Small Open Reading Frames (sORFs) in long non-coding RNAs (lncRNAs) can encode short peptides of no more than 100 amino acids. In short peptide prediction, sORFs in lncRNAs lack distinctive features and high-confidence data remain scarce. To address these problems, a Deep Forest (DF) model based on representation learning was proposed. First, conventional lncRNA feature extraction methods were used to encode the sORFs. Second, an AutoEncoder (AE) was used for representation learning to obtain an efficient representation of the input data. Finally, a DF model was trained to predict the short peptides encoded by lncRNAs. Experimental results show that the model reaches an accuracy of 92.08% on the Arabidopsis thaliana dataset, higher than traditional machine learning models, deep learning models and combined models, and that it has good stability. In addition, the model reaches accuracies of 78.16% and 74.92% on the Glycine max (soybean) and Zea mays (maize) datasets respectively, demonstrating its good generalization ability.

Keywords: long non-coding RNA (lncRNA); small Open Reading Frames (sORFs); short peptide; representation learning; Deep Forest (DF); prediction
Received: 2021-05-12
Revised: 2021-06-24
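The abstract says only that "conventional lncRNA feature extraction methods" were used to encode the sORFs before the autoencoder step. The sketch below illustrates what that first step could look like, under stated assumptions: sORFs are taken as ATG-initiated, forward-strand ORFs ending at the first in-frame stop codon (TAA, TAG, TGA) and encoding at most 100 amino acids, and each sORF is encoded as a normalized k-mer frequency vector, a common lncRNA feature. The function names `find_sorfs`, `kmer_vocabulary` and `kmer_frequencies` are hypothetical; the autoencoder and deep forest stages are not shown.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_AA = 100  # sORFs encode peptides of at most 100 amino acids


def find_sorfs(seq, max_aa=MAX_AA):
    """Return sorted (start, end) pairs of forward-strand ATG...stop ORFs.

    Assumption: in each frame, the first ATG is paired with the next in-frame
    stop codon; the peptide length counts codons before the stop (ATG included).
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabet
    sorfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3
                if n_aa <= max_aa:
                    sorfs.append((start, i + 3))  # end includes the stop codon
                start = None
    return sorted(sorfs)


def kmer_vocabulary(k):
    """All 4**k DNA k-mers in a fixed (lexicographic ACGT) order."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def kmer_frequencies(orf, k=3):
    """Normalized k-mer frequency vector of length 4**k for one ORF."""
    vocab = kmer_vocabulary(k)
    counts = dict.fromkeys(vocab, 0)
    total = 0
    for i in range(len(orf) - k + 1):
        word = orf[i:i + k]
        if word in counts:  # skip k-mers containing N or other ambiguity codes
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in vocab]


if __name__ == "__main__":
    rna = "GCAUGAAACCCUAGGG"
    dna = rna.upper().replace("U", "T")
    for start, end in find_sorfs(rna):
        features = kmer_frequencies(dna[start:end], k=3)
        print(start, end, len(features))  # prints: 2 14 64
```

In a pipeline like the one the abstract describes, vectors of this kind would form the input matrix that the autoencoder compresses before the deep forest is trained; the choice of k-mers here is an illustrative assumption, not the authors' confirmed feature set.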