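LncRNA recognition by fusing multiple features and its function prediction
Cite this article:CHANG Zheng, MENG Jun, SHI Yunsheng, MO Fengran. LncRNA recognition by fusing multiple features and its function prediction[J]. CAAI Transactions on Intelligent Systems, 2018, 13(6): 928-934.
Authors:CHANG Zheng  MENG Jun  SHI Yunsheng  MO Fengran
Affiliation:School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116023, China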
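Abstract:To address the limitations of traditional plant lncRNA identification based on a single feature, this paper proposes a multi-feature method that fuses the open reading frame, secondary structure, and k-mer features of RNA sequences. Three classical classification models (Gaussian naive Bayes, support vector machine, and gradient boosting decision tree) are trained, and their classification results are integrated. Model performance was evaluated by cross-validation: overall performance exceeds that of the currently popular prediction tools CPAT, CNCI, and PLEK, and overall accuracy on the Arabidopsis thaliana dataset reaches 89%. In addition, based on competing endogenous RNA rules and RNA structure information, target prediction and filtering are carried out separately for lncRNA-microRNA and microRNA-mRNA pairs. The predictions are then integrated into an interaction network, which is used to predict the functions of the lncRNAs in its modules. Through GO term analysis, the biological regulatory processes in which mRNA-associated lncRNAs may participate are predicted, and their corresponding functions are inferred.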

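Keywords:lncRNA  identification  feature extraction  multi-feature fusion  machine learning  interaction relationship  network construction  function prediction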

LncRNA recognition by fusing multiple features and its function prediction
CHANG Zheng,MENG Jun,SHI Yunsheng,MO Fengran.LncRNA recognition by fusing multiple features and its function prediction[J].CAAI Transactions on Intelligent Systems,2018,13(6):928-934.
Authors:CHANG Zheng  MENG Jun  SHI Yunsheng  MO Fengran
Affiliation:School of Computer Science and Technology, Dalian University of Technology, Dalian 116023, China
Abstract:To address the limitations of traditional plant lncRNA identification based on a single feature, this paper proposes a method that fuses the open reading frame, secondary structure, and k-mer features of RNA sequences. Three classical classification models (Gaussian naive Bayes, support vector machine, and gradient boosting decision tree) are trained, and their classification results are integrated. Performance was evaluated using cross-validation. The accuracy of the proposed method reached 89% on the Arabidopsis thaliana dataset, and on the same dataset it outperformed the popular prediction tools CPAT, CNCI, and PLEK. In addition, based on competing endogenous RNA rules and RNA structure information, target prediction and filtering were performed for lncRNA-microRNA and microRNA-mRNA pairs. Related tools were then used to build RNA interaction regulatory networks, and the regulatory relationships were analyzed to predict the functions of the lncRNAs in network modules. Through Gene Ontology term analysis, the biological regulatory processes in which lncRNAs may take part can be predicted and their corresponding functions inferred.
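
The feature fusion and result integration described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the choice of k = 3, the ORF definition (longest AUG-to-stop stretch in the three forward frames), and the use of majority voting to integrate the three classifiers are all assumptions made for illustration. The abstract does not state how the results are integrated. Secondary-structure features are passed in by the caller, since computing them requires an external folding tool.

```python
from collections import Counter
from itertools import product

BASES = "ACGU"
STOP_CODONS = {"UAA", "UAG", "UGA"}


def normalize(seq):
    """Upper-case the sequence and map DNA-style T to RNA-style U."""
    return seq.upper().replace("T", "U")


def kmer_frequencies(seq, k=3):
    """Relative frequency of each k-mer over ACGU, in lexicographic order.

    Windows containing other symbols (e.g. N) count toward the total
    but have no slot of their own in the output vector.
    """
    seq = normalize(seq)
    windows = max(len(seq) - k + 1, 0)
    counts = Counter(seq[i:i + k] for i in range(windows))
    total = max(windows, 1)
    return [counts["".join(p)] / total for p in product(BASES, repeat=k)]


def longest_orf_length(seq):
    """Length (nt, stop codon included) of the longest AUG...stop ORF
    found in the three forward reading frames."""
    seq = normalize(seq)
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "AUG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def fuse_features(seq, structure_features=()):
    """Concatenate ORF, caller-supplied structure, and k-mer features.

    `structure_features` is a hypothetical placeholder for values such as
    a minimum free energy computed elsewhere.
    """
    orf = longest_orf_length(seq)
    coverage = orf / len(seq) if seq else 0.0
    return [orf, coverage, *structure_features, *kmer_frequencies(seq, k=3)]


def majority_vote(predictions):
    """Combine label lists (one list per classifier) sample by sample.

    Majority voting is an assumed integration rule, chosen for simplicity.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]
```

Each sequence would be converted with `fuse_features`, the resulting vectors used to train the three classifiers, and their per-sample labels passed to `majority_vote`. With three binary classifiers a tie cannot occur, which is one reason an odd number of models is convenient for this kind of voting.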
Keywords:lncRNA  identification  feature extraction  multiple features fusion  machine learning  interrelationship  network construction  function prediction