
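Human promoter recognition based on single nucleotide statistics and support vector machine ensemble
Cite this article: XU Wenxuan, ZHANG Li. Human promoter recognition based on single nucleotide statistics and support vector machine ensemble[J]. Journal of Computer Applications, 2015, 35(10): 2808-2812.
Authors: XU Wenxuan  ZHANG Li
Affiliations: 1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006; 2. Jiangsu Provincial Key Laboratory for Computer Information Processing Technology (Soochow University), Suzhou, Jiangsu 215006
Funding: National Natural Science Foundation of China (61373093); National Undergraduate Innovation and Entrepreneurship Training Program (201410285032); Natural Science Foundation of Jiangsu Province (BK20140008, BK201222725); Natural Science Research Project of Jiangsu Higher Education Institutions (13KJA520001); Jiangsu "Qing Lan Project"; Soochow University Undergraduate Extracurricular Academic Research Fund (KY2014687B, KY2015544B, KY2015818B); Soochow University Jingwen College "3I Project" (29).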
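Abstract: To discriminate human gene promoters efficiently, a human promoter recognition algorithm based on single nucleotide statistics and a support vector machine ensemble is proposed. First, single nucleotide statistics are computed over the genes to split a gene dataset into two subsets, C-preferred and G-preferred. Then DNA rigidity features, word frequency features and CpG island features are extracted from each subset. Finally, multiple Support Vector Machines (SVMs) are combined in an ensemble to learn these three kinds of features, and three ensemble schemes are discussed: single-layer SVM ensemble, double-layer SVM ensemble and cascaded SVM ensemble. Experimental results show that the proposed algorithm improves the sensitivity and specificity of human promoter recognition; the double-layer SVM ensemble reaches a sensitivity of 79.51%, and the cascaded SVM ensemble reaches a specificity of 84.58%.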

Keywords: CpG island  DNA rigidity  human promoter recognition  KL divergence  single nucleotide statistics  support vector machine
Received: 2015-06-15
Revised: 2015-06-27

Human promoter recognition based on single nucleotide statistics and support vector machine ensemble
XU Wenxuan, ZHANG Li. Human promoter recognition based on single nucleotide statistics and support vector machine ensemble[J]. Journal of Computer Applications, 2015, 35(10): 2808-2812.
Authors:XU Wenxuan  ZHANG Li
Affiliation:1. School of Computer Science and Technology, Soochow University, Suzhou Jiangsu 215006, China;2. Provincial Key Laboratory for Computer Information Processing Technology (Soochow University), Suzhou Jiangsu 215006, China
Abstract: To efficiently discriminate promoters in the human genome, an algorithm for human promoter recognition based on single nucleotide statistics and a Support Vector Machine (SVM) ensemble was proposed. Firstly, a gene dataset was divided into two subsets, C-preferred and G-preferred, by using single nucleotide statistics. Secondly, the DNA rigidity feature, the word-based feature and the CpG-island feature were extracted for each subset. Finally, these features were combined by using SVM ensemble learning. In addition, three ensemble schemes were discussed: single-layer SVM ensemble, double-layer SVM ensemble and cascaded SVM ensemble. The experimental results show that the proposed method can improve the sensitivity and specificity of human promoter recognition. In particular, the double-layer SVM ensemble achieves the highest sensitivity, 79.51%, while the cascaded SVM ensemble achieves the highest specificity, 84.58%.
Keywords:CpG-island  DNA rigidity  human promoter recognition  Kullback-Leibler divergence  nucleotide statistics  Support Vector Machine (SVM)  
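The abstract names the processing steps but not their exact definitions, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a sequence is "C-preferred" when it contains at least as many C as G bases, that the CpG-island feature can be approximated by GC content plus the commonly used observed/expected CpG ratio, and that the word-based feature is a normalized k-mer frequency vector. The function names `nucleotide_preference`, `cpg_features` and `word_frequencies` are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import List, Tuple


def nucleotide_preference(seq: str) -> str:
    """Label a sequence 'C' or 'G' by the more frequent base.

    Assumption: ties go to 'C'; the paper's actual criterion may differ.
    """
    seq = seq.upper()
    return "C" if seq.count("C") >= seq.count("G") else "G"


def cpg_features(seq: str) -> Tuple[float, float]:
    """Return (GC content, observed/expected CpG ratio) for one sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so this counts every CpG
    gc_content = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def word_frequencies(seq: str, k: int = 2) -> List[float]:
    """Return normalized k-mer frequencies in lexicographic ACGT order."""
    seq = seq.upper()
    windows = max(len(seq) - k + 1, 0)
    counts = Counter(seq[i:i + k] for i in range(windows))
    total = max(windows, 1)  # avoid division by zero on very short input
    return [counts["".join(w)] / total for w in product("ACGT", repeat=k)]
```

In a pipeline of this shape, the label from `nucleotide_preference` would route each sequence to its subset, and the per-subset feature vectors would then be passed to separately trained SVMs whose outputs are combined.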
This article is indexed in Wanfang Data and other databases.