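Heuristic Clustering Method Based on Neighbor-Seeds for 454 Sequencing Data
Cite this article: CHEN Wei, CHENG Yong-Mei, ZHANG Shao-Wu, PAN Quan. Heuristic Clustering Method Based on Neighbor-Seeds for 454 Sequencing Data [J]. Journal of Software (软件学报), 2014, 25(5): 929-938.
Authors: CHEN Wei, CHENG Yong-Mei, ZHANG Shao-Wu, PAN Quan
Affiliation: College of Automation, Northwestern Polytechnical University, Xi'an, Shaanxi 710072; Department of Biostatistics, Yale University, USA; College of Automation, Northwestern Polytechnical University, Xi'an, Shaanxi 710072; College of Automation, Northwestern Polytechnical University, Xi'an, Shaanxi 710072; College of Automation, Northwestern Polytechnical University, Xi'an, Shaanxi 710072
Funding: National Natural Science Foundation of China (61170134, 61135001); Aeronautical Science Foundation (20100853010); Xi'an Science and Technology Program (CXY1350(2)); Northwestern Polytechnical University Doctoral Innovation Fund (cx201017)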
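Abstract: With the development of second-generation sequencing technology, massive amounts of 16S rRNA gene sequence data have been produced. How to effectively mine the genomic information hidden in these data is a current research focus and challenge. Sequence clustering studies how to group together sequences that originate from the same species, and it forms the basis for research on species diversity and on structural and functional diversity. Targeting the characteristic sources of 454 sequencing errors, this paper proposes a heuristic sequence clustering algorithm based on neighbor seed sequences (NbHClust). Experimental results show that the algorithm is robust. Compared with traditional heuristic sequence clustering algorithms, it reduces the overestimation of operational taxonomic units (OTUs), improves clustering accuracy, and computes OTUs effectively.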

Keywords: second-generation sequencing technology; operational taxonomic unit; species diversity; 16S rRNA gene; sequence clustering
Received: 2013/7/10
Revised: 2013/12/3

Heuristic Clustering Method Based on Neighbor-Seeds for 454 Sequencing Data
CHEN Wei, CHENG Yong-Mei, ZHANG Shao-Wu, PAN Quan. Heuristic Clustering Method Based on Neighbor-Seeds for 454 Sequencing Data [J]. Journal of Software, 2014, 25(5): 929-938.
Authors: CHEN Wei, CHENG Yong-Mei, ZHANG Shao-Wu, PAN Quan
Affiliation: College of Automation, Northwestern Polytechnical University, Xi'an 710072, China; Department of Biostatistics, Yale University, USA; College of Automation, Northwestern Polytechnical University, Xi'an 710072, China; College of Automation, Northwestern Polytechnical University, Xi'an 710072, China; College of Automation, Northwestern Polytechnical University, Xi'an 710072, China
Abstract: With the development of next-generation sequencing technology, a large number of 16S rRNA gene reads have been collected. Developing methods to mine the information hidden in these data is a key and difficult problem. Sequence clustering aims to find the natural groups in large-scale data, which helps in understanding the species, functional, and structural diversity of microbial communities. This work proposes a heuristic clustering method based on neighbor seeds, named NbHClust, for 454 sequencing data. The results show that the method reduces the overestimation of operational taxonomic units (OTUs), is robust, and achieves high clustering accuracy.
Keywords: second-generation sequencing technology; operational taxonomic unit; species diversity; 16S rRNA gene; sequence clustering
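
For readers unfamiliar with heuristic sequence clustering, the following is a minimal sketch of a generic greedy seed-based scheme of the kind the abstract's topic builds on. It is not NbHClust, whose details are not given in this record. The function names, the edit-distance identity measure, and the 0.97 default threshold are all assumptions chosen for illustration.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Assumed similarity measure: 1 - edit distance / longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def greedy_cluster(reads: List[str], threshold: float = 0.97) -> List[List[str]]:
    """Assign each read to the first seed it matches, else make it a new seed.

    Reads are visited longest first (an assumption; other orderings are common).
    The first read in each returned cluster is that cluster's seed.
    """
    seeds: List[str] = []
    clusters: List[List[str]] = []
    for read in sorted(reads, key=len, reverse=True):
        for idx, seed in enumerate(seeds):
            if identity(read, seed) >= threshold:
                clusters[idx].append(read)
                break
        else:
            # No existing seed is similar enough: the read founds a new cluster.
            seeds.append(read)
            clusters.append([read])
    return clusters


if __name__ == "__main__":
    toy_reads = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC"]
    print(len(greedy_cluster(toy_reads, threshold=0.85)))
    print(len(greedy_cluster(toy_reads, threshold=0.97)))
```

In this toy input the second read differs from the first by a single base. Under a strict threshold it founds its own cluster, which hints at how sequencing errors can inflate the number of OTUs in a greedy scheme.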
This article is indexed in CNKI and other databases.