
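CuMen: A Clustering Algorithm Based on Maximal Frequent Sequential Patterns and Its Application to Gene Assembly
Cite this article:HUANG Dong,TANG Jun,WANG Wei,SHI Bai-Le.CuMen: A Clustering Algorithm Based on Maximal Frequent Sequential Patterns and Its Application to Gene Assembly[J].Computer Science,2005,32(10):149-153.
Authors:HUANG Dong  TANG Jun  WANG Wei  SHI Bai-Le
Affiliation:Department of Computing and Information Technology, Fudan University, Shanghai 200433
Funding:This work was supported by the Education Commission University Grid Project 200309 and the Shanghai Science and Technology Commission Major Project 03dz15027.
Abstract:The mainstream approach to genome sequence assembly is to break the whole sequence randomly into small fragments and then join the fragments into long sequences according to the overlaps between them. Because the data are noisy, the algorithms are computationally expensive, and the volume of biological data keeps growing, assembly incurs time and space costs so large that it often cannot complete. This paper proposes a clustering algorithm based on maximal frequent sequential patterns that divides the whole dataset into several subsets, each of which can be processed efficiently on its own. It also implements a gene assembly grid system with transparent, dynamic resource management, which greatly extends assembly computing capacity. Both the clustering algorithm and the mining algorithm for maximal frequent sequential patterns are optimized for the characteristics of biological data.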

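Keywords:Maximal frequent sequential pattern  Sequence clustering  Sequence assembly  Grid  Genome sequence  Sequential pattern  Assembly processing  Clustering algorithm  Application  Biological data  Algorithm complexity  Grid system  Resource management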

CuMen: Clustering Sequences Based on Maximal Frequent Sequential Pattern and its Application in Genome Sequence Assembly
HUANG Dong,TANG Jun,WANG Wei,SHI Bai-Le.CuMen: Clustering Sequences Based on Maximal Frequent Sequential Pattern and its Application in Genome Sequence Assembly[J].Computer Science,2005,32(10):149-153.
Authors:HUANG Dong  TANG Jun  WANG Wei  SHI Bai-Le
Affiliation:Department of Computer Science and Engineering, Fudan University, Shanghai 200433
Abstract:Sequencing genomes is a fundamental aspect of biological research. A variety of assembly programs have been proposed and implemented. Because of high computational complexity and increasingly large datasets, they incur great time and space overhead. In realistic applications, the sequencing process can become unacceptably slow because of insufficient memory, even on a mainframe with a large amount of RAM. This paper offers a clustering algorithm based on maximal frequent sequential patterns, aiming to divide the whole dataset into several parts that can be processed independently and efficiently in limited memory. Several techniques are applied to optimize the mining and clustering procedures. The approach is also introduced into a grid environment, exploiting parallelism and distribution to improve scalability further.
Keywords:Maximal frequent sequential pattern  Sequence clustering  Sequence assembly  Grid
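
The partitioning idea in the abstract can be illustrated with a minimal Python sketch. It is not the CuMen algorithm itself: as a simplifying assumption it uses contiguous substrings of fixed length k (k-mers) in place of maximal frequent sequential patterns, and it groups reads that share any sufficiently frequent k-mer. The function names `frequent_kmers` and `cluster_by_shared_patterns`, and the parameters `k` and `min_support`, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def frequent_kmers(reads, k, min_support):
    """Map each k-mer to the set of read indices containing it,
    keeping only k-mers found in at least min_support distinct reads.

    Assumption: contiguous k-mers stand in for the maximal frequent
    sequential patterns described in the abstract.
    """
    support = defaultdict(set)
    for idx, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            support[read[i:i + k]].add(idx)
    return {kmer: ids for kmer, ids in support.items() if len(ids) >= min_support}


def cluster_by_shared_patterns(reads, k=4, min_support=2):
    """Group reads that share a frequent k-mer, using union-find.

    Returns a sorted list of clusters, each a list of read indices.
    Each cluster could then be assembled independently.
    """
    parent = list(range(len(reads)))

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # Every read containing the same frequent k-mer joins one cluster.
    for ids in frequent_kmers(reads, k, min_support).values():
        ordered = sorted(ids)
        for other in ordered[1:]:
            union(ordered[0], other)

    clusters = defaultdict(list)
    for idx in range(len(reads)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGG", "TTTTAA", "TTAACC", "GGGGGG"]
    print(cluster_by_shared_patterns(reads, k=4, min_support=2))
```

In this toy input, reads 0 and 1 share `GTAC`, reads 2 and 3 share `TTAA`, and read 4 shares nothing, so each group can be handled in its own memory budget or on a separate grid node. A real pipeline would also need to handle sequencing noise and reverse complements, which this sketch ignores.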
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.