DNA Sequence Data Mining Technique (original Chinese title: DNA序列数据挖掘技术)
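Cite this article: ZHU Yang-Yong (朱扬勇), XIONG Yun (熊赟). DNA Sequence Data Mining Technique[J]. Journal of Software (软件学报), 2007, 18(11): 2766-2781.
Authors: ZHU Yang-Yong (朱扬勇), XIONG Yun (熊赟)
Affiliations: 1. Department of Computer and Information Technology, Fudan University, Shanghai 200433; Shanghai Center for Bioinformation Technology, Shanghai 201203
2. Department of Computer and Information Technology, Fudan University, Shanghai 200433
Funding: National Natural Science Foundation of China; National High Technology Research and Development Program of China (863 Program)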
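Abstract: DNA sequence data is an important class of biological data. Studying DNA sequence data and interpreting its meaning is a main research task of the post-genomic era. Data mining is currently one of the most effective means of data analysis; it is used to discover the regularities hidden in large volumes of data, and it is also the main data analysis technique adopted in bioinformatics. Applying data mining to DNA sequence analysis has attracted wide attention, developed rapidly, and produced many research results. This paper surveys the state and progress of research on DNA sequence data mining and proposes three research phases: the application of statistics-based mining methods, the application of general-purpose mining methods, and the design of mining methods specific to DNA sequence data. It explains that sequence similarity is the foundation of DNA sequence data mining, reviews the key techniques used in the field, including DNA sequential pattern, association, clustering, classification, and outlier mining, and discusses their biological application background and significance. Finally, it identifies open issues for further research, including the design of new storage and indexing mechanisms for DNA sequence data and the design of new data mining models and algorithms informed by biological domain knowledge.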

Keywords: DNA sequence; data mining; bioinformatics; sequential pattern; sequence similarity
Received: 2007-01-23
Revised: 2007-04-25

DNA Sequence Data Mining Technique
ZHU Yang-Yong and XIONG Yun. DNA Sequence Data Mining Technique[J]. Journal of Software, 2007, 18(11): 2766-2781.
Authors: ZHU Yang-Yong and XIONG Yun
Affiliation: 1. Department of Computer and Information Technology, Fudan University, Shanghai 200433, China; 2. Shanghai Center for Bioinformation Technology, Shanghai 201203, China
Abstract: DNA sequence data is one of the basic and important kinds of biological data. Studying DNA sequence data in order to understand the essence of life is a necessary task in the post-genomic era. At present, data mining is one of the most efficient means of data analysis for finding information hidden in data, and it has become the main data analysis technique adopted in bioinformatics. Its application to DNA sequence analysis has received wide attention and developed rapidly, and considerable research achievements have emerged. This paper provides an overview of research progress in the field of DNA sequence data mining. In more detail, it proposes three research phases: the application of statistics-based data mining methods, the application of general data mining methods, and the design of specialized DNA sequence-oriented data mining methods. It then explains that sequence similarity is the foundation of DNA sequence data mining. It also analyzes and comments on key techniques in this field in light of their biological background, such as DNA sequential pattern, association, clustering, classification, and outlier mining. Finally, future work and open issues are given, including research on novel storage models and index methods and the design of data mining algorithms based on biological domain knowledge.
Keywords: DNA sequence; data mining; bioinformatics; sequential pattern; sequence similarity
This article is indexed in databases including CNKI, VIP (维普), and Wanfang Data (万方数据).