
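A Parallel Clustering Algorithm Based on "Gene Expression Profiles"
Cite this article:LANG Xian-Yu, LU Zhong-Hua, CHI Xue-Bin. A Parallel Clustering Algorithm Based on "Gene Expression Profiles"[J]. Chinese Journal of Computers, 2007, 30(2):311-316.
Authors:LANG Xian-Yu  LU Zhong-Hua  CHI Xue-Bin
Affiliations:1. Supercomputing Center, Computer Network Information Center, Chinese Academy of Sciences, Beijing 100080; Graduate University of Chinese Academy of Sciences, Beijing 100080
2. Supercomputing Center, Computer Network Information Center, Chinese Academy of Sciences, Beijing 100080
Funding:National Natural Science Foundation of China; research project of the Ministry of Science and Technology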
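Abstract:Cross-species comparison of biological sequences is widely used to predict gene function, yet a growing body of experiments shows that sequence similarity alone does not guarantee functional similarity. Determining gene function precisely requires examining not only sequence properties but also gene expression information, because changes in gene expression often accompany changes in gene function. Clustering gene expression profiles makes co-expressed genes and their patterns directly visible, which is an important step in studying gene function. Because gene expression in biological tissue is complex, and because the microarray technology and concepts used to measure expression keep advancing, expression data are growing exponentially in size, and clustering analysis has hit a major bottleneck: excessive time and space complexity. Based on the characteristics of "gene expression profile" data, this paper proposes PHCA, a parallel hierarchical clustering algorithm for expression-profile data. The work mainly addresses load balancing in the parallel design and implements the parallel program on the MPI platform. Performance analysis of the parallel program shows that PHCA substantially reduces the time and space complexity of hierarchical clustering.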

Keywords:clustering analysis  gene expression profiles  hierarchical clustering  load balancing  parallel clustering algorithm
Revision dates:2005-06-20; 2006-09-18

A Parallel Clustering Algorithm of Gene Expression Patterns
LANG Xian-Yu,LU Zhong-Hua,CHI Xue-Bin.A Parallel Clustering Algorithm of Gene Expression Patterns[J].Chinese Journal of Computers,2007,30(2):311-316.
Authors:LANG Xian-Yu  LU Zhong-Hua  CHI Xue-Bin
Affiliation:1. Supercomputing Center, Computer Network Information Center, Chinese Academy of Sciences, Beijing 100080; 2. Graduate University of Chinese Academy of Sciences, Beijing 100080
Abstract:Cross-species sequence comparison has been widely used to infer gene function; however, a growing number of genetic studies indicate that sequence similarity is not always proportional to functional similarity. To determine the function of a gene precisely, we need to investigate not only its sequence characteristics but also its expression information, since changes in gene expression are often associated with changes in gene function. Clustering gene expression patterns helps to identify co-expressed genes and their regulation. Owing to the complexity of gene expression and to continual advances in microarray technology, multi-dimensional gene expression datasets are growing exponentially, which makes the performance of clustering algorithms critical. This paper proposes a Parallel Hierarchical Clustering Algorithm (PHCA), based on the hierarchical clustering method, and implements it with MPI. The algorithm focuses on the problem of load balancing. Parallel performance analysis indicates that PHCA greatly reduces the time and memory complexity of hierarchical clustering.
Keywords:clustering analysis  gene expression patterns  hierarchical clustering  load balance
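
The abstract names load balancing as the central problem of the parallel design but does not describe the scheme itself. The sketch below is not PHCA. It only illustrates, under stated assumptions, why the problem arises in hierarchical clustering: the pairwise distances between n expression profiles form a triangular matrix, so row i contributes n-1-i pairs, and giving each process the same number of rows gives the first process far more work than the last. The function names (`pair_count`, `naive_blocks`, `balanced_blocks`, `loads`) are hypothetical, and the code is plain Python rather than MPI.

```python
def pair_count(row, n):
    """Number of pairs (row, j) with j > row among n profiles."""
    return n - 1 - row


def naive_blocks(n, p):
    """Split rows 0..n-1 into p contiguous blocks with (almost) equal row counts."""
    base, extra = divmod(n, p)
    blocks, start = [], 0
    for k in range(p):
        size = base + (1 if k < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


def balanced_blocks(n, p):
    """Split rows into p contiguous blocks with roughly equal pair counts.

    Assumption for illustration only: work is proportional to the number of
    distance pairs in a block. This is not the scheme from the paper.
    """
    total = n * (n - 1) // 2
    blocks, start, done = [], 0, 0
    for k in range(1, p + 1):
        target = total * k / p  # cumulative pairs that should be done after block k
        end = start
        while end < n and done < target:
            done += pair_count(end, n)
            end += 1
        if k == p:
            end = n  # the last block takes any remaining rows
        blocks.append((start, end))
        start = end
    return blocks


def loads(blocks, n):
    """Pairs assigned to each block, given half-open row ranges (start, end)."""
    return [sum(pair_count(r, n) for r in range(a, b)) for a, b in blocks]


if __name__ == "__main__":
    n, p = 1000, 4
    print("equal rows :", loads(naive_blocks(n, p), n))
    print("equal pairs:", loads(balanced_blocks(n, p), n))
```

With equal row counts the first block carries several times the load of the last; with equal pair counts each block stays within about one row of work of the ideal share n(n-1)/(2p). In an MPI program each rank would take one block, but that mapping is also an assumption here.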
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.