首页 | 本学科首页   官方微博 | 高级检索  
     

Parallel divide and conquer bio-sequence comparison based on Smith-Waterman algorithm
作者姓名:ZHANG Fa  QIAO Xiangzhen & LIU Zhiyong . Institute of Computing Technology  Chinese Academy of Sciences  Beijing  China  . National Natural Science Foundation of China  Beijing  China
作者单位:ZHANG Fa1,QIAO Xiangzhen1 & LIU Zhiyong2 1. Institute of Computing Technology,Chinese Academy of Sciences,Beijing 100080,China; 2. National Natural Science Foundation of China,Beijing 100083,China
摘    要:Biological sequence (e.g. DNA sequence) can be treated as strings over some fixed alphabet of characters (a, c, t and g)1]. Sequence alignment is a procedure of comparing two or more sequences by searching for a series of individual characters that are in the same order in the sequences. Two-sequence alignment, pair-wise alignment, is a way of stacking one sequence above the other and matching characters from the two sequences thaat lie in the same column: identical characters are placed in …

收稿时间:22 April 2008

Parallel divide and conquer bio-sequence comparison based on smith-waterman algorithm
ZHANG Fa,QIAO Xiangzhen & LIU Zhiyong . Institute of Computing Technology,Chinese Academy of Sciences,Beijing ,China, . National Natural Science Foundation of China,Beijing ,China.Parallel divide and conquer bio-sequence comparison based on Smith-Waterman algorithm[J].Science in China(Information Sciences),2004,47(2):221-231.
Authors:Zhang Fa  Qiao Xiangzhen  Liu Zhiyong
Affiliation:Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
Abstract:Tools for pair-wise bio-sequence alignment have for long played a central role in computation biology. Several algorithms for bio-sequence alignment have been developed. The Smith-Waterman algorithm, based on dynamic programming, is considered the most fundamental alignment algorithm in bioinformatics. However the existing parallel Smith-Waterman algorithm needs large memory space, and this disadvantage limits the size of a sequence to be handled. As the data of biological sequences expand rapidly, the memory requirement of the existing parallel Smith- Waterman algorithm has become a critical problem. For solving this problem, we develop a new parallel bio-sequence alignment algorithm, using the strategy of divide and conquer, named PSW-DC algorithm. In our algorithm, first, we partition the query sequence into several subsequences and distribute them to every processor respectively, then compare each subsequence with the whole subject sequence in parallel, using the Smith-Waterman algorithm, and get an interim result, finally obtain the optimal alignment between the query sequence and subject sequence, through the special combination and extension method. Memory space required in our algorithm is reduced significantly in comparison with existing ones. We also develop a key technique of combination and extension, named the C&E method, to manipulate the interim results and obtain the final sequences alignment. We implement the new parallel bio-sequences alignment algorithm, the PSW-DC, in a cluster parallel system.
Keywords:biological sequence alignment  dynamic programming  divide and conquer  parallel  
本文献已被 CNKI 万方数据 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号