SGA: a grammar-based alignment algorithm |
| |
Authors: | Hu Guangyue Shen Shiyi Ruan Jishou |
| |
Affiliation: | College of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, PR China. huguangyue@gmail.com |
| |
Abstract: | As the cost of genome sequencing continues to drop, comparison of large sequences becomes tantamount to our understanding of evolution and gene function. Rapid genome alignment stands to play a fundamental role in furthering biological understanding. In 2002, a fast algorithm based on statistical estimation called super pairwise alignment (SPA) was developed by Shen et al. The method was proved to be much faster than traditional dynamic programming algorithms, while it suffered small drop in accuracy. In this paper, we propose a new method based on SPA that target analysis of large-scale genomes. The new method, named super genome alignment (SGA), applies Yang-Keiffer coding theory to alignment and results in a grammar-based algorithm. SGA has the same computational complexity as its predecessor SPA, and it can process large-scale genomes. SGA is tested by using numerous pairs of microbial and eukaryotic genomes, which serve as the benchmark to compare it with the competing BLASTZ method. When compared with BLASTZ, the result shows that SGA is significantly faster by at least an order of magnitude (for some genome pairs the differences is as large at two orders of magnitude), and suffers on average only about 1% loss of the similarity of alignment. |
| |
Keywords: | |
本文献已被 PubMed 等数据库收录! |
|