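Research on Parallel Acceleration Techniques for Hash-Index-Based High-Throughput Gene Sequence Alignment (基于Hash索引的高通量基因序列比对并行加速技术研究)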

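Cite this article:	Wang Wendi, Tang Wen, Duan Bo, Zhang Chunming, Zhang Peiheng, Sun Ninghui. Research on Parallel Acceleration Techniques for Hash-Index-Based High-Throughput Gene Sequence Alignment[J]. Journal of Computer Research and Development (计算机研究与发展), 2013, 50(11): 2463-2471.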

Authors:	Wang Wendi (王文迪), Tang Wen (汤文), Duan Bo (段勃), Zhang Chunming (张春明), Zhang Peiheng (张佩珩), Sun Ninghui (孙凝晖)

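Abstract:	In recent years, the rapid development of high-throughput gene sequencing technology has greatly reduced both the cost and the turnaround time of sequencing. However, the massive data output of next-generation sequencing technologies, together with the high degree of concurrency inherent in sequencing algorithms, poses new challenges to the computing capability of existing computers. Taking PerM, an open-source re-sequencing program based on a hash-index algorithm, as an example, this paper studies the key techniques for accelerating the application on commodity multi-core CPUs. Experimental results on a 64-core SMP system show that the proposed optimizations reduce the cache miss rate by 90% and improve performance by a factor of 4 to 11. The paper then discusses the design and implementation of a dedicated parallel acceleration system on an accelerator card containing a Xilinx LX330 FPGA. As a prototype, a systolic-array parallel computing system with 11 processing elements is designed and implemented on the FPGA-based PCIe accelerator card. Compared with an 8-core Intel Xeon X7550 CPU, the proposed parallel accelerator has a 30 to 65 times advantage in performance per watt.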
Keywords:	Hash index; bioinformatics; high-throughput sequencing; FPGA; parallel accelerator
Parallel Accelerator Design for High-Throughput DNA Sequence Alignment with Hash-Index

Wang Wendi, Tang Wen, Duan Bo, Zhang Chunming, Zhang Peiheng, Sun Ninghui. Parallel Accelerator Design for High-Throughput DNA Sequence Alignment with Hash-Index[J]. Journal of Computer Research and Development, 2013, 50(11): 2463-2471.

Authors:	Wang Wendi, Tang Wen, Duan Bo, Zhang Chunming, Zhang Peiheng, Sun Ninghui

Abstract:	In recent years, the rapid development of high-throughput next-generation sequencing (NGS) technologies has greatly reduced sequencing cost and time. However, both the explosive growth of NGS data and the massively parallel computation involved pose great challenges to the capability of existing computers. We take PerM, an open-source re-sequencing algorithm based on a hash index, as an example to investigate optimizations for accelerating NGS on commercial multi-core CPUs as well as on customized parallel architectures. First, we optimize the original algorithm by reordering the bucket access sequence so that data locality in the shared cache is improved. Second, to exclude empty hash buckets, we propose a hash-index compression algorithm that matches the sequential access pattern of the optimized algorithm. Experiments on a 64-core SMP system (Intel Xeon X7550) show that the optimized algorithm reduces the last-level cache (LLC) miss ratio to about 10% of that of the original algorithm, improving overall performance by 4 to 11 times. Furthermore, a parallel accelerator architecture is designed and evaluated on our customized FPGA accelerator card, which carries a Xilinx LX330 FPGA. As a prototype, a systolic array of 100 PEs is built, which operates at 175 MHz. The performance of the proposed parallel accelerator architecture is supported by the reported speedup of 30 to 65 times over an 8-core CPU.

Keywords:	Hash-index; bioinformatics; high-throughput sequencing; FPGA; parallel accelerator
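
The two CPU-side ideas named in the abstract, storing only non-empty hash buckets and visiting buckets in order, can be illustrated with a small sketch. This is not PerM's code or the authors' implementation: the seed length `K`, the 2-bit base encoding, and the functions `kmer_key`, `build_compressed_index`, and `lookup_sorted` are all hypothetical names chosen for illustration, and reads are assumed to contain only the bases A, C, G, and T.

```python
from collections import defaultdict

K = 4  # hypothetical seed length, kept small for illustration
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_key(seq):
    """Encode a k-mer as an integer using 2 bits per base."""
    key = 0
    for base in seq:
        key = (key << 2) | BASE_CODE[base]
    return key


def build_compressed_index(reference, k=K):
    """Return (keys, offsets, positions), storing only non-empty buckets.

    keys[i] is the i-th non-empty bucket key in ascending order, and its
    reference positions are positions[offsets[i]:offsets[i + 1]].
    """
    buckets = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        buckets[kmer_key(reference[pos:pos + k])].append(pos)
    keys = sorted(buckets)
    offsets, positions = [0], []
    for key in keys:
        positions.extend(buckets[key])
        offsets.append(len(positions))
    return keys, offsets, positions


def lookup_sorted(index, reads, k=K):
    """Look up each read's first k-mer, visiting buckets in ascending order.

    Sorting the queries by bucket key replaces random bucket accesses with
    a single forward sweep over the compressed index (a merge-style scan).
    """
    keys, offsets, positions = index
    queries = sorted((kmer_key(read[:k]), rid) for rid, read in enumerate(reads))
    hits = {rid: [] for rid in range(len(reads))}
    i = 0
    for key, rid in queries:
        while i < len(keys) and keys[i] < key:
            i += 1  # the cursor only moves forward through the index
        if i < len(keys) and keys[i] == key:
            hits[rid] = positions[offsets[i]:offsets[i + 1]]
    return hits
```

For the reference string `"ACGTACGTTT"`, only 6 of the 256 possible 4-mer buckets are occupied, so the compressed index holds 6 keys instead of 256 slots. Because the sorted queries walk that index front to back, the sequential layout and the access order line up, which is the kind of fit between compression and access pattern the abstract describes.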
