High-Performance GPU-Based Sparse Matrix-Vector Multiplication and CG Solver Optimization
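Citation: Wang Yingrui, Ren Jiangyong, Tian Rong. High-Performance GPU-Based Sparse Matrix-Vector Multiplication and CG Solver Optimization[J]. Computer Science (计算机科学), 2013, 40(3): 46-49
Authors: Wang Yingrui (王迎瑞), Ren Jiangyong (任江勇), Tian Rong (田荣)
Affiliation: Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190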
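Abstract: The global matrices produced by numerical methods such as the finite element and finite difference methods are often banded and sparse. For banded sparse matrices, this paper proposes and implements "bDIA", an efficient storage format and algorithm for sparse matrix-vector multiplication (spMV). Tests on NVIDIA GTX280-series GPUs show that, compared with the five common sparse matrix storage formats and algorithms supported by CUSP, the bDIA format and its spMV algorithm more than double the floating-point efficiency in both single and double precision, and exceed the upper limits of 4% single-precision and 22.2% double-precision floating-point efficiency for spMV on this GPU series. Applied to the conjugate gradient (CG) and biconjugate gradient stabilized (BiCGStab) solvers, bDIA yields a speedup of about 1.5x over the DIA format.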

Keywords: banded sparse matrix-vector multiplication   bDIA   generalized finite element method   GPU   CG solver optimization

Efficient Sparse Matrix-vector Multiplication and CG Solver Optimization on GPU
Abstract: Most numerical methods for PDEs, such as the finite element and finite difference methods, are compactly supported. Because of this compact support, the global matrices that arise in scientific and engineering computations are sparse and very often banded. We propose and implement a high-performance spMV algorithm for this specific but widely used sparse matrix type. The new algorithm, termed "bDIA" (banded diagonal), is implemented on an NVIDIA GTX 285. Detailed comparisons with the five other most widely used sparse matrix formats/algorithms supported in the open-source CUDA linear algebra library CUSP show that bDIA doubles the best performance of the other five, breaking the floating-point efficiency limits of 4% for single precision and 22.2% for double precision. The conjugate gradient (CG) and biconjugate gradient stabilized (BiCGStab) solvers both gain a speedup of around 1.5x relative to the DIA format when using the proposed bDIA format/algorithm.
Keywords:Banded sparse matrix-vector multiplication   bDIA   GFEM   GPU   CG solver optimization
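
For readers unfamiliar with the baseline, the following is a minimal sketch of spMV in the standard DIA (diagonal) format that bDIA is compared against. It is not the bDIA algorithm, whose layout the abstract does not describe. The function names `dia_from_dense` and `dia_spmv` are hypothetical, and the plain-Python loops only illustrate the access pattern; on a GPU, each row would typically be handled by one thread.

```python
def dia_from_dense(A):
    """Pack a square dense matrix into DIA form: (offsets, data).

    offsets[d] is the diagonal index k = j - i of the d-th stored diagonal;
    data[d][i] holds A[i][i + k]. Slots that fall outside the matrix stay 0.
    """
    n = len(A)
    offsets = sorted({j - i for i in range(n) for j in range(n) if A[i][j] != 0})
    data = [[0.0] * n for _ in offsets]
    for d, k in enumerate(offsets):
        for i in range(n):
            j = i + k
            if 0 <= j < n:
                data[d][i] = A[i][j]
    return offsets, data


def dia_spmv(offsets, data, x):
    """Compute y = A @ x for a matrix stored in DIA form."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):  # illustrative: one GPU thread per row i
        acc = 0.0
        for d, k in enumerate(offsets):
            j = i + k
            if 0 <= j < n:  # skip padded slots outside the matrix
                acc += data[d][i] * x[j]
        y[i] = acc
    return y


if __name__ == "__main__":
    # Tridiagonal 1D Laplacian, a typical banded finite-difference matrix.
    A = [[2, -1, 0, 0],
         [-1, 2, -1, 0],
         [0, -1, 2, -1],
         [0, 0, -1, 2]]
    offsets, data = dia_from_dense(A)
    print(offsets)                                # [-1, 0, 1]
    print(dia_spmv(offsets, data, [1, 2, 3, 4]))  # [0.0, 0.0, 0.0, 5.0]
```

Because only the nonzero diagonals and their offsets are stored, no per-element column indices are needed, which is why diagonal-based layouts suit banded matrices of the kind the abstract targets.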