Auto-tuned Krylov methods on cluster of graphics processing units
Authors: Frédéric Magoulès, Abal-Kassim Cheik Ahamed, Roman Putanowicz
Affiliation: 1. Ecole Centrale Paris, Paris, France; 2. Institute for Computational Civil Engineering, Cracow University of Technology, Cracow, Poland
Abstract: Exascale computers are expected to have highly hierarchical architectures, with nodes composed of multicore processors (CPU, central processing unit) and accelerators (GPU, graphics processing unit). These different programming levels raise new and difficult algorithmic issues. In particular, when solving extremely large linear systems, new programming paradigms for Krylov methods should be defined and evaluated against the current state of the art in scientific computing. Iterative Krylov methods involve linear algebra operations such as the dot product, the norm, vector addition and the sparse matrix–vector product. These operations are computationally expensive for large matrices. In this paper, we focus on how to perform these operations effectively, in double precision, on the GPU, in order to make iterative Krylov methods more robust and thereby reduce the computing time. The performance of our algorithms is evaluated on several matrices arising from engineering problems. Numerical experiments illustrate the robustness and accuracy of our implementation compared with existing libraries. We deal with different preconditioned Krylov methods: the Conjugate Gradient method for symmetric positive-definite matrices, and the Generalized Conjugate Residual, Bi-Conjugate Gradient Conjugate Residual, transpose-free Quasi-Minimal Residual, Stabilized Bi-Conjugate Gradient and Stabilized Bi-Conjugate Gradient (L) methods for the solution of sparse linear systems with nonsymmetric matrices. We consider and compare several compressed sparse storage formats, and propose a way to implement Krylov methods effectively on the GPU and on multicore CPUs. Finally, we give strategies to speed up the algorithms by auto-tuning the threading design according to the problem characteristics and the hardware. In conclusion, we propose and analyse hybrid sub-structuring methods that should pave the way to exascale hybrid methods.
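As an illustration of the kind of GPU kernels and threading auto-tuning the abstract refers to, the sketch below shows a CUDA sparse matrix–vector product for the CSR format, with one thread per row, together with a minimal auto-tuning loop that times a few candidate thread-block sizes and keeps the fastest. This is a hypothetical sketch, not the authors' implementation; the names (spmv_csr, tune_block_size), the candidate block sizes and the tiny test matrix are assumptions introduced here for illustration only.

// Hypothetical sketch: CSR sparse matrix-vector product y = A*x on the GPU,
// with a simple block-size auto-tuning loop. Not the authors' code.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// One thread per matrix row: each thread accumulates the dot product of
// its CSR row with the dense vector x.
__global__ void spmv_csr(int n_rows, const int *row_ptr, const int *col_idx,
                         const double *val, const double *x, double *y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n_rows) {
        double sum = 0.0;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum += val[j] * x[col_idx[j]];
        y[row] = sum;
    }
}

// Time the kernel for a few candidate block sizes and return the fastest one.
int tune_block_size(int n_rows, const int *d_row_ptr, const int *d_col_idx,
                    const double *d_val, const double *d_x, double *d_y) {
    const std::vector<int> candidates = {64, 128, 256, 512};  // assumed candidates
    int best = candidates[0];
    float best_ms = 1e30f;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    for (int block : candidates) {
        int grid = (n_rows + block - 1) / block;
        cudaEventRecord(start);
        spmv_csr<<<grid, block>>>(n_rows, d_row_ptr, d_col_idx, d_val, d_x, d_y);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        if (ms < best_ms) { best_ms = ms; best = block; }
    }
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return best;
}

int main() {
    // Tiny 3x3 tridiagonal test matrix in CSR format (illustrative only).
    const int n = 3;
    std::vector<int>    row_ptr = {0, 2, 5, 7};
    std::vector<int>    col_idx = {0, 1, 0, 1, 2, 1, 2};
    std::vector<double> val     = {2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0};
    std::vector<double> x       = {1.0, 1.0, 1.0}, y(n);

    int *d_row_ptr, *d_col_idx; double *d_val, *d_x, *d_y;
    cudaMalloc((void **)&d_row_ptr, row_ptr.size() * sizeof(int));
    cudaMalloc((void **)&d_col_idx, col_idx.size() * sizeof(int));
    cudaMalloc((void **)&d_val, val.size() * sizeof(double));
    cudaMalloc((void **)&d_x, n * sizeof(double));
    cudaMalloc((void **)&d_y, n * sizeof(double));
    cudaMemcpy(d_row_ptr, row_ptr.data(), row_ptr.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_col_idx, col_idx.data(), col_idx.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_val, val.data(), val.size() * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, x.data(), n * sizeof(double), cudaMemcpyHostToDevice);

    int block = tune_block_size(n, d_row_ptr, d_col_idx, d_val, d_x, d_y);
    int grid = (n + block - 1) / block;
    spmv_csr<<<grid, block>>>(n, d_row_ptr, d_col_idx, d_val, d_x, d_y);
    cudaMemcpy(y.data(), d_y, n * sizeof(double), cudaMemcpyDeviceToHost);
    printf("best block size = %d, y = [%g, %g, %g]\n", block, y[0], y[1], y[2]);
    return 0;
}

The scalar one-thread-per-row kernel is the simplest CSR variant; one-warp-per-row kernels and the ELL/HYB/COO formats listed in the keywords trade memory coalescing against padding overhead depending on the row-length distribution, which is one motivation for selecting the format and threading layout per matrix, as the paper proposes.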
Keywords: Krylov methods; iterative methods; linear algebra; sparse matrix–vector product; GPU; CUDA; auto-tuning; compressed sparse row (CSR) format; ELLPACK (ELL) format; hybrid (HYB) format; coordinate (COO) format; Cusp; CUSPARSE; CUBLAS