

Performance modeling and optimization of sparse matrix-vector multiplication on NVIDIA CUDA platform
Authors: Shiming Xu, Wei Xue, Hai Xiang Lin
Affiliations: 1. Mekelweg 4, 2628 CD, Delft, The Netherlands
2. Tsinghua University, RM. 8-210, East Main Bldg., 100084, Beijing, China
3. Mekelweg 4, 2628 CD, Delft, The Netherlands
Abstract: In this article, we discuss the performance modeling and optimization of Sparse Matrix-Vector Multiplication (SpMV) on NVIDIA GPUs using CUDA. SpMV has a very low computation-to-data ratio, so its performance is bound mainly by memory bandwidth. We propose ELLPACK-based optimizations of SpMV from two aspects: (1) improving the performance of accesses to the dense vector by reducing cache misses, and (2) reducing the amount of matrix data accessed through index reduction. Matrix bandwidth reduction techniques enable both the improved cache usage and the index compression. For GPUs with better cache support, we propose a differentiated memory access scheme that prevents matrix data from contaminating the caches. Performance evaluation shows that the combined speedups of the proposed optimizations are 16% (single precision) and 12.6% (double precision) on the GT-200 GPU, and 19% (single precision) and 15% (double precision) on the GF-100 GPU.
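The two ideas in the abstract, the ELLPACK layout and index reduction enabled by bandwidth reduction, can be illustrated with a small host-side sketch. It assumes (hypothetically; the article's exact compression scheme may differ) that after a bandwidth-reducing reordering every |col - row| fits in a signed 16-bit integer, so each column index can be stored as a 2-byte offset from its row instead of a 4-byte integer. The function names `csr_to_ell_offsets` and `ell_spmv` are illustrative only. The body of the outer loop in `ell_spmv` corresponds to what one CUDA thread would execute for one row. The sketch does not model caches or the differentiated memory access scheme.

```python
from array import array


def csr_to_ell_offsets(n, row_ptr, col_idx, vals):
    """Convert an n x n CSR matrix to ELLPACK with 16-bit column offsets.

    Hypothetical index-reduction scheme: column index = row + offset, where
    offset is a signed 16-bit value. This only works if the matrix bandwidth
    has already been reduced so that every |col - row| <= 32767.
    """
    # K = ELLPACK width = length of the longest row.
    K = max((row_ptr[r + 1] - row_ptr[r] for r in range(n)), default=0)
    # Column-major slab: entry k of row r is stored at k * n + r, so on a GPU
    # threads r and r + 1 read adjacent words (coalesced global loads).
    val = array('f', [0.0] * (K * n))  # 'f' = 32-bit float (single precision)
    off = array('h', [0] * (K * n))    # 'h' = signed 16-bit; OverflowError if out of range
    for r in range(n):
        for k, j in enumerate(range(row_ptr[r], row_ptr[r + 1])):
            val[k * n + r] = vals[j]
            off[k * n + r] = col_idx[j] - r
    # Padding slots keep value 0.0 and offset 0, i.e. they read x[r] and add 0.
    return K, val, off


def ell_spmv(n, K, val, off, x):
    """Compute y = A x from the offset-compressed ELLPACK arrays."""
    y = [0.0] * n
    for r in range(n):          # one GPU thread per row r
        acc = 0.0
        for k in range(K):      # the same K-step loop in every thread
            i = k * n + r
            acc += val[i] * x[r + off[i]]
        y[r] = acc
    return y
```

For example, the tridiagonal matrix with rows (4, -1, 0), (-1, 4, -1), (0, -1, 4) becomes a 3-wide slab whose offsets all lie in {-1, 0, 1}; multiplying it by x = (1, 2, 3) gives (2, 4, 10). Storing 2-byte offsets instead of 4-byte indices cuts index traffic in half, which matters because, as the abstract notes, SpMV is memory-bandwidth bound.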
This article is indexed in SpringerLink and other databases.