首页 | 本学科首页   官方微博 | 高级检索  
     


Performance analysis and optimization strategies for a D3Q19 lattice Boltzmann kernel on nVIDIA GPUs using CUDA
Authors:J. Habich  T. ZeiserG. Hager  G. Wellein
Affiliation:Erlangen Regional Computing Center (RRZE), University of Erlangen-Nuremberg, Martensstr. 1, 91058 Erlangen, Germany
Abstract:This paper presents implementation strategies and optimization approaches for a D3Q19 lattice Boltzmann flow solver on nVIDIA graphics processing units (GPUs). Using the STREAM benchmarks we demonstrate the GPU parallelization approach and obtain an upper limit for the flow solver performance. We discuss the GPU-specific implementation of the solver with a focus on memory alignment and register shortage. The optimized code is up to an order of magnitude faster than standard two-socket x86 servers with AMD Barcelona or Intel Nehalem CPUs. We further analyze data transfer rates for the PCI-express bus to evaluate the potential benefits of multi-GPU parallelism in a cluster environment.
Keywords:CFD   GPU   Lattice Boltzmann   HPC   STREAM benchmarks   CUDA
本文献已被 ScienceDirect 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号