A Hybrid Circular Queue Method for Iterative Stencil Computations on GPUs
Authors: Yang Yang, Hui-Min Cui, Xiao-Bing Feng, Jing-Ling Xue
Affiliation: 1. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; Graduate University of Chinese Academy of Sciences, Beijing 100190, China
2. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
3. Programming Languages and Compilers Group, School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia
Abstract: In this paper, we present a hybrid circular queue method that can significantly boost the performance of stencil computations on GPUs by carefully balancing the usage of registers and shared memory. Unlike earlier methods, which rely on circular queues implemented predominantly in indirectly addressable shared memory, our hybrid method exploits a new reuse pattern spanning multiple time steps in stencil computations, so that circular queues can be implemented effectively with both shared memory and registers in a balanced manner. We describe a framework that automatically finds the best placement of data in registers and shared memory in order to maximize the performance of stencil computations. Validation using four different types of stencils on three different GPU platforms shows that our hybrid method achieves speedups of up to 2.93x over methods that use circular queues implemented with shared memory only.
Keywords: stencil computation; circular queue; GPU; occupancy; register
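
To make the term "circular queue" concrete, the following is a minimal CPU-side sketch in Python of the basic idea for a single time step: a 5-point stencil sweeps a 2D grid row by row while only three rows are held in a rotating three-slot queue. It is an illustration under stated assumptions, not the hybrid method or the placement framework described in the abstract. The function name `jacobi_sweep_circular`, the averaging stencil, and the copy-through boundary handling are all hypothetical choices made for this example. On a GPU, the queue slots would reside in shared memory, registers, or a mix of the two, which is the placement question the paper addresses.

```python
def jacobi_sweep_circular(grid):
    """One 5-point averaging step over a 2D grid (list of lists of floats).

    Rows are streamed through a 3-slot circular queue, so only rows
    i-1, i, and i+1 are held at any time. Boundary cells are copied
    through unchanged (an assumption made for this sketch).
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]           # boundaries keep input values
    if rows < 3 or cols < 3:
        return out                           # no interior cells to update

    queue = [grid[0], grid[1], grid[2]]      # slots for rows i-1, i, i+1
    head = 0                                 # slot currently holding row i-1

    for i in range(1, rows - 1):
        north = queue[head]
        center = queue[(head + 1) % 3]
        south = queue[(head + 2) % 3]
        for j in range(1, cols - 1):
            out[i][j] = (center[j] + center[j - 1] + center[j + 1]
                         + north[j] + south[j]) / 5.0
        # Retire the oldest row: overwrite its slot with the next input row,
        # then rotate the head so the queue again covers rows i, i+1, i+2.
        if i + 2 < rows:
            queue[head] = grid[i + 2]
        head = (head + 1) % 3
    return out
```

In this sketch the north and south values are only ever read at column `j`, while the center row is also read at `j - 1` and `j + 1`. In a one-thread-per-column GPU mapping, that distinction is what makes some queue slots candidates for per-thread registers while others need storage that neighbouring threads can read.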
This article is indexed in CNKI, Wanfang Data, SpringerLink, and other databases.