Concurrent warp execution: improving performance of GPU-likely SIMD architecture by increasing resource utilization
Authors: Hong Jun Choi, Dong Oh Son, Jong Myon Kim, Cheol Hong Kim
Affiliations:
1. School of Electronics and Computer Engineering, Chonnam National University, Gwangju, Korea
2. School of Electrical Engineering, University of Ulsan, Ulsan, Korea
Abstract: Hardware parallelism should be exploited to improve the performance of computing systems. The single instruction, multiple data (SIMD) architecture has been widely used to maximize the throughput of computing systems by exploiting this hardware parallelism. Unfortunately, branch divergence caused by branch instructions leaves computational resources underutilized, which degrades the performance of SIMD architectures. The graphics processing unit (GPU) is a representative parallel architecture based on SIMD. In recent computing systems, GPUs can process general-purpose applications as well as graphics applications with the help of convenient APIs. However, unlike graphics applications, general-purpose applications contain many branch instructions, so branch divergence seriously degrades GPU performance. In this paper, we propose a concurrent warp execution (CWE) technique that reduces this performance degradation when GPUs execute general-purpose applications by increasing resource utilization. CWE selects co-warps so that more threads are active in a warp, allowing the combined warps to execute concurrently. According to our simulation results, the proposed architecture improves performance by 5.85% over PDOM (immediate post-dominator reconvergence) and by 91% over DWF (dynamic warp formation), with little hardware overhead.
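The utilization problem described in the abstract can be illustrated with a small model of SIMD lane masks. The Python sketch below is not the authors' CWE hardware. The 8-lane warp width, the bitmask representation, the rule that two warps may share an issue slot only when they wait at the same instruction and their active lanes do not overlap, and the greedy co-warp selection policy are all illustrative assumptions, and every function name is hypothetical.

```python
# Hypothetical model of SIMD lane utilization under branch divergence.
# Assumptions (not from the paper): 8-lane warps, active lanes as bitmasks,
# candidate co-warps are already waiting at the same instruction (PC).

WARP_SIZE = 8  # illustrative width; NVIDIA GPUs use 32-thread warps


def popcount(mask: int) -> int:
    """Number of active lanes in a mask."""
    return bin(mask).count("1")


def utilization(mask: int, width: int = WARP_SIZE) -> float:
    """Fraction of SIMD lanes doing useful work for one issued instruction."""
    return popcount(mask) / width


def diverge(taken: list) -> tuple:
    """Split a fully active warp into taken / not-taken masks at a branch.

    With stack-based (PDOM-style) handling, each path issues separately
    with only its own lanes active, so both issues underuse the SIMD unit.
    """
    taken_mask = 0
    for lane, t in enumerate(taken):
        if t:
            taken_mask |= 1 << lane
    full = (1 << len(taken)) - 1
    return taken_mask, full & ~taken_mask


def can_combine(mask_a: int, mask_b: int) -> bool:
    """Hypothetical co-warp test: no SIMD lane may be claimed by both warps."""
    return (mask_a & mask_b) == 0


def pick_co_warp(mask: int, candidates: list):
    """Greedy (hypothetical) policy: pick the compatible candidate that fills
    the most idle lanes. Returns its index, or None if none fits."""
    best, best_gain = None, 0
    for idx, cand in enumerate(candidates):
        if can_combine(mask, cand) and popcount(cand) > best_gain:
            best, best_gain = idx, popcount(cand)
    return best


if __name__ == "__main__":
    # Warp 0 diverges: lanes 0-2 take the branch, lanes 3-7 do not.
    w0_taken, w0_not = diverge([True] * 3 + [False] * 5)
    print(utilization(w0_taken))  # 0.375

    # Two other warps paused at the same instruction (hypothetical masks).
    candidates = [0b11110000, 0b00000011]
    idx = pick_co_warp(w0_taken, candidates)
    combined = w0_taken | candidates[idx]
    print(bin(combined))  # 0b11110111
    print(utilization(combined))  # 0.875
```

In this toy case, issuing warp 0's taken path alone keeps 3 of 8 lanes busy, while pairing it with the non-overlapping candidate raises one issue slot to 7 of 8 lanes. The second candidate is rejected because lanes 0 and 1 would collide. A real design must also handle register-file access and reconvergence, which this sketch ignores.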
This article is indexed in SpringerLink and other databases.