
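Branch Divergence Optimization for Performance and Power Consumption on GPU Platform
Cite this article: YU Qi, WANG Bo-qian, SHEN Li, WANG Zhi-ying, CHEN Wei. Branch Divergence Optimization for Performance and Power Consumption on GPU Platform[J]. Computer Science, 2016, 43(5): 22-26.
Authors: YU Qi, WANG Bo-qian, SHEN Li, WANG Zhi-ying, CHEN Wei
Affiliation: School of Computer, National University of Defense Technology, Changsha 410073, China (all five authors)
Funding: Supported by the National Natural Science Foundation of China (61472431, 61202121) and the Ministry of Education's Doctoral Program Fund for New Teachers in Higher Education (20114307120013)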
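Abstract: Powerful computing capability has led to the wide adoption of GPGPUs in general-purpose computing. However, because GPGPUs execute in SIMT (Single Instruction Multiple Threads) fashion, their execution efficiency is severely affected by branch divergence in applications. Thread swapping has been proposed to reduce the performance loss caused by branches, but it often introduces extra memory accesses, which not only offset part of the performance gain from thread swapping but also increase power consumption. This paper first uses an example to show how the thread swapping range affects the performance and power consumption of a program, and then proposes a method to reduce the extra memory accesses introduced by thread swapping. Experiments show that for the Reduction program, with a swapping range of 256, power consumption is reduced by up to 7% at an average performance loss of 4%; for the Bitonic program, with swapping ranges of 256 and 512, performance improves by up to 6.4% and 5.3%, respectively, with no power consumption overhead.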

Keywords: Branch divergence, Memory access, Thread swapping
Received: 2015-05-18
Revised: 2015-07-18

Branch Divergence Optimization for Performance and Power Consumption on GPU Platform
YU Qi, WANG Bo-qian, SHEN Li, WANG Zhi-ying and CHEN Wei. Branch Divergence Optimization for Performance and Power Consumption on GPU Platform[J]. Computer Science, 2016, 43(5): 22-26.
Authors: YU Qi, WANG Bo-qian, SHEN Li, WANG Zhi-ying and CHEN Wei
Affiliation: School of Computer, National University of Defense Technology, Changsha 410073, China (all five authors)
Abstract: Because of their tremendous computing power, general-purpose graphics processing units (GPGPUs) have been widely adopted in general-purpose computing. However, because GPGPUs use an execution model called SIMT (Single Instruction Multiple Threads), their efficiency is severely affected by branch divergence in a GPU application. Thread swapping has been proposed to reduce the performance loss caused by branch divergence, but it often introduces extra memory accesses, which not only offset part of the performance gain but also increase power consumption. This paper first uses an example to explain how the thread swapping range influences the performance and power consumption of a program, and then proposes a method to reduce the extra memory accesses introduced by thread swapping. Experiments show that, for Reduction, the method reduces power consumption by up to 7% at an average performance loss of 4% when the swapping range is 256. For Bitonic, it improves performance by up to 6.4% and 5.3% when the swapping range is 256 and 512, respectively, with no power consumption overhead.
Keywords:Branch divergence  Memory access  Thread swapping
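
To make the terms in the abstract concrete, the following is a minimal Python model of branch divergence and of thread swapping within a fixed range. It is an illustrative sketch only, not the method proposed in the paper: the warp width of 32, the every-third-thread workload, and the names `divergent_warps` and `swap_within_range` are all assumptions made for this example. The index table returned by `swap_within_range` stands in for the extra memory access that swapping introduces, since each thread must look up which work item it now handles.

```python
WARP_SIZE = 32  # assumed warp width; threads in a warp execute in lockstep


def divergent_warps(taken, warp_size=WARP_SIZE):
    """Count warps whose threads do not all take the same branch direction."""
    count = 0
    for start in range(0, len(taken), warp_size):
        warp = taken[start:start + warp_size]
        if any(warp) and not all(warp):
            count += 1
    return count


def swap_within_range(taken, swap_range):
    """Regroup branch outcomes inside each window of `swap_range` threads.

    Returns the remapped outcomes plus the index table a thread would have to
    read to find its new work item (the extra memory access in this model).
    """
    remapped, index_table = [], []
    for start in range(0, len(taken), swap_range):
        window = range(start, min(start + swap_range, len(taken)))
        # Stable sort: work items that take the branch first, the rest after.
        order = sorted(window, key=lambda i: not taken[i])
        index_table.extend(order)
        remapped.extend(taken[i] for i in order)
    return remapped, index_table


# Hypothetical workload: every third thread takes the branch.
taken = [i % 3 == 0 for i in range(512)]

before = divergent_warps(taken)
after_64, _ = swap_within_range(taken, 64)
after_256, table = swap_within_range(taken, 256)

print("divergent warps, no swapping:", before)
print("divergent warps, range 64:   ", divergent_warps(after_64))
print("divergent warps, range 256:  ", divergent_warps(after_256))
```

In this toy model a larger swapping range leaves fewer divergent warps, but the model does not account for the cost of reading `table`, which is the trade-off between performance and power consumption that the abstract describes.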