首页 | 本学科首页   官方微博 | 高级检索  
     

面向规则DOACROSS循环的流水并行代码自动生成
引用本文:刘晓娴,赵荣彩,赵捷,徐金龙.面向规则DOACROSS循环的流水并行代码自动生成[J].软件学报,2014,25(6):1154-1168.
作者姓名:刘晓娴  赵荣彩  赵捷  徐金龙
作者单位:中国人民解放军信息工程大学, 河南 郑州 450002;数学工程与先进计算国家重点实验室, 河南 郑州 450002;中国人民解放军信息工程大学, 河南 郑州 450002;数学工程与先进计算国家重点实验室, 河南 郑州 450002;中国人民解放军信息工程大学, 河南 郑州 450002;数学工程与先进计算国家重点实验室, 河南 郑州 450002;中国人民解放军信息工程大学, 河南 郑州 450002;数学工程与先进计算国家重点实验室, 河南 郑州 450002
基金项目:“核高基”国家科技重大专项(2009ZX01036-001-001-2)
摘    要:发掘DOACROSS 循环中蕴含的并行性,选择合适的策略将其并行执行,对提升程序的并行性能非常重要.流水并行方式是规则DOACROSS 循环并行的重要方式.自动生成性能良好的流水并行代码是一项困难的工作,并行编译器对程序自动并行时常常对DOACROSS 循环作保守处理,损失了DOACROSS 循环包含的并行性,限制了程序的并行性能.针对上述问题,设计了一种选择计算划分循环层和循环分块层的启发式算法,给出了一个基于流水并行代价模型的循环分块大小计算公式,并使用计数信号量进行并行线程之间的同步,实现了基于OpenMP 的规则DOACROSS 循环流水并行代码的自动生成.通过对有限差分松弛法(finite difference relaxation,简称FDR)的波前(wavefront)循环和时域有限差分法(finite difference time domain,简称FDTD)中典型循环以及程序Poisson,LU 和Jacobi 的测试,算法自动生成的流水并行代码能够在多核处理器上获得明显的性能提升,使用的流水分块大小计算公式能够较为精确地计算出循环流水并行时的最佳分块大小.自动生成的流水并行代码与基于手工选择的最优分块大小的流水并行代码相比,加速比达到手工选择加速比的89%.

关 键 词:流水并行  自动并行  DOACROSS  循环  代价模型
收稿时间:2012/12/10 0:00:00
修稿时间:3/7/2013 12:00:00 AM

Automatic Generation of Pipeline Parallel Code for Regular DOACROSS Loops
LIU Xiao-Xian,ZHAO Rong-Cai,ZHAO Jie and XU Jin-Long.Automatic Generation of Pipeline Parallel Code for Regular DOACROSS Loops[J].Journal of Software,2014,25(6):1154-1168.
Authors:LIU Xiao-Xian  ZHAO Rong-Cai  ZHAO Jie and XU Jin-Long
Affiliation:PLA Information Engineering University, Zhengzhou 450002, China;State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450002, China;PLA Information Engineering University, Zhengzhou 450002, China;State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450002, China;PLA Information Engineering University, Zhengzhou 450002, China;State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450002, China;PLA Information Engineering University, Zhengzhou 450002, China;State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450002, China
Abstract:To obtain as much performance improvement as possible for sequential applications, it is important to exploit parallelism lurking in DOACROSS loops and find good schemes for their parallel execution. Pipelining is such a parallelizing method which can work well for regular DOACROSS loops. However, it is so hard to maintain high performance pipeline parallel codes automatically that parallel compilers always treat DOACROSS loops conservatively. Compilers usually serialize DOACROSS loops, which loses the inherent parallelism of DOACROSS loops and affects the performance of generated parallel programs. To solve this problem, automatic generation of pipeline parallel code for regular DOACROSS loops is implemented for multicore platform based on OpenMP. Firstly, a heuristic is proposed to choose the partition loop and the tiling loop of regular DOACROSS loops. Secondly, a formula based on pipelining cost model is given to compute the optimal tiling size. Lastly, the synchronization between threads is implemented with counter semaphores. Measuring against the wavefront loops of finite difference relaxation, the representative loops of finite difference time domain, and Poisson, LU and Jacobi procedures, the pipeline parallel loops automatically generated by the proposed method increase execution efficiency on multicore platform with the average speedup up to 89% of the optimal speedup obtained manually.
Keywords:pipeline parallel  automatic parallelization  DOACROSS loops  cost model
本文献已被 CNKI 等数据库收录!
点击此处可从《软件学报》浏览原始摘要信息
点击此处可从《软件学报》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号