Automatic Parallelization Framework for Complex Nested Loops Based on LLVM Pass
Citation: Ma Chun-Yan, Lv Bing-Xu, Ye Xu-Jiao, Zhang Yu. Automatic Parallelization Framework for Complex Nested Loops Based on LLVM Pass[J]. Journal of Software, 2023, 34(7): 3022-3042.
Authors: Ma Chun-Yan (马春燕), Lv Bing-Xu (吕炳旭), Ye Xu-Jiao (叶许姣), Zhang Yu (张雨)
Affiliation: School of Software, Northwestern Polytechnical University, Xi'an, Shaanxi 710129, China; School of Computer Science and Technology, Hainan University, Haikou, Hainan 570228, China
Funding: National Natural Science Foundation of China (62192733, 62062030); Aeronautical Science Fund (20185853038, 201907053004)
Received: 2022-09-05
Revised: 2022-10-08

Abstract: With multi-core processors now widespread, automatic parallelization of serial code in embedded legacy systems has become an active research topic. Automatically parallelizing complex nested loops, that is, loops with an imperfectly nested structure and non-affine dependences, remains technically challenging. This paper proposes CNLPF, an automatic parallelization framework for complex nested loops based on LLVM Pass. First, a representation model for complex nested loops, the loop structure tree, is proposed, and the regular regions of nested loops are automatically converted into this representation. Next, data dependence analysis is performed on the loop structure tree to construct the intra-loop and inter-loop dependence relations. Finally, parallel loop code is generated using the OpenMP shared-memory programming model. Six programs from the SPEC2006 benchmark suite, containing nearly 500 complex nested loops in total, were used to measure the proportion of complex nested loops and the parallel speedup. The results show that the framework handles complex nested loops that LLVM Polly cannot optimize, which strengthens LLVM's parallelizing optimization capability, and that combining the framework with Polly improves the speedup over Polly alone by 9% to 43%.

Keywords: complex nested loops; automatic parallelization; LLVM Pass; dependency analysis
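
The abstract's notions of an imperfectly nested loop, a non-affine access, and a loop-carried dependence can be made concrete with a small toy model. The sketch below is not the CNLPF implementation, which operates on LLVM IR and is not described in detail here. `LoopNode`, `is_perfect`, `outer_loop_is_parallel`, and `make_accesses` are hypothetical names introduced only for illustration, and the dependence check simply enumerates every iteration's accesses rather than performing a static analysis.

```python
from dataclasses import dataclass, field


@dataclass
class LoopNode:
    """Hypothetical loop-structure-tree node: one loop and what its body holds."""
    var: str                                         # induction variable name
    trip_count: int                                  # number of iterations
    statements: list = field(default_factory=list)   # statements directly in this body
    children: list = field(default_factory=list)     # loops nested directly inside


def is_perfect(node):
    """A nest is perfect when all statements sit in the innermost loop."""
    if not node.children:
        return True
    if node.statements or len(node.children) != 1:
        return False
    return is_perfect(node.children[0])


def outer_loop_is_parallel(trip_count, accesses):
    """Brute-force check that no memory location is shared between two
    different outer iterations when at least one of them writes it."""
    writer = {}    # location -> the iteration that writes it
    readers = {}   # location -> set of iterations that read it
    for i in range(trip_count):
        reads, writes = accesses(i)
        for loc in writes:
            if writer.setdefault(loc, i) != i:
                return False          # two iterations write the same location
        for loc in reads:
            readers.setdefault(loc, set()).add(i)
    # A read in one iteration of a location written by another is a dependence.
    return all(readers.get(loc, set()) <= {w} for loc, w in writer.items())


# Serial loop being modelled (s is assumed to be privatized per iteration):
#   for i in range(4):
#       s = 0                    # statement outside the inner loop: imperfect
#       for j in range(3):
#           s += b[i][j]
#       a[idx[i]] = s            # indirect subscript: non-affine access
nest = LoopNode("i", 4, statements=["s = 0", "a[idx[i]] = s"],
                children=[LoopNode("j", 3, statements=["s += b[i][j]"])])


def make_accesses(idx):
    """Build the per-iteration (reads, writes) sets for the loop above."""
    def accesses(i):
        reads = {("b", i, j) for j in range(3)} | {("idx", i)}
        writes = {("a", idx[i])}
        return reads, writes
    return accesses


print(is_perfect(nest))
print(outer_loop_is_parallel(4, make_accesses([2, 0, 3, 1])))
print(outer_loop_is_parallel(4, make_accesses([0, 0, 1, 2])))
```

In this toy model the nest is reported as imperfect because the outer body holds statements of its own. The outer loop is independent when `idx` is a permutation, since every iteration writes a distinct element of `a`, and dependent when `idx` repeats a value. Whether an indirect subscript such as `a[idx[i]]` is safe to parallelize therefore depends on values a purely affine model cannot express, which is one reason such loops fall outside polyhedral tools.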