
Optimization of loop unrolling based on instruction Cache and register pressure
Cite this article: WANG Cui-xia, HAN Lin, LIU Hao-hao. Optimization of loop unrolling based on instruction Cache and register pressure[J]. Computer Engineering & Science (计算机工程与科学), 2022, 44(12): 2111-2119.
Authors: WANG Cui-xia (王翠霞)  HAN Lin (韩林)  LIU Hao-hao (刘浩浩)
Affiliation: (Research Institute of Front Information Technology, Zhongyuan University of Technology, Zhengzhou 450007, China)
Received: 2021-11-25
Revised: 2022-01-30

Abstract: Loop unrolling is a common compiler optimization technique. It reduces loop overhead, improves instruction-level parallelism and data locality, and thereby improves the execution efficiency of loops. However, excessive unrolling causes instruction Cache overflow and increases register pressure, while too little unrolling wastes potential performance gains. Finding an appropriate unroll factor is therefore the core problem in the study of loop unrolling. Based on the open-source compiler GCC, this paper analyzes the loop unrolling problem in depth. In view of the influence of the instruction Cache and register resources on loop unrolling, it proposes a method for calculating the unroll factor based on instruction Cache and register pressure, and implements the method in GCC. Experiments on the Sunway and Hygon platforms show that, compared with the other unroll factor calculation methods currently in GCC, the proposed method obtains more effective unroll factors and improves program performance. Average performance on the SPEC CPU 2006 benchmark suite improves by 2.7% and 3.1%, respectively, and on the NPB-3.3.1 benchmark suite by 5.4% and 6.1%.

Keywords: compiler optimization  loop unrolling  unrolling factor  instruction Cache  register pressure
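
To make the trade-off described in the abstract concrete, the following is a minimal sketch in C. It is not the calculation method proposed in the paper, whose formula is not given here. The function `pick_unroll_factor`, its parameters, and the two budgets (instructions that fit in the instruction Cache, registers available to the loop) are hypothetical names introduced only for illustration. The second function shows what a loop unrolled by a factor of 4 looks like when written by hand.

```c
#include <stddef.h>

/*
 * Hypothetical heuristic (an assumption, not the paper's formula):
 * choose the largest power-of-two unroll factor such that the unrolled
 * body still fits an instruction-count budget (a stand-in for the
 * instruction Cache) and a register budget (a stand-in for register
 * pressure). Returns 1 when no unrolling fits.
 */
unsigned pick_unroll_factor(unsigned body_insns,
                            unsigned icache_insn_budget,
                            unsigned regs_per_iter,
                            unsigned regs_available,
                            unsigned max_factor)
{
    unsigned factor = 1;

    if (body_insns == 0 || regs_per_iter == 0)
        return 1;

    /* Double the factor while both resource limits still hold. */
    while (factor * 2 <= max_factor &&
           factor * 2 * body_insns <= icache_insn_budget &&
           factor * 2 * regs_per_iter <= regs_available)
        factor *= 2;

    return factor;
}

/*
 * A sum loop unrolled by hand with factor 4. Four independent
 * accumulators expose instruction-level parallelism; the loop test
 * and increment run once per four elements instead of once per element.
 */
long sum_unrolled4(const int *a, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }

    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++)
        s0 += a[i];

    return s0 + s1 + s2 + s3;
}
```

In this sketch, a 10-instruction body with a 64-instruction budget is limited to a factor of 4 by the instruction budget, while the same body with a large instruction budget but only 16 registers at 3 registers per iteration is limited to 4 by registers. A production compiler would estimate these quantities from its intermediate representation rather than take them as arguments.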