Scheduling Load Instructions on an Out-of-Order Machine
Cite this article: ZHOU Qian, FENG Xiao-Bing, ZHANG Zhao-Qing. Scheduling Load Instructions on an Out-of-Order Machine[J]. Computer Science, 2007, 34(11): 298-300
Authors: ZHOU Qian, FENG Xiao-Bing, ZHANG Zhao-Qing
Affiliation: Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
Abstract: As the speed gap between processors and memory keeps widening, memory access instructions, especially those that frequently miss in the cache, have become a major performance bottleneck. Because the compiler cannot know the number of cycles a memory access takes at run time, it generally assumes each such instruction's latency is either the cache-hit or the cache-miss latency, which is inaccurate. We introduce cache profiling to collect run-time cache hit/miss information for memory access instructions and use this information to compute their latencies. On an out-of-order machine, the hardware scheduler dynamically schedules instructions within the issue window well, while the compiler has the advantage when scheduling over longer ranges. Once a cache miss occurs, the reorder buffer easily fills up and the pipeline stalls. Scheduling miss-prone instructions so that they execute in parallel hides the long cache-miss latency and improves performance. We therefore target load instructions in two ways: adjusting the latencies of frequently missing loads, and modifying the scheduling policy to raise memory-level parallelism. Experiments show that our scheduling yields up to a 4.8% improvement on bzip2, 4% on art, and 1.5% on average.

Keywords: instruction scheduling, cache profiling, memory-level parallelism

Scheduling Load Instructions on an Out-of-Order Machine
ZHOU Qian, FENG Xiao-Bing, ZHANG Zhao-Qing. Scheduling Load Instructions on an Out-of-Order Machine[J]. Computer Science, 2007, 34(11): 298-300
Authors: ZHOU Qian, FENG Xiao-Bing, ZHANG Zhao-Qing (Institute of Computing Technology, Chinese Academy of Sciences, Beijing)
Abstract: As the speed gap between processors and memory keeps widening, memory access instructions, especially those that frequently miss in the cache, have become a major performance bottleneck. Because the compiler does not know the exact cycle count of a memory access, it assumes each such instruction always hits or always misses in the cache, which is inaccurate. We introduce cache profiling, which collects run-time cache hit/miss information for memory access instructions, and use this information to compute their latencies. On an out-of-order machine the hardware scheduler handles instructions within the issue window well, while the compiler has the advantage when scheduling over longer distances. Once a cache miss occurs, the reorder buffer can quickly fill and stall the pipeline. Scheduling the miss-prone instructions so that they execute in parallel hides the long miss latencies and improves performance. We therefore adjust the latencies of loads that frequently miss and modify the scheduling policy to raise memory-level parallelism. Experiments show that our scheduling policy improves performance by 1.5% on average, with 4.8% on bzip2 and 4% on art.
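The approach the abstract describes can be illustrated with a minimal sketch: assign each load a compile-time latency from its profiled miss rate, then list-schedule by critical-path priority so miss-prone loads issue early and their long latencies overlap. This is a hypothetical illustration, not the paper's actual implementation; the latency values, the miss-rate threshold, and the toy dependence DAG are all assumptions.

```python
# Hypothetical sketch: latency values, threshold, and DAG are assumptions,
# not taken from the paper.

HIT_LATENCY = 3      # assumed L1-hit latency in cycles
MISS_LATENCY = 100   # assumed miss-to-memory latency in cycles

def load_latency(miss_rate, threshold=0.2):
    """Treat a load as a miss if its profiled miss rate crosses the threshold."""
    return MISS_LATENCY if miss_rate >= threshold else HIT_LATENCY

def list_schedule(dag):
    """dag maps node -> (latency, [successor nodes]).
    Priority = critical-path length from the node to a leaf; scheduling
    in decreasing priority hoists long-latency loads to the front."""
    memo = {}
    def critical_path(n):
        if n not in memo:
            lat, succs = dag[n]
            memo[n] = lat + max((critical_path(s) for s in succs), default=0)
        return memo[n]
    return sorted(dag, key=critical_path, reverse=True)

# Toy dependence DAG: load_a misses 60% of the time, load_b almost never.
dag = {
    "load_a": (load_latency(0.60), ["use_a"]),
    "load_b": (load_latency(0.05), ["use_b"]),
    "use_a":  (1, []),
    "use_b":  (1, []),
}
order = list_schedule(dag)
print(order)  # the miss-prone load_a is scheduled first
```

With the profiled latency in place, the frequently missing load_a dominates the critical path and is hoisted ahead of the hit-likely load_b, so its long miss latency overlaps with the rest of the schedule.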
Keywords: instruction scheduling, cache profiling, memory-level parallelism (MLP)
This article is indexed by CNKI, VIP, Wanfang Data, and other databases.