Scheduling Load Instructions on an Out-of-Order Machine
Cite this article: ZHOU Qian, FENG Xiao-Bing, ZHANG Zhao-Qing. Scheduling Load Instructions on an Out-of-Order Machine[J]. Computer Science, 2007, 34(11): 298-300
Authors: ZHOU Qian, FENG Xiao-Bing, ZHANG Zhao-Qing
Affiliation: Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
Abstract: As the speed gap between processors and memory keeps widening, memory access instructions, especially those that frequently miss in the cache, have become a major performance bottleneck. Because the compiler cannot know the number of cycles a memory access takes at run time, it generally assumes each such instruction's latency is either the cache-hit or the cache-miss latency, which is inaccurate. We introduce cache profiling to collect run-time cache hit/miss information for memory access instructions and use this information to compute their latencies. On an out-of-order machine, the hardware scheduler dynamically schedules instructions within the issue window well, while the compiler has the advantage when scheduling over longer ranges. Once a cache miss occurs, the reorder buffer easily fills up and the pipeline stalls. Scheduling miss-prone instructions so that they execute in parallel hides the long cache-miss latency and improves performance. We therefore target load instructions in two ways: adjusting the latencies of frequently missing loads, and modifying the scheduling policy to raise memory-level parallelism. Experiments show that our scheduling yields up to a 4.8% improvement on bzip2, 4% on art, and 1.5% on average.

Keywords: instruction scheduling, cache profiling, memory-level parallelism

Scheduling Load Instructions on an Out-of-Order Machine
ZHOU Qian, FENG Xiao-Bing, ZHANG Zhao-Qing. Scheduling Load Instructions on an Out-of-Order Machine[J]. Computer Science, 2007, 34(11): 298-300
Authors: ZHOU Qian, FENG Xiao-Bing, ZHANG Zhao-Qing (Institute of Computing Technology, Chinese Academy of Sciences, Beijing)
Abstract: As the speed gap between processors and memory keeps widening, memory access instructions, especially those that frequently miss in the cache, have become a major performance bottleneck. Because the compiler does not know the exact cycle count of a memory access, it assumes each such instruction always hits or always misses in the cache, which is inaccurate. We introduce cache profiling, which collects run-time cache hit/miss information for memory access instructions, and use this information to compute their latencies. On an out-of-order machine the hardware scheduler handles instructions within the issue window well, while the compiler has the advantage when scheduling over longer distances. Once a cache miss occurs, the reorder buffer can quickly fill and stall the pipeline. Scheduling the miss-prone instructions so that they execute in parallel hides the long miss latencies and improves performance. We therefore adjust the latencies of loads that frequently miss and modify the scheduling policy to raise memory-level parallelism. Experiments show that our scheduling policy improves performance by 1.5% on average, with 4.8% on bzip2 and 4% on art.
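The approach the abstract describes can be illustrated with a minimal sketch: assign each load a compile-time latency from its profiled miss rate, then list-schedule by critical-path priority so miss-prone loads issue early and their long latencies overlap. This is a hypothetical illustration, not the paper's actual implementation; the latency values, the miss-rate threshold, and the toy dependence DAG are all assumptions.

```python
# Hypothetical sketch: latency values, threshold, and DAG are assumptions,
# not taken from the paper.

HIT_LATENCY = 3      # assumed L1-hit latency in cycles
MISS_LATENCY = 100   # assumed miss-to-memory latency in cycles

def load_latency(miss_rate, threshold=0.2):
    """Treat a load as a miss if its profiled miss rate crosses the threshold."""
    return MISS_LATENCY if miss_rate >= threshold else HIT_LATENCY

def list_schedule(dag):
    """dag maps node -> (latency, [successor nodes]).
    Priority = critical-path length from the node to a leaf; scheduling
    in decreasing priority hoists long-latency loads to the front."""
    memo = {}
    def critical_path(n):
        if n not in memo:
            lat, succs = dag[n]
            memo[n] = lat + max((critical_path(s) for s in succs), default=0)
        return memo[n]
    return sorted(dag, key=critical_path, reverse=True)

# Toy dependence DAG: load_a misses 60% of the time, load_b almost never.
dag = {
    "load_a": (load_latency(0.60), ["use_a"]),
    "load_b": (load_latency(0.05), ["use_b"]),
    "use_a":  (1, []),
    "use_b":  (1, []),
}
order = list_schedule(dag)
print(order)  # the miss-prone load_a is scheduled first
```

With the profiled latency in place, the frequently missing load_a dominates the critical path and is hoisted ahead of the hit-likely load_b, so its long miss latency overlaps with the rest of the schedule.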
Keywords: instruction scheduling, cache profiling, memory-level parallelism (MLP)
This article is indexed by CNKI, VIP, Wanfang Data, and other databases.