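Dynamic Implicit Predicated Execution Based on a Simplified Trace
Cite this article:TANG Yu-Xing, DENG Kun, DOU Yong, ZHOU Xing-Ming. Dynamic Implicit Predicated Execution Based on a Simplified Trace [J]. Chinese Journal of Computers (计算机学报), 2007, 30(11): 1972-1981.
Authors:TANG Yu-Xing (唐遇星)  DENG Kun (邓鹍)  DOU Yong (窦勇)  ZHOU Xing-Ming (周兴铭)
Affiliation:National Key Laboratory of Parallel and Distributed Processing, School of Computer, National University of Defense Technology, Changsha 410073
Abstract:Branch instructions and branch mispredictions limit a processor's ability to exploit instruction-level parallelism (ILP). Converting the control dependences in a program into data dependences through if-conversion or predicated execution can substantially reduce the cost of branch prediction. This paper proposes Dynamic Implicit Predication (DIP), a dynamic implicit predicated execution mechanism based on a simplified trace structure, whereas earlier related work focused mainly on having the compiler explicitly generate static predicated instructions for wide-issue processors. Without help from a compiler or other binary tools, DIP identifies, while the program is running, instruction fragments that are amenable to predication, converts and optimizes them, and uses the optimized instruction trace in subsequent executions. Simulation with SPEC2000 shows that DIP effectively avoids branch mispredictions and increases parallelism: the IPC of individual programs improves by 10.3% on average, and the average speedup of the benchmark programs reaches 7.59%.

Keywords:instruction-level parallelism  predication  dynamic implicit predicated execution  trace cache  pipeline
Revised:2005-04-29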
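
The if-conversion mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper; the function names are hypothetical, and Python arithmetic stands in for what would be predicated machine instructions. The point is only that the branchy version chooses which statement runs, while the if-converted version runs both arms and lets a predicate value select the result.

```python
def abs_diff_branchy(a: int, b: int) -> int:
    """Control dependence: which assignment executes depends on the branch."""
    if a > b:
        r = a - b
    else:
        r = b - a
    return r


def abs_diff_if_converted(a: int, b: int) -> int:
    """If-converted form (illustrative): both arms execute, a predicate selects.

    The branch is replaced by a data dependence on the predicate p.
    """
    p = int(a > b)         # predicate computed once: 1 or 0
    taken = a - b          # arm guarded by p
    not_taken = b - a      # arm guarded by (not p)
    return p * taken + (1 - p) * not_taken


if __name__ == "__main__":
    for a, b in [(7, 3), (3, 7), (5, 5)]:
        print(a, b, abs_diff_branchy(a, b), abs_diff_if_converted(a, b))
```

Both functions return the same value for every input; the second simply has no outcome for a branch predictor to get wrong, at the cost of always executing both arms.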

Dynamic Implicit Predication Based on Lite Trace Cache
TANG Yu-Xing, DENG Kun, DOU Yong, ZHOU Xing-Ming. Dynamic Implicit Predication Based on Lite Trace Cache [J]. Chinese Journal of Computers, 2007, 30(11): 1972-1981.
Authors:TANG Yu-Xing  DENG Kun  DOU Yong  ZHOU Xing-Ming
Affiliation:National Key Laboratory of Parallel and Distributed Processing, School of Computer, National University of Defense Technology, Changsha 410073
Abstract:To exploit instruction-level parallelism, modern microprocessors often convert control dependences into data dependences. If-conversion and predicated execution are widely adopted to eliminate the branch misprediction penalty. This paper discusses a trace-based predication mechanism named DIP (Dynamic Implicit Predication). Previous predicated execution depends on the compiler to generate explicitly predicated instructions. With DIP, candidates for if-conversion are instead identified during dynamic execution. The classical trace cache is modified to store DIP traces, which include instructions from both the fall-through block and the target block following a conditional branch. Hardware adds predication to a DIP trace automatically. With the help of DIP, legacy applications can benefit from predication without recompiling source code. The paper presents simulations of DIP under various hardware configurations, and the results show promising performance improvement. For the SPEC INT2000 benchmarks, the average IPC (Instructions Per Cycle) improvement is 10.3%, and the average speedup in execution time is 7.59%.
Keywords:ILP  predication  dynamic implicit predication  trace cache  pipelining
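
As a rough illustration of a trace that holds instructions from both the fall-through and the target block, the toy model below tags each trace entry with an implicit predicate and commits only the entries whose predicate matches the resolved branch outcome. This is a sketch under stated assumptions, not the paper's hardware design: `TraceOp`, `run_trace`, `EXAMPLE_TRACE`, and the register names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Regs = Dict[str, int]


@dataclass
class TraceOp:
    dest: str                        # destination register name
    compute: Callable[[Regs], int]   # value computed from current registers
    pred: Optional[bool] = None      # None: unconditional, True: target path,
                                     # False: fall-through path


def run_trace(trace: List[TraceOp], regs: Regs, branch_taken: bool) -> Regs:
    """Execute every op in the trace; commit only ops whose predicate holds."""
    out = dict(regs)
    for op in trace:
        value = op.compute(out)      # both paths are executed
        if op.pred is None or op.pred == branch_taken:
            out[op.dest] = value     # non-matching ops are nullified
    return out


# Hypothetical trace for: if (r1 > r2) r3 = r1 - r2; else r3 = r2 - r1; r4 = r3 + 1
EXAMPLE_TRACE = [
    TraceOp("r3", lambda r: r["r2"] - r["r1"], pred=False),  # fall-through block
    TraceOp("r3", lambda r: r["r1"] - r["r2"], pred=True),   # target block
    TraceOp("r4", lambda r: r["r3"] + 1),                    # join point
]

if __name__ == "__main__":
    start = {"r1": 7, "r2": 3, "r3": 0, "r4": 0}
    print(run_trace(EXAMPLE_TRACE, start, branch_taken=start["r1"] > start["r2"]))
```

In this model the branch outcome is needed only at commit time, so a wrong guess about the branch direction does not force the trace to be refetched; the trade-off is that ops from the path not taken still consume execution resources.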
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.