Similar Literature
 Found 20 similar documents; search time 15 ms
1.
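Dependences between instructions are the main obstacle that keeps instruction scheduling from being effective and, in turn, limits instruction-level parallelism. Register renaming is an important technique for resolving control dependences and data dependences. This work studies and implements a register renaming technique for use in instruction scheduling. It achieves speedups of about 5% and 3% on 164.gzip and 186.crafty, respectively.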
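
The abstract does not describe its renaming algorithm, so the following is only a minimal sketch of the general idea, assuming straight-line code in which each instruction is a hypothetical `(dest, sources)` tuple. Giving every write a fresh name (the `p0`, `p1`, ... names are illustrative and assumed not to clash with input register names) removes write-after-read and write-after-write dependences while keeping true read-after-write dependences, which gives the scheduler more freedom to reorder.

```python
from itertools import count


def rename(instrs):
    """Give every written register a fresh name; reads follow the latest name."""
    fresh = count()
    current = {}  # original register -> most recent renamed register
    out = []
    for dest, srcs in instrs:
        # Sources read the latest renamed value, or the original live-in register.
        new_srcs = tuple(current.get(s, s) for s in srcs)
        new_dest = f"p{next(fresh)}"
        current[dest] = new_dest
        out.append((new_dest, new_srcs))
    return out


def false_deps(instrs):
    """List (earlier, later, kind) pairs that are WAW or WAR dependences."""
    deps = []
    for j, (dest_j, _) in enumerate(instrs):
        for i in range(j):
            dest_i, srcs_i = instrs[i]
            if dest_j == dest_i:
                deps.append((i, j, "WAW"))
            if dest_j in srcs_i:
                deps.append((i, j, "WAR"))
    return deps


code = [
    ("r1", ("r2", "r3")),  # r1 = r2 op r3
    ("r4", ("r1",)),       # r4 = r1
    ("r1", ("r5", "r6")),  # r1 = r5 op r6  (reuses r1)
    ("r7", ("r1",)),       # r7 = r1
]
renamed = rename(code)
print(false_deps(code))     # false dependences caused by reusing r1
print(false_deps(renamed))  # none remain after renaming
```

A real implementation would also have to map values that are live at the end of the region back to their original registers and respect the limited number of physical registers; both are omitted here.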

2.
Speculative execution is the execution of instructions before it is known whether they should be executed. In speculative execution for instruction-level parallelism (ILP) processors, shadow registers provide a hardware solution that protects program semantics from pollution by boosted instructions that were incorrectly predicted. In a recent study, Chang and Lai proposed a special register file based on shadow registers, named the conjugate register file (CRF), to support multilevel boosting in speculative execution. They also proposed a scheduling heuristic, named frequency-driven scheduling, to work with the CRF. However, the ability to boost is still constrained, since the register-pair concept forces speculatively produced results to be stored in dedicated locations. Moreover, when the parallelism potential increases to tens through the advancement of hardware techniques, the heavy demand on register usage and the complexity of the register file may well become a serious bottleneck for the exploitation of ILP. In this paper, the frequency-driven scheduling algorithm is modified by replacing the function of the hardware CRF with variable renaming during compilation. The new scheduling technique, named LESS, can exploit parallelism efficiently with a limited number of registers. Moreover, since the technique benefits ILP without any special hardware support, it can be incorporated into any other ILP architecture without changing its instruction set architecture (ISA). Simulation results show that the performance achievable by LESS is better than that of other existing methods. For example, under the ILP model with an issue rate of 8, speculative execution achieves a 34% increase in parallelism, compared with 18% under the CRF scheme.

3.
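Traditional predicate optimization techniques are implemented on von Neumann architecture computers; they optimize only the data flow and do not consider the Harvard architecture, in which instructions and data are separate. BWDSP10x is a Harvard architecture with separate instructions and data. It supports very long instruction words and provides support not only for data predicated execution but also for address predicated execution. This paper proposes a method that supports optimization of both predicate modes over regions: before a comparison is performed, the types of its two operands are checked and the predicate optimization of the matching mode is applied, so that address comparisons need not be transferred to general-purpose registers. Experimental results show that the optimization significantly saves CPU time and bandwidth, greatly reduces the number of branch instructions, and improves program performance by 28.4%.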

4.
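The main obstacles to a compiler's ability to increase program parallelism are frequent control transfers and ambiguous memory accesses. Predication and speculation are new features of VLIW processor architectures, intended to eliminate the effect of branches and memory accesses on the identification of instruction-level parallelism. Instruction scheduling is one of the key techniques by which a compiler exploits instruction-level parallelism. This paper discusses how to use predication and speculation effectively in instruction scheduling to improve program performance.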

5.
This paper presents a set of efficient graph transformations for local instruction scheduling. These transformations to the data-dependency graph prune redundant and inferior schedules from the solution space of the problem. Optimally scheduling the transformed problems using an enumerative scheduler is faster, and the number of problems solved to optimality within a bounded time is increased. Furthermore, heuristic scheduling of the transformed problems often yields improved schedules for hard problems. The basic node-based transformation runs in O(ne) time, where n is the number of nodes and e is the number of edges in the graph. A generalized subgraph-based transformation runs in O(n^2 e) time. The transformations are implemented within the GNU Compiler Collection (GCC) and are evaluated experimentally using the SPEC CPU2000 floating-point benchmarks targeted to various processor models. The results show that the transformations are fast and improve the results of both heuristic and optimal scheduling.

6.
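Instruction scheduling increases instruction-level parallelism (ILP) by reordering instructions. However, basic blocks are usually small, so the ILP available within them is also small. Meanwhile, as chip design technology advances, modern processors contain ever richer resources. Only by crossing basic block boundaries (that is, by global instruction scheduling) can instruction scheduling fully exploit the ILP the processor can support and the ILP inherent in the program. Global instruction scheduling can be divided into cyclic and acyclic scheduling. This paper introduces several of the more influential algorithms for acyclic global instruction scheduling and briefly introduces new hot topics in global instruction scheduling.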

7.
Mueller, Frank. Real-Time Systems, 2000, 18(2-3): 217-247
This paper contributes a comprehensive study of a framework to bound worst-case instruction cache performance for caches with arbitrary levels of associativity. The framework is formally introduced and operationally described, and its correctness is shown. Results of incorporating instruction cache predictions within pipeline simulation show that timing predictions for set-associative caches remain just as tight as predictions for direct-mapped caches. The low cache simulation overhead allows interactive use of the analysis tool and scales well with increasing associativity. The approach taken is based on a data-flow specification of the problem and provides another step toward worst-case execution time prediction of contemporary architectures and its use in schedulability analysis for hard real-time systems.

8.
张垚 (Zhang Yao). 《计算机工程》 (Computer Engineering), 2009, 35(8): 122-124
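The precise exception requirement of the Java language and the exception checks that occur frequently in Java programs severely hinder or restrict the use of instruction scheduling when compiling Java to native code, and thereby reduce the instruction-level parallelism of the code. The proposed algorithm lets instruction scheduling break through the precise exception requirement so that it can be as effective as possible, improving code execution efficiency while ensuring that the precise exception requirement is not violated when an exception actually occurs. Experimental results demonstrate the effectiveness and correctness of the algorithm.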

9.
Integration of Code Optimization and Instruction Scheduling   Total citations: 1 (self-citations: 0, citations by others: 1)
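In a compiler that exploits instruction-level parallelism, performing code optimization and instruction scheduling independently reduces the effectiveness of the optimizations and can even produce side effects. To address this problem, the paper proposes integrating code optimization with instruction scheduling. On this basis, it introduces an instruction scheduling algorithm framework suited to integrating code optimizations; analyzes the optimizations in terms of their effectiveness, their reversibility, and the optimization opportunities they create, and selects the traditional optimizations suited to integration into instruction scheduling; and finally gives concrete methods for integrating these optimizations. The proposed method has been tested on an instruction-level parallel compiler, and the experimental data show that this integration clearly improves the effect of the optimizations.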

10.
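As wire delay gradually increases, instruction scheduling is becoming increasingly important as a technique that can effectively reduce on-chip communication in processors. This paper introduces a weighted-path-based instruction scheduling algorithm for a tiled processor architecture. The algorithm uses information about instructions that have already been placed, called anchor instructions, to compute path lengths precisely, and then schedules each instruction using the length of the path it lies on as its weight. Experimental results show that the IPC of a scheduler implementing this algorithm is 21% and 3% higher, respectively, than that of two existing TRIPS scheduling algorithms.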
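
The abstract gives no details of the anchor-based path computation or of placement onto tiles, so the sketch below illustrates only the generic idea of using path length as a scheduling weight. It is a plain list scheduler, not the TRIPS algorithm: each instruction's priority is the longest latency-weighted path from it to a sink of the dependence graph. The names `succs`, `latency`, and `width` are assumptions made for the example, and latencies are assumed to be at least 1.

```python
def list_schedule(succs, latency, width):
    """Cycle-by-cycle list scheduling; returns {instruction: start cycle}.

    succs maps each instruction to the instructions that depend on it,
    latency maps each instruction to its latency in cycles (>= 1),
    width is the number of instructions that may start per cycle.
    """
    nodes = list(latency)
    preds = {n: [] for n in nodes}
    for u, vs in succs.items():
        for v in vs:
            preds[v].append(u)

    weight = {}

    def path_len(n):
        # Longest latency-weighted path from n to any sink (memoized).
        if n not in weight:
            tail = max((path_len(s) for s in succs.get(n, ())), default=0)
            weight[n] = latency[n] + tail
        return weight[n]

    start = {}
    cycle = 0
    while len(start) < len(nodes):
        # Ready: not yet scheduled and every predecessor's result is available.
        ready = [
            n for n in nodes
            if n not in start
            and all(p in start and start[p] + latency[p] <= cycle for p in preds[n])
        ]
        ready.sort(key=path_len, reverse=True)  # longest remaining path first
        for n in ready[:width]:
            start[n] = cycle
        cycle += 1
    return start


succs = {"ld1": ["add"], "ld2": ["add"], "add": ["st"], "mul": ["st"], "st": []}
latency = {"ld1": 3, "ld2": 3, "mul": 1, "add": 1, "st": 1}
schedule = list_schedule(succs, latency, width=2)
print(schedule)
```

In this example the two loads start first because they lie on the longest path, and the short `mul` fills the next cycle while the loads complete.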

11.
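This paper introduces a lightweight, linear-complexity instruction scheduling algorithm designed for just-in-time compilers and for systems constrained in time and space. Instead of the traditional DAG or expression tree, the algorithm schedules instructions using an original data structure, the extended incidence matrix. Even in the worst case its time complexity is strictly linear in the total length of the instruction sequence, and it occupies less than 1 KB of memory. The algorithm has been adopted as the default instruction scheduling algorithm in the just-in-time compiler of XORP, the high-performance J2ME virtual machine that Intel designed for XScale.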

12.
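Instruction-level parallelism is an important feature of modern high-performance processors, and the compiler is critical to realizing the parallel processing capability of such processors. This paper discusses the core problems in instruction-level parallel compilation: global instruction scheduling and register allocation. Taking as background the compilation system the authors developed for a new explicitly parallel architecture microprocessor, it describes the phase-ordering problem between instruction scheduling and register allocation faced in the back-end design of such compilers, and a cooperative method for global instruction scheduling and register allocation proposed to solve this problem.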

13.
The frequent occurrence of implicitly thrown exceptions poses one of the challenges present in a Java compiler. Not only do these implicitly thrown exceptions directly affect the performance by requiring explicit checks, they also indirectly impact the performance by restricting code movement in order to satisfy the precise exception model in Java. In particular, instruction scheduling is one transformation that is restricted by implicitly thrown exceptions due to the heavy reliance on reordering instructions to exploit maximum hardware performance. The goal of this study is two-fold: first, investigate the degree to which implicitly thrown exceptions in Java hinder instruction scheduling, and second, find new techniques for allowing more efficient execution of Java programs containing implicitly thrown exceptions. Experimental results show that with aggressive scheduling techniques, such as superblock scheduling, the negative performance impact can be greatly reduced.

14.
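Very long instruction word (VLIW) processors generally use a bus-interconnected multi-cluster structure: the functional units in each cluster share a local register file, and data is transferred between clusters over a bus. This avoids the rapid growth in delay, area, and power of a fully connected structure as the number of functional units increases. However, the copies and delays incurred when clusters share data reduce processor performance. This paper proposes a multi-cluster VLIW structure interconnected by register files, which uses register files to connect the clusters and thereby avoids both the inter-cluster data transfer delay and the extra data copy operations. An instruction scheduling algorithm for this structure is also proposed to improve scheduling performance. Experimental results show that, compared with a fully connected VLIW structure, the register-file-interconnected structure loses only about 13% in performance while code size remains essentially unchanged; both results are better than those of the bus-interconnected multi-cluster structure.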

15.
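IA-64 is a brand-new architecture. It provides rich hardware support for exploiting the potential instruction-level parallelism in programs, for example a large register set, (control/data) speculation, and predication. Itanium is a concrete implementation of IA-64. The authors apply Bernstein's global instruction scheduling algorithm, which was designed for superscalar processors, to the explicitly parallel (EPIC) Itanium processor. While taking the characteristics of the Itanium processor into account, the authors make two innovations to Bernstein's algorithm. (1) Hierarchical regions. Compared with traditional flat regions, such regions are highly flexible and give the scheduler a scheduling scope of suitable size, so that it can make full use of hardware resources while effectively controlling the time and space cost of scheduling. (2) Integration of P-Ready instruction scheduling. P-Ready was proposed in a context very different from the framework of Bernstein's algorithm. P-Ready scheduling can schedule a high-priority instruction as early as possible even if its data dependences have not been resolved on every execution path passing through it. Integrating P-Ready scheduling into the framework of Bernstein's algorithm is therefore worthwhile. The authors implemented the algorithm in ORC, an open-source compiler for the Itanium processor. Experimental results show that the global instruction scheduler yields an average run-time speedup of 8.4% on the CPU2000int benchmarks. Reflecting the advantage of hierarchical regions, scheduling instructions across nested loops yields a run-time speedup of up to 12.9%. In addition, P-Ready scheduling yields an average run-time speedup of 1.37% on the CPU2000int benchmarks, with a maximum of 7.6%.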

16.
Simulation results are presented using the hardware-implemented, trace-based dynamic instruction scheduler of our single-process DTSVLIW architecture to schedule instructions from several processes into multiple streams of VLIW instructions for execution by a wide-issue, simultaneous multi-threading (SMT) execution engine. The scheduling process involves single-instruction execution of each process, dynamically scheduling executed instructions into blocks of VLIW instructions cached for subsequent SMT execution. SMT provides a mechanism to reduce the impact of horizontal and vertical waste, and of variable memory latencies, seen in the DTSVLIW. Preliminary experiments explore this extended model. Results achieve PE utilization of up to 87% on a 4-thread, 1-scalar, 8-PE design, with speed-ups of up to 6.3 times that of a single processor. Notably, only a single scalar process needs to be scheduled at any time, with main memory fetches being 1–4% of those of a single processor.

17.
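Clustered superscalar processors partition hardware resources to avoid the power and cycle-time penalties caused by large monolithic components, while dynamic multicore processors fuse the hardware resources of multiple physical cores to provide computing capability that adapts to program demands. These architectures achieve energy-efficient computation by making good use of spatially distributed hardware resources. In spatially partitioned architectures, problems such as instruction load imbalance and the latency of passing operands across partitions can cause performance penalties, so an effective instruction scheduling method is needed to distribute computation among the partitions. This paper proposes a spatial instruction scheduling method based on data-flow blocks (DFBs). A DFB is a scheduling pattern, built dynamically, cached, and reused, for one or several sequentially executed basic blocks of instructions. The DFB scheduling algorithm models the data-flow constraints in the dynamic instruction stream and the scheduling space defined by the hardware resources, and then makes scheduling decisions according to the quantified relative criticality of instructions. The microarchitectural framework and the algorithm of DFB scheduling are described. Experiments on microarchitectural parameters closely related to the scheduling method, such as the number of partitions, inter-partition latency, and scheduling window capacity, show that DFB scheduling outperforms and is more stable than load-balancing scheduling and dependence-based scheduling. Finally, an example shows that DFB scheduling implemented with a data-flow block cache achieves scheduling results close to those of idealized DFB scheduling.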

18.
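To improve the scheduling efficiency of workflow tasks in cloud environments where resource behavior is dynamic and heterogeneous, a workflow scheduling algorithm based on a dynamic critical path, CWS-DCP, is proposed. The algorithm defines the workflow task structure as a directed acyclic graph (DAG) model and improves on the one-shot search of the traditional critical-path approach: taking into account that the availability of cloud resources changes dynamically, it searches for the critical path in a dynamic, adaptive manner and determines the critical tasks. After a critical task is scheduled, the critical-path search over the partial DAG is iteratively updated according to resource availability, so that the mapping between tasks and resources is decided dynamically. In simulation experiments, three different types of workflow structure were constructed as test data sources, and performance was compared with six other heuristic and metaheuristic algorithms of the same type. The results show that, as resource availability changes dynamically and workflow size keeps growing, CWS-DCP obtains schedules with better makespan and lower scheduling overhead for most workflow structures.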
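
CWS-DCP itself is not specified in the abstract, so the following is only a sketch of the underlying building block: finding the critical (longest) path of a task DAG, and finding it again after estimated task costs change. The function name `critical_path`, the `preds` and `cost` dictionaries, and the example numbers are all assumptions made for illustration. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def critical_path(preds, cost):
    """Return (length, path) of the longest path through a task DAG.

    preds maps each task to the set of tasks it depends on;
    cost maps each task to its estimated execution time.
    """
    finish = {}     # earliest finish time of each task
    best_pred = {}  # predecessor that determines that finish time
    for task in TopologicalSorter(preds).static_order():
        p = max(preds.get(task, ()), key=lambda u: finish[u], default=None)
        best_pred[task] = p
        finish[task] = cost[task] + (finish[p] if p is not None else 0)

    # Walk back from the task that finishes last.
    node = max(finish, key=finish.get)
    length = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = best_pred[node]
    return length, path[::-1]


preds = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
cost = {"A": 2, "B": 4, "C": 3, "D": 1}
before = critical_path(preds, cost)

# Suppose the resource that would run C becomes slower: the critical path moves.
slower = dict(cost, C=6)
after = critical_path(preds, slower)
print(before, after)
```

A dynamic scheme in the spirit of the abstract would repeat this search on the not-yet-scheduled part of the DAG each time a critical task has been placed and resource availability has been refreshed.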

19.
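Speculation is an important means by which instruction scheduling overcomes control dependences between instructions. On the one hand, speculation can increase instruction-level parallelism and thus improve performance; on the other hand, it can lengthen variable live ranges, increase register pressure, and cause variables to be spilled, thereby degrading performance. Previous register-pressure-sensitive instruction scheduling methods tend simply to schedule conservatively whenever the number of live variables in the scheduling region exceeds a threshold. Because the benefit and cost of scheduling each instruction differ, the proposed method speculates instructions selectively by analyzing the performance benefit and the spill cost of each speculative scheduling decision, rather than considering only the number of live variables. Experiments show that the method effectively improves program performance: on the SPEC2000 integer benchmarks, average performance is 1.44% higher than with speculative scheduling that ignores register pressure.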

20.
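Register allocation and instruction scheduling are two important tasks in compiler optimization. Because these two phases are usually performed independently, register allocation often introduces unnecessary false dependences, which affect the efficiency and results of instruction scheduling and limit the final performance improvement. This paper proposes a register queue model and, based on it, an algorithm that combines register allocation and instruction scheduling. The algorithm uses the minimum number of registers while guaranteeing the earliest execution time for every instruction. Another advantage is that it has linear time and space complexity and is easy to implement in hardware.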
