1.

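Zhang Junchao, Zhang Zhaoqing 《计算机工程》 (Computer Engineering) 2005,31(23):8-10

Dependencies between instructions are the main obstacle that keeps instruction scheduling from being effective and thereby limits instruction-level parallelism. Register renaming is an important technique for resolving control dependencies and data dependencies. This paper studies and implements a register renaming technique for use in instruction scheduling. It achieves speedups of about 5% and 3% on 164.gzip and 186.crafty, respectively.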
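To make the idea of register renaming concrete, here is a minimal Python sketch. It is not the method of the paper above, whose details the abstract does not give. It assumes a single straight-line block, an unlimited supply of fresh register names, and a hypothetical tuple encoding `(dest, op, src...)`; the helper names `false_dependencies` and `rename_registers` are invented for illustration. It shows how giving each definition a fresh name removes write-after-read (WAR) and write-after-write (WAW) dependencies, which is what gives a scheduler more freedom to reorder.

```python
from itertools import count


def false_dependencies(instrs):
    """Return the WAR/WAW (false) dependencies as (earlier, later, kind) triples."""
    deps = set()
    for j, (dest_j, _op_j, *_srcs_j) in enumerate(instrs):
        for i in range(j):
            dest_i, _op_i, *srcs_i = instrs[i]
            if dest_j == dest_i:
                deps.add((i, j, "WAW"))  # both instructions write the same register
            if dest_j in srcs_i:
                deps.add((i, j, "WAR"))  # the later write would clobber an earlier read
    return deps


def rename_registers(instrs):
    """Give every definition a fresh register name (v0, v1, ...).

    Assumption: registers are unlimited and the block has no live-out values,
    so no copy-back to the original names is needed.
    """
    fresh = (f"v{n}" for n in count())
    current = {}  # original register -> its most recent new name
    renamed = []
    for dest, op, *srcs in instrs:
        # Sources read the latest definition; names never defined in the
        # block (here the memory operands "a" and "b") are left untouched.
        new_srcs = [current.get(s, s) for s in srcs]
        current[dest] = next(fresh)
        renamed.append((current[dest], op, *new_srcs))
    return renamed


program = [
    ("r1", "load", "a"),
    ("r2", "add", "r1", "r1"),
    ("r1", "load", "b"),        # reuses r1: WAW with instr 0, WAR with instr 1
    ("r3", "add", "r1", "r2"),
]
renamed = rename_registers(program)
```

In this example the second load can no longer conflict with the first load or the first add, so a scheduler would be free to hoist it. A real compiler has a finite register file, so renaming must be balanced against register pressure.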

2.

On the Boosting of Instruction Scheduling by Renaming

Wang L. Yang Ted C. 《The Journal of supercomputing》2001,19(2):173-197

Speculative execution is the execution of instructions before it is known whether these instructions should be executed. In the speculative execution for instruction level parallelism (ILP) processors, the concept of shadow register provides a hardware solution to maintain semantics of a program from the pollution of boosted instructions that are incorrectly predicted. In a recent study, Chang and Lai proposed a special register file based on shadow register, named conjugate register file (CRF), to support multilevel boosting in speculative execution. They also proposed a scheduling heuristic named frequency-driven scheduling to incorporate with CRF for execution. However, the ability of boosting is still constrained since the concept of register pair will force the results produced speculatively be stored in dedicated locations. Moreover, when the parallelism potential increases to tens through the advancement of hardware techniques, the heavy demand on register usage and the complexity of register file may well become a serious bottleneck for the exploitation of ILP. In this paper, the algorithm of frequency-driven scheduling is modified by replacing the function of hardware CRF with the technique of variable renaming during compilation. The new scheduling technique, named LESS, can exploit the parallelism efficiently with limited number of registers. Moreover, since the technique can benefit ILP without any special hardware support, it can be incorporated with any other ILP architecture without changing its instruction set architecture (ISA). Simulation results show that the performance achievable by LESS is better than other existing methods. For example, under the ILP model with an issue rate of 8, the speculative execution can achieve an increase of 34% in parallelism, as compared to 18% in CRF scheme.

3.

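Research on Predication and Speculation Techniques in Instruction Scheduling

Ye Wei, Ma Jie, Hou Chaohuan 《微计算机应用》 (Microcomputer Applications) 2006,27(6):691-693

The main obstacles to a compiler's efforts to increase program parallelism are frequent control transfers and ambiguous memory accesses. Predication and speculation are new features of VLIW processor architectures, intended to remove the impact of branches and memory accesses on the identification of instruction-level parallelism. Instruction scheduling is one of the key techniques a compiler uses to exploit instruction-level parallelism. This paper discusses how to use predication and speculation effectively in instruction scheduling to improve program performance.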

4.

Data-Dependency Graph Transformations for Instruction Scheduling

Mark Heffernan, Kent Wilken 《Journal of Scheduling》2005,8(5):427-451

This paper presents a set of efficient graph transformations for local instruction scheduling. These transformations to the data-dependency graph prune redundant and inferior schedules from the solution space of the problem. Optimally scheduling the transformed problems using an enumerative scheduler is faster and the number of problems solved to optimality within a bounded time is increased. Furthermore, heuristic scheduling of the transformed problems often yields improved schedules for hard problems. The basic node-based transformation runs in O(ne) time, where n is the number of nodes and e is the number of edges in the graph. A generalized subgraph-based transformation runs in O(n² e) time. The transformations are implemented within the GNU Compiler Collection (GCC) and are evaluated experimentally using the SPEC CPU2000 floating-point benchmarks targeted to various processor models. The results show that the transformations are fast and improve the results of both heuristic and optimal scheduling.

5.

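A Survey of Global Instruction Scheduling

Yang Shuxin, Zhang Zhaoqing 《计算机工程与应用》 (Computer Engineering and Applications) 2004,40(21):44-48,89

Instruction scheduling raises instruction-level parallelism (ILP) by reordering instructions. Basic blocks are usually small, however, so the ILP available within them is small as well, while advances in chip design give modern processors ever richer resources. Only by crossing basic-block boundaries (that is, by global instruction scheduling) can instruction scheduling fully exploit the ILP that the processor offers and that is inherent in the program. Global instruction scheduling can be divided into cyclic and acyclic scheduling. This paper introduces several of the more influential acyclic global instruction scheduling algorithms and briefly describes new research topics in global instruction scheduling.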

6.

Timing Analysis for Instruction Caches

Mueller Frank 《Real-Time Systems》2000,18(2-3):217-247

This paper contributes a comprehensive study of a framework to bound worst-case instruction cache performance for caches with arbitrary levels of associativity. The framework is formally introduced, operationally described and its correctness is shown. Results of incorporating instruction cache predictions within pipeline simulation show that timing predictions for set-associative caches remain just as tight as predictions for direct-mapped caches. The low cache simulation overhead allows interactive use of the analysis tool and scales well with increasing associativity. The approach taken is based on a data-flow specification of the problem and provides another step toward worst-case execution time prediction of contemporary architectures and its use in schedulability analysis for hard real-time systems.

7.

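Integrating Code Optimization with Instruction Scheduling (Cited by: 1; self-citations: 0; citations by others: 1)

Lian Ruiqi, Wu Chengyong, Zhang Zhaoqing 《计算机学报》 (Chinese Journal of Computers) 2001,24(7):694-701

In a compiler that exploits instruction-level parallelism, performing code optimization and instruction scheduling independently of each other reduces the effectiveness of the optimizations and can even produce side effects. To address this problem, the paper proposes integrating code optimization with instruction scheduling. On this basis it introduces an instruction scheduling algorithm framework suited to integrating code optimizations; analyzes candidate optimizations in terms of their effectiveness, their reversibility, and the optimization opportunities they create; selects the traditional optimizations suitable for integration into instruction scheduling; and finally gives concrete integration methods for these optimizations. The proposed method has been tested on an instruction-level parallel compiler, and the experimental data show that this integration markedly improves the effect of the optimizations.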

8.

Exploring the Interaction between Java's Implicitly Thrown Exceptions and Instruction Scheduling

Matthew Arnold Michael Hsiao Ulrich Kremer Barbara G. Ryder 《International journal of parallel programming》2001,29(2):111-137

The frequent occurrence of implicitly thrown exceptions poses one of the challenges present in a Java compiler. Not only do these implicitly thrown exceptions directly affect the performance by requiring explicit checks, they also indirectly impact the performance by restricting code movement in order to satisfy the precise exception model in Java. In particular, instruction scheduling is one transformation that is restricted by implicitly thrown exceptions due to the heavy reliance on reordering instructions to exploit maximum hardware performance. The goal of this study is two-fold: first, investigate the degree to which implicitly thrown exceptions in Java hinder instruction scheduling, and second, find new techniques for allowing more efficient execution of Java programs containing implicitly thrown exceptions. Experimental results show that with aggressive scheduling techniques, such as superblock scheduling, the negative performance impact can be greatly reduced.

9.

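A VLIW Architecture with Interconnected Register Files and Its Instruction Scheduling Algorithm

Zhou Zhixiong, He Hu, Yang Xu, Zhang Yanjun, Sun Yihe 《计算机学报》 (Chinese Journal of Computers) 2008,31(1):127-132

Very long instruction word (VLIW) processors generally use a bus-interconnected multi-cluster structure: the functional units in each cluster share a local register file, and data is transferred between clusters over a bus, which avoids the rapid growth in delay, area, and power that a fully connected structure suffers as functional units are added. The copies and delays involved in sharing data between clusters, however, lower processor performance. This paper proposes a multi-cluster VLIW architecture with interconnected register files, in which register files connect the clusters, avoiding the inter-cluster transfer delay and the extra data copy operations. It also proposes an instruction scheduling algorithm for this architecture to improve scheduling performance. Experimental results show that, compared with a fully connected VLIW structure, the register-file-interconnected structure loses only about 13% in performance while code size stays essentially unchanged; both results are better than those of the bus-interconnected multi-cluster structure.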

10.

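Global Instruction Scheduling in ORC

Yang Shuxin, Zhang Zhaoqing 《计算机学报》 (Chinese Journal of Computers) 2004,27(5):577-586

IA-64 is a brand-new architecture. It provides rich hardware support for exploiting the instruction-level parallelism latent in programs, for example a large register file, (control/data) speculation, and predication. Itanium is a concrete implementation of IA-64. The authors apply Bernstein's global instruction scheduling algorithm, designed for superscalar processors, to the explicitly parallel (EPIC) Itanium processor. While adapting it to Itanium's characteristics, they make two innovations to Bernstein's algorithm. (1) Hierarchical regions. Compared with traditional flat regions, such regions are highly flexible and give the scheduler a suitably sized scheduling scope, so that it can fully use the hardware resources while keeping the time and space cost of scheduling under control. (2) Integration of P-Ready instruction scheduling. P-Ready was proposed in a context very different from Bernstein's algorithmic framework. P-Ready scheduling can schedule a high-priority instruction as early as possible even if its data dependencies have not been resolved on every execution path through it, so integrating it into Bernstein's framework is worthwhile. The authors implemented the algorithm in ORC, an open-source compiler for Itanium processors. Experimental results show that the global instruction scheduler gives an average run-time speedup of 8.4% on the CPU2000int benchmarks. As one reflection of the advantage of hierarchical regions, scheduling instructions across nested loops yields a run-time speedup of up to 12.9%. In addition, P-Ready scheduling gives an average run-time speedup of 1.37% on CPU2000int, and up to 7.6%.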

11.

Dynamic Instruction Scheduling in a Trace-based Multi-threaded Architecture

Peter A. Rounce Alberto F. De Souza 《International journal of parallel programming》2008,36(2):184-205

Simulation results are presented using the hardware-implemented, trace-based dynamic instruction scheduler of our single process DTSVLIW architecture to schedule instructions from several processes into multiple streams of VLIW instructions for execution by a wide-issue, simultaneous multi-threading (SMT) execution engine. The scheduling process involves single instruction execution of each process, dynamically scheduling executed instructions into blocks of VLIW instructions cached for subsequent SMT execution: SMT provides a mechanism to reduce the impact of horizontal and vertical waste, and variable memory latencies, seen in the DTSVLIW. Preliminary experiments explore this extended model. Results achieve PE utilization of up to 87% on a 4-thread, 1-scalar, 8 PE design, with speed-ups of up to 6.3 that of a single processor. Noticeably it only needs a single scalar process to be scheduled at any time, with main memory fetches being 1–4% that of a single processor.

12.

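A Register-Pressure-Sensitive Speculative Instruction Scheduling Technique

Huang Lei, Feng Xiaobing, Lü Fang 《计算机研究与发展》 (Journal of Computer Research and Development) 2009,46(3)

Speculation is an important means by which instruction scheduling overcomes control dependencies between instructions. On the one hand, speculation can raise instruction-level parallelism and so improve performance; on the other, it can lengthen variable live ranges, increase register pressure, and cause variables to be spilled, degrading performance. Earlier register-pressure-sensitive scheduling methods tend to schedule conservatively across the board once the number of live variables in the scheduling region exceeds a threshold. Because the benefit and cost of scheduling differ from one instruction to the next, this method selects instructions for speculation by analyzing the performance benefit and spill cost of each speculative move, rather than considering only the number of live variables. Experiments show that the method effectively improves program performance: on the SPEC2000 integer benchmarks, it improves average performance by 1.44% over speculative scheduling that ignores register pressure.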

13.

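Register Allocation and Instruction Scheduling via a Register Queue Model

Shen Li, Xiao Xiaoqiang, Dai Kui, Wang Zhiying 《小型微型计算机系统》 (Journal of Chinese Computer Systems) 2004,25(4):757-761

Register allocation and instruction scheduling are two important tasks in compiler optimization. Because the two phases are usually carried out independently, register allocation often introduces unnecessary false dependencies, which impair the efficiency and results of instruction scheduling and limit the final performance gain. This paper proposes a register queue model and, on top of it, an algorithm that performs register allocation and instruction scheduling together. The algorithm uses the minimum number of registers while guaranteeing that every instruction executes at its earliest possible time. A further advantage is that it has linear time and space complexity and is easy to implement in hardware.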

14.

Rod Adams Sue Gray 《Software》1995,25(9):1003-1020

Multiple-instruction-issue processors seek to improve performance over scalar RISC processors by providing multiple pipelined functional units in order to fetch, decode and execute several instructions per cycle. The process of identifying instructions which can be executed in parallel and distributing them between the available functional units is referred to as instruction scheduling. This paper describes a simple compile-time scheduling technique, called conditional compaction, which uses the concept of conditional execution to move instructions across basic block boundaries. It then presents the results of an investigation into the performance of the scheduling technique using C benchmark programs scheduled for machines with different functional unit configurations. This paper represents the culmination of our investigation into how much performance improvement can be obtained using conditional execution as the sole scheduling technique.

15.

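A Register Allocation Algorithm Based on Predicated-Execution Optimization

Wang Fengqin, Hu Dinglei, Liu Chunlin 《计算机研究与发展》 (Journal of Computer Research and Development) 2006,43(8):1471-1476

To allocate registers more efficiently for compiled code that has undergone predicated-execution optimization, the paper first introduces a predicate analysis system based on binary decision diagrams (BDDs) proposed by Sias et al. Building on it, the paper then improves the traditional register allocation algorithm and gives a new algorithm for constructing a refined interference graph. Finally, the algorithm is implemented in the compiler for the YHFT-DSP/700 chip developed at the authors' institution. Experimental results show that it reduces the number of registers required and shortens code execution time, yielding a good performance improvement.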

16.

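Complexity of Formula Renaming in MAX(1) and MARG(1) (Cited by: 1; self-citations: 0; citations by others: 1)

Xu Daoyun, Dong Gaifang, Wang Jian 《软件学报》 (Journal of Software) 2006,17(7)

A renaming is a function that maps each variable to itself or to its complement; a variable renaming is a permutation of a formula's set of variables; and a literal renaming is the composition of a renaming and a variable renaming. Studying renamings of CNF formulas helps to improve the DPLL algorithm. The paper considers the computational complexity of the decision problem "given CNF formulas H and F, is there a variable (or literal) renaming ψ such that ψ(H) = F?" MAX(1) and MARG(1) are two subclasses of minimal unsatisfiable formulas, and formulas in these subclasses can be represented by trees. The tree isomorphism decision problem is solvable in linear time. The paper proves that, for formulas in MAX(1) and MARG(1), the literal renaming problem is solvable in linear time and the variable renaming problem is solvable in quadratic time.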
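The three kinds of renaming defined in the abstract above can be made concrete with a short Python sketch. It is only an illustration of the definitions and of the decision problem ψ(H) = F, not the linear-time algorithm of the paper: the search below is brute force and exponential. The encoding is an assumption (literals as signed nonzero integers, a formula as a set of clauses), as are the helper names `cnf`, `apply_literal_renaming`, and `find_literal_renaming`. The sketch also allows the variables of H to be mapped bijectively onto the variables of F, a slight generalization of a permutation on one variable set.

```python
from itertools import permutations, product


def cnf(clauses):
    """Build a formula as a frozenset of clauses, each a frozenset of signed ints."""
    return frozenset(frozenset(clause) for clause in clauses)


def apply_literal_renaming(formula, perm, flips):
    """Apply the variable renaming `perm`, then complement the variables in `flips`."""
    def rename(lit):
        var = perm[abs(lit)]
        sign = 1 if lit > 0 else -1
        if var in flips:
            sign = -sign  # the renaming maps this variable to its complement
        return sign * var
    return frozenset(frozenset(rename(lit) for lit in clause) for clause in formula)


def find_literal_renaming(h, f):
    """Search exhaustively for (perm, flips) with renamed h == f; return None if absent."""
    vars_h = sorted({abs(lit) for clause in h for lit in clause})
    vars_f = sorted({abs(lit) for clause in f for lit in clause})
    if len(vars_h) != len(vars_f):
        return None
    for image in permutations(vars_f):
        perm = dict(zip(vars_h, image))
        for bits in product([False, True], repeat=len(vars_f)):
            flips = {var for var, flip in zip(vars_f, bits) if flip}
            if apply_literal_renaming(h, perm, flips) == f:
                return perm, flips
    return None


h = cnf([[1, 2], [-1, 2], [-2]])
f = cnf([[1, -2], [1, 2], [-1]])     # h with variables 1 and 2 swapped
g = cnf([[1, 2], [-1, -2], [1]])     # same shape, but no renaming of h yields it
witness = find_literal_renaming(h, f)
```

Restricting the search to an empty `flips` set gives the variable renaming problem; fixing `perm` to the identity gives plain renaming.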

17.

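Research on Teaching Planning and Mode Scheduling in a Multi-Mode Intelligent Tutoring System

Li Yicai, Zhang Xiaozhen 《计算机工程与设计》 (Computer Engineering and Design) 2005,26(4):1083-1087

An intelligent tutoring system with multiple teaching modes can adapt to students' different learning styles and requirements, which gives it advantages that other intelligent tutoring systems cannot match. One of its key problems is teaching planning and the scheduling of teaching modes. The paper proposes a solution for both: a global teaching planning algorithm built on a graph of relationships among knowledge points, and a procedure that selects and activates a suitable teaching mode for each student according to an evaluation of the student's learning and the relevant characteristics of the teaching resources. Experiments show that this solution achieves satisfactory results.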

18.

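Research on End-to-End Scheduling in Real-Time CORBA (Cited by: 4; self-citations: 0; citations by others: 4)

Tan Hao, Luo Zhigang, Liu Jinde 《小型微型计算机系统》 (Journal of Chinese Computer Systems) 2003,24(7):1165-1168

There is currently little research on end-to-end scheduling analysis for real-time CORBA. Building on an analysis of the existing work in this area, this paper proposes a new method that aims to raise the system's degree of concurrency and thereby improve its schedulability.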

19.

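An MES-Oriented Steelmaking-Continuous Casting Collaborative Scheduling System (Cited by: 3; self-citations: 0; citations by others: 3)

Wang Xiuying, Zheng Binglin, Chai Tianyou 《控制工程》 (Control Engineering of China) 2005,12(6):573-576

To address the difficulty of coordinating the scheduling of main and auxiliary steelmaking and continuous casting equipment at a large integrated iron and steel enterprise, the paper adopts a scheduling method that unifies four dimensions (data, models, knowledge, and human-computer interaction) and proposes the design principles, system structure, and functions of a steelmaking-continuous casting collaborative scheduling system within a metallurgical MES. Systems were developed for steelmaking-continuous casting production scheduling, optimized ladle allocation, and collaborative scheduling of overhead cranes and transfer cars. Simulation with actual data gave good results, verifying the feasibility of the system's structure and methods, and the system was well received by on-site scheduling experts.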

20.

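An Integrated Dynamic Scheduling Algorithm for Real-Time Heterogeneous Systems Based on a Batch Optimization Scheduling Strategy (Cited by: 1; self-citations: 0; citations by others: 1)

Lu Zhihui, Li Jianguo, Chen Songqiao, Wang Jianxin 《小型微型计算机系统》 (Journal of Chinese Computer Systems) 2008,29(3):461-468

Given the characteristics of real-time heterogeneous multi-task scheduling, the paper proposes a unified task model that formally describes soft and hard real-time tasks using imprecise computation. On this basis it proposes an integrated dynamic scheduling algorithm for real-time heterogeneous systems based on a batch optimization scheduling strategy. The algorithm is built on heuristic search and introduces a quality-of-service degradation strategy for soft real-time tasks. Each time it extends the current partial schedule, it selects a batch of tasks according to set rules, computes the objective function for running them on each processor, and assigns the tasks optimally using an assignment-problem method. Simulation experiments show that the algorithm achieves a higher scheduling success rate than comparable algorithms.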