首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到10条相似文献,搜索用时 281 毫秒
1.
The registers constraints are usually taken into account during the scheduling pass of an acyclic data dependence graph (DAG): any schedule of the instructions inside a basic block must bound the register requirement under a certain limit. In this work, we show how to handle the register pressure before the instruction scheduling of a DAG. We mathematically study an approach which consists in managing the exact upper-bound of the register need for all the valid schedules of a considered DAG, independently of the functional unit constraints. We call this computed limit the register saturation (RS) of the DAG. Its aim is to detect possible obsolete register constraints, i.e., when RS does not exceed the number of available registers. If it does, we add some serial edges to the original DAG such that the worst register need does not exceed the number of available registers. We propose an appropriate mathematical formalism for this problem. Our generic processor model takes into account superscalar, VLIW and EPIC/IA64 architectures. Our deeper analysis of the problem and our formal methods enable us to provide nearly optimal heuristics and strategies for register optimization in the face of ILP.  相似文献   

2.
The rapid advances in high-performance computer architecture and compilation techniques provide both challenges and opportunities to exploit the rich solution space of software pipelined loop schedules. In this paper, we develop a framework to construct a software pipelined loop schedule which runs on the given architecture (with a fixed number of processor resources) at the maximum possible iteration rate (a la rate-optimal) while minimizing the number of buffers-a close approximation to minimizing the number of registers. The main contributions of this paper are: First, we demonstrate that such problem can be described by a simple mathematical formulation with precise optimization objectives under a periodic linear scheduling framework. The mathematical formulation provides a clear picture which permits one to visualize the overall solution space (for rate-optimal schedules) under different sets of constraints. Secondly, we show that a precise mathematical formulation and its solution does make a significant performance difference. We evaluated the performance of our method against three leading contemporary heuristic methods. Experimental results show that the method described in this paper performed significantly better than these methods. The techniques proposed in this paper are useful in two different ways: 1) As a compiler option which can be used in generating faster schedules for performance-critical loops (if the interested users are willing to trade the cost of longer compile time with faster runtime). 2) As a framework for compiler writers to evaluate and improve other heuristics-based approaches by providing quantitative information as to where and how much their heuristic methods could be further improved  相似文献   

3.
Modulo scheduling is an efficient technique for exploiting instruction level parallelism in a variety of loops, resulting in high performance code but increased register requirements. We present an approach that schedules the loop operations for minimum register requirements, given a modulo reservation table. Our method determines optimal register requirements for machines with finite resources and for general dependence graphs. Measurements on a benchmark suite of 1327 loops from the Perfect Club, SPEC-89, and the Livermore Fortran Kernels show that the register requirements decrease by 24.8% on average when applying the optimal stage scheduler to the MRT-schedules of a register-insensitive modulo scheduler.  相似文献   

4.
Multi-cluster tools are widely used in majority of wafer fabrication processes in semiconductor industry. Smaller lot production, thinner circuit width in wafers, larger wafer size, and maintenance have resulted in a large quantity of their start-up and close-down transient periods. Yet, most of existing efforts have been concentrated on scheduling their steady states. Different from such efforts, this work schedules their transient and steady-state periods subject to wafer residency constraints. It gives the schedulability conditions for the steady-state scheduling of dual-blade robotic multi-cluster tools and a corresponding algorithm for finding an optimal schedule. Based on the robot synchronization conditions, a linear program is proposed to figure out an optimal schedule for a start-up period, which ensures a tool to enter the desired optimal steady state. Another linear program is proposed to find an optimal schedule for a close-down period that evolves from the steady state period. Finally, industrial cases are presented to illustrate how the provided method outperforms the existing approach in terms of system throughput improvement.   相似文献   

5.
3种提高软件流水有效性的算法:比较和结合   总被引:1,自引:0,他引:1  
李文龙  陈彧  林海波  汤志忠 《软件学报》2005,16(10):1822-1832
软件流水是开发循环程序指令级并行性的技术,它通过并行执行连续的多个循环体来加快循环的执行速度.在软件流水中,循环体的重叠增加了寄存器需求,导致寄存器压力增大,当目标处理机所提供的寄存器不足时,软件流水可能失败.在Itanium处理机上评估了NAS和SPEC2000基准程序中的软件流水循环的寄存器需求,发现静态寄存器不足是造成软件流水失败的主要原因,提出了3种增加软件流水个数、提高软件流水有效性的算法:限制循环展开因子的算法(register sensitive unrolling,简称RSU)、堆栈寄存器分配算法(stacked registerallocation,简称SRA)以及变量类型转换的算法(variabletype conversion,简称VTC).RSU根据静态寄存器需求确定一个合理的展开因子,增加了软件流水的成功率;SRA和VTC分别使用空闲的堆栈寄存器和旋转寄存器来充当静态寄存器,提高了寄存器的利用率.在面向Itanium处理器的开放源码编译器ORC(open research compiler)上实现了这3种算法,通过NAS程序的测试比较了这3种算法的有效性,同时对它们的结合应用进行了研究和实验.  相似文献   

6.
Software pipelining methods based on an ILP (integer linear programming) framework have been successfully applied to derive rate-optimal schedules under resource constraints. However, like many other previous works on software pipelining, ILP-based work has focused on resource constraints of simple function units, e.g., “clean pipelines”—pipelines without structural hazards. The problem for architectures beyond such clean pipelines remains open. One challenge is how to represent such resource constraints for unclean pipelines, i.e., pipelined function units, but having structural hazards.In this paper, we propose a method to constructrate-optimalsoftware pipelined schedules for pipelined architectures with structural hazards. A distinct feature of this work is that it provides a unified ILP framework for two challenging and interrelated aspects of software pipelining—the scheduling of instructions at particular times and the mapping of those instructions to specific function units. Solving both of these aspects is essential to finding schedules which will work both on VLIW machines which map instructions to fixed function units and on dynamic out-of-order superscalars. We propose two ILP formulations to solve the integrated scheduling and mapping problem. Both adopt principles of graph coloring in an ILP framework, and one usesforbidden latenciesin an elegant extension of classical hardware pipeline control theory.We have run experiments on four variations of our proposed formulations. As input we used a set of 415 “unique” loops taken from several benchmark suites, and we targeted an architecture whose function units contain many structural hazards. All four of our variations did well, with the best finding a rate-optimal schedule for 65% of the loops. This compares favorably with a leading heuristic, Huff'sSlack Scheduling—the ILP approaches found a schedule with smaller initiation interval for over 50% of the loops, with a mean improvement of almost 30%. Finally, we have found that reusing pipeline stages—and thus adding hazards—results in only a 10% drop in performance, while permitting significant savings in area.  相似文献   

7.
A treelike hybrid multi-cluster tool is composed of both single-arm and dual-arm cluster tools with a treelike topology. Scheduling such a tool is challenging. For a hybrid treelike multi-cluster tool whose bottleneck individual tool is process-bound, this work aims at finding its optimal one-wafer cyclic schedule. It is modeled with Petri nets such that a onewafer cyclic schedule is parameterized as its robots' waiting time. Based on the model, this work proves the existence of its onewafer cyclic schedule that features with the ease of industrial implementation. Then, computationally efficient algorithms are proposed to find the minimal cycle time and optimal onewafer cyclic schedule. Multi-cluster tool examples are given to illustrate the proposed approach. The use of the found schedules enables industrial multi-cluster tools to operate with their highest productivity.   相似文献   

8.
K. Kalpakis  Y. Yesha 《Algorithmica》1999,23(2):159-179
We find, in polynomial time, a schedule for a complete binary tree directed acyclic graph (dag) with n unit execution time tasks on a linear array whose makespan is optimal within a factor of 1+o(1) . Further, given a binary tree dag T with n tasks and height h , we find, in polynomial time, a schedule for T on a linear array whose makespan is optimal within a factor of 5 + o(1) . On the other hand, we prove that explicit lower and upper bounds on the makespan of optimal schedules of binary tree dags on linear arrays differ at least by a factor of 1+ . We also find, in polynomial time, schedules for bounded tree dags with n unit execution time tasks, degree d , and height on a linear array which are optimal within a factor of 1+o(1) , this time under the assumption of links with unlimited bandwidth. Finally, we compute an improved upper bound on the makespan of an optimal schedule for a tree dag on the architecture independent model of Papadimitriou and Yannakakis [14], provided that its height is not too large. Received January 21, 1997; revised June 5, 1997.  相似文献   

9.
一种支持多重循环软件流水的寄存器结构   总被引:1,自引:0,他引:1  
容红波  汤志忠 《软件学报》2000,11(3):401-409
寄存器结构及其分配是软件流水算法的关键之一.为支持多重循环的软件流水,该文提出一种新颖的寄存器结构:半共享跳跃式流水寄存器堆.它可以有效地解决多重循环软件流水下的特殊问题,即:同层次和跨层次的寄存器重命名问题以及断流问题;有效地消除外层循环的体间读写相关,提高程序的指令级并行度.它有3种分配方式可供灵活使用:单个寄存器、流水寄存器和寄存器组方式.流水寄存器方式对生存期确定的、局限于一个循环层次的寄存器重命名问题提供简单而有效的支持.寄存器组分配方式解决了多重循环软件流水时变量生存期不确定的情况.跳跃操作为  相似文献   

10.
In this paper, we present a scheduling and a variable binding technique for improved testability in high level synthesis. The scheduling technique called cost based scheduling system (CBSS), is time constrained which minimizes the number of resources (operations) and the number of registers based on a cost function. The CBSS improves the life time of primary input and primary output variables, reduces the life times of intermediate variables and hence improves the controllability and observability. The testability of the register transfer level (RTL) structure generated by this schedule is therefore improved. CBSS considers all the variables and operations jointly for scheduling. CBSS supports various scheduling modes such as multicycled and chained operations, and pipelining. The complexity of our scheduling algorithm is O(c·n2) where c is the number of control steps and n is the number of operations to be scheduled. To generate a highly testable RTL structure, the CBSS is followed by a variable binding technique to bind the variables into registers. An integer linear programming (ILP) approach is proposed with an objective function that minimizes the number of registers and a set of constraints that improves the testability of the RTL structure. Various case studies are presented and the results on different benchmark examples show the potential of our approach.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号