Similar Literature (10 results)
1.
Instruction scheduling methods which use the concepts developed by the classical pipeline theory have been proposed for architectures involving deeply pipelined function units. These methods rely on the construction of state diagrams (or automata) to (i) efficiently represent the complex resource usage pattern; and (ii) analyze legal initiation sequences, i.e., those which do not cause a structural hazard. In this paper, we propose a state-diagram based approach for modulo scheduling or software pipelining, an instruction scheduling method for loops. Our approach adapts the classical pipeline theory for modulo scheduling, and hence the resulting theory is called Modulo-Scheduled pipeline (MS-pipeline) theory. The state diagram, called the Modulo-Scheduled (MS) state diagram, is helpful in identifying legal initiation or latency sequences, which increase the number of instructions initiated in a pipeline. An efficient method, called Co-scheduling, which uses the legal initiation sequences as guidelines for constructing software pipelined schedules, is proposed in this paper. However, the complexity of the constructed MS-state diagram limits the usefulness of our Co-scheduling method. Further analysis of the MS-pipeline theory reveals that the space complexity of the MS-state diagram can be significantly reduced by identifying primary paths. We develop the underlying theory to establish that the reduced MS-state diagram consisting only of primary paths is complete; i.e., it retains all the useful information represented by the original state diagram as far as the scheduling of operations is concerned. Our experiments show that the number of paths in the reduced state diagram is lower than the number of paths in the original state diagram by one to three orders of magnitude. The reduction in the state diagram enables the Co-scheduling method to consider multiple initiation sequences, and hence to obtain more efficient schedules. We call the resulting method enhanced Co-scheduling. The enhanced Co-scheduling method produced efficient schedules when tested on a set of 1153 benchmark loops. Further, the schedules produced by this method are significantly better than those produced by Huff's Slack Scheduling method, a competitive software pipelining method, in terms of both the initiation interval of the schedules and the time taken to construct them.
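As a rough illustration of the resource-usage reasoning behind the MS-pipeline theory (a minimal Python sketch, not the paper's Co-scheduling algorithm), the following fragment checks whether a set of initiation offsets is free of structural hazards modulo a candidate initiation interval II; the reservation table and its resource names are hypothetical.

    # Hypothetical reservation table: resource -> cycles (relative to issue) at which
    # one initiation of the pipelined operation occupies that resource.
    RESERVATION_TABLE = {
        "multiplier_stage1": [0, 3],
        "multiplier_stage2": [1, 4],
        "writeback_bus": [5],
    }

    def legal_modulo_initiations(offsets, ii, table=RESERVATION_TABLE):
        """Return True if initiating the operation at the given cycle offsets causes
        no structural hazard when the pattern repeats every ii cycles."""
        for resource, uses in table.items():
            occupied = set()
            for start in offsets:
                for use in uses:
                    slot = (start + use) % ii      # slot in the modulo reservation table
                    if slot in occupied:           # two initiations collide on this resource
                        return False
                    occupied.add(slot)
        return True

    print(legal_modulo_initiations([0, 2], 6))     # latency sequence (0, 2) under II = 6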

2.
The rapid advances in high-performance computer architecture and compilation techniques provide both challenges and opportunities to exploit the rich solution space of software pipelined loop schedules. In this paper, we develop a framework to construct a software pipelined loop schedule which runs on the given architecture (with a fixed number of processor resources) at the maximum possible iteration rate (i.e., rate-optimal) while minimizing the number of buffers, a close approximation to minimizing the number of registers. The main contributions of this paper are as follows. First, we demonstrate that such a problem can be described by a simple mathematical formulation with precise optimization objectives under a periodic linear scheduling framework. The mathematical formulation provides a clear picture which permits one to visualize the overall solution space (for rate-optimal schedules) under different sets of constraints. Second, we show that a precise mathematical formulation and its solution do make a significant performance difference. We evaluated the performance of our method against three leading contemporary heuristic methods. Experimental results show that the method described in this paper performed significantly better than these methods. The techniques proposed in this paper are useful in two different ways: 1) as a compiler option which can be used to generate faster schedules for performance-critical loops (if users are willing to trade longer compile time for faster runtime); and 2) as a framework for compiler writers to evaluate and improve other heuristics-based approaches, by providing quantitative information as to where and by how much their heuristic methods could be further improved.
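To make the periodic linear scheduling framework concrete, here is a minimal sketch with hypothetical offsets and edges (not the paper's exact formulation): operation op of iteration k starts at sigma[op] + k * II, and the buffer count is roughly the number of in-flight copies of each value.

    # A periodic linear schedule assigns operation op of iteration k the start time
    # sigma[op] + k * II.  The buffer estimate below is a crude stand-in for the
    # paper's precise objective: a value produced on edge (u, v) is live for
    # sigma[v] - sigma[u] cycles, and with one new instance every II cycles roughly
    # ceil(lifetime / II) copies must be buffered at once.
    II = 4                                                 # assumed initiation interval
    sigma = {"load": 0, "mul": 2, "add": 5, "store": 7}    # hypothetical schedule offsets
    edges = [("load", "mul"), ("mul", "add"), ("add", "store")]

    def start_time(op, iteration):
        return sigma[op] + iteration * II

    def buffer_estimate():
        return sum(-(-(sigma[v] - sigma[u]) // II) for u, v in edges)   # ceiling division

    print(start_time("mul", 3))    # 2 + 3 * 4 = 14
    print(buffer_estimate())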

3.
Pipelining is a widely used technique for implementing architectures that have inherent temporal parallelism when there is an operational requirement for high throughput. Many variations on the basic theme have been proposed, with varying degrees of success. The aim of this paper is to present a critical review of conventional pipelined architectures and put some well-known problems in sharp relief. It is argued that conventional pipelined architectures have underlying limitations that can only be dealt with by adopting a different view of pipelining. These limitations are explained in terms of discontinuities in the flow of instructions and data, and representative machines are examined in support of this argument. In a companion paper [Topham, Omondi and Ibbett, 1988], we examine an alternative approach to the design of pipelined architectures and introduce an alternative theory of pipelining, which we call Context Flow.

4.
This paper focuses on the interaction between software prefetching (both binding and nonbinding prefetch) and software pipelining for statically scheduled machines. First, it is shown that evaluating software pipelined schedules without considering memory effects can be rather inaccurate due to stalls caused by dependences with memory instructions (even if a lockup-free cache is considered). It is also shown that the penalty of these stalls is in general higher than the effect of spill code. Second, we show that, in general, binding schemes are more powerful than nonbinding ones for software pipelined schedules. Finally, the main contribution of this paper is a heuristic scheme that schedules some memory operations according to the locality estimated at compile time and other attributes of the dependence graph. The proposed scheme is shown to outperform other heuristic approaches, since it achieves a better trade-off between compute time and stall time.
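The core of such a locality-guided heuristic can be sketched as follows (a minimal illustration with hypothetical latencies and field names, not the paper's exact scheme): each load is given an assumed latency for scheduling purposes based on the locality the compiler predicts for its reference.

    # Pick the latency used when scheduling each load from a compile-time locality
    # estimate, so likely misses are scheduled early enough to hide memory stalls.
    HIT_LATENCY = 2       # assumed cache-hit latency in cycles
    MISS_LATENCY = 12     # assumed cache-miss latency in cycles

    def scheduling_latency(load):
        """'load' carries a 'locality' field estimated by the compiler; references with
        spatial or temporal locality are treated as hits, all others as misses."""
        if load["locality"] in ("spatial", "temporal"):
            return HIT_LATENCY
        return MISS_LATENCY

    loads = [
        {"name": "a[i]", "locality": "spatial"},
        {"name": "b[x[i]]", "locality": "none"},   # irregular access, assume a miss
    ]
    for ld in loads:
        print(ld["name"], "scheduled with latency", scheduling_latency(ld))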

5.
Specification of software pipelining using Petri nets
This paper presents a flexible model for software pipelining using Petri nets. Our technique, called the Petri Net Pacemaker (PNP), can create near-optimal pipelines with less algorithmic effort than other techniques. The pacemaker is a novel idea which exploits the cyclic behavior of Petri nets to model the problem of scheduling the operations of a loop body for software pipelining. A way of improving the performance of loops containing predicates is given. The PNP technique also shows how nested loops can be pipelined. A comparison with some of the other techniques is presented. This work was partially supported by the National Science Foundation under grants CDA-9100788 and CDA-9200371.
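To illustrate how cyclic token flow in a Petri net can pace the repeated issue of a loop body (a toy sketch only; the places, transitions, and step semantics here are hypothetical and not the PNP construction itself):

    # A tiny Petri net: places hold token counts, and a transition fires when all of
    # its input places are marked.  The token circulating through the net acts as a
    # pacemaker that repeatedly triggers the operations of a loop body.
    places = {"ready": 1, "loaded": 0, "computed": 0}
    transitions = {
        "load":    ({"ready": 1},    {"loaded": 1}),
        "compute": ({"loaded": 1},   {"computed": 1}),
        "store":   ({"computed": 1}, {"ready": 1}),   # token returns: next iteration
    }

    def step():
        # Snapshot the transitions enabled by the current marking, then fire them
        # one by one, re-checking token counts so no place ever goes negative.
        enabled = [name for name, (inputs, _) in transitions.items()
                   if all(places[p] >= n for p, n in inputs.items())]
        fired = []
        for name in enabled:
            inputs, outputs = transitions[name]
            if not all(places[p] >= n for p, n in inputs.items()):
                continue
            for p, n in inputs.items():
                places[p] -= n
            for p, n in outputs.items():
                places[p] += n
            fired.append(name)
        return fired

    for cycle in range(6):
        print("cycle", cycle, "fired:", step())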

6.
Iacobovici, S. IEEE Micro, 1988, 8(3): 77-87
Two options are presented that were considered for a pipelined interface between a central processing unit (CPU) and a floating-point coprocessor (FPU), along with the CPU recovery mechanisms that provide precise floating-point exceptions for each option. The first option supports parallel execution of both floating-point and integer instructions, while the second option pipelines only the execution of floating-point instructions. The second option was chosen for National Semiconductor's 32532/32580 processor cluster because it offers high performance with significantly lower complexity. The 32532 microprocessor features a pipelined slave protocol that hides the CPU-FPU communication overhead for most floating-point instructions by pipelining their execution. A simple recovery mechanism implemented within the CPU maintains the precision of floating-point exceptions. As a result, the 32532 microprocessor supports very high floating-point performance without sacrificing software compatibility with previous Series 32000 CPU-FPU clusters.

7.
This paper presents a software pipelining algorithm for the automatic extraction of fine-grain parallelism in general loops. The algorithm accounts for machine resource constraints in a way that smoothly integrates the management of resource constraints with software pipelining. Furthermore, generality in the software pipelining algorithm is not sacrificed to handle resource constraints, and scheduling choices are made with truly global information. Proofs of correctness and the results of experiments with an implementation are also presented.
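A common overall structure for a resource-constrained software pipeliner (shown here only as a generic sketch under assumed operation counts, not the specific algorithm of this paper) is to start from a lower bound on the initiation interval and raise it until the loop body can be scheduled:

    # Generic software-pipelining driver: compute a resource-constrained lower bound
    # on the initiation interval, then increase II until a schedule is found.
    def res_mii(op_counts, num_units):
        # Lower bound: ceil(#operations needing a unit class / #units of that class).
        return max(-(-count // num_units[kind]) for kind, count in op_counts.items())

    def try_schedule(ii):
        # Placeholder for a real modulo scheduler; assume this hypothetical loop
        # becomes schedulable once II reaches 5.
        return ii >= 5

    op_counts = {"alu": 6, "mem": 4}    # hypothetical loop body
    num_units = {"alu": 2, "mem": 1}    # hypothetical machine
    ii = res_mii(op_counts, num_units)
    while not try_schedule(ii):
        ii += 1
    print("initiation interval:", ii)   # 5 under these assumptions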

8.
Quantitative Evaluation of Register Pressure on Software Pipelined Loops
Software pipelining is a loop scheduling technique that extracts loop parallelism by overlapping the execution of several consecutive iterations. One of the drawbacks of software pipelining is its high register requirements, which increase with the number of functional units and their degree of pipelining. This paper analyzes the register requirements of software pipelined loops. It also evaluates the effects on performance of the addition of spill code. Spill code is needed when the number of registers required by the software pipelined loop is larger than the number of registers of the target machine. This spill code increases memory traffic and can reduce performance. Finally, compilers can apply transformations in order to reduce the number of memory accesses and increase functional unit utilization; the paper also evaluates the negative effect that some of these transformations might have on the register requirements of loops.
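The register requirement of a software-pipelined loop is commonly approximated by MaxLive, the largest number of values simultaneously live in the kernel. A minimal sketch of that measurement, with hypothetical lifetimes:

    # A value defined at cycle d and last used at cycle u occupies a register during
    # [d, u); with a new iteration issued every II cycles, lifetimes from different
    # iterations overlap, so pressure is accumulated modulo II.
    II = 3
    lifetimes = [(0, 4), (1, 3), (2, 8)]    # (definition cycle, last-use cycle), hypothetical

    def max_live(lifetimes, ii):
        pressure = [0] * ii
        for d, u in lifetimes:
            for c in range(d, u):
                pressure[c % ii] += 1
        return max(pressure)

    print("MaxLive =", max_live(lifetimes, II))   # 4 for these lifetimes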

9.
A Study of the Register Requirements of Software Pipelining on IA-64
Software pipelining is one of the important methods for exploiting instruction-level parallelism in loop programs, and IA-64 is an EPIC architecture that supports software pipelining. Through a quantitative analysis of the registers required by the software-pipelinable loops in the NAS Benchmarks, this paper proposes a heuristic algorithm that limits the loop unrolling factor. The algorithm effectively solves the problem of software pipelining failures caused by an insufficient number of available registers and improves the execution speed of the applications.
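The idea behind limiting the unroll factor can be sketched as follows (a minimal illustration under assumed register counts, not the paper's exact heuristic): keep the largest unroll factor whose estimated register need still fits in the register file.

    # Estimate the register need of the software-pipelined loop as roughly
    # proportional to the unroll factor, and stop unrolling before it exceeds the
    # registers available to the loop.  All numbers here are hypothetical.
    AVAILABLE_REGS = 128     # registers the loop may use
    REGS_PER_COPY = 22       # estimated registers per unrolled copy of the loop body
    MAX_FACTOR = 8           # heuristic cap on unrolling

    def choose_unroll_factor():
        factor = 1
        while factor < MAX_FACTOR and (factor + 1) * REGS_PER_COPY <= AVAILABLE_REGS:
            factor += 1
        return factor

    print("chosen unroll factor:", choose_unroll_factor())   # 5, since 6 * 22 > 128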

10.
Computer architecture design requires careful attention to the balance between the complexity of code scheduling problems and the cost and feasibility of building a machine. In this paper, we show that recently developed software pipelining algorithms produce optimal or near-optimal code for a large class of loops when the target architecture is a clean pipelined parallel machine. The important feature of these machines is the absence of structural hazards. We argue that the robustness of the scheduling algorithms and relatively simple hardware make these machines realistic and cost-effective. To illustrate the delicate balance between architecture and scheduling complexity, we show that scheduling with structural hazards is NP-hard, and that there are machines with simple structural hazards for which vectorization and the software pipelining techniques generate poor code. Supported in part by NSF Grants DCR-8502884, CCR-8704367, ONR Grant N00014-86-K-0215, and the Cornell NSF Supercomputing Center. Supported by NSF Grant CCR-8702668 and an IBM Faculty Development Award.
