1.

Constructive methods for scheduling uniform loop nests (total citations: 1; self-citations: 0; citations by others: 1)

Darte A. Robert Y. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(8):814-822

This paper surveys scheduling techniques for loop nests with uniform dependences. First, we introduce the hyperplane method and related variants. Then we extend it by using a different affine scheduling for each statement within the nest. In both cases, we present a new, constructive, and efficient method to determine optimal solutions, i.e., schedules whose total execution time is minimum.
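To make the hyperplane idea concrete, here is a minimal sketch, assuming a rectangular iteration domain and a linear schedule t(i) = pi . i. A schedule vector pi is valid when pi . d >= 1 for every uniform dependence vector d. The brute-force search below is only an illustration and is not the constructive method of the paper; the function names and the `bound` parameter are hypothetical.

```python
from itertools import product

def is_valid_schedule(pi, deps):
    """pi is valid if pi . d >= 1 for every dependence vector d."""
    return all(sum(p * x for p, x in zip(pi, d)) >= 1 for d in deps)

def latency(pi, extents):
    """Number of time steps over the box 0 <= i_k < extents[k]."""
    # max(pi . i) - min(pi . i) over a box is sum(|pi_k| * (n_k - 1)).
    return sum(abs(p) * (n - 1) for p, n in zip(pi, extents)) + 1

def best_schedule(deps, extents, bound=3):
    """Exhaustive search over small integer vectors (illustration only)."""
    candidates = product(range(-bound, bound + 1), repeat=len(extents))
    valid = [pi for pi in candidates if is_valid_schedule(pi, deps)]
    return min(valid, key=lambda pi: latency(pi, extents))

deps = [(1, 0), (0, 1)]          # uniform dependences of a 2-D nest
pi = best_schedule(deps, (4, 4))
print(pi, latency(pi, (4, 4)))
```

With the two unit dependences above, every valid vector needs both components to be at least 1, so the search settles on the diagonal wavefront.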

2.

A cost-effective implementation of multilevel tiling

Jimenez M. Llaberia J.M. Fernandez A. 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(10):1006-1020

This paper presents a new cost-effective algorithm to compute exact loop bounds when multilevel tiling is applied to a loop nest having affine functions as bounds (nonrectangular loop nest). Traditionally, exact loop bounds computation has not been performed because its complexity is doubly exponential on the number of loops in the multilevel tiled code and, therefore, for certain classes of loops (i.e., nonrectangular loop nests), can be extremely time consuming. Although computation of exact loop bounds is not very important when tiling only for cache levels, it is critical when tiling includes the register level. This paper presents an efficient implementation of multilevel tiling that computes exact loop bounds and has a much lower complexity than conventional techniques. To achieve this lower complexity, our technique deals simultaneously with all levels to be tiled, rather than applying tiling level by level as is usually done. For loop nests having very simple affine functions as bounds, results show that our method is between 15 and 28 times faster than conventional techniques. For loop nests having not-so-simple bounds, we have measured speedups as high as 2,300. Additionally, our technique allows eliminating redundant bounds efficiently. Results show that eliminating redundant bounds in our method is between 22 and 11 times faster than in conventional techniques for typical linear algebra programs.
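As a small illustration of what "exact loop bounds" means for a nonrectangular nest, the sketch below tiles a triangular loop (0 <= j <= i < n) at a single level, with bounds written by hand so that no tile or inner loop is ever empty. This is a hand-derived example under those assumptions, not the multilevel algorithm of the paper; the function names are hypothetical.

```python
def triangular(n):
    """Original nest: for i in [0, n), for j in [0, i]."""
    return [(i, j) for i in range(n) for j in range(i + 1)]

def triangular_tiled(n, tile):
    """Same iterations, visited tile by tile with exact bounds."""
    out = []
    for ii in range(0, n, tile):                 # tile origin along i
        for jj in range(0, ii + 1, tile):        # only tiles that meet the triangle
            for i in range(ii, min(ii + tile, n)):
                # Upper bound on j is clipped by both the tile and the diagonal.
                for j in range(jj, min(jj + tile, i + 1)):
                    out.append((i, j))
    return out

print(len(triangular_tiled(10, 4)), len(triangular(10)))
```

A rectangular bounding box of tiles would also be correct, but it would visit tiles above the diagonal that contain no iterations; the `min` terms are what removes that wasted work.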

3.

A consistent generation of pipeline parallelism and distribution of operations and data among processors

E. V. Adutskevich N. A. Likhoded 《Programming and Computer Software》2006,32(3):166-176

The problem of mapping affine loop nests onto parallel computers with distributed memory is considered. A technique for algorithm scheduling and distributing operations and data over processors is proposed. This technique makes it possible to generate pipeline parallelism and minimize the amount of data exchanges between the processors. The method is adapted for automation and explicitly allows for dependence on outer variables of loops.

4.

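Global automatic data partitioning for SIMD machines

Lin Jin, Zhu Ningning, Zhang Zhaoqing, Qiao Ruliang 《计算机学报 (Chinese Journal of Computers)》1999,22(6):596-602

This paper proposes a global automatic data partitioning algorithm for SIMD machines. The algorithm handles multiple imperfectly nested loop nests in which array subscripts are linear expressions of the loop variables. First, the communication patterns in the computation are abstracted through data and iteration mappings. Next, formal conditions for recognizing regular communication patterns are given. A data-iteration graph containing alignment information and the corresponding communication costs is then built, and on the basis of this graph a heuristic algorithm is proposed to compute good data and iteration distributions so as to reduce the communication cost between processing elements. By analyzing the multiple array mappings involved in multiple loop nests ...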

5.

Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time

Paul Feautrier 《International journal of parallel programming》1992,21(6):389-420

This paper extends the algorithms which were developed in Part I to cases in which there is no affine schedule, i.e. to problems whose parallel complexity is polynomial but not linear. The natural generalization is to multidimensional schedules with lexicographic ordering as temporal succession. Multidimensional affine schedules are, in a sense, equivalent to polynomial schedules, and are much easier to handle automatically. Furthermore, there is a strong connection between multidimensional schedules and loop nests, which allows one to prove that a static control program always has a multidimensional schedule. Roughly, a larger dimension indicates less parallelism. In the algorithm which is presented here, this dimension is computed dynamically, and is just sufficient for scheduling the source program. The algorithm lends itself to a divide and conquer strategy. The paper gives some experimental evidence for the applicability, performances and limitations of the algorithm.
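A minimal sketch of "lexicographic ordering as temporal succession", assuming a toy n-by-n nest in which every iteration depends on the one executed just before it (for instance a scalar accumulation). Python tuples compare lexicographically, so a two-dimensional time vector can be checked directly. The schedules `theta_2d` and `theta_1d` are hypothetical examples chosen for illustration, not output of the algorithm in the paper.

```python
def chain_dependences(n):
    """Each iteration of an n x n nest depends on its sequential predecessor."""
    order = [(i, j) for i in range(n) for j in range(n)]
    return list(zip(order, order[1:]))

def respects(schedule, dependences):
    """True if every source is scheduled strictly before its sink."""
    return all(schedule(*src) < schedule(*dst) for src, dst in dependences)

def theta_2d(i, j):
    return (i, j)            # two-dimensional time, compared lexicographically

def theta_1d(i, j):
    return (2 * i + j,)      # one-dimensional candidate with fixed coefficients

print(respects(theta_2d, chain_dependences(4)))
print(respects(theta_1d, chain_dependences(4)))
```

The one-dimensional candidate works for n = 2 but breaks at n = 4, because the step from (i, n-1) to (i+1, 0) needs the coefficient of i to grow with n; the two-dimensional schedule has no such problem.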

6.

Some efficient solutions to the affine scheduling problem. I. One-dimensional time

Paul Feautrier 《International journal of parallel programming》1992,21(5):313-347

Programs and systems of recurrence equations may be represented as sets of actions which are to be executed subject to precedence constraints. In many cases, actions may be labelled by integral vectors in some iteration domains, and precedence constraints may be described by affine relations. A schedule for such a program is a function which assigns an execution date to each action. Knowledge of such a schedule allows one to estimate the intrinsic degree of parallelism of the program and to compile a parallel version for multiprocessor architectures or systolic arrays. This paper deals with the problem of finding closed form schedules as affine or piecewise affine functions of the iteration vector. An algorithm is presented which reduces the scheduling problem to a parametric linear program of small size, which can be readily solved by an efficient algorithm.

7.

Mapping of Affine Loop Nests onto Independent Processors

N. A. Likhoded 《Cybernetics and Systems Analysis》2003,39(3):459-466

A method is given for obtaining independent parts of algorithms, represented by affine loop nests (not necessarily perfectly nested). The method is based on a modular affine mapping of algorithm operations onto independent virtual processors. The method can select more independent computations than the known procedures based on affine mappings.

8.

Minimizing makespan subject to minimum flowtime on two identical parallel machines

《Computers & Operations Research》2001,28(7):705-717

We consider the problem of scheduling jobs on two parallel identical machines where an optimal schedule is defined as one that gives the smallest makespan (the completion time of the last job) among the set of schedules with optimal total flowtime (the sum of the completion times of all jobs). We propose an algorithm to determine optimal schedules for the problem, and describe a modified multifit algorithm to find an approximate solution to the problem in polynomial computational time. Results of a computational study to compare the performance of the proposed algorithms with a known heuristic show that the proposed heuristic and optimization algorithms are quite effective and efficient in solving the problem. Scope and purpose: Multiple objective optimization problems are quite common in practice. However, while solving scheduling problems, optimization algorithms often consider only a single objective function. Consideration of multiple objectives makes even the simplest multi-machine scheduling problems NP-hard. Therefore, enumerative optimization techniques and heuristic solution procedures are required to solve multi-objective scheduling problems. This paper illustrates the development of an optimization algorithm and polynomially bounded heuristic solution procedures for scheduling jobs on two identical parallel machines to hierarchically minimize the makespan subject to the optimality of the total flowtime.
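The hierarchical objective can be illustrated with a tiny exhaustive search, assuming only a handful of jobs. For a fixed split of jobs between the two machines, running each machine in shortest-processing-time order minimizes that split's total flowtime, and the makespan does not depend on the order. The sketch below enumerates all splits and compares (flowtime, makespan) pairs lexicographically; it is an exponential-time illustration, not the optimization algorithm or the modified multifit heuristic of the paper, and the function names are hypothetical.

```python
from itertools import product

def evaluate(jobs, assignment):
    """Return (total flowtime, makespan) with each machine run in SPT order."""
    flowtime, loads = 0, [0, 0]
    for p, m in sorted(zip(jobs, assignment)):   # shortest jobs first
        loads[m] += p                            # completion time of this job
        flowtime += loads[m]
    return flowtime, max(loads)

def best_hierarchical(jobs):
    """Minimize flowtime first, then makespan among flowtime-optimal splits."""
    return min(evaluate(jobs, a) + (a,)
               for a in product((0, 1), repeat=len(jobs)))

flowtime, makespan, assignment = best_hierarchical([1, 2, 3, 4])
print(flowtime, makespan, assignment)
```

For the processing times 1, 2, 3, 4 there are several splits with the minimum flowtime of 13; the secondary criterion picks the one that balances the machines at a makespan of 5 rather than 6.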

9.

Parallel machine match-up scheduling with manufacturing cost considerations

M. Selim Aktürk Alper Atamtürk Sinan Gürel 《Journal of Scheduling》2010,13(1):95-110

Many scheduling problems in practice involve rescheduling of disrupted schedules. In this study, we show that in contrast to fixed processing times, if we have the flexibility to control the processing times of the jobs, we can generate alternative reactive schedules considering the manufacturing cost implications in response to disruptions. We consider a non-identical parallel machining environment where processing times of the jobs are compressible at a certain manufacturing cost, which is a convex function of the compression on the processing time. In rescheduling it is highly desirable to catch up with the original schedule as soon as possible by reassigning the jobs to the machines and compressing their processing times. On the other hand, one must also keep the manufacturing cost due to compression of the jobs low. Thus, one is faced with a tradeoff between match-up time and manufacturing cost criteria. We introduce alternative match-up scheduling problems for finding schedules on the efficient frontier of this time/cost tradeoff. We employ the recent advances in conic mixed-integer programming to model these problems effectively. We further provide a fast heuristic algorithm driven by dual prices of convex subproblems for generating approximate efficient schedules.

10.

Enhanced Co-Scheduling: A Software Pipelining Method Using Modulo-Scheduled Pipeline Theory

R. Govindarajan N. S. S. Narasimha Rao E. R. Altman Guang R. Gao 《International journal of parallel programming》2000,28(1):1-46

Instruction scheduling methods which use the concepts developed by the classical pipeline theory have been proposed for architectures involving deeply pipelined function units. These methods rely on the construction of state diagrams (or automatons) to (i) efficiently represent the complex resource usage pattern; and (ii) analyze legal initiation sequences, i.e., those which do not cause a structural hazard. In this paper, we propose a state-diagram based approach for modulo scheduling or software pipelining, an instruction scheduling method for loops. Our approach adapts the classical pipeline theory for modulo scheduling, and, hence, the resulting theory is called Modulo-Scheduled pipeline (MS-pipeline) theory. The state diagram, called the Modulo-Scheduled (MS) state diagram, is helpful in identifying legal initiation or latency sequences that improve the number of instructions initiated in a pipeline. An efficient method, called Co-scheduling, which uses the legal initiation sequences as guidelines for constructing software pipelined schedules has been proposed in this paper. However, the complexity of the constructed MS-state diagram limits the usefulness of our Co-scheduling method. Further analysis of the MS-pipeline theory reveals that the space complexity of the MS-state diagram can be significantly reduced by identifying primary paths. We develop the underlying theory to establish that the reduced MS-state diagram consisting only of primary paths is complete; i.e., it retains all the useful information represented by the original state diagram as far as scheduling of operations is concerned. Our experiments show that the number of paths in the reduced state diagram is significantly lower (by 1 to 3 orders of magnitude) compared to the number of paths in the original state diagram. The reduction in the state diagram facilitates the Co-scheduling method to consider multiple initiation sequences, and hence obtain more efficient schedules. We call the resulting method enhanced Co-scheduling. The enhanced Co-scheduling method produced efficient schedules when tested on a set of 1153 benchmark loops. Further, the schedules produced by this method are significantly better than those produced by Huff's Slack Scheduling method, a competitive software pipelining method, in terms of both the initiation interval of the schedules and the time taken to construct them.
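The notion of a structural hazard under modulo scheduling can be sketched with a plain modulo reservation table, assuming each operation uses a resource at fixed offsets from its start time and that every use is folded into a slot modulo the initiation interval. This is the generic table-based check, not the MS-state diagram or the Co-scheduling method of the paper; `place`, the resource names, and the usage patterns are hypothetical.

```python
def place(mrt, usage, start, ii):
    """Try to reserve an operation in the modulo reservation table.

    mrt   -- set of (resource, slot) pairs already reserved
    usage -- list of distinct (resource, offset) pairs for the operation
    start -- candidate start cycle
    ii    -- initiation interval
    """
    slots = {(res, (start + off) % ii) for res, off in usage}
    # Fewer slots than uses means the operation collides with itself
    # in a later iteration; an overlap with mrt is an ordinary conflict.
    if len(slots) < len(usage) or slots & mrt:
        return False
    mrt |= slots
    return True

mrt = set()
alu_op = [("alu", 0)]
print(place(mrt, alu_op, 0, 2))   # slot 0 of the ALU is free
print(place(mrt, alu_op, 2, 2))   # cycle 2 folds onto slot 0 again
print(place(mrt, alu_op, 1, 2))   # slot 1 is still free
```

An operation that uses the same resource at offsets 0 and 2 cannot be placed at all with an initiation interval of 2, since both uses fold onto the same slot; a larger interval would be needed.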

11.

Theoretical proof of edge search strategy applied to power plant start-up scheduling

Kamiya A. Kawai K. Ono I. Kobayashi S. 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2002,32(3):316-331

Power plant start-up scheduling is aimed at minimizing the start-up time while limiting maximum turbine rotor stresses. This scheduling problem is highly nonlinear and has a number of local optima. In our previous research, we proposed an efficient search model: genetic algorithms (GAs) with enforcement operation to focus the search along the edge of the feasible space where the optimal schedule is supposed to stay. Based on a nonlinear dynamic simulation and a linear inverse calculation with the iteration method, the enforcement operation is applied to move schedules generated by GA toward the edge. We prove that the optimal schedule lies on the edge, ensuring that searching along the edge instead of the entire space can improve the search efficiency significantly without missing the optimum. Furthermore, we provide a theoretical setting equation for the inverse enforcement gains of the linear inverse calculation, intended to move schedules closer to the edge at each iteration of the enforcement operation. The theoretical setting equation is verified and discussed with the test results. We propose the theoretical setting equation with the test results as a guideline for the use of our proposed search model: GA with enforcement operation.

12.

Robust scheduling on a single machine to minimize total flow time

Chung-Cheng Lu Shih-Wei Lin Kuo-Ching Ying 《Computers & Operations Research》2012,39(7):1682-1691

In a real-world manufacturing environment featuring a variety of uncertainties, production schedules for manufacturing systems often cannot be executed exactly as they are developed. In these environments, schedule robustness that guarantees the best worst-case performance is a more appropriate criterion in developing schedules, although most existing studies have developed optimal schedules with respect to a deterministic or stochastic scheduling model. This study concerns robust single machine scheduling with uncertain job processing times and sequence-dependent family setup times explicitly represented by interval data. The objective is to obtain robust sequences of job families and jobs within each family that minimize the absolute deviation of total flow time from the optimal solution under the worst-case scenario. We prove that the robust single machine scheduling problem of interest is NP-hard. This problem is reformulated as a robust constrained shortest path problem and solved by a simulated annealing-based algorithmic framework that embeds a generalized label correcting method. The results of numerical experiments demonstrate that the proposed heuristic is effective and efficient for determining robust schedules. In addition, we explore the impact of degree of uncertainty on the performance measures and examine the tradeoff between robustness and optimality.

13.

A framework for resource-constrained rate-optimal software pipelining

Govindarajan R. Altman E.R. Gao G.R. 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(11):1133-1149

The rapid advances in high-performance computer architecture and compilation techniques provide both challenges and opportunities to exploit the rich solution space of software pipelined loop schedules. In this paper, we develop a framework to construct a software pipelined loop schedule which runs on the given architecture (with a fixed number of processor resources) at the maximum possible iteration rate (a la rate-optimal) while minimizing the number of buffers, a close approximation to minimizing the number of registers. The main contributions of this paper are: First, we demonstrate that such a problem can be described by a simple mathematical formulation with precise optimization objectives under a periodic linear scheduling framework. The mathematical formulation provides a clear picture which permits one to visualize the overall solution space (for rate-optimal schedules) under different sets of constraints. Secondly, we show that a precise mathematical formulation and its solution does make a significant performance difference. We evaluated the performance of our method against three leading contemporary heuristic methods. Experimental results show that the method described in this paper performed significantly better than these methods. The techniques proposed in this paper are useful in two different ways: 1) As a compiler option which can be used in generating faster schedules for performance-critical loops (if the interested users are willing to trade the cost of longer compile time with faster runtime). 2) As a framework for compiler writers to evaluate and improve other heuristics-based approaches by providing quantitative information as to where and how much their heuristic methods could be further improved.

14.

Steady-State Throughput and Scheduling Analysis of Multicluster Tools: A Decomposition Approach (total citations: 1; self-citations: 0; citations by others: 1)

Jingang Yi Shengwei Ding Dezhen Song Zhang M.T. 《Automation Science and Engineering, IEEE Transactions on》2008,5(2):321-336

Cluster tools are widely used as semiconductor manufacturing equipment. While throughput analysis and scheduling of single-cluster tools have been well-studied, research work on multicluster tools is still at an early stage. In this paper, we analyze steady-state throughput and scheduling of multicluster tools. We consider the case where all wafers follow the same visit flow within a multicluster tool. We propose a decomposition method that reduces a multicluster tool problem to multiple independent single-cluster tool problems. We then apply the existing and extended results of throughput and scheduling analysis for each single-cluster tool. Computation of lower-bound cycle time (fundamental period) is presented. Optimality conditions and robot schedules that realize such lower-bound values are then provided using "pull" and "swap" strategies for single-blade and double-blade robots, respectively. For an -cluster tool, we present lower-bound cycle time computation and robot scheduling algorithms. The impact of buffer/process modules on throughput and robot schedules is also studied. A chemical vapor deposition tool is used as an example of multicluster tools to illustrate the decomposition method and algorithms. The numerical and experimental results demonstrate that the proposed decomposition approach provides a powerful method to analyze the throughput and robot schedules of multicluster tools.

15.

Distribution of Operations and Data Arrays over Processors

N. A. Likhoded 《Programming and Computer Software》2003,29(3):173-179

For algorithms described by loop nests, a method of construction of affine mappings of operations and data arrays onto virtual processors is suggested. Only local communications between the processors are required; in particular, there may be no communications at all. The method makes it possible to find many heuristic solutions, allows for dependence on outer loop indices, and can be used for automated parallelization of sequential programs.

16.

Space optimization of schedules for the synchronous data flow model

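Liu Guoxin, Tan Guoqiang, He Yeping 《计算机工程与应用 (Computer Engineering and Applications)》2009,45(3):198-201

This paper proposes a memory optimization method for embedded DSP systems, based on the synchronous data flow (SDF) model. Because other optimization algorithms do not apply to SDF models that contain feedback loops, the method designs and implements a heuristic scheduling algorithm for the space optimization of feedback loops, and proposes a hierarchical space optimization scheme that combines SAS (Single Appearance Schedules) and non-SAS schedules, providing a general solution for the space optimization of SDF schedules. Experimental results confirm the effectiveness of the scheme.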

17.

Minimizing Register Requirements of a Modulo Schedule via Optimum Stage Scheduling

Alexandre E. Eichenberger Edward S. Davidson Santosh G. Abraham 《International journal of parallel programming》1996,24(2):103-132

Modulo scheduling is an efficient technique for exploiting instruction level parallelism in a variety of loops, resulting in high performance code but increased register requirements. We present an approach that schedules the loop operations for minimum register requirements, given a modulo reservation table. Our method determines optimal register requirements for machines with finite resources and for general dependence graphs. Measurements on a benchmark suite of 1327 loops from the Perfect Club, SPEC-89, and the Livermore Fortran Kernels show that the register requirements decrease by 24.8% on average when applying the optimal stage scheduler to the MRT-schedules of a register-insensitive modulo scheduler.

18.

A tight flow control for job-shop fabrication lines with finite buffers

Toba H. 《Automation Science and Engineering, IEEE Transactions on》2005,2(1):78-83

We propose a work-in-process (WIP) estimation flow control method which serves as a countermeasure against the throughput degradation problem caused by the redundant blocking time of conventional flow control. This method is based on a scheduling technique of which the most important features are: 1) breaking down the entire schedule into individual lot schedules; 2) lot scheduling to reduce redundant blocking time; and 3) WIP estimation for contiguous finite buffer scheduling. The method first schedules operational lots at each equipment unit in a fabrication line by using our scheduling procedure for contiguous finite buffers to satisfy the limited capacity of the buffers. Next, the method estimates the future WIP at each equipment group based on predetermined schedules for performing operations. Finally, the method improves the operation timings by continuously supplying WIP estimation to the scheduling procedure. In an actual liquid crystal display (LCD) fabrication line simulation, we have confirmed that the proposed WIP estimation method is promising in terms of the line throughput obtained.

19.

A genetic algorithm for the design space exploration of datapaths during high-level synthesis (total citations: 2; self-citations: 0; citations by others: 2)

Krishnan V. Katkoori S. 《Evolutionary Computation, IEEE Transactions on》2006,10(3):213-229

High-level synthesis is comprised of interdependent tasks such as scheduling, allocation, and module selection. For today's very large-scale integration (VLSI) designs, the cost of solving the combined scheduling, allocation, and module selection problem by exhaustive search is prohibitive. However, to meet design objectives, an extensive design space exploration is often critical to obtaining superior designs. We present a framework for efficient design space exploration during high-level synthesis of datapaths for data-dominated applications. The framework uses a genetic algorithm (GA) to concurrently perform scheduling and allocation with the aim of finding schedules and module combinations that lead to superior designs while considering user-specified latency and area constraints. The GA uses a multichromosome representation to encode datapath schedules and module allocations and efficient heuristics to minimize functional and storage area costs, while minimizing circuit latencies. The framework provides the flexibility to perform resource-constrained scheduling, time-constrained scheduling, or a combination of the two, using a simple and fast list-scheduling technique. A graded penalty function is used as an objective function in evaluating the quality of designs to enable the GA to quickly reach areas of the search space where designs meeting user specified criteria are most likely to be found. Since GAs are population-based search heuristics, a unique feature of our framework is its ability to offer a large number of alternative datapath designs, all of which meet design specifications but differ in module, register, and interconnect configurations. Many experiments on well-known benchmarks show the effectiveness of our approach.

20.

Scheduling precedence constrained task graphs with non-negligible intertask communication onto multiprocessors

Selvakumar S. Siva Ram Murthy C. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(3):328-336

The multiprocessor scheduling problem is the problem of scheduling the tasks of a precedence constrained task graph (representing a parallel program) onto the processors of a multiprocessor in a way that minimizes the completion time. Since this problem is known to be NP-hard in the strong sense in all but a few very restricted cases, heuristic algorithms are being developed which obtain near optimal schedules in a reasonable amount of computation time. We present an efficient heuristic algorithm for scheduling precedence constrained task graphs with nonnegligible intertask communication onto multiprocessors taking contention in the communication channels into consideration. Our algorithm for obtaining satisfactory suboptimal schedules is based on the classical list scheduling strategy. It simultaneously exploits the schedule-holes generated in the processors and in the communication channels during the scheduling process in order to produce better schedules. We demonstrate the effectiveness of our algorithm by comparing with two competing heuristic algorithms available in the literature.
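For orientation, the classical list scheduling strategy can be sketched as follows, assuming the priority list is already a valid topological order and that a fixed communication delay is paid only when a predecessor ran on a different processor. The sketch deliberately ignores channel contention and does not fill schedule-holes, so it is the baseline strategy rather than the algorithm of the paper; the function name and the data layout are hypothetical.

```python
def list_schedule(order, duration, preds, num_procs):
    """Greedy list scheduling with point-to-point communication delays.

    order    -- tasks in priority order (must respect precedence)
    duration -- dict: task -> execution time
    preds    -- dict: task -> list of (predecessor, communication delay)
    Returns dict: task -> (processor, start, finish).
    """
    proc_free = [0] * num_procs
    placed = {}
    for task in order:
        best = None
        for p in range(num_procs):
            # Data is ready once every predecessor has finished and,
            # if it ran elsewhere, its message has arrived.
            ready = max((placed[u][2] + (0 if placed[u][0] == p else c)
                         for u, c in preds.get(task, [])), default=0)
            start = max(ready, proc_free[p])
            if best is None or start < best[1]:
                best = (p, start)
        p, start = best
        placed[task] = (p, start, start + duration[task])
        proc_free[p] = placed[task][2]
    return placed

duration = {"a": 2, "b": 3, "c": 1}
preds = {"b": [("a", 1)], "c": [("a", 1)]}
for task, slot in list_schedule(["a", "b", "c"], duration, preds, 2).items():
    print(task, slot)
```

In this small graph, task b stays on the processor of a to avoid the message delay, while c accepts the delay and moves to the idle processor because that still starts it earlier than waiting for b to finish.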