Similar Documents
20 similar documents found
1.
This paper presents a generalization to classical scheduling theory by removing the restriction that only one processor can work on a given task at a particular time. Instead, it is assumed that each task can be allocated any number of identical processors from one to the maximum number available, with each task's completion time being a function of the number of processors allocated. Tasks may be started any time, but once started, a task must not have its processor allocation altered or be preempted. Two objective functions are considered: minimizing the overall completion time for the tasks (make-span) and minimizing a weighted sum of the task completion times (weighted response). Both are considered subject to a constraint on the total number of processors available. Suboptimal algorithms are developed for both of these NP-hard problems using Lagrangian relaxation, and their performances are analyzed through extensive simulations. Duality gaps for all problems tested ranged from under 1% to 92%, depending more on the problem size than the specific problem.

2.
We attempt a new variant of the scheduling problem by investigating the scalability of the schedule length with the required number of processors, performing scheduling partially at compile time and partially at run time. Assuming an infinite number of processors, the compile-time schedule is found using a new concept, the threshold of a task, which quantifies a trade-off between the schedule length and the degree of parallelism. The schedule is found to minimize either the schedule length or the number of required processors, and it satisfies: a feasibility condition, which guarantees that the schedule delay of a task from its earliest start time is below the threshold, and an optimality condition, which uses a merit function to decide the best task-processor match for a set of tasks competing for a given processor. At run time, the tasks are merged, producing a schedule for a smaller number of available processors. This allows the program to be scaled down to the processors actually available at run time. The usefulness of this scheduling heuristic has been demonstrated by incorporating the scheduler in the compiler backend for targeting Sisal (Streams and Iterations in a Single Assignment Language) on the iPSC/860.

3.
Lower bounds on the number of processors and the finish time for the problem of scheduling precedence graphs with communication costs are presented. The notion of the earliest starting time of a task is formulated in the context of lower bounds, and a lower bound on the completion time is proposed. A task delay that does not increase the earliest completion time of a schedule is defined; each task can then be scheduled within a time interval without affecting the lower-bound performance on the finish time. This leads to the definition of a new lower bound on the number of processors required to process the task graph. A derivation of the minimum time increase over the earliest completion time is also proposed for the case of a smaller number of processors, as is a lower bound on the minimum number of interprocessor communication links required to achieve optimum performance. Evaluation was carried out using a set of 360 small graphs. The bound on the finish time deviates by at most 5% from the optimum solution in 96% of the cases and performs well with respect to the minimum number of processors and communication links.

4.
《Performance Evaluation》1994,20(4):361-371
In classical scheduling theory it is widely assumed that any task requires only one processor at a time for its processing. In this paper the problem of deterministic scheduling of tasks that require more than one processor at a time, i.e., a constant set of dedicated processors, is analyzed. Schedule length is taken as the performance measure, and tasks are assumed to be preemptable and independent. Low-order polynomial algorithms for simple cases of the problem are given. A method to solve the general version of the problem for a limited number of processors is then presented, while the case of an arbitrary number of processors is known to be NP-hard. Finally, a version of the problem in which, besides processors, every task can also require additional resources is considered.

5.
A Client-Agent-Server model (CAS model), which can decrease the workload of the client by adding agent processors to the Client-Server model (CS model), is proposed, and an approach to parallel test generation for logic circuits on the CAS model is presented. Two problems are considered: the optimal granularity problem and the optimal scheme problem. First, the problem of parallel test generation on the CAS model is formulated to analyze the effect of the granularity (the grain size of target faults allocated to processors) for both static and dynamic task allocation (optimal granularity problem). Then the relationship between the number of processors and the total processing time is analyzed (optimal scheme problem). The analysis shows that the CAS model can reduce the total processing time over the CS model and that there exists an optimal scheme (an optimal pair of numbers of agent and server processors) for the CAS model which minimizes the total processing time for a given number of processors. To corroborate the analysis, the proposed parallel test generation algorithm is implemented on a network of more than 100 workstations, and experimental results for the ISCAS benchmark circuits are presented. The experimental results are very close to the theoretical results, which confirms the existence of an optimal granularity and an optimal scheme minimizing the total processing time for the CAS model.

6.
Code profiling is the process of determining the types of code found in a given heterogeneous task. Once this information is available, it is desirable to know how many processors are needed for each of the code types. In this paper, we propose two methods for estimating the minimum number of processors required for each of these code types. The first method makes use of task compatibility graphs. We show that a task compatibility graph can be generated by analyzing certain compatibility relations between task module pairs of a given task flow graph. The resource (processor) minimization problem is therefore equivalent to finding the minimum number of cliques that cover the task compatibility graph, or to finding the minimum number of colors that color the vertices of its complement graph, called the task conflict graph. We solve this problem using a greedy approach in O(|V| log|V| |E|) time, where |V| and |E| are the numbers of vertices and edges of the task compatibility graph. We further show that for three special types of task compatibility graphs an optimal solution can be obtained in polynomial time. The second method studied in this paper uses the Cluster-M methodology for estimating the minimum number of processors. Examples are shown to compare the estimates obtained using the different techniques.
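To make the greedy step concrete, here is a minimal sketch of coloring a task conflict graph, where each color class is a set of mutually compatible modules and the number of colors is an estimate of the minimum number of processors. It is a generic degree-ordered greedy, not necessarily the paper's exact O(|V| log|V| |E|) procedure, and the module names and graph are illustrative.

```python
# Greedy coloring of a task conflict graph: every color class is a set of
# mutually compatible task modules, so the number of colors used gives an
# estimate of the minimum number of processors required.
def greedy_processor_estimate(conflict_graph):
    """conflict_graph: dict mapping each task module to the set of modules
    it conflicts with (i.e., cannot share a processor with)."""
    # Visit vertices in order of decreasing degree (a common greedy heuristic).
    order = sorted(conflict_graph, key=lambda v: len(conflict_graph[v]), reverse=True)
    color = {}
    for v in order:
        used = {color[u] for u in conflict_graph[v] if u in color}
        c = 0
        while c in used:          # smallest color not used by any conflicting neighbor
            c += 1
        color[v] = c
    return (max(color.values()) + 1 if color else 0), color

# Hypothetical 4-module task: t1, t2, t3 mutually conflict; t4 conflicts with nobody.
g = {"t1": {"t2", "t3"}, "t2": {"t1", "t3"}, "t3": {"t1", "t2"}, "t4": set()}
num_procs, assignment = greedy_processor_estimate(g)
print(num_procs)   # -> 3: three processors suffice, and t4 shares one of them
```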

7.
On-line scheduling of scalable real-time tasks on multiprocessor systems
The computation time of scalable tasks in multiprocessor systems depends on the number of processors allocated to them. As more processors are allocated to a scalable task, the overall computation time of the task decreases, but the total amount of processor time devoted to the execution of the task, called the workload, increases due to parallel execution overhead. In this paper, we propose a task scheduling algorithm that exploits this property of scalable tasks for on-line, real-time scheduling. In the proposed algorithm, the total workload of all scheduled tasks is reduced by allocating as few processors to each task as possible without missing its deadline. As a result, the processors in the system carry less load for the scheduled tasks and can execute more newly arriving tasks before their deadlines. Simulation results show that the proposed algorithm performs significantly better than a conventional algorithm that uses a fixed number of processors for each task.
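A minimal sketch of the allocation rule described above, assuming the scalable task's execution time on p processors is supplied by the caller as exec_time(p) (already including parallel-execution overhead); the function and example numbers are illustrative, not the paper's algorithm.

```python
def min_processors_for_deadline(exec_time, max_procs, release_time, deadline):
    """Smallest processor count p in 1..max_procs such that the task, started at
    release_time, still finishes by deadline; None if no allocation meets it.

    exec_time(p): computation time of the scalable task on p processors.
    More processors shorten exec_time(p) but raise the workload p * exec_time(p),
    so the allocation is kept as small as the deadline allows.
    """
    for p in range(1, max_procs + 1):
        if release_time + exec_time(p) <= deadline:
            return p
    return None

# Hypothetical task: 100 units of work plus 2 units of overhead per extra processor.
exec_time = lambda p: 100.0 / p + 2.0 * (p - 1)
print(min_processors_for_deadline(exec_time, max_procs=16, release_time=0.0, deadline=40.0))
# -> 3, the smallest allocation that still meets the deadline
```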

8.
In this paper, we study an online scheduling problem with moldable parallel tasks on m processors. Each moldable task can be processed simultaneously on any number of processors of a parallel computer, and the processing time of a moldable task depends on the number of processors allotted to it. Tasks arrive one by one. Upon arrival of each task, the scheduler has to determine both the number of processors and the starting time for the task, and these decisions cannot be changed later. The objective is to attain a schedule in which the longest completion time over all tasks, i.e., the makespan, is minimized. First, we provide a general framework showing that any ρ-bounded algorithm for scheduling rigid parallel tasks (where the number of processors for a task is fixed a priori) can be extended to yield an algorithm for scheduling moldable tasks with a competitive ratio of 4ρ if the ratio ρ is known beforehand. As a consequence, we achieve the first constant competitive ratio, 26.65, for the moldable parallel task scheduling problem. Next, we provide an improved algorithm with a competitive ratio of at most 16.74.

9.
The rapid development of artificial intelligence places ever higher demands on high-performance computing, and task scheduling in heterogeneous computing environments has long been a key problem in this field. This paper proposes a scheduling algorithm based on priority-queue division (PQDSA). The algorithm determines the number of priority queues from the number of entry nodes of the DAG (directed acyclic graph) task set, partitions tasks into queues according to their communication and computation costs, and then assigns critical-node tasks to suitable queues, producing well-formed task scheduling queues that increase inter-task parallelism and reduce the makespan of the task set. In addition, tasks are scheduled onto processors using an insertion-based policy, which makes the scheduling more efficient. The PQDSA algorithm reduces the time overhead between tasks and improves processor scheduling efficiency. Performance comparisons with two classical algorithms show that the proposed PQDSA algorithm is clearly superior to the compared algorithms in terms of both makespan and scheduling efficiency.
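The insertion-based policy mentioned above is a standard device in DAG schedulers: a task is placed in the earliest idle gap on a processor that can hold it, rather than only after the processor's last scheduled task. The sketch below illustrates that slot search only (it is not the full PQDSA algorithm), and all names are illustrative.

```python
def earliest_insertion_slot(busy_slots, ready_time, duration):
    """Earliest start time >= ready_time at which a task of the given duration
    fits on a processor, possibly inside an idle gap between tasks that are
    already scheduled (insertion-based policy).

    busy_slots: list of (start, finish) intervals already occupied, sorted by start.
    """
    start = ready_time
    for slot_start, slot_finish in busy_slots:
        if start + duration <= slot_start:   # the task fits in the gap before this slot
            return start
        start = max(start, slot_finish)      # otherwise it can start only after this slot
    return start                             # append after the last busy interval

# Hypothetical processor timeline: busy on [0, 10) and [20, 30), idle gap [10, 20).
busy = [(0, 10), (20, 30)]
print(earliest_insertion_slot(busy, ready_time=4, duration=6))    # -> 10, fits in the gap
print(earliest_insertion_slot(busy, ready_time=4, duration=12))   # -> 30, appended at the end
```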

10.
曹洁  曾国荪 《计算机应用》2015,35(3):648-653
Processor failures in cloud environments have become a problem that cloud computing cannot ignore, and fault tolerance has become a key requirement in the design and development of cloud computing systems. To address the low scheduling efficiency and the restriction to a single task type in some fault-tolerant scheduling algorithms, a fault-tolerant scheduling method is proposed that groups processors and primary/backup task copies; a criterion for deciding when backup copies may overlap in execution and a formula for the worst-case response time of a task are also given. Experiments and analysis show that, compared with previous algorithms, dividing the processors into two groups that execute primary copies and backup copies separately reduces the time spent on schedulability tests during task scheduling, increases the opportunities for backup copies to overlap, and reduces the number of processors required, which is significant for improving processor utilization and the efficiency of fault-tolerant scheduling.

11.
The problem of allocating task interaction graphs (TIGs) to heterogeneous computing systems to minimize job completion time is investigated. The only restriction is that the interprocessor communication cost is the same for any pair of processors. This is suitable for local area network based systems, such as Ethernet, as well as fully interconnected multiprocessor systems. An optimal polynomial solution exists if sufficient homogeneous processors and communication capacity are available. This solution is generalized to obtain two faster heuristics, one for the case of homogeneous processors and the other for heterogeneous processors. The heuristics were tested extensively with 60,900 systematically generated random TIGs and shown to be stable independent of the size of the TIG. A performance model is also proposed to predict the performance of the heuristic algorithms, and it is successful in explaining the experimental results qualitatively.

12.
Lower and upper bounds on the minimum time needed to process a given directed acyclic task graph on a given number of processors are derived. It is proved that the proposed lower bound on time is not only sharper than previously known values but also easier to calculate. An upper bound on time, which is useful for determining the worst-case behavior of a given task graph, is presented. Lower and upper bounds on the minimum number of processors required to process a given task graph in the minimum possible time are also derived. For a number of randomly generated dense task graphs, the lower and upper bounds we derive are equal, thus giving the optimal time for scheduling directed acyclic task graphs on a given set of processors.
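For context, the two textbook lower bounds that any such analysis starts from — the critical-path length and the total work divided by the number of processors — can be computed as below. These are the classical baseline only, not the sharper bounds derived in the paper; the graph representation and example are illustrative.

```python
from functools import lru_cache

def classical_makespan_lower_bound(weights, succ, m):
    """Baseline lower bound for scheduling a task DAG on m identical processors:
    the maximum of the critical-path length and the total work divided by m.

    weights: dict task -> execution time
    succ:    dict task -> list of successor tasks
    """
    @lru_cache(maxsize=None)
    def bottom_level(t):
        # longest weighted path from t to any exit task, t's own weight included
        return weights[t] + max((bottom_level(s) for s in succ.get(t, [])), default=0)

    critical_path = max(bottom_level(t) for t in weights)
    work_bound = sum(weights.values()) / m
    return max(critical_path, work_bound)

# Hypothetical fork-join graph: a -> b, a -> c, b -> d, c -> d.
w = {"a": 2, "b": 3, "c": 4, "d": 1}
s = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
print(classical_makespan_lower_bound(w, s, m=2))   # -> 7, the critical path a-c-d dominates
```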

13.
The problem of scheduling tasks on distributed-memory machines is known to be NP-complete in the strong sense, ruling out the possibility of a pseudo-polynomial algorithm. This paper introduces a new heuristic algorithm for scheduling Sisal (Streams and Iterations in a Single Assignment Language) programs on a distributed-memory machine, the Intel Touchstone i860. Our compile-time scheduling method works on IF-2, an intermediate form based on the dataflow parallelism in the program. We initially carry out a dependence analysis to bind the implicit dependencies across IF-2 graph boundaries, followed by a cost assignment based on Intel Touchstone i860 timings. The scheduler works in two phases. The first phase finds the earliest and latest completion times of each task, given by the shortest and longest paths from the root task to the given task, respectively. A threshold, defined as the difference between the latest and the earliest start times of the task, is found. The scheduler varies the value of the allowable threshold and determines the best value for minimal schedule length. In the second phase, we merge processors to generate a schedule that matches the available number of processors. Scheduling results for several benchmark programs are included to demonstrate the effectiveness of our approach.
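A minimal sketch of how earliest start times, latest start times, and the resulting per-task threshold can be computed on a weighted task DAG, assuming an unbounded number of processors and ignoring communication costs; it illustrates the threshold idea only and is not the IF-2-based scheduler itself. The graph and names are illustrative.

```python
from functools import lru_cache

def start_time_windows(weights, succ, preds):
    """Earliest start, latest start, and threshold (slack) for every task in a DAG,
    assuming unlimited processors and no communication costs.

    weights: dict task -> execution time
    succ:    dict task -> successors, preds: dict task -> predecessors
    """
    @lru_cache(maxsize=None)
    def est(t):      # earliest start: longest chain of predecessor work before t
        return max((est(p) + weights[p] for p in preds.get(t, [])), default=0)

    @lru_cache(maxsize=None)
    def bottom(t):   # longest path from t to an exit task, including t itself
        return weights[t] + max((bottom(s) for s in succ.get(t, [])), default=0)

    cp = max(est(t) + bottom(t) for t in weights)      # critical-path length
    earliest = {t: est(t) for t in weights}
    latest = {t: cp - bottom(t) for t in weights}      # latest start without stretching cp
    threshold = {t: latest[t] - earliest[t] for t in weights}
    return earliest, latest, threshold

# Hypothetical fork-join graph: a -> b, a -> c, b -> d, c -> d.
w = {"a": 2, "b": 3, "c": 4, "d": 1}
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
preds = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
print(start_time_windows(w, succ, preds)[2])   # -> {'a': 0, 'b': 1, 'c': 0, 'd': 0}
```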

14.
Scheduling program tasks on processors is at the core of the efficient use of multiprocessor systems. Most task-scheduling problems are known to be NP-hard and, thus, heuristics are the method of choice in all but the simplest cases. The availability of acknowledged sets of benchmark-problem instances is essential for the correct comparison and analysis of heuristics. Yet such sets are not available for several important classes of scheduling problems, including the multiprocessor scheduling problem with communication delays (MSPCD), where one is interested in scheduling dependent tasks onto homogeneous multiprocessor systems, with processors connected in an arbitrary way, while explicitly accounting for the time required to transfer data between tasks allocated to different processors. We propose test-problem instances for the MSPCD that are representative in terms of the number of processors, the type of multiprocessor architecture, the number of tasks to be scheduled, and task-graph characteristics (task execution times, communication costs, and density of dependencies between tasks). Moreover, we define our task-graph generators so as to ensure that the corresponding problem instances obey the theoretical principles recently proposed in the literature.

15.
Suppose n identical processors, each subject to random failures, are available for running a single job of given duration. The failure law is operative only while a processor is active. To guard against the loss of accrued work due to a failure, checkpoints can be made, each requiring a fixed amount of time; a successful checkpoint saves the state of the computation, but failures can also occur during checkpoints. The problem is to determine how best to schedule checkpoints if the goal is to maximize the probability that the job finishes before all processors fail. We solve this problem first for n = 2 processors and an exponential failure law. For a given job duration and checkpoint cost we show how to determine an integer k and time intervals I1, ..., Ik such that an optimal procedure is to run the job on one processor, checkpointing at the end of each interval Ii, until either the job is done or a failure occurs. In the latter case, the remaining processor resumes the job starting in the state saved by the last successful checkpoint; the job then runs until it completes or until the second processor also fails. We give an explicit formula for the maximum achievable probability of completing the job for any fixed k. An explicit result for k*, the optimum value of k, seems out of reach; however, we give upper and lower bounds on k* that are remarkably tight; they show that only a few values of k need to be tested in order to find k*. With the failure rate normalized to 1, we also derive an asymptotic estimate for k* and calculate conditional expected job completion times. For the more difficult problem with n processors, we formulate a computational approach based on a discretized model in which the failure law is the analogous geometric distribution. By proving a unimodality property of the optimal completion probability, we are able to describe a computation of this optimum whose running time is governed by the discretized job running time. Several examples bring out behavioral details. Received: 29 September 1995 / 29 January 1997

16.
We show that the notoriously difficult problem of finding and reporting the smallest number of vertex-disjoint paths that cover the vertices of a graph can be solved time- and work-optimally for cographs. Our result implies that for this class of graphs the task of finding a Hamiltonian path can be solved time- and work-optimally in parallel.

Finding a time- and work-optimal parallel solution for this problem had been open for more than 10 years; our contribution is to offer such a solution. We begin by showing that any algorithm that solves an instance of size n of the problem must take Ω(log n) time on the CREW PRAM, even if an infinite number of processors is available. We then show that this time lower bound is tight by devising an EREW algorithm that, given an n-vertex cograph G represented by its cotree, finds and reports all the paths in a minimum path cover in O(log n) time using n/log n processors.


17.
The multiprocessor scheduling problem is the problem of scheduling the tasks of a precedence-constrained task graph (representing a parallel program) onto the processors of a multiprocessor in a way that minimizes the completion time. Since this problem is known to be NP-hard in the strong sense in all but a few very restricted cases, heuristic algorithms are being developed which obtain near-optimal schedules in a reasonable amount of computation time. We present an efficient heuristic algorithm for scheduling precedence-constrained task graphs with non-negligible intertask communication onto multiprocessors, taking contention in the communication channels into consideration. Our algorithm for obtaining satisfactory suboptimal schedules is based on the classical list scheduling strategy. It simultaneously exploits the schedule holes generated in the processors and in the communication channels during the scheduling process in order to produce better schedules. We demonstrate the effectiveness of our algorithm by comparing it with two competing heuristic algorithms from the literature.
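To make the setting concrete, here is a stripped-down list scheduler with inter-task communication delays: each task, in priority order, goes to the processor that gives it the earliest finish time, with data from predecessors on other processors arriving after a communication delay. Unlike the paper's algorithm, it models neither channel contention nor schedule holes; all names and the example are illustrative.

```python
def list_schedule(weights, preds, comm, order, m):
    """Greedy list scheduling of a task DAG on m processors with communication delays.

    weights: dict task -> execution time
    preds:   dict task -> predecessor tasks
    comm:    dict (pred, task) -> delay paid only if the two run on different processors
    order:   priority-ordered task list that respects precedence
    """
    proc_free = [0.0] * m             # time at which each processor becomes idle
    placed = {}                       # task -> (processor, start, finish)
    for t in order:
        best = None
        for p in range(m):
            # the task may start once processor p is free and all its data has arrived
            ready = proc_free[p]
            for u in preds.get(t, []):
                pu, _, fu = placed[u]
                arrival = fu if pu == p else fu + comm.get((u, t), 0.0)
                ready = max(ready, arrival)
            finish = ready + weights[t]
            if best is None or finish < best[2]:
                best = (p, ready, finish)
        placed[t] = best
        proc_free[best[0]] = best[2]
    makespan = max(finish for _, _, finish in placed.values())
    return placed, makespan

# Tiny example: a feeds b and c; crossing processors costs 2 time units.
w = {"a": 2, "b": 2, "c": 2}
preds = {"b": ["a"], "c": ["a"]}
comm = {("a", "b"): 2.0, ("a", "c"): 2.0}
print(list_schedule(w, preds, comm, order=["a", "b", "c"], m=2)[1])   # -> 6.0
```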

18.
In any distributed processing environment, decisions need to be made concerning the assignment of computational task modules to various processors. Many versions of the task allocation problem have appeared in the literature. Intertask communication makes the assignment decision difficult; capacity limitations at the processors increase the difficulty. This problem is naturally formulated as a nonlinear integer program, but it can be linearized to take advantage of commercial integer programming solvers. While traditional approaches to linearizing the problem perform well when only a few tasks communicate, they have considerable difficulty solving problems involving a large number of intercommunicating tasks. This paper introduces new mixed-integer formulations for three variations of the task allocation problem. Results from extensive computational tests conducted on real and generated data indicate that the reformulations are particularly efficient when a large number of tasks communicate, solving reasonably large problems faster than other available exact approaches.
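For context, the traditional linearization that such work improves on can be written as follows, with binary variables x_{ik} = 1 iff task i is assigned to processor k, execution costs e_{ik}, resource requirements r_i against capacities R_k, and communication costs c_{ij} paid only when tasks i and j sit on different processors. This is the textbook formulation (an assumption here), not one of the new mixed-integer formulations introduced in the paper.

```latex
\begin{aligned}
\min\quad & \sum_{i}\sum_{k} e_{ik}\,x_{ik}
            \;+\; \sum_{i<j} c_{ij}\Bigl(1-\sum_{k} z_{ijk}\Bigr) \\
\text{s.t.}\quad & \sum_{k} x_{ik}=1 \quad \forall i, \qquad
                   \sum_{i} r_{i}\,x_{ik}\le R_{k} \quad \forall k, \\
                 & z_{ijk}\le x_{ik}, \qquad z_{ijk}\le x_{jk} \quad \forall\, i<j,\ \forall k, \\
                 & x_{ik}\in\{0,1\}, \qquad z_{ijk}\ge 0 .
\end{aligned}
```

Because the objective rewards making each z_{ijk} as large as possible, at an optimum z_{ijk} = x_{ik} x_{jk}, so the quadratic communication term is captured with only linear constraints.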

19.
In this paper, we consider multiprocessor scheduling problems where each job (task) must be executed simultaneously by a specified number of processors, but the indices of the processors allotted to each job do not have to be contiguous (i.e., jobs are fragmentable). Unlike other research in this domain, we analyse the problem under the workspan criterion, defined as the product of the maximum job completion time (makespan) and the number of processors used. Moreover, the job processing times can be described by non-increasing or non-decreasing functions of the start times of the jobs, modelling improvement (learning) or degradation (deterioration), respectively. To solve the problems, we construct polynomial-time algorithms and analyse their efficiency numerically.
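As a concrete reading of the criterion above: if m' denotes the number of processors actually used, C_j the completion time of job j, and s_j its start time, then the workspan and a common linear model for start-time-dependent processing times (an assumption here, since the abstract does not give the exact functional form used) are:

```latex
\begin{gathered}
\text{workspan} \;=\; m' \cdot C_{\max}, \qquad C_{\max} \;=\; \max_{j} C_{j}, \\
p_j(s_j) \;=\; a_j - b_j\, s_j \quad (\text{non-increasing: learning}), \qquad
p_j(s_j) \;=\; a_j + b_j\, s_j \quad (\text{non-decreasing: deterioration}), \qquad a_j, b_j \ge 0 .
\end{gathered}
```

Minimizing the workspan thus trades a shorter makespan against the number of processors switched on, which is why allotting more processors is not automatically better under this criterion.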

20.
The paper presents a generic technique for mapping parallel algorithms onto parallel architectures. The proposed technique is a fast recursive mapping algorithm which is a component of the Cluster-M programming tool. The other components of Cluster-M are the Specification module and the Representation module. In the Specification module, for a given task specified by a high-level machine-independent program, a clustered task graph called the Spec graph is generated. In the Representation module, for a given architecture or computing organization, a clustered system graph called the Rep graph is generated. Given a task (or system) graph, a Spec (or Rep) graph can be generated using one of the clustering algorithms presented in the paper. The clustering is done only once for a given task graph (system graph), independently of any system graphs (task graphs); it is a machine-independent (application-independent) clustering and therefore is not repeated for different mappings. The Cluster-M mapping algorithm produces a sub-optimal matching of a given Spec graph containing M task modules onto a Rep graph of N processors in O(MN) time. This generic algorithm is suitable for both the allocation problem and the scheduling problem. Its performance is compared to other leading techniques, and we show that Cluster-M produces better or similar results in significantly less time, using fewer or an equal number of processors compared with the other known methods.
