期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Optimizing the Reliability of Streaming Applications Under Throughput Constraints

Anne Benoit Hinde Lilia Bouziane Yves Robert 《International journal of parallel programming》2011,39(5):584-614

Mapping a pipelined application onto a distributed and parallel platform is a challenging problem. The problem becomes even more difficult when multiple optimization criteria are involved, and when the target resources are heterogeneous (processors and communication links) and subject to failures. This paper investigates the problem of mapping pipelined applications, consisting of a linear chain of stages executed in a pipeline way, onto such platforms. The objective is to optimize the reliability under a performance constraint, i.e., while guaranteeing a threshold throughput. In order to increase reliability, we replicate the execution of stages on multiple processors. We compare interval mappings, where the application is partitioned into intervals of consecutive stages, with general mappings, where stages may be partitioned without any constraint, thereby allowing a better usage of processors and communication network capabilities. However, the price to pay for general mappings is a dramatic increase in the problem complexity. We show that computing the period of a given general mapping is an NP-complete problem, and we give polynomial bounds to determine a (conservative) approximated value. On the contrary, the period of an interval mapping obeys a simple formula, and we provide an optimal dynamic programming algorithm for the bi-criteria interval mapping problem on homogeneous platforms. On the more practical side, we design a set of efficient heuristics, and we compare the performance of interval and general mapping strategies through extensive simulations. 相似文献

2.

Performance of local search heuristics on scheduling a class of pipelined multiprocessor tasks 总被引：1，自引：0，他引：1

M. Fikret Ceyda 《Computers & Electrical Engineering》2005,31(8):537-555

This paper presents the evaluation of the solution quality of heuristic algorithms developed for scheduling multiprocessor tasks for a class of multiprocessor architectures designed to exploit temporal and spatial parallelism simultaneously. More specifically, we deal with multi-level or partitionable architectures where MIMD parallelism and multiprogramming support are the two main characteristics of the system. We investigate scheduling a number of pipelined multiprocessor tasks with arbitrary processing times and arbitrary processor requirements in this system. The scheduling problem consists of two interrelated sub-problems, which are finding a sequence of pipelined multiprocessor tasks on a processor and finding a proper mapping of tasks to the processors that are already being sequenced. For the solution of the second problem, various techniques are available. However, the problem remains of generating a feasible sequence for the pipelined operations. We employed three well-known local search heuristic algorithms that are known to be robust methods applicable to various optimization problems. These are Simulated Annealing, Tabu Search, and Genetic Algorithms. We then conduct computational experiments and evaluate the reduction achieved in completion time by each heuristic. We have also compared the results with well-known simple list-based heuristics. 相似文献

3.

Optimal Use of Mixed Task and Data Parallelism for Pipelined Computations

《Journal of Parallel and Distributed Computing》2000,60(3):297-319

This paper addresses optimal mapping of parallel programs composed of a chain of data parallel tasks onto the processors of a parallel system. The input to the programs is a stream of data sets, each of which is processed in order by the chain of tasks. This computation structure, also referred to as a data parallel pipeline, is common in several application domains, including digital signal processing, image processing, and computer vision. The parameters of the performance for such stream processing are latency (the time to process an individual data set) and throughput (the aggregate rate at which data sets are processed). These two criteria are distinct since multiple data sets can be pipelined or processed in parallel. The central contribution of this research is a new algorithm to determine a processor mapping for a chain of tasks that optimizes latency in the presence of a throughput constraint. We also discuss how this algorithm can be applied to solve the converse problem of optimizing throughput with a latency constraint. The problem formulation uses a general and realistic model of intertask communication and addresses the entire problem of mapping, which includes clustering tasks into modules, assigning of processors to modules, and possible replicating of modules. The main algorithms are based on dynamic programming and their execution time complexity is polynomial in the number of processors and tasks. The entire framework is implemented as an automatic mapping tool in the Fx parallelizing compiler for a dialect of High Performance Fortran. 相似文献

4.

Mapping workflow applications with types on heterogeneous specialized platforms

Anne Benoit Jean-Marc Nicod 《Parallel Computing》2011,37(8):410-427

In this paper, we study the problem of optimizing the throughput of coarse-grain workflow applications, for which each task of the workflow is of a given type, and subject to failures. The goal is to map such an application onto a heterogeneous specialized platform, which consists of a set of processors that can be specialized to process one type of tasks. The objective function is to maximize the throughput of the workflow, i.e., the rate at which the data sets can enter the system. If there is exactly one task per processor in the mapping, then we prove that the optimal solution can be computed in polynomial time. However, the problem becomes NP-hard if several tasks can be assigned to the same processor. Several polynomial time heuristics are presented for the most realistic specialized setting, in which tasks of the same type can be mapped onto the same processor, but a processor cannot process two tasks of different types. Also, we give an integer linear program formulation of this problem, which allows us to find the optimal solution (in exponential time) for small problem instances. Experimental results show that the best heuristics obtain a good throughput, much better than the throughput obtained with a random mapping. Moreover, we obtain a throughput close to the optimal solution in the particular cases on which the optimal throughput can be computed (small problem instances or particular mappings). 相似文献

5.

Reliability versus performance for critical applications

Alain Girault Érik Saule Denis Trystram 《Journal of Parallel and Distributed Computing》2009

Applications implemented on critical systems are subject to both safety critical and real-time constraints. Classically, applications are specified as precedence task graphs that must be scheduled onto a given target multiprocessor heterogeneous architecture. We propose a new method for simultaneously optimizing two objectives: the execution time and the reliability of the schedule. The problem is decomposed into two successive steps: a spatial allocation during which the reliability is maximized (randomized algorithm), and a scheduling during which the makespan is minimized (list scheduling algorithm). It allows us to produce several trade-off solutions, among which the user can choose the solution that best fits the application’s requirements. Reliability is increased by replicating adequate tasks onto well chosen processors. Our fault model assumes that processors are fail-silent, that they are subject to transient failures, and that the occurrences of failures follow a constant parameter Poisson law. We assess and validate our method by running extensive simulations on both random graphs and actual application graphs. They show that it is competitive, in terms of makespan, compared to existing reference scheduling methods for heterogeneous processors (HEFT), while providing a better reliability. 相似文献

6.

Parallel computation of symbolic robot models and control laws: Theory and application to transputer networks

N. Kir&#x;anski T. Petrovi&#x; M. Vukobratovi&#x; 《野外机器人技术杂志》1993,10(3):345-368

New computer architectures based on large numbers of processors are now used in various application areas ranging from embedded systems to supercomputers. Efficient parallel processing algorithms are applied in a wide variety of applications such as simulation, robot control, and image synthesis. This article presents two novel parallel algorithms for computing robot inverse dynamics (as well as control laws) starting from customized symbolic robot models. To gain the most benefit from the concurrent processor architecture, the whole job is divided into a large number of simple tasks, each involving only a single floating-point operation. Although requiring sophisticated scheduling schemes, fine granularity of tasks was the key factor for achieving nearly maximum efficiency and speedup. The first algorithm resolves the scheduling problem for an array of pipelined processors. The second one is devoted to parallel processors connected by a complete crossbar interconnection network. The main feature of the proposed algorithms is that they take into account the communication delays between processors and minimize both the execution time and communication cost. To prove the theoretical results, the algorithms have been verified by experiments on an INMOS T800 transputer-based system. We used four transputers in serial and parallel configurations. The experimental results show that the most complicated dynamic control laws can be executed in a submilisecond time interval. © 1993 John Wiley & Sons, Inc. 相似文献

7.

A static mapping heuristics to map parallel applications to heterogeneous computing systems

Ranieri Baraglia Renato Ferrini Pierluigi Ritrovato 《Concurrency and Computation》2005,17(13):1579-1605

In order to minimize the execution time of a parallel application running on a heterogeneously distributed computing system, an appropriate mapping scheme is needed to allocate the application tasks to the processors. The general problem of mapping tasks to machines is a well‐known NP‐hard problem and several heuristics have been proposed to approximate its optimal solution. In this paper we propose a static graph‐based mapping algorithm, called Heterogeneous Multi‐phase Mapping (HMM), which permits suboptimal mapping of a parallel application onto a heterogeneous computing distributed system by using a local search technique together with a tabu search meta‐heuristic. HMM allocates parallel tasks by exploiting the information embedded in the parallelism forms used to implement an application, and considering an affinity parameter, that identifies which machine in the heterogeneous computing system is most suitable to execute a task. We compare HMM with some leading techniques and with an exhaustive mapping algorithm. We also give an example of mapping of two real applications using HMM. Experimental results show that HMM performs well demonstrating the applicability of our approach. Copyright © 2005 John Wiley & Sons, Ltd. 相似文献

8.

Fault-Tolerant Scheduling for Real-Time Embedded Control Systems 总被引：8，自引：0，他引：8

下载免费PDF全文

Chun-HuaYang GeertDeconinck Wei-HuaGui 《计算机科学技术学报》2004,19(2):0-0

With the increasing complexity of industrial application, an embedded control system (ECS) requires processing a number of hard real-time tasks and needs fault-tolerance to assure high reliability. Considering the characteristics of real-time tasks in ECS, an integrated algorithm is proposed to schedule real-time tasks and to guarantee that all real-time tasks are completed before their deadlines even in the presence of faults. Based on the nonpreemptive critical-section protocol (NCSP), this paper analyzes the blocking time introduced by resource conflicts of relevancy tasks in fault-tolerant multiprocessor systems. An extended schedulability condition is presented to check the assignment feasibility of a given task to a processor. A primary/backup approach and on-line replacement of failed processors are used to tolerate processor failures. The analysis reveals that the integrated algorithm bounds the blocking time, requires limited overhead on the number of processors, and still assures good processor utilization. This is also demonstrated by simulation results. Both analysis and simulation show the effectiveness of the proposed algorithm in ECS. 相似文献

9.

Benchmark-problem instances for static scheduling of task graphs with communication delays on homogeneous multiprocessor systems

Tatjana Davidović Teodor Gabriel Crainic 《Computers & Operations Research》2006

Scheduling program tasks on processors is at the core of the efficient use of multiprocessor systems. Most task-scheduling problems are known to be NP-Hard and, thus, heuristics are the method of choice in all but the simplest cases. The utilization of acknowledged sets of benchmark-problem instances is essential for the correct comparison and analysis of heuristics. Yet, such sets are not available for several important classes of scheduling problems, including multiprocessor scheduling problem with communication delays (MSPCD) where one is interested in scheduling dependent tasks onto homogeneous multiprocessor systems, with processors connected in an arbitrary way, while explicitly accounting for the time required to transfer data between tasks allocated to different processors. We propose test-problem instances for the MSPCD that are representative in terms of number of processors, type of multiprocessor architecture, number of tasks to be scheduled, and task graph characteristics (task execution times, communication costs, and density of dependencies between tasks). Moreover, we define our task-graph generators in a way appropriate to ensure that the corresponding problem instances obey the theoretical principles recently proposed in the literature. 相似文献

10.

A programmable state machine architecture for packet processing

Wangyang Lai Chin-Tau Lea 《Micro, IEEE》2003,23(4):32-42

The Internet is expanding rapidly and constantly adding new protocols and features. To shorten the design cycle, many companies have adopted a common hardware platform for a variety of products. In these products, specialized packet processors tailored for packet processing handle multiple protocols and feature changes. A packet processor usually incorporates multiple RISC engines that are configurable as several instances of parallel processors, working simultaneously or in a pipelined fashion. In either approach, packet processors are complex and expensive. Packet processing has many levels of programmability requirements. Some tasks require only mild programmability and can't justify the use of a full-fledged packet processor. A finite scare machine (FSM), on the other hand, has high performance but cannot adapt to protocol changes. The solution is something in between: fast, programmable, but not as complicated as a packet processor. A programmable state machine (PSM) is such an idea. 相似文献

11.

Partitioning and mapping of parallel programs by self-organization

Hans-Ulrich Heiss Marcus Dormanns 《Concurrency and Computation》1996,8(9):685-706

To execute a parallel program on a multicomputer system, the tasks of the program have to be mapped to the particular processors of the parallel machine. The aim of the mapping is twofold: (i) to achieve a balanced load on the processors (partitioning problem) and (ii) to keep communication delays low by placing communicating tasks closely together (mapping). Since both the communication structure of the program and the interconnection structure of the parallel machine can be represented as graphs, the mapping problem can be regarded as a graph embedding problem to minimize communication costs. As a new heuristic approach to this NP-hard problem we apply Kohonen's self-organizing maps to establish a topology-preserving embedding. Experimental results are presented and compared to other approaches to this problem. The most attractive feature of our new method is that it can be extremely well parallelized. 相似文献

12.

The many faces of consensus in distributed systems

Turek J. Shasha D. 《Computer》1992,25(6):8-17

Known results regarding consensus among processors are surveyed and related to practice. The ideas embodied in the various proofs are explained. The goal is to give practitioners some sense of the system hardware and software guarantees that are required to achieve a given level of reliability and performance. The survey focuses on two categories of failures: fail-stop failures, which occur when processors fail by stopping; and Byzantine failures, which occur when processors fail by acting maliciously 相似文献

13.

Energy- and reliability-aware task scheduling onto heterogeneous MPSoC architectures

Suleyman Tosun 《The Journal of supercomputing》2012,62(1):265-289

Scheduling periodic tasks onto a multiprocessor architecture under several constraints such as performance, cost, energy, and reliability is a major challenge in embedded systems. In this paper, we present an Integer Linear Programming (ILP) based framework that maps a given task set onto an Heterogeneous Multiprocessor System-on-Chip (HMPSoC) architecture. Our framework can be used with several objective functions; minimizing energy consumption, minimizing cost (i.e., the number of heterogeneous processors), and maximizing reliability of the system under performance constraints. We use Dynamic Voltage Scaling (DVS) for reducing energy consumption while we employ task duplication to maximize reliability. We illustrate the effectiveness of our approach through several experiments, each with a different number of tasks to be scheduled. We also propose two heuristics based on Earliest Deadline First (EDF) algorithm for minimizing energy under performance and cost constraints. Our experiments on generated task sets show that ILP-based method reduces the energy consumption up to 62% percent against a method that does not apply DVS. Heuristic methods obtain promising results when compared to optimal results generated by our ILP-based method. 相似文献

14.

Optimal task assignment in homogeneous networks

Cheol-Hoon Lee Shin K.G. 《Parallel and Distributed Systems, IEEE Transactions on》1997,8(2):119-129

This paper considers the problem of assigning the tasks of a distributed application to the processors of a distributed system such that the sum of execution and communication costs is minimized. Previous work has shown this problem to be tractable for a system of two processors or a linear array of N processors, and for distributed programs of serial parallel structures. Here we focus on the assignment problem on a homogeneous network, which is composed of N functionally-identical processors, each with its own memory. Some processors in the network may have unique resources, such as data files or certain peripheral devices. Certain tasks may have to use these unique resources; they are called attached tasks. The tasks of a distributed program should therefore be assigned so as to make use of specific resources located at certain processors in the network while minimizing the amount of interprocessor communication. The assignment problem in such a homogeneous network is known to be NP-hard even for N=3, thus making it intractable for a network with a medium to large number of processors. We therefore focus on task assignment in general array networks, such as linear arrays, meshes, hypercubes, and trees. We first develop a modeling technique that transforms the assignment problem in an array or tree into a minimum-cut maximum-flow problem. The assignment problem is then solved for a general array or tree network in polynomial time 相似文献

15.

A pipelined schedule to minimize completion time for loop tiling with computation and communication overlapping

《Journal of Parallel and Distributed Computing》2003,63(11):1138-1151

This paper proposes a new method for the problem of minimizing the execution time of nested for-loops using a tiling transformation. In our approach, we are interested not only in tile size and shape according to the required communication to computation ratio, but also in overall completion time. We select a time hyperplane to execute different tiles much more efficiently by exploiting the inherent overlapping between communication and computation phases among successive, atomic tile executions. We assign tiles to processors according to the tile space boundaries, thus considering the iteration space bounds. Our schedule considerably reduces overall completion time under the assumption that some part from every communication phase can be efficiently overlapped with atomic, pure tile computations. The overall schedule resembles a pipelined datapath where computations are not anymore interleaved with sends and receives to nonlocal processors. We survey the application of our schedule to modern communication architectures. We performed two sets of experimental results, one using MPI primitives over FastEthernet and one using the SISCI API over an SCI network. In both cases, the total completion time is significantly reduced. 相似文献

16.

Preemptive Scheduling with Dedicated Processors: Applications of Fractional Graph Coloring

Klaus Jansen Lorant Porkolab 《Journal of Scheduling》2004,7(1):35-48

We study the problem of scheduling independent multiprocessor tasks, where for each task in addition to the processing time(s) there is a prespecified dedicated subset (or a family of alternative subsets) of processors which are required to process the task simultaneously. Focusing on problems where all required (alternative) subsets of processors have the same fixed cardinality, we present complexity results for computing preemptive schedules with minimum makespan closing the gap between computationally tractable and intractable instances. In particular, we show that for the dedicated version of the problem, optimal preemptive schedules of bi-processor tasks (i.e., tasks whose dedicated processor sets are all of cardinality two) can be computed in polynomial time. We give various extensions of this result including one to maximum lateness minimization with release times and due dates. All these results are based on a nice relation between preemptive scheduling and fractional coloring of graphs. In contrast to the positive results, we also prove that the problems of computing optimal preemptive schedules for three-processor tasks or for bi-processor tasks with (possible several) alternative modes are strongly NP-hard. 相似文献

17.

Approximation algorithms in partitioning real-time tasks with replications

Jian Lin Albert M. K. Cheng Gokhan Gercek 《International Journal of Parallel, Emergent and Distributed Systems》2018,33(2):211-232

Today is an era where multiprocessor technology plays a major role in designs of modern computer architecture. While multiprocessor systems offer extra computing power, it also opens a new range of opportunities to improve fault-robustness. This paper focuses on a problem of achieving fault-tolerance using replications in real-time, multiprocessor systems. In the problem, multiple replicas, or copies, of a computing task are executed on distinct processors to resist potential processor failures and computing faults. Two greedy, approximation heuristics, named Worst Fit Increasing K-Replication and First Fit Increasing K-Replication, are studied to maximise the number of real-time tasks assigned on a system with identical processors, respecting to the tasks’ replicating and timely requirements. Worst case performance is analysed by using an approximation ratio between the algorithms and an optimal solution. We mathematically prove that the ratios of using both algorithms are infinitely close to 2. Simulations are performed on a large set of testing cases which can be used to bring to light the average performance of using the algorithms in practice. The results show that both heuristic algorithms provide simple but fast and effective solutions to solve the problem. 相似文献

18.

Fault-Tolerant Dynamic Rescheduling for Heterogeneous Computing Systems

Jing Mei Kenli Li Xu Zhou Keqin Li 《Journal of Grid Computing》2015,13(4):507-525

As the scale and complexity of heterogeneous computing systems grow, failures occur frequently and have an adverse effect on solving large-scale applications. Hence, fault-tolerant scheduling is an imperative step for large-scale computing systems. The existing fault-tolerant scheduling algorithms belong to static scheduling, and they allocate multiple copies of each task to several processors no matter whether processor failures affect the execution of tasks. Such active replication strategies not only waste resource but also sacrifice the makespan. What is more, they cannot guarantee the successful execution of applications. In this paper, we propose a fault-tolerant dynamic rescheduling algorithm named FTDR, which can overcome above drawbacks. FTDR keeps listening to the processor failure, and reschedules the suspended tasks once failures occur. Because FTDR reschedules the tasks that are suspended because of failures, it can tolerate an arbitrary number of failures. Randomly generated DAGs are tested in our experiments. Experimental results show that the proposed algorithm achieves good performance in terms of makespan and resource consumption compared with its direct competitors. 相似文献

19.

面向同构多核处理器的节能任务调度方法

王颖锋刘志镜《计算机科学》2011,38(9):294-297

对于运行在同构多核处理器上的周期性硬实时任务,设计了一个基于动态电压调节的节能调度方法。该方法首先将计算任务按照周期数降序排序并基于计算任务调度长度最短的原则安排任务映射。然后将各个处理核上具有最小通讯时间的计算任务设置为最后执行的计算任务而其它计算任务顺序保持不变。在初始映射中所有计算任务都被分配最高频率的情况下,每个处理核上的计算任务在执行时间扩展过程中确定最佳的计算任务顺序。基于 Intel PXA270的功耗模型,以几个随机任务集作实验。结果表明提出的方法能够有效地降低多核处理器的能量。相似文献

20.

Multi-core real-time scheduling for generalized parallel task models 总被引：1，自引：0，他引：1

Abusayeed Saifullah Jing Li Kunal Agrawal Chenyang Lu Christopher Gill 《Real-Time Systems》2013,49(4):404-435

Multi-core processors offer a significant performance increase over single-core processors. They have the potential to enable computation-intensive real-time applications with stringent timing constraints that cannot be met on traditional single-core processors. However, most results in traditional multiprocessor real-time scheduling are limited to sequential programming models and ignore intra-task parallelism. In this paper, we address the problem of scheduling periodic parallel tasks with implicit deadlines on multi-core processors. We first consider a synchronous task model where each task consists of segments, each segment having an arbitrary number of parallel threads that synchronize at the end of the segment. We propose a new task decomposition method that decomposes each parallel task into a set of sequential tasks. We prove that our task decomposition achieves a resource augmentation bound of 4 and 5 when the decomposed tasks are scheduled using global EDF and partitioned deadline monotonic scheduling, respectively. Finally, we extend our analysis to a directed acyclic graph (DAG) task model where each node in the DAG has a unit execution requirement. We show how these tasks can be converted into synchronous tasks such that the same decomposition can be applied and the same augmentation bounds hold. Simulations based on synthetic workload demonstrate that the derived resource augmentation bounds are safe and sufficient. 相似文献