期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

How to cope with faulty processors in a completely connected network of communicating processors

W.J. Van Gils 《Information Processing Letters》1985,20(4):207-213

Computer systems consisting of a completely connected network of communicating processors should work reliable in spite of a number of malfunctioning processors that give conflicting information to different members of the system. It is necessary that all wellfunctioning processors agree on the message sent by any transmitting processor, regardless of the behaviour of the malfunctioning processors. Moreover, it is required that whenever the transmitter is wellfunctioning, the agreement reached should be equal to the message actually sent by the transmitter. In this paper it is shown by a mathematical proof that this is possible if and only if the number of malfunctioning processors is less than one-third of the total number of processors in the network. In cases where this requirement is fulfilled, we give a clear algorithm to reach agreement. 相似文献

2.

The worst-case analysis of the Garey–Johnson algorithm

Claire Hanen Yakov Zinder 《Journal of Scheduling》2009,12(4):389-400

The Garey–Johnson algorithm is a well known polynomial-time algorithm constructing an optimal schedule for the maximum lateness problem with unit execution time tasks, two parallel identical processors, precedence constraints and release times. The paper is concerned with the worst-case analysis of a generalization of the Garey–Johnson algorithm to the case of arbitrary number of processors. In contrast to other algorithms for the maximum lateness problem, the tight performance guarantee for the even number of processors differs from the tight performance guarantee for the odd number of processors. 相似文献

3.

Comparison of RISC and DSP processors for a transform domain implementation of the discrete wavelet transform

《Journal of Microcomputer Applications》1993,16(1):19-31

In this paper we analyse the efficiency of an implementation of the discrete wavelet transform using a modified transform domain approach on several classes of DSP and RISC processors. The recently emerged discrete wavelet transform is faster than a standard Fast Fourier Transform, yet it is computationally intensive since the number of computations required increases with the number of octaves. A computationally efficient transform domain implementation of the discrete wavelet transform for wavelets of length 20 or greater is described in this paper. To compare the efficiency of RISC and DSP processors for this method, we first analyse the implementation of a radix 2, radix 4, and split radix Fast Fourier Transform algorithm on some of the latest DSP and RISC processors. This is followed by an analysis of the implementation of complex multiplication on the processors under considerations. We show that both DSP and RISC processors need the same number of instruction cycles to execute a complex multiplication. Finally, we show that those RISC processors which have their instruction cycle time equal to 85% of a DSP processor's instruction cycle time, give better performance than DSP processors for implementing the wavelet transform using a longer wavelet. 相似文献

4.

Distributed computing on oriented anonymous hypercubes with faulty components

Evangelos Kranakis Nicola Santoro 《Distributed Computing》2001,14(3):185-189

We give efficient algorithms for distributed computation on oriented, anonymous, asynchronous hypercubes with possible faulty components (i.e. processors and links) and deterministic processors. Initially, the processors know only the size of the network and that they are inter-connected in a hypercube topology. Faults may occur only before the start of the computation (and that despite this the hypercube remains a connected network). However, the processors do not know where these faults are located. As a measure of complexity we use the total number of bits transmitted during the execution of the algorithm and we concentrate on giving algorithms that will minimize this number of bits. The main result of this paper is an algorithm for computing Boolean functions on anonymous hypercubes with bit cost , where is the number of faulty components (i.e. links plus processors), is the number of links which are either faulty, or non-faulty but adjacent to faulty processors, and is the diameter of the hypercube with faulty components. Received: October 1992 / Accepted: April 2001 相似文献

5.

Broadcasting multiple messages in the multiport model

Bar-Noy A. Ching-Tien Ho 《Parallel and Distributed Systems, IEEE Transactions on》1999,10(5):500-508

We consider the problem of broadcasting multiple messages from one processor to many processors in the k-port model for message-passing systems. In such systems, processors communicate in rounds, where in every round, each processor can send k messages to k processors and receive k messages from k processors In this paper, we first present a simple and practical algorithm based on variations of k complete k-ary trees. We then present an optimal algorithm up to an additive term of one for this problem for any number of processors, any number of messages, and any value for k 相似文献

6.

Parallel sorting with cooperating heaps in a linear array of processors

Yen-Chun Lin Ferng-Ching Lin 《Parallel Computing》1990,16(2-3):273-278

A parallel sorting algorithm using cooperating heaps in a linear array of processors is presented. It can sort a sequence whose length is much larger than the number of processors. Because the output begins one step after all the items have been input, sorting n items requires 2n + 1 steps. Two independent modifications of the algorithm are possible; one tries to reduce the number of processors used, and the other can sort more items on the same array. 相似文献

7.

一种多处理机任务分配的启发式算法 总被引：4，自引：1，他引：4

冯斌孙俊《计算机工程》2004,30(14):63-65,157

列表调度方法与其它方法相比，可以用较少的开销获得更好的结果。但仅用于处理机个数有限的系统，对于处理机个数无限的系统，调度策略都是基于任务簇调度的。文章提出了一种处理机个数无限的任务分配的列表调度算法，称之为节点迁移调度算法(NTSA)。实验证明。该算法解的性能优于其它的算法。相似文献

8.

Families of optimal fault-tolerant multiple-bus networks

Huan-Yu Tu Hawkes L.W. 《Parallel and Distributed Systems, IEEE Transactions on》2001,12(1):60-73

Optimally fault-tolerant partial-connection multiple-bus networks and their fault-tolerant routing algorithms are presented in this paper. The proposed networks are scalable and provide flexibility in the choice of network parameters determining construction cost, system performance, and fault tolerance, given a fixed number of processors. In this design, when performance begins to fall due to contention, the simple addition of a bus can improve performance without adding costly processors or changing the whole topology, as required for other multiple-bus designs. Also, in situations requiring high reliability, for a fixed number of processors, excellent fault tolerance can be obtained 相似文献

9.

Leader election problem on networks in which processor identitynumbers are not distinct

Yamashita M. Kameda T. 《Parallel and Distributed Systems, IEEE Transactions on》1999,10(9):878-887

In the networks considered in this paper, processors do not have distinct identity numbers. On such a network, we discuss the leader election problem and the problem of counting the number of processors having the same identity number. As the communication mode, we consider port-to-port, broadcast-to-port, port-to-mail box, and broadcast-to-mailbox. For each of the above communication modes, we present: an algorithm for counting the number of processors with the same identity number; an algorithm for solving the leader election problem; and a graph theoretical characterization of the solvable class for the leader election problem 相似文献

10.

Cost and time-cost effectiveness of multiprocessing

Sarkar D. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(6):704-712

Speedup and efficiency, two measures for performance of pipelined computers, are now used to evaluate performance of parallel algorithms for multiprocessor systems. However, these measures consider only the computation time and number of processors used and do not include the number of the communication links in the system. The author defines two new measures, cost effectiveness and time-cost effectiveness, for evaluating performance of a parallel algorithm for a multiprocessor system. From these two measures two characterization factors for multiprocessor systems are defined and used to analyze some well-known multiprocessor systems. It is found that for a given penalty function, every multiprocessor architecture has an optimal number of processors that produces maximum profit. If too many processors are used, the higher cost of the system reduces the profit obtained from the faster solution. On the other hand, if too few processors are used, the penalty paid for taking a longer time to obtain the solution reduces the profit 相似文献

11.

A scalable scheduling scheme for functional parallelism ondistributed memory multiprocessor systems

Pande S. Agrawal D.P. Mauney J. 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(4):388-399

We attempt a new variant of the scheduling problem by investigating the scalability of the schedule length with the required number of processors, by performing scheduling partially at compile time and partially at run time. Assuming infinite number of processors, the compile time schedule is found using a new concept of the threshold of a task that quantifies a trade-off between the schedule-length and the degree of parallelism. The schedule is found to minimize either the schedule length or the number of required processors and it satisfies: A feasibility condition which guarantees that the schedule delay of a task from its earliest start time is below the threshold, and an optimality condition which uses a merit function to decide the best task-processor match for a set of tasks competing for a given processor. At run time, the tasks are merged producing a schedule for a smaller number of available processors. This allows the program to be scaled down to the processors actually available at run time. Usefulness of this scheduling heuristic has been demonstrated by incorporating the scheduler in the compiler backend for targeting Sisal (Streams and Iterations in a Single Assignment Language) on iPSC/860 相似文献

12.

On-line scheduling of scalable real-time tasks on multiprocessor systems 总被引：1，自引：0，他引：1

Wan Yeon Lee Sung Je Hong Jong Kim 《Journal of Parallel and Distributed Computing》2003,63(12):1315-1324

The computation time of scalable tasks depends on the number of processors allocated to them in multiprocessor systems. As more processors are allocated to a scalable task, the overall computation time of the task decreases but the total amount of processors’ time devoted to the execution of the task, called workload, increases due to parallel execution overhead. In this paper, we propose a task scheduling algorithm that utilizes the property of scalable tasks for on-line and real-time scheduling. In the proposed algorithm, the total workload of all scheduled tasks is reduced by managing processors allocated to the tasks as few as possible without missing their deadlines. As a result, the processors in the system have less load to execute the scheduled tasks and can execute more newly arriving tasks before their deadlines. Simulation results show that the proposed algorithm performs significantly better than the conventional algorithm based on a fixed number of processors to execute each task. 相似文献

13.

Lower bound on the number of processors and time for schedulingprecedence graphs with communication costs

Al-Mouhamed M.A. 《IEEE transactions on pattern analysis and machine intelligence》1990,16(12):1390-1401

A lower bound on the number of processors and finish time for the problem of scheduling precedence graphs with communication costs is presented. The notion of the earliest starting time of a task is formulated for the context of lower bounds. A lower bound on the completion time is proposed. A task delay which does not increase the earliest completion time of a schedule is defined. Each task can then be scheduled within a time interval without affecting the lower bound performance on the finish time. This leads to definition of a new lower bound on the number of processors required to process the task graph. A derivation of the minimum time increase over the earliest completion time is also proposed for the case of a smaller number of processors. A lower bound on the minimum number of interprocessor communication links required to achieve optimum performance is proposed. Evaluation had been carried out by using a set of 360 small graphs. The bound on the finish time deviates at most by 5% from the optimum solution in 96% of the cases and performs well with respect to the minimum number of processors and communication links 相似文献

14.

A Heterogeneous Mixed-Mode Execution Model for Massively Parallel Systems

《Journal of Parallel and Distributed Computing》1999,56(1):2-16

In this paper, we consider a massively parallel system that is composed of heterogeneous processors, that is, processors with different processing power, and that combines the advantages of the SIMD and MIMD architectures. The heterogeneous mixed-mode (HeMM) execution model is composed of two main components, which operate in the well-known SIMD and MIMD paradigms. The main computing power comes from a component that is composed of a massive number of processors and operates in a data parallel manner. The other component is composed of a few (or even one) fast processors which operate in the MIMD paradigm. The operation of a small number of processors in an MIMD paradigm has been well demonstrated through actual systems. The processors in this component add flexibility to the execution of the parallel programs such that it adjusts to the changing parallelism of the program to enhance the performance. Based on this execution model we analyze the gains in performance that is obtainable by this new system. We show that substantial performance gains can be obtained by using the HeMM system. 相似文献

15.

On Task Relocation in Two-Dimensional Meshes

《Journal of Parallel and Distributed Computing》2000,60(5):616-638

In parallel computer systems with a number of processors, external fragmentation is caused by continuous allocation and deallocation of processors to tasks which require exclusive use of several contiguous processors. With this condition, the system may not be able to find contiguous processors to be allocated to an incoming task even with a sufficient number of free processors. Relocation is an approach for alleviating this problem by reassigning the running tasks to other processors. In this paper, we examine two relocation schemes—full relocation and partial relocation scheme—for two-dimensional meshes. The full relocation scheme is desirable when the system is highly fragmented, while the partial relocation scheme is used for minimizing the number of relocated tasks. For the relocation process, we formally define and use two basic submesh movement operations—shifting and rotating. Comprehensive computer simulation reveals that the proposed schemes are beneficial when the relocation overhead is not high, which is machine dependent. 相似文献

16.

A shelf-based Lagrangian relaxation algorithm to scheduleparallelizable tasks

Monte J.D. Pattipati K.R. 《IEEE transactions on systems, man, and cybernetics. Part A, Systems and humans : a publication of the IEEE Systems, Man, and Cybernetics Society》2001,31(6):687-697

This paper generalizes classical scheduling theory by removing the restriction that only a single processor can work on a given task at a particular time. Instead, it is assumed that each task can be allocated any number of identical processors from one to the maximum number available, and that a task's completion time is a nonincreasing function of the number of processors allocated. Once started, a task must run to completion without altering the number of processors given to it. Furthermore, a task can start only when no task is currently executing. The objective considered is minimization of overall completion time (make-span) for the tasks subject to the constraint of a limited number of available processors. To approximately solve this problem, an algorithm based on Lagrangian relaxation is developed, and its performance is analyzed through extensive simulations. Approximate duality gaps range from 0 to about 60% on average and are strongly a function of problem type and size 相似文献

17.

TSA—OT：一个调度Out—Tree任务科的算法 总被引：4，自引：1，他引：4

刘振英方滨兴张毅《计算机学报》2001,24(4):390-394

对于把一个任务群调度到多个处理器的问题,人们往往只注重找到一个调度路径最短的算法,却忽略了要节省处理器。收于Out-Tree任务图代表分治算法的一大类问题,因此,文中专门针对该任务图,给出了一个基于任务复制的算法TSA－OT。它首先分配关键路径上的任务结点,然后在不改变调度长度的情况下,把非关键路径上的结点尽可能分配到已用的处理器上。并且,该算法将Out-Tree任务图中的所有通信都化为零。TSA－OT算法与近几年所提出的TDS,CPFD,DCP算法之间的比较表明,TSA－OT算法不仅调度长度最短,而且采用了更少或相当个数的处理器。相似文献

18.

The Coming Wave of Multithreaded Chip Multiprocessors

James Laudon Lawrence Spracklen 《International journal of parallel programming》2007,35(3):299-330

The performance of microprocessors has increased exponentially for over 35 years. However, process technology challenges, chip power constraints, and difficulty in extracting instruction-level parallelism are conspiring to limit the performance of future individual processors. To address these limits, the computer industry has embraced chip multiprocessing (CMP), predominately in the form of multiple high-performance superscalar processors on the same die. We explore the trade-off between building CMPs from a few high-performance cores or building CMPs from a large number of lower-performance cores and argue that CMPs built from a larger number of lower-performance cores can provide better performance and performance/Watt on many commercial workloads. We examine two multi-threaded CMPs built using a large number of processor cores: Sun’s Niagara and Niagara 2 processors. We also explore the programming issues for CMPs with large number of threads. The programming model for these CMPs is similar to the widely used programming model for symmetric multiprocessors (SMPs), but the greatly reduced costs associated with communication of data through the on-chip shared secondary cache allows for more fine-grain parallelism to be effectively exploited by the CMP. Finally, we present performance comparisons between Sun’s Niagara and more conventional dual-core processors built from large superscalar processor cores. For several key server workloads, Niagara shows significant performance and even more significant performance/Watt advantages over the CMPs built from traditional superscalar processors. 相似文献

19.

Analyzing Scalability of Parallel Algorithms and Architectures

《Journal of Parallel and Distributed Computing》1994,22(3):379-391

The scalability of a parallel algorithm on a parallel architecture is a measure of its capacity to effectively utilize an increasing number of processors. Scalability analysis may be used to select the best algorithm-architecture combination for a problem under different constraints on the growth of the problem size and the number of processors. It may be used to predict the performance of a parallel algorithm and a parallel architecture for a large number of processors from the known performance on fewer processors. For a fixed problem size, it may be used to determine the optimal number of processors to be used and the maximum possible speedup that can be obtained. The objectives of this paper are to critically assess the state of the art in the theory of scalability analysis, and to motivate further research on the development of new and more comprehensive analytical tools to study the scalability of parallel algorithms and architectures. We survey a number of techniques and formalisms that have been developed for studying scalability issues, and discuss their interrelationships. For example, we derive an important relationship between time-constrained scaling and the isoefficiency function. We point out some of the weaknesses of the existing schemes for measuring scalability, and discuss possible ways of extending them. 相似文献

20.

Performance Measures for Evaluating Algorithms for SIMD Machines

《IEEE transactions on pattern analysis and machine intelligence》1982,(4):319-331

This paper examines measures for evaluating the performance of algorithms for single instruction stream–multiple data stream (SIMD) machines. The SIMD mode of parallelism involves using a large number of processors synchronized together. All processors execute the same instruction at the same time; however, each processor operates on a different data item. The complexity of parallel algorithms is, in general, a function of the machine size (number of processors), problem size, and type of interconnection network used to provide communications among the processors. Measures which quantify the effect of changing the machine-size/problem-size/network-type relationships are therefore needed. A number of such measures are presented and are applied to an example SIMD algorithm from the image processing problem domain. The measures discussed and compared include execution time, speed, parallel efficiency, overhead ratio, processor utilization, redundancy, cost effectiveness, speed-up of the parallel algorithm over the corresponding serial algorithm, and an additive measure called "sprice" which assigns a weighted value to computations and processors. 相似文献