共查询到20条相似文献,搜索用时 15 毫秒
1.
de Cerio L.D. Valero-Garcia M. Gonzalez A. 《Parallel and Distributed Systems, IEEE Transactions on》2002,13(12):1247-1260
A new methodology named CALMANT (CC-cube Algorithms on Meshes and Tori) for mapping a type of algorithm that we call CC-cube algorithm onto multicomputers with hypercube, mesh, or torus interconnection topology is proposed. This methodology is suitable when the initial problem can be expressed as a set of processes that communicate through a hypercube topology (a CC-cube algorithm). There are many important algorithms that fit into the CC-cube type. CALMANT is based on three different techniques: (a) the standard embedding to assign the processes of the algorithm to the nodes of the mesh multicomputer; (b) the communication pipelining technique to increase the level of communication parallelism inherent in the CC-cube algorithms; and (c) optimal message-scheduling algorithms proposed in this work in order to avoid conflicts and minimizing in this way the communication time. Although CALMANT is proposed for multicomputers with different interconnection network topologies, the paper only focuses on the particular case of meshes. 相似文献
2.
In non-contiguous allocation, a job request can be split into smaller parts that are allocated possibly non-adjacent free sub-meshes rather than always waiting until a single sub-mesh of the requested size and shape is available. Lifting the contiguity condition is expected to reduce processor fragmentation and increase system utilization. However, the distances traversed by messages can be long, and as a result the communication overhead, especially contention, is increased. The extra communication overhead depends on how the allocation request is partitioned and assigned to free sub-meshes. In this paper, a new non-contiguous processor allocation strategy, referred to as Greedy-Available-Busy-List, is suggested for the 2D mesh network. Request partitioning in our suggested strategy is based on the sub-meshes available for allocation. To evaluate the performance improvement achieved by our strategy and compare it against well-known existing non-contiguous and contiguous strategies, we conduct extensive simulation runs under the assumption of wormhole routing and three communication patterns, notably one-to-all, all-to-all and random. The results show that the new strategy can reduce the communication overhead and substantially improve performance in terms of job turnaround time and system utilization. 相似文献
3.
A new approach for dynamic job scheduling in mesh-connected multiprocessor systems, which supports a multiuser environment, is proposed in this paper. Our approach combines a submesh reservation policy with a priority-based scheduling policy to obtain high performance in terms of high throughput, high utilization, and low turn-around times for jobs. This high performance is achieved at the expense of scheduling jobs in a strictly fair, FCFS fashion; in fact, the algorithm is parameterized to allow trade-offs between performance and (short-term) POPS fairness. The proposed scheduler can be used with any submesh allocation policy. A fast and efficient implementation of the proposed scheduler has also been presented. The performance of the proposed scheme has been compared with the FCFS policy, the only existing scheduling strategy for meshes, to demonstrate the effectiveness of the proposed approach. Simulation results indicate that our scheduling strategy outperforms the FCFS policy significantly. Specifically, our strategy significantly reduces the average waiting delay of jobs over the FCFS policy. The fast implementation of the proposed scheduler results in low allocation and deallocation time overhead, as well as low space overhead 相似文献
4.
We propose a fault-tolerant tree-based multicast algorithm for 2-dimensional (2-D) meshes based on the concept of the extended safety level which is a vector associated with each node to capture fault information in the neighborhood.In this approach each destination is reached through a minimum number of hops,In order to minimize the total number of traffic steps,three heuristic strategies are proposed.This approach can be easily implemented by pipelined circuit switching(PCS).A simulation study is conducted to measure the total number of traffic steps under different strategies.Our approach is the first attempt to address the faulttolerant tree-based multicast strategies.Our approach is the first attempt to address the faulttolerant tree-based multicast problem in 2-D meshes based on limited global information with a simple model and succinct information. 相似文献
5.
V. Singh V. Kumar G. Agha C. Tomlinson 《International journal of parallel programming》1991,20(2):95-131
We present two new parallel algorithms QSP1 and QSP2 based on sequential quicksort for sorting data on a mesh multicomputer, and analyze their scalability using the isoefficiency metric. We show that QSP2 matches the lower bound on the isoefficiency function for mesh multicomputers, while QSP1 is fairly close to optimal. Langet al.
(1) and Schnorret al.
(2) have developed parallel sorting algorithms for the mesh architecture that have either optimal (Schnorr) or close to optimal (Lang) run-time complexity for the one-element-perprocessor case. Both QSP1 and QSP2 have better scalability than the scaled-down variants of these algorithms (for the case in which there are more elements than processors). We also analyze a different variant of Lang's sort which is as scalable as QSP2. We briefly discuss another metric called resource consumption. According to this metric, both QSP1 and QSP2 are superior to variants of Lang's sort. 相似文献
6.
《Journal of Systems Architecture》2004,50(4):177-192
In a mesh multicomputer, performing jobs needs to schedule submeshes according to some processor allocation scheme. In order to assign the incoming jobs to a free submesh, a task compaction scheme is needed to generate a larger contiguous free region. The overhead of compaction depends on the efficiency of the task migration scheme. In this paper, two simple task migration schemes are first proposed in n-dimensional mesh multicomputers with supporting dimension-ordered wormhole routing in one-port communication model. Then, a hybrid scheme which combines advantages of the two schemes is discussed. Finally, we evaluate the performance of all of these proposed approaches. 相似文献
7.
Xiaola Lin McKinley P.K. Ni L.M. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(8):793-804
Multicast communication services, in which the same message is delivered from a source node to an arbitrary number of destination nodes, are being provided in new-generation multicomputers. Broadcast is a special case of multicast in which a message is delivered to all nodes in the network. The nCUBE-2, a wormhole-routed hypercube multicomputer, provides hardware support for broadcast and a restricted form of multicast in which the destinations form a subcube. However, the broadcast routing algorithm adopted in the nCUBE-2 is not deadlock-free. In this paper, four multicast wormhole routing strategies for 2-D mesh multicomputers are proposed and studied. All of the algorithms are shown to be deadlock-free. These are the first deadlock-free multicast wormhole routing algorithms ever proposed. A simulation study has been conducted that compares the performance of these multicast algorithms under dynamic network traffic conditions in a 2-D mesh. The results indicate that a dual-path routing algorithm offers performance advantages over tree-based, multipath, and fixed-path algorithms 相似文献
8.
Comparative evaluation of contiguous allocation strategies on 3D mesh multicomputers 总被引:1,自引:0,他引:1
S. Bani-Mohammad Author Vitae M. Ould-Khaoua Author Vitae I. Ababneh Author Vitae Lewis M. Mackenzie Author Vitae 《Journal of Systems and Software》2009,82(2):307-318
The performance of contiguous allocation strategies can be significantly affected by the type of the distribution adopted for job execution times. In this paper, the performance of the existing contiguous allocation strategies for 3D mesh multicomputers is re-visited in the context of heavy-tailed distributions (e.g., a Bounded Pareto distribution). The strategies are evaluated and compared using simulation experiments for both First-Come-First-Served (FCFS) and Shortest-Service-Demand (SSD) scheduling strategies under a variety of system loads and system sizes. The results show that the performance of the allocation strategies degrades considerably when job execution times follow a heavy-tailed distribution. Moreover, SSD copes much better than FCFS scheduling strategy in the presence of heavy-tailed job execution times. The results also reveal that allocation strategies that employ a list of allocated sub-meshes for both allocation and de-allocation exhibit low allocation overhead, and maintain good system performance in terms of average turnaround time and mean system utilization. 相似文献
9.
In a mesh multicomputer, submeshes are allocated to perform jobs according to processor allocation schemes, with each task assigned to occupy processors of one submesh with an appropriate size. To assign regions for incoming tasks, task compaction is needed to produce a large contiguous free region. The overhead of task compaction relies mainly on designing an efficient task migration scheme. This paper investigates task migration schemes in 2D wormhole-routed mesh multicomputers with an all-port communication model. Two constraints are given between two submeshes for task migration. Two task migration schemes that follow one of the constraints in 2D mesh multicomputers are then developed. In addition, the proposed schemes are proven to be deadlock-free and congestion-free. Finally, performance analysis is adopted to compare the proposed task migration schemes. 相似文献
10.
Corbett P.F. Scherson I.D. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(5):626-632
A sorting algorithm, dubbed MeshSort, for multidimensional mesh-connected multiprocessors is introduced. Bitonic Sort and ShearSort are shown to be special cases of MeshSort. MeshSort thus provides some insight into the operation of parallel sorting. It requires operations only along orthogonal vectors of processors, simplifying the control of the multiprocessor. This allows MeshSort to be used on any reduced architecture where a multidimensional memory structure is interconnected with a lower dimensional structure of processors. A modified version of MeshSort, called FastMeshSort, is presented. This algorithm applies the same basic principle as MeshSort, and is almost as simple to implement, but achieves much better performance. The modified algorithm is shown to be very efficient for reasonably sized meshes. FastMeshSort is presented as a practical sorting and routing algorithm for real multidimensional mesh-connected multiprocessors. The algorithms can easily be extended to other multiprocessor structures 相似文献
11.
Allocating submeshes to jobs in mesh-connected multicomputers in a FCFS fashion can lead to poor system performance (e.g., long job waiting delays) because the job at the head of the waiting queue can prevent the allocation of free submeshes to other waiting jobs with smaller submesh requirements. However, serving jobs aggressively out-of-order can lead to excessive waiting delays for jobs with large allocation requests. In this paper, we propose a scheduling scheme that uses a window of consecutive jobs from which it selects jobs for allocation and execution. This window starts with the current oldest waiting job and corresponds to the lookahead of the scheduler. The performance of the proposed window-based scheme has been compared to that of FCFS and other previous job scheduling schemes. Extensive simulation results based on synthetic workloads and real workload traces indicate that the new scheduling strategy exhibits good performance when the scheduling window size is large. In particular, it is substantially superior to FCFS in terms of system utilization, average job turnaround times, and maximum waiting delays under medium to heavy system loads. Also, it is superior to aggressive out-of-order scheduling in terms of maximum job waiting delays. Window-based job scheduling can improve both overall system performance and fairness (i.e., maximum job waiting delays) by adopting large lookahead job scheduling windows. 相似文献
12.
Po-Jen Chuang Nian-Feng Tzeng 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(2):211-217
We propose a new processor allocation strategy that applies to any mesh system and recognizes submeshes of arbitrary sizes at any locations in a mesh system. The proposed strategy allocates a submesh of exactly the size requested by an incoming task, completely avoiding internal fragmentation. Because of its efficient allocation, this strategy exhibits better performance than an earlier allocation strategy based on the buddy principle. An efficient implementation of this strategy is presented. Extensive simulation runs are carried out to collect experimental cost and performance measures of interest under different allocation schemes 相似文献
13.
The inadequacies of conventional parallel languages for programming multicomputers are identified. The C* language is briefly reviewed, and a compiler that translates C* programs into C programs suitable for compilation and execution on a hypercube multicomputer is presented. Results illustrating the efficiency of executing data-parallel programs on a hypercube multicomputer are reported. They show the speedup achieved by three hand-compiled C* programs executing on an N-Cube 3200 multicomputer. The first two programs, Mandelbrot set calculation and matrix multiplication, have a high degree of parallelism and a simple control structure. The C* compiler can generate relatively straightforward code with performance comparable to hand-written C code. Results for a C* program that performs Gaussian elimination with partial pivoting are also presented and discussed 相似文献
14.
Parallel 2-d convolution on a mesh connected array processor 总被引:2,自引:0,他引:2
In this correspondence, a parallel 2-D convolution scheme is presented. The processing structure is a mesh connected array processor consisting of the same number of simple processing elements as the number of pixels in the image. For most windows considered, the number of computation steps required is the same as that of the coefficients of a convolution window. The proposed scheme can be easily extended to convolution windows of arbitrary size and shape. The basic idea of the proposed scheme is to apply the 1-D systolic concept to 2-D convolution on a mesh structure. The computation is carried out along a path called a convolution path in a systolic manner. The efficiency of the scheme is analyzed for windows of various shapes. The ideal convolution path is a Hamiltonian path ending at the center of the window, the length of which is equal to the number of window coefficients. The simple architecture and control strategy make the proposed scheme suitable for VLSI implementation. 相似文献
15.
Efficient algorithms to compute the Hough transform on MIMD and SIMD hypercube multicomputer are developed. Our algorithms can compute p angles of the Hough transform of an N × N image, p N, in 0(p + log N) time on both MIMD and SIMD hypercubes. These algorithms require 0(N
2) processors. We also consider the computation of the Hough transform on MIMD hypercubes with a fixed number of processors. Experimental results on an NCUBE/7 hypercube are presented.This research was supported by the National Science Foundation under grants DCR84-20935 and 86-17374. All correspondence should be mailed to Sanjay Ranka. 相似文献
16.
The submesh allocation problem is to recognize and locate a free submesh that can accommodate a request for a submesh of a specified size. In this paper, we propose a new best-fit submesh allocation strategy for mesh-connected multiprocessor systems. The proposed strategy maintains and uses a free submesh list for an efficient allocation. For an allocation request, the strategy selects the best-fit submesh which causes the least amount of potential processor fragmentation. As many large free submeshes as possible are preserved for later allocations. For this purpose, we introduce a novel function quantifying the degree of potential fragmentation of submeshes. The proposed strategy has the capability of recognizing a complete submesh. We also propose an allocation strategy for faulty meshes which can maintain and allocate virtual submeshes derived from faulty submeshes. Extensive simulation is carried out to compare the proposed strategy with previous strategies. The proposed strategy has the best performance: a 6-50 percent improvement over the previous best strategy 相似文献
17.
The heterogeneous nature of data types and computational structures involved in Computer Vision algorithms make the design
and implementation of massively parallel image processing systems a not yet fully solved problem. It is common belief that
in the next future MIMD architectures with their high degree of flexibility will play a very important role in this research
area, by using a limited number of identical but powerful processing elements. The aim of this paper is to show how a selected
list of algorithms in which a unique Image Understanding process can be decomposed could map onto a distributed-memory MIMD
architecture. The operative modalities we adopt are the SPMD modality for the low level processing and the MIMD modality for
the intermediate and high levels of processing. Either efficient parallel formulations of the algorithms with respect to the
interconnection topology of processors and their optimized implementations on a target transputer-based architecture are reported. 相似文献
18.
Multiple processor systems are an integral part of today's high-performance computing environment. Such systems are often configured as a two-dimensional grid of processors called a mesh. Tasks compete for rectangular submeshes of this mesh. The choice of submesh allocation strategy can significantly affect the level of processor utilization and a task's waiting time. In addition, the execution speed of various allocation algorithms varies widely, which can further affect system performance. This paper describes and categorizes several submesh allocation strategies, including a previously unreported method that is superior to other methods in terms of execution speed. The paper includes results of simulation studies used to compare the performance characteristics of the most efficient allocation strategies in each category. 相似文献
19.
N.H. Gehani 《Computer Languages, Systems and Structures》1982,7(1):21-23
Implementation of Ada's parallel tasks on a multicomputer architecture requires additional communication and naming overhead because tasks can operate on shared data via global variables and pointers. This increases the complexity of implementing Ada and has a negative impact on program understandability. 相似文献
20.
The Tiny, CSN, Multiple Rings, and Ordered Dimensions, and interval labeling routing systems for transputer networks are reviewed. The systems are compared with respect to several criteria, such as adaptivity, deadlock freedom, generality, livelock freedom, and network latency 相似文献