首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
We present a new parallel algorithm for computing N point lagrange interpolation on an n-dimensional hypercube with total number of nodes p = 2n. Initially, we consider the case when N = p. The algorithm is extended to the case when only p (p fixed) processors are available, p < N. We assume that N is exactly divisible by p. By dividing the hypercube into subcubes of dimension two, we compute the products and sums appearing in Lagrange's formula in a novel way such that wasteful repetitions of forming products are avoided. The speed up and efficiency of our algorithm is calculated both theoretically and by simulating it over a network of PCs.  相似文献   

2.
To effectively utilize a faulty hypercube, it is often necessary to reconfigure the hypercube in such a way as to retain as many fault-free nodes as possible. This inspires us to identify maximalincomplete subcubesin a faulty hypercube, as the subcube so reconfigured is often much larger than any complete subcube obtainable and is likely to retain performance better. An efficient algorithm is first presented to find incomplete subcubes in a faulty hypercube. Three applications are then implemented on both an incomplete system and a complete one to measure their actual performance differences. Each application is mapped onto incomplete hypercubes, following the techniques developed for complete hypercubes (possibly with some modifications). Among the three applications,Gaussian eliminationtakes exactly the same mapping scheme on an incomplete system as on a complete one, whileFFTrequires some efforts to adapt it to the incomplete topology. The measured results of the three applications indicate that reconfiguring a faulty hypercube into an incomplete subcube is beneficial in practice.  相似文献   

3.
In this paper, we develop load balancing strategies for scalable high-performance parallel A* algorithms suitable for distributed-memory machines. In parallel A* search, inefficiencies such as processor starvation and search of nonessential spaces (search spaces not explored by the sequential algorithm) grow with the number of processors P used, thus restricting its scalability. To alleviate this effect, we propose a novel parallel startup phase and an efficient dynamic load balancing strategy called the quality equalizing (QE) strategy. Our new parallel startup scheme executes optimally in Θ(log P) time and, in addition, achieves good initial load balance. The QE strategy prossess certain unique quantitative and qualitative load balancing properties that enable it to significantly reduce starvation and nonessential work. Consequently, we obtain a highly scalable parallel A* algorithm with an almost-linear speedup. The startup and load balancing schemes were employed in parallel A* algorithms to solve the Traveling Salesman Problem on an nCUBE2 hypercube multicomputer. The QE strategy yields average speedup improvements of about 20-185% and 15-120% at low and intermediate work densities (the ratio of the problem size to P), respectively, over three well-known load balancing methods-the round-robin (RR), the random communication (RC), and the neighborhood averaging (NA) strategies. The average speedup observed on 1024 processors is about 985, representing a very high efficiency of 0.96. Finally, we analyze and empirically evaluate the scalability of parallel A* algorithms in terms of the isoefficiency metric. Our analysis gives (1) a Θ(P log P) lower bound on the isoefficiency function of any parallel A* algorithm, and (2) a general expression for the upper bound on the isoefficiency function of our parallel A* algorithm using the QE strategy on any topology-for the hypercube and 2-D mesh architectures the upper bounds on the isoefficiency function are found to be Θ(P log2P) and Θ(P[formula]), respectively. Experimental results validate our analysis, and also show that parallel A* search has better scalability using the QE load balancing strategy than using the RR, RC, or NA strategies.  相似文献   

4.
This paper consists of two parts. In the first one, two new algorithms for wormhole routing on the hypercube network are presented. These techniques are adaptive and are ensured to be deadlock- and livelock-free. These properties are guaranteed by using a small number of resources in the routing node. The first algorithm is adaptive and nonminimal and will be referred to as Nonminimal. In this technique, some moderate derouting is allowed in order to alleviate the potential congestion arising from highly structured communication patterns. The second algorithm, dubbed Subcubes, is adaptive and minimal, and is based on partitioning the hypercube into subcubes of smaller dimension; This technique requires only two virtual channels per physical link of the node. In the second part of the paper, a wide variety of techniques for wormhole routing in the hypercube are evaluated from an algorithmic point of view. Five partially adaptive algorithms are considered: the Hanging algorithm, the Zenith algorithm, the Hanging-Order algorithm, the Nonminimal algorithm; and the Subcubes algorithm. One oblivious algorithm, the Dimension-Order, or E-Cube routing algorithm, is also used. Finally, a Fully Adaptive Minimal algorithm is tried. A simple node model was designed and adapted to all the algorithms  相似文献   

5.
The main result of this paper is an algorithm for embeddingk-ary complete trees into hypercubes with uniform load and asymptotically optimal dilation. The algorithm is fully scalable—the dimension of the hypercube can be chosen independent of the arity and height of the tree. The embedding enables to emulate optimally both Divide&Conquer computations onk-ary complete trees where only one level of nodes is active at a time and general computations based onk-ary complete trees where all tree nodes are active simultaneously. As a special case, we get an algorithm for embedding thek-ary complete tree of given height into its optimal hypercube with load 1 and asymptotically optimal dilation.  相似文献   

6.
We consider the problem of broadcasting a message in the n-cube, Qn, equipped with wormhole switching. The communication model assumed is one-port, and the broadcasting scheme is path-based whereby, during broadcasting along a path by a node, all the nodes on that path will receive the message. The wormhole path length is m where –m's concurrently (cube-based broadcast). The second method is based on the concept of Gray codes (GCs), and at every given step, it forms the Hamiltonian path of appropriate size as the broadcast path (GC-based broadcast). It is shown that the steps required in GC-based broadcast is fewer than or equal to those needed by cube-based broadcast. Furthermore, comparison of time complexity of GC-based broadcast to the lower bound reveals that this algorithm is near-optimal, and in fact optimal in many cases. This work improves on the best algorithm developed for path-based broadcast in one-port hypercube both in complexity and in simplicity.  相似文献   

7.
We consider the neighbourhood load balancing problem. Given a network of processors and an arbitrary distribution of tasks over the network, the goal is to balance load by exchanging tasks between neighbours. In the continuous model, tasks can be arbitrarily divided and perfectly balanced state can always be reached. This is not possible in the discrete model where tasks are non-divisible. In this paper we consider the problem in a very general setting, where the tasks can have arbitrary weights and the nodes can have different speeds. Given a continuous load balancing algorithm that balances the load perfectly in \(T\) rounds, we convert the algorithm into a discrete version. This new algorithm is deterministic and balances the load in \(T\) rounds so that the difference between the average and the maximum load is at most \(2d\cdot w_{\max }\), where d is the maximum degree of the network and \(w_{\max }\) is the maximum weight of any task. For general graphs, these bounds are asymptotically lower compared to the previous results. The proposed conversion scheme can be applied to a wide class of continuous processes, including first and second order diffusion, dimension exchange, and random matching processes. For the case of identical tasks, we present a randomized version of our algorithm that balances the load up to a discrepancy of \(\mathscr {O}(\sqrt{d \log n})\) provided that the initial load on every node is large enough.  相似文献   

8.
The dimension exchange method is a distributed load balancing method for point-to-point networks. We add a parameter, called the exchange parameter, to the method to control the splitting of load between a pair of directly connected processors, and call this parameterized version the generalized dimension exchange (GDE) method. The rationale for the introduction of this parameter is that splitting the workload into equal halves does not necessarily lead to an optimal result (in terms of the convergence rate) for certain structures. We carry out an analysis of this new method, emphasizing its termination aspects and potential efficiency. Given a specific structure, one needs to determine a value to use for the exchange parameter that would lead to an optimal result. To this end, we first derive a sufficient and necessary condition for the termination of the method. We then show that equal splitting, proposed originally by others as a heuristic strategy, indeed yields optimal efficiency in hypercube structures. For chains, rings, meshes, and tori, however, optimal choices of the exchange parameter are found to be closely related to the scales of these structures. Finally, to further investigate the potential of the GDE method, we extend it to allow exchange parameters of different values to be used over the set of edges, and based on this extension, we compare the GDE method with the diffusion method.  相似文献   

9.
This paper presented a routing algorithm that finds n disjoint shortest paths from the source node s to target node d in the n-dimensional hypercube. Fault-tolerant routing over all shortest node-disjoint paths has been investigated to overcome the failure encountered during routing in hypercube networks. In this paper, we proposed an efficient approach to provide fault-tolerant routing which has been investigated on hypercube networks. The proposed approach is based on all shortest node-disjoint paths concept in order to find a fault-free shortest path among several paths provided. The proposed algorithm is a simple uniform distributed algorithm that can tolerate a large number of process failures, while delivering all n messages over optimal-length disjoint paths. However, no distributed algorithm uses acknowledgement messages (acks) for fault tolerance. So, for dealing the faults, acknowledgement messages (acks) are included in the proposed algorithm for routing messages over node-disjoint paths in a hypercube network.  相似文献   

10.
The Flexible Hypercube is a generalization of binary hypercube networks in that the number of nodes can be arbitrary in contrast to a strict power of 2. Restated, the Flexible Hypercube retains the connectivity and diameter properties of the corresponding hypercube. Although the embedding of complete binary trees in faulty hypercubes has received considerable attention, to our knowledge, no paper has demonstrated how to embed a complete binary tree in a faulty Flexible Hypercube. Therefore, this investigation presents a novel algorithm to facilitate the embedding job when the Flexible Hypercube contains faulty nodes. Of particular concern are the network structures of the Flexible Hypercube that balance the load before as well as after faults start to degrade the performance of the Flexible Hypercube. Furthermore, to obtain the replaceable node of the faulty node, 2-expansion is permitted such that up to (n – 2) faults can be tolerated with congestion 1, dilation 4 and load 1. That is, (n – 1) is the dimension of a Flexible Hypercube. Results presented herein demonstrate that embedding methods are optimized.  相似文献   

11.
We present a parallel algorithm for computing an optimal sequence alignment in efficient space. The algorithm is intended for a message-passing architecture with one-dimensional-array topology. The algorithm computes an optimal alignment of two sequences of lengthsM andN inO((M+N) 2 /P) time andO((M+N)/P) space per processor, where the number of processors isP>=max(M, N). Thus, whenP=max(M, N) it achieves linear speedup and requires constant space per processor. Some experimental results on an Intel hypercube are provided.This research was supported by NIH Grant LM05110 from the National Library of Medicine.  相似文献   

12.
Recently, several variations of the hypercube have been proposed to enhance its performance and reliability. The folded hypercube is one of these variations, in which an extra link is added to each node providing a direct connection to the node located farthest from it. In this paper, we propose a new operation mode of the folded hypercube to enhance its performance and fault-tolerance. There are (kn+1) regular k-cubes within a folded hypercube of dimension n, denoted by FQn. We introduce another type of hypercube, called the twisted hypercube, to improve the performance and fault tolerance of the folded hypercube. The problems of finding a subcube of given size in an FQn and routing messages within the subcube are addressed for the proposed operation mode. The advantages of the proposed operation mode over the regular-hypercube operation mode are analyzed in terms of dependability and robustness. The proposed operation mode is shown to make significant improvements over the regular-hypercube operation mode in both dependability and robustness. Because the new operation mode can be applied to only an (n-1)-subcube level for a given FQn, we present general form of folded hypercube, thus enhancing the availability of subcubes of any dimension m相似文献   

13.
In this paper, we use the regular distribution method to design a perfect load balancing algorithm for an n-star with a maximum error of 1 and a time complexity of 3n(n+1). This algorithm is based on the novel notion of leader trees. A second algorithm proposed in this paper as an enhancement to our first algorithm and uses an arbitrary spanning tree as the leader tree and has a worst time complexity of 2.25n 2−3n+0.75. We also discuss the issue of dynamically selecting the leader tree and hybrid load balancing algorithms in general. Furthermore, we present a hybrid algorithm for load balancing on the star interconnection network which benefits from a diffusion load balancing preprocessing phase and shows a smaller mean time complexity than our two first algorithms.  相似文献   

14.
An incomplete hypercube possesses virtually every advantage of complete hypercubes, including simple deadlock-free routing, a small diameter, bounded link traffic density, a good support of parallel algorithms, and so on. It is natural to reconfigure a faulty hypercube into a maximum incomplete cube so as to lower potential performance degradation, because a hypercube so reconfigured often results in a much larger system than what is attainable according to any conventional reconfiguration scheme which identifies only complete subcubes. A maximum incomplete subcube involves one maximum complete subcube, plus certain smaller complete subcubes, and, thus, may accommodate multiple jobs of different sizes simultaneously, delivering a higher performance level. This paper proposes an efficient approach for identifying all the maximum incomplete subcubes present in a faulty hypercube. The proposed approach is on the basis of manipulating Boolean expressions, with the search space reduced considerably by taking advantage of the basic properties of faulty hypercubes during expression manipulation. It is distributed, in that every healthy node executes the same identification algorithm independently, at the same time, it is confirmed by fault simulation that our approach indeed gives rise to significantly larger reconfigured systems and requires short execution times  相似文献   

15.
An optimal tree contraction algorithm for the boolean hypercube and the constant-degree hypercubic networks, such as the shuffle exchange or the butterfly network, is presented. The algorithm is based on novel routing techniques and, for certain small subtrees, simulates optimal PRAM algorithms. For trees of size n, stored on a p processor hypercube in in-order, the running time of the algorithm is . The resulting speed-up of is optimal due to logarithmic communication overhead, as shown by a corresponding lower bound. The same algorithmic ingredients can also be used to solve the term matching problem, one of the fundamental problems in logic programming. Received August 10, 1994; revised May 2, 1995.  相似文献   

16.
17.
A compression algorithm is constructed which allows one to reduce an arbitrary realization of a finite sequence of matrices M1,…, Mr to a minimal partial realization of M1,…, Mr. Furthermore, a state space criterion of minimality of a partial realization and a formula for the minimal state space dimension are obtained.  相似文献   

18.
We present optimal embeddings of three genres of butterfly-like graphs in the (boolean) hypercube; each embedding is specified via a linear-time algorithm. Our first embedding finds an instance of the FFT graph as a subgraph of the smallest hypercube that is big enough to hold it; thus, we embed then-level FFT graph, which has (n+1)2 n vertices, in the (n+log2(n+1))-dimensional hypercube, with unit dilation. This embedding yields a mapping of the pipelined FFT algorithm on the hypercube architecture, which is optimal in all resources (time, processor utilization, load balancing, etc.) and which is on-line in the sense that inputs can be added to the transform even during the computation. Second, we find optimal embeddings of then-level butterfly graph and then-level cube-connected cycles graph, each of which hasn2 n vertices, in the (n+log2 n)-dimensional hypercube. These embeddings, too, have optimal dilation, congestion, and expansion. The dilation is 1+(n mod 2), which is best possible. Our embeddings indicate that these two bounded-degree approximations to the hypercube do not have any communication power that is not already present in the hypercube.The research of D. S. Greenberg was supported in part by NSF Grant MIP-86-01885. The research of L. S. Heath was supported in part by NSF Grant DCI-85-04308. The research of A. L. Rosenberg was supported in part by NSF Grants DCI-85-04308, DCI-87-96236, and CCR-88-12567.  相似文献   

19.
The token distribution (TD) problem, an abstract static variant of load balancing, is defined as follows: letM be a (parallel processor) network with setP of processors. Initially, each processorP P has a certain amountl(P) of tokens. The goal of a TD algorithm, run onM, is to distribute the tokens evenly among the processors. In this paper we introduce the notion of strongly adaptive TD algorithms, i.e., algorithms whose running times come close to the best possible runtime, the off-line complexity of the TD problem, for each individual initial token distributionl. Until now, only weakly adaptive algorithms have been considered, where the running time is measured in terms of the maximum initial load max{l(P)P P}.We design an almost optimal, strongly adaptive algorithm on mesh-connected networks of arbitrary dimension extended by a single 1-bit bus. This result shows that an on-line TD algorithm can come close to the optimal (off-line) bound for each individual initial load. Furthermore, we exactly characterize the off-line complexity of arbitrary initial token distributions on arbitrary networks. As an intermediate result, we design almost optimal weakly adaptive algorithms for TD on mesh-connected networks of arbitrary dimension.This research was partially supported by DFG-Forschergruppe Effiziente Nutzung massiv paralleler Systeme, Teilprojekt 4, by the ESPRIT Basic Research Action No. 7141 (ALCOM II), and by the Volkswagen-stiftung. A preliminary version was presented at the 20th ICALP, 1993, see [9].  相似文献   

20.
Parallelizing the Data Cube   总被引:1,自引:0,他引:1  
This paper presents a general methodology for the efficient parallelization of existing data cube construction algorithms. We describe two different partitioning strategies, one for top-down and one for bottom-up cube algorithms. Both partitioning strategies assign subcubes to individual processors in such a way that the loads assigned to the processors are balanced. Our methods reduce inter processor communication overhead by partitioning the load in advance instead of computing each individual group-by in parallel. Our partitioning strategies create a small number of coarse tasks. This allows for sharing of prefixes and sort orders between different group-by computations. Our methods enable code reuse by permitting the use of existing sequential (external memory) data cube algorithms for the subcube computations on each processor. This supports the transfer of optimized sequential data cube code to a parallel setting.The bottom-up partitioning strategy balances the number of single attribute external memory sorts made by each processor. The top-down strategy partitions a weighted tree in which weights reflect algorithm specific cost measures like estimated group-by sizes. Both partitioning approaches can be implemented on any shared disk type parallel machine composed of p processors connected via an interconnection fabric and with access to a shared parallel disk array.We have implemented our parallel top-down data cube construction method in C++ with the MPI message passing library for communication and the LEDA library for the required graph algorithms. We tested our code on an eight processor cluster, using a variety of different data sets with a range of sizes, dimensions, density, and skew. Comparison tests were performed on a SunFire 6800. The tests show that our partitioning strategies generate a close to optimal load balance between processors. The actual run times observed show an optimal speedup of p.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号