首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Consider an×n binary image. Given a directionD, the parallel visibility problem consists of determining for each pixel of the image the portion that is visible (i.e., not obstructed by any other black pixel of the image) in directionD from infinity. A related problem, referred to as point visibility, is to compute for each pixel the portion that is visible from a given pointp. In this paper, we deriveO(logn) time SIMD algorithms for each of these two problems on the hypercube, where one processor is assigned to every pixel of the image. Since the worst case communication distance of two processors in an 2-processor hypercube is 2 logn, it follows that both of the above algorithms are asymptotically optimal.This paper summarizes a preliminary version [Ref. 1] and short note on a possible improvement [Ref. 2] presented at the 1988IFIP WG 10.3. Working Conference on Parallel Processing and 1988Allerton Conference on Communication, Control and Computing, respectively. The first and third authors' research are partially supported by the Natural Sciences and Engineering Research Council of Canada.  相似文献   

2.
Several results related to the load balancing problem on the hypercube, the shuffle-exchange, the cube-connected cycles, and the butterfly are shown. Implications of these results for routing algorithms are also discussed. Our results include the following:
• Efficient load balancing algorithms are found for the hypercube, the shuffle-exchange, the cube-connected cycles, and the butterfly.
• Load balancing is shown to require more time on a p-processor shuffle-exchange, cube-connected cycle or butterfly than on a p-processor weak hypercube.
• Routing n packets on a p-processor hypercube can be done optimally whenever n = p1+1/k, for any fixed k > 0.
  相似文献   

3.
Hypercube interconnection networks have been receiving considerable attention in the supercomputing environment. However, the number of processors must be exactly 2r for an r-cube complete hypercube. This restriction severely limits its applicability. In this paper, we address three variant hypercube topologies with more flexibility in system sizes, the labelled hypercubes Imr, IMr, and IAr. Incomplete hypercube Imr consists of an r-cube and an m-cube complete hypercubes; Imr is composed of 2r and Σm ε M 2m nodes; IAr comes from an r-cube complete hypercube which operates in a degraded manner and allows that the missing nodes to be arbitrarily distributed. Specifically, we focus on the parallel paths routing algorithms for these three classes of incomplete hypercubes. Parallel paths between any given two nodes mean that these paths have the same source and destination nodes but with different intermediate nodes. Parallel communication is important as it will allow us to use the full bandwidth of the multiprocessors for the data transfer operation between any two nodes, and3these redundant paths can increase system fault-tolerance and communication reliability. With these parallel routing algorithms, one can use them as a criterion to design multiprocessor systems.  相似文献   

4.
The embedding of one interconnection network into another is a very important issue in the design and analysis of parallel algorithms. Through such embeddings, the algorithms originally developed for one architecture can be directly mapped to another architecture. This paper describes a new embedding method, based on matrix transformations, for optimally embedding hierarchical hypercube networks (HHNs) into the hypercube (binary n-cube). Thus, this embedding method has practical importance in enhancing the capabilities and extending the usefulness of the hypercube, since hierarchical hypercube networks have proven to be very cost-effective for a wide range of applications  相似文献   

5.
In this paper, we study the properties of the bus-based hypercube, denoted as U(n,b), which is a kind of multiple-bus networks (MBN). U(n,b) consists of 2/sup n/ processors and 2/sup b/ buses, where 0 /spl les/ b /spl les/ n - 1, and each processor is connected to either /spl lceil/(b+2)/2/spl rceil/ or /spl lceil/(b+1)/2/spl rceil/ buses. We show that the diameter of U(n,b) is /spl lceil/(b-1)/2/spl rceil/ if b /spl ges/ 2. We also present an algorithm to select the best neighbor processor via which we can obtain one shortest routing path. In U(n,b), we show that if there exist some faults, the fault diameter DF(n,b,f) /spl les/ b+1, where f is the sum of bus faults and processor faults and 0 /spl les/ f /spl les/ /spl lceil/(b+3)/2/spl rceil/. Furthermore, we also show that the bus fault diameter DB(n,b,f) /spl les/ b/-2/spl rfloor/ - 3, where 0 /spl les/ f /spl les/ /spl lceil/(b-1)/2/spl rceil/ and f is the number of bus faults. These results improve significantly the previous result that DB(n,b,f) /spl les/ b - 2f + 1, where f is the number of bus faults.  相似文献   

6.
Several commercial hypercube parallel processors with the potential to deliver massive parallelism cost-effectively have been announced recently. They open the door to a wide variety of application areas that could benefit from parallelism. Computer vision is one of these application areas. This paper develops a general model for hypercube machines, and uses it to show how vision algorithms can be executed on hypercubes. In particular, the steps in the problem of thick-film inspection are used as a concrete example. The time needed to complete a typical inspection is used to demonstrate the performance of hypercube machines. Experimental results from a hypercube machine illustrate the potential use of such machines.  相似文献   

7.
Many parallel algorithms use hypercubes as the communication topology among their processes. When such algorithms are executed on hypercube multicomputers the communication cost is kept minimum since processes can be allocated to processors in such a way that only communication between neighbor processors is required. However, the scalability of hypercube multicomputers is constrained by the fact that the interconnection cost-per-node increases with the total number of nodes. From scalability point of view, meshes and toruses are more interesting classes of interconnection topologies. This paper focuses on the execution of algorithms with hypercube communication topology on multicomputers with mesh or torus interconnection topologies. The proposed approach is based on looking at different embeddings of hypercube graphs onto mesh or torus graphs. The paper concentrates on toruses since an already known embedding, which is called standard embedding, is optimal for meshes. In this paper, an embedding of hypercubes onto toruses of any given dimension is proposed. This novel embedding is called xor embedding. The paper presents a set of performance figures for both the standard and the xor embeddings and shows that the latter outperforms the former for any torus. In addition, it is proven that for a one-dimensional torus (a ring) the xor embedding is optimal in the sense that it minimizes the execution time of a class of parallel algorithms with hypercube topology. This class of algorithms is frequently found in real applications, such as FFT and some class of sorting algorithms  相似文献   

8.
Consideration is given to the problem of mapping systolic array algorithms into efficient algorithms for a fixed-size hypercube architecture. The authors describe in detail several optimal implementations of algorithms given for one-way one- and two-dimensional systolic arrays. Since interprocessor communication is many times slower than local computation in parallel computers built to date, the problem of efficient communication is specifically addressed for these mappings. In order to validate the technique experimentally, five systolic algorithms were mapped in various ways onto a 64-node NCUBE/7 MIMD hypercube machine. The algorithms are for the following problems: the shuffle scheduling problem, finite impulse response filtering, linear context-free language recognition, matrix multiplication, and computing the Boolean transitive closure. Experimental evidence indicates that good performance is obtained for the mappings  相似文献   

9.
In this paper we describe anO(logN)-bit-step randomized algorithm for bit-serial message routing on a hypercube. The result is asymptotically optimal, and improves upon the best previously known algorithms by a logarithmic factor. The result also solves the problem of on-line circuit switching in anO(1)-dilated hypercube (i.e., the problem of establishing edge-disjoint paths between the nodes of the dilated hypercube for any one-to-one mapping).Our algorithm is adaptive and we show that this is necessary to achieve the logarithmic speedup. We generalize the Borodin-Hopcroft lower bound on oblivious routing by proving that any randomized oblivious algorithm on a polylogarithmic degree network requires at least (log2 N/log logN) bit steps with high probability for almost all permutations.This research was supported by the Defense Advanced Research Projects Agency under Contracts N00014-87-K-825 and N00014-89-J-1988, the Air Force under Contract AFOSR-89-0271, and the Army under Contract DAAL-03-86-K-0171. This work was completed while the third and fourth authors were at the Laboratory for Computer Science, Massachusetts Institute of Technology.  相似文献   

10.
Summary Let a distributed system be represented by a graphG=(V, E), whereV is the set of nodes andE is the set of communication links. A coterie is defined as a family,C, of subsets ofV such that any pair of subsets inC has at least one node in common and no subset inC contains any other subset inC. Assuming that each nodev i V (resp. linke j E) is operational with probabilityp i (resp.r j ), the availability of a coterie is defined as the probability that the operational nodes and links ofG connect all nodes in at least one subset in the coterie. Although it is computationally intractable to find an optimal coterie that maximizes availability for general graphG, we show in this paper that, ifG is a ring, either a singleton coterie or a 3-majority coterie is optimal. Therefore, for any ring, an optimal coterie can be computed in polynomial time. This result is extended to the more general graphs, in which each biconnected component is either an edge or a ring. Toshihide Ibaraki received the B.E., M.E., and Dr. E. degrees from Kyoto University, in 1963, 1965 and 1970, respectively. Since 1969, he has been with the Department of Applied Mathematics and Physics, Kyoto University, except for two and a half years from 1983 to 1985, during which time he was with Department of Computer and Information Sciences, Toyohashi University of Technology. Currently, he is a professor of Kyoto University. He has held a number of visiting appointments with University of Illinois, University of Waterloo, Simon Fraser University, Rutgers University, and others. He is the author of Enumerative Approaches to Combinatorial Optimization, Baltzer AG., a coauthor of Resource Allocation Problems: Algorithmic Approaches, MIT Press, and the author of several books in Japanese, including Algorithms and Data Structures, Shoukohdou. His interest includes algorithms, optimization, computational complexity and their applications. Hiroshi Nagamochi was born in Tokyo, on January 1, 1960. He received the B.A., M.E. and D.E. degrees from Kyoto University, in 1983, in 1985 and in 1988, respectively. From 1988 to 1990, he was with Department of Computer and Information Sciences, Toyohashi University of Technology. Currently, he is an Associate Professor in the Department of Applied Mathematics and Physics at Kyoto University. His research interests include network flow problems and graph connectivity problems. Dr. Nagamochi is a member of the Operations Research Society of Japan and the Information Processing Society. Tsunehiko Kameda received the B.E. and M.E. degrees from the University of Tokyo in 1961 and 1963, respectively, and the Ph.D. degree from Princeton University in 1968. From 1967 to 1980, he was with the Department of Electrical Engineering, University of Waterloo. Since 1981, he has been with Simon Fraser University, Burnaby, British Columbia, Canada, where he is a Professor of Computing Science and is the Director of Computer and Communications Research Laboratory. He has held a number of visiting appointments with Universities of Erlangen-Nürnberg, Bonn, Frankfurt, and Braunschweig, and Gesellschaft für Mathematik und Datenverarbeitung, all in Germany, and also with Kyoto University, Japan. Dr. Kameda is a member of the ACM. He has co-authored three books: Einführung in die Codierungstheorie, Bibliographisches Institut, Distributed Algorithms, Kindai Kagaku-Sha, and A Probabilistic Analysis of Test Response Compaction, IEEE Press. His current research interests include database systems, combinatorial algorithms, distributed computing, ATM networks, and random testing of VLSI circuits.This work was supported in part by Grant-in-Aid from the Ministry of Education, Science and Culture of Japan, and in part by grants from the Natural Sciences and Engineering Research Council of Canada, and the Advanced Systems Institute of British Columbia.  相似文献   

11.
针对超立方体结构的多处理机系统中存在故障的情况,提出了一个应用于超立方体网络的容错路由算法。该容错路由算法是基于局部信息的,只需要知道邻节点的状态,而无需知道整个网络的运行情况。对于给定的源节点和目的节点,路由算法均能够找到一条最优通路,并且可以预防死锁。模拟实验结果表明,路由算法所构造的路径长度接近于两个节点之间的最优路径长度。  相似文献   

12.
It is well known that parallelism by itself does not lead to higher speeds. This study shows how to put parallelism to best use, that is, how to find an optimal balance between communication and computation overheads for two parallel matrix algorithms. The problem graph for matrix algorithms analyzed in this paper is a two-dimensional grid (toroidal mesh) which is mapped onto a hypercube topology. To perform matrix operations on a hypercube, a matrix is partitioned into several submatrices which are stored and manipulated in the nodes. We seek to find an optimal matrix partitioning to minimize overall execution time. The NCUBE parallel machine is used for experimental performance evaluation. For matrix multiplication, we derive an exact analytical model to determine the optimal partitioning size and perform its experimental verification on the NCUBE parallel processor. For a parallel Gaussian elimination known as the balanced algorithm, we present performance measurements and an approximate analytical model for performance evaluation. Our analyses show that the optimal submatrix size is typically small and does not depend on the original matrix size.  相似文献   

13.
This paper introduces optimal batch scheduling algorithms for the problem of scheduling batches of bursts in optical burst switching networks. The problem is modelled as a job scheduling problem with identical machines. The consideration of previously scheduled bursts in the scheduling allows such modeling. Two optimal algorithms with polynomial time complexity are derived and evaluated. Results show that the algorithm that allows re-scheduling of previously scheduled bursts leads to preferred solutions.Moreover, an extended version of the JET reservation protocol is proposed for efficient handling of batches of bursts. Results obtained via simulation prove the superior performance of the BATCHOPT algorithm.  相似文献   

14.
《Parallel Computing》1987,4(1):17-31
We discuss algorithms for matrix multiplication on a concurrent processor containing a two-dimensional mesh or richer topology. We present detailed performance measurements on hypercubes with 4, 16, and 64 nodes, and analyze them in terms of communication overhead and load balancing. We show that the decomposition into square subblocks is optimal C code implementing the algorithms is available.  相似文献   

15.
A hypercube algorithm to solve the list ranking problem is presented. Let n be the length of the list, and let p be the number of processors of the hypercube. The algorithm described runs in time O(n/p) when n=Ω(p 1+ε) for any constant ε>0, and in time O(n log n/p+log3 p) otherwise. This clearly attains a linear speedup when n=Ω(p 1+ε). Efficient balancing and routing schemes had to be used to achieve the linear speedup. The authors use these techniques to obtain efficient hypercube algorithms for many basic graph problems such as tree expression evaluation, connected and biconnected components, ear decomposition, and st-numbering. These problems are also addressed in the restricted model of one-port communication  相似文献   

16.
Two parallel algorithms for finding minimum spanning forest (MSF) of a weighted undirected graph on hypercube computers, consisting of a fixed number of processors, are presented. One algorithm is suited for sparse graphs, the other for dense graphs. Our design strategy is based on successive elimination of non-MSF edges. The input graph is partitioned equally among different processors, which then repeatedly eliminate non-MSF edges and merge results to gradually construct the desired MSF of the entire graph. Low communication overhead is achieved by restricting the message-flow to between the neighboring processors in the hypercube topology. The correctness of our approach is due to a theorem which states that with total-ordered edges, if an edge of an arbitrary subgraph does not belong to its MSF, then it does not belong to the MSF of the entire graph. For a graph of n vertices and m edges, our first algorithm finds an MSF in O(m log m)/p) time using p processors for p ≤ (mlog m)/n(1+log(m/n)). The second algorithm, efficient for dense graphs, requires O(n2/p) time for pn/log n.  相似文献   

17.
Partitioned Optimal Passive Stars network, POPS(d,g), is an optical interconnection network of N processors (N=dg) with g 2 optical passive star couplers. In this network, there are g groups of d processors each and the g 2 couplers are used for connecting each group with each of the groups, including itself. In this paper, we present a technique for optimally simulating a frequently arising hypercube communication pattern on this network for all combinations of values of d and g. Specifically, we show that one-hop movements on the hypercube along the same dimension can be simulated on the POPS(d,g) network in slots for dg and in 2 slots for d=g.  相似文献   

18.
We present an adaptive fault-tolerant wormhole routing algorithm for hypercubes by using 3 virtual networks. The routing algorithm can tolerate at least n−1 faulty nodes and can route a message via a path of length no more than the shortest path plus four. Previous algorithms which achieve the same fault tolerant ability need 5 virtual networks. Simulation results are also given in this paper.  相似文献   

19.
Finding cycles in hierarchical hypercube networks   总被引:1,自引:0,他引:1  
The hierarchical hypercube network, which was proposed as an alternative to the hypercube, is suitable for building a large-scale multiprocessor system. A bipartite graph G=(V,E) is bipancyclic if it contains cycles of all even lengths ranging from 4 to |V|. In this paper, we show that the hierarchical hypercube network is bipancyclic.  相似文献   

20.
Hypercube networks offer a feasible cost-effective solution to parallel computing. Here, a large number of low-cost processors with their own local memories are connected to form an n-cube (Bn) or one of its variants; and the inter-processor communication takes place by message passing instead of shared variables. This paper addresses a constrained two-terminal reliability measure referred to as distance reliability (DR) as it considers the probability that a message can be delivered in optimal time from a given node s to a node t. The problem is equivalent to that of having an operational optimal path (not just any path) between the two nodes. In Bn, the Hamming distance between labels of s and t or H(s, t) determines the length of the optimal path between the two nodes. The shortest distance restriction guarantees optimal communication delay between processors and high link/node utilization across the network. Moreover, it provides a measure for the robustness of symmetric networks. In particular, when H(s, t) = n in Bn, DR will yield the probability of degradation in the diameter, a concept which directly relates to fault-diameter. The paper proposes two schemes to evaluate DR in Bn. The first scheme uses a combinatorial approach by limiting the number of faulty components to (2H(s, t) − 2), while the second outlines paths of length H(s, t) and, then generates a recursive closed-form solution to compute DR. The theoretical results have been verified by simulation. The discrepancy between the theoretical and simulation results is in most cases below 1% and in the worst case 4.6%.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号