Similar documents: 20 results found.
1.
This paper gives hypercube algorithms for some simple problems involving geometric properties of sets of points. The properties considered emphasize aspects of convexity and domination. Efficient algorithms are given for both fine- and medium-grain hypercube computers, including a discussion of implementation, running times and results on an Intel iPSC hypercube, as well as theoretical results. For both serial and parallel computers, sorting plays an important role in geometric algorithms for determining simple properties, often being the dominant component of the running time. Since the time required to sort data on a hypercube computer is still not fully understood, the running times of some of our algorithms for unsorted data are not completely determined. For both the fine- and medium-grain models, we show that faster expected-case running time algorithms are possible for point sets generated randomly. Our algorithms are developed for sets of planar points, with several of them extending to sets of points in spaces of higher dimension. The research of E. Cohen, R. Miller, and E. M. Sarraf was partially supported by National Science Foundation Grant ASC-8705104. R. Miller was also partially supported by National Science Foundation Grants DCR-8608640 and IRI-8800514. Q. F. Stout's research was partially supported by National Science Foundation Grant DCR-85-07851, and an Incentives for Excellence Grant from the Digital Equipment Corporation.
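As a point of reference for the role sorting plays here, below is a minimal serial sketch (not the paper's hypercube algorithm) of one domination-style property mentioned in the abstract: finding the maximal, i.e. non-dominated, points of a planar set after a single sort. The function name and the assumption of distinct points are ours.

```python
def maximal_points(points):
    """Return the points not dominated by any other point.

    A point p is dominated if some other point q lies weakly above and to
    the right of it. Serial sketch only, assuming distinct points; the
    paper's contribution is computing such properties on hypercubes."""
    # Sort by x descending, breaking ties by y descending.
    pts = sorted(points, key=lambda p: (-p[0], -p[1]))
    maxima = []
    best_y = float("-inf")
    for x, y in pts:
        if y > best_y:            # nothing to the right is higher
            maxima.append((x, y))
            best_y = y
    return maxima

if __name__ == "__main__":
    print(maximal_points([(1, 4), (2, 2), (3, 3), (0, 5), (3, 1)]))
    # -> [(3, 3), (1, 4), (0, 5)]
```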

2.
3.
Two parallel algorithms for finding the minimum spanning forest (MSF) of a weighted undirected graph on hypercube computers, consisting of a fixed number of processors, are presented. One algorithm is suited for sparse graphs, the other for dense graphs. Our design strategy is based on successive elimination of non-MSF edges. The input graph is partitioned equally among different processors, which then repeatedly eliminate non-MSF edges and merge results to gradually construct the desired MSF of the entire graph. Low communication overhead is achieved by restricting the message flow to between neighboring processors in the hypercube topology. The correctness of our approach is due to a theorem which states that with total-ordered edges, if an edge of an arbitrary subgraph does not belong to its MSF, then it does not belong to the MSF of the entire graph. For a graph of n vertices and m edges, our first algorithm finds an MSF in O((m log m)/p) time using p processors for p ≤ (m log m)/(n(1 + log(m/n))). The second algorithm, efficient for dense graphs, requires O(n²/p) time for p ≤ n/log n.
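A hedged sketch of the elimination idea, simulated serially in Python: edges are partitioned among p "processors", each keeps only its local MSF edges (the rest are safe to discard under a total order on edges, per the theorem quoted above), and results are merged pairwise the way neighboring hypercube nodes would. Kruskal's algorithm with union-find stands in for the local step; all names are illustrative.

```python
class DisjointSet:
    """Union-find used by the local Kruskal step."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def local_msf(n, edges):
    """Kruskal on one processor's edge set: edges rejected here cannot be in
    the MSF of the whole graph (the elimination theorem, given totally
    ordered edges)."""
    ds = DisjointSet(n)
    return [e for e in sorted(edges) if ds.union(e[1], e[2])]


def parallel_msf(n, edges, p):
    """Serial sketch of the hypercube scheme: partition the edges among p
    'processors', compute each local MSF, then merge results pairwise,
    re-eliminating at every merge step."""
    parts = [local_msf(n, edges[i::p]) for i in range(p)]
    while len(parts) > 1:                                   # log2(p) merge rounds
        parts = [local_msf(n, parts[i] + (parts[i + 1] if i + 1 < len(parts) else []))
                 for i in range(0, len(parts), 2)]
    return parts[0]


if __name__ == "__main__":
    # Edges as (weight, u, v); tuple comparison gives the required total order.
    edges = [(4, 0, 1), (1, 1, 2), (3, 0, 2), (2, 2, 3), (5, 1, 3)]
    print(sorted(parallel_msf(4, edges, p=4)))   # -> [(1, 1, 2), (2, 2, 3), (3, 0, 2)]
```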

4.
A hypercube algorithm to solve the list ranking problem is presented. Let n be the length of the list, and let p be the number of processors of the hypercube. The algorithm described runs in time O(n/p) when n = Ω(p^(1+ε)) for any constant ε > 0, and in time O((n log n)/p + log³ p) otherwise. This clearly attains a linear speedup when n = Ω(p^(1+ε)). Efficient balancing and routing schemes had to be used to achieve the linear speedup. The authors use these techniques to obtain efficient hypercube algorithms for many basic graph problems such as tree expression evaluation, connected and biconnected components, ear decomposition, and st-numbering. These problems are also addressed in the restricted model of one-port communication.
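For readers unfamiliar with list ranking, the textbook pointer-jumping scheme is sketched below, simulated serially; it only illustrates the problem, not the paper's load-balanced hypercube algorithm.

```python
def list_rank(succ):
    """Pointer-jumping list ranking, simulated serially.

    succ[i] is the successor of node i in the linked list; the last node
    points to itself. Returns rank[i] = number of links from i to the end.
    This is the classic O(log n)-round scheme; the paper adds the balancing
    and routing machinery needed to run it efficiently on a hypercube."""
    n = len(succ)
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    nxt = list(succ)
    for _ in range(max(1, n).bit_length()):                  # O(log n) jumping rounds
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]    # all nodes "in parallel"
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank


if __name__ == "__main__":
    # List 2 -> 0 -> 3 -> 1, with node 1 as the tail.
    print(list_rank([3, 1, 0, 1]))   # -> [2, 0, 3, 1]
```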

5.
6.
It is well known that parallelism by itself does not lead to higher speeds. This study shows how to put parallelism to best use, that is, how to find an optimal balance between communication and computation overheads for two parallel matrix algorithms. The problem graph for matrix algorithms analyzed in this paper is a two-dimensional grid (toroidal mesh) which is mapped onto a hypercube topology. To perform matrix operations on a hypercube, a matrix is partitioned into several submatrices which are stored and manipulated in the nodes. We seek to find an optimal matrix partitioning to minimize overall execution time. The NCUBE parallel machine is used for experimental performance evaluation. For matrix multiplication, we derive an exact analytical model to determine the optimal partitioning size and perform its experimental verification on the NCUBE parallel processor. For a parallel Gaussian elimination known as the balanced algorithm, we present performance measurements and an approximate analytical model for performance evaluation. Our analyses show that the optimal submatrix size is typically small and does not depend on the original matrix size.
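A serial sketch of the submatrix partitioning being tuned: the matrices are cut into a q × q grid of blocks, and each "node" (i, j) accumulates the block products that a toroidal mapping would deliver to it. The block count q is the parameter whose optimum the paper models; numpy and the divisibility assumption are ours.

```python
import numpy as np

def block_matmul(A, B, q):
    """Multiply A and B via a q x q grid of submatrices, as one block pair
    per hypercube node. Serial simulation: node (i, j) accumulates
    sum over k of A[i,k] @ B[k,j]. The matrix size must be divisible by q
    in this sketch."""
    n = A.shape[0]
    assert n % q == 0 and A.shape == B.shape == (n, n)
    s = n // q
    C = np.zeros((n, n))
    for i in range(q):
        for j in range(q):
            for k in range(q):        # the per-node compute/communication steps
                C[i*s:(i+1)*s, j*s:(j+1)*s] += (
                    A[i*s:(i+1)*s, k*s:(k+1)*s] @ B[k*s:(k+1)*s, j*s:(j+1)*s])
    return C

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, B = rng.random((8, 8)), rng.random((8, 8))
    assert np.allclose(block_matmul(A, B, 4), A @ B)
```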

7.
Data distribution has been one of the most important research topics in parallelizing compilers for distributed memory parallel computers. A good data distribution scheme should consider both the computation load balance and the communication overhead. In this paper, we show that data redistribution is necessary for executing a sequence of Do-loops if the communication cost due to performing this sequence of Do-loops is larger than a threshold value. Based on this observation, we can prune the search space and derive efficient dynamic programming algorithms for determining effective data distribution schemes for executing a sequence of Do-loops with a general structure. Experimental studies on a 32-node nCUBE-2 computer are also presented.
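An illustrative dynamic program for the decision described above: choose one distribution per Do-loop so that computation plus redistribution cost is minimized. The cost tables here are hypothetical inputs invented for the sketch; the paper derives such costs from the loop structure and adds a threshold-based pruning step not shown.

```python
def choose_distributions(comp_cost, redist_cost):
    """Pick one data distribution per Do-loop in a sequence, minimizing total
    computation + redistribution cost.

    comp_cost[l][d]   -- cost of executing loop l under distribution d
    redist_cost[a][b] -- cost of redistributing the data from a to b
    Both tables are hypothetical inputs for this sketch."""
    L, D = len(comp_cost), len(comp_cost[0])
    best = [comp_cost[0][:]] + [[float("inf")] * D for _ in range(L - 1)]
    choice = [[None] * D for _ in range(L)]
    for l in range(1, L):
        for d in range(D):
            for prev in range(D):
                c = best[l - 1][prev] + redist_cost[prev][d] + comp_cost[l][d]
                if c < best[l][d]:
                    best[l][d], choice[l][d] = c, prev
    # Recover the optimal distribution sequence by walking back.
    d = min(range(D), key=lambda k: best[L - 1][k])
    total, plan = best[L - 1][d], [d]
    for l in range(L - 1, 0, -1):
        d = choice[l][d]
        plan.append(d)
    return total, plan[::-1]


if __name__ == "__main__":
    comp = [[3, 9], [8, 2], [7, 3]]            # two candidate distributions
    redist = [[0, 4], [4, 0]]                  # redistribution cost between them
    print(choose_distributions(comp, redist))  # -> (12, [0, 1, 1])
```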

8.
The security of the RSA cryptosystem is based on the difficulty of factoring a large composite integer. In 1994, Shor showed that factoring a large composite is executable in polynomial time if we use a quantum Turing machine. Since this algorithm is complicated, straightforward implementations seem impractical judging from current technologies. In this paper, we propose simple and efficient algorithms for factoring and the discrete logarithm problem based on NMR quantum computers. Our algorithms are easier to implement if we consider NMR quantum computers with a small number of qubits. A part of this work was done while both authors were with NTT Communication Science Laboratories. Noboru Kunihiro, Ph.D.: He is an Assistant Professor at the University of Electro-Communications. He received his B.E., M.E. and Ph.D. in mathematical engineering and information physics from the University of Tokyo in 1994, 1996 and 2001, respectively. He was engaged in research on cryptography and information security at NTT Communication Science Laboratories from 1996 to 2002. Since 2002, he has been working for the Department of Information and Communication Engineering of the University of Electro-Communications. His research interests include cryptography, information security and quantum computation. He was awarded the SCIS'97 paper prize. Shigeru Yamashita, Ph.D.: Associate Professor of the Graduate School of Information Science, Nara Institute of Science and Technology, Nara 630-0192, Japan. He received his B.E., M.E. and Ph.D. degrees in information science from Kyoto University, Kyoto, Japan, in 1993, 1995 and 2001, respectively. His research interests include new types of computer architectures and quantum computation. He received the 2000 IEEE Circuits and Systems Society Transactions on Computer-Aided Design of Integrated Circuits and Systems Best Paper Award.
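For context, the classical skeleton of Shor's reduction is sketched below: the quantum (here, NMR) machine is only needed for the order-finding subroutine, which this sketch replaces with brute force. It therefore illustrates the reduction from order finding to factoring, not the proposed NMR algorithms themselves.

```python
import math
import random

def order(a, N):
    """Brute-force multiplicative order of a mod N; this is the step a
    quantum (or NMR) computer would perform efficiently."""
    r, x = 1, a % N
    while x != 1:
        x = (x * a) % N
        r += 1
    return r

def shor_factor(N, max_tries=50):
    """Classical skeleton of Shor's reduction for an odd composite N:
    pick a random a, find its order r, and if r is even and
    a**(r//2) != -1 (mod N), a gcd yields a nontrivial factor."""
    if N % 2 == 0:
        return 2
    for _ in range(max_tries):
        a = random.randrange(2, N)
        g = math.gcd(a, N)
        if g > 1:
            return g                     # lucky: a already shares a factor with N
        r = order(a, N)
        if r % 2:
            continue                     # odd order: try another a
        y = pow(a, r // 2, N)
        if y == N - 1:
            continue                     # trivial square root of 1: try again
        return math.gcd(y - 1, N)
    return None

if __name__ == "__main__":
    print(shor_factor(15))               # -> 3 or 5
```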

9.
Several commercial hypercube parallel processors with the potential to deliver massive parallelism cost-effectively have been announced recently. They open the door to a wide variety of application areas that could benefit from parallelism. Computer vision is one of these application areas. This paper develops a general model for hypercube machines, and uses it to show how vision algorithms can be executed on hypercubes. In particular, the steps in the problem of thick-film inspection are used as a concrete example. The time needed to complete a typical inspection is used to demonstrate the performance of hypercube machines. Experimental results from a hypercube machine illustrate the potential use of such machines.

10.
An efficient processor allocation policy is presented for hypercube computers. The allocation policy is called free list since it maintains a list of free subcubes available in the system. An incoming request of dimension k (2^k nodes) is allocated by finding a free subcube of dimension k or by decomposing an available subcube of dimension greater than k. This free list policy uses a top-down allocation rule in contrast to the bottom-up approach used by the previous bit-map allocation algorithms. This allocation scheme is compared to the buddy, gray code (GC), and modified buddy allocation policies reported for the hypercubes. It is shown that the free list policy is optimal in a static environment, as are the other policies, and it also gives better subcube recognition ability compared to the previous schemes in a dynamic environment. The performance of this policy, in terms of parameters such as average delay, system utilization, and time complexity, is compared to the other schemes to demonstrate its effectiveness. The extension of the algorithm for parallel implementation, noncubic allocation, and inclusion/exclusion allocation is also given.
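A simplified sketch of how such a top-down free-list allocation behaves, using address strings with '*' marking free dimensions; the real policy's data structures, superior subcube recognition, and coalescing on release are not modeled here.

```python
from collections import defaultdict


def make_free_list(n):
    """Start with the whole n-cube free. A subcube is a string over
    {'0', '1', '*'}; its dimension is the number of '*' characters."""
    free = defaultdict(list)
    free[n].append('*' * n)
    return free


def allocate(free, k, n):
    """Allocate a k-dimensional subcube (2**k nodes): reuse a free k-subcube
    if one exists, otherwise split the smallest larger free subcube top-down,
    returning the unused halves to the free list. Illustrative sketch only."""
    for d in range(k, n + 1):
        if free[d]:
            cube = free[d].pop()
            while d > k:
                i = cube.index('*')                         # fix one free dimension
                free[d - 1].append(cube[:i] + '1' + cube[i + 1:])
                cube = cube[:i] + '0' + cube[i + 1:]
                d -= 1
            return cube
    return None                                             # no large-enough subcube


def release(free, cube):
    """Return a subcube to the free list (buddy coalescing omitted)."""
    free[cube.count('*')].append(cube)


if __name__ == "__main__":
    free = make_free_list(4)
    print(allocate(free, 2, 4))    # -> '00**'
    print(allocate(free, 3, 4))    # -> '1***'
```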

11.
A unified analytical model for computing the task-based dependability (TBD) of hypercube architectures is presented. A hypercube is deemed operational as long as a task can be executed on the system. The technique can compute both reliability and availability for two types of task requirements: the I-connected model and the subcube model. The I-connected TBD assumes that a connected group of at least I working nodes is required for task execution. The subcube TBD needs at least an m-cube in an n-cube, m ≤ n, for task execution. The dependability is computed by multiplying the probability that x nodes (x ≥ I or x ≥ 2^m) are working in an n-cube at time t by the conditional probability that the hypercube can satisfy either of the two task requirements from x working nodes. Recursive models are proposed for the two types of task requirements to find the connection probability. The subcube requirement is extended to find multiple subcubes for analyzing multitask dependability. The analytical results are validated through extensive simulation.
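The outer probabilistic sum described above can be written down directly; the conditional term, which the paper computes recursively, is left as a caller-supplied function in this hedged sketch, and all names are illustrative.

```python
from math import comb

def task_dependability(n_dim, p_node, min_nodes, p_satisfy):
    """Task-based dependability of an n_dim-cube with N = 2**n_dim nodes.

    p_node       : probability that a single node is working at time t
    min_nodes    : minimum number of working nodes the task needs (I, or 2**m)
    p_satisfy(x) : conditional probability that x working nodes can satisfy
                   the task requirement (the paper computes this recursively;
                   here it is supplied by the caller)."""
    N = 2 ** n_dim
    total = 0.0
    for x in range(min_nodes, N + 1):
        p_x_working = comb(N, x) * p_node**x * (1 - p_node)**(N - x)
        total += p_x_working * p_satisfy(x)
    return total

if __name__ == "__main__":
    # Toy use: pretend any set of >= 4 working nodes in a 3-cube suffices.
    print(task_dependability(3, 0.95, 4, lambda x: 1.0))
```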

12.
13.
The adaptation of the Cooley-Tukey, the Pease, and the Stockham FFTs to vector computers is discussed. Each of these algorithms computes the same result, namely the discrete Fourier transform. They differ only in the way that intermediate computations are stored. Yet it is this difference that makes one or the other more appropriate depending on the application. This difference also influences the computational efficiency on a vector computer and motivates the development of methods to improve efficiency. Each of the FFTs is defined rigorously by a short expository FORTRAN program which provides the basis for discussions about vectorization. Several methods for lengthening vectors are discussed, including the case of multiple and multi-dimensional transforms where M sequences of length N can be transformed as a single sequence of length MN using a ‘truncated’ FFT. The implementation of an in-place FFT on a computer with memory-to-memory architecture is made possible by in-place matrix-vector multiplication.
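As a reminder of what all three variants compute, here is a minimal recursive radix-2 Cooley-Tukey DFT (in Python rather than the paper's FORTRAN); the Pease and Stockham organizations produce the same values but lay out intermediates differently, which is precisely what matters for vectorization.

```python
import cmath

def fft(a):
    """Recursive radix-2 Cooley-Tukey DFT of a sequence whose length is a
    power of two. Returns the same result as any of the variants discussed
    in the paper; only the storage pattern of intermediates differs there."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2])                 # transform of even-indexed samples
    odd = fft(a[1::2])                  # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

if __name__ == "__main__":
    print(fft([1, 0, 0, 0]))            # impulse -> flat spectrum
```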

14.
This paper introduces a new hierarchy of cube-based interconnection schemes, called the base-b cube (which properly contains the well-known binary cube), for the design of parallel computers. This hierarchy admits a recursive definition and allows many more reconfigurations than are possible with the binary cube. Our analysis addresses the inherent cost-delay trade-off for this hierarchy along with a number of related topological properties such as sparsity, diameter, existence of node disjoint paths, and odd and even cycles. Embeddings of standard interconnection schemes including linear and two-dimensional arrays, rings, and complete binary trees in a base-b cube are illustrated.
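Under one common generalization of the binary cube (nodes are length-r tuples of base-b digits, adjacent iff they differ in exactly one digit), neighbor enumeration looks as follows; the paper's recursively defined base-b cube may differ in detail, so this is only an illustrative model.

```python
def neighbors(node, b):
    """Neighbors of a node in a generalized b-ary cube, assuming nodes are
    tuples of base-b digits and adjacency means differing in exactly one
    digit. This assumption is ours; it need not match the paper's recursive
    definition exactly."""
    result = []
    for i, digit in enumerate(node):
        for d in range(b):
            if d != digit:
                result.append(node[:i] + (d,) + node[i + 1:])
    return result

if __name__ == "__main__":
    print(len(neighbors((0, 1, 2), 3)))   # r*(b-1) = 3*2 = 6 neighbors
```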

15.
Hypercube interconnection networks have been receiving considerable attention in the supercomputing environment. However, the number of processors must be exactly 2^r for a complete r-cube hypercube. This restriction severely limits its applicability. In this paper, we address three variant hypercube topologies with more flexibility in system sizes, the labelled incomplete hypercubes I_m^r, I_M^r, and I_A^r. The incomplete hypercube I_m^r consists of a complete r-cube together with a complete m-cube; I_M^r is composed of 2^r + Σ_{m∈M} 2^m nodes; I_A^r comes from a complete r-cube hypercube which operates in a degraded manner and allows the missing nodes to be arbitrarily distributed. Specifically, we focus on parallel-path routing algorithms for these three classes of incomplete hypercubes. Parallel paths between any two given nodes are paths that share the same source and destination nodes but have different intermediate nodes. Parallel communication is important as it allows us to use the full bandwidth of the multiprocessor for data transfer between any two nodes, and these redundant paths can increase system fault tolerance and communication reliability. These parallel routing algorithms can thus serve as a criterion for designing multiprocessor systems.
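For the complete r-cube, the classical construction of parallel (internally node-disjoint) paths simply corrects the differing bits in cyclically rotated orders; a sketch is below. The paper's contribution is extending such routing to the three incomplete-hypercube families, which this sketch does not handle.

```python
def disjoint_paths(src, dst, r):
    """Node-disjoint paths between two nodes of a complete r-cube (nodes are
    r-bit integers). Path number 'start' corrects the differing bit positions
    in the order given by that cyclic rotation; distinct rotations share only
    their endpoints. Classical construction for complete hypercubes only."""
    diff = [i for i in range(r) if (src ^ dst) >> i & 1]
    paths = []
    d = len(diff)
    for start in range(d):
        node, path = src, [src]
        for j in range(d):
            node ^= 1 << diff[(start + j) % d]   # flip the next differing bit
            path.append(node)
        paths.append(path)
    return paths

if __name__ == "__main__":
    for p in disjoint_paths(0b000, 0b111, 3):
        print([format(v, "03b") for v in p])
```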

16.
In this paper we describe an O(log N)-bit-step randomized algorithm for bit-serial message routing on a hypercube. The result is asymptotically optimal, and improves upon the best previously known algorithms by a logarithmic factor. The result also solves the problem of on-line circuit switching in an O(1)-dilated hypercube (i.e., the problem of establishing edge-disjoint paths between the nodes of the dilated hypercube for any one-to-one mapping). Our algorithm is adaptive and we show that this is necessary to achieve the logarithmic speedup. We generalize the Borodin-Hopcroft lower bound on oblivious routing by proving that any randomized oblivious algorithm on a polylogarithmic degree network requires at least Ω(log² N / log log N) bit steps with high probability for almost all permutations. This research was supported by the Defense Advanced Research Projects Agency under Contracts N00014-87-K-825 and N00014-89-J-1988, the Air Force under Contract AFOSR-89-0271, and the Army under Contract DAAL-03-86-K-0171. This work was completed while the third and fourth authors were at the Laboratory for Computer Science, Massachusetts Institute of Technology.
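As a contrast to the adaptive algorithm of the paper, the classical oblivious two-phase (Valiant-style) randomized scheme on a hypercube is easy to state: route to a random intermediate node by bit fixing, then on to the true destination. It is exactly the kind of oblivious strategy the lower bound above constrains, and it is shown here only for illustration.

```python
import random

def bit_fix_path(src, dst, r):
    """Greedy bit-fixing path in an r-cube: correct differing bits from
    lowest to highest."""
    path, node = [src], src
    for i in range(r):
        if (node ^ dst) >> i & 1:
            node ^= 1 << i
            path.append(node)
    return path

def two_phase_route(src, dst, r, rng=random):
    """Valiant-style two-phase randomized routing: send the packet to a
    random intermediate node by bit fixing, then to its destination.
    This oblivious scheme is not the paper's adaptive algorithm."""
    mid = rng.randrange(2 ** r)
    return bit_fix_path(src, mid, r) + bit_fix_path(mid, dst, r)[1:]

if __name__ == "__main__":
    print(two_phase_route(0b0000, 0b1011, 4))
```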

17.
18.
19.
Consider an n×n binary image. Given a direction D, the parallel visibility problem consists of determining for each pixel of the image the portion that is visible (i.e., not obstructed by any other black pixel of the image) in direction D from infinity. A related problem, referred to as point visibility, is to compute for each pixel the portion that is visible from a given point p. In this paper, we derive O(log n)-time SIMD algorithms for each of these two problems on the hypercube, where one processor is assigned to every pixel of the image. Since the worst-case communication distance between two processors in an n²-processor hypercube is 2 log n, it follows that both of the above algorithms are asymptotically optimal. This paper summarizes a preliminary version [Ref. 1] and a short note on a possible improvement [Ref. 2] presented at the 1988 IFIP WG 10.3 Working Conference on Parallel Processing and the 1988 Allerton Conference on Communication, Control and Computing, respectively. The research of the first and third authors is partially supported by the Natural Sciences and Engineering Research Council of Canada.
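A boolean, serial simplification of directional visibility (taking direction D as looking down the columns from above): a pixel is visible iff no black pixel precedes it in its column. The column scan is the kind of prefix-style operation a parallel algorithm with one processor per pixel would perform; it is not the paper's O(log n) hypercube procedure itself.

```python
def visible_from_top(image):
    """For each pixel, report whether it is visible from infinity looking
    straight down the columns: a pixel is visible iff no black pixel
    (value 1) lies above it in its column. Boolean simplification of the
    per-pixel 'visible portion' in the abstract."""
    n_rows = len(image)
    n_cols = len(image[0]) if n_rows else 0
    vis = [[False] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        blocked = False
        for r in range(n_rows):
            vis[r][c] = not blocked
            if image[r][c] == 1:
                blocked = True          # everything further down is hidden
    return vis

if __name__ == "__main__":
    img = [[0, 1, 0],
           [1, 0, 0],
           [0, 1, 1]]
    for row in visible_from_top(img):
        print(row)
```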

20.