首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
A reconfigurable network termed as the reconfigurable multi-ring network (RMRN) is described. The RMRN is shown to be a truly scalable network in that each node in the network has a fixed degree of connectivity and the reconfiguration mechanism ensures a network diameter of O(log2 N) for anN-processor network. Algorithms for the two-dimensional mesh and the SIMD or SPMD n-cube are shown to map very elegantly onto the RMRN. Basic message passing and reconfiguration primitives for the SIMD/SPMD RMRN are designed for use as building blocks for more complex parallel algorithms. The RMRN is shown to be a viable architecture for image processing and computer vision problems using the parallel computation of the stereocorrelation imaging operation as an example. Stereocorrelation is one of the most computationally intensive imaging tasks. It is used as a visualization tool in many applications, including remote sensing, geographic information systems and robot vision.An earlier version of this paper was presented at the 1995 International Conference on Parallel and Distributed Processing Techniques and Applications.  相似文献   

2.
Thedistance transform(DT) is an image computation tool which can be used to extract the information about the shape and the position of the foreground pixels relative to each other. It converts a binary image into a grey-level image, where each pixel has a value corresponding to the distance to the nearest foreground pixel. The time complexity for computing the distance transform is fully dependent on the different distance metrics. Especially, the more exact the distance transform is, the worse execution time reached will be. Nowadays, quite often thousands of images are processed in a limited time. It seems quite impossible for a sequential computer to do such a computation for the distance transform in real time. In order to provide efficient distance transform computation, it is considerably desirable to develop a parallel algorithm for this operation. In this paper, based on the diagonal propagation approach, we first provide anO(N2) time sequential algorithm to compute thechessboard distance transform(CDT) of anN×Nimage, which is a DT using the chessboard distance metrics. Based on the proposed sequential algorithm, the CDT of a 2D binary image array of sizeN×Ncan be computed inO(logN) time on the EREW PRAM model usingO(N2/logN) processors,O(log logN) time on the CRCW PRAM model usingO(N2/log logN) processors, andO(logN) time on the hypercube computer usingO(N2/logN) processors. Following the mapping as proposed by Lee and Horng, the algorithm for the medial axis transform is also efficiently derived. The medial axis transform of a 2D binary image array of sizeN×Ncan be computed inO(logN) time on the EREW PRAM model usingO(N2/logN) processors,O(log logN) time on the CRCW PRAM model usingO(N2/log logN) processors, andO(logN) time on the hypercube computer usingO(N2/logN) processors. The proposed parallel algorithms are composed of a set of prefix operations. In each prefix operation phase, only increase (add-one) operation and minimum operation are employed. So, the algorithms are especially efficient in practical applications.  相似文献   

3.
Y.-J. Chen  S.-J. Horng 《Computing》1997,59(2):95-114
To represent a region of a digital image as the union of maximal upright squares contained in the region is called the medial axis transform. In this paper, we present anO(logn) time parallel algorithm for the medial axis transform of ann×n binary image on an SIMD mesh-connected computers with hyperbus broadcasting usingn 3 processors.  相似文献   

4.
Discrete Fourier transform (DFT) is an important tool in digital signal processing. In the present paper, we propose a novel approach to performing DFT. We transform DFT into a form expressed in discrete moments via a modular mapping and truncating Taylor series expansion. From this, we extend the use of our systolic array for fast computation of moments without any multiplications to one that computes DFT with only a few multiplications and without any evaluations of exponential functions. The multiplication number used in our method isO(Nlog2 N/ log2log2 N) superior toO(Nlog2 N) in FFT. The execution time of the systolic array is onlyO(Nlog2 N/ log2log2 N) for 1-D DFT andO(Nk) fork-D DFT (k⩾2). The systolic implementation is a demonstration of the locality of dataflow in the algorithms and hence it implies an easy and potential hardware/VLSI realization. The approach is also applicable to DFT inverses.  相似文献   

5.
Maximal outerplanar graphs constitute an important class of graphs, often encountered in various applications, e.g., computational geometry, robotics, etc. In this paper, we propose a parallel algorithm for testing the isomorphism of maximal outerplanar graphs. Given the ordered adjacency lists of the two graphs, the proposed algorithm tests their isomorphism inO(log N) time usingNprocessors, for graphs withNnodes on an EREW shared memory model, as well as on a hypercube arhitecture. When the adjacency matrices of the graphs are given, this algorithm can be redesigned onN2processors to run inO(log N) time.  相似文献   

6.
Efficient algorithms to compute the Hough transform on MIMD and SIMD hypercube multicomputer are developed. Our algorithms can compute p angles of the Hough transform of an N × N image, p N, in 0(p + log N) time on both MIMD and SIMD hypercubes. These algorithms require 0(N 2) processors. We also consider the computation of the Hough transform on MIMD hypercubes with a fixed number of processors. Experimental results on an NCUBE/7 hypercube are presented.This research was supported by the National Science Foundation under grants DCR84-20935 and 86-17374. All correspondence should be mailed to Sanjay Ranka.  相似文献   

7.
Xin He 《Algorithmica》1990,5(1):545-559
We present an efficient algorithm for 4-coloring perfect planar graphs. The best previously known algorithm for this problem takesO(n 3/2) sequential time, orO(log4 n) parallel time withO(n3) processors. The sequential implementation of our algorithm takesO(n logn) time. The parallel implementation of our algorithm takesO(log3 n) time withO(n) processors on a PRAM.  相似文献   

8.
This paper presents parallel algorithms for determining the number of partitions of a given integerN, where the partitions may be subject to restrictions, such as being composed of distinct parts, of a given number of parts, and/or of parts belonging to a specified set. We present a series of adaptive algorithms suitable for varying numbers of processors. The fastest of these algorithms computes the number of partitions ofnwith largest part equal tok, for 1 ≤knN, in timeO(log2(N)) usingO(N2/logN) processors. Parallel logarithmic time algorithms that generate partitions uniformly at random, using these quantities, are also presented.  相似文献   

9.
He  Xin 《Algorithmica》1990,5(1-4):545-559

We present an efficient algorithm for 4-coloring perfect planar graphs. The best previously known algorithm for this problem takesO(n 3/2) sequential time, orO(log4 n) parallel time withO(n3) processors. The sequential implementation of our algorithm takesO(n logn) time. The parallel implementation of our algorithm takesO(log3 n) time withO(n) processors on a PRAM.

  相似文献   

10.
This paper describes a system-level diagnosis algorithm for hypercube multicomputer systems. The algorithm is based on the PMC model and can isolate all faulty processors to within a set that contains at most one fault-free processor. If we denote by N the total number of processors in a hypercube system to be diagnosed, then, based on the judiciously designed data structures, the algorithm can run in O(Nlog2N) time; whereas the best-known diagnosis algorithm, the YML algorithm, runs in O(N2.5) time. Consequently, the new algorithm is remarkably superior to the YML algorithm in terms of the time cost.  相似文献   

11.
Previous research on developing parallel triangulation algorithms concentrated on triangulating planar point sets.O(log3 n) running time algorithms usingO(n) processors have been developed in Refs. 1 and 2. Atallah and Goodrich(3) presented a data structure that can be viewed as a parallel analogue of the sequential plane-sweeping paradigm, which can be used to triangulate a planar point set inO(logn loglogn) time usingO(n) processors. Recently Merks(4) described an algorithm for triangulating point sets which runs inO(logn) time usingO(n) processors, and is thus optimal. In this paper we develop a parallel algorithm for triangulating simplicial point sets in arbitrary dimensions based on the idea of the sequential algorithm presented in Ref. 5. The algorithm runs inO(log2 n) time usingO(n/logn) processors. The algorithm hasO(n logn) as the product of the running time and the number of processors; i.e., an optimal speed-up.  相似文献   

12.
The derivation of the prefixes of a given sequence (prefix computation) and the fast reduction of a tree to a single node (tree contraction) are two useful primitives for many applications on parallel computers. It is well known that certain special cases of the two problems can be solved efficiently on the hypercube. Here we extend this result to a large family of parallel computers. The family of parallel computers are based on a novel interconnection scheme called thegeneralized Fibonacci cube that encompasses both the hypercube and the second-order Fibonacci cube in [8]. Specifically, we show that thek-th order Fibonacci tree of sizeN can be reduced to a single node inO(logN) steps on ak-th order Fibonacci cube withN nodes (processors). Assuming thatO(logN) data items are on each of theN processors, we also show that the prefixes can be computed inO(logN) steps on thek-th order Fibonacci cube.  相似文献   

13.
LetG(V,E) be a simple undirected graph with a maximum vertex degree Δ(G) (or Δ for short), |V| =nand |E| =m. An edge-coloring ofGis an assignment to each edge inGa color such that all edges sharing a common vertex have different colors. The minimum number of colors needed is denoted by χ′(G) (called thechromatic index). For a simple graphG, it is known that Δ ≤ χ′(G) ≤ Δ + 1. This paper studies two edge-coloring problems. The first problem is to perform edge-coloring for an existing edge-colored graphGwith Δ + 1 colors stemming from the addition of a new vertex intoG. The proposed parallel algorithm for this problem runs inO3/2log3Δ + Δ logn) time usingO(max{nΔ, Δ3}) processors. The second problem is to color the edges of a given uncolored graphGwith Δ + 1 colors. For this problem, our first parallel algorithm requiresO5.5log3Δ logn+ Δ5log4n) time andO(max{n2Δ,nΔ3}) processors, which is a slight improvement on the algorithm by H. J. Karloff and D. B. Shmoys [J. Algorithms8 (1987), 39–52]. Their algorithm costsO6log4n) time andO(n2Δ) processors if we use the fastest known algorithm for finding maximal independent sets by M. Goldberg and T. Spencer [SIAM J. Discrete Math.2 (1989), 322–328]. Our second algorithm requiresO4.5log3Δ logn+ Δ4log4n) time andO(max{n2,nΔ3}) processors. Finally, we present our third algorithm by incorporating the second algorithm as a subroutine. This algorithm requiresO3.5log3Δ logn+ Δ3log4n) time andO(max{n2log Δ,nΔ3}) processors, which improves, by anO2.5) factor in time, on Karloff and Shmoys' algorithm. All of these algorithms run in the COMMON CRCW PRAM model.  相似文献   

14.
In this paper a general technique for reducing processors in simulation without any increase in time is described. This results in an O(√logn) time algorithm for simulating one step of PRIORITY on TOLERANT with processor-time product of O(n log logn); the same as that for simulating PRIORITY on ARBITRARY. This is used to obtain anO(logn/log logn + √logn (log logm ? log logn)) time algorithm for sortingn integers from the set {0,...,m ? 1},mn, with a processor-time product ofO(n log logm log logn) on a TOLERANT CRCW PRAM. New upper and lower bounds for ordered chaining problem on an allocated COMMON CRCW model are also obtained. The algorithm for ordered chaining takesO(logn/log logn) time on an allocated PRAM of sizen. It is shown that this result is best possible (upto a constant multiplicative factor) by obtaining a lower bound of Ω(r logn/(logr + log logn)) for finding the first (leftmost one) live processor on an allocated-COMMON PRAM of sizen ofr-slow virtual processors (one processor simulatesr processors of allocated PRAM). As a result, for ordered chaining problem, “processor-time product” has to be at least Ω(n logn/log logn) for any poly-logarithmic time algorithm. Algorithm for ordered-chaining problem results in anO(logN/log logN) time algorithm for (stable) sorting ofn integers from the set {0,...,m ? 1} withn-processors on a COMMON CRCW PRAM; hereN = max(n, m). In particular if,m =n O(1), then sorting takes Θ(logn/log logn) time on both TOLERANT and COMMON CRCW PRAMs. Processor-time product for TOLERANT isO(n(log logn)2). Algorithm for COMMON usesn processors.  相似文献   

15.
This paper describes a new factorization of the inverse of the joint-space inertia matrix M. In this factorization, M ?1 is directly obtained as the product of a set of sparse matrices wherein, for a serial chain, only the inversion of a block-tridiagonal matrix is needed. In other words, this factorization reduces the inversion of a dense matrix to that of a block-tridiagonal one. As a result, this factorization leads to both an optimal serial and an optimal parallel algorithm, that is, a serial algorithm with a complexity of O(N) and a parallel algorithm with a time complexity of O(logN) on a computer with O(N) processors. The novel feature of this algorithm is that it first calculates the interbody forces. Once these forces are known, the accelerations are easily calculated. We discuss the extension of the algorithm to the task of calculating the forward dynamics of a kinematic tree consisting of a single main chain plus any number of short side branches. We also show that this new factorization of M ?1 leads to a new factorization of the operational-space inverse inertia, Λ ?1, in the form of a product involving sparse matrices. We show that this factorization can be exploited for optimal serial and parallel computation of Λ ?1, that is, a serial algorithm with a complexity of O(N) and a parallel algorithm with a time complexity of O(logN) on a computer with O(N) processors.  相似文献   

16.
The main results of this paper are efficient parallel algorithms, MSP and LOCATE, for computing minimal spanning trees and locating minimal paths in directed graphs, respectively. Algorithm MSP has time complexityO(log3 n) usingO(n 3/logn) processors, while LOCATE has time complexityO(logn) usingO(n 2) processors. Algorithm MSP is derived from sequential algorithms, when the unbounded parallelism model is used.  相似文献   

17.
This paper presents two massively parallel processing architectures suitable for solving a wide variety of algorithms of divide-and-conquer type for problems such as the discrete Fourier transform, production systems, design automation, and others. The first architecture, called the Chain-structured Butterfly ARchitecture (CBAR), consists of a two-dimensional array of N = L · (log2(L) + 1) processing elements (PE) organized as L levels of log2(L) + 1 stages, and which has the butterfly connection between PEs in consecutive stages with straight-through feedback between PEs in the last and first stages. This connection system has the desirable property of allowing thousands of PEs to be connected with O(N) connection cost, O(log2(N/log2N)) communication paths, and a small number (=4) of I/O ports per PE. However, this architecture is not fault tolerant. We, therefore, propose a second architecture, called the REconfigurable Chain-structured Butterfly ARchitecture (RECBAR), which is a modified version of the CBAR. The RECBAR possesses all the desirable features of the CBAR, with the number of I/O ports per PE increased to six, and uses O((log2N)/N) overhead in PEs and approximately 50% overhead in links to achieve single-level fault tolerance. Reliability improvements ofthe RECBAR over the CBAR are studied. This paper also presents a distributed diagnostic and structuring algorithm for the RECBAR that enables the architecture to detect faults and structure itself accordingly within 2 · log2(L) + 1 time steps, thus making it a truly fault tolerant architecture.  相似文献   

18.
We present a method called the Truncation method for computing Walsh-Hadamard transforms of one- and two-dimensional data. In one dimension, the method uses binary trees as a basis for representing the data and computing the transform. In two dimensions, the method uses quadtrees (pyramids), adaptive quad-trees, or binary trees as a basis. We analyze the storage and time complexity of this method in worst and general cases. The results show that the Truncation method degenerates to the Fast Walsh Transform (FWT) in the worst case, while the Truncation method is faster than the Fast Walsh Transform when there is coherence in the input data, as will typically be the case for image data. In one dimension, the performance of the Truncation method for N data samples is between O(N) and O(N log2N), and it is between O(N2) and O(N2 log2N) in two dimensions. Practical results on several images are presented to show that both the expected and actual overall times taken to compute Walsh transforms using the Truncation method are less than those required by a similar implementation of the FWT method.  相似文献   

19.
Xin He  Yaacov Yesha 《Algorithmica》1990,5(1-4):129-145
We develop efficient parallel algorithms for ther-dominating set and thep-center problems on trees. On a concurrent-read exclusive-write PRAM, our algorithm for ther-dominating set problem runs inO(logn log logn) time withn processors. The algorithm for thep-center problem runs inO(log2 n log logn) time withn processors.  相似文献   

20.
We develop parallel algorithms to compute the Hough transform on a reconfigurable mesh with buses (RMESH) multiprocessor. The p angle Hough transform of an N × N image can be computed in O(p log(N/p)) time by an N × N RMESH, in O((p/N) log N) time by an N × N2 RMESH with N copies of the image pretiled, in O((p/[formula]) log N) time by an N1.5 × N1.5 RMESH, and in O((p/N) log N) time by an N2 × N2 RMESH.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号