
1.

Parallel stereocorrelation on a reconfigurable multi-ring network

Hamid R. Arabnia Suchendra M. Bhandarkar 《The Journal of supercomputing》1996,10(3):243-269

A reconfigurable network termed the reconfigurable multi-ring network (RMRN) is described. The RMRN is shown to be a truly scalable network in that each node in the network has a fixed degree of connectivity and the reconfiguration mechanism ensures a network diameter of O(log₂ N) for an N-processor network. Algorithms for the two-dimensional mesh and the SIMD or SPMD n-cube are shown to map very elegantly onto the RMRN. Basic message passing and reconfiguration primitives for the SIMD/SPMD RMRN are designed for use as building blocks for more complex parallel algorithms. The RMRN is shown to be a viable architecture for image processing and computer vision problems using the parallel computation of the stereocorrelation imaging operation as an example. Stereocorrelation is one of the most computationally intensive imaging tasks. It is used as a visualization tool in many applications, including remote sensing, geographic information systems and robot vision. An earlier version of this paper was presented at the 1995 International Conference on Parallel and Distributed Processing Techniques and Applications.

2.

Optimal Computing the Chessboard Distance Transform on Parallel Processing Systems

Yu-Hua Lee Shi-Jinn Horng 《Computer Vision and Image Understanding》1999,73(3):272

The distance transform (DT) is an image computation tool which can be used to extract information about the shape and the position of the foreground pixels relative to each other. It converts a binary image into a grey-level image, where each pixel has a value corresponding to the distance to the nearest foreground pixel. The time complexity of computing the distance transform depends on the distance metric used: the more exact the distance transform is, the longer the execution time will be. Nowadays, thousands of images often must be processed in a limited time, and a sequential computer can hardly compute the distance transform in real time. To provide efficient distance transform computation, it is therefore desirable to develop a parallel algorithm for this operation. In this paper, based on the diagonal propagation approach, we first provide an O(N²) time sequential algorithm to compute the chessboard distance transform (CDT) of an N × N image, which is a DT using the chessboard distance metric. Based on the proposed sequential algorithm, the CDT of a 2D binary image array of size N × N can be computed in O(log N) time on the EREW PRAM model using O(N²/log N) processors, O(log log N) time on the CRCW PRAM model using O(N²/log log N) processors, and O(log N) time on the hypercube computer using O(N²/log N) processors. Following the mapping proposed by Lee and Horng, the algorithm for the medial axis transform is also efficiently derived. The medial axis transform of a 2D binary image array of size N × N can be computed in O(log N) time on the EREW PRAM model using O(N²/log N) processors, O(log log N) time on the CRCW PRAM model using O(N²/log log N) processors, and O(log N) time on the hypercube computer using O(N²/log N) processors. The proposed parallel algorithms are composed of a set of prefix operations. In each prefix operation phase, only the increase (add-one) operation and the minimum operation are employed, so the algorithms are especially efficient in practical applications.
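As a rough illustration of what an O(N²) sequential chessboard distance transform can look like, the sketch below uses the classical two-pass raster scan. This is an assumption made for illustration only: it is not the diagonal propagation approach of the paper, and the function name `chessboard_distance_transform` is hypothetical. It follows the abstract's convention that each output value is the distance to the nearest foreground pixel, and it uses only add-one and minimum operations.

```python
def chessboard_distance_transform(image):
    """Two-pass chessboard (Chebyshev) distance transform.

    image: list of equal-length rows holding 0 (background) or 1 (foreground).
    Returns a grid with, for every pixel, the chessboard distance to the
    nearest foreground pixel. Illustrative sketch, not the paper's algorithm.
    """
    rows, cols = len(image), len(image[0])
    far = rows + cols  # larger than any chessboard distance inside the image
    dist = [[0 if image[r][c] else far for c in range(cols)] for r in range(rows)]

    # Forward pass: propagate from neighbours that were already scanned
    # (upper-left, upper, upper-right, left).
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, -1), (-1, 0), (-1, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    dist[r][c] = min(dist[r][c], dist[rr][cc] + 1)

    # Backward pass: propagate from the remaining four neighbours
    # (lower-right, lower, lower-left, right).
    for r in reversed(range(rows)):
        for c in reversed(range(cols)):
            for dr, dc in ((1, 1), (1, 0), (1, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    dist[r][c] = min(dist[r][c], dist[rr][cc] + 1)
    return dist
```

Each pixel is visited twice with a constant amount of work, which gives the O(N²) bound for an N × N image.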

3.

Medial axis transform on mesh-connected computers with hyperbus broadcasting

Y.-J. Chen S.-J. Horng 《Computing》1997,59(2):95-114

Representing a region of a digital image as the union of the maximal upright squares contained in the region is called the medial axis transform. In this paper, we present an O(log n) time parallel algorithm for the medial axis transform of an n × n binary image on an SIMD mesh-connected computer with hyperbus broadcasting using n³ processors.

4.

A Novel Approach to Fast Discrete Fourier Transform

《Journal of Parallel and Distributed Computing》1998,54(1):48-58

The discrete Fourier transform (DFT) is an important tool in digital signal processing. In the present paper, we propose a novel approach to performing the DFT. We transform the DFT into a form expressed in discrete moments via a modular mapping and a truncated Taylor series expansion. From this, we extend the use of our systolic array for fast computation of moments without any multiplications to one that computes the DFT with only a few multiplications and without any evaluations of exponential functions. The number of multiplications used in our method is O(N log₂ N / log₂ log₂ N), superior to the O(N log₂ N) of the FFT. The execution time of the systolic array is only O(N log₂ N / log₂ log₂ N) for the 1-D DFT and O(N^k) for the k-D DFT (k ≥ 2). The systolic implementation demonstrates the locality of dataflow in the algorithms and hence suggests an easy hardware/VLSI realization. The approach is also applicable to inverse DFTs.

5.

A Simple O(log N) Time Parallel Algorithm for Testing Isomorphism of Maximal Outerplanar Graphs

《Journal of Parallel and Distributed Computing》1999,56(2):144-155

Maximal outerplanar graphs constitute an important class of graphs, often encountered in various applications, e.g., computational geometry, robotics, etc. In this paper, we propose a parallel algorithm for testing the isomorphism of maximal outerplanar graphs. Given the ordered adjacency lists of the two graphs, the proposed algorithm tests their isomorphism in O(log N) time using N processors, for graphs with N nodes on an EREW shared memory model, as well as on a hypercube architecture. When the adjacency matrices of the graphs are given, this algorithm can be redesigned on N² processors to run in O(log N) time.

6.

Computing Hough transforms on hypercube multicomputers

Sanjay Ranka Sartaj Sahni 《The Journal of supercomputing》1990,4(2):169-190

Efficient algorithms to compute the Hough transform on MIMD and SIMD hypercube multicomputers are developed. Our algorithms can compute p angles of the Hough transform of an N × N image, p N, in O(p + log N) time on both MIMD and SIMD hypercubes. These algorithms require O(N²) processors. We also consider the computation of the Hough transform on MIMD hypercubes with a fixed number of processors. Experimental results on an NCUBE/7 hypercube are presented. This research was supported by the National Science Foundation under grants DCR84-20935 and 86-17374. All correspondence should be mailed to Sanjay Ranka.
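For readers unfamiliar with the operation being parallelized, the following is a minimal sequential sketch of a p-angle Hough transform of an N × N binary image. The details are assumptions made for illustration and are not taken from the abstract: the (ρ, θ) line parameterization ρ = x cos θ + y sin θ, angles evenly spaced over [0, π), rounding ρ to the nearest integer, and the hypothetical function name `hough_transform`.

```python
import math


def hough_transform(image, num_angles):
    """Sequential p-angle Hough transform of a square binary image.

    Returns (acc, max_rho), where acc[k][rho + max_rho] counts the foreground
    pixels voting for the line with angle index k and rounded offset rho.
    Illustrative sketch under the assumptions stated in the lead-in.
    """
    n = len(image)
    max_rho = int(math.ceil(math.hypot(n, n)))  # bound on |rho| inside the image
    acc = [[0] * (2 * max_rho + 1) for _ in range(num_angles)]
    for k in range(num_angles):
        theta = math.pi * k / num_angles
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        for y in range(n):
            for x in range(n):
                if image[y][x]:
                    rho = int(round(x * cos_t + y * sin_t))
                    acc[k][rho + max_rho] += 1
    return acc, max_rho
```

This sequential version performs on the order of p · N² operations, which is the work that the hypercube algorithms distribute across processors.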

7.

Efficient parallel and sequential algorithms for 4-coloring perfect planar graphs

Xin He 《Algorithmica》1990,5(1):545-559

We present an efficient algorithm for 4-coloring perfect planar graphs. The best previously known algorithm for this problem takes O(n^(3/2)) sequential time, or O(log⁴ n) parallel time with O(n³) processors. The sequential implementation of our algorithm takes O(n log n) time. The parallel implementation of our algorithm takes O(log³ n) time with O(n) processors on a PRAM.

8.

Parallel Algorithms for Counting and Randomly Generating Integer Partitions

《Journal of Parallel and Distributed Computing》1996,34(1):29-35

This paper presents parallel algorithms for determining the number of partitions of a given integer N, where the partitions may be subject to restrictions, such as being composed of distinct parts, of a given number of parts, and/or of parts belonging to a specified set. We present a series of adaptive algorithms suitable for varying numbers of processors. The fastest of these algorithms computes the number of partitions of n with largest part equal to k, for 1 ≤ k ≤ n ≤ N, in time O(log²(N)) using O(N²/log N) processors. Parallel logarithmic time algorithms that generate partitions uniformly at random, using these quantities, are also presented.
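The quantities in question, the number of partitions of n with largest part equal to k, can be tabulated sequentially with a standard recurrence: removing one copy of the largest part k from such a partition of n leaves a partition of n − k whose largest part is at most k, which gives p(n, k) = p(n − 1, k − 1) + p(n − k, k). The sketch below is an illustrative sequential table fill, not the parallel algorithm of the paper, and `partition_table` is a hypothetical name.

```python
def partition_table(limit):
    """Return p with p[n][k] = number of partitions of n whose largest part is k.

    Uses p(n, k) = p(n - 1, k - 1) + p(n - k, k) with p(0, 0) = 1.
    Sequential sketch for illustration only.
    """
    p = [[0] * (limit + 1) for _ in range(limit + 1)]
    p[0][0] = 1
    for n in range(1, limit + 1):
        for k in range(1, n + 1):
            # Entries with k > n - k were never filled and stay 0, as required.
            p[n][k] = p[n - 1][k - 1] + p[n - k][k]
    return p
```

Summing row n over k gives the total number of partitions of n; for example, the row for n = 5 sums to 7.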

9.

Efficient parallel and sequential algorithms for 4-coloring perfect planar graphs

He Xin 《Algorithmica》1990,5(1-4):545-559

We present an efficient algorithm for 4-coloring perfect planar graphs. The best previously known algorithm for this problem takes O(n^(3/2)) sequential time, or O(log⁴ n) parallel time with O(n³) processors. The sequential implementation of our algorithm takes O(n log n) time. The parallel implementation of our algorithm takes O(log³ n) time with O(n) processors on a PRAM.


10.

A fast pessimistic one-step diagnosis algorithm for hypercube multicomputer systems

《Journal of Parallel and Distributed Computing》2004,64(4):546-553

This paper describes a system-level diagnosis algorithm for hypercube multicomputer systems. The algorithm is based on the PMC model and can isolate all faulty processors to within a set that contains at most one fault-free processor. If we denote by N the total number of processors in a hypercube system to be diagnosed, then, based on the judiciously designed data structures, the algorithm can run in O(N log₂ N) time, whereas the best-known diagnosis algorithm, the YML algorithm, runs in O(N^2.5) time. Consequently, the new algorithm is remarkably superior to the YML algorithm in terms of the time cost.

11.

An optimal speed-up parallel algorithm for triangulating simplicial point sets in space

Hossam ElGindy 《International journal of parallel programming》1986,15(5):389-398

Previous research on developing parallel triangulation algorithms concentrated on triangulating planar point sets. O(log³ n) running time algorithms using O(n) processors have been developed in Refs. 1 and 2. Atallah and Goodrich⁽³⁾ presented a data structure that can be viewed as a parallel analogue of the sequential plane-sweeping paradigm, which can be used to triangulate a planar point set in O(log n log log n) time using O(n) processors. Recently Merks⁽⁴⁾ described an algorithm for triangulating point sets which runs in O(log n) time using O(n) processors, and is thus optimal. In this paper we develop a parallel algorithm for triangulating simplicial point sets in arbitrary dimensions based on the idea of the sequential algorithm presented in Ref. 5. The algorithm runs in O(log² n) time using O(n/log n) processors. The algorithm has O(n log n) as the product of the running time and the number of processors; i.e., an optimal speed-up.

12.

Parallel tree contraction and prefix computations on a large family of interconnection topologies

W. J. Hsu C. V. Page 《Acta Informatica》1995,32(2):145-153

The derivation of the prefixes of a given sequence (prefix computation) and the fast reduction of a tree to a single node (tree contraction) are two useful primitives for many applications on parallel computers. It is well known that certain special cases of the two problems can be solved efficiently on the hypercube. Here we extend this result to a large family of parallel computers. This family is based on a novel interconnection scheme called the generalized Fibonacci cube that encompasses both the hypercube and the second-order Fibonacci cube in [8]. Specifically, we show that the k-th order Fibonacci tree of size N can be reduced to a single node in O(log N) steps on a k-th order Fibonacci cube with N nodes (processors). Assuming that O(log N) data items are on each of the N processors, we also show that the prefixes can be computed in O(log N) steps on the k-th order Fibonacci cube.
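To make the prefix computation primitive concrete, the sketch below simulates a generic logarithmic-step parallel prefix sum using the recursive-doubling scheme. It is a simplified illustration under stated assumptions: one data item per simulated processor, addition as the operator, and no modeling of the Fibonacci cube topology or of the paper's mapping. The name `parallel_prefix_sums` is hypothetical.

```python
def parallel_prefix_sums(values):
    """Simulate a synchronous recursive-doubling prefix sum.

    Returns (prefixes, steps), where prefixes[i] = values[0] + ... + values[i]
    and steps is the number of synchronous rounds used.
    Illustrative sketch; not the Fibonacci cube algorithm.
    """
    result = list(values)
    n = len(result)
    stride, steps = 1, 0
    while stride < n:
        previous = result  # every simulated processor reads the previous round
        result = [
            previous[i] + previous[i - stride] if i >= stride else previous[i]
            for i in range(n)
        ]
        stride *= 2
        steps += 1
    return result, steps
```

With 8 items the loop runs for 3 rounds, matching the O(log N) step count that such schemes aim for.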

13.

Parallel Algorithms for the Edge-Coloring and Edge-Coloring Update Problems

《Journal of Parallel and Distributed Computing》1996,32(1):66-73

Let G(V, E) be a simple undirected graph with a maximum vertex degree Δ(G) (or Δ for short), |V| = n and |E| = m. An edge-coloring of G is an assignment of a color to each edge in G such that all edges sharing a common vertex have different colors. The minimum number of colors needed is denoted by χ′(G) (called the chromatic index). For a simple graph G, it is known that Δ ≤ χ′(G) ≤ Δ + 1. This paper studies two edge-coloring problems. The first problem is to perform edge-coloring for an existing edge-colored graph G with Δ + 1 colors stemming from the addition of a new vertex into G. The proposed parallel algorithm for this problem runs in O(Δ^(3/2) log³ Δ + Δ log n) time using O(max{nΔ, Δ³}) processors. The second problem is to color the edges of a given uncolored graph G with Δ + 1 colors. For this problem, our first parallel algorithm requires O(Δ^5.5 log³ Δ log n + Δ⁵ log⁴ n) time and O(max{n²Δ, nΔ³}) processors, which is a slight improvement on the algorithm by H. J. Karloff and D. B. Shmoys [J. Algorithms 8 (1987), 39–52]. Their algorithm costs O(Δ⁶ log⁴ n) time and O(n²Δ) processors if we use the fastest known algorithm for finding maximal independent sets by M. Goldberg and T. Spencer [SIAM J. Discrete Math. 2 (1989), 322–328]. Our second algorithm requires O(Δ^4.5 log³ Δ log n + Δ⁴ log⁴ n) time and O(max{n², nΔ³}) processors. Finally, we present our third algorithm by incorporating the second algorithm as a subroutine. This algorithm requires O(Δ^3.5 log³ Δ log n + Δ³ log⁴ n) time and O(max{n² log Δ, nΔ³}) processors, which improves, by an O(Δ^2.5) factor in time, on Karloff and Shmoys' algorithm. All of these algorithms run in the COMMON CRCW PRAM model.

14.

Parallel integer sorting and simulation amongst CRCW models

Sanjeev Saxena 《Acta Informatica》1996,33(5):607-619

In this paper a general technique for reducing processors in simulation without any increase in time is described. This results in an O(√(log n)) time algorithm for simulating one step of PRIORITY on TOLERANT with a processor-time product of O(n log log n), the same as that for simulating PRIORITY on ARBITRARY. This is used to obtain an O(log n/log log n + √(log n) (log log m − log log n)) time algorithm for sorting n integers from the set {0, ..., m − 1}, m ≥ n, with a processor-time product of O(n log log m log log n) on a TOLERANT CRCW PRAM. New upper and lower bounds for the ordered chaining problem on an allocated COMMON CRCW model are also obtained. The algorithm for ordered chaining takes O(log n/log log n) time on an allocated PRAM of size n. It is shown that this result is best possible (up to a constant multiplicative factor) by obtaining a lower bound of Ω(r log n/(log r + log log n)) for finding the first (leftmost) live processor on an allocated-COMMON PRAM of size n of r-slow virtual processors (one processor simulates r processors of the allocated PRAM). As a result, for the ordered chaining problem, the processor-time product has to be at least Ω(n log n/log log n) for any poly-logarithmic time algorithm. The algorithm for the ordered chaining problem results in an O(log N/log log N) time algorithm for (stable) sorting of n integers from the set {0, ..., m − 1} with n processors on a COMMON CRCW PRAM; here N = max(n, m). In particular, if m = n^O(1), then sorting takes Θ(log n/log log n) time on both TOLERANT and COMMON CRCW PRAMs. The processor-time product for TOLERANT is O(n (log log n)²). The algorithm for COMMON uses n processors.

15.

A new factorization of the mass matrix for optimal serial and parallel calculation of multibody dynamics

Amir Fijany Roy Featherstone 《Multibody System Dynamics》2013,29(2):169-187

This paper describes a new factorization of the inverse of the joint-space inertia matrix M. In this factorization, M⁻¹ is directly obtained as the product of a set of sparse matrices wherein, for a serial chain, only the inversion of a block-tridiagonal matrix is needed. In other words, this factorization reduces the inversion of a dense matrix to that of a block-tridiagonal one. As a result, this factorization leads to both an optimal serial and an optimal parallel algorithm, that is, a serial algorithm with a complexity of O(N) and a parallel algorithm with a time complexity of O(log N) on a computer with O(N) processors. The novel feature of this algorithm is that it first calculates the interbody forces. Once these forces are known, the accelerations are easily calculated. We discuss the extension of the algorithm to the task of calculating the forward dynamics of a kinematic tree consisting of a single main chain plus any number of short side branches. We also show that this new factorization of M⁻¹ leads to a new factorization of the operational-space inverse inertia, Λ⁻¹, in the form of a product involving sparse matrices. We show that this factorization can be exploited for optimal serial and parallel computation of Λ⁻¹, that is, a serial algorithm with a complexity of O(N) and a parallel algorithm with a time complexity of O(log N) on a computer with O(N) processors.

16.

Parallel algorithms for minimal spanning trees of directed graphs

Yixin Zhang 《International journal of parallel programming》1989,18(3):205-221

The main results of this paper are efficient parallel algorithms, MSP and LOCATE, for computing minimal spanning trees and locating minimal paths in directed graphs, respectively. Algorithm MSP has time complexity O(log³ n) using O(n³/log n) processors, while LOCATE has time complexity O(log n) using O(n²) processors. Algorithm MSP is derived from sequential algorithms, when the unbounded parallelism model is used.

17.

A fault tolerant massively parallel processing architecture

《Journal of Parallel and Distributed Computing》1987,4(4):363-383

This paper presents two massively parallel processing architectures suitable for running a wide variety of divide-and-conquer algorithms for problems such as the discrete Fourier transform, production systems, design automation, and others. The first architecture, called the Chain-structured Butterfly ARchitecture (CBAR), consists of a two-dimensional array of N = L · (log₂(L) + 1) processing elements (PEs) organized as L levels of log₂(L) + 1 stages, with the butterfly connection between PEs in consecutive stages and straight-through feedback between PEs in the last and first stages. This connection system has the desirable property of allowing thousands of PEs to be connected with O(N) connection cost, O(log₂(N/log₂ N)) communication paths, and a small number (= 4) of I/O ports per PE. However, this architecture is not fault tolerant. We, therefore, propose a second architecture, called the REconfigurable Chain-structured Butterfly ARchitecture (RECBAR), which is a modified version of the CBAR. The RECBAR possesses all the desirable features of the CBAR, with the number of I/O ports per PE increased to six, and uses O((log₂ N)/N) overhead in PEs and approximately 50% overhead in links to achieve single-level fault tolerance. Reliability improvements of the RECBAR over the CBAR are studied. This paper also presents a distributed diagnostic and structuring algorithm for the RECBAR that enables the architecture to detect faults and structure itself accordingly within 2 · log₂(L) + 1 time steps, thus making it a truly fault tolerant architecture.

18.

A Truncation Method for Computing Walsh Transforms with Applications to Image Processing

《CVGIP: Graphical Models and Image Processing》1993,55(6):482-493

We present a method called the Truncation method for computing Walsh-Hadamard transforms of one- and two-dimensional data. In one dimension, the method uses binary trees as a basis for representing the data and computing the transform. In two dimensions, the method uses quadtrees (pyramids), adaptive quadtrees, or binary trees as a basis. We analyze the storage and time complexity of this method in worst and general cases. The results show that the Truncation method degenerates to the Fast Walsh Transform (FWT) in the worst case, while the Truncation method is faster than the Fast Walsh Transform when there is coherence in the input data, as will typically be the case for image data. In one dimension, the performance of the Truncation method for N data samples is between O(N) and O(N log₂ N), and it is between O(N²) and O(N² log₂ N) in two dimensions. Practical results on several images are presented to show that both the expected and actual overall times taken to compute Walsh transforms using the Truncation method are less than those required by a similar implementation of the FWT method.
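For reference, the O(N log₂ N) baseline that the abstract compares against can be sketched as the usual in-place butterfly form of the fast Walsh-Hadamard transform. The sketch below is that baseline only, not the Truncation method. The natural (Hadamard) output ordering, the lack of normalization, and the name `fast_walsh_hadamard` are assumptions made for illustration.

```python
def fast_walsh_hadamard(data):
    """Unnormalized fast Walsh-Hadamard transform in natural (Hadamard) order.

    data: sequence whose length is a power of two.
    Returns a new list; the input is left unchanged.
    Baseline butterfly sketch, not the Truncation method.
    """
    a = list(data)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    half = 1
    while half < n:
        # Combine pairs of blocks of size `half` with sum/difference butterflies.
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                x, y = a[i], a[i + half]
                a[i], a[i + half] = x + y, x - y
        half *= 2
    return a
```

A constant input such as `[3, 3, 3, 3]` transforms to `[12, 0, 0, 0]`: every difference butterfly yields zero, which is the kind of coherence in the data that the abstract says the Truncation method exploits.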

19.

Efficient parallel algorithms for r-dominating set and p-center problems on trees

Xin He Yaacov Yesha 《Algorithmica》1990,5(1-4):129-145

We develop efficient parallel algorithms for the r-dominating set and the p-center problems on trees. On a concurrent-read exclusive-write PRAM, our algorithm for the r-dominating set problem runs in O(log n log log n) time with n processors. The algorithm for the p-center problem runs in O(log² n log log n) time with n processors.

20.

Reconfigurable Mesh Algorithms for the Hough Transform

《Journal of Parallel and Distributed Computing》1994,20(1):69-77

We develop parallel algorithms to compute the Hough transform on a reconfigurable mesh with buses (RMESH) multiprocessor. The p angle Hough transform of an N × N image can be computed in O(p log(N/p)) time by an N × N RMESH, in O((p/N) log N) time by an N × N² RMESH with N copies of the image pretiled, in O((p/[formula]) log N) time by an N^1.5 × N^1.5 RMESH, and in O((p/N) log N) time by an N² × N² RMESH.