
1.

Parallel clustering algorithms (total citations: 3; self-citations: 0; citations by others: 3)

Xiaobo Li Zhixi Fang 《Parallel Computing》1989,11(3):275-290

Clustering techniques play an important role in exploratory pattern analysis, unsupervised learning and image segmentation applications. Many clustering algorithms, both partitional and hierarchical, require intensive computation, even for a modest number of patterns. This paper presents two parallel clustering algorithms. For a clustering problem with N = 2ⁿ patterns and M = 2^m features, the time complexity of the traditional partitional clustering algorithm on a single-processor computer is O(MNK), where K is the number of clusters. The proposed algorithm on an SIMD computer with MN processors has a time complexity of O(K(n + m)). The time complexity of the proposed single-link hierarchical clustering algorithm is reduced from O(MN²) for the uniprocessor algorithm to O(nN) with MN processors.
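To make the O(MNK) sequential cost concrete, the following is a minimal sketch of one nearest-center assignment pass, assuming a k-means-style partitional method with squared Euclidean distance. The abstract does not name the specific algorithm, so the function name `assign_clusters` and the sample data are illustrative assumptions only.

```python
def assign_clusters(patterns, centers):
    """Assign each pattern to its nearest center (squared Euclidean distance).

    Hypothetical helper: one pass costs N * K * M basic operations,
    which is the O(MNK) sequential bound quoted in the abstract.
    """
    labels = []
    for p in patterns:                       # N patterns
        best_k, best_d = 0, float("inf")
        for k, c in enumerate(centers):      # K cluster centers
            d = sum((pi - ci) ** 2 for pi, ci in zip(p, c))  # M features
            if d < best_d:
                best_k, best_d = k, d
        labels.append(best_k)
    return labels


# Illustrative data: N = 4 patterns, M = 2 features, K = 2 centers.
patterns = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
centers = [(0.0, 0.5), (5.5, 5.0)]
labels = assign_clusters(patterns, centers)
```

With MN processors, the M-term sum and the N-pattern loop are the parts a parallel version would spread across processors, leaving the K-center loop sequential, which is consistent with the factor K in the O(K(n + m)) bound.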

2.

Parallel nested dissection

John M. Conroy 《Parallel Computing》1990,16(2-3):139-156

Nested dissection is a very popular direct method for solving sparse linear systems that arise from finite difference and finite element methods. Worley and Schreiber [16] give a fine grain algorithm for a square array of processors. Their algorithm uses O(N²) processors, each with O(N) memory, to factor an N² by N² sparse matrix whose graph is an N × N mesh. The efficiency of their method is between 1/46 and 1/12. George et al. [6] [8] give a medium grain algorithm for hypercube architecture, while George et al. [7] give an algorithm for shared memory machines. These papers present a column oriented approach which can exploit O(N) parallelism and yield efficiencies up to 50%. Lucas [11] also gives a column oriented scheme which achieves up to 75% efficiency and O(N) parallelism. In this paper, we present a medium to fine grain algorithm for a P × P array of processors with local memory. This algorithm can exploit up to O(N²) parallelism. The efficiency of the fine grain version is comparable to [16], while as a medium grain algorithm it achieves about 49% efficiency. The strength of the method is due to three factors: its ability to pipeline much of the computation, overlapping computation and communication, and the use of level 3 BLAS like primitives. In addition to its high efficiency, its memory requirement is optimal: only O(N² log N/P²) words of memory are needed per processor.

3.

Pyramidal thinning algorithm for SIMD parallel machines

Stéphane 《Pattern recognition》1995,28(12):1993-2000

We propose a parallel thinning algorithm for binary pictures. Given an N × N binary image including an object, our algorithm computes in O(N²) the skeleton of the object, using a pyramidal decomposition of the picture. The behavior of this algorithm is studied considering a family of digitizations of the same object at different levels of resolution. With the Exclusive Read Exclusive Write (EREW) Parallel Random Access Machine (PRAM), our algorithm runs in O(log N) time using O(N²/log N) processors and it is work-optimal. The same result is obtained with high-connectivity distributed memory SIMD machines having strong hypercube and pyramid. We describe the basic operator, the pyramidal algorithm and some experimental results on the SIMD MasPar parallel machine.

4.

Parallel marching Poisson solvers

Marian Vajteršic 《Parallel Computing》1984,1(3-4):325-330

The paper presents parallel algorithms for solving the Poisson equation at N² mesh points. The methods, based on marching techniques, are structured for efficient parallel realization. Using orthogonal decomposition properties of the arising matrices, the algorithms can be formulated in terms of transformed vectors. On a MIMD computer with not more than N processors, the computations can be performed in horizontal slices with minimal synchronization requirements. For an SIMD machine with N² processors, the complexity bound O(log N) has been achieved, whereby a single marching requires only 10 log N steps.

5.

An optimal distributed algorithm for recognizing mesh-connected networks

Rajanarayanan Subbiah Sitharama S. Iyengar Sridhar Radhakrishnan R. L. Kashyap 《Theoretical computer science》1993,120(2):261-278

In this paper, we consider the problem of recognizing whether a given network is a rectangular mesh. We present an efficient distributed algorithm with an O(N) message and time complexity, where N is the number of nodes in the network. This is an improvement of a previous algorithm presented in Mohan (1990) with a message complexity of O(N log N) and time complexity of O(N^1.6). The proposed algorithm is constructive in nature and also assigns coordinates to the nodes of the network.

6.

Parallel processing approaches to edge relaxation

Eva Leung Xiaobo Li 《Pattern recognition》1988,21(6):547-558

This paper describes several parallel algorithms for image edge relaxation on array processors with different numbers of processing elements (PEs) connected by a mesh or hypercube network. The time complexity of Prager's original edge relaxation scheme is O(N²) per iteration using floating-point operations on a sequential machine, where N² is the number of pixels in the image. Modifications to the scheme are made so that no multiplications are employed and only integer operations are required. Moreover, with parallel processing, the time complexity per iteration is reduced to a constant. A time complexity analysis of two parallel algorithms is performed. Although the algorithm on an array processor with 4N² PEs achieves a higher degree of parallelism, the algorithm with N² PEs is preferred. Further modifications to the latter algorithm are made to accommodate fewer PEs.

7.

Scalable parallel FFT for spectral simulations on a Beowulf cluster

P. Dmitruk L. -P. Wang W. H. Matthaeus R. Zhang D. Seckel 《Parallel Computing》2001,27(14):1921-1936

The implementation and performance of the multidimensional Fast Fourier Transform (FFT) on a distributed memory Beowulf cluster are examined. We focus on the three-dimensional (3D) real transform, an essential computational component of Galerkin and pseudo-spectral codes. The approach studied is a 1D domain decomposition algorithm that relies on a communication-intensive transpose operation involving P processors. Communication is based upon the standard portable message passing interface (MPI). We show that 1/P scaling for execution time at fixed problem size N³ (i.e., linear speedup) can be obtained provided that (1) the transpose algorithm is optimized for simultaneous block communication by all processors; and (2) communication is arranged for non-overlapping pairwise communication between processors, thus eliminating blocking when standard fast ethernet interconnects are employed. This method provides the basis for implementation of scalable and efficient spectral method computations of hydrodynamic and magneto-hydrodynamic turbulence on Beowulf clusters assembled from standard commodity components. An example is presented using a 3D passive scalar code.
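The "non-overlapping pairwise communication" condition can be illustrated with a small scheduling sketch. The abstract does not give the authors' actual schedule; the version below uses the common XOR pairing, which assumes the processor count P is a power of two. The function name `pairwise_schedule` is hypothetical.

```python
def pairwise_schedule(num_procs):
    """Return a list of steps; each step is a list of (rank, partner) pairs.

    Assumption: num_procs is a power of two. In step s, rank r exchanges
    a block with rank r XOR s, so every rank has exactly one partner per
    step and no two exchanges in the same step share a processor.
    """
    if num_procs < 1 or num_procs & (num_procs - 1):
        raise ValueError("this sketch assumes a power-of-two processor count")
    steps = []
    for s in range(1, num_procs):
        steps.append([(r, r ^ s) for r in range(num_procs)])
    return steps


schedule = pairwise_schedule(4)
```

After P − 1 steps every pair of processors has exchanged one block, which is what a distributed transpose needs. In an MPI code each (rank, partner) pair would correspond to one combined send/receive per step.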

8.

Efficient algorithms for computing two nearest-neighbor problems on a RAP

Tzong-Wann Kao Shi-Jinn Horng 《Pattern recognition》1994,27(12):1707-1716

This paper improves the computation of two nearest-neighbor problems on images on a reconfigurable array of processors (RAP) by increasing the bus width between processors. Based on a base-n system, a constant time algorithm is first presented for computing the maximum/minimum of N unsigned integers of log N bits each on a RAP using N processors, each with N^(1/c)-bit bus width, where c is a constant and c ≥ 1. Then, two basic operations, image component labeling and border following, are derived from it. Finally, these algorithms are used to design two constant time algorithms for the nearest neighbor black pixel and the nearest neighbor component problems on an N^(1/2) × N^(1/2) image using N^(1/2) × N^(1/2) processors, each with N^(1/c)-bit bus width, where c is a constant and c ≥ 1. Another contribution of this paper is that the execution time of the proposed algorithms is tunable by the bus width.

9.

A fast digital Radon transform—An efficient means for evaluating the Hough transform

W.A. H.J. 《Pattern recognition》1995,28(12):1985-1992

A fast digital Radon transform based on recursively defined digital straight lines is described, which has the sequential complexity of N² log N additions for an N × N image. This transform can be used to evaluate the Hough transform to detect straight lines in a digital image. Whilst a parallel implementation of the Hough transform algorithm is difficult because of global memory access requirements, the fast digital Radon transform is vectorizable and therefore well suited for parallel computation. The structure of the fast algorithm is shown to be quite similar to the FFT algorithm for decimation in frequency. It is demonstrated that even for sequential computation the fast Radon transform is an attractive alternative to the classical Hough transform algorithm.

10.

Parallel algorithms for solving the satisfaction problem of functional and multivalued data dependencies

Chao-Chih Yang Weicong Shen 《Data & Knowledge Engineering》1989,3(4):323-338

Parallel algorithms for solving the satisfaction problem of non-trivial functional and multivalued data dependencies (FDs and MVDs) in a relation of N tuples by M processors are developed in this paper. Algorithms performing, in a parallel manner, batch or interactive checking of these data dependencies are also discussed. The M processors are organized as a linear systolic array. The time complexities of the first two algorithms for solving the FD satisfaction problem under M N are both O(N), and that of Algorithm (3) or (4) for solving the FD or MVD satisfaction problem under N M is O(N²/M). The latter complexity reduces to O(N) if N = M and is no worse than O(N log N) if N = M (N/log N).

11.

Efficient and scalable quicksort on a linear array with a reconfigurable pipelined bus system (total citations: 3; self-citations: 0; citations by others: 3)

Yi Pan Mounir Hamdi Keqin Li 《Future Generation Computer Systems》1998,13(6):501-513

Based on current fiber optic technology, a new computational model, called a linear array with a reconfigurable pipelined bus system (LARPBS), is proposed in this paper. A parallel quicksort algorithm is implemented on the model, and its time complexity is analyzed. For a set of N numbers, the quicksort algorithm reported in this paper runs in O(log₂ N) average time on a linear array with a reconfigurable pipelined bus system of size N. If the number of processors available is reduced to P, where P < N, the algorithm runs in O((N/P) log₂ N) average time and is still scalable. Besides proposing a new algorithm on the model, some basic data movement operations involved in the algorithm are discussed. We believe that these operations can be used to design other parallel algorithms on the same model. Future research in this area is also identified in this paper.

12.

Optimal and nearly optimal algorithms for approximating polynomial zeros

V.Y. Pan 《Computers & Mathematics with Applications》1996,31(12):97-138

We substantially improve the known algorithms for approximating all the complex zeros of an nth-degree polynomial p(x). Our new algorithms save both Boolean and arithmetic sequential time, versus the previous best algorithms of Schönhage [1], Pan [2], and Neff and Reif [3]. In parallel (NC) implementation, we dramatically decrease the number of processors, versus the parallel algorithm of Neff [4], which was the only NC algorithm known for this problem so far. Specifically, under the simple normalization assumption that the variable x has been scaled so as to confine the zeros of p(x) to the unit disc {x : |x| ≤ 1}, our algorithms (which promise to be practically effective) approximate all the zeros of p(x) within the absolute error bound 2^−b, by using order of n arithmetic operations and order of (b + n)n² Boolean (bitwise) operations (in both cases up to within polylogarithmic factors). The algorithms allow their optimal (work preserving) NC parallelization, so that they can be implemented by using polylogarithmic time and the orders of n arithmetic processors or (b + n)n² Boolean processors. All the cited bounds on the computational complexity are within polylogarithmic factors from the optimum (in terms of n and b) under both arithmetic and Boolean models of computation (in the Boolean case, under the additional (realistic) assumption that n = O(b)).

13.

Spacetime-minimal systolic arrays for Gaussian elimination and the Algebraic path problem

Abdelhamid Benaini Yves Robert 《Parallel Computing》1990,15(1-3):211-225

In this paper, we derive time-minimal systolic arrays for Gaussian elimination and the Algebraic Path Problem (APP) that use a minimal number of processors. For a problem of size n, we obtain an execution time T(n) = 3n − 1 using A(n) = n²/4 + O(n) processors for Gaussian elimination, and T(n) = 5n − 2 and A(n) = n³/+O(n) for the APP.

14.

Constant-time algorithm for computing the Euclidean distance maps of binary images on 2D meshes with reconfigurable buses

Yi Pan Keqin Li 《Information Sciences》1999,120(1-4):209-221

The computation of Euclidean distance maps (EDM), also called the Euclidean distance transform, is a basic operation in computer vision, pattern recognition, and robotics. Fast computation of the EDM is needed since most of the applications using the EDM require real-time computation. It is shown in L. Chen and H.Y.H. Chuang [Information Processing Letters, 51, pp. 25–29 (1994)] that a lower bound Ω(n²) is required for any sequential EDM algorithm due to the fact that in any EDM algorithm each of the n² pixels has to be scanned at least once. Recently, many parallel EDM algorithms have been proposed to speed up its computation. Chen and Chuang proposed an algorithm for computing the EDM on an n×n mesh in O(n) time [L. Chen and H.Y.H. Chuang, Parallel Computing, 21, pp. 841–852 (1995)]. Clearly, the VLSI complexities of both the sequential and the mesh algorithm described in L. Chen and H.Y.H. Chuang [Parallel Computing, 21, pp. 841–852 (1995)] are AT²=O(n⁴), where A is the VLSI layout area of the design and T is the computation time using area A when implemented in VLSI. In this paper, we propose a new and faster parallel algorithm for solving the EDM problem on the reconfigurable VLSI mesh model. For the same problem, our algorithm runs in O(1) time on a two-dimensional n²×n² reconfigurable mesh. We show that the VLSI complexity of our algorithm is the same as those of the above sequential algorithm and the mesh algorithm, while it uses much less time. To the best of our knowledge, this is the first constant-time EDM algorithm on any parallel computational model.
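As a reference point for what an EDM contains, here is a brute-force sketch. It assumes the usual convention that each pixel is assigned the Euclidean distance to the nearest black pixel (value 1); the abstract does not spell out the convention, and the function name `brute_force_edm` is hypothetical. This is a correctness baseline, not one of the algorithms cited above: it costs O(n² · b) for b black pixels, well above the Ω(n²) lower bound.

```python
import math


def brute_force_edm(image):
    """Map every pixel to its distance from the nearest black (1) pixel.

    Assumption: `image` is a non-empty list of equal-length rows of 0/1
    values containing at least one black pixel.
    """
    black = [(r, c)
             for r, row in enumerate(image)
             for c, value in enumerate(row) if value == 1]
    if not black:
        raise ValueError("image contains no black pixel")
    return [[min(math.hypot(r - br, c - bc) for br, bc in black)
             for c in range(len(row))]
            for r, row in enumerate(image)]


image = [[1, 0, 0],
         [0, 0, 0],
         [0, 0, 1]]
edm = brute_force_edm(image)
```

A baseline like this is mainly useful for checking the output of a faster sequential or parallel implementation on small images.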

15.

Reducing partition skew on MapReduce: an incremental allocation approach

Zhuo WANG Qun CHEN Bo SUO Wei PAN Zhanhuai LI 《Frontiers of Computer Science》2019,13(5):960

MapReduce, a parallel computational model, has been widely used in processing big data in a distributed cluster. Consisting of alternate map and reduce phases, MapReduce has to shuffle the intermediate data generated by mappers to reducers. The key challenge of ensuring balanced workload on MapReduce is to reduce partition skew among reducers without detailed distribution information on mapped data. In this paper, we propose an incremental data allocation approach to reduce partition skew among reducers on MapReduce. The proposed approach divides mapped data into many micro-partitions and gradually gathers the statistics on their sizes in the process of mapping. The micro-partitions are then incrementally allocated to reducers in multiple rounds. We propose to execute incremental allocation in two steps, micro-partition scheduling and micro-partition allocation. We propose a Markov decision process (MDP) model to optimize the problem of multiple-round micro-partition scheduling for allocation commitment. We present an optimal solution with the time complexity of O(K · N²), in which K represents the number of allocation rounds and N represents the number of micro-partitions. Alternatively, we also present a greedy but more efficient algorithm with the time complexity of O(K · N ln N). Then, we propose a min-max programming model to handle the allocation mapping between micro-partitions and reducers, and, because the problem is NP-complete, present an effective heuristic solution. Finally, we have implemented the proposed approach on Hadoop, an open-source MapReduce platform, and empirically evaluated its performance. Our extensive experiments show that, compared with the state-of-the-art approaches, the proposed approach achieves considerably better data load balance among reducers as well as overall better parallel performance.
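The idea of allocating micro-partitions so that the largest reducer load stays small can be sketched with a standard greedy heuristic. This is not the paper's MDP scheduler or its min-max heuristic, whose details the abstract does not give; it is a single-round "largest partition first, least-loaded reducer" rule, and the names `greedy_allocate`, `sizes` and `num_reducers` are illustrative assumptions.

```python
import heapq


def greedy_allocate(sizes, num_reducers):
    """Assign micro-partitions to reducers, largest first.

    Returns (assignment, loads), where assignment maps a micro-partition
    index to a reducer index and loads[r] is the total size on reducer r.
    """
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer)
    heapq.heapify(heap)
    assignment = {}
    order = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    for idx in order:
        load, reducer = heapq.heappop(heap)       # least-loaded reducer
        assignment[idx] = reducer
        heapq.heappush(heap, (load + sizes[idx], reducer))
    loads = [0] * num_reducers
    for idx, reducer in assignment.items():
        loads[reducer] += sizes[idx]
    return assignment, loads


# Illustrative sizes for five micro-partitions and two reducers.
assignment, loads = greedy_allocate([7, 5, 4, 3, 1], 2)
```

Sorting dominates the cost, so one round of this rule takes O(N log N) time for N micro-partitions. An incremental scheme would have to apply such a rule repeatedly as size statistics arrive.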

16.

Heaps and heapsort on secondary storage

R. Fadel K. V. Jakobsen J. Katajainen J. Teuhola 《Theoretical computer science》1999,220(2):585-362

A heap structure designed for secondary storage is suggested that tries to make the best use of the available buffer space in primary memory. The heap is a complete multi-way tree, with multi-page blocks of records as nodes, satisfying a generalized heap property. A special feature of the tree is that the nodes may be partially filled, as in B-trees. The structure is complemented with the priority-queue operations insert and delete-max. When handling a sequence of S operations, the number of page transfers performed is shown to be O(∑_{i = 1}^S (1/P) log_(M/P)(N_i/P)), where P denotes the number of records fitting into a page, M the capacity of the buffer space in records, and N_i the number of records in the heap prior to the ith operation (assuming P 1 and S> M c · P, where c is a small positive constant). The number of comparisons required when handling the sequence is O(∑_{i = 1}^S log₂ N_i). Using the suggested data structure we obtain an optimal external heapsort that performs O((N/P) log_(M/P)(N/P)) page transfers and O(N log₂ N) comparisons in the worst case when sorting N records.

17.

Efficient algorithms for shortest distance queries on special classes of polygons

R. Sridhar K. Han N. Chandrasekharan 《Theoretical computer science》1995,140(2):291-300

The problem of finding a rectilinear minimum bend path (RMBP) between two designated points inside a rectilinear polygon has applications in robotics and motion planning. In this paper, we present efficient algorithms to solve the query version of the RMBP problem for special classes of rectilinear polygons given their visibility graphs. Specifically, given an unweighted graph G = (V, E) with |V| = N and |E| = M, we present algorithms that preprocess G in linear space and time such that shortest distance queries (queries asking for the distance between any pair of nodes in the graph) can be answered in constant time and space. For the case of a chordal graph G, our algorithms give a distance which is at most one away from the actual shortest distance. When G is a K-chordal graph, our algorithm produces an exact shortest distance in O(K) time. We also present a non-trivial parallel implementation of the sequential preprocessing algorithm for the CREW-PRAM model which runs in O(log² N) time using O(N + M) processors. After the preprocessing, we can answer the queries in constant time using a single processor.

18.

An optimal distributed algorithm for finding articulation points in a network

Pranay Chaudhuri 《Computer Communications》1998,21(18):50-1715

This paper presents a distributed algorithm for finding the articulation points in an n-node communication network represented by a connected undirected graph. For a given graph, if the deletion of a node splits the graph into two or more components, then that node is called an articulation point. The output of the algorithm is available in a distributed manner, i.e., when the algorithm terminates each node knows whether it is an articulation point or not. It is shown that the algorithm requires O(n) messages and O(n) units of time and is optimal in communication complexity to within a constant factor.
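The definition of an articulation point can be made concrete with the classical sequential depth-first-search method based on discovery times and low-link values. This is not the distributed algorithm of the paper, which the abstract does not describe in detail; it is a centralized sketch, and the function name `articulation_points` and the adjacency-dictionary input format are assumptions.

```python
def articulation_points(adj):
    """Return the set of articulation points of a connected undirected graph.

    adj: dict mapping each node to a list of its neighbours (no self-loops
    or parallel edges assumed). Recursive, so suited to small graphs only.
    """
    disc, low, result = {}, {}, set()
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        children = 0
        for v in adj[u]:
            if v not in disc:                 # tree edge
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # v's subtree cannot reach above u: removing u cuts it off
                if parent is not None and low[v] >= disc[u]:
                    result.add(u)
            elif v != parent:                 # back edge
                low[u] = min(low[u], disc[v])
        if parent is None and children > 1:   # root with several subtrees
            result.add(u)

    dfs(next(iter(adj)), None)
    return result


# Triangle 0-1-2 with a tail 2-3-4 attached.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
points = articulation_points(graph)
```

In the example, removing node 2 or node 3 disconnects the tail, while removing any other node leaves the graph connected.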

19.

Parallel on-line parsing in constant time per word

Klaas Sikkel 《Theoretical computer science》1993,120(2):303-310

An on-line parser processes each word as soon as it is typed by the user, without waiting for the end of the sentence. Thus, in an interactive system, a sentence will be parsed almost immediately after the last word has been presented.

The complexity of an on-line parser is determined by the resources needed for the analysis of a single word, as it is assumed that previous words have been processed already. Sequential parsing algorithms like CYK or Earley need O(n²) time for the nth word. A parallel implementation in O(n) time on O(n) processors is straightforward. In this paper a novel parallel on-line parser is presented that needs O(1) time on O(n²) processors.
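The O(n²) cost per word for sequential CYK can be seen in a column-by-column formulation, sketched below as a recognizer for a grammar in Chomsky normal form. The toy grammar, the rule encoding as tuples, and the function name `cyk_recognize` are assumptions for illustration; the parallel parser of the paper is not reproduced here.

```python
def cyk_recognize(words, unary, binary, start="S"):
    """On-line style CYK recognition for a grammar in Chomsky normal form.

    unary:  list of (A, word) rules A -> word
    binary: list of (A, B, C) rules A -> B C
    table[i][j] holds the nonterminals deriving words[i..j]. When word j
    arrives, only column j is filled, using O(j^2) (start, split) pairs.
    """
    n = len(words)
    if n == 0:
        return False
    table = [[set() for _ in range(n)] for _ in range(n)]
    for j, word in enumerate(words):
        table[j][j] = {a for a, t in unary if t == word}
        for i in range(j - 1, -1, -1):        # spans ending at word j
            for k in range(i, j):             # split point
                for a, b, c in binary:
                    if b in table[i][k] and c in table[k + 1][j]:
                        table[i][j].add(a)
    return start in table[0][n - 1]


# Hypothetical toy grammar.
unary = [("Det", "the"), ("N", "dog"), ("N", "cat"), ("V", "sees")]
binary = [("S", "NP", "VP"), ("NP", "Det", "N"), ("VP", "V", "NP")]
accepted = cyk_recognize("the dog sees the cat".split(), unary, binary)
rejected = cyk_recognize("dog the sees".split(), unary, binary)
```

The two inner loops over `i` and `k` are the quadratic work per word, for a fixed grammar. Giving each row `i` its own processor yields the straightforward O(n)-time, O(n)-processor version mentioned in the abstract.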

20.

A new parallel and distributed shortest path algorithm for hierarchically clustered data networks

Zhu S. Huang G.M. 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(9):841-855

This paper presents new efficient shortest path algorithms to solve single origin shortest path problems (SOSP problems) and multiple origins shortest path problems (MOSP problems) for hierarchically clustered data networks. To solve an SOSP problem for a network with n nodes, the distributed version of our algorithm reaches the time complexity of O(log(n)), which is less than the time complexity of O(log²(n)) achieved by the best existing algorithm. To solve an MOSP problem, our algorithm minimizes the needed computation resources, including computation processors and communication links for the computation of each shortest path, so that we can achieve massive parallelization. The time complexity of our algorithm for an MOSP problem is O(m log(n)), which is much less than the time complexity of O(M log²(n)) of the best previous algorithm. Here, M is the number of the shortest paths to be computed and m is a positive number related to the network topology and the distribution of the nodes incurring communications; m is usually much smaller than M. Our experiment shows that m is almost a constant when the network size increases. Accordingly, our algorithm is significantly faster than the best previous algorithms for solving MOSP problems on large data networks.