首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
This paper makes an improvement of computing two nearest-neighbor problems of images on a reconfigurable array of processors (RAP) by increasing the bus width between processors. Based on a base-n system, a constant time algorithm is first presented for computing the maximum/minimum of N log N-bit unsigned integers on a RAP using N processors each with N1/c-bit bus width, where c is a constant and c ≥ 1. Then, two basic operations such as image component labeling and border following are also derived from it. Finally, these algorithms are used to design two constant time algorithms for the nearest neighbor black pixel and the nearest neighbor component problems on an N1/2 × N1/2 image using N1/2 × N1/2 processors each with N1/c-bit bus width, where c is a constant and c ≥ 1. Another contribution of this paper is that the execution time of the proposed algorithms is tunable by the bus width.  相似文献   

2.
Parallel clustering algorithms   总被引:3,自引:0,他引:3  
Clustering techniques play an important role in exploratory pattern analysis, unsupervised learning and image segmentation applications. Many clustering algorithms, both partitional clustering and hierarchical clustering, require intensive computation, even for a modest number of patterns. This paper presents two parallel clustering algorithms. For a clustering problem with N = 2n patterns and M = 2m features, the time complexity of the traditional partitional clustering algorithm on a single processor computer is O(MNK), where K is the number of clusters. The proposed algorithm on anSIMD computer with MN processors has a time complexity O(K(n + m)). The time complexity of the proposed single-link hierarchical clustering algorithm is reduced from O(MN2) of the uniprocessor algorithm to O(nN) with MN processors.  相似文献   

3.
The paper presents parallel algorithms for solving Poisson equation at N2 mesh points. The methods based on marching techniques are structured for efficient parallel realization. Using orthogonal decomposition properties of arising matrices, the algorithms can be formulated in terms of transformed vectors. On a MIMD computer with not more than N processors, the computations can be performed in horizontal slices with minimal synchronization requirements. Considering an SIMD machine with N2 processors, the complexity bound O(log N) has been achieved, whereby the single marching requires 10 log N steps only.  相似文献   

4.
Based on the current fiber optic technology, a new computational model, called a linear array with a reconfigurable pipelined abus system (LARPBS), is proposed in this paper. A parallel quicksort algorithm is implemented on the model, and its time complexity is analyzed. For a set of N numbers, the quicksort algorithm reported in this paper runs in O(log2 N) average time on a linear array with a reconfigurable pipelined bus system of size N. If the number of processors available is reduced to P, where P < N, the algorithm runs in O((N/P) log2 N) average time and is still scalable. Besides proposing a new algorithm on the model, some basic data movement operations involved in the algorithm are discussed. We believe that these operations can be used to design other parallel algorithms on the same model. Future research in this area is also identified in this paper.  相似文献   

5.
Nested dissection is a very popular direct method for solving sparse linear systems that arise from finite difference and finite element methods. Worley and Schreiber [16] give a fine grain algorithm for a square array of processors. Their algorithm uses O(N2) processors, each with O(N) memory, to factor an N2 by N2 sparse matrix whose graphs is an N × N mesh. The efficiency of their method is between 1/46 and 1/12. George et al. [6] [8] give a medium grain algorithm for hypercube architecture, while George et al. [7] give an algorithm for shared memory machines. These papers present a column oriented approach which can exploit O(N) parallelism and yield efficiencies up to 50%. Lucas [11] also gives a column oriented scheme which achieves up to 75% efficiency and O(N) parallelism. In this paper, we present a medium to fine grain algorithm for a P × P array of processors with local memory. This algorithm can exploit up to O(N2) parallelism. The efficiency of the fine grain version is comparable to [16] while as a medium grain algorithm achieves about 49% efficiency. The strength of the method is due to three factors: its ability to pipeline much of the computation, overlapping computation and communication, and the use of level 3 BLAS like primitives. In addition to its high efficiency its memory requirement is optimal, only O(N2 log N/P2) words memory is needed per processor.  相似文献   

6.
Stphane 《Pattern recognition》1995,28(12):1993-2000
We propose a parallel thinning algorithm for binary pictures. Given an N × N binary image including an object, our algorithm computes in O(N2) the skeleton of the object, using a pyramidal decomposition of the picture. The behavior of this algorithm is studied considering a family of digitalization of the same object at a different level of resolution. With the Exclusive Read Exclusive Write (EREW) Parallel Random Access Machine (PRAM), our algorithm runs in O(log N) time using O(N2/logN) processors and it is work-optimal. The same result is obtained with high-connectivity distributed memory SIMD machines having strong hypercube and pyramid. We describe the basic operator, the pyramidal algorithm and some experimental results on the SIMD MasPar parallel machine.  相似文献   

7.
W.A.  H.J. 《Pattern recognition》1995,28(12):1985-1992
A fast digital Radon transform based on recursively defined digital straight lines is described, which has the sequential complexity of N2 log N additions for an N × N image. This transform can be used to evaluate the Hough transform to detect straight lines in a digital image. Whilst a parallel implementation of the Hough transform algorithm is difficult because of global memory access requirements, the fast digital Radon transform is vectorizable and therefore well suited for parallel computation. The structure of the fast algorithm is shown to be quite similar to the FFT algorithm for decimation in frequency. It is demonstrated that even for sequential computation the fast Radon transform is an attractive alternative to the classical Hough transform algorithm.  相似文献   

8.
In this paper, a distributed selectsort algorithm and a parameterized selectsort algorithm are presented to be applied on distributed systems for cases when N P where N is the number of elements to be sorted and P is the number of processors in the system. The distributed system considered in this paper uses a broadcasting channel for communication between processors. We show that the number of messages required for the parameterized selectsort algorithm is independent of N and is of complexity O(P), which is optimal in a distributed system with P processors. Furthermore, the amount of communication required in terms of elements is N + O(P3) and the computation time complexity is O((N/P)lgN + P2lg(N/P)). Hence, when N P3, the computation time complexity is O((N/P)lgN), which is optimal using P processors. In addition, this parameterized algorithm provides us with a parameter K such that by choosing the value of K allows us to trade among processing requirement, memory requirement, and communication requirement. It is shown that this parameterized algorithm can reduce the communication requirements significantly while only slightly increasing the computation requirements.  相似文献   

9.
Parallel algorithms for solving the satisfaction problem of non-trivial functional and multivalued data dependencies (FDs and MVDs) in a relation of N tuples by M processors are developed in this paper. Algorithms performing, in a parallel manner, batch or interactive checking of these data dependencies are also discussed. The M processors are organized as a linear systolic array. The time complexities of the first two algorithms for solving the FD satisfaction problem under M N are both O(N), and that of Algorithm (3) or (4) for solving the FD or MVD satisfaction problem under N M is O(N2/M). The latter complexity reduced to O(N) if N = M and is at least not worse than O(N log N) if N = M (N/log N).  相似文献   

10.
The paper presents an optimal systolic array architecture for rapid solution of dense systems of linear equations. The array solves a system of size n×n in 4n + 1 time units including I/0 time. Data communications are strictly local and the processing elements (PEs) are simple. The complete three-phase solution algorithm is executed on a single array, employing about 3n2/2 PEs without any need for costly inter-phase I/0. Due to a novel data steering mechanism, the three algorithmic phases are maximally overlapped. Design optimality is established using systolic precedence diagrams. It is also shown that merging the functions of two adjacent PEs into a single PE is possible resulting in maximal PE utilization. An interesting result regarding cascading phase-optimal arrays is obtained.  相似文献   

11.
In this paper, we consider the problem of recognizing whether a given network is a rectangular mesh. We present an efficient distributed algorithm with an O(N) message and time complexity, where N is the number of nodes in the network. This is an improvement of a previous algorithm presented in Mohan (1990) with a message complexity of O(N log N) and time complexity of O(N1.6). The proposed algorithm is constructive in nature and also assigns coordinates to the nodes of the network.  相似文献   

12.
Yi Pan  Keqin Li 《Information Sciences》1999,120(1-4):209-221
The computation of Euclidean distance maps (EDM), also called Euclidean distance transform, is a basic operation in computer vision, pattern recognition, and robotics. Fast computation of the EDM is needed since most of the applications using the EDM require real-time computation. It is shown in L. Chen and H.Y.H. Chuang [Information Processing Letters, 51, pp. 25–29 (1994)] that a lower bound Ω(n2) is required for any sequential EDM algorithm due to the fact that in any EDM algorithm each of the n2 pixels has to be scanned at least once. Recently, many parallel EDM algorithms have been proposed to speedup its computation. Chen and Chuang proposed an algorithm for computing the EDM on an n×n mesh in O(n) time [L. Chen and H.Y.H. Chuang Parallel Computing, 21, pp. 841–852 (1995)]. Clearly, the VLSI complexities of both the sequential and the mesh algorithm described in L. Chen and H.Y.H. Chuang [Parallel Computing, 21, pp. 841–852 (1995)] are AT2=O(n4), where A is the VLSI layout area of the design and T is the computation time using area A when implemented in VLSI. In this paper, we propose a new and faster parallel algorithm for computing the EDM problem on the reconfigurable VLSI mesh model. For the same problem, our algorithm runs in O(1) time on a two-dimensional n2×n2 reconfigurable mesh. We show that the VLSI complexity of our algorithm is the same as those of the above sequential algorithm and the mesh algorithm, while it uses much less time. To our best knowledge, this is the first constant-time EDM algorithm on any parallel computational model.  相似文献   

13.
Steven L. 《Pattern recognition》1995,28(12):1965-1972
Two fast algorithms for median filtering of images using parallel computers having 2-D mesh interconnections are given. Both algorithms assume that an n × n image is loaded onto the mesh with one processing element per pixel. One algorithm performs median filtering over d × d neighborhoods in O(d2) time and works with pixel values in an arbitrarily large range. This algorithm, while theoretically suboptimal, achieves a lower constant than a previously published asymptotically—optimal algorithm and is simpler to program. The second algorithm assumes that the range of pixel values is limited and relatively small, and it accomplishes median filtering in O(d) time.  相似文献   

14.
We substantially improve the known algorithms for approximating all the complex zeros of an nth degree polynomial p(x). Our new algorithms save both Boolean and arithmetic sequential time, versus the previous best algorithms of Schönhage [1], Pan [2], and Neff and Reif [3]. In parallel (NC) implementation, we dramatically decrease the number of processors, versus the parallel algorithm of Neff [4], which was the only NC algorithm known for this problem so far. Specifically, under the simple normalization assumption that the variable x has been scaled so as to confine the zeros of p(x) to the unit disc x : |x| ≤ 1, our algorithms (which promise to be practically effective) approximate all the zeros of p(x) within the absolute error bound 2b, by using order of n arithmetic operations and order of (b + n)n2 Boolean (bitwise) operations (in both cases up to within polylogarithmic factors). The algorithms allow their optimal (work preserving) NC parallelization, so that they can be implemented by using polylogarithmic time and the orders of n arithmetic processors or (b + n)n2 Boolean processors. All the cited bounds on the computational complexity are within polylogarithmic factors from the optimum (in terms of n and b) under both arithmetic and Boolean models of computation (in the Boolean case, under the additional (realistic) assumption that n = O(b)).  相似文献   

15.
The major concern of this paper is the systolic realization of Jacobi algorithms with a cyclic-by-rows iteration scheme. This aim can be achieved by construction of an algorithm which is well suited for parallel processing and is essentially equivalent to the above-mentioned type of algorithm. Moreover, a large variety of applications for Jacobi algorithms is presented as well as a comparison to other parallel schemes for the same problem. Finally, a systolic array is derived which requires (n + 1)2/4 processing cells and has a time complexity of O(n) for each sweep.  相似文献   

16.
尤洁  李劲    张赛  李婷 《智能系统学报》2019,14(4):761-768
针对已有链路预测算法复杂度高,不适于在大规模图上进行链接预测的问题,本文基于图勾勒近似技术对已有链路预测方法进行优化,提出了基于图勾勒的链路预测方法。该方法将链路预测算法的计算复杂度由On3)降低至On2k2log2n)。为进一步提高链接预测效率,给出了基于Spark的并行化链路预测实现方法。在真实图数据集上进行测试,实验结果表明本文方法在保证链接预测精度的前提下,可有效提升算法效率。  相似文献   

17.
A cross-bridge reconfigurable array of processors is a parallel processing system which has the ability to change dynamically the supported interconnection scheme during the execution of an algorithm. Based on this architecture, several O(1) time basic operations such as the transpose, the untranspose, the shift, the unshift and the prefix sum of a binary sequence are first proposed. Then, these basic operations can be used to find the kth smallest element of N m bits unsigned integers in O(m) time using N processors and to sort N data items in O(1) time using O(N5/3) processors instead of using O(N2) processors as those proposed by other researchers  相似文献   

18.
This paper proposes a high speed multi-level-parallel array processor for programmable vision chips.This processor includes 2-D pixel-parallel processing element(PE)array and 1-D row-parallel row processor(RP)array.The two arrays both operate in a single-instruction multiple-data(SIMD)fashion and share a common instruction decoder.The sizes of the arrays are scalable according to dedicated applications.In PE array,each PE can communicate not only with its nearest neighbor PEs,but also with the next near neighbor PEs in diagonal directions.This connection can help to speed up local operations in low-level image processing.On the other hand,global operations in mid-level processing are accelerated by the skipping chain and binary boosters in RP array.The array processor was implemented on an FPGA device,and was successfully tested for various algorithms,including real-time face detection based on PPED algorithm.The results show that the image processing speed of proposed processor is much higher than that of the state-of-the-arts digital vision chips.  相似文献   

19.
This paper addresses parallel execution of chain code generation on a linear array architecture. The contours in the proposed algorithm are viewed as a set of edges (or contour segments) that can be traced by a top-down contour tracing method to generate the chain codes for the outer and inner object contours. A parallel algorithm that contains the chain code generating rules and operations needed is also described, and the algorithm is mapped onto a one-dimensional systolic array containing processing elements (PEs) to devise this architecture. The architecture extracts the contours of objects and quickly generates the corresponding chain codes after the image data in all rows are inputted in a linear fashion. The total processing time for generating the chain codes in an N×N image is O(3N). By doing so, the real-time requirement is fulfilled and its execution time is independent of the image content. In addition, a partition method is developed to process an image when the parallel architecture has a fixed number of PEs; say two or more. The total execution time for an N×N image by employing a fixed number of PEs is N(N+1)/M+2(M−1), when M is the fixed number of PEs.  相似文献   

20.
快速Hadamard变换被广泛应用于信号与图像处理、通信系统、数字逻辑等领域中.当问题规模非常大时,快速Hadamard变换有可能不能满足计算时间的要求;这种情况下,算法并行化是一种行之有效的手段.本文以单像素相机的压缩感知图像复原为应用背景,利用基二快速Hadamard变换与快速傅里叶变换的结构相似性,提出一种通用的基二快速Hadamard变换的任务级并行算法,并用构造方式证明了该并行算法与串行算法计算结果之间的等价性.仿真表明对于小于220向量长度的问题规模以及并行子任务数少于210的情况,该并行算法对比串行算法的数值计算结果的欧氏距离平方误差小于10-18,佐证了并行算法的正确性.在PC平台通过多核CPU上POSIX线程实现的实验表明:在该特定平台和特定配置上对于220至225向量长度的问题规模并行计算加速比为1.33~1.42,证明了文中提出方法的可行性和有效性.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号