Similar literature
 20 similar documents found (search time: 31 ms)
1.
A linear-time algorithm is developed to perform all odd (even) length circular shifts of data in an SIMD (single-instruction-stream, multiple-data-stream) hypercube. As an application, the algorithm is used to obtain an $O(M^2 + \log N)$ time and $O(1)$ memory per processor algorithm to compute the two-dimensional convolution of an $N \times N$ image and an $M \times M$ template on an $N^2$-processor SIMD hypercube. This improves the previous best complexity of $O(M^2 \log M + \log N)$.

2.
Semigroup and prefix computations on two-dimensional mesh-connected computers with multiple broadcasting (2-MCCMBs) are studied. Previously, only square 2-MCCMBs with $N$ processing elements were considered for semigroup computations of $N$ data items, and $O(N^{1/6})$ time was required. It is found that square machines are not the best form for semigroup computations, and an $O(N^{1/8})$-time algorithm is derived on an $N^{5/8} \times N^{3/8}$ rectangular 2-MCCMB. This time complexity can be further reduced to $O(N^{1/9})$ if fewer processing elements are used. Parallel algorithms for prefix computations with the same time complexities are derived.

3.
A new parallel algorithm is proposed for fast image labeling using local operators on image pixels. The algorithm can be implemented on an $n \times n$ mesh-connected computer such that, for any integer $k$ in the range $[1, \log(2n)]$, the algorithm requires $\Theta(k n^{1/k})$ bits of local memory per processor and takes $\Theta(kn)$ time. Bit-serial processors and communication links can be used without affecting the asymptotic time complexity of the algorithm. The time complexity of the algorithm has very small leading constant factors, which makes it superior to previous mesh computer labeling algorithms for most practical image sizes (e.g. up to $4096 \times 4096$ images). Furthermore, the algorithm is based on using stacks that can be realized using very fast shift registers within each processing element.

4.
An algorithm is presented for extracting and localizing a common structure in a family of strings with time complexity O(N2 L2 log2 L), where N is the number of strings and L their maximum length. The method could be extended to two-dimensional image analysis. This structure appears as alignments of words which are similar but not necessarily identical and which occur at approximately the same location in all the strings. The method works in two successive stages. First, a fast algorithm is used for drawing up a directory of exactly repeated patterns appearing in a given majority of strings. Second, the algorithm recursively constructs anchoring patterns by a divide-and-conquer strategy and converges on a maximum number of alignments. This algorithm has been applied to find common, a priori unknown features in families of biological macromolecules, with quite good results. One of these families included 23 strings of about 100 characters each. Each characteristic structure was obtained in less than one minute on a MULTIX-DPS8 system.

5.
An important midlevel task for computer vision is addressed. The problem consists of labeling connected components in $N^{1/2} \times N^{1/2}$ binary images. This task can be solved with parallel computers by using a simple and novel algorithm. The parallel computing model used is a synchronous fine-grained shared-memory model where only one processor can read from or write to the same memory location at a given time. This model is known as the exclusive-read exclusive-write parallel RAM (EREW PRAM). Using this model, the algorithm presented has $O(\log N)$ complexity. The algorithm can run on parallel machines other than the EREW PRAM. In particular, it offers an optimal image component labeling algorithm for mesh-connected computers.
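
For orientation, the labeling task itself can be stated as a short sequential sketch. This is not the EREW PRAM algorithm from the abstract above; it is a plain breadth-first flood fill, and the function name `label_components` and the choice of 4-connectivity are assumptions made only for illustration.

```python
from collections import deque


def label_components(image):
    """Label 4-connected components of a binary image (sequential baseline).

    Returns (labels, count). Background pixels keep label 0; components
    are numbered 1, 2, ... in row-major order of their first pixel.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or labels[r][c]:
                continue
            # New component found: flood-fill it with a fresh label.
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count
```

The sketch visits every pixel a constant number of times, so it is linear in the number of pixels; the parallel algorithm in the abstract targets the same output in logarithmic time.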

6.
A novel discrete relaxation architecture (total citations: 1; self-citations: 0; citations by others: 1)
The discrete relaxation algorithm (DRA) is a computational technique that enforces arc consistency (AC) in a constraint satisfaction problem (CSP). The original sequential AC-1 algorithm suffers from $O(n^3 m^3)$ time complexity, and even the optimal sequential AC-4 algorithm is $O(n^2 m^2)$ for an $n$-object and $m$-label DRA problem. Sample problem runs show that these algorithms are all too slow to meet the needs of any useful real-time CSP application. A parallel DRA5 algorithm that reaches a lower bound of $O(nm)$ (where the number of processors is polynomial in the problem size) is given. A fine-grained, massively parallel hardware computer architecture has been designed for the DRA5 algorithm. For practical problems, many orders of magnitude of efficiency improvement can be reached on such a hardware architecture.

7.
An $O(n^2)$ time serial algorithm is developed for obtaining the medial axis transform (MAT) of an $n \times n$ image. An $O(\log n)$ time CREW PRAM algorithm and an $O(\log^2 n)$ time SIMD hypercube parallel algorithm for the MAT are also developed. Both of these use $O(n^2)$ processors. Two problems associated with the MAT, the area and perimeter reporting problems, are studied. An $O(\log n)$ time hypercube algorithm is developed for both of them, where $n$ is the number of squares in the MAT, and the algorithms use $O(n^2)$ processors.

8.
An algorithm for convolving a $k \times k$ window of weighting coefficients with an $n \times n$ image matrix on a pyramid computer of $O(n^2)$ processors in time $O(\log n + k^2)$, excluding the time to load the image matrix, is presented. If $k = \Omega(\sqrt{\log n})$, which is typical in practice, the algorithm has a processor-time product of $O(n^2 k^2)$, which is optimal with respect to the usual sequential algorithm. A feature of the algorithm is that the mechanism for controlling the transmission and distribution of data in each processor is finite state, independent of the values of $n$ and $k$. Thus, for convolving two {0, 1}-valued matrices using Boolean operations rather than the typical sum and product operations, the processors of the pyramid computer are finite-state.
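
The "usual sequential algorithm" that the processor-time product is compared against can be sketched as four nested loops, which makes the $n^2 k^2$ multiply-add count visible. This is only a baseline illustration, not the pyramid algorithm; the name `convolve2d`, the zero padding at the border, and the same-size output are assumptions.

```python
def convolve2d(image, window):
    """Direct convolution of an n x n image with a k x k window.

    Pixels outside the image are treated as zero, and the output has the
    same n x n shape as the input. The loop nest performs at most
    n * n * k * k multiply-add operations.
    """
    n, k = len(image), len(window)
    half = k // 2
    out = [[0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            total = 0
            for u in range(k):
                for v in range(k):
                    # True convolution: the window index is subtracted,
                    # which is equivalent to flipping the window.
                    rr, cc = r - u + half, c - v + half
                    if 0 <= rr < n and 0 <= cc < n:
                        total += window[u][v] * image[rr][cc]
            out[r][c] = total
    return out
```

With $n^2$ processors and $O(\log n + k^2)$ time, the parallel work matches this $O(n^2 k^2)$ count whenever $k^2$ dominates $\log n$, which is the condition $k = \Omega(\sqrt{\log n})$ stated in the abstract.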

9.
Parallel algorithms on SIMD (single-instruction-stream, multiple-data-stream) machines for hierarchical clustering and cluster validity computation are proposed. The machine model uses a parallel memory system and an alignment network to facilitate parallel access to both the pattern matrix and the proximity matrix. For a problem with $N$ patterns, the number of memory accesses is reduced from $O(N^3)$ on a sequential machine to $O(N^2)$ on an SIMD machine with $N$ PEs.

10.
The number of distinct entries among the $m^{2n}$ entries of the $n$th Kronecker power of an $m \times m$ matrix is derived. An algorithm to find the value of each entry of the Kronecker power is presented.
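
A standard way to read off a single entry of a Kronecker power, which may or may not match the algorithm in the abstract above, is to write the row and column indices in base $m$ and multiply the matrix entries selected by each digit pair. The names `kron_power_entry`, `kron`, and `kron_power` below are hypothetical helpers for illustration.

```python
def kron_power_entry(a, n, i, j):
    """Entry (i, j) of the n-th Kronecker power of the m x m matrix a.

    The indices i and j are read as n base-m digits each; the entry is
    the product of a[di][dj] over corresponding digit pairs.
    """
    m = len(a)
    value = 1
    for _ in range(n):
        i, di = divmod(i, m)
        j, dj = divmod(j, m)
        value *= a[di][dj]
    return value


def kron(a, b):
    """Explicit Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def kron_power(a, n):
    """Explicit n-th Kronecker power, used here only as a cross-check."""
    result = [[1]]
    for _ in range(n):
        result = kron(result, a)
    return result
```

The digit-based lookup needs $n$ multiplications per entry and never builds the $m^n \times m^n$ matrix, whereas `kron_power` materializes all $m^{2n}$ entries.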

11.
Exact analytical expressions are obtained for the likelihood and likelihood gradient of stationary autoregressive moving average (ARMA) models. Denote the sample size by $N$, the autoregressive order by $p$, and the moving average order by $q$. The calculation of the likelihood requires $(p+2q+1)N + o(N)$ multiply-add operations, and the calculation of the likelihood gradient requires $(2p+6q+2)N + o(N)$ multiply-add operations. These expressions may be used to obtain an iterative, Newton-Raphson-type converging algorithm, with superlinear convergence rate, that computes the maximum-likelihood estimator in $(2p+6q+2)N + o(N)$ multiply-add operations per iteration.

12.
Relations among banyan multistage interconnection networks (MINs) of differing switch sizes are studied. If two $N \times N$ networks $W$ and $W'$ have switch sizes $r$ and $s$, respectively, and if $r > s$, then $W$ realizes a larger number of permutations than $W'$. Consequently, the two networks can never be equivalent. However, $W$ may realize all the permutations of $W'$, in which case $W$ is said to functionally cover $W'$ in the strict sense. More generally, $W$ is said to functionally cover $W'$ in the wide sense if the terminals of $W$ can be relabeled so that $W$ realizes all the permutations of $W'$. Functional covering is topologically characterized, and an optimal algorithm to decide strict functional covering is developed.

13.
The transitive closure problem is solved in $O(1)$ time by a new method that is far different from the conventional solution method. On processor arrays with reconfigurable bus systems, two $O(1)$ time algorithms are proposed for computing the transitive closure of an undirected graph. One is designed on a three-dimensional $n \times n \times n$ processor array with a reconfigurable bus system, and the other is designed on a two-dimensional $n^2 \times n^2$ processor array with a reconfigurable bus system, where $n$ is the number of vertices in the graph. Using the $O(1)$ time transitive closure algorithms, many other graph problems are solved in $O(1)$ time. These problems include recognizing bipartite graphs and finding connected components, articulation points, biconnected components, bridges, and minimum spanning trees in undirected graphs.
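
As a point of comparison for the constant-time result, the conventional sequential approach can be sketched with Warshall's algorithm, which takes $O(n^3)$ time. This is not the reconfigurable-bus method of the abstract; the function names and the convention that every vertex reaches itself are assumptions. The second helper shows how connected components follow directly from the closure of an undirected graph.

```python
def transitive_closure(adj):
    """Reflexive-transitive closure of a graph given as an adjacency matrix.

    Warshall's algorithm: after processing intermediate vertex k,
    reach[i][j] is True if j is reachable from i using only
    intermediate vertices from {0, ..., k}.
    """
    n = len(adj)
    reach = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def component_labels(adj):
    """Label each vertex with the smallest vertex index in its component.

    Valid for undirected graphs, where reachability is symmetric.
    """
    return [row.index(True) for row in transitive_closure(adj)]
```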

14.
A parallel memory system for efficient parallel array access using perfect Latin squares as skewing functions is discussed. Simple construction methods for building perfect Latin squares are presented. The resulting skewing scheme provides conflict-free access to several important subsets of an array. The address generation can be performed in constant time with simple circuitry. The skewing scheme can provide constant-time access to rows, columns, diagonals, and $N^{1/2} \times N^{1/2}$ subarrays of an $N \times N$ array with maximum memory utilization. Self-routing Benes networks can be used to realize the permutations needed between the processing elements and the memory modules. Two skewing schemes that provide conflict-free access to three-dimensional arrays are also discussed. Combined with self-routing Benes networks, these schemes provide efficient access to frequently used subsets of three-dimensional arrays.

15.
A method for model reduction of linear discrete systems is proposed. It is based on the impulse-response Gramian proposed by the authors (1989) for discrete systems. This Gramian is an extension of the one proposed for linear continuous systems and contains information on the input-output behavior of the system. The $r$th-order reduced-order models are made to retain the first $r$ Markov parameters and the first $r \times r$ elements of the impulse-response Gramian of the original system. The relation between this method and the $q$-Markov cover method is also discussed. The method is illustrated by a numerical example.

16.
Efficient parallel processing of image contours (total citations: 1; self-citations: 0; citations by others: 1)
Two parallel algorithms are described for ranking the pixels on a curve in $O(\log N)$ time using either an EREW or CREW PRAM model. The algorithms accomplish this with $N$ processors for a $\sqrt{N} \times \sqrt{N}$ image. After applying such an algorithm to an image, it is possible to move the pixels from a curve into processors having consecutive addresses. This is important because one can subsequently apply many algorithms to the curve (such as piecewise linear approximation algorithms or point-in-polygon tests) using segmented scan operations (i.e. parallel prefix operations). Scan operations can be executed in logarithmic time on many interconnection networks, such as hypercube, tree, butterfly, and shuffle-exchange machines, as well as on the EREW PRAM. The algorithms were implemented on the hypercube-structured Connection Machine, and various performance tests were conducted.
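
The segmented scan mentioned above can be pinned down by a sequential reference version: a running reduction that restarts wherever a segment begins. The sketch below only defines the result a parallel scan would have to produce; the name `segmented_scan` and the start-flag convention are assumptions, not taken from the paper.

```python
import operator


def segmented_scan(values, flags, op=operator.add):
    """Inclusive segmented scan (sequential reference semantics).

    flags[i] is truthy where a new segment starts; the running result
    restarts at each segment start. The first element always starts a
    segment, whatever its flag says.
    """
    out = []
    for value, starts_segment in zip(values, flags):
        if starts_segment or not out:
            acc = value
        else:
            acc = op(acc, value)
        out.append(acc)
    return out
```

Once the pixels of each curve sit in consecutive positions, one segment per curve lets a single scan process all curves at once, for example `segmented_scan(lengths, curve_starts)` for per-curve running arc length, where both argument names are hypothetical.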

17.
Using a directed acyclic graph (DAG) model of algorithms, the paper focuses on time-minimal multiprocessor schedules that use as few processors as possible. Such a processor-time-minimal scheduling of an algorithm's DAG is first illustrated using a triangular-shaped 2-D directed mesh (representing, for example, an algorithm for solving a triangular system of linear equations). Then, algorithms represented by an $n \times n \times n$ directed mesh are investigated. This cubical directed mesh is fundamental; it represents the standard algorithm for computing the matrix product as well as many other algorithms. Completion of the cubical mesh requires $3n-2$ steps. It is shown that the number of processing elements needed to achieve this time bound is at least $\lceil 3n^2/4 \rceil$. A systolic array for the cubical directed mesh is then presented. It completes the mesh using the minimum number of steps and exactly $\lceil 3n^2/4 \rceil$ processing elements, so it is processor-time-minimal. The systolic array's topology is that of a hexagonally shaped, cylindrically connected, 2-D directed mesh.
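
One way to see where the two figures come from (an illustration, not necessarily the paper's own argument): if node $(i, j, k)$ of the mesh depends on its predecessors along each axis, the longest dependency chain has $3n-2$ nodes, and in any schedule that finishes in $3n-2$ steps every node with $i+j+k = s$ must run at step $s$. The largest such "wavefront" is therefore a lower bound on the processor count. The sketch below counts wavefront sizes by brute force; the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def wavefront_sizes(n):
    """Number of mesh nodes (i, j, k) in {0..n-1}^3 on each plane i + j + k = s."""
    counts = Counter(i + j + k for i, j, k in product(range(n), repeat=3))
    return [counts[s] for s in range(3 * n - 2)]


def ceil_three_quarters_n_squared(n):
    """Integer ceiling of 3 * n^2 / 4, computed without floating point."""
    return -(-3 * n * n // 4)
```

For small $n$, `max(wavefront_sizes(n))` can be compared directly with `ceil_three_quarters_n_squared(n)`, and `len(wavefront_sizes(n))` with $3n-2$.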

18.
A closed-form expression has been reported in the literature for $L_N$, the number of digital line segments of length $N$ that correspond to lines of the form $y = \alpha x + \beta$, $0 \le \alpha, \beta < 1$. The authors prove an asymptotic estimate for $L_N$ that might prove useful for many applications, namely, $L_N = N^3/\pi^2 + O(N^2 \log N)$. An application to an image registration problem is given.

19.
Even though exact algorithms that require constant-size queues exist for permutation routing of $n^2$ messages on an $n \times n$ mesh of processors, the constants are very large and the algorithms are very complicated to implement. A novel, simple heuristic for the above problem is presented. It uses constant and very small queues (size = 2). For all the simulations run on randomly generated data, the number of routing steps required by the algorithm is almost equal to the maximum distance a packet has to travel. A pathological case is demonstrated where the routing takes more than the optimal number of steps, and it is proved that the upper bound on the number of required steps is $O(n^2)$. Furthermore, it is shown that the heuristic routes inversion, transposition, and rotations in optimal time; these are three special routing problems that appear very often in the design of parallel algorithms.

20.
A parallel algorithm is proposed for the two-dimensional discrete Fourier transform (2-D DFT) computation which eliminates interprocessor communications and uses only $O(N)$ processors. The mapping of the algorithm onto architectures with broadcast and report capabilities is discussed. Expressions are obtained for estimating the speed performance on these machines as a function of the size $N \times N$ of the 2-D DFT, the bandwidth of the communications channel, the time for an addition, the time $T(F_N)$ for a single processing element to perform an $N$-point DFT, and the degree of parallelism. For single I/O channel machines that are capable of exploiting the full degree of parallelism of the algorithm, attainable execution times are as low as the time $T(F_N)$ plus the I/O time for data upload and download. An implementation on a binary tree computer is discussed.
