首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
An algorithm for convolving a k×k window of weighting coefficients with an n×n image matrix on a pyramid computer of O(n2) processors in time O(logn+k2), excluding the time to load the image matrix, is presented. If k=Ω (√log n), which is typical in practice, the algorithm has a processor-time product O(n 2 k2) which is optimal with respect to the usual sequential algorithm. A feature of the algorithm is that the mechanism for controlling the transmission and distribution of data in each processor is finite state, independent of the values of n and k. Thus, for convolving two {0, 1}-valued matrices using Boolean operations rather than the typical sum and product operations, the processors of the pyramid computer are finite-state  相似文献   

2.
An adaptive parallel algorithm for inducing a priority queue structure on an n-element array is presented. The algorithm is extended to provide optimal parallel construction algorithms for three other heap-like structures useful in implementing double-ended priority queues, namely min-max heaps, deeps, and min-max-pair heaps. It is shown that an n-element array can be made into a heap, a deap, a min-max heap, or a min-max-pair heap in O(log n+(n /p)) time using no more than n/log n processors, in the exclusive-read-exclusive-write parallel random-access machine model  相似文献   

3.
An O(n2) time serial algorithm is developed for obtaining the medial axis transform (MAT) of an n×n image. An O(log n) time CREW PRAM algorithm and an O(log2 n) time SIMD hypercube parallel algorithm for the MAT are also developed. Both of these use O(n2) processors. Two problems associated with the MAT, the area and perimeter reporting problem, are studied. An O(log n) time hypercube algorithm is developed for both of them, where n is the number of squares in the MAT, and the algorithms use O(n2) processors  相似文献   

4.
The job scheduling problem in a partitionable mesh-connected system in which jobs require square meshes and the system is a square mesh whose size is a power of two is discussed. A heuristic algorithm of time complexity O(n(log n+log p)), in which n is the number of jobs to be scheduled and p is the size of the system is presented. The algorithm adopts the largest-job-first scheduling policy and uses a two-dimensional buddy system as the system partitioning scheme. It is shown that, in the worst case, the algorithm produces a schedule four times longer than an optimal schedule, and, on the average, schedules generated by the algorithm are twice as long as optimal schedules  相似文献   

5.
A hypercube algorithm to solve the list ranking problem is presented. Let n be the length of the list, and let p be the number of processors of the hypercube. The algorithm described runs in time O(n/p) when n=Ω(p 1+ε) for any constant ε>0, and in time O(n log n/p+log3 p) otherwise. This clearly attains a linear speedup when n=Ω(p 1+ε). Efficient balancing and routing schemes had to be used to achieve the linear speedup. The authors use these techniques to obtain efficient hypercube algorithms for many basic graph problems such as tree expression evaluation, connected and biconnected components, ear decomposition, and st-numbering. These problems are also addressed in the restricted model of one-port communication  相似文献   

6.
Parallel algorithms on SIMD (single-instruction stream multiple-data stream) machines for hierarchical clustering and cluster validity computation are proposed. The machine model uses a parallel memory system and an alignment network to facilitate parallel access to both pattern matrix and proximity matrix. For a problem with N patterns, the number of memory accesses is reduced from O(N 3) on a sequential machine to O(N2) on an SIMD machine with N PEs  相似文献   

7.
Two arrays of numbers sorted in nondecreasing order are given: an array A of size n and an array B of size m, where n<m. It is required to determine, for every element of A, the smallest element of B (if one exists) that is larger than or equal to it. It is shown how to solve this problem on the EREW PRAM (exclusive-read exclusive-write parallel random-access machine) in O(logm logn/log log m) time using n processors. The solution is then extended to the case in which fewer than n processors are available. This yields an EREW PRAM algorithm for the problem whose cost is O(n log m, which is O(m)) for nm/log m. It is shown how the solution obtained leads to an improved parallel merging algorithm  相似文献   

8.
A parallel sorting algorithm for sorting n elements evenly distributed over 2d p nodes of a d-dimensional hypercube is presented. The average running time of the algorithm is O((n log n)/p+p log 2n). The algorithm maintains a perfect load balance in the nodes by determining the (kn/p)th elements (k1,. . ., (p-1)) of the final sorted list in advance. These p-1 keys are used to partition the sorted sublists in each node to redistribute data to the nodes to be merged in parallel. The nodes finish the sort with an equal number of elements (n/ p) regardless of the data distribution. A parallel selection algorithm for determining the balanced partition keys in O(p log2n) time is presented. The speed of the sorting algorithm is further enhanced by the distance-d communication capability of the iPSC/2 hypercube computer and a novel conflict-free routing algorithm. Experimental results on a 16-node hypercube computer show that the sorting algorithm is competitive with the previous algorithms and faster for skewed data distributions  相似文献   

9.
Efficient parallel processing of image contours   总被引:1,自引:0,他引:1  
Describes two parallel algorithms for ranking the pixels on a curve in O (log N) time using either an EREW or CREW PRAM model. The algorithms accomplish this with N processors for a √N×√N image. After applying such an algorithm to an image, it is possible to move the pixels from a curve into processors having consecutive addresses. This is important because one can subsequently apply many algorithms to the curve (such as piecewise linear approximation algorithms or point in polygon tests) using segmented scan operations (i.e. parallel prefix operations). Scan operations can be executed in logarithmic time on many interconnection networks, such as hypercube, tree, butterfly, and shuffle exchange machines as well as on the EREW PRAM. The algorithms were implemented on the hypercube structured Connection Machine, and various performance tests were conducted  相似文献   

10.
An approach to vertical partitioning in relational databases in which the attributes of a relation are partitioned according to a set of transactions is proposed. The objective of vertical partitioning is to minimize the number of disk accesses in the system. Since transactions have more semantic meanings than attributes, this approach allows the optimization of the partitioning based on a selected set of important transactions. An optimal binary partitioning (OBP) algorithm based on the branch and bound method is presented, with the worst case complexity of O(2n), where n is the number of transactions. To handle systems with a large number of transactions, an algorithm BPi with complexity varying from O(n) to O(2n) is also developed. The experimental results reveal that the performance of vertical partitioning is sensitive to the skewness of transaction accesses. Further, BPi converges rather rapidly to OBP. Both OBP and BPi yield results comparable with that of global optimum obtained from an exhaustive search  相似文献   

11.
Squared error clustering algorithms for single-instruction multiple-data (SIMD) hypercubes are presented. The algorithms are shown to be asymptotically faster than previously known algorithms and require less memory per processing element (PE). For a clustering problem with N patterns, M features per pattern, and K clusters, the algorithms complete in O(k+log NM ) steps on NM processor hypercubes. This is optimal up to a constant factor. These results are extended to the case in which NMK processors are available. Experimental results from a multiple-instruction, multiple-data (MIMD) medium-grain hypercube are also presented  相似文献   

12.
A family of intervals on the real line provides a natural model for a vast number of scheduling and VLSI problems. Recently, a number of parallel algorithms to solve a variety of practical problems on such a family of intervals have been proposed in the literature. The authors develop computational tools and show how they can be used for the purpose of devising cost-optimal parallel algorithms for a number of interval-related problems, including finding a largest subset of pairwise nonoverlapping intervals, a minimum dominating subset of intervals, along with algorithms to compute the shortest path between a pair of intervals and, based on the shortest path, a parallel algorithm to find the center of the family of intervals. More precisely, with an arbitrary family of n intervals as input, all the algorithms run in O(log n) time using O(n) processors in the EREW-PRAM model of computation  相似文献   

13.
Performing reduction operations with distributed memory machines whose interconnection networks are reconfigurable is considered. The focus is on machines whose interconnection graph can be configured as any graph of maximum degree d. The best way of interconnecting the p processors as a function of p,d and some problem- and machine-dependent parameters that characterize the ratio communication/arithmetic for the reduction operation are discussed. Experiments on transputer-based networks are in good accordance with the theoretical results  相似文献   

14.
A distributed knot detection algorithm for general graphs is presented. The knot detection algorithm uses at most O(n log n+m) messages and O(m+n log n) bits of memory to detect all knots' nodes in the network (where n is the number of nodes and m is the number of links). This is compared to O(n2) messages needed in the best algorithm previously published. The knot detection algorithm makes use of efficient cycle detection and clustering techniques. Various applications for the knot detection algorithms are presented. In particular, its importance to deadlock detection in store and forward communication networks and in transaction systems is demonstrated  相似文献   

15.
Exact analytical expressions are obtained for the likelihood and likelihood gradient stationary autoregressive moving average (ARMA) models. Denote the sample size by N, the autoregressive order by p, and the moving average order by q. The calculation of the likelihood requires (p+2q+1)N +o(N) multiply-add operations, and the calculation of the likelihood gradient requires (2p+6q+2)N+o(N) multiply-add operations. These expressions may be used to obtain an iterative, Newton-Raphson-type converging algorithm, with superlinear convergence rate, that computes the maximum-likelihood estimator in (2 p+6q+2)N+o(N) multiply-add operations per iteration  相似文献   

16.
A novel discrete relaxation architecture   总被引:1,自引:0,他引:1  
The discrete relaxation algorithm (DRA) is a computational technique that enforces arc consistency (AC) in a constraint satisfaction problem (CSP). The original sequential AC-1 algorithm suffers from O(n3m3) time complexity, and even the optimal sequential AC-4 algorithm is O (n2m2) for an n-object and m-label DRA problem. Sample problem runs show that these algorithms are all too slow to meet the need for any useful, real-time CSP applications. A parallel DRA5 algorithm that reaches a lower bound of O(nm) (where the number of processors is polynomial in the problem size) is given. A fine-grained, massively parallel hardware computer architecture has been designed for the DRA5 algorithm. For practical problems, many orders of magnitude of efficiency improvement can be reached on such a hardware architecture  相似文献   

17.
Several fast and space-optimal sequential and parallel algorithms for solving the satisfaction problem of functional and multivalued dependencies (FDs and MVDs) are presented. Two frameworks to verify an MVD for a relation and their implementation by exploring the existing fast space-optimal sorting techniques are described. The space optimality means that only a constant amount of extra memory space is needed for the sequential implementations, and O(M) amount of extra memory space for parallel algorithms that use M processors. This feature makes the algorithms attractive whenever space is a critical resource and I/O transfers should be reduced to the minimal, as is often the case for relational database systems. The time requirements for in-place FD and MVD verification are given in terms of M and of N, which is the number of tuples in a relation. The effect of relation modification on FD and MVD verification is examined  相似文献   

18.
For the comparison-based self-diagnosis of multiprocessor systems, an extended model that considers both processor and comparator faults is presented. It is shown that in this model the system diagnosability is tZδ/2Z, where δ is the minimum vertex degree of the system graph. However, if the number of faulty comparators is assumed not to exceed the number of faulty processors, the diagnosability of the model reaches t⩽δ. An optimal O(|E|) algorithm, where E is the set of comparators, is given for identifying all faulty processors and comparators, provided that the total number of faulty components does not exceed the system diagnosability, and an O(|E|)2 algorithm for the case t⩽δ is also presented. These efficient algorithms determine the faulty processors by calculating each processor's weight, which is mainly defined by the number of adjacent relative tests stating `agreement'. After sorting the processors according to their weights, the algorithms determine all faulty components by separating the sorted processor list  相似文献   

19.
The problem of distributed leader election in an asynchronous complete network, in the presence of faults that occurred prior to the execution of the election algorithm, is discussed. Failures of this type are encountered, for example, during a recovery from a crash in the network. For a network with n processors, k of which start the algorithm that uses at most O(n log k +n+kt) messages is presented and shown to be optimal. An optimal algorithm for the case where the identities of the neighbors are known is also presented. It is noted that the order of the message complexity of a t-resilient algorithm is not always higher than that of a nonresilient one. The t-resilient algorithm is a systematic modification of an existing algorithm for a fault-free network  相似文献   

20.
Parallel algorithms for several important combinatorial problems such as the all nearest smaller values problem, triangulating a monotone polygon, and line packing are presented. These algorithms achieve linear speedups on the pipelined hypercube, and provably optimal speedups on the shuffle-exchange and the cube-connected-cycles for any number p of processors satisfying 1⩽pn/((log3n)(loglog n)2), where n is the input size. The lower bound results are established under no restriction on how the input is mapped into the local memories of the different processors  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号