
1.

Two-dimensional convolution on a pyramid computer

Chang J.H. Ibarra O.H. Pong T.-C. Sohn S.M. 《IEEE transactions on pattern analysis and machine intelligence》1988,10(4):590-593

An algorithm is presented for convolving a k×k window of weighting coefficients with an n×n image matrix on a pyramid computer of O(n²) processors in time O(log n + k²), excluding the time to load the image matrix. If k = Ω(√(log n)), which is typical in practice, the algorithm has a processor-time product of O(n²k²), which is optimal with respect to the usual sequential algorithm. A feature of the algorithm is that the mechanism for controlling the transmission and distribution of data in each processor is finite-state, independent of the values of n and k. Thus, for convolving two {0, 1}-valued matrices using Boolean operations rather than the typical sum and product operations, the processors of the pyramid computer are finite-state.
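For reference, the "usual sequential algorithm" that the O(n²k²) bound is measured against can be sketched as follows. This is a minimal illustration, not the pyramid algorithm from the paper; the function name `convolve2d`, the zero padding at the image border, and the choice not to flip the window are all assumptions made for the example.

```python
def convolve2d(image, window):
    """Direct windowed sum of an n x n image with a k x k window.

    Assumptions (not from the abstract): pixels outside the image count
    as zero, and the window is applied without flipping.
    """
    n = len(image)
    k = len(window)
    half = k // 2
    out = [[0] * n for _ in range(n)]
    for i in range(n):              # n^2 output pixels ...
        for j in range(n):
            acc = 0
            for a in range(k):      # ... times k^2 window entries each
                for b in range(k):
                    r, c = i + a - half, j + b - half
                    if 0 <= r < n and 0 <= c < n:
                        acc += window[a][b] * image[r][c]
            out[i][j] = acc
    return out


ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
result = convolve2d(ones, ones)
```

The four nested loops make the O(n²k²) operation count visible. Replacing `+` and `*` with Boolean `or` and `and` would give the {0, 1}-valued variant mentioned in the abstract.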

2.

A novel discrete relaxation architecture (cited by: 1; self-citations: 0; citations by others: 1)

Gu J. Wang W. 《IEEE transactions on pattern analysis and machine intelligence》1992,14(8):857-865

The discrete relaxation algorithm (DRA) is a computational technique that enforces arc consistency (AC) in a constraint satisfaction problem (CSP). The original sequential AC-1 algorithm suffers from O(n³m³) time complexity, and even the optimal sequential AC-4 algorithm is O(n²m²) for an n-object and m-label DRA problem. Sample problem runs show that these algorithms are all too slow to meet the needs of useful real-time CSP applications. A parallel DRA5 algorithm that reaches a lower bound of O(nm) (where the number of processors is polynomial in the problem size) is given. A fine-grained, massively parallel hardware computer architecture has been designed for the DRA5 algorithm. For practical problems, many orders of magnitude of efficiency improvement can be reached on such a hardware architecture.
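To make "enforcing arc consistency" concrete, here is a minimal sketch in the style of the sequential AC-1 baseline: every arc is revised repeatedly until a full pass removes no label. It does not represent DRA5 or the proposed hardware. The names `revise`, `ac1`, and `allowed`, and the representation of label sets as Python sets, are assumptions for illustration.

```python
def revise(domains, allowed, x, y):
    """Remove labels of x that no label of y supports; report whether any were removed."""
    removed = False
    for a in list(domains[x]):
        if not any(allowed(x, a, y, b) for b in domains[y]):
            domains[x].remove(a)
            removed = True
    return removed


def ac1(domains, arcs, allowed):
    """Sweep over all arcs until one complete sweep changes nothing."""
    changed = True
    while changed:
        changed = False
        for x, y in arcs:
            if revise(domains, allowed, x, y):
                changed = True
    return domains


# Hypothetical two-object problem with the single constraint x < y.
def less_than(u, a, v, b):
    return a < b if u == "x" else a > b


doms = ac1({"x": {1, 2, 3}, "y": {1, 2, 3}},
           [("x", "y"), ("y", "x")],
           less_than)
```

In this toy problem the label 3 is removed from `x` and the label 1 from `y`, because neither has a supporting label on the other side of the constraint.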

3.

An efficient distributed knot detection algorithm

Cidon I. 《IEEE transactions on pattern analysis and machine intelligence》1989,15(5):644-649

A distributed knot detection algorithm for general graphs is presented. The knot detection algorithm uses at most O(n log n + m) messages and O(m + n log n) bits of memory to detect all knots' nodes in the network (where n is the number of nodes and m is the number of links). This is compared to O(n²) messages needed in the best algorithm previously published. The knot detection algorithm makes use of efficient cycle detection and clustering techniques. Various applications for the knot detection algorithms are presented. In particular, its importance to deadlock detection in store and forward communication networks and in transaction systems is demonstrated.

4.

Comments on `Parallel algorithms for hierarchical clustering and cluster validity'

Murtagh F. 《IEEE transactions on pattern analysis and machine intelligence》1992,14(10):1056-1057

In the above-titled paper (ibid., vol.12, no.11, p.1088-92, Nov. 1990), parallel implementations of hierarchical clustering algorithms that achieve O(n²) computational time complexity and thereby improve on the baseline of sequential implementations are described. The latter are stated to be O(n³), with the exception of the single-link method. The commenter points out that state-of-the-art hierarchical clustering algorithms have O(n²) time complexity and should be referred to in preference to the O(n³) algorithms, which were described in many texts in the 1970s. Some further references in the parallelizing of hierarchic clustering algorithms are provided.

5.

Odd even shifts in SIMD hypercubes

Ranka S. Sahni S. 《Parallel and Distributed Systems, IEEE Transactions on》1990,1(1):77-82

A linear-time algorithm is developed to perform all odd (even) length circular shifts of data in an SIMD (single-instruction-stream, multiple-data-stream) hypercube. As an application, the algorithm is used to obtain an O(M² + log N) time and O(1) memory per processor algorithm to compute the two-dimensional convolution of an N×N image and an M×M template on an N² processor SIMD hypercube. This improves the previous best complexity of O(M² log M + log N).

6.

Efficient algorithms for list ranking and for solving graph problems on the hypercube

Ryu K.W. Jaja J. 《Parallel and Distributed Systems, IEEE Transactions on》1990,1(1):83-90

A hypercube algorithm to solve the list ranking problem is presented. Let n be the length of the list, and let p be the number of processors of the hypercube. The algorithm described runs in time O(n/p) when n = Ω(p^(1+ε)) for any constant ε > 0, and in time O(n log n/p + log³ p) otherwise. This clearly attains a linear speedup when n = Ω(p^(1+ε)). Efficient balancing and routing schemes had to be used to achieve the linear speedup. The authors use these techniques to obtain efficient hypercube algorithms for many basic graph problems such as tree expression evaluation, connected and biconnected components, ear decomposition, and st-numbering. These problems are also addressed in the restricted model of one-port communication.

7.

Constant time algorithms for the transitive closure and some related graph problems on processor arrays with reconfigurable bus systems

Wang B.-F. Chen G.-H. 《Parallel and Distributed Systems, IEEE Transactions on》1990,1(4):500-507

The transitive closure problem is solved in O(1) time by a new method that differs substantially from the conventional solution method. On processor arrays with reconfigurable bus systems, two O(1) time algorithms are proposed for computing the transitive closure of an undirected graph. One is designed on a three-dimensional n×n×n processor array with a reconfigurable bus system, and the other is designed on a two-dimensional n²×n² processor array with a reconfigurable bus system, where n is the number of vertices in the graph. Using the O(1) time transitive closure algorithms, many other graph problems are solved in O(1) time. These problems include recognizing bipartite graphs and finding connected components, articulation points, biconnected components, bridges, and minimum spanning trees in undirected graphs.
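As a point of comparison, a conventional sequential way to compute a transitive closure is Warshall's algorithm, sketched below. It is not the reconfigurable-bus method of the paper; the function name and the adjacency-matrix input are assumptions for illustration, and every vertex is treated as reaching itself.

```python
def transitive_closure(adj):
    """Warshall's algorithm on an n x n adjacency matrix of 0/1 (or bool) entries."""
    n = len(adj)
    reach = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    for k in range(n):                  # allow k as an intermediate vertex
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


# Undirected path 0 - 1 - 2 plus an isolated vertex 3.
graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
closure = transitive_closure(graph)
```

For an undirected graph, two vertices are related in the closure exactly when they lie in the same connected component, which is why connected components follow directly from the closure.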

8.

An efficient heuristic for permutation packet routing on meshes with low buffer requirements

Makedon F. Symvonis A. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(3):270-276

Even though exact algorithms exist for permutation routing of n² messages on an n×n mesh of processors which require constant size queues, the constants are very large and the algorithms very complicated to implement. A novel, simple heuristic for the above problem is presented. It uses constant and very small size queues (size = 2). For all the simulations run on randomly generated data, the number of routing steps that is required by the algorithm is almost equal to the maximum distance a packet has to travel. A pathological case is demonstrated where the routing takes more than the optimal, and it is proved that the upper bound on the number of required steps is O(n²). Furthermore, it is shown that the heuristic routes in optimal time inversion, transposition, and rotations, three special routing problems that appear very often in the design of parallel algorithms.

9.

Designing efficient parallel algorithms on mesh-connected computers with multiple broadcasting

Chen Y.-C. Chen W.-T. Chen G.-H. Sheu J.-P. 《Parallel and Distributed Systems, IEEE Transactions on》1990,1(2):241-246

Semigroup and prefix computations on two-dimensional mesh-connected computers with multiple broadcasting (2-MCCMBs) are studied. Previously, only square 2-MCCMBs with N processing elements were considered for semigroup computations of N data items, and O(N^(1/6)) time was required. It is found that square machines are not the best form for semigroup computations, and an O(N^(1/8))-time algorithm is derived on an N^(5/8) × N^(3/8) rectangular 2-MCCMB. This time complexity can be further reduced to O(N^(1/9)) if fewer processing elements are used. Parallel algorithms for prefix computations with the same time complexities are derived.

10.

Parallel binary search

Akl S.G. Meijer H. 《Parallel and Distributed Systems, IEEE Transactions on》1990,1(2):247-250

Two arrays of numbers sorted in nondecreasing order are given: an array A of size n and an array B of size m, where n < m. It is required to determine, for every element of A, the smallest element of B (if one exists) that is larger than or equal to it. It is shown how to solve this problem on the EREW PRAM (exclusive-read exclusive-write parallel random-access machine) in O(log m log n/log log m) time using n processors. The solution is then extended to the case in which fewer than n processors are available. This yields an EREW PRAM algorithm for the problem whose cost is O(n log m), which is O(m) for n ≤ m/log m. It is shown how the solution obtained leads to an improved parallel merging algorithm.
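The problem statement itself can be sketched sequentially with one binary search per element of A, for a total of O(n log m) comparisons. This only illustrates the problem being solved, not the EREW PRAM algorithm; the function name `successors` and the use of `None` for "no such element" are assumptions.

```python
from bisect import bisect_left


def successors(A, B):
    """For each a in sorted A, return the smallest element of sorted B that is >= a, or None."""
    result = []
    for a in A:
        idx = bisect_left(B, a)     # first position in B whose value is >= a
        result.append(B[idx] if idx < len(B) else None)
    return result


answer = successors([1, 4, 9], [2, 4, 6, 8])
```

Here 1 maps to 2, 4 maps to 4 itself, and 9 has no element of B at or above it.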

11.

Optimal parallel initialization algorithms for a class of priority queues

Olariu S. Wen Z. 《Parallel and Distributed Systems, IEEE Transactions on》1991,2(4):423-429

An adaptive parallel algorithm for inducing a priority queue structure on an n-element array is presented. The algorithm is extended to provide optimal parallel construction algorithms for three other heap-like structures useful in implementing double-ended priority queues, namely min-max heaps, deaps, and min-max-pair heaps. It is shown that an n-element array can be made into a heap, a deap, a min-max heap, or a min-max-pair heap in O(log n + (n/p)) time using no more than n/log n processors, in the exclusive-read-exclusive-write parallel random-access machine model.
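"Inducing a priority queue structure on an array" can be illustrated with the classic sequential bottom-up heap construction, sketched here for an ordinary min-heap. This is not the parallel algorithm of the paper and does not cover deaps or min-max heaps; the function name is an assumption.

```python
def build_min_heap(a):
    """Rearrange list a in place so that a[(i - 1) // 2] <= a[i] for every i >= 1."""
    n = len(a)
    for start in range(n // 2 - 1, -1, -1):   # last internal node down to the root
        i = start
        while True:                           # sift a[i] down to its place
            left, right, smallest = 2 * i + 1, 2 * i + 2, i
            if left < n and a[left] < a[smallest]:
                smallest = left
            if right < n and a[right] < a[smallest]:
                smallest = right
            if smallest == i:
                break
            a[i], a[smallest] = a[smallest], a[i]
            i = smallest
    return a


heap = build_min_heap([5, 3, 8, 1, 9, 2])
```

The sequential construction runs in O(n) time, which is the work bound that an optimal parallel construction has to match.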

12.

A processor-time-minimal systolic array for transitive closure

Scheiman C.J. Cappello P.R. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(3):257-269

Using a directed acyclic graph (DAG) model of algorithms, the authors focus on processor-time-minimal multiprocessor schedules: time-minimal multiprocessor schedules that use as few processors as possible. The Kung, Lo, and Lewis (KLL) algorithm for computing the transitive closure of a relation over a set of n elements requires at least 5n-4 parallel steps. As originally reported, their systolic array comprises n² processing elements. It is shown that any time-minimal multiprocessor schedule of the KLL algorithm's DAG needs at least n²/3 processing elements. Then a processor-time-minimal systolic array realizing the KLL DAG is constructed. Its processing elements are organized as a cylindrically connected 2-D mesh when n = 0 mod 3. When n ≠ 0 mod 3, the 2-D mesh is connected as a torus.

13.

Memory and processing architecture for 3D voxel-based imagery (cited by: 1; self-citations: 0; citations by others: 1)

Kaufman A. Bakalash R. 《Computer Graphics and Applications, IEEE》1988,8(6):10-23

A versatile voxel-based architecture for 3-D volume visualization, called the Cube architecture, is introduced. A small-scale prototype of the architecture has been realized in hardware and has been operating in true real-time, faster than the alternative voxel systems. The Cube architecture is centered around a 3-D cubic frame buffer of voxels, and it includes three processors that access the frame buffer to input sampled and synthetic data, to manipulate the 3-D images, and to project and render them. To cope with the huge quantity of voxels and still perform in real-time, two special features were incorporated within the architecture: a unique skewed memory organization, which permits the retrieval and storage of voxels in parallel, and a multiple-write bus, which speeds up the viewing process. These features allow Cube, for example, to project an image of n³ voxels in O(n² log n) time rather than the conventional O(n³) time.

14.

A processor-time-minimal systolic array for cubical mesh algorithms

Cappello P. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(1):4-13

Using a directed acyclic graph (DAG) model of algorithms, the paper focuses on time-minimal multiprocessor schedules that use as few processors as possible. Such a processor-time-minimal scheduling of an algorithm's DAG is first illustrated using a triangular shaped 2-D directed mesh (representing, for example, an algorithm for solving a triangular system of linear equations). Then, algorithms represented by an n×n×n directed mesh are investigated. This cubical directed mesh is fundamental; it represents the standard algorithm for computing matrix product as well as many other algorithms. Completion of the cubical mesh requires 3n-2 steps. It is shown that the number of processing elements needed to achieve this time bound is at least ⌈3n²/4⌉. A systolic array for the cubical directed mesh is then presented. It completes the mesh using the minimum number of steps and exactly ⌈3n²/4⌉ processing elements; it is processor-time-minimal. The systolic array's topology is that of a hexagonally shaped, cylindrically connected, 2-D directed mesh.
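The two figures quoted in the abstract, 3n-2 steps and ⌈3n²/4⌉ processing elements, can be checked numerically on small meshes. The sketch below assumes that node (i, j, k) of the n×n×n directed mesh depends on (i-1, j, k), (i, j-1, k), and (i, j, k-1); the abstract does not spell this out. Under that assumption every node lies on a longest path, so a time-minimal schedule must run node (i, j, k) at step i + j + k, and the widest such level gives the processor count. The function name is hypothetical.

```python
from collections import Counter


def mesh_schedule_stats(n):
    """Return (number of steps, widest level) for the assumed n x n x n directed mesh."""
    widths = Counter(i + j + k
                     for i in range(n)
                     for j in range(n)
                     for k in range(n))
    steps = len(widths)                 # levels 0 .. 3(n - 1)
    processors = max(widths.values())   # nodes that must run in the same step
    return steps, processors


stats = {n: mesh_schedule_stats(n) for n in range(1, 8)}
```

For n = 3, for example, this gives 7 steps and 7 processing elements, matching 3n-2 = 7 and ⌈27/4⌉ = 7.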

15.

Fast image labeling using local operators on mesh-connected computers

Alnuweiri H.M. Kumar V.K.P. 《IEEE transactions on pattern analysis and machine intelligence》1991,13(2):202-207

A new parallel algorithm is proposed for fast image labeling using local operators on image pixels. The algorithm can be implemented on an n×n mesh-connected computer such that, for any integer k in the range [1, log(2n)], the algorithm requires Θ(kn^(1/k)) bits of local memory per processor and takes Θ(kn) time. Bit-serial processors and communication links can be used without affecting the asymptotic time complexity of the algorithm. The time complexity of the algorithm has very small leading constant factors, which makes it superior to previous mesh computer labeling algorithms for most practical image sizes (e.g. up to 4096×4096 images). Furthermore, the algorithm is based on using stacks that can be realized using very fast shift registers within each processing element.
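For readers unfamiliar with the task, image labeling assigns one label to each connected group of foreground pixels. A minimal sequential sketch using breadth-first flood fill follows. It is not the mesh algorithm of the paper, and the choice of 4-connectivity, the label numbering, and the function name are assumptions.

```python
from collections import deque


def label_components(image):
    """Label 4-connected regions of nonzero pixels with 1, 2, ...; background stays 0."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] and not labels[r][c]:
                next_label += 1                 # found an unlabeled region
                labels[r][c] = next_label
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels


labeled = label_components([
    [1, 1, 0],
    [0, 0, 0],
    [0, 1, 1],
])
```

The two separate pixel groups in the example receive the labels 1 and 2.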

16.

Balanced parallel sort on hypercube multiprocessors

Abali B. Ozguner F. Bataineh A. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(5):572-581

A parallel sorting algorithm for sorting n elements evenly distributed over p = 2^d nodes of a d-dimensional hypercube is presented. The average running time of the algorithm is O((n log n)/p + p log² n). The algorithm maintains a perfect load balance in the nodes by determining the (kn/p)th elements (k = 1, ..., p-1) of the final sorted list in advance. These p-1 keys are used to partition the sorted sublists in each node to redistribute data to the nodes to be merged in parallel. The nodes finish the sort with an equal number of elements (n/p) regardless of the data distribution. A parallel selection algorithm for determining the balanced partition keys in O(p log² n) time is presented. The speed of the sorting algorithm is further enhanced by the distance-d communication capability of the iPSC/2 hypercube computer and a novel conflict-free routing algorithm. Experimental results on a 16-node hypercube computer show that the sorting algorithm is competitive with the previous algorithms and faster for skewed data distributions.
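The role of the p-1 partition keys can be seen in a small sequential sketch: the key for index k is the (kn/p)th smallest element, so the keys cut the sorted order into p blocks of n/p elements each. The sketch simply sorts everything first, which the parallel selection algorithm of the paper avoids; the function name and the assumption that p divides n are for illustration only.

```python
def partition_keys(values, p):
    """Return the (k*n/p)-th smallest elements for k = 1 .. p-1 (1-based ranks).

    Assumes p divides len(values).
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[k * n // p - 1] for k in range(1, p)]


data = [9, 16, 3, 12, 1, 7, 14, 5, 10, 2, 15, 8, 4, 13, 6, 11]
keys = partition_keys(data, 4)
block_sizes = [
    sum(1 for v in data if lo < v <= hi)
    for lo, hi in zip([0] + keys, keys + [16])
]
```

With 16 distinct values and p = 4, the keys are 4, 8, and 12, and each of the four resulting blocks holds exactly four elements.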

17.

Optimal distributed t-resilient election in complete networks

Itai A. Kutten S. Wolfstahl Y. Zaks S. 《IEEE transactions on pattern analysis and machine intelligence》1990,16(4):415-420

The problem of distributed leader election in an asynchronous complete network, in the presence of faults that occurred prior to the execution of the election algorithm, is discussed. Failures of this type are encountered, for example, during a recovery from a crash in the network. For a network with n processors, k of which start the algorithm, an algorithm that uses at most O(n log k + n + kt) messages is presented and shown to be optimal. An optimal algorithm for the case where the identities of the neighbors are known is also presented. It is noted that the order of the message complexity of a t-resilient algorithm is not always higher than that of a nonresilient one. The t-resilient algorithm is a systematic modification of an existing algorithm for a fault-free network.

18.

Optimal algorithms on the pipelined hypercube and related networks

JaJa J. Ryu K.W. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(5):582-591

Parallel algorithms for several important combinatorial problems such as the all nearest smaller values problem, triangulating a monotone polygon, and line packing are presented. These algorithms achieve linear speedups on the pipelined hypercube, and provably optimal speedups on the shuffle-exchange and the cube-connected-cycles for any number p of processors satisfying 1 ≤ p ≤ n/((log³ n)(log log n)²), where n is the input size. The lower bound results are established under no restriction on how the input is mapped into the local memories of the different processors.

19.

Optimal broadcasting on the star graph (cited by: 2; self-citations: 0; citations by others: 2)

Mendia V.E. Sarkar D. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(4):389-396

The star graph has been shown to be an attractive alternative to the widely used n-cube. Like the n-cube, the star graph possesses rich structure and symmetry as well as fault tolerant capabilities, but has a smaller diameter and degree. However, very few algorithms exist to show its potential as a multiprocessor interconnection network. Many fast and efficient parallel algorithms require broadcasting as a basic step. An optimal algorithm for one-to-all broadcasting in the star graph is proposed. The algorithm can broadcast a message to N processors in O(log₂ N) time. The algorithm exploits the rich structure of the star graph and works by recursively partitioning the original star graph into smaller star graphs. In addition, an optimal all-to-all broadcasting algorithm is developed.

20.

Partitioning and labeling of loops by unimodular transformations

D'Hollander E.H. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(4):465-476

A general method for the identification of the independent subsets in loops with constant dependence vectors is presented. It is shown that the dependence relation remains invariant under a unimodular transformation. Then a unimodular transformation is used to bring the dependence matrix into a form where the independent subsets are obtained by a direct and inexpensive partitioning algorithm. This leads to a procedure for the automatic conversion of a serial loop into a nest of parallel DO-ALL loops. Another unimodular transformation results in an algorithm to label the dependent iterations of an n-fold nested loop in O(n²) time. This provides a multithreaded dynamic scheduling scheme requiring only one fork and one join primitive.