
1.

Ranka S. Sahni S. 《Parallel and Distributed Systems, IEEE Transactions on》1990,1(1):77-82

A linear-time algorithm is developed to perform all odd (even) length circular shifts of data in an SIMD (single-instruction-stream, multiple-data-stream) hypercube. As an application, the algorithm is used to obtain an O(M² + log N) time and O(1) memory per processor algorithm to compute the two-dimensional convolution of an N×N image and an M×M template on an N² processor SIMD hypercube. This improves on the previous best complexity of O(M² log M + log N).

2.

Designing efficient parallel algorithms on mesh-connected computers with multiple broadcasting

Chen Y.-C. Chen W.-T. Chen G.-H. Sheu J.-P. 《Parallel and Distributed Systems, IEEE Transactions on》1990,1(2):241-246

Semigroup and prefix computations on two-dimensional mesh-connected computers with multiple broadcasting (2-MCCMBs) are studied. Previously, only square 2-MCCMBs with N processing elements were considered for semigroup computations of N data items, and O(N^(1/6)) time was required. It is found that square machines are not the best form for semigroup computations, and an O(N^(1/8))-time algorithm is derived on an N^(5/8)×N^(3/8) rectangular 2-MCCMB. This time complexity can be further reduced to O(N^(1/9)) if fewer processing elements are used. Parallel algorithms for prefix computations with the same time complexities are derived.

3.

Convolution on mesh connected multicomputers

Ranka S. Sahni S. 《IEEE transactions on pattern analysis and machine intelligence》1990,12(3):315-318

An efficient parallel algorithm is presented for convolution on a mesh-connected computer with wraparound. The algorithm does not require a broadcast feature for data values, as assumed by previously proposed algorithms. As a result, the algorithm is applicable to both SIMD and MIMD meshes. For an N×N image and an M×M template, the previous algorithms take O(M²q) time on an N×N mesh-connected multicomputer (q is the number of bits in each entry of the convolution matrix). The proposed algorithms have complexity O(M²r), where r = max{number of bits in an image entry, number of bits in a template entry}. In addition to not requiring a broadcast capability, these algorithms are faster for binary images.
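
As a point of reference for the operation being parallelized, the following is a minimal sequential sketch of two-dimensional convolution with wraparound. The function name `convolve_wraparound` and the indexing convention (the template is anchored at its top-left entry and indices wrap modulo N) are assumptions made for illustration; the abstract does not specify them, and this is not the mesh algorithm itself.

```python
def convolve_wraparound(image, template):
    """Sequential 2-D convolution of an N x N image with an M x M template.

    Assumed convention (not from the abstract):
        result[i][j] = sum over a, b of
            image[(i + a) mod N][(j + b) mod N] * template[a][b]
    """
    n = len(image)
    m = len(template)
    result = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0
            for a in range(m):
                for b in range(m):
                    # Indices wrap around, as on a mesh with wraparound links.
                    total += image[(i + a) % n][(j + b) % n] * template[a][b]
            result[i][j] = total
    return result
```

This sketch performs N²M² multiply-add steps in total, which is the sequential workload that an N×N mesh would spread over its N² processors.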

4.

Serial and parallel algorithms for the medial axis transform

Jenq J.-F. Sahni S. 《IEEE transactions on pattern analysis and machine intelligence》1992,14(12):1218-1224

An O(n²) time serial algorithm is developed for obtaining the medial axis transform (MAT) of an n×n image. An O(log n) time CREW PRAM algorithm and an O(log² n) time SIMD hypercube parallel algorithm for the MAT are also developed. Both of these use O(n²) processors. Two problems associated with the MAT, the area and perimeter reporting problems, are studied. An O(log n) time hypercube algorithm is developed for both of them, where n is the number of squares in the MAT, and the algorithms use O(n²) processors.

5.

Comments on `Parallel algorithms for hierarchical clustering and cluster validity'

Murtagh F. 《IEEE transactions on pattern analysis and machine intelligence》1992,14(10):1056-1057

In the above-titled paper (ibid., vol.12, no.11, p.1088-92, Nov. 1990), parallel implementations of hierarchical clustering algorithms that achieve O(n²) computational time complexity and thereby improve on the baseline of sequential implementations are described. The latter are stated to be O(n³), with the exception of the single-link method. The commenter points out that state-of-the-art hierarchical clustering algorithms have O(n²) time complexity and should be referred to in preference to the O(n³) algorithms, which were described in many texts in the 1970s. Some further references on the parallelization of hierarchical clustering algorithms are provided.

6.

Two-dimensional convolution on a pyramid computer

Chang J.H. Ibarra O.H. Pong T.-C. Sohn S.M. 《IEEE transactions on pattern analysis and machine intelligence》1988,10(4):590-593

An algorithm for convolving a k×k window of weighting coefficients with an n×n image matrix on a pyramid computer of O(n²) processors in time O(log n + k²), excluding the time to load the image matrix, is presented. If k = Ω(√(log n)), which is typical in practice, the algorithm has a processor-time product O(n²k²), which is optimal with respect to the usual sequential algorithm. A feature of the algorithm is that the mechanism for controlling the transmission and distribution of data in each processor is finite state, independent of the values of n and k. Thus, for convolving two {0, 1}-valued matrices using Boolean operations rather than the typical sum and product operations, the processors of the pyramid computer are finite-state.

7.

A novel discrete relaxation architecture (total citations: 1; self-citations: 0; citations by others: 1)

Gu J. Wang W. 《IEEE transactions on pattern analysis and machine intelligence》1992,14(8):857-865

The discrete relaxation algorithm (DRA) is a computational technique that enforces arc consistency (AC) in a constraint satisfaction problem (CSP). The original sequential AC-1 algorithm suffers from O(n³m³) time complexity, and even the optimal sequential AC-4 algorithm is O(n²m²) for an n-object and m-label DRA problem. Sample problem runs show that these algorithms are all too slow to meet the need for any useful, real-time CSP applications. A parallel DRA5 algorithm that reaches a lower bound of O(nm) (where the number of processors is polynomial in the problem size) is given. A fine-grained, massively parallel hardware computer architecture has been designed for the DRA5 algorithm. For practical problems, many orders of magnitude of efficiency improvement can be reached on such a hardware architecture.
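
To make the notion of enforcing arc consistency concrete, here is a minimal sequential sketch in the style of AC-1: it repeatedly revises every arc until no label is removed. The data layout (a dict of label sets, and a dict mapping each directed arc to its set of allowed label pairs) and the names `revise` and `ac1` are assumptions for illustration; this is not the DRA5 algorithm or the hardware design described in the abstract.

```python
def revise(domains, allowed, x, y):
    """Drop labels of x that no label of y supports; return True if any was dropped."""
    unsupported = {
        a for a in domains[x]
        if not any((a, b) in allowed[(x, y)] for b in domains[y])
    }
    domains[x] -= unsupported
    return bool(unsupported)


def ac1(domains, allowed):
    """Enforce arc consistency by sweeping all arcs until a fixed point.

    `allowed` must contain both directions, (x, y) and (y, x), of every
    constraint. `domains` is modified in place and also returned.
    """
    changed = True
    while changed:
        changed = False
        for (x, y) in allowed:
            if revise(domains, allowed, x, y):
                changed = True
    return domains
```

For example, with objects A and B, labels {1, 2, 3}, and the constraint A < B, the sweep leaves A with {1, 2} and B with {2, 3}.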

8.

An EREW PRAM algorithm for image component labeling

Cypher R. Sanz J.L.C. Snyder L. 《IEEE transactions on pattern analysis and machine intelligence》1989,11(3):258-262

An important midlevel task for computer vision is addressed. The problem consists of labeling connected components in N^(1/2)×N^(1/2) binary images. This task can be solved with parallel computers by using a simple and novel algorithm. The parallel computing model used is a synchronous fine-grained shared-memory model where only one processor can read from or write to the same memory location at a given time. This model is known as the exclusive-read exclusive-write parallel RAM (EREW PRAM). Using this model, the algorithm presented has O(log N) complexity. The algorithm can run on parallel machines other than the EREW PRAM. In particular, it offers an optimal image component labeling algorithm for mesh-connected computers.

9.

Latin squares for parallel array access

Kim K. Prasanna V.K. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(4):361-370

A parallel memory system for efficient parallel array access using perfect latin squares as skewing functions is discussed. Simple construction methods for building perfect latin squares are presented. The resulting skewing scheme provides conflict-free access to several important subsets of an array. The address generation can be performed in constant time with simple circuitry. The skewing scheme can provide constant-time access to rows, columns, diagonals, and N^(1/2)×N^(1/2) subarrays of an N×N array with maximum memory utilization. Self-routing Benes networks can be used to realize the permutations needed between the processing elements and the memory modules. Two skewing schemes that provide conflict-free access to three-dimensional arrays are also discussed. Combined with self-routing Benes networks, these schemes provide efficient access to frequently used subsets of three-dimensional arrays.

10.

Clustering on a hypercube multicomputer

Ranka S. Sahni S. 《Parallel and Distributed Systems, IEEE Transactions on》1991,2(2):129-137

Squared error clustering algorithms for single-instruction multiple-data (SIMD) hypercubes are presented. The algorithms are shown to be asymptotically faster than previously known algorithms and require less memory per processing element (PE). For a clustering problem with N patterns, M features per pattern, and K clusters, the algorithms complete in O(K + log NM) steps on NM processor hypercubes. This is optimal up to a constant factor. These results are extended to the case in which NMK processors are available. Experimental results from a multiple-instruction, multiple-data (MIMD) medium-grain hypercube are also presented.

11.

On the number of digital straight line segments

Berenstein C.A. Lavine D. 《IEEE transactions on pattern analysis and machine intelligence》1988,10(6):880-887

A closed-form expression has been reported in the literature for L_N, the number of digital line segments of length N that correspond to lines of the form y = αx + β, 0 ⩽ α, β < 1. The authors prove an asymptotic estimate for L_N that might prove useful for many applications, namely, L_N = N³/π² + O(N² log N). An application to an image registration problem is given.

12.

Constant time algorithms for the transitive closure and some related graph problems on processor arrays with reconfigurable bus systems

Wang B.-F. Chen G.-H. 《Parallel and Distributed Systems, IEEE Transactions on》1990,1(4):500-507

The transitive closure problem is solved in O(1) time by a new method that is far different from the conventional solution method. On processor arrays with reconfigurable bus systems, two O(1) time algorithms are proposed for computing the transitive closure of an undirected graph. One is designed on a three-dimensional n×n×n processor array with a reconfigurable bus system, and the other is designed on a two-dimensional n²×n² processor array with a reconfigurable bus system, where n is the number of vertices in the graph. Using the O(1) time transitive closure algorithms, many other graph problems are solved in O(1) time. These problems include recognizing bipartite graphs and finding connected components, articulation points, biconnected components, bridges, and minimum spanning trees in undirected graphs.
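
The reduction from connected components to transitive closure can be illustrated with a sequential sketch. It uses Warshall's classical O(n³) closure, swapped in here purely for illustration in place of the constant-time reconfigurable-bus method of the paper, and the function names and the choice of labeling each component by its smallest vertex are assumptions.

```python
def transitive_closure(adj):
    """Reflexive-transitive closure of an n x n adjacency matrix (Warshall)."""
    n = len(adj)
    reach = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def connected_components(adj):
    """Label each vertex of an undirected graph with the smallest vertex it reaches."""
    reach = transitive_closure(adj)
    n = len(adj)
    return [min(j for j in range(n) if reach[i][j]) for i in range(n)]
```

Two vertices receive the same label exactly when each is reachable from the other, which for an undirected graph means they lie in the same connected component.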

13.

An efficient distributed knot detection algorithm

Cidon I. 《IEEE transactions on pattern analysis and machine intelligence》1989,15(5):644-649

A distributed knot detection algorithm for general graphs is presented. The knot detection algorithm uses at most O(n log n+m) messages and O(m+n log n) bits of memory to detect all knots' nodes in the network (where n is the number of nodes and m is the number of links). This is compared to O(n²) messages needed in the best algorithm previously published. The knot detection algorithm makes use of efficient cycle detection and clustering techniques. Various applications for the knot detection algorithms are presented. In particular, its importance to deadlock detection in store and forward communication networks and in transaction systems is demonstrated 相似文献

14.

A VLSI constant geometry architecture for the fast Hartley and Fourier transforms

Zapata E.L. Arguello F. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(1):58-70

An application-specific architecture for the parallel calculation of the decimation in time and radix 2 fast Hartley (FHT) and Fourier (FFT) transforms is presented. A real sequence with N = 2ⁿ data items is considered as input. The system calculates the FHT and the FFT in n and n+1 stages, respectively. The modular and regular parallel architecture is based on a constant geometry algorithm using butterflies of four data items and the perfect unshuffle permutation. With this permutation, the mapping of the algorithm in VLSI technology is simplified and the communications among processors are minimized. Organization of the processor memory based on first-in, first-out (FIFO) queues facilitates a systolic data flow and permits the implementation in a direct way of the complex data movements and address sequences of the transforms. This is accomplished by means of simple multiplexing operations, using hardwired control. The total calculation time is (N log₂ N)/4Q cycles for the FHT and N(1 + log₂ N)/4Q cycles for the FFT, where Q is the number of processors (Q = 2^q, Q ⩽ N/4).
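
The two cycle-count formulas can be evaluated directly. The sketch below reads "/4Q" as division by 4·Q, which is an interpretation of the abstract's notation rather than something it states explicitly; the function names are hypothetical.

```python
def _log2_exact(value):
    """Return log2(value) for a positive power of two, else raise ValueError."""
    if value < 1 or value & (value - 1):
        raise ValueError("expected a positive power of two")
    return value.bit_length() - 1


def transform_cycles(N, Q):
    """Return (FHT cycles, FFT cycles) for N data items on Q processors.

    Assumes the formulas (N * log2 N) / (4 * Q) and N * (1 + log2 N) / (4 * Q),
    with N = 2**n, Q = 2**q and Q <= N / 4 as stated in the abstract.
    """
    n = _log2_exact(N)
    _log2_exact(Q)
    if 4 * Q > N:
        raise ValueError("Q must satisfy Q <= N / 4")
    # 4 * Q is a power of two no larger than N, so both divisions are exact.
    fht = N * n // (4 * Q)
    fft = N * (n + 1) // (4 * Q)
    return fht, fft
```

Under this reading, N = 1024 and Q = 16 give 160 cycles for the FHT and 176 cycles for the FFT; the extra 16 cycles correspond to the additional stage of the FFT.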

15.

Partitioning and labeling of loops by unimodular transformations

D'Hollander E.H. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(4):465-476

A general method for the identification of the independent subsets in loops with constant dependence vectors is presented. It is shown that the dependence relation remains invariant under a unimodular transformation. Then a unimodular transformation is used to bring the dependence matrix into a form where the independent subsets are obtained by a direct and inexpensive partitioning algorithm. This leads to a procedure for the automatic conversion of a serial loop into a nest of parallel DO-ALL loops. Another unimodular transformation results in an algorithm to label the dependent iterations of an n-fold nested loop in O(n²) time. This provides a multithreaded dynamic scheduling scheme requiring only one fork and one join primitive 相似文献

16.

A generalized simultaneous access dictionary machine

Fan Z. Cheng K.-H. 《Parallel and Distributed Systems, IEEE Transactions on》1991,2(2):149-159

A simultaneous access design of a dictionary machine which supports insert, delete, and search operations is presented. The design is able to handle p accesses simultaneously and allows redundant accesses to occur. In the design, processors performing insert or delete operations are free to perform other tasks after submitting their accesses to the design; processors that perform search operations get their response in O(log N) time. Compared to all sequential access designs of a dictionary, which require O(p) time to process p accesses, the presented design provides much higher throughput; specifically, O(p/log p) times better. It also provides a fast mechanism to avoid the sequential access bottleneck in any large multiprocessor system.

17.

An efficient heuristic for permutation packet routing on meshes with low buffer requirements

Makedon F. Symvonis A. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(3):270-276

Even though exact algorithms exist for permutation routing of n² messages on an n×n mesh of processors which require constant size queues, the constants are very large and the algorithms very complicated to implement. A novel, simple heuristic for the above problem is presented. It uses constant and very small size queues (size = 2). For all the simulations run on randomly generated data, the number of routing steps that is required by the algorithm is almost equal to the maximum distance a packet has to travel. A pathological case is demonstrated where the routing takes more than the optimal, and it is proved that the upper bound on the number of required steps is O(n²). Furthermore, it is shown that the heuristic routes in optimal time inversion, transposition, and rotations, three special routing problems that appear very often in the design of parallel algorithms.

18.

Efficient parallel processing of image contours (total citations: 1; self-citations: 0; citations by others: 1)

Chen L.T. Davis L.S. Kruskal C.P. 《IEEE transactions on pattern analysis and machine intelligence》1993,15(1):69-81

Describes two parallel algorithms for ranking the pixels on a curve in O(log N) time using either an EREW or CREW PRAM model. The algorithms accomplish this with N processors for a √N×√N image. After applying such an algorithm to an image, it is possible to move the pixels from a curve into processors having consecutive addresses. This is important because one can subsequently apply many algorithms to the curve (such as piecewise linear approximation algorithms or point in polygon tests) using segmented scan operations (i.e. parallel prefix operations). Scan operations can be executed in logarithmic time on many interconnection networks, such as hypercube, tree, butterfly, and shuffle exchange machines as well as on the EREW PRAM. The algorithms were implemented on the hypercube structured Connection Machine, and various performance tests were conducted.
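
The segmented scan mentioned here can be pinned down with a sequential reference sketch. It only defines what the operation computes; the logarithmic-time parallel implementations are not reproduced, and the function name, the inclusive-scan convention, and the use of start-of-segment flags are assumptions for illustration.

```python
def segmented_scan(values, flags, op=lambda a, b: a + b):
    """Inclusive segmented scan.

    flags[i] is truthy when values[i] starts a new segment; the running
    result restarts there, so each segment is scanned independently.
    """
    out = []
    for value, starts_segment in zip(values, flags):
        if starts_segment or not out:
            out.append(value)          # restart the running value
        else:
            out.append(op(out[-1], value))
    return out
```

For instance, once the pixels of several curves sit in consecutive positions, flagging the first pixel of each curve lets one scan compute per-curve running sums in a single pass.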

19.

An algorithm for finding a common structure shared by a family of strings

Landraud A.M. Avril J.-F. Chretienne P. 《IEEE transactions on pattern analysis and machine intelligence》1989,11(8):890-895

An algorithm is presented for extracting and localizing a common structure in a family of strings with time complexity O(N²L² log₂ L), where N is the number of strings and L their maximum length. The method could be extended to two-dimensional image analysis. This structure appears as alignments of words which are similar but not necessarily identical and which occur approximately at the same location in all the strings. The method works in two successive stages. First, a fast algorithm is used for drawing up a directory of exactly repeated patterns appearing in a given majority of strings. Second, the algorithm recursively constructs anchoring patterns by a divide-and-conquer strategy and converges on a maximum number of alignments. This algorithm has been applied to find common a priori unknown features in families of biological macromolecules, with quite good results. One of these families included 23 strings of about 100 characters each. Each characteristic structure was obtained in less than one minute on a MULTIX-DPS8 system.

20.

Bicoprime factorizations of the plant and their relation to right and left-coprime factorizations

Desoer C.A. Gundes A.N. 《Automatic Control, IEEE Transactions on》1988,33(7):672-676

In a general algebraic framework, starting with a bicoprime factorization P = N_pr D^-1 N_pl, a right-coprime factorization N_p D_p^-1, a left-coprime factorization D̃_p^-1 Ñ_p, and the generalized Bezout identities associated with the pairs (N_p, D_p) and (D̃_p, Ñ_p) are obtained. The set of all H-stabilizing compensators for P in the unity-feedback configuration S(P, C) is expressed in terms of (N_pr, D, N_pl) and the elements of the Bezout identity. The state-space representation P = C(sI-A)^-1 B is included as an example.