期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

An efficient parallel recognition algorithm forbipartite-permutation graphs

Chang-Wu Yu Gen-Huey Chen 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(1):3-10

We present a parallel recognition algorithm for bipartite-permutation graphs. The algorithm can be executed in O(log n) time on the CRCW PRAM if O(n³/log n) processors are used, or O(log² n) time on the CREW PRAM if O(n³/log²n) processors are used. Chen and Yesha (1993) have presented another CRCW PRAM algorithm that takes O(log²n) time if O(n ³) processors are used. Compared with Chen and Yesha's algorithm, our algorithm requires either less time and fewer processors on the same machine model, or fewer processors on a weaker machine model. Our algorithm can also be applied to determine if two bipartite-permutation graphs are isomorphic 相似文献

2.

Parallel dynamic programming

Huang S.-H.S. Hongfei Liu Viswanathan V. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(3):326-328

Recurrence formulations for various problems, such as finding an optimal order of matrix multiplication, finding an optimal binary search tree, and optimal triangulation of polygons, assume a similar form. A. Gibbons and W. Rytter (1988) gave a CREW PRAM algorithm to solve such dynamic programming problems. The algorithm uses O(n⁶/log n) processors and runs in O(log² n) time. In this article, a modified algorithm is presented that reduces the processor requirement to O(n⁶/log ⁵n) while maintaining the same time complexity of O(log² n) 相似文献

3.

Parallel algorithms for the longest common subsequence problem

Mi Lu Hua Lin 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(8):835-848

A subsequence of a given string is any string obtained by deleting none or some symbols from the given string. A longest common subsequence (LCS) of two strings is a common subsequence of both that is as long as any other common subsequences. The problem is to find the LCS of two given strings. The bound on the complexity of this problem under the decision tree model is known to be mn if the number of distinct symbols that can appear in strings is infinite, where m and n are the lengths of the two strings, respectively, and m⩽n. In this paper, we propose two parallel algorithms far this problem on the CREW-PRAM model. One takes O(log² m + log n) time with mn/log m processors, which is faster than all the existing algorithms on the same model. The other takes O(log² m log log m) time with mn/(log² m log log m) processors when log² m log log m > log n, or otherwise O(log n) time with mn/log n processors, which is optimal in the sense that the time×processors bound matches the complexity bound of the problem. Both algorithms exploit nice properties of the LCS problem that are discovered in this paper 相似文献

4.

Parallel parsing algorithms for static dictionary compression

Nagumo H. Mi Lu Watson K.L. 《Parallel and Distributed Systems, IEEE Transactions on》1999,10(12):1241-1251

The data compression based on dictionary techniques works by replacing phrases in the input string with indexes into some dictionary. The dictionary can be static or dynamic. In static dictionary compression, the dictionary contains a predetermined fixed set of entries. In dynamic dictionary compression, the dictionary changes its entries during compression. We present parallel algorithms for two parsing strategies for static dictionary compression. One is the optimal parsing strategy with dictionaries that have the prefix properly, for which our algorithm requires O(L+log n) time and O(n) processors, where n is the number of symbols in the input string, and L is the maximum length of the dictionary entries, while previous results run in O(L+log n) time using O(n²) processors or in O(L+log² n) time using O(n) processors. The other is the longest fragment first (LFF) parsing strategy, for which our algorithm requires O(L+log n,) time and O(n log L) processors, while a previous result obtained an O(L log n) time performance on O(n/log n) processors. For both strategies, we derive our parallel algorithms by modifying the on-line algorithms using a pointer doubling technique 相似文献

5.

Square meshes are not optimal for convex hull computation

Bhagavathi D. Gurla H. Olariu S. Schwing J.L. Jingyuan Zhang 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(6):545-554

Recently it has been noticed that for semigroup computations and for selection, rectangular meshes with multiple broadcasting yield faster algorithms than their square counterparts. The contribution of the paper is to provide yet another example of a fundamental problem for which this phenomenon occurs. Specifically, we show that the problem of computing the convex hull of a set of n sorted points in the plane can be solved in O(n^1/8 log ^3/4) time on a rectangular mesh with multiple broadcasting of size n^3/8 log^1/4 n×n^5/8/log^1/4n. The fastest previously known algorithms on a square mesh of size √n×√n run in O(n^1/6) time in case the n points are pixels in a binary image, and in O(n^1/6log^3/2 n) time for sorted points in the plane 相似文献

6.

Fully dynamic maintenance of k-connectivity in parallel

Weifa Liang Brent R.P. Hong Shen 《Parallel and Distributed Systems, IEEE Transactions on》2001,12(8):846-864

Given a graph G=(V, E) with n vertices and m edges, the k-connectivity of G denotes either the k-edge connectivity or the k-vertex connectivity of G. In this paper, we deal with the fully dynamic maintenance of k-connectivity of G in the parallel setting for k=2, 3. We study the problem of maintaining k-edge/vertex connected components of a graph undergoing repeatedly dynamic updates, such as edge insertions and deletions, and answering the query of whether two vertices are included in the same k-edge/vertex connected component. Our major results are the following: (1) An NC algorithm for the 2-edge connectivity problem is proposed, which runs in O(log n log(m/n)) time using O(n^3/4) processors per update and query. (2) It is shown that the biconnectivity problem can be solved in O(log^{2 n}) time using O(nα(2n, n)/logn) processors per update and O(1) time with a single processor per query or in O(log n log_n/^m) time using O(nα(2n, n)/log n) processors per update and O(logn) time using O(nα(2n, n)/logn) processors per query, where α(.,.) is the inverse of Ackermann's function. (3) An NC algorithm for the triconnectivity problem is also derived, which takes O(log n log_n/^m+logn log log n/α(3n, n)) time using O(nα(3n, n)/log n) processors per update and O(1) time with a single processor per query. (4) An NC algorithm for the 3-edge connectivity problem is obtained, which has the same time and processor complexities as the algorithm for the triconnectivity problem. To the best of our knowledge, the proposed algorithms are the first NC algorithms for the problems using O(n) processors in contrast to Ω(m) processors for solving them from scratch. In particular, the proposed NC algorithm for the 2-edge connectivity problem uses only O(n^3/4) processors. All the proposed algorithms run on a CRCW PRAM 相似文献

7.

Optimal Initialization and Gossiping Algorithms for Random Radio Networks 总被引：2，自引：0，他引：2

Vlady Ravelomanana 《Parallel and Distributed Systems, IEEE Transactions on》2007,18(1):17-28

The initialization problem, also known as naming, consists to give a unique identifier ranging from 1 to n to a set of n indistinguishable nodes in a given network. We consider a network where n nodes (processors) are randomly deployed in a square (respectively, cube) X. We assume that the time is slotted and the network is synchronous, two nodes are able to communicate if they are within distance at most of r of each other (r is the transmitting/receiving range). Moreover, if two or more neighbors of a processor u transmit concurrently at the same time slot, then u would not receive either messages. We suppose also that the anonymous nodes know neither the topology of the network nor the number of nodes in the network. Under this extremal scenario, we first show how the transmitting range of the deployed processors affects the typical characteristics of the network. Then, by allowing the nodes to transmit at various ranges we design sublinear randomized initialization protocols: In the two, respectively, three, dimensional case, our randomized initialization algorithms run in O(n^1/2 log n^1/2), respectively, O(n^1/3 log n^2/3), time slots. These latter protocols are based upon an optimal gossiping algorithm which is of independent interest 相似文献

8.

Serial and parallel algorithms for the medial axis transform

Jenq J.-F. Sahni S. 《IEEE transactions on pattern analysis and machine intelligence》1992,14(12):1218-1224

An O(n²) time serial algorithm is developed for obtaining the medial axis transform (MAT) of an n×n image. An O(log n) time CREW PRAM algorithm and an O(log² n) time SIMD hypercube parallel algorithm for the MAT are also developed. Both of these use O(n²) processors. Two problems associated with the MAT, the area and perimeter reporting problem, are studied. An O(log n) time hypercube algorithm is developed for both of them, where n is the number of squares in the MAT, and the algorithms use O(n²) processors 相似文献

9.

Parallel algorithms for relational coarsest partition problems 总被引：2，自引：0，他引：2

Rajasekaran S. Lee I. 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(7):687-699

Relational Coarsest Partition Problems (RCPPs) play a vital role in verifying concurrent systems. It is known that RCPPs are P-complete and hence it may not be possible to design polylog time parallel algorithms for these problems. In this paper, we present two efficient parallel algorithms for RCPP in which its associated label transition system is assumed to have m transitions and n states. The first algorithm runs in O(n^1+ϵ) time using m/n^ϵ CREW PRAM processors, for any fixed ϵ<1. This algorithm is analogous to and optimal with respect to the sequential algorithm of P.C. Kanellakis and S.A. Smolka (1990). The second algorithm runs in O(n log n) time using m/n CREW PRAM processors. This algorithm is analogous to and nearly optimal with respect to the sequential algorithm of R. Paige and R.E. Tarjan (1987) 相似文献

10.

Efficient algorithms for list ranking and for solving graphproblems on the hypercube

Ryu K.W. Jaja J. 《Parallel and Distributed Systems, IEEE Transactions on》1990,1(1):83-90

A hypercube algorithm to solve the list ranking problem is presented. Let n be the length of the list, and let p be the number of processors of the hypercube. The algorithm described runs in time O(n/p) when n=Ω(p ^1+ε) for any constant ε>0, and in time O(n log n/p+log³ p) otherwise. This clearly attains a linear speedup when n=Ω(p ^1+ε). Efficient balancing and routing schemes had to be used to achieve the linear speedup. The authors use these techniques to obtain efficient hypercube algorithms for many basic graph problems such as tree expression evaluation, connected and biconnected components, ear decomposition, and st-numbering. These problems are also addressed in the restricted model of one-port communication 相似文献

11.

Efficient EREW PRAM algorithms for parentheses-matching

Prasad S.K. Das S.K. Chen C.C.-Y. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(9):995-1008

We present four polylog-time parallel algorithms for matching parentheses on an exclusive-read and exclusive-write (EREW) parallel random-access machine (PRAM) model. These algorithms provide new insights into the parentheses-matching problem. The first algorithm has a time complexity of O(log² n) employing O(n/(log n)) processors for an input string containing n parentheses. Although this algorithm is not cost-optimal, it is extremely simple to implement. The remaining three algorithms, which are based on a different approach, achieve O(log n) time complexity in each case, and represent successive improvements. The second algorithm requires O(n) processors and working space, and it is comparable to the first algorithm in its ease of implementation. The third algorithm uses O(n/(log n)) processors and O(n log n) space. Thus, it is cost-optimal, but uses extra space compared to the standard stack-based sequential algorithm. The last algorithm reduces the space complexity to O(n) while maintaining the same processor and time complexities. Compared to other existing time-optimal algorithms for the parentheses-matching problem that either employ extensive pipelining or use linked lists and comparable data structures, and employ sorting or a linked list ranking algorithm as subroutines, the last two algorithms have two distinct advantages. First, these algorithms employ arrays as their basic data structures, and second, they do not use any pipelining, sorting, or linked list ranking algorithms 相似文献

12.

Efficient nonblocking switching networks for interprocessorcommunications in multiprocessor systems

Fong-Chih Shao Yavuz Oruc A. 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(2):132-141

The performance of a multiprocessor system depends heavily on its ability to provide conflict free paths among its processors. In this paper, we explore the possibility of using a nonblocking network with O(N log N) edges (crosspoints) to interconnect the processors of an N processor system, We combine Bassalygo and Pinsker's implicit design of strictly nonblocking networks with an explicit construction of expanders to obtain a strictly nonblocking network with -765.18N+352.8N log N edges and 2+log(N/5) depth. We present an efficient parallel algorithm for routing connection requests on this network and implement it on three parallel processor topologies. The implementation on a parallel processor whose processing elements are interconnected as in the Bassalygo-Pinsker network requires O(N log N) processing elements, O(N log N) interprocessor links and it takes O(log N) steps to route any single connection request where each step involves a small number (≈72) of bit-level operations. A contracted or folded version of the same implementation reduces the processing element count to O(N) without increasing the link count or the routing time. Finally, we establish that the same algorithm takes O(log³ N) steps on a perfect shuffle processor with O(N) processing elements. These results improve the crosspoint, depth and routing time complexities of the previously reported strictly nonblocking networks 相似文献

13.

A new self-routing multicast network

Yuanyuan Yang Jianchao Wang 《Parallel and Distributed Systems, IEEE Transactions on》1999,10(12):1299-1316

In this paper, we propose a design for a new self-routing multicast network which can realize arbitrary multicast assignments between its inputs and outputs without any blocking. The network design uses a recursive decomposition approach and is based on the binary radix sorting concept. All functional components of the network are reverse banyan networks. Specifically, the new multicast network is recursively constructed by cascading a binary splitting network and two half-size multicast networks. The binary splitting network, in turn, consists of two recursively constructed reverse banyan networks. The first reverse banyan network serves as a scatter network and the second reverse banyan network serves as a quasisorting network. The advantage of this approach is to provide a way to self-route multicast assignments through the network and a possibility to reuse part of network to reduce the network cost. The new multicast network we design is compared favorably with the previously proposed multicast networks. It uses O(n log² n) logic gates, and has O(log² n) depth and O(log² n) routing time where the unit of time is a gate delay. By reusing part of the network, the feedback implementation of the network can further reduce the network cost to O(n log n) 相似文献

14.

All nearest smaller values on the hypercube

Kravets D. Plaxton C.G. 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(5):456-462

Given a sequence of n elements, the All Nearest Smaller Values (ANSV) problem is to find, for each element in the sequence, the nearest element to the left (right) that is smaller, or to report that no such element exists. Time and work optimal algorithms for this problem are known on all the PRAM models but the running time of the best previous hypercube algorithm is optimal only when the number of processors p satisfies 1⩽p⩽n/((lg³ n)(lg lg n)²). In this paper, we prove that any normal hypercube algorithm requires Ω(M) processors to solve the ANSV problem in O(lg n) time, and we present the first normal hypercube ANSV algorithm that is optimal for all values of n and p. We use our ANSV algorithm to give the first O(lg n)-time n-processor normal hypercube algorithms for triangulating a monotone polygon and for constructing a Cartesian tree 相似文献

15.

Designing efficient parallel algorithms on CRAP

Tzong-Wann Kao Shi-Jinn Horng Yue-Li Wang Horng-Ren Tsai 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(5):554-560

A cross-bridge reconfigurable array of processors is a parallel processing system which has the ability to change dynamically the supported interconnection scheme during the execution of an algorithm. Based on this architecture, several O(1) time basic operations such as the transpose, the untranspose, the shift, the unshift and the prefix sum of a binary sequence are first proposed. Then, these basic operations can be used to find the kth smallest element of N m bits unsigned integers in O(m) time using N processors and to sort N data items in O(1) time using O(N^5/3) processors instead of using O(N²) processors as those proposed by other researchers 相似文献

16.

Efficient routing and sorting schemes for de Bruijn networks

Hsu D.F. Wei D.S.L. 《Parallel and Distributed Systems, IEEE Transactions on》1997,8(11):1157-1170

We consider the problems of routing and sorting on a de Bruijn network. First, we show that any deterministic oblivious routing scheme for permutation routing on a d-ary de Bruijn network with N=dⁿ nodes, in the worst case, will take Ω(√N) steps under the single-port model. This improves the existing lower bounds provided d is not a constant. We also show that the lower bound is indeed a tight one. Second, we present a deterministic nonoblivious permutation routing algorithm which runs in O(d.n²) time on a d-ary de Bruijn network with N=dⁿ nodes. This algorithm is currently the fastest known nonoblivious deterministic routing algorithm for de Bruijn networks of arbitrary degree. Finally, we present an efficient general sorting algorithm for the de Bruijn networks of arbitrary degree. This algorithm is the best sorting algorithm known so far. It runs in O((log d).d.n²) time for directed de Bruijn network with dⁿ nodes, degree d, and diameter n. As a corollary, we show that on a binary de Bruijn network of Nnodes, our sorting scheme requires at most 2 log² Nsteps 相似文献

17.

Deques with heap order

《Information Processing Letters》1986,22(4):197-200

A deque with heap order is a deque (double-ended queue) such that each item has a real-valued key and the operation of finding an item of minimum key is allowed as well as the usual deque operations. By combining a standard deque implementation with an auxiliary heap (priority queue) it is possible to implement a deque with heap order so that the worst-case time per operation is O(log n), where n is the number of items on the deque. This paper describes an implementation of deques with heap order for which the worst-case time per operation is O(1). 相似文献

18.

Optimal algorithms on the pipelined hypercube and related networks

JaJa J. Ryu K.W. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(5):582-591

Parallel algorithms for several important combinatorial problems such as the all nearest smaller values problem, triangulating a monotone polygon, and line packing are presented. These algorithms achieve linear speedups on the pipelined hypercube, and provably optimal speedups on the shuffle-exchange and the cube-connected-cycles for any number p of processors satisfying 1⩽p⩽n/((log³n)(loglog n)²), where n is the input size. The lower bound results are established under no restriction on how the input is mapped into the local memories of the different processors 相似文献

19.

Prefix computations on a generalized mesh-connected computer withmultiple buses

Kun-Liang Chung 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(2):196-199

The mesh-connected computer with multiple buses (MC-CMB) is a well-known parallel organization, providing broadcast facilities in each row and each column. In this paper, we propose a 2D generalized MCCMB (2-GMCCMB) for the purpose of increasing the efficiency of executing some important applications of prefix computations such as solving Linear recurrences and tridiagonaI systems, etc. A k₁n₁×k₁n₂ 2-GMCCMB is constructed from a k ₁n₁×k₁n₂ mesh organization by enhancing the power of each disjoint n₁×n₂ submesh with multiple buses (sub-2-MCCMB). Given n data, a prefix computation can be performed in O(n^1/10) time on an n^3/5×n^2/5 2-GMCCMB, where each disjoint sub-2-MCCMB is of size n^1/2×n^3/10. This time bound is faster than the previous time bound of O(n^1/8) for the same computation on an n^5/8×n^3/8 2-MCCMB. Furthermore, the time bound of our parallel prefix algorithm can be further reduced to O(n^1/11) if fewer processors are used. Our result can be extended to the d-dimensional GMCCMB, giving a time bound of O(n¹(d2(d)+d)/) for any constant d; here, we omit the constant factors. This time bound is less than the previous time bound of O(n¹(d2(d))/) on the d-dimensional MCCMB 相似文献

20.

Hot-potato algorithms for permutation routing

Newman I. Schuster A. 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(11):1168-1176

We develop a methodology for the design of hot-potato algorithms for routing permutations. The basic idea is to convert existing store-and-forward routing algorithms to hot-potato algorithms. Using it, we obtain the following complexity bounds for permutation routing: n×n Mesh: 7n+o(n) steps; 2ⁿ hypercube: O(n²) steps; n×n Torus: 4n+o(n) steps. The algorithm for the two-dimensional grid is the first to be both deterministic and asymptotically optimal. The algorithm for the 2ⁿ-nodes Boolean cube is the first deterministic algorithm that achieves a complexity of o(2ⁿ) steps 相似文献