
1.

Parallel algorithms for the longest common subsequence problem

Mi Lu Hua Lin 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(8):835-848

A subsequence of a given string is any string obtained by deleting zero or more symbols from the given string. A longest common subsequence (LCS) of two strings is a common subsequence of both that is at least as long as any other common subsequence. The problem is to find the LCS of two given strings. The bound on the complexity of this problem under the decision tree model is known to be mn if the number of distinct symbols that can appear in strings is infinite, where m and n are the lengths of the two strings, respectively, and m⩽n. In this paper, we propose two parallel algorithms for this problem on the CREW-PRAM model. One takes O(log² m + log n) time with mn/log m processors, which is faster than all the existing algorithms on the same model. The other takes O(log² m log log m) time with mn/(log² m log log m) processors when log² m log log m > log n, or otherwise O(log n) time with mn/log n processors, which is optimal in the sense that the time×processors bound matches the complexity bound of the problem. Both algorithms exploit nice properties of the LCS problem that are discovered in this paper.
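For orientation, the sequential baseline that the mn bound refers to can be sketched as the textbook dynamic program below. This is not the parallel algorithm of the paper; the function name `lcs` and the table name `length` are illustrative choices.

```python
def lcs(a, b):
    """Return one longest common subsequence of strings a and b.

    Textbook O(mn) dynamic program; a sequential baseline only,
    not the CREW-PRAM algorithms described in the abstract.
    """
    m, n = len(a), len(b)
    # length[i][j] = length of an LCS of a[:i] and b[:j]
    length = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                length[i][j] = length[i - 1][j - 1] + 1
            else:
                length[i][j] = max(length[i - 1][j], length[i][j - 1])

    # Walk back from the bottom-right corner to recover one LCS.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif length[i - 1][j] >= length[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

The table has (m+1)(n+1) entries and each is filled in constant time, which is where the mn sequential cost comes from.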

2.

Efficient EREW PRAM algorithms for parentheses-matching

Prasad S.K. Das S.K. Chen C.C.-Y. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(9):995-1008

We present four polylog-time parallel algorithms for matching parentheses on an exclusive-read and exclusive-write (EREW) parallel random-access machine (PRAM) model. These algorithms provide new insights into the parentheses-matching problem. The first algorithm has a time complexity of O(log² n) employing O(n/(log n)) processors for an input string containing n parentheses. Although this algorithm is not cost-optimal, it is extremely simple to implement. The remaining three algorithms, which are based on a different approach, achieve O(log n) time complexity in each case, and represent successive improvements. The second algorithm requires O(n) processors and working space, and it is comparable to the first algorithm in its ease of implementation. The third algorithm uses O(n/(log n)) processors and O(n log n) space. Thus, it is cost-optimal, but uses extra space compared to the standard stack-based sequential algorithm. The last algorithm reduces the space complexity to O(n) while maintaining the same processor and time complexities. Compared to other existing time-optimal algorithms for the parentheses-matching problem that either employ extensive pipelining or use linked lists and comparable data structures, and employ sorting or a linked list ranking algorithm as subroutines, the last two algorithms have two distinct advantages. First, these algorithms employ arrays as their basic data structures, and second, they do not use any pipelining, sorting, or linked list ranking algorithms.
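The "standard stack-based sequential algorithm" that the abstract uses as its space baseline can be sketched as follows. The function name `match_parentheses` and the output convention (an array of partner indices) are assumptions made for illustration, not taken from the paper.

```python
def match_parentheses(s):
    """Return a list match where match[i] is the index of the partner of s[i].

    Sequential O(n) time, O(n) space stack algorithm; the baseline the
    parallel EREW PRAM algorithms are compared against.
    """
    match = [-1] * len(s)
    stack = []  # indices of currently unmatched "("
    for i, ch in enumerate(s):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unmatched ')' at index %d" % i)
            j = stack.pop()
            match[i] = j
            match[j] = i
    if stack:
        raise ValueError("unmatched '(' at index %d" % stack[-1])
    return match
```

For the input `"(()())"` the pairs are (0, 5), (1, 2), and (3, 4).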

3.

Optimal and load balanced mapping of parallel priority queues in hypercubes

Das S.K. Pinotti M.C. Sarkar F. 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(6):555-564

We efficiently map a priority queue on the hypercube architecture in a load balanced manner, with no additional communication overhead, and present optimal parallel algorithms for performing insert and deletemin operations. Two implementations for such operations are proposed on the single port hypercube model. In a b-bandwidth, n-item priority queue in which every node contains b items in sorted order, the first implementation achieves optimal speed up of O(min{log n, b log n/log b+log log n}) for inserting b presorted items or deleting b smallest items, where b=O(n^(1/c)) with c>1. In particular, single insertion and deletion operations are cost optimal and require O(log n/p+log p) time using O(log n/log log n) processors. The second implementation is more scalable since it uses a larger number of processors, and attains a “nearly” optimal speedup on the single hypercube. Namely, the insertion of log n presorted items or the deletion of the log n smallest items is accomplished in O(log log n²) time using O(log² n/log log n) processors. Finally, on the slightly more powerful pipelined hypercube model, the second implementation performs log n operations in O(log log n) time using O(log² n/log log n) processors, thus achieving an optimal speed up. To the best of our knowledge, our algorithms are the first implementations of b-bandwidth distributed priority queues, which are load balanced and yet guarantee optimal speed ups.

4.

Fast sorting algorithms on a linear array with a reconfigurable pipelined bus system

Datta A. Soundaralakshmi S. Owens R. 《Parallel and Distributed Systems, IEEE Transactions on》2002,13(3):212-222

We present two fast algorithms for sorting on a linear array with a reconfigurable pipelined bus system (LARPBS), one of the recently proposed parallel architectures based on optical buses. In our first algorithm, we sort N numbers in O(log N log log N) worst-case time using N processors. In our second algorithm, we sort N numbers in O((log log N)²) worst-case time using N^(1+ε) processors, for any fixed ε such that 0 < ε < 1. Our algorithms are based on a novel deterministic sampling scheme for merging two sorted arrays of length N each in O(log log N) time on an LARPBS with N processors. To our knowledge, the previous best sorting algorithm on this architecture has a running time of O((log N)²) using N processors.

5.

An efficient parallel recognition algorithm for bipartite-permutation graphs

Chang-Wu Yu Gen-Huey Chen 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(1):3-10

We present a parallel recognition algorithm for bipartite-permutation graphs. The algorithm can be executed in O(log n) time on the CRCW PRAM if O(n³/log n) processors are used, or O(log² n) time on the CREW PRAM if O(n³/log² n) processors are used. Chen and Yesha (1993) have presented another CRCW PRAM algorithm that takes O(log² n) time if O(n³) processors are used. Compared with Chen and Yesha's algorithm, our algorithm requires either less time and fewer processors on the same machine model, or fewer processors on a weaker machine model. Our algorithm can also be applied to determine if two bipartite-permutation graphs are isomorphic.

6.

Fully dynamic maintenance of k-connectivity in parallel

Weifa Liang Brent R.P. Hong Shen 《Parallel and Distributed Systems, IEEE Transactions on》2001,12(8):846-864

Given a graph G=(V, E) with n vertices and m edges, the k-connectivity of G denotes either the k-edge connectivity or the k-vertex connectivity of G. In this paper, we deal with the fully dynamic maintenance of k-connectivity of G in the parallel setting for k=2, 3. We study the problem of maintaining k-edge/vertex connected components of a graph undergoing repeated dynamic updates, such as edge insertions and deletions, and answering the query of whether two vertices are included in the same k-edge/vertex connected component. Our major results are the following: (1) An NC algorithm for the 2-edge connectivity problem is proposed, which runs in O(log n log(m/n)) time using O(n^(3/4)) processors per update and query. (2) It is shown that the biconnectivity problem can be solved in O(log² n) time using O(nα(2n, n)/log n) processors per update and O(1) time with a single processor per query, or in O(log n log(m/n)) time using O(nα(2n, n)/log n) processors per update and O(log n) time using O(nα(2n, n)/log n) processors per query, where α(.,.) is the inverse of Ackermann's function. (3) An NC algorithm for the triconnectivity problem is also derived, which takes O(log n log(m/n) + log n log log n/α(3n, n)) time using O(nα(3n, n)/log n) processors per update and O(1) time with a single processor per query. (4) An NC algorithm for the 3-edge connectivity problem is obtained, which has the same time and processor complexities as the algorithm for the triconnectivity problem. To the best of our knowledge, the proposed algorithms are the first NC algorithms for the problems using O(n) processors in contrast to Ω(m) processors for solving them from scratch. In particular, the proposed NC algorithm for the 2-edge connectivity problem uses only O(n^(3/4)) processors. All the proposed algorithms run on a CRCW PRAM.

7.

Parallel algorithms for relational coarsest partition problems (total citations: 2; self-citations: 0; citations by others: 2)

Rajasekaran S. Lee I. 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(7):687-699

Relational Coarsest Partition Problems (RCPPs) play a vital role in verifying concurrent systems. It is known that RCPPs are P-complete and hence it may not be possible to design polylog time parallel algorithms for these problems. In this paper, we present two efficient parallel algorithms for RCPP in which its associated label transition system is assumed to have m transitions and n states. The first algorithm runs in O(n^(1+ϵ)) time using m/n^ϵ CREW PRAM processors, for any fixed ϵ<1. This algorithm is analogous to and optimal with respect to the sequential algorithm of P.C. Kanellakis and S.A. Smolka (1990). The second algorithm runs in O(n log n) time using m/n CREW PRAM processors. This algorithm is analogous to and nearly optimal with respect to the sequential algorithm of R. Paige and R.E. Tarjan (1987).

8.

Algorithms for search trees on message passing architectures

Colbrook A. Brewer E.A. Dellarocas C.N. Weihl W.E. 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(2):97-108

In this paper we describe a new algorithm for maintaining a balanced search tree on a message-passing MIMD architecture; the algorithm is particularly well suited for implementation on a small number of processors. We introduce a (2^B-2, 2^B) search tree that uses a bidirectional ring of O(log n) processors to store n entries. Update operations use a bottom-up node-splitting scheme, which performs significantly better than top-down search tree algorithms. The bottom-up algorithm requires many fewer messages and results in less blocking due to synchronization than top-down algorithms. Additionally, for a given cost ratio of computation to communication the value of B may be varied to maximize performance. Implementations on a parallel-architecture simulator are described.

9.

L₂ vector median filters on arrays with reconfigurable optical buses

Chin-Hsiung Wu Shi-Jinn Horng 《Parallel and Distributed Systems, IEEE Transactions on》2001,12(12):1281-1292

In spite of their good filtering characteristics for vector-valued image processing, the usability of vector median filters is limited by their high computational complexity. Given an N × N image and a W × W window, the computational complexity of a vector median filter is O(W⁴N²). In this paper, we design three fast and efficient parallel algorithms for vector median filtering based on the 2-norm (L₂) on arrays with reconfigurable optical buses (AROB). For 1 ⩽ p ⩽ W ⩽ q ⩽ N, our algorithms run in O(W⁴ log W/p⁴), O(W²N²/p⁴q² log W) and O(1) time using p⁴N²/log W, p⁴q²/log W, and W⁴N² log N processors, respectively. In the sense of the product of time and the number of processors used, the first two results are cost optimal and the last one is time optimal.
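The O(W⁴N²) sequential cost can be made concrete with a small sketch. The abstract does not define the filter, so the code assumes the usual definition: the output for a window is the input vector whose summed L₂ distance to all vectors in the window is smallest. The name `vector_median` is illustrative.

```python
import math


def vector_median(vectors):
    """Return the vector minimising the sum of L2 distances to all others.

    Assumed definition of the vector median (not stated in the abstract).
    For a window holding W*W vectors this performs (W*W)**2 = W**4
    distance evaluations; repeating it for each of the N*N pixels gives
    the O(W^4 N^2) sequential cost.
    """
    best, best_cost = None, math.inf
    for v in vectors:
        cost = sum(math.dist(v, u) for u in vectors)
        if cost < best_cost:
            best, best_cost = v, cost
    return best
```

In a full filter this function would be called once per pixel on the W × W neighbourhood of colour vectors around that pixel; border handling is left out of the sketch.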

10.

Work-efficient routing algorithms for rearrangeable symmetrical networks

Cam H. Fortes J.A.B. 《Parallel and Distributed Systems, IEEE Transactions on》1999,10(7):733-741

The work performed by a parallel algorithm is the product of its running time and the number of processors it requires. This paper presents work-efficient (or cost-optimal) routing algorithms to determine the switch settings for realizing permutations on rearrangeable symmetrical networks such as Benes and the reduced Ω_N Ω_N^(-1). These networks have 2n-1 stages with N=2ⁿ inputs/outputs, each stage consisting of N/2 crossbar switches of size (2×2). Previously known parallel routing algorithms for a rearrangeable network with N inputs determine the states of all switches recursively in O(n) iterations using N processors. Each iteration determines the switch settings of at most two stages of the network and requires at least O(n) time on a computer of N processors, regardless of the type of its interconnection network. Hence, the work of any previously known parallel routing algorithm equals at least O(Nn²) for setting up all the switches of a rearrangeable network. The new routing algorithms run on a computer of p processors, 1⩽p⩽N/n, and perform work O(Nn). Moreover, because the range of p is large, the new routing algorithms do not have to be changed in case some processors become faulty.

11.

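Parallel algorithms for approximate string matching on the PRAM and LARPBS models (total citations: 15; self-citations: 1; citations by others: 15)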

Zhong Cheng (钟诚) Chen Guoliang (陈国良) 《Journal of Software (软件学报)》2004,15(2):159-169

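Approximate string matching is widely used in network information search, digital libraries, pattern recognition, text mining, IP routing lookup, network intrusion detection, bioinformatics, computational music research, and other fields. Based on the CREW-PRAM (parallel random access machine with concurrent read and exclusive write) model, a dynamic-programming parallel algorithm for approximate string matching with k differences is designed by computing the edit distance matrix D directly with a wavefront-style parallel sweep; the algorithm uses (m+1) processors, has time complexity O(n), and in theory achieves linear speedup. By computing the edit distance matrix D in parallel along both the horizontal and the diagonal directions, a scalable dynamic-programming parallel algorithm for approximate string matching with k differences is designed that uses ((m+1) processors and O(n/(+m) time. Based on a divide-and-conquer strategy, the optical bus system is dynamically reconfigured by flexibly splitting the bus and merging sub-buses, and the message broadcasting technique of optical buses and parallel prefix-sum computation are fully exploited to compute the Hamming distance in parallel; on this basis, two communication-efficient and scalable parallel algorithms for approximate string matching with k mismatches are designed on the LARPBS (linear arrays with reconfigurable pipelined bus system) model. One of them uses n processors and O(m) time; the other is a constant-time algorithm that uses mn processors.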
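The wavefront idea can be illustrated sequentially: every cell on one anti-diagonal of the edit distance matrix D depends only on the two previous anti-diagonals, so the cells of a diagonal could each be given to a separate processor. The sketch below is an assumption-laden illustration, not the authors' algorithm: it takes m as the pattern length and n as the text length, uses the usual k-differences recurrence with a zero top row, and the name `k_difference_matches` is hypothetical.

```python
def k_difference_matches(pattern, text, k):
    """Return end positions j in text where pattern matches with <= k differences.

    D[i][j] = minimum edit distance between pattern[:i] and some
    substring of text ending at position j. Cells are filled one
    anti-diagonal (i + j = d) at a time to mimic a wavefront sweep.
    """
    m, n = len(pattern), len(text)
    D = [[0] * (n + 1) for _ in range(m + 1)]  # row 0 stays 0: a match may start anywhere
    for i in range(m + 1):
        D[i][0] = i  # deleting i pattern symbols

    for d in range(2, m + n + 1):
        # All cells with i + j == d are mutually independent; a parallel
        # machine could compute this inner loop in one step.
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,        # drop a pattern symbol
                          D[i][j - 1] + 1,        # skip a text symbol
                          D[i - 1][j - 1] + cost) # match or substitute
    return [j for j in range(n + 1) if D[m][j] <= k]
```

Each anti-diagonal holds at most m+1 cells, which is consistent with the (m+1)-processor count quoted for the first algorithm.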

12.

Parallel Parsing Algorithms and VLSI Implementations for Syntactic Pattern Recognition

Chiang YT Fu KS 《IEEE transactions on pattern analysis and machine intelligence》1984,(3):302-314

Earley's algorithm has been commonly used for the parsing of general context-free languages and the error-correcting parsing in syntactic pattern recognition. The time complexity for parsing is O(n³). This paper presents a parallel Earley's recognition algorithm in terms of an "X*" operator. By restricting the input context-free grammar to be ?-free, the parallel algorithm can be executed on a triangular-shape VLSI array. This array system has an efficient way of moving data to the right place at the right time. Simulation results show that this system can recognize a string with length n in 2n + 1 system time. We also present a parallel parse-extraction algorithm, a complete parsing algorithm, and an error-correcting recognition algorithm. The parallel complete parsing algorithm has been simulated on a processor array which is similar to the triangular VLSI array. For an input string of length n the processor array will give the correct right-parse at system time 2n + 1 if the string is accepted. The error-correcting recognition algorithm has also been simulated on a triangular VLSI array. This array recognizes an erroneous string of length n in time 2n + 1 and gives the correct error count. These parallel algorithms are especially useful for syntactic pattern recognition.

13.

Two minimum spanning forest algorithms on fixed-size hypercube computers

Sajal K. Das Narsingh Deo Sushil Prasad 《Parallel Computing》1990,15(1-3):179-187

Two parallel algorithms for finding a minimum spanning forest (MSF) of a weighted undirected graph on hypercube computers, consisting of a fixed number of processors, are presented. One algorithm is suited for sparse graphs, the other for dense graphs. Our design strategy is based on successive elimination of non-MSF edges. The input graph is partitioned equally among different processors, which then repeatedly eliminate non-MSF edges and merge results to gradually construct the desired MSF of the entire graph. Low communication overhead is achieved by restricting the message flow to neighboring processors in the hypercube topology. The correctness of our approach is due to a theorem which states that with totally ordered edges, if an edge of an arbitrary subgraph does not belong to its MSF, then it does not belong to the MSF of the entire graph. For a graph of n vertices and m edges, our first algorithm finds an MSF in O((m log m)/p) time using p processors for p ≤ (m log m)/(n(1+log(m/n))). The second algorithm, efficient for dense graphs, requires O(n²/p) time for p ≤ n/log n.
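The elimination theorem suggests a simple partition-and-merge scheme, sketched here sequentially. This is only an illustration of the theorem: the hypercube communication pattern and the repeated pairwise merging of the paper are not modelled, Kruskal's algorithm is swapped in as the local MSF routine, and the names `msf` and `msf_by_elimination` are hypothetical. Edges are `(weight, u, v)` tuples, so tuple comparison supplies the total order the theorem requires.

```python
def msf(n, edges):
    """Kruskal's algorithm: return the MSF edges of a graph on vertices 0..n-1."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    kept = []
    for w, u, v in sorted(edges):  # (weight, u, v) tuples give a total order
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            kept.append((w, u, v))
    return kept


def msf_by_elimination(n, edges, parts):
    """Split the edges into `parts` groups, drop non-MSF edges per group, then merge."""
    chunks = [edges[i::parts] for i in range(parts)]
    # An edge rejected by the MSF of its own chunk cannot be in the global MSF,
    # so only the survivors need to take part in the final computation.
    survivors = [e for chunk in chunks for e in msf(n, chunk)]
    return msf(n, survivors)
```

Because each chunk's MSF has at most n-1 edges, the merge step sees at most parts·(n-1) edges instead of all m.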

14.

Parallel marching Poisson solvers

Marian Vajteršic 《Parallel Computing》1984,1(3-4):325-330

The paper presents parallel algorithms for solving the Poisson equation at N² mesh points. The methods, based on marching techniques, are structured for efficient parallel realization. Using orthogonal decomposition properties of the arising matrices, the algorithms can be formulated in terms of transformed vectors. On a MIMD computer with not more than N processors, the computations can be performed in horizontal slices with minimal synchronization requirements. For an SIMD machine with N² processors, the complexity bound O(log N) has been achieved, whereby a single marching requires only 10 log N steps.

15.

Parallel on-line parsing in constant time per word

Klaas Sikkel 《Theoretical computer science》1993,120(2):303-310

An on-line parser processes each word as soon as it is typed by the user, without waiting for the end of the sentence. Thus, in an interactive system, a sentence will be parsed almost immediately after the last word has been presented.

The complexity of an on-line parser is determined by the resources needed for the analysis of a single word, as it is assumed that previous words have been processed already. Sequential parsing algorithms like CYK or Earley need O(n²) time for the nth word. A parallel implementation in O(n) time on O(n) processors is straightforward. In this paper a novel parallel on-line parser is presented that needs O(1) time on O(n²) processors.

16.

Parallel Algorithms for Maximum Matching in Complements of Interval Graphs and Related Problems

M. G. Andrews M. J. Atallah D. Z. Chen D. T. Lee 《Algorithmica》2000,26(2):263-289

Given a set of n intervals representing an interval graph, the problem of finding a maximum matching between pairs of disjoint (nonintersecting) intervals has been considered in the sequential model. In this paper we present parallel algorithms for computing maximum cardinality matchings among pairs of disjoint intervals in interval graphs in the EREW PRAM and hypercube models. For the general case of the problem, our algorithms compute a maximum matching in O(log³ n) time using O(n/log² n) processors on the EREW PRAM and using n processors on the hypercubes. For the case of proper interval graphs, our algorithm runs in O(log n) time using O(n) processors if the input intervals are not given already sorted and using O(n/log n) processors otherwise, on the EREW PRAM. On n-processor hypercubes, our algorithm for the proper interval case takes O(log n log log n) time for unsorted input and O(log n) time for sorted input. Our parallel results also lead to optimal sequential algorithms for computing maximum matchings among disjoint intervals. In addition, we present an improved parallel algorithm for maximum matching between overlapping intervals in proper interval graphs. Received November 20, 1995; revised September 3, 1998.

17.

A fast parallel algorithm for routing unicast assignments in Benes networks

Ching-Yi Lee Oruc A.Y. 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(3):329-334

This paper presents a new parallel algorithm for routing unicast (one-to-one) assignments in Benes networks. Parallel routing algorithms for such networks were reported earlier, but these algorithms were designed primarily to route permutation assignments. The routing algorithm presented in this paper removes this restriction without an increase in the order of routing cost or routing time. We realize this new routing algorithm on two different topologies. The algorithm routes a unicast assignment involving O(k) pairs of inputs and outputs in O(lg² k+lg n) time on a completely connected network of n processors and in O(lg⁴ k+lg² k lg n) time on an extended shuffle-exchange network of n processors. Using O(n lg n) processors, the same algorithm can be pipelined to route α unicast assignments, each involving O(k) pairs of inputs and outputs, in O(lg² k+lg n+(α-1) lg k) time on a completely connected network and in O(lg⁴ k+lg² k lg n+(α-1)(lg³ k+lg k lg n)) time on the extended shuffle-exchange network. These yield an average routing time of O(lg k) in the first case, and O(lg³ k+lg k lg n) in the second case, for all α⩾lg n. These complexities indicate that the algorithm given in this paper is as fast as Nassimi and Sahni's algorithm for unicast assignments, and with pipelining, it is faster than the same algorithm at least by a factor of O(lg n) on both topologies. Furthermore, for sparse assignments, i.e., when k=O(1), it is the first algorithm which has an average routing time of O(lg n) on a topology with O(n) links.

18.

An Optimal Parallel Co-Connectivity Algorithm

Ka Wong Chong Stavros D. Nikolopoulos Leonidas Palios 《Theory of Computing Systems》2004,37(4):527-546

In this paper we consider the problem of computing the connected components of the complement of a given graph. We describe a simple sequential algorithm for this problem, which works on the input graph and not on its complement, and which for a graph on n vertices and m edges runs in optimal O(n+m) time. Moreover, unlike previous linear co-connectivity algorithms, this algorithm admits efficient parallelization, leading to an optimal O(log n)-time and O((n+m)/log n)-processor algorithm on the EREW PRAM model of computation. It is worth noting that, for the related problem of computing the connected components of a graph, no optimal deterministic parallel algorithm is currently available. The co-connectivity algorithms find applications in a number of problems. In fact, we also include a parallel recognition algorithm for weakly triangulated graphs, which takes advantage of the parallel co-connectivity algorithm and achieves an O(log² n) time complexity using O((n+m²) log n) processors on the EREW PRAM model of computation.
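Working "on the input graph and not on its complement" can be illustrated with a well-known search technique that keeps a set of still-unvisited vertices. The sketch below is that generic technique, not necessarily the algorithm of the paper, and the name `co_components` is hypothetical.

```python
from collections import deque


def co_components(n, edges):
    """Connected components of the complement of a graph on vertices 0..n-1.

    The complement is never built. While scanning the unvisited set from
    a vertex u, every vertex that stays behind is a neighbour of u in the
    input graph, so the total scanning cost is bounded by O(n + m)
    (expected, with hash sets).
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    unvisited = set(range(n))
    components = []
    while unvisited:
        start = unvisited.pop()
        comp = [start]
        queue = deque([start])
        while queue:
            u = queue.popleft()
            # Unvisited vertices NOT adjacent to u are u's neighbours in the complement.
            reached = [v for v in unvisited if v not in adj[u]]
            for v in reached:
                unvisited.remove(v)
                comp.append(v)
                queue.append(v)
        components.append(sorted(comp))
    return sorted(components)
```

For a star with centre 0 and leaves 1, 2, 3, the complement isolates vertex 0 and joins the leaves into a triangle, giving `[[0], [1, 2, 3]]`.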

19.

Cost-optimal parallel algorithms for the tree bisector and related problems

Biing-Feng Wang Shan-Chyun Ku Keng-Hua Shil 《Parallel and Distributed Systems, IEEE Transactions on》2001,12(9):888-898

An edge is a bisector of a simple path if it contains the middle point of the path. Let T=(V,E) be a tree. Given a source vertex s ∈ V, the single-source tree bisector problem is to find, for every vertex υ ∈ V, a bisector of the simple path from s to υ. The all-pairs tree bisector problem is to find, for every pair of vertices u, υ ∈ V, a bisector of the simple path from u to υ. In this paper, it is first shown that solving the single-source tree bisector problem of a weighted tree has a time lower bound Ω(n log n) in the sequential case. Then, efficient parallel algorithms are proposed on the EREW PRAM for the single-source and all-pairs tree bisector problems. Two O(log n) time single-source algorithms are proposed. One uses O(n) work and is for unweighted trees. The other uses O(n log n) work and is for weighted trees. Previous algorithms for the single-source problem could achieve the same time O(log n) and the same optimal work, O(n) for unweighted trees and O(n log n) for weighted trees, on the CRCW PRAM. The contribution of our single-source algorithms is the improvement from CRCW to EREW. One all-pairs parallel algorithm is proposed. It requires O(log n) time using O(n²) work. All the proposed algorithms are cost-optimal. Efficient tree bisector algorithms have practical applications to several location problems on trees. Using the proposed algorithms, efficient parallel solutions for those problems are also presented.
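For the unweighted single-source case, the definition can be turned into a short sequential sketch with O(n) work. It assumes unit edge lengths and a tree given as an adjacency dictionary; when the path has even length the middle point is a vertex and either incident path edge qualifies, and the sketch arbitrarily returns the one nearer to s. The name `single_source_bisectors` is hypothetical, and this is not the EREW PRAM algorithm of the paper.

```python
def single_source_bisectors(adj, s):
    """Map every vertex v != s to a bisector edge of the tree path from s to v.

    adj: dict mapping each vertex to a list of its neighbours (unweighted tree).
    The current root-to-vertex path is kept in `path` during an iterative DFS.
    With path = p0..pd, edge (p_i, p_{i+1}) covers distances [i, i+1] from s,
    so i = (d - 1) // 2 always contains the middle point d / 2.
    """
    bisector = {}
    path = [s]
    stack = [(s, None, iter(adj[s]))]  # (vertex, parent, remaining neighbours)
    while stack:
        v, parent, neighbours = stack[-1]
        for w in neighbours:
            if w != parent:
                path.append(w)
                d = len(path) - 1          # number of edges from s to w
                i = (d - 1) // 2
                bisector[w] = (path[i], path[i + 1])
                stack.append((w, v, iter(adj[w])))
                break
        else:
            # All neighbours of v handled: retreat one step along the path.
            stack.pop()
            path.pop()
    return bisector
```

On the path 0-1-2-3-4 with s = 0, vertices 1 and 2 get edge (0, 1) and vertices 3 and 4 get edge (1, 2).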

20.

An efficient parallel algorithm for the efficient domination problem on distance-hereditary graphs

Sun-yuan Hsieh 《Parallel and Distributed Systems, IEEE Transactions on》2002,13(9):985-993

In the literature, there are quite a few sequential and parallel algorithms for solving problems on distance-hereditary graphs. With an n-vertex and m-edge distance-hereditary graph G, we show that the efficient domination problem on G can be solved in O(log² n) time using O(n + m) processors on a CREW PRAM. Moreover, if a binary tree representation of G is given, the problem can be optimally solved in O(log n) time using O(n/log n) processors on an EREW PRAM.