Similar Documents
20 similar documents found.
1.
Quicksort Algorithm for Singly Linked Lists   Total citations: 2; self-citations: 0; citations by others: 2
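Singly linked lists are widely used in dynamic storage structures, but current sorting algorithms for singly linked lists are generally inefficient, and quicksort, which has the best average efficiency, is not directly applicable to singly linked lists. Using a divide-and-conquer strategy and recursion, this paper proposes a quicksort algorithm for singly linked lists that works by relinking list nodes. Its average time complexity is O(n log₂ n), it needs no auxiliary storage beyond the relinked nodes (O(1) auxiliary space), and its average recursion-stack space is O(log₂ n). Algorithm analysis and experiments show that it is considerably more efficient than other singly-linked-list sorting algorithms and also somewhat faster than traditional quicksort on sequential (array-based) lists. The results address the low efficiency of current singly-linked-list sorting.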
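The relinking idea in the abstract above can be illustrated with a minimal sketch, assuming a three-way partition (less than, equal to, greater than the head value) on Python objects. The `Node` class and all function names are hypothetical, and this is a generic linked-list quicksort, not necessarily the authors' exact algorithm. Nodes are moved between partitions by rewiring `next` pointers; no values are copied.

```python
class Node:
    """Singly linked list node (hypothetical helper type for this sketch)."""
    __slots__ = ("val", "next")

    def __init__(self, val, nxt=None):
        self.val = val
        self.next = nxt


def _quicksort(head):
    """Sort the list starting at `head` by relinking nodes; return (head, tail)."""
    if head is None or head.next is None:
        return head, head
    pivot = head.val
    # Dummy heads for the <, ==, > partitions (O(1) extra nodes per call).
    lt, eq, gt = Node(None), Node(None), Node(None)
    lt_tail, eq_tail, gt_tail = lt, eq, gt
    node = head
    while node is not None:
        nxt = node.next
        if node.val < pivot:
            lt_tail.next = node
            lt_tail = node
        elif node.val == pivot:
            eq_tail.next = node
            eq_tail = node
        else:
            gt_tail.next = node
            gt_tail = node
        node = nxt
    lt_tail.next = eq_tail.next = gt_tail.next = None  # terminate each partition
    lt_head, lt_last = _quicksort(lt.next)
    gt_head, gt_last = _quicksort(gt.next)
    # Splice: sorted(<) -> (==) -> sorted(>)
    eq_tail.next = gt_head
    tail = gt_last if gt_head is not None else eq_tail
    if lt_head is None:
        return eq.next, tail
    lt_last.next = eq.next
    return lt_head, tail


def quicksort_list(head):
    return _quicksort(head)[0]


def from_iterable(values):
    head = None
    for v in reversed(list(values)):
        head = Node(v, head)
    return head


def to_list(head):
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out
```

Because the pivot is the head value, already-sorted input drives the recursion depth to O(n), so very long sorted lists can exceed Python's default recursion limit; the O(log n) stack bound holds only on average.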

2.
In this paper, we present a new parallel sorting algorithm which maximizes the overlap between the disk, network, and CPU subsystems of a processing node. This algorithm is shown to be of similar complexity to known efficient sorting algorithms. The pipelining effect exploited by our algorithm should lead to higher levels of performance on distributed memory parallel processors. In order to achieve the best results using this strategy, the CPU, network, and disk operations must take comparable time. We suggest acceptable levels of system balance for sorting machines and analyze the performance of the sorting algorithm as system parameters vary.

3.
A parallel algorithm for the Euclidean distance transform (EDT) on a linear array with a reconfigurable pipelined bus system (LARPBS) is presented. For an image with n×n pixels, the algorithm can complete the EDT in O(n log n/(c(n) log d(n))) time using n·d(n)·c(n) processors, where c(n) and d(n) are parameters satisfying 1 ≤ c(n) ≤ n, and 1…

4.
A subsequence of a given string is any string obtained by deleting none or some symbols from the given string. A longest common subsequence (LCS) of two strings is a common subsequence of both that is as long as any other common subsequence. The problem is to find the LCS of two given strings. The bound on the complexity of this problem under the decision tree model is known to be mn if the number of distinct symbols that can appear in strings is infinite, where m and n are the lengths of the two strings, respectively, and m⩽n. In this paper, we propose two parallel algorithms for this problem on the CREW-PRAM model. One takes O(log² m + log n) time with mn/log m processors, which is faster than all the existing algorithms on the same model. The other takes O(log² m log log m) time with mn/(log² m log log m) processors when log² m log log m > log n, or otherwise O(log n) time with mn/log n processors, which is optimal in the sense that the time×processors bound matches the complexity bound of the problem. Both algorithms exploit nice properties of the LCS problem that are discovered in this paper.
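For reference, the mn bound above corresponds to the classical sequential dynamic program, sketched below with a hypothetical function name `lcs`. It fills an (m+1)×(n+1) table in O(mn) time and backtracks to recover one LCS; it is a baseline, not either of the paper's CREW-PRAM algorithms.

```python
def lcs(a, b):
    """Return one longest common subsequence of strings a and b (O(len(a)*len(b)))."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Backtrack from the bottom-right corner.
    out, i, j = [], m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```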

5.
We define a new type of recurrence equations called “Simple Indexed Recurrences” (SIR). In this type of equations, ordinary recurrences are generalized to X[g(i)]=op_i(X[f(i)], X[g(i)]), where f, g : {1...n}→{1...m}, op_i(x, y) is a binary associative operator and g is distinct, i.e., ∀i≠j g(i)≠g(j). This enables us to model certain sequential loops as a sequence of SIR equations. A parallel algorithm that solves a set of SIR equations will, in fact, parallelize sequential loops of the above type. Such a parallel SIR algorithm must be efficient enough to compete with the O(n) work complexity of the original loop. We show why efficient parallel algorithms for the related problems of list ranking and tree contraction, which require O(n) work, cannot be applied to solving SIR. We instead use repeated iterations of pointer jumping to compute the final values of X[] in n/p·log p steps and n·log p work, with p processors. A sequence of experiments was performed to test the effect of synchronous and asynchronous executions on the actual performance of the algorithm. These experiments show that pointer jumping requires O(n) work in most practical cases of SIR loops. An efficient solution is given for the special case where we know how to compute the inverse of op_i, and finally, useful applications of SIR to the well-known Livermore loops benchmark are presented.
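To make the SIR definition concrete, here is a direct sequential reading of the loop X[g(i)] = op_i(X[f(i)], X[g(i)]) with 0-based indices. The function name `run_sir` and its argument layout are hypothetical. This is the O(n)-work sequential loop that a parallel SIR solver must compete with, not the pointer-jumping algorithm itself.

```python
import operator


def run_sir(x, f, g, ops):
    """Execute X[g[i]] = ops[i](X[f[i]], X[g[i]]) for i = 0, 1, ... in order."""
    if len(set(g)) != len(g):
        raise ValueError("g must be distinct (no index is written twice)")
    x = list(x)
    for fi, gi, op in zip(f, g, ops):
        x[gi] = op(x[fi], x[gi])
    return x


# With f(i) = i - 1, g(i) = i and addition, the loop is an ordinary prefix-sum recurrence.
prefix = run_sir([1, 2, 3, 4], [0, 1, 2], [1, 2, 3], [operator.add] * 3)
```

Here `prefix` is `[1, 3, 6, 10]`. A general f can make a later iteration read a value written earlier (or not), which is what forces the loop to be executed in order when solved naively.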

6.
Given a graph G=(V, E) with n vertices and m edges, the k-connectivity of G denotes either the k-edge connectivity or the k-vertex connectivity of G. In this paper, we deal with the fully dynamic maintenance of k-connectivity of G in the parallel setting for k=2, 3. We study the problem of maintaining the k-edge/vertex connected components of a graph undergoing repeated dynamic updates, such as edge insertions and deletions, and answering queries of whether two vertices are included in the same k-edge/vertex connected component. Our major results are the following: (1) An NC algorithm for the 2-edge connectivity problem is proposed, which runs in O(log n log(m/n)) time using O(n^(3/4)) processors per update and query. (2) It is shown that the biconnectivity problem can be solved in O(log² n) time using O(nα(2n, n)/log n) processors per update and O(1) time with a single processor per query, or in O(log n log n/m) time using O(nα(2n, n)/log n) processors per update and O(log n) time using O(nα(2n, n)/log n) processors per query, where α(.,.) is the inverse of Ackermann's function. (3) An NC algorithm for the triconnectivity problem is also derived, which takes O(log n log n/m + log n log log n/α(3n, n)) time using O(nα(3n, n)/log n) processors per update and O(1) time with a single processor per query. (4) An NC algorithm for the 3-edge connectivity problem is obtained, which has the same time and processor complexities as the algorithm for the triconnectivity problem. To the best of our knowledge, the proposed algorithms are the first NC algorithms for these problems using O(n) processors, in contrast to the Ω(m) processors needed to solve them from scratch. In particular, the proposed NC algorithm for the 2-edge connectivity problem uses only O(n^(3/4)) processors. All the proposed algorithms run on a CRCW PRAM.

7.
Parallel Merge Sort Algorithm   Total citations: 3; self-citations: 0; citations by others: 3
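Constructing parallel algorithms with O(1) efficiency is a problem of considerable interest. [1] and [2] proposed O(1)-efficiency parallel sorting algorithms with degrees of parallelism O(log n) and O(n^(1/2)), respectively. This paper presents a new parallel sorting algorithm whose efficiency is O(1) and whose number of parallel steps is smaller than that of the algorithms in [1] and [2]. With a further improvement, the degree of parallelism can be raised to O(n^(1/2) log n) while keeping the efficiency at O(1).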

8.
The computational complexity of a parallel algorithm depends critically on the model of computation. We describe a simple and elegant rule-based model of computation in which processors apply rules asynchronously to pairs of objects from a global object space. Application of a rule to a pair of objects results in the creation of a new object if the objects satisfy the guard of the rule. The model can be efficiently implemented as a novel MIMD array processor architecture, the Intersecting Broadcast Machine. For this model of computation, we describe an efficient parallel sorting algorithm based on mergesort. The computational complexity of the sorting algorithm is O(n log² n), comparable to that for specialized sorting networks and an improvement on the O(n^1.5) complexity of conventional mesh-connected array processors.

9.
The problem of merging k (k⩾2) sorted lists is considered. We give an optimal parallel algorithm which takes O((n log k)/p + log n) time using p processors on a parallel random access machine that allows concurrent reads and exclusive writes, where n is the total size of the input lists. This algorithm achieves O(log n) time using p = n log k/log n processors. Most of the previous research on this problem has focused on the case k=2. Very recently, parallel solutions for the case k=2 have been reported. Our solution is the first logarithmic-time optimal parallel algorithm for the problem when k⩾2. It can also be seen as a unified optimal parallel algorithm for sorting and merging. To support the algorithm, a new processor assignment strategy is also presented.
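The sequential counterpart of k-way merging is the heap-based merge sketched below: it keeps one candidate per list in a min-heap and runs in O(n log k) time, matching the n log k term in the work bound above. The function name is hypothetical; Python's standard library also offers `heapq.merge` for the same job.

```python
import heapq


def merge_sorted_lists(lists):
    """Merge k sorted lists into one sorted list in O(n log k) time."""
    # Heap entries are (value, list index, position); the index breaks ties.
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    out = []
    while heap:
        val, i, j = heapq.heappop(heap)
        out.append(val)
        if j + 1 < len(lists[i]):
            heapq.heappush(heap, (lists[i][j + 1], i, j + 1))
    return out
```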

10.
In this paper we consider the problem of computing the connected components of the complement of a given graph. We describe a simple sequential algorithm for this problem, which works on the input graph and not on its complement, and which for a graph on n vertices and m edges runs in optimal O(n+m) time. Moreover, unlike previous linear co-connectivity algorithms, this algorithm admits efficient parallelization, leading to an optimal O(log n)-time and O((n+m)/log n)-processor algorithm on the EREW PRAM model of computation. It is worth noting that, for the related problem of computing the connected components of a graph, no optimal deterministic parallel algorithm is currently available. The co-connectivity algorithms find applications in a number of problems. In fact, we also include a parallel recognition algorithm for weakly triangulated graphs, which takes advantage of the parallel co-connectivity algorithm and achieves an O(log² n) time complexity using O((n+m²) log n) processors on the EREW PRAM model of computation.
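One well-known way to get O(n+m) co-connectivity while touching only G's edges is sketched below. It is not necessarily the authors' algorithm, and the function name is hypothetical. The search keeps a set of unvisited vertices. When expanding u, every unvisited vertex that is not a G-neighbour of u is a complement neighbour and is removed. The vertices that stay behind are G-neighbours of u, so the total scanning cost is O(n + m) (expected, given hash-set operations).

```python
def complement_components(n, edges):
    """Connected components of the complement of G = ({0..n-1}, edges)."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    unvisited = set(range(n))
    components = []
    while unvisited:
        start = unvisited.pop()
        component, frontier = [start], [start]
        while frontier:
            u = frontier.pop()
            # Unvisited vertices NOT adjacent to u in G are adjacent in the complement.
            reached = [w for w in unvisited if w not in adj[u]]
            unvisited.difference_update(reached)
            component.extend(reached)
            frontier.extend(reached)
        components.append(sorted(component))
    return components
```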

11.
This paper presents new efficient shortest path algorithms to solve single origin shortest path problems (SOSP problems) and multiple origins shortest path problems (MOSP problems) for hierarchically clustered data networks. To solve an SOSP problem for a network with n nodes, the distributed version of our algorithm reaches a time complexity of O(log(n)), which is less than the O(log²(n)) time complexity achieved by the best existing algorithm. To solve an MOSP problem, our algorithm minimizes the needed computation resources, including computation processors and communication links for the computation of each shortest path, so that we can achieve massive parallelization. The time complexity of our algorithm for an MOSP problem is O(m log(n)), which is much less than the O(M log²(n)) time complexity of the best previous algorithm. Here, M is the number of shortest paths to be computed and m is a positive number related to the network topology and the distribution of the nodes incurring communications; m is usually much smaller than M. Our experiments show that m is almost constant as the network size increases. Accordingly, our algorithm is significantly faster than the best previous algorithms for solving MOSP problems in large data networks.

12.
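The most widely used technique in current genome assembly software is De Bruijn graph-based assembly, which must process genome sequencing data billions of base pairs (bp) long. Given the massive volume of sequencing data, fast, efficient, and scalable assembly algorithms are essential. Although some parallel assemblers (such as YAGA) have begun to address these issues, graph construction and single-chain simplification, the two steps of assembly that consume the most time and memory, remain the main computational bottlenecks on massive data. This is because existing work typically uses parallel list ranking for these steps, which requires repeated distributed sorting of the massive vertex information of the De Bruijn graph and generates heavy communication between compute nodes. Because single-chain simplification can be completed with a single depth-first traversal of the De Bruijn graph, without list ranking, this paper proposes an optimization of single-chain simplification based on distributed traversal of massive graphs. It greatly reduces inter-processor communication and data movement between compute nodes and therefore scales well: its computational complexity is O(g/p) and its communication complexity is O(g), where g is the length of the reference sequence and p is the number of processor cores. On the E. coli and Yeast datasets, increasing the number of cores from 8 to 512 yields speedups of 13× and 10×, respectively; on the C. elegans and human chromosome 1 (chr1) datasets, increasing the number of cores from 32 to 512 yields speedups of 7× and 10×, respectively.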
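A minimal sequential sketch of the two steps named above follows: building a De Bruijn graph, and simplifying its single chains by traversal instead of list ranking. The sketch assumes the common convention that nodes are (k-1)-mers and each k-mer is an edge from its prefix to its suffix. Simplification then walks every maximal non-branching path, plus isolated cycles. All function names are hypothetical, and this is not the paper's distributed algorithm. Duplicate k-mers create parallel edges, so deduplicate them first if that is not wanted.

```python
from collections import defaultdict


def kmers(read, k):
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn(kmer_list):
    """One edge per k-mer, from its (k-1)-prefix to its (k-1)-suffix."""
    graph = defaultdict(list)
    for km in kmer_list:
        graph[km[:-1]].append(km[1:])
    return dict(graph)


def unitigs(graph):
    """Spell every maximal non-branching path (and isolated cycle) as a string."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    nodes = set(graph)
    for u, vs in graph.items():
        outdeg[u] += len(vs)
        for v in vs:
            indeg[v] += 1
            nodes.add(v)

    def simple(v):  # exactly one in-edge and one out-edge
        return indeg[v] == 1 and outdeg[v] == 1

    paths, covered = [], set()
    for v in sorted(nodes):
        if simple(v) or outdeg[v] == 0:
            continue
        for w in graph[v]:  # each out-edge of a branching node starts a chain
            path = [v, w]
            while simple(w):
                covered.add(w)
                w = graph[w][0]
                path.append(w)
            paths.append(path)
    for v in sorted(nodes):  # leftover simple nodes lie on isolated cycles
        if simple(v) and v not in covered:
            cycle, w = [v], graph[v][0]
            covered.add(v)
            while w != v:
                cycle.append(w)
                covered.add(w)
                w = graph[w][0]
            cycle.append(v)
            paths.append(cycle)
    return ["".join([p[0]] + [node[-1] for node in p[1:]]) for p in paths]
```

Each node is visited once along a chain, which is the single-traversal property the abstract relies on; a distributed version must additionally partition the nodes and follow chains across machines.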

13.
The work performed by a parallel algorithm is the product of its running time and the number of processors it requires. This paper presents work-efficient (or cost-optimal) routing algorithms to determine the switch settings for realizing permutations on rearrangeable symmetrical networks such as the Benes network and the reduced Ω_N Ω_N^(-1) network. These networks have 2n-1 stages with N=2^n inputs/outputs, each stage consisting of N/2 crossbar switches of size 2×2. Previously known parallel routing algorithms for a rearrangeable network with N inputs determine the states of all switches recursively in O(n) iterations using N processors. Each iteration determines the switch settings of at most two stages of the network and requires at least O(n) time on a computer of N processors, regardless of the type of its interconnection network. Hence, the work of any previously known parallel routing algorithm is at least O(Nn²) for setting up all the switches of a rearrangeable network. The new routing algorithms run on a computer of p processors, 1⩽p⩽N/n, and perform O(Nn) work. Moreover, because the range of p is large, the new routing algorithms do not have to be changed if some processors become faulty.
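The network dimensions quoted above can be checked with a little arithmetic: with N = 2^n inputs there are 2n-1 stages of N/2 switches each. The helper below is hypothetical. Its two work entries are the asymptotic orders from the abstract with constants dropped, not operation counts. For n = 3, it gives an 8×8 network with 5 stages and 20 switches.

```python
def benes_parameters(n):
    """Size of a Benes network with N = 2**n inputs built from 2x2 switches."""
    N = 2 ** n
    stages = 2 * n - 1
    per_stage = N // 2
    return {
        "N": N,
        "stages": stages,
        "switches_per_stage": per_stage,
        "total_switches": stages * per_stage,
        # Orders of growth from the abstract, constants dropped:
        "work_previous_order": N * n * n,  # O(N n^2)
        "work_new_order": N * n,           # O(N n)
    }
```

The ratio of the two work orders is n = log₂ N, which is the factor saved by the work-efficient routing.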

14.
Consider a set P of points in the plane sorted by the x-coordinate. A point p in P is said to be a proximate point if there exists a point q on the x-axis such that p is the closest point to q over all points in P. The proximate point problem is to determine all the proximate points in P. Our main contribution is to propose optimal parallel algorithms for solving instances of size n of the proximate points problem. We begin by developing a work-time optimal algorithm running in O(log log n) time and using n/log log n Common-CRCW processors. We then go on to show that this algorithm can be implemented to run in O(log n) time using n/log n EREW processors. In addition to being work-time optimal, our EREW algorithm turns out to also be time-optimal. Our second main contribution is to show that the proximate points problem finds interesting, and quite unexpected, applications to digital geometry and image processing. As a first application, we present a work-time optimal parallel algorithm for finding the convex hull of a set of n points in the plane sorted by x-coordinate; this algorithm runs in O(log log n) time using n/log n Common-CRCW processors. We then show that this algorithm can be implemented to run in O(log n) time using n/log n EREW processors. Next, we show that the proximate points algorithms afford us work-time optimal (resp. time-optimal) parallel algorithms for various fundamental digital geometry and image processing problems.
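A sequential way to see the connection to convex hulls is to work with squared distances. For a point p = (a, b) and q = (t, 0), the squared distance is (t - a)² + b² = t² + (-2a·t + a² + b²). Every point shares the t² term, so p is proximate exactly when its line ℓ_p(t) = -2a·t + a² + b² appears on the lower envelope of all such lines. Because the points are sorted by x, the slopes -2a arrive in decreasing order, and a stack scan like a convex hull pass finds the envelope in O(n). The sketch below is a sequential illustration, not the paper's parallel algorithm. It assumes strictly increasing x and treats "closest" as strictly closest, so a point that only ties at a single q is excluded. The function name is hypothetical.

```python
def proximate_points(points):
    """Points (x, y) with strictly increasing x; return those that are the unique
    nearest point to some q = (t, 0) on the x-axis."""
    hull = []  # stack of (slope, intercept, point) on the lower envelope
    for a, b in points:
        m, c = -2 * a, a * a + b * b
        while len(hull) >= 2:
            m1, c1, _ = hull[-2]
            m2, c2, _ = hull[-1]
            # hull[-1] is useless if the new line meets hull[-2] no later than
            # hull[-1] does: t13 <= t12, cross-multiplied (m1 > m2 > m).
            if (c - c1) * (m1 - m2) <= (c2 - c1) * (m1 - m):
                hull.pop()
            else:
                break
        hull.append((m, c, (a, b)))
    return [p for _, _, p in hull]
```

With integer coordinates the comparison is exact, avoiding floating-point intersection points.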

15.
We present two fast algorithms for sorting on a linear array with a reconfigurable pipelined bus system (LARPBS), one of the recently proposed parallel architectures based on optical buses. In our first algorithm, we sort N numbers in O(log N log log N) worst-case time using N processors. In our second algorithm, we sort N numbers in O((log log N)²) worst-case time using N^(1+ε) processors, for any fixed ε such that 0 < ε < 1. Our algorithms are based on a novel deterministic sampling scheme for merging two sorted arrays of length N each in O(log log N) time on an LARPBS with N processors. To our knowledge, the previous best sorting algorithm on this architecture has a running time of O((log N)²) using N processors.

16.
G. Sajith, S. Saxena. Algorithmica, 2000, 27(2): 187-197
The problem of finding a sublogarithmic time optimal parallel algorithm for 3-colouring rooted forests has been open for long. We settle this problem by obtaining an O((log log n) log*(log* n)) time optimal parallel algorithm on a TOLERANT Concurrent Read Concurrent Write (CRCW) Parallel Random Access Machine (PRAM). Furthermore, we show that if f(n) is the running time of the best known algorithm for 3-colouring a rooted forest on a COMMON or TOLERANT CRCW PRAM, a fractional independent set of the rooted forest can be found in O(f(n)) time with the same number of processors, on the same model. Using these results, it is shown that decomposable top-down algebraic computation and, hence, depth computation (ranking), 2-colouring and prefix summation on rooted forests can be done in O(log n) optimal time on a TOLERANT CRCW PRAM. These algorithms have been obtained by proving a result of independent interest concerning the self-simulation property of TOLERANT: an N-processor TOLERANT CRCW PRAM that uses an address space of size O(N) only can be simulated on an n-processor TOLERANT PRAM in O(N/n) time, with no asymptotic increase in space or cost, when n=O(N/log log N). Received May 20, 1997; revised June 15, 1998.
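For background, the classical starting point for 3-colouring rooted forests is Cole-Vishkin deterministic coin tossing, simulated sequentially below. This is the textbook O(log* n)-round technique, not the authors' optimal algorithm, and the function names are hypothetical. Each round, a vertex finds the lowest bit index i where its colour differs from its parent's and takes the new colour 2i + (its bit i). Rounds repeat until at most 6 colours remain. Colours 5, 4 and 3 are then removed one at a time: a shift-down step makes all siblings share a colour, and each vertex of the target colour picks a free colour in {0, 1, 2}. The input format, a `parent` list with `None` for roots, is an assumption.

```python
def _cv_step(color, parent):
    """One Cole-Vishkin round: new colour from own and parent's colour."""
    new = []
    for v, c in enumerate(color):
        p = parent[v]
        if p is None:
            i = 0  # a root acts as if it differs from a virtual parent in bit 0
        else:
            diff = c ^ color[p]                   # non-zero: colouring is proper
            i = (diff & -diff).bit_length() - 1   # lowest differing bit
        new.append(2 * i + ((c >> i) & 1))
    return new


def three_colour_forest(parent):
    """Proper colouring with colours {0, 1, 2}; parent[v] is None for roots."""
    n = len(parent)
    children = [[] for _ in range(n)]
    for v, p in enumerate(parent):
        if p is not None:
            children[p].append(v)
    color = list(range(n))  # distinct ids form a proper colouring
    while color and max(color) >= 6:
        color = _cv_step(color, parent)
    for c in (5, 4, 3):
        # Shift down: each vertex takes its parent's colour, so siblings agree;
        # a root picks a colour different from its current one.
        color = [color[p] if p is not None else (1 if color[v] == 0 else 0)
                 for v, p in enumerate(parent)]
        # Vertices coloured c are independent; give each a free colour.
        for v in range(n):
            if color[v] == c:
                used = {color[w] for w in children[v]}
                if parent[v] is not None:
                    used.add(color[parent[v]])
                color[v] = min({0, 1, 2} - used)
    return color
```

Each vertex's children share one colour after the shift-down, so at most two colours are forbidden and a free colour in {0, 1, 2} always exists.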

17.
We present a fast parallel algorithm for computing the dominators of a directed acyclic graph. The model of computation used is a parallel random access machine that allows simultaneous reads but prohibits simultaneous writes into the same memory location. Let Pt(n) be the processor complexity of computing the transitive closure of an n-vertex directed graph on this model. The only known parallel algorithm for dominators requires O(log² n) time and uses O(nPt(n)) processors. Our algorithm for dominators has the same time complexity but uses O(Pt(n)) processors, thereby improving the processor complexity of the previously known algorithm by a factor of n.
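As a sequential reference point for the dominator problem above, a DAG admits a simple rule: dom(v) = {v} ∪ ⋂ dom(p) over the predecessors p of v reachable from the source, evaluated in topological order. The sketch below uses Python 3.9+'s `graphlib.TopologicalSorter`. The function name and the `{vertex: [successors]}` input format are assumptions, and this is not the transitive-closure-based parallel algorithm. Vertices unreachable from the source are left out of the result; a cycle raises `graphlib.CycleError`.

```python
from graphlib import TopologicalSorter


def dag_dominators(succ, source):
    """Dominator sets of a DAG given as {vertex: [successors]}."""
    preds = {}
    for u, vs in succ.items():
        preds.setdefault(u, [])
        for v in vs:
            preds.setdefault(v, []).append(u)
    dom = {}
    # TopologicalSorter expects {node: predecessors}; static_order() is topological.
    for v in TopologicalSorter(preds).static_order():
        if v == source:
            dom[v] = {v}
            continue
        reached = [dom[p] for p in preds[v] if p in dom]  # reachable predecessors
        if reached:
            dom[v] = set.intersection(*reached) | {v}
    return dom
```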

18.
The computation model on which the algorithms are developed is the reconfigurable array of processors with wider bus networks (abbreviated RAPWBN). The main difference between the RAPWBN model and other existing reconfigurable parallel processing systems is that the bus width of each network is bounded within the range [2, [√N]]. This strategy not only saves chip silicon area and greatly increases computational power, but also allows the execution speed of the proposed algorithms to be tuned by the bus bandwidth. To demonstrate the computational power of the RAPWBN, algorithms for the channel-assignment problem are derived in this paper. For the channel-assignment problem with N pairs of components, we first design an O(T + [N/ω]) time parallel algorithm using 2N processors with a 2N-row by 2N-column bus network, where the bus width of each bus network is ω bits for 2 ≤ ω ≤ [√N] and T = [log_ω N] + 1. By tuning the bus bandwidth to the natural log N bits and to the extended N^(1/c) bits (N^(1/c) > log N) for any constant c ≥ 1, two more results, which run in O(log N/log log N) and O(1) time respectively, are also derived. Compared with the algorithms proposed by Olariu et al. [17] and Lin [14], our algorithm runs in equivalent time while significantly reducing the number of processors to O(N).

19.
We consider the following partition problem: Given a set S of n elements that is organized as k sorted subsets of size n/k each and given a parameter h with 1/k ≤ h ≤ n/k, partition S into g = O(n/(hk)) subsets D_1, D_2, ..., D_g of size Θ(hk) each, such that, for any two indices i and j with 1 ≤ i < j ≤ g, no element in D_i is bigger than any element in D_j. Note that with various combinations of the values of parameters h and k, several fundamental problems, such as merging, sorting, and finding an approximate median, can be formulated as or be reduced to this partition problem. The partition problem also finds many applications in solving problems of parallel computing and computational geometry. In this paper we present efficient parallel algorithms for solving the partition problem and a number of its applications. Our parallel partition algorithm runs in O(log n) time using … processors in the EREW PRAM model. The complexity bounds of our parallel partition algorithm on the respective special cases match those of the optimal EREW PRAM algorithms for merging, sorting, and finding an approximate median. Using our parallel partition algorithm, we are also able to obtain better complexity bounds (even possibly on a weaker parallel model) than the previously best known parallel algorithms for several important problems, including parallel multiselection, parallel multiranking, and parallel sorting of k sorted subsets. Received May 5, 1996; revised July 30, 1998.
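The output required by the partition problem can be pinned down with a sequential reference implementation: merge the k sorted subsets, then cut the result into consecutive blocks of about hk elements. Every element of D_i is then no bigger than every element of D_j for i < j. The function name is hypothetical. The block size is taken as ⌈hk⌉ (at least 1), which is an assumption, and the last block may be smaller. This O(n log k) sketch only specifies what must be computed, not how the parallel algorithm does it.

```python
import heapq
import math


def partition_sorted_subsets(subsets, h):
    """Split the union of k sorted subsets into ordered blocks of size ceil(h*k)."""
    k = len(subsets)
    size = max(1, math.ceil(h * k))
    merged = list(heapq.merge(*subsets))
    return [merged[i:i + size] for i in range(0, len(merged), size)]
```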

20.
An edge is a bisector of a simple path if it contains the middle point of the path. Let T=(V,E) be a tree. Given a source vertex s ∈ V, the single-source tree bisector problem is to find, for every vertex υ ∈ V, a bisector of the simple path from s to υ. The all-pairs tree bisector problem is to find, for every pair of vertices u, υ ∈ V, a bisector of the simple path from u to υ. In this paper, it is first shown that solving the single-source tree bisector problem of a weighted tree has a time lower bound Ω(n log n) in the sequential case. Then, efficient parallel algorithms are proposed on the EREW PRAM for the single-source and all-pairs tree bisector problems. Two O(log n) time single-source algorithms are proposed. One uses O(n) work and is for unweighted trees. The other uses O(n log n) work and is for weighted trees. Previous algorithms for the single-source problem could achieve the same time O(log n) and the same optimal work, O(n) for unweighted trees and O(n log n) for weighted trees, on the CRCW PRAM. The contribution of our single-source algorithms is the improvement from CRCW to EREW. One all-pairs parallel algorithm is proposed. It requires O(log n) time using O(n²) work. All the proposed algorithms are cost-optimal. Efficient tree bisector algorithms have practical applications to several location problems on trees. Using the proposed algorithms, efficient parallel solutions for those problems are also presented.
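For the unweighted single-source case, a sequential O(n) solution follows from a depth-first search that keeps the current path from s on a stack. If w is at depth d ≥ 1 and path[j] is the vertex at depth j, the midpoint lies at distance d/2 from s. The edge (path[h-1], path[h]) with h = ⌈d/2⌉ contains it: in its interior when d is odd, at its endpoint when d is even. The sketch assumes an adjacency-dict tree and picks the edge on the s side when the midpoint is a vertex; both choices are illustrative, as is the function name. It is not the paper's EREW algorithm.

```python
def single_source_bisectors(adj, s):
    """Unweighted tree {v: [neighbours]}: map each v to an edge of the s-v path
    containing its midpoint (None for v == s)."""
    bisector = {s: None}
    path = [s]  # current path from s to the vertex on top of the stack
    stack = [(s, None, iter(adj[s]))]
    while stack:
        v, parent, neighbours = stack[-1]
        for w in neighbours:
            if w != parent:
                path.append(w)
                d = len(path) - 1  # distance from s to w
                h = (d + 1) // 2   # ceil(d / 2)
                bisector[w] = (path[h - 1], path[h])
                stack.append((w, v, iter(adj[w])))
                break
        else:  # all neighbours of v explored: backtrack
            stack.pop()
            path.pop()
    return bisector
```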

