Similar Documents
20 similar documents retrieved.
1.
Many routing problems in parallel processing, such as concentration and permutation problems, can be cast as sorting problems. In this paper, we consider the problem of sorting on a new model, called an adaptive sorting network. We show that any sequence of n bits can be sorted on this model in O(lg^2 n) bit-level delay using O(n) constant-fanin gates. This improves the cost complexity of K.E. Batcher's binary sorters (1968) by a factor of O(lg^2 n) while matching their sorting time. The only other network that can sort binary sequences in O(n) cost is the network version of the columnsort algorithm, but this requires excessive pipelining. In addition, using binary sorters, we construct permutation networks with O(n lg n) bit-level cost and O(lg^3 n) bit-level delay. These results provide the asymptotically least-cost practical concentrators and permutation networks to date. We note, of course, that the well-known AKS sorting network has O(lg n) sorting time and O(n lg n) cost, but the constants hidden in these complexities are so large that our complexities outperform those of the AKS sorting network until n becomes extremely large.
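On 0/1 inputs, the comparator such networks are built from degenerates to a single AND gate (minimum output) and a single OR gate (maximum output), and a sorted n-bit sequence is simply its zeros followed by its ones. The minimal Python sketch below illustrates only this input/output behaviour; it is not the adaptive sorting network itself.

# Illustrative sketch (not the paper's adaptive sorting network).
def bit_comparator(a, b):
    """Compare-exchange on single bits: (min, max) = (a AND b, a OR b)."""
    return a & b, a | b

def sorted_bits(bits):
    """Reference behaviour any binary sorter must realize: zeros first, then ones."""
    ones = sum(bits)
    return [0] * (len(bits) - ones) + [1] * ones

if __name__ == "__main__":
    print(bit_comparator(1, 0))                    # (0, 1)
    print(sorted_bits([1, 0, 1, 1, 0, 0, 1, 0]))   # [0, 0, 0, 0, 1, 1, 1, 1]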

2.
Given a sequence of n elements, the All Nearest Smaller Values (ANSV) problem is to find, for each element in the sequence, the nearest element to the left (right) that is smaller, or to report that no such element exists. Time- and work-optimal algorithms for this problem are known on all the PRAM models, but the running time of the best previous hypercube algorithm is optimal only when the number of processors p satisfies 1 ≤ p ≤ n/((lg^3 n)(lg lg n)^2). In this paper, we prove that any normal hypercube algorithm requires Ω(M) processors to solve the ANSV problem in O(lg n) time, and we present the first normal hypercube ANSV algorithm that is optimal for all values of n and p. We use our ANSV algorithm to give the first O(lg n)-time, n-processor normal hypercube algorithms for triangulating a monotone polygon and for constructing a Cartesian tree.
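For reference, the ANSV specification has a well-known linear-time sequential stack solution (left neighbors shown; the right case is symmetric). The sketch below is only this sequential baseline, not the normal hypercube algorithm of the abstract.

def nearest_smaller_to_left(a):
    """For each a[i], return the index of the nearest smaller element to its left,
    or None if no such element exists. Sequential O(n) stack-based reference."""
    result, stack = [], []            # stack holds candidate indices, values increasing
    for i, x in enumerate(a):
        while stack and a[stack[-1]] >= x:
            stack.pop()               # a[stack[-1]] can never be the answer later on
        result.append(stack[-1] if stack else None)
        stack.append(i)
    return result

if __name__ == "__main__":
    print(nearest_smaller_to_left([3, 1, 4, 1, 5, 9, 2, 6]))
    # [None, None, 1, None, 3, 4, 3, 6]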

3.
Given a graph G=(V, E) with n vertices and m edges, the k-connectivity of G denotes either the k-edge connectivity or the k-vertex connectivity of G. In this paper, we deal with the fully dynamic maintenance of the k-connectivity of G in the parallel setting for k=2, 3. We study the problem of maintaining the k-edge/vertex connected components of a graph undergoing repeated dynamic updates, such as edge insertions and deletions, and answering queries of whether two vertices belong to the same k-edge/vertex connected component. Our major results are the following: (1) An NC algorithm for the 2-edge connectivity problem is proposed, which runs in O(log n log(m/n)) time using O(n^{3/4}) processors per update and query. (2) It is shown that the biconnectivity problem can be solved in O(log^2 n) time using O(nα(2n, n)/log n) processors per update and O(1) time with a single processor per query, or in O(log n log(m/n)) time using O(nα(2n, n)/log n) processors per update and O(log n) time using O(nα(2n, n)/log n) processors per query, where α(·,·) is the inverse of Ackermann's function. (3) An NC algorithm for the triconnectivity problem is also derived, which takes O(log n log(m/n) + log n log log n/α(3n, n)) time using O(nα(3n, n)/log n) processors per update and O(1) time with a single processor per query. (4) An NC algorithm for the 3-edge connectivity problem is obtained, which has the same time and processor complexities as the algorithm for the triconnectivity problem. To the best of our knowledge, the proposed algorithms are the first NC algorithms for these problems using O(n) processors, in contrast to Ω(m) processors for solving them from scratch. In particular, the proposed NC algorithm for the 2-edge connectivity problem uses only O(n^{3/4}) processors. All the proposed algorithms run on a CRCW PRAM.
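As background for what "update" and "query" mean here, the sketch below handles only the far simpler fully incremental 1-connectivity case: edge insertions plus same-component queries via union-find. It supports neither deletions nor k = 2, 3, which is precisely the extra difficulty the abstract's algorithms address.

class UnionFind:
    """Incremental connectivity: edge insertions + 'same component?' queries.
    1-connectivity only, no deletions -- a baseline, not the paper's setting."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def insert_edge(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return
        if self.rank[ru] < self.rank[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru                               # union by rank
        if self.rank[ru] == self.rank[rv]:
            self.rank[ru] += 1

    def connected(self, u, v):
        return self.find(u) == self.find(v)

if __name__ == "__main__":
    uf = UnionFind(5)
    uf.insert_edge(0, 1)
    uf.insert_edge(3, 4)
    print(uf.connected(1, 0), uf.connected(0, 3))   # True False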

4.
The work performed by a parallel algorithm is the product of its running time and the number of processors it requires. This paper presents work-efficient (or cost-optimal) routing algorithms to determine the switch settings for realizing permutations on rearrangeable symmetrical networks such as the Benes network and the reduced Ω_N Ω_N^{-1} network. These networks have 2n-1 stages with N=2^n inputs/outputs, each stage consisting of N/2 crossbar switches of size 2×2. Previously known parallel routing algorithms for a rearrangeable network with N inputs determine the states of all switches recursively in O(n) iterations using N processors. Each iteration determines the switch settings of at most two stages of the network and requires at least O(n) time on a computer of N processors, regardless of the type of its interconnection network. Hence, the work of any previously known parallel routing algorithm is at least O(Nn^2) for setting up all the switches of a rearrangeable network. The new routing algorithms run on a computer of p processors, 1 ≤ p ≤ N/n, and perform O(Nn) work. Moreover, because the range of p is large, the new routing algorithms do not have to be changed in case some processors become faulty.
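A small worked example, plugging the abstract's counts and bounds into a concrete size (constants and lower-order terms ignored): the number of 2×2 switches in a (2n-1)-stage network on N = 2^n terminals, and the O(Nn^2) versus O(Nn) routing work.

import math

def routing_work_comparison(N):
    """Evaluate the abstract's asymptotic bounds for a concrete N (constants ignored)."""
    n = int(math.log2(N))
    switches = (2 * n - 1) * (N // 2)   # (2n-1) stages, N/2 switches of size 2x2 each
    old_work = N * n * n                # previous algorithms: ~O(N n^2) total work
    new_work = N * n                    # this paper:          ~O(N n) total work
    return n, switches, old_work, new_work

if __name__ == "__main__":
    print(routing_work_comparison(1024))  # (10, 9728, 102400, 10240)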

5.
The performance of a multiprocessor system depends heavily on its ability to provide conflict-free paths among its processors. In this paper, we explore the possibility of using a nonblocking network with O(N log N) edges (crosspoints) to interconnect the processors of an N-processor system. We combine Bassalygo and Pinsker's implicit design of strictly nonblocking networks with an explicit construction of expanders to obtain a strictly nonblocking network with 352.8N log N - 765.18N edges and 2+log(N/5) depth. We present an efficient parallel algorithm for routing connection requests on this network and implement it on three parallel processor topologies. The implementation on a parallel processor whose processing elements are interconnected as in the Bassalygo-Pinsker network requires O(N log N) processing elements and O(N log N) interprocessor links, and it takes O(log N) steps to route any single connection request, where each step involves a small number (≈72) of bit-level operations. A contracted or folded version of the same implementation reduces the processing-element count to O(N) without increasing the link count or the routing time. Finally, we establish that the same algorithm takes O(log^3 N) steps on a perfect shuffle processor with O(N) processing elements. These results improve the crosspoint, depth, and routing-time complexities of previously reported strictly nonblocking networks.

6.
Unicast in hypercubes with large number of faulty nodes
Unicast in computer/communication networks is a one-to-one communication between a source node s and a destination node t. We propose three algorithms which find a nonfaulty routing path between s and t for unicast in the hypercube with a large number of faulty nodes. Given the n-dimensional hypercube H_n and a set F of faulty nodes, a node u ∈ H_n is called k-safe if u has at least k nonfaulty neighbors. The H_n is called k-safe if every node of H_n is k-safe. It has been known that for 0 ≤ k ≤ n/2, a k-safe H_n is connected if |F| ≤ 2^k(n-k)-1. Our first algorithm finds a nonfaulty path of length at most d(s,t)+4 in O(n) time for unicast between 1-safe s and t in the H_n with |F| ≤ 2n-3, where d(s,t) is the distance between s and t. The second algorithm finds a nonfaulty path of length at most d(s,t)+6 in O(n) time for unicast in the 2-safe H_n with |F| ≤ 4n-9. The third algorithm finds a nonfaulty path of length at most d(s,t)+O(k^2) in time O(|F|+n) for unicast in the k-safe H_n with |F| ≤ 2^k(n-k)-1 (0 ≤ k ≤ n/2). The time complexities of the algorithms are optimal. We show that in the worst case, the length of the nonfaulty path between s and t in a k-safe H_n with |F| ≤ 2^k(n-k)-1 is at least d(s,t)+2(k+1) for 0 ≤ k ≤ n/2. This implies that the path lengths found by the algorithms for unicast in the 1-safe and 2-safe hypercubes are optimal.
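For intuition only: nodes of H_n are n-bit labels, d(s,t) is the Hamming distance, and a naive greedy router flips one differing bit at a time while avoiding faulty neighbors. The sketch below is this naive baseline -- it can get stuck, which is exactly what the abstract's three algorithms avoid while keeping the detour bounded.

def hamming_distance(u, v):
    return bin(u ^ v).count("1")

def greedy_fault_avoiding_route(s, t, n, faulty):
    """Naive greedy unicast in an n-cube: repeatedly flip a differing bit whose
    target neighbor is nonfaulty. May fail (return None) where the paper's
    algorithms would succeed with a bounded detour."""
    path, u = [s], s
    while u != t:
        for dim in range(n):
            v = u ^ (1 << dim)
            # only move 'toward' t and only through nonfaulty nodes
            if ((u >> dim) & 1) != ((t >> dim) & 1) and v not in faulty:
                u = v
                path.append(u)
                break
        else:
            return None        # stuck: every useful neighbor is faulty
    return path

if __name__ == "__main__":
    # 4-cube, route 0000 -> 1111 avoiding faulty nodes {0001, 0111}
    print(greedy_fault_avoiding_route(0b0000, 0b1111, 4, {0b0001, 0b0111}))
    # [0, 2, 3, 11, 15]  (length 4 = d(s, t))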

7.
We present a parallel recognition algorithm for bipartite-permutation graphs. The algorithm can be executed in O(log n) time on the CRCW PRAM if O(n^3/log n) processors are used, or in O(log^2 n) time on the CREW PRAM if O(n^3/log^2 n) processors are used. Chen and Yesha (1993) have presented another CRCW PRAM algorithm that takes O(log^2 n) time if O(n^3) processors are used. Compared with Chen and Yesha's algorithm, our algorithm requires either less time and fewer processors on the same machine model, or fewer processors on a weaker machine model. Our algorithm can also be applied to determine if two bipartite-permutation graphs are isomorphic.

8.
A constant-time algorithm for labeling the connected components of an N×N image on a reconfigurable network of N^3 processors is presented. The main contribution of the algorithm is a novel constant-time technique for determining the minimum-labeled PE in each component. The number of processors used by the algorithm can be reduced to N^{2+1/d}, for any 1 ≤ d ≤ log N, if O(d) time is allowed.
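The task being parallelized, stated sequentially: assign every foreground pixel of an N×N binary image the minimum row-major PE index occurring in its 4-connected component. The BFS sketch below is only this sequential specification, not the constant-time reconfigurable-network technique.

from collections import deque

def label_components_min_index(image):
    """Each foreground pixel of an N x N binary image gets the minimum row-major
    index in its 4-connected component. Sequential reference only."""
    N = len(image)
    label = [[None] * N for _ in range(N)]
    for r in range(N):
        for c in range(N):
            if image[r][c] == 1 and label[r][c] is None:
                root = r * N + c        # minimum index: we scan in row-major order
                queue = deque([(r, c)])
                label[r][c] = root
                while queue:
                    i, j = queue.popleft()
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < N and 0 <= nj < N
                                and image[ni][nj] == 1 and label[ni][nj] is None):
                            label[ni][nj] = root
                            queue.append((ni, nj))
    return label

if __name__ == "__main__":
    img = [[1, 1, 0, 0],
           [0, 1, 0, 1],
           [0, 0, 0, 1],
           [1, 0, 0, 1]]
    for row in label_components_min_index(img):
        print(row)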

9.
Data compression based on dictionary techniques works by replacing phrases in the input string with indexes into some dictionary. The dictionary can be static or dynamic. In static dictionary compression, the dictionary contains a predetermined fixed set of entries. In dynamic dictionary compression, the dictionary changes its entries during compression. We present parallel algorithms for two parsing strategies for static dictionary compression. One is the optimal parsing strategy with dictionaries that have the prefix property, for which our algorithm requires O(L+log n) time and O(n) processors, where n is the number of symbols in the input string and L is the maximum length of the dictionary entries, while previous results run in O(L+log n) time using O(n^2) processors or in O(L+log^2 n) time using O(n) processors. The other is the longest-fragment-first (LFF) parsing strategy, for which our algorithm requires O(L+log n) time and O(n log L) processors, while a previous result obtained an O(L log n) time performance on O(n/log n) processors. For both strategies, we derive our parallel algorithms by modifying the corresponding on-line algorithms using a pointer-doubling technique.
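For contrast with the optimal and LFF strategies, the sketch below shows plain greedy parsing against a static dictionary with the prefix property (the example dictionary is hypothetical). Greedy always succeeds when every single symbol is an entry but, as the example shows, it can emit more phrases than an optimal parse.

def greedy_parse(text, dictionary):
    """Greedy static-dictionary parsing: at each position take the longest
    dictionary entry matching a prefix of the remaining input. Assumes every
    single symbol is in the dictionary, so parsing never gets stuck.
    (Sequential baseline, not the paper's optimal or LFF parallel parsers.)"""
    max_len = max(len(w) for w in dictionary)
    phrases, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in dictionary:
                phrases.append(text[i:i + length])
                i += length
                break
        else:
            raise ValueError("symbol not in dictionary")
    return phrases

if __name__ == "__main__":
    d = {"a", "b", "ab", "ba", "abb", "bab", "baba"}   # prefix-closed toy dictionary
    print(greedy_parse("abbaba", d))
    # ['abb', 'ab', 'a'] -- an optimal parse needs only 2 phrases: ['ab', 'baba']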

10.
In this paper, we propose a design for a new self-routing multicast network which can realize arbitrary multicast assignments between its inputs and outputs without any blocking. The network design uses a recursive decomposition approach and is based on the binary radix sorting concept. All functional components of the network are reverse banyan networks. Specifically, the new multicast network is recursively constructed by cascading a binary splitting network and two half-size multicast networks. The binary splitting network, in turn, consists of two recursively constructed reverse banyan networks. The first reverse banyan network serves as a scatter network and the second serves as a quasi-sorting network. The advantage of this approach is that it provides a way to self-route multicast assignments through the network and the possibility of reusing part of the network to reduce its cost. The new multicast network compares favorably with previously proposed multicast networks. It uses O(n log^2 n) logic gates and has O(log^2 n) depth and O(log^2 n) routing time, where the unit of time is a gate delay. By reusing part of the network, the feedback implementation of the network can further reduce the network cost to O(n log n).

11.
We consider the problems of routing and sorting on a de Bruijn network. First, we show that any deterministic oblivious routing scheme for permutation routing on a d-ary de Bruijn network with N=d^n nodes will, in the worst case, take Ω(√N) steps under the single-port model. This improves the existing lower bounds provided d is not a constant. We also show that the lower bound is indeed a tight one. Second, we present a deterministic nonoblivious permutation routing algorithm which runs in O(d·n^2) time on a d-ary de Bruijn network with N=d^n nodes. This algorithm is currently the fastest known nonoblivious deterministic routing algorithm for de Bruijn networks of arbitrary degree. Finally, we present an efficient general sorting algorithm for de Bruijn networks of arbitrary degree. This algorithm is the best sorting algorithm known so far. It runs in O((log d)·d·n^2) time for a directed de Bruijn network with d^n nodes, degree d, and diameter n. As a corollary, we show that on a binary de Bruijn network of N nodes, our sorting scheme requires at most 2 log^2 N steps.
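To fix notation (this is not any of the routing or sorting algorithms above): a node of the d-ary de Bruijn network is a string of n digits, and its out-neighbors are obtained by shifting out the leading digit and appending an arbitrary digit.

def de_bruijn_out_neighbors(node, d, n):
    """Out-neighbors of a node in the d-ary, n-digit de Bruijn network:
    drop the leading digit, shift left, append any digit a in {0,...,d-1}.
    Nodes are encoded as integers in [0, d**n)."""
    shifted = (node % d ** (n - 1)) * d    # drop most-significant digit, shift left
    return [shifted + a for a in range(d)]

if __name__ == "__main__":
    # Binary de Bruijn network with 2^3 = 8 nodes: neighbors of 0b110
    print(de_bruijn_out_neighbors(0b110, 2, 3))   # [4, 5], i.e. 0b100 and 0b101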

12.
A new family of network topologies containing multiple loops is discussed in this paper. In the proposed structure, N processors are interconnected to form a graph G(m, N), m ≥ 3, where m is a parameter of the graph such that N is an even multiple of m and (m - 1) × 2^{[(m-1)/2]+1} < N ≤ m × 2^{[m/2]+1}. The graph G(m, N) is Hamiltonian, with an average node degree of (3 + 1/m) when m is even and exactly 3 when m is odd, whereas the maximum node degree is 4. The diameter of G(m, N) is upper bounded by [11m/8] + 1. A point-to-point routing algorithm is presented. Implementation of ASCEND/DESCEND algorithms in O(m) time is discussed. It is shown that in case of a single node failure, the diameter increases by at most 6.

13.
Given N matrices A_1, A_2, ..., A_N of size N×N, the matrix chain product problem is to compute A_1×A_2×...×A_N. Given an N×N matrix A, the matrix powers problem is to calculate the first N powers of A, that is, A, A^2, A^3, ..., A^N. We solve the two problems on distributed memory systems (DMSs) with p processors that can support one-to-one communications in T(p) time. Assume that the fastest sequential matrix multiplication algorithm has time complexity O(N^α), where the currently best value of α is less than 2.3755. Let p be arbitrarily chosen in the range 1 ≤ p ≤ N^{α+1}/(log N)^2. We show that the two problems can be solved by a DMS with p processors in T_chain(N,p) = O(N^{α+1}/p + T(p)((N^{2(2+1/α)}/p^{2/α})(log^+(p/N))^{1-2/α} + log^+((p log N)/N^α))) and T_power(N,p) = O(N^{α+1}/p + T(p)((N^{2(1+1/α)}/p^{2/α})(log^+(p/(2 log N)))^{1-2/α} + (log N)^2)) time, respectively, where the function log^+ is defined as follows: log^+ x = log x if x ≥ 1 and log^+ x = 1 if 0 < x < 1.
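One way to see the parallelism in the powers problem: A, A^2, ..., A^N are the inclusive prefix products of N copies of A, so a doubling (scan) pattern needs only O(log N) rounds of matrix multiplications. The sketch below simulates that scan sequentially and assumes NumPy is available; it illustrates the problem structure, not the DMS algorithm of the abstract.

import numpy as np

def matrix_powers_scan(A, N):
    """Compute A, A^2, ..., A^N as an inclusive prefix product of N copies of A,
    using the doubling (Hillis-Steele) scan pattern that a parallel machine could
    run in O(log N) multiplication rounds. Sequential simulation only."""
    prefix = [A.copy() for _ in range(N)]      # prefix[i] will become A^(i+1)
    step = 1
    while step < N:
        old = [M.copy() for M in prefix]       # all updates in a round read old values
        for i in range(step, N):
            prefix[i] = old[i - step] @ old[i] # combine two adjacent segments
        step *= 2
    return prefix

if __name__ == "__main__":
    A = np.array([[1.0, 1.0], [0.0, 1.0]])
    powers = matrix_powers_scan(A, 4)
    print(powers[3])   # A^4 = [[1, 4], [0, 1]]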

14.
We present two fast algorithms for sorting on a linear array with a reconfigurable pipelined bus system (LARPBS), one of the recently proposed parallel architectures based on optical buses. In our first algorithm, we sort N numbers in O(log N log log N) worst-case time using N processors. In our second algorithm, we sort N numbers in O((log log N)^2) worst-case time using N^{1+ε} processors, for any fixed ε such that 0 < ε < 1. Our algorithms are based on a novel deterministic sampling scheme for merging two sorted arrays of length N each in O(log log N) time on an LARPBS with N processors. To our knowledge, the previous best sorting algorithm on this architecture has a running time of O((log N)^2) using N processors.

15.
Parallel algorithms for relational coarsest partition problems
Relational Coarsest Partition Problems (RCPPs) play a vital role in verifying concurrent systems. It is known that RCPPs are P-complete and hence it may not be possible to design polylog-time parallel algorithms for these problems. In this paper, we present two efficient parallel algorithms for RCPP, in which the associated labeled transition system is assumed to have m transitions and n states. The first algorithm runs in O(n^{1+ε}) time using m/n^ε CREW PRAM processors, for any fixed ε < 1. This algorithm is analogous to and optimal with respect to the sequential algorithm of P.C. Kanellakis and S.A. Smolka (1990). The second algorithm runs in O(n log n) time using m/n CREW PRAM processors. This algorithm is analogous to and nearly optimal with respect to the sequential algorithm of R. Paige and R.E. Tarjan (1987).
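To make the problem concrete, the sketch below is a naive sequential fixpoint refinement for the relational coarsest partition (bisimulation equivalence) of a labeled transition system; it is nowhere near the Kanellakis-Smolka or Paige-Tarjan bounds and is not the paper's parallel algorithms.

def coarsest_partition(states, transitions):
    """Naive relational coarsest partition (bisimulation) refinement.
    transitions: iterable of (source, label, target). Two states end up in the
    same block iff, for every label, they reach the same set of blocks.
    Specification-level baseline only."""
    block = {s: 0 for s in states}                       # start with a single block
    while True:
        signature = {
            s: frozenset((a, block[t]) for (u, a, t) in transitions if u == s)
            for s in states
        }
        keys, new_block = {}, {}
        for s in states:
            key = (block[s], signature[s])               # keep refinement monotone
            if key not in keys:
                keys[key] = len(keys)
            new_block[s] = keys[key]
        if new_block == block:
            return new_block
        block = new_block

if __name__ == "__main__":
    S = ["p", "q", "r"]
    T = [("p", "a", "p"), ("q", "a", "q"), ("r", "a", "p"), ("r", "b", "p")]
    print(coarsest_partition(S, T))   # {'p': 0, 'q': 0, 'r': 1}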

16.
Parallel algorithms for approximate string matching on the PRAM and LARPBS models
Zhong Cheng, Chen Guoliang. Journal of Software (软件学报), 2004, 15(2): 159-169
Approximate string matching has wide applications in areas such as web information search, digital libraries, pattern recognition, text mining, IP routing table lookup, network intrusion detection, bioinformatics, and computational music research. Based on the CREW-PRAM (parallel random access machine with concurrent read and exclusive write) model, a dynamic programming parallel algorithm for approximate string matching with k differences is designed that computes the edit distance matrix D directly in a wavefront fashion; it uses m+1 processors and O(n) time, achieving linear speedup in theory. By computing the edit distance matrix D in parallel both horizontally and diagonally, a scalable dynamic programming parallel algorithm for approximate string matching with k differences is designed that, for a tunable parameter λ, uses λ(m+1) processors and O(n/λ+m) time. Based on a divide-and-conquer strategy, by dynamically reconfiguring the optical bus system through flexibly splitting the bus and merging sub-buses, and by exploiting the message-broadcasting capability of the optical bus together with parallel prefix-sum computation, the Hamming distance is computed in parallel, and two communication-efficient, scalable parallel algorithms for approximate string matching with k mismatches are designed on the LARPBS (linear arrays with reconfigurable pipelined bus system) model: one uses n processors and O(m) time, and the other is a constant-time algorithm using mn processors.
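The sequential dynamic program underlying the PRAM algorithms above is the classical k-differences matching recurrence over the edit distance matrix D, with D[0][j]=0 so a match may start at any text position; each anti-diagonal of D depends only on the previous two, which is what a wavefront parallelization exploits. A minimal sequential sketch (not the PRAM or LARPBS algorithms):

def k_differences_matches(pattern, text, k):
    """Sequential k-differences approximate matching (edit-distance DP).
    Returns the end positions j (1-based) in text where some substring ending at j
    matches pattern with at most k edit operations. Column-by-column O(mn)."""
    m = len(pattern)
    col = list(range(m + 1))                  # column j=0 of D: D[i][0] = i
    matches = []
    for j in range(1, len(text) + 1):
        prev_diag, col[0] = col[0], 0         # D[0][j] = 0: a match may start anywhere
        for i in range(1, m + 1):
            cur = min(col[i] + 1,             # extra character in the text
                      col[i - 1] + 1,         # missing character in the text
                      prev_diag + (pattern[i - 1] != text[j - 1]))  # match / mismatch
            prev_diag, col[i] = col[i], cur
        if col[m] <= k:
            matches.append(j)                 # approximate occurrence ends at text[j-1]
    return matches

if __name__ == "__main__":
    print(k_differences_matches("survey", "surgery", 2))   # [5, 6, 7]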

17.
Recurrence formulations for various problems, such as finding an optimal order of matrix multiplication, finding an optimal binary search tree, and optimal triangulation of polygons, assume a similar form. A. Gibbons and W. Rytter (1988) gave a CREW PRAM algorithm to solve such dynamic programming problems. The algorithm uses O(n^6/log n) processors and runs in O(log^2 n) time. In this article, a modified algorithm is presented that reduces the processor requirement to O(n^6/log^5 n) while maintaining the same time complexity of O(log^2 n).
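The recurrence family in question, in its best-known instance (optimal matrix multiplication order), is cost[i][j] = min over i ≤ k < j of cost[i][k] + cost[k+1][j] + w(i,k,j). The sketch below is the standard sequential O(n^3) evaluation of this recurrence, i.e., the baseline the cited PRAM algorithms parallelize, not the parallel algorithm itself.

def matrix_chain_order(dims):
    """Classic sequential DP for the optimal matrix multiplication order:
    matrix i has shape dims[i-1] x dims[i], for i = 1..n.
    cost[i][j] = min scalar multiplications to compute the product of matrices i..j."""
    n = len(dims) - 1
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):                  # increasing chain length
        for i in range(1, n - length + 2):
            j = i + length - 1
            cost[i][j] = min(cost[i][k] + cost[k + 1][j] + dims[i - 1] * dims[k] * dims[j]
                             for k in range(i, j))
    return cost[1][n]

if __name__ == "__main__":
    # (10x30)*(30x5)*(5x60): best order ((A1*A2)*A3) costs 4500 scalar multiplications
    print(matrix_chain_order([10, 30, 5, 60]))   # 4500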

18.
Optimal Initialization and Gossiping Algorithms for Random Radio Networks
The initialization problem, also known as naming, consists of assigning a unique identifier ranging from 1 to n to each of n indistinguishable nodes in a given network. We consider a network where n nodes (processors) are randomly deployed in a square (respectively, cube) X. We assume that time is slotted and the network is synchronous; two nodes are able to communicate if they are within distance at most r of each other (r is the transmitting/receiving range). Moreover, if two or more neighbors of a processor u transmit concurrently in the same time slot, then u receives neither message. We also suppose that the anonymous nodes know neither the topology of the network nor the number of nodes in the network. Under this extremal scenario, we first show how the transmitting range of the deployed processors affects the typical characteristics of the network. Then, by allowing the nodes to transmit at various ranges, we design sublinear randomized initialization protocols: in the two-, respectively three-, dimensional case, our randomized initialization algorithms run in O(n^{1/2} log^{1/2} n), respectively O(n^{1/3} log^{2/3} n), time slots. These latter protocols are based upon an optimal gossiping algorithm which is of independent interest.

19.
The main contribution of this work is to present elegant broadcast-efficient protocols for permutation routing, ranking, and sorting on single-hop Mobile Radio Networks with p stations and k radio channels, denoted MRN(p,k). Clearly, any protocol performing these tasks on n items must perform n/k broadcast rounds, because each item must be broadcast at least once. We begin by presenting an optimal off-line permutation routing protocol using n/k broadcast rounds for arbitrary k, p, and n. Further, we show that optimal on-line routing can be performed in n/k broadcast rounds, provided that either k=1 or p=n. We then go on to develop an on-line routing protocol that takes 2n/k + k - 1 broadcast rounds on the MRN(p,k) whenever k ≤ √p/2. Using these routing protocols as basic building blocks, we develop a ranking protocol that takes 2n/k + o(n/k) broadcast rounds as well as a sorting protocol that takes 3n/k + o(n/k) broadcast rounds, provided that k ∈ o(√n) and p=n. Finally, we develop a ranking protocol that takes 3n/k + o(n/k) broadcast rounds, as well as a sorting protocol that takes 4n/k + o(n/k) broadcast rounds on the MRN(p,k), provided that k ≤ √p/2 and p ∈ o(n). Featuring very low proportionality constants, our protocols offer a vast improvement over the state of the art.

20.
We efficiently map a priority queue onto the hypercube architecture in a load-balanced manner, with no additional communication overhead, and present optimal parallel algorithms for performing insert and deletemin operations. Two implementations of such operations are proposed on the single-port hypercube model. In a b-bandwidth, n-item priority queue in which every node contains b items in sorted order, the first implementation achieves an optimal speedup of O(min{log n, b log n/log b + log log n}) for inserting b presorted items or deleting the b smallest items, where b = O(n^{1/c}) with c > 1. In particular, single insertion and deletion operations are cost-optimal and require O(log n/p + log p) time using O(log n/log log n) processors. The second implementation is more scalable since it uses a larger number of processors, and attains a "nearly" optimal speedup on the single hypercube. Namely, the insertion of log n presorted items or the deletion of the log n smallest items is accomplished in O((log log n)^2) time using O(log^2 n/log log n) processors. Finally, on the slightly more powerful pipelined hypercube model, the second implementation performs log n operations in O(log log n) time using O(log^2 n/log log n) processors, thus achieving an optimal speedup. To the best of our knowledge, our algorithms are the first implementations of b-bandwidth distributed priority queues which are load balanced and yet guarantee optimal speedups.
