首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
This paper presents a new parallel algorithm for routing unicast (one-to-one) assignments in Benes networks. Parallel routing algorithms for such networks were reported earlier, but these algorithms were designed primarily to route permutation assignments. The routing algorithm presented in this paper removes this restriction without an increase in the order of routing cost or routing time. We realize this new routing algorithm on two different topologies. The algorithm routes a unicast assignment involving O(k) pairs of inputs and outputs in O(lg 2 k+lg n) time on a completely connected network of n processors and in O(lg4 k+lg2 k lg n) time on an extended shuffle-exchange network of n processors. Using O(n lg n) professors, the same algorithm can be pipelined to route α unicast assignments each involving O(k) pairs of inputs and outputs, in O(lg2 k+lg n+(α-1) lg k) time on a completely connected network and in O(lg4 k+lg2 k lg n+(α-1)(lg 3 k+lg k lg n)) time on the extended shuffle-exchange network. These yield an average routing time of O(lg k) in the first case, and O(lg3 k+1g k lg n) in the second case, for all α⩾lg n. These complexities indicate that the algorithm given in this paper is as fast as Nassimi and Sahni's algorithm for unicast assignments, and with pipelining, it is faster than the same algorithm at least by a factor of O(lg n) on both topologies. Furthermore, for sparse assignments, i.e., when k=O(1), it is the first algorithm which has an average routing time of O(1g n) on a topology with O(n) links  相似文献   

2.
In this paper, we propose a design for a new self-routing multicast network which can realize arbitrary multicast assignments between its inputs and outputs without any blocking. The network design uses a recursive decomposition approach and is based on the binary radix sorting concept. All functional components of the network are reverse banyan networks. Specifically, the new multicast network is recursively constructed by cascading a binary splitting network and two half-size multicast networks. The binary splitting network, in turn, consists of two recursively constructed reverse banyan networks. The first reverse banyan network serves as a scatter network and the second reverse banyan network serves as a quasisorting network. The advantage of this approach is to provide a way to self-route multicast assignments through the network and a possibility to reuse part of network to reduce the network cost. The new multicast network we design is compared favorably with the previously proposed multicast networks. It uses O(n log2 n) logic gates, and has O(log2 n) depth and O(log2 n) routing time where the unit of time is a gate delay. By reusing part of the network, the feedback implementation of the network can further reduce the network cost to O(n log n)  相似文献   

3.
We consider the problems of routing and sorting on a de Bruijn network. First, we show that any deterministic oblivious routing scheme for permutation routing on a d-ary de Bruijn network with N=dn nodes, in the worst case, will take Ω(√N) steps under the single-port model. This improves the existing lower bounds provided d is not a constant. We also show that the lower bound is indeed a tight one. Second, we present a deterministic nonoblivious permutation routing algorithm which runs in O(d.n2) time on a d-ary de Bruijn network with N=dn nodes. This algorithm is currently the fastest known nonoblivious deterministic routing algorithm for de Bruijn networks of arbitrary degree. Finally, we present an efficient general sorting algorithm for the de Bruijn networks of arbitrary degree. This algorithm is the best sorting algorithm known so far. It runs in O((log d).d.n2) time for directed de Bruijn network with dn nodes, degree d, and diameter n. As a corollary, we show that on a binary de Bruijn network of Nnodes, our sorting scheme requires at most 2 log2 Nsteps  相似文献   

4.
Given a sequence of n elements, the All Nearest Smaller Values (ANSV) problem is to find, for each element in the sequence, the nearest element to the left (right) that is smaller, or to report that no such element exists. Time and work optimal algorithms for this problem are known on all the PRAM models but the running time of the best previous hypercube algorithm is optimal only when the number of processors p satisfies 1⩽p⩽n/((lg3 n)(lg lg n)2). In this paper, we prove that any normal hypercube algorithm requires Ω(M) processors to solve the ANSV problem in O(lg n) time, and we present the first normal hypercube ANSV algorithm that is optimal for all values of n and p. We use our ANSV algorithm to give the first O(lg n)-time n-processor normal hypercube algorithms for triangulating a monotone polygon and for constructing a Cartesian tree  相似文献   

5.
Shuffle-unshuffle sorting networks are a class of comparator networks whose structure maps efficiently to the hypercube and any of its bounded degree variants. Recently, n -input shuffle-unshuffle sorting networks with depth have been discovered. These networks are the only known sorting networks of depth o( lg 2 n) that are not based on expanders, and their existence raises the question of whether a depth of O( lg n) can be achieved by any shuffle-unshuffle sorting network. In this paper we resolve this question by establishing an Ω( lg n lg lg n/lg lg lg n) lower bound on the depth of any n -input shuffle-unshuffle sorting network. Our lower bound can be extended to certain restricted classes of nonoblivious sorting algorithms on hypercubic machines. Received September 9, 1999, and in final form December 20, 1999.  相似文献   

6.
We generalize the well-known odd-even merge sorting algorithm, originally due to Batcher (1968), and show how this generalized algorithm can be applied to sorting on product networks. If G is an arbitrary factor graph with N nodes, its r-dimensional product contains Nr nodes. Our algorithm sorts Nr keys stored in the r-dimensional product of G in O(rrF(N)) time, where F(N) depends on G. We show that, for any factor graph G, F(N) is, at most, O(N), establishing an upper bound of O(r2 N) for the time complexity of sorting Nr keys on any product network. For product networks with bounded r(e.g. for grids), this leads to the asymptotic complexity of O(N) to sort Nr keys, which is optimal for several instances of product networks. There are factor graphs for which F(N)=O(log2 N), which leads to the asymptotic running time of O(log2 N) to sort Nr keys. For networks with bounded N (e.g. in the hypercube N=2, fixed), the asymptotic complexity becomes O(r2). We show how to apply the algorithm to several cases of well-known product networks, as well as others introduced recently. We compare the performance of our algorithm to well-known algorithms developed specifically for these networks, as well as others. The result of these comparisons led us to conjecture that the proposed algorithm is probably the best deterministic algorithm that can be found in terms of the low asymptotic complexity with a small constant  相似文献   

7.
We study the connection capacity of a class of rearrangeable nonblocking (RNB) and strictly nonblocking (SNB) networks with/without crosstalk-free constraint, model their routing problems as weak or strong edge-colorings of bipartite graphs, and propose efficient routing algorithms for these networks using parallel processing techniques. This class of networks includes networks constructed from banyan networks by horizontal concatenation of extra stages and/or vertical stacking of multiple planes. We present a parallel algorithm that runs in O(lg/sup 2/ N) time for the RNB networks of complexities ranging from O(N lg N) to O(N/sup 1.5/ lg N) crosspoints and parallel algorithms that run in O(min{d* lg N, /spl radic/N}) time for the SNB networks of O(N/sup 1.5/ lg N) crosspoints, using a completely connected multiprocessor system of N processing elements. Our algorithms can be translated into algorithms with an O(lg N lg lg N) slowdown factor for the class of N-processor hypercubic networks, whose structures are no more complex than a single plane in the RNB and SNB networks considered.  相似文献   

8.
The performance of a multiprocessor system depends heavily on its ability to provide conflict free paths among its processors. In this paper, we explore the possibility of using a nonblocking network with O(N log N) edges (crosspoints) to interconnect the processors of an N processor system, We combine Bassalygo and Pinsker's implicit design of strictly nonblocking networks with an explicit construction of expanders to obtain a strictly nonblocking network with -765.18N+352.8N log N edges and 2+log(N/5) depth. We present an efficient parallel algorithm for routing connection requests on this network and implement it on three parallel processor topologies. The implementation on a parallel processor whose processing elements are interconnected as in the Bassalygo-Pinsker network requires O(N log N) processing elements, O(N log N) interprocessor links and it takes O(log N) steps to route any single connection request where each step involves a small number (≈72) of bit-level operations. A contracted or folded version of the same implementation reduces the processing element count to O(N) without increasing the link count or the routing time. Finally, we establish that the same algorithm takes O(log3 N) steps on a perfect shuffle processor with O(N) processing elements. These results improve the crosspoint, depth and routing time complexities of the previously reported strictly nonblocking networks  相似文献   

9.
在并行和分布式环境中 ,多个结点之间的通讯一直是研究工作的焦点问题 ,这些通讯主要包括置换、多播(Multicast)及多源点多播 (Multiple Multicast) .L ai提出了适用于一类与带缓存的 Cube网相互拓扑等价的网络的置换算法 ,硬件代价为 O(N log N ) ,算法的时间复杂度为 O(N ) .Feng提出的 inside- out算法实现了 Omega- 1 × Omega网上的置换 ,总的路由时间复杂度为 O(N log N) ,硬件代价为 O(N log N) .支持严格无阻塞置换的三级 Clos网的总的交叉点数为O(N32 ) .支持严格无阻塞的多播的三级 Clos网为 O(N32 logrloglogr) .Yang提出了一种新的多播网络 ,硬件复杂度为 O(N log2 N ) .为了支持多源点多播 ,多级互联网硬件代价往往大幅上升 .本文借鉴了 L ai提出的规则改变开关状态的方法 ,并把这种算法推广到了多源点多播的情况 .提出了一种新的多源点多播路由算法 ,此算法也同样适用于这一类相互拓扑等价的多级互联网 ,包括 Baseline、Om ega、Cube等 .在此算法中 ,每个数据流被划分为固定大小的数据包 ,在网络中独立传送 ,网络中的每个开关的状态按照固定的状态每个时步规则变化 ,每个数据包两次经过网络后到达目标结点 .此类的网络的硬件代价为 O(Nlog N ) ,此多源点多播路由算法的时间复杂度为 O(N )  相似文献   

10.
There is a growing demand for network support for group applications, in which messages from one or more sender(s) are delivered to a large number of receivers. Here, we propose a network architecture for supporting a fundamental type of group communication, conferencing. A conference refers to a group of members in a network who communicate with each other within the group. We consider adopting a class of multistage networks, such as a baseline, an omega, or an indirect binary cube network, composed of switch modules with fan-in and fan-out capability for a conference network which supports multiple disjoint conferences. The key issue in designing a conference network is to determine the multiplicity of routing conflicts, which is the maximum number of conflict parties competing a single interstage link when multiple disjoint conferences simultaneously present in the network. Our results show that, for a network of size n /spl times/ n, the multiplicities of routing conflicts are small constants (between 2 and 4) for an omega network or an indirect binary cube network; while it can be as large as /spl radic/n/q + 1 for a baseline network, where q is the minimum allowable conference size. Thus, our design for conference networks is based on an omega network or an indirect binary cube network. We also develop fast self-routing algorithms for setting up routing paths in the newly designed conference networks. As can be seen, such an n /spl times/ n conference network has O(logn) routing time and communication delay and O(nlogn) hardware cost. The conference networks are superior to existing designs in terms of routing complexity, communication delay and hardware cost. The conference network proposed is rearrangeably nonblocking in general, and is strictly nonblocking under some conference service policy. It can be used in applications that require efficient or real-time group communication.  相似文献   

11.
This paper presents bitonic sorting schemes for special-purpose parallel architectures such as sorting networks and for general-purpose parallel architectures such as SIMD and/or MIMD computers. First, bitonic sorting algorithms for shared-memory SIMD and/or MIMD computers are developed. Shared-memory accesses through the interconnection network of shared memory SIMD and/or MIMD computers can be very time consuming. A scheme is introduced which reduces the number of such accesses. This scheme is based on the parity strategy which is the main idea of the paper. By reducing the communication through the network, a performance improvement is achieved. Second, a recirculating bitonic sorting network is presented, which is composed of one level of N/2 comparators plus an Ω-network of (log N-1) switch levels. This network reduces the cost complexity to O(N log N) compared with the O(N log2 N) of the original bitonic sorting network, while preserving the same time complexity. Finally, a simplified multistage bitonic sorting network, is presented. For simplifying the interlevel wiring, the parity strategy is used, so N/2 keys are wired straight through the network  相似文献   

12.
This paper presents new efficient shortest path algorithms to solve single origin shortest path problems (SOSP problems) and multiple origins shortest path problems (MOSP problems) for hierarchically clustered data networks. To solve an SOSP problem for a network with n nodes, the distributed version of our algorithm reaches the time complexity of O(log(n)), which is less than the time complexity of O(log 2 (n)) achieved by the best existing algorithm. To solve an MOSP problem, our algorithm minimizes the needed computation resources, including computation processors and communication links for the computation of each shortest path so that we can achieve massive parallelization. The time complexity of our algorithm for an MOSP problem is O(m log(n)), which is much less than the time complexity of O(M log2 (0)) of the best previous algorithm. Here, M is the number of the shortest paths to be computed and m is a positive number related to the network topology and the distribution of the nodes incurring communications, m is usually much smaller than M. Our experiment shows that m is almost a constant when the network size increases. Accordingly, our algorithm is significantly faster than the best previous algorithms to solve MOSP problems for large data networks  相似文献   

13.
Permutation networks have been used in the literature to model interprocessor and processor-memory interconnections in parallel computers. This paper introduces new permutation network designs and generalizes the notion of a permutation network to provide a more flexible model of such interconnections. The new designs are based concentrators and superconcentrators, and for n inputs they can be optimized to obtain self-routing permutation networks with O(n lg n) cost, O(lg n) depth, and O(lg2n) routing time. The main feature of these new network designs is that they do not require complex routing schemes such as Clos networks since they are inherently self-routing. Generalizations of these designs are also given to obtain permutation networks in which the numbers of inputs and outputs may be different, and/or the maximum number of parallel routes between inputs and outputs can be less than the number of inputs or outputs, or both. For n inputs, αn outputs, and O(nϵ) parallel routes, where 0 < α ≤ 1, 0 < ϵ < 1, these generalized designs can be optimized to have permutation networks with O(n) cost, O(lg n), depth, and O(lg2n) routing time. It is shown that the previously known designs, such as Clos networks, result in inferior realizations when compared with these new designs.  相似文献   

14.
Recurrence formulations for various problems, such as finding an optimal order of matrix multiplication, finding an optimal binary search tree, and optimal triangulation of polygons, assume a similar form. A. Gibbons and W. Rytter (1988) gave a CREW PRAM algorithm to solve such dynamic programming problems. The algorithm uses O(n6/log n) processors and runs in O(log2 n) time. In this article, a modified algorithm is presented that reduces the processor requirement to O(n6/log 5n) while maintaining the same time complexity of O(log2 n)  相似文献   

15.
In (2n−1)-stage rearrangeable networks, the routing time for any arbitrary permutation is Ω(n2) compared to its propagation delay O(n) only. Here, we attempt to identify the sets of permutations, which are routable in O(n) time in these networks. We define four classes of self-routable permutations for Benes network. An O(n) algorithm is presented here, that identifies if any permutation P belongs to one of the proposed self-routable classes, and if yes, it also generates the necessary control vectors for routing P. Therefore, the identification, as well as the switch setting, both problems are resolved in O(n) time by this algorithm. It covers all the permutations that are self-routable by anyone of the proposed techniques. Some interesting relationships are also explored among these four classes of permutations, by applying the concept of ‘group-transformations’ [N. Das, B.B. Bhattacharya, J. Dattagupta, Hierarchical classification of permutation classes in multistage interconnection networks, IEEE Trans. Comput. (1993) 665–677] on these permutations. The concepts developed here for Benes network, can easily be extended to a class of (2n−1)-stage networks, which are topologically equivalent to Benes network. As a result, the set of permutations routable in a (2n−1)-stage rearrangeable network, in a time comparable to its propagation delay has been extended to a large extent.  相似文献   

16.
We describe algorithms for constructing optimal binary search trees, in which the access cost of a key depends on the k preceding keys which were reached in the path to it. This problem has applications to searching on secondary memory and robotics. Two kinds of optimal trees are considered, namely optimal worst case trees and weighted average case trees. The time and space complexities of both algorithms are O(nk+2) and O(nk+1), respectively. The algorithms are based on a convenient decomposition and characterizations of sequences of keys which are paths of special kinds in binary search trees. Finally, using generating functions, we present an exact analysis of the number of steps performed by the algorithms.  相似文献   

17.
文中提出了一种新的多路归并排序网络,该网络基于倾斜与振荡多路归并排序算法.该网络有两个主要特点.一是其基本构件为k-sorters,即k个数的排序器,k为任意素数,而传统的排序网络的基本构件为两个数的排序,即2-sorters.二是该网络的延迟可以小于传统的基于2-sorters的Batcher排序网络.文中给出了该排序网络的具体实现;作为实例给出了N=27,k=3时的排序网络;分析了该网络的时间延迟;通过具体设计排序网络的基本构件2-sorters和3-sorters,表明这种新的多路归并排序网络和Batcher排序网络相比是一种高速的排序网络.  相似文献   

18.
The main contribution of this work is to present elegant broadcast-efficient protocols for permutation routing, ranking, and sorting on single-hop Mobile Radio Networks with p stations and k radio channels, denoted by MRN(p,k). Clearly, any protocol performing these tasks on n items must perform n/k broadcast rounds because each item must be broadcast at least once. We begin by presenting an optimal off-line permutation routing protocol using n /k broadcast rounds for arbitrary k, p, and n. Further, we show that optimal on-line routing can be performed in n/ k broadcast rounds, provided that either k=1 or p=n. We then go on to develop an online routing protocol that takes 2n/ k+k-1 broadcast rounds on the MRN(p,k), whenever k⩽√p/2. Using these routing protocols as basic building blocks, we develop a ranking protocol that takes 2n /k+o(n/k) broadcast rounds as well as a sorting protocol that takes 3n/k+o(n/k) broadcast rounds, provided that k ϵ o(√n) and p=n. Finally, we develop a ranking protocol that takes 3n/k+o(n/ k) broadcast rounds, as well as a sorting protocol that takes 4n/k+o(n/k) broadcast rounds on the MRN(p,k), provided that k⩽√p/2 and p ϵ o(n). Featuring very low proportionality constants, our protocols offer a vast improvement over the state of the art  相似文献   

19.
Permutation is a frequently-used communication pattern in parallel and distributed computing systems and telecommunication networks. Node-disjoint routing has important applications in guided wave optical interconnects where the optical "crosstalk" between messages passing the same switch should be avoided. In this paper, we consider routing arbitrary permutations on an optical baseline network (or reverse baseline network) with node-disjoint paths. We first prove the equivalence between the set of admissible permutations (or semipermutations) of a baseline network and that of its reverse network based on a step-by-step permutation routing. We then show that an arbitrary permutation can be realized in a baseline network (or a reverse baseline network) with node-disjoint paths in four passes, which beats the existing results [M. Vaez et al., (2000)], [G. Maier et al., (2001)] that a permutation can be realized in an n /spl times/ n banyan network with node-disjoint paths in O(n/sup 1/2/) passes. This represents the currently best-known result for the number of passes required for routing an arbitrary permutation with node-disjoint paths in unique-path multistage networks. Unlike other unique path MINs (such as omega networks or banyan networks), only baseline networks have been found to possess such four-pass routing property. We present routing algorithms in both self-routing style and central-controlled style. Different from the recent work in [Y. Yang et al., (2003)], which also gave a four-pass node-disjoint routing algorithm for permutations, the new algorithm is efficient in transmission time for messages of any length, while the algorithm in [Y. Yang et al., (2003)] can work efficiently only for long messages. Comparisons with previous results demonstrate that routing in a baseline network proposed in this paper could be a better choice for routing permutations due to its lowest hardware cost and near-optimal transmission time.  相似文献   

20.
This paper presents several deterministic algorithms for selecting the k th largest record from a set of n records on any n -node hypercubic network. All of the algorithms are based on the selection algorithm of Cole and Yap, as well as on various sorting algorithms for hypercubic networks. Our fastest algorithm runs in O( lg n lg * n) time, very nearly matching the trivial lower bound. Previously, the best upper bound known for selection was O( lg n lg lg n) . A key subroutine in our O( lg n lg * n) time selection algorithm is a sparse version of the Sharesort algorithm that sorts n records using p processors, , in O( lg n ( lg lg p - lg lg (p/n) ) 2 ) time. Received March 23, 1994; revised October 30, 1997.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号