
1.

A fast parallel algorithm for routing unicast assignments in Benes networks

Ching-Yi Lee Oruc A.Y. 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(3):329-334

This paper presents a new parallel algorithm for routing unicast (one-to-one) assignments in Benes networks. Parallel routing algorithms for such networks were reported earlier, but these algorithms were designed primarily to route permutation assignments. The routing algorithm presented in this paper removes this restriction without an increase in the order of routing cost or routing time. We realize this new routing algorithm on two different topologies. The algorithm routes a unicast assignment involving O(k) pairs of inputs and outputs in O(lg² k+lg n) time on a completely connected network of n processors and in O(lg⁴ k+lg² k lg n) time on an extended shuffle-exchange network of n processors. Using O(n lg n) processors, the same algorithm can be pipelined to route α unicast assignments, each involving O(k) pairs of inputs and outputs, in O(lg² k+lg n+(α-1) lg k) time on a completely connected network and in O(lg⁴ k+lg² k lg n+(α-1)(lg³ k+lg k lg n)) time on the extended shuffle-exchange network. These yield an average routing time of O(lg k) in the first case, and O(lg³ k+lg k lg n) in the second case, for all α⩾lg n. These complexities indicate that the algorithm given in this paper is as fast as Nassimi and Sahni's algorithm for unicast assignments, and with pipelining, it is faster than the same algorithm at least by a factor of O(lg n) on both topologies. Furthermore, for sparse assignments, i.e., when k=O(1), it is the first algorithm which has an average routing time of O(lg n) on a topology with O(n) links.
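
The average-time claim for the completely connected network can be checked by dividing the pipelined bound by α. The short derivation below is only a sanity check; it assumes 2 ≤ k ≤ n, which the abstract does not state explicitly.

```latex
\frac{T(\alpha)}{\alpha}
  = O\!\left(\frac{\lg^2 k + \lg n}{\alpha} + \frac{(\alpha-1)\lg k}{\alpha}\right)
  \le O\!\left(\frac{\lg^2 k}{\lg n} + 1 + \lg k\right)
  = O(\lg k)
  \qquad \text{for } \alpha \ge \lg n,
```

where the last step uses \lg k \le \lg n, so that \lg^2 k / \lg n \le \lg k.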

2.

Randomized self-routing algorithms for Clos networks

Abdou Youssef 《Computers & Electrical Engineering》1993,19(6):419-429

As VLSI technology makes large crossbar switches affordable, Clos networks have become a feasible option for large interconnection networks. However, to make these networks practical and useful, efficient routing algorithms need to be developed. This paper develops and studies several randomized routing algorithms for Clos networks. The algorithms are based on the idea that if the first column of a Clos network is set to some configuration, then the resulting network becomes self-routed using the destination addresses. Each of the randomized algorithms sets the first column to a configuration selected by a random process. The algorithms are then self-routed and take no computation time to set the switches. Probabilistic analysis and simulation measurements of the communication delay of permutation routing are conducted. It is shown that the communication delay of any permutation is 3–6 cycles in networks of up to 1024 processors. Although other routing algorithms route arbitrary permutations in one cycle over Clos/Benes networks and 2 cycles over δ networks, these algorithms take prohibitively large times to compute the appropriate switch settings, while our randomized algorithms are self-routed and spend no time on computing the switch settings. This makes our algorithms superior to any universal nonrandomized routing algorithm for Clos/Benes networks or δ networks. The speed, universality and ease of implementation of our randomized algorithms make Clos networks highly attractive for large parallel computer systems.

3.

A new routing algorithm for multirate rearrangeable Clos networks

Hung Q. Ngo 《Theoretical computer science》2003,290(3):2157-2167

In this paper, we study the problem of finding routing algorithms on multirate rearrangeable Clos networks which use as few middle-stage switches as possible. We propose a new routing algorithm called the “grouping algorithm”. This is a simple algorithm which uses fewer middle-stage switches than all known strategies, given that the number of input-stage and output-stage switches is relatively small compared to the size of the input and output switches. In particular, the grouping algorithm implies that m = 2n+(n−1)/2^k is a sufficient number of middle-stage switches for the symmetric three-stage Clos network C(n,m,r) to be multirate rearrangeable, where k is any positive integer and r ≤ n/(2^k−1).

4.

Efficient nonblocking switching networks for interprocessor communications in multiprocessor systems

Fong-Chih Shao Yavuz Oruc A. 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(2):132-141

The performance of a multiprocessor system depends heavily on its ability to provide conflict-free paths among its processors. In this paper, we explore the possibility of using a nonblocking network with O(N log N) edges (crosspoints) to interconnect the processors of an N-processor system. We combine Bassalygo and Pinsker's implicit design of strictly nonblocking networks with an explicit construction of expanders to obtain a strictly nonblocking network with -765.18N+352.8N log N edges and 2+log(N/5) depth. We present an efficient parallel algorithm for routing connection requests on this network and implement it on three parallel processor topologies. The implementation on a parallel processor whose processing elements are interconnected as in the Bassalygo-Pinsker network requires O(N log N) processing elements and O(N log N) interprocessor links, and it takes O(log N) steps to route any single connection request, where each step involves a small number (≈72) of bit-level operations. A contracted or folded version of the same implementation reduces the processing element count to O(N) without increasing the link count or the routing time. Finally, we establish that the same algorithm takes O(log³ N) steps on a perfect shuffle processor with O(N) processing elements. These results improve the crosspoint, depth and routing time complexities of the previously reported strictly nonblocking networks.

5.

Parallel routing algorithms for nonblocking electronic and photonic switching networks

Lu E. Zheng S.Q. 《Parallel and Distributed Systems, IEEE Transactions on》2005,16(8):702-713

We study the connection capacity of a class of rearrangeable nonblocking (RNB) and strictly nonblocking (SNB) networks with/without crosstalk-free constraint, model their routing problems as weak or strong edge-colorings of bipartite graphs, and propose efficient routing algorithms for these networks using parallel processing techniques. This class of networks includes networks constructed from banyan networks by horizontal concatenation of extra stages and/or vertical stacking of multiple planes. We present a parallel algorithm that runs in O(lg² N) time for the RNB networks of complexities ranging from O(N lg N) to O(N^1.5 lg N) crosspoints and parallel algorithms that run in O(min{d* lg N, √N}) time for the SNB networks of O(N^1.5 lg N) crosspoints, using a completely connected multiprocessor system of N processing elements. Our algorithms can be translated into algorithms with an O(lg N lg lg N) slowdown factor for the class of N-processor hypercubic networks, whose structures are no more complex than a single plane in the RNB and SNB networks considered.

6.

A family of parallel prefix algorithms embedded in networks

Takesue M. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(10):1179-1184

This paper presents a family of algorithms for producing, from (υ₀, υ₁, ..., υ_{n-1}), all initial prefixes x_i=υ₀θυ₁θ···θυ_i (i=0, 1, ..., n-1) in parallel in interconnection networks such as the omega network and the hypercube, where θ is an associative binary operator. Each algorithm can be embedded in the switches and interconnections of the network, and can be executed in O((log₂ r+1) log_r n) time steps provided that the network connecting n processors is constructed by using an r×r switch, and that parallelism within as well as among individual switches is exploited. The objective of these algorithms is to attain a communication pattern that fits the topology of the network. One type of network can be made equivalent to, or can be embedded in, another type of network, so a family of algorithms can be derived from one basic algorithm. In the basic algorithm, every processor p_i upward multicasts υ_i to processors p_k (k=i+1, i+2, ..., n-1). En route to p_i, υ_j (j=0, 1, ..., i-1) are combined in the switches to produce the (i-1)th initial prefix x_{i-1} that is received by p_i, which can then compute the ith initial prefix x_i=x_{i-1}θυ_i.
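
To make the notion of "all initial prefixes" concrete, here is a minimal sketch of a logarithmic-round inclusive scan in Python. It is a generic doubling scan simulated sequentially, not the network-embedded algorithm of the paper; the function name `prefix_scan` is an assumption chosen for illustration.

```python
from operator import add
from typing import Callable, List, TypeVar

T = TypeVar("T")


def prefix_scan(values: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Inclusive scan: result[i] = values[0] op values[1] op ... op values[i].

    `op` must be associative; it does not need to be commutative.
    """
    x = list(values)
    n = len(x)
    step = 1
    while step < n:
        # One synchronous round: position i >= step combines with position i - step.
        # All reads use the previous round's list, as parallel processors would.
        x = [x[i] if i < step else op(x[i - step], x[i]) for i in range(n)]
        step *= 2
    return x


print(prefix_scan([1, 2, 3, 4, 5], add))
```

The loop runs about log₂ n rounds, and keeping the earlier prefix as the left operand preserves the order of operands for non-commutative operators such as string concatenation.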

7.

A routing algorithm for optical switching networks


刘勇军 杨俊波 《计算机工程》2008,34(8):123-125

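For a class of symmetric, rearrangeable, multistage optical switching networks, an efficient routing algorithm is proposed. The terminal numbers of the input and output signals are mapped onto the center stage of the network, which yields two exchange groups. Exchanging the corresponding terminal numbers within these groups arranges the terminal numbers mapped to the inputs and outputs of the center stage, and this determines the switch states of the center-stage nodes. Similar operations are then carried out in both directions simultaneously, so that the switch states of the nodes are determined stage by stage. The algorithm is fast: the routing is determined in O(N) steps. It handles routing in symmetric optical switching networks effectively and has some application value for implementing all-optical switching and sorting with such networks.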

8.

Efficient routing and sorting schemes for de Bruijn networks

Hsu D.F. Wei D.S.L. 《Parallel and Distributed Systems, IEEE Transactions on》1997,8(11):1157-1170

We consider the problems of routing and sorting on a de Bruijn network. First, we show that any deterministic oblivious routing scheme for permutation routing on a d-ary de Bruijn network with N=dⁿ nodes, in the worst case, will take Ω(√N) steps under the single-port model. This improves the existing lower bounds provided d is not a constant. We also show that the lower bound is indeed a tight one. Second, we present a deterministic nonoblivious permutation routing algorithm which runs in O(d·n²) time on a d-ary de Bruijn network with N=dⁿ nodes. This algorithm is currently the fastest known nonoblivious deterministic routing algorithm for de Bruijn networks of arbitrary degree. Finally, we present an efficient general sorting algorithm for the de Bruijn networks of arbitrary degree. This algorithm is the best sorting algorithm known so far. It runs in O((log d)·d·n²) time for a directed de Bruijn network with dⁿ nodes, degree d, and diameter n. As a corollary, we show that on a binary de Bruijn network of N nodes, our sorting scheme requires at most 2 log² N steps.
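
For readers unfamiliar with de Bruijn networks, the sketch below shows the standard shift-based route between two nodes of a d-ary de Bruijn graph, where node (a₁, ..., aₙ) has an edge to (a₂, ..., aₙ, x) for every digit x. It is an example of an oblivious scheme (the path depends only on source and destination) and is not the nonoblivious algorithm of the paper; the name `de_bruijn_route` is hypothetical.

```python
from typing import List, Tuple

Node = Tuple[int, ...]  # n digits, each in range(d)


def de_bruijn_route(src: Node, dst: Node) -> List[Node]:
    """Return the shift-based path from src to dst (at most n hops)."""
    n = len(src)
    assert len(dst) == n, "source and destination must have the same length"
    # Longest suffix of src that is also a prefix of dst: those digits
    # are already in place and do not need to be shifted in again.
    overlap = 0
    for length in range(n, 0, -1):
        if src[n - length:] == dst[:length]:
            overlap = length
            break
    path = [src]
    current = src
    for digit in dst[overlap:]:
        current = current[1:] + (digit,)  # follow one de Bruijn edge
        path.append(current)
    return path


print(de_bruijn_route((0, 1, 1), (1, 1, 0)))
```

Each hop drops the leading digit and appends one digit of the destination, so any route needs at most n hops, matching the diameter n mentioned in the abstract.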

9.

Parallel parsing algorithms for static dictionary compression

Nagumo H. Mi Lu Watson K.L. 《Parallel and Distributed Systems, IEEE Transactions on》1999,10(12):1241-1251

Data compression based on dictionary techniques works by replacing phrases in the input string with indexes into some dictionary. The dictionary can be static or dynamic. In static dictionary compression, the dictionary contains a predetermined fixed set of entries. In dynamic dictionary compression, the dictionary changes its entries during compression. We present parallel algorithms for two parsing strategies for static dictionary compression. One is the optimal parsing strategy with dictionaries that have the prefix property, for which our algorithm requires O(L+log n) time and O(n) processors, where n is the number of symbols in the input string, and L is the maximum length of the dictionary entries, while previous results run in O(L+log n) time using O(n²) processors or in O(L+log² n) time using O(n) processors. The other is the longest fragment first (LFF) parsing strategy, for which our algorithm requires O(L+log n) time and O(n log L) processors, while a previous result obtained an O(L log n) time performance on O(n/log n) processors. For both strategies, we derive our parallel algorithms by modifying the on-line algorithms using a pointer doubling technique.
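
As a point of reference for what "optimal parsing" computes, here is a minimal sequential sketch in Python: a dynamic program that splits the input into the fewest dictionary phrases. It is a plain O(nL) baseline, not the parallel pointer-doubling algorithm of the paper, and the function name `optimal_parse` and the sample dictionary are assumptions made for illustration.

```python
from typing import List, Optional, Set


def optimal_parse(text: str, dictionary: Set[str]) -> Optional[List[str]]:
    """Split text into the fewest dictionary phrases, or return None if impossible."""
    n = len(text)
    max_len = max((len(w) for w in dictionary), default=0)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[i] = fewest phrases needed to cover text[i:]
    choice = [0] * (n + 1)   # choice[i] = length of the first phrase in that parse
    best[n] = 0
    for i in range(n - 1, -1, -1):
        for length in range(1, min(max_len, n - i) + 1):
            if text[i:i + length] in dictionary and best[i + length] + 1 < best[i]:
                best[i] = best[i + length] + 1
                choice[i] = length
    if best[0] == inf:
        return None
    phrases, i = [], 0
    while i < n:
        phrases.append(text[i:i + choice[i]])
        i += choice[i]
    return phrases


# Sample dictionary with the prefix property: every prefix of an entry is an entry.
sample = {"a", "b", "ab", "abb", "ba", "bab"}
print(optimal_parse("ababb", sample))
```

The inner loop looks at most L characters ahead from each position, which is where the parameter L in the stated bounds comes from.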

10.

A self-routing nonblocking multistage network based on binary routing and multiple Omega networks

张联 顾乃杰 刘刚 《计算机应用》2005,25(12):2923-2924

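A new self-routing nonblocking multistage network is proposed that can transmit arbitrary multicast signals between its inputs and outputs without blocking. The network is built recursively and is based on the concept of binary splitting. It is recursively constructed from one binary splitting network and two half-size multicast routing networks. A multicast signal is copied by the first Omega network and split in two toward the output ports; it enters an N×N Omega×Omega⁻¹ network, then the N/2×N/2 Omega×Omega⁻¹ network that follows it, and so on. Each Omega×Omega⁻¹ network permutes its inputs to the upper half or the lower half of its outputs according to the significant bit of the destination address. The signals then enter the upper and lower Omega×Omega⁻¹ subnetworks for the same processing, and so on until all significant address bits have been processed, which completes the self-routing nonblocking multicast transmission. Because the Omega×Omega⁻¹ networks of different sizes can all be set up and routed in parallel, the setup time of this new multi-Omega network is O(NlogN), the routing time is O(log2N), and the hardware cost is O(Nlog2N). Its cost is better than that of currently known multicast network designs.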

11.

A fault-tolerant Clos network and its routing algorithm


段新明 杨愚鲁 《计算机工程》2007,33(16):20-22

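A new fault-tolerant Clos network is proposed. By adding redundant switching elements to each stage of the base Clos network, it continues to work correctly when a small number of faults occur, and so provides more reliable service. A fault-tolerant routing algorithm is given for this network. The algorithm computes the connection specification matrix of the Clos network column by column using a minimum-distribution-priority strategy and achieves fully nonblocking routing through rearrangement. Its worst-case time complexity is only O(N^(3/2)). The network and algorithm design can be used to build more reliable Clos networks.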

12.

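Parallel algorithms for approximate string matching on the PRAM and LARPBS models (total citations: 15; self-citations: 1; citations by others: 15)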

钟诚 陈国良 《软件学报》2004,15(2):159-169

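Approximate string matching is widely used in web information search, digital libraries, pattern recognition, text mining, IP route lookup, network intrusion detection, bioinformatics, computational music research, and other fields. On the CREW-PRAM (parallel random access machine with concurrent read and exclusive write) model, the edit-distance matrix D is computed directly by a wavefront-style parallel sweep, giving a parallel dynamic programming algorithm for approximate string matching with k differences. The algorithm uses (m+1) processors and runs in O(n) time, achieving linear speedup in theory. By computing the edit-distance matrix D with both horizontal and diagonal parallelism, a scalable parallel dynamic programming algorithm for approximate string matching with k differences is designed that uses ((m+1) processors and O(n/(+m) time. Based on a divide-and-conquer strategy, the optical bus system is dynamically reconfigured by flexibly splitting buses and merging sub-buses, and the message broadcasting capability of the optical bus together with parallel prefix-sum computation is fully exploited to compute Hamming distances in parallel. This yields two communication-efficient, scalable parallel algorithms for approximate string matching with k mismatches on the LARPBS (linear arrays with reconfigurable pipelined bus system) model: one uses n processors and O(m) time; the other is a constant-time algorithm that uses mn processors.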
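
The wavefront idea can be illustrated with a minimal sequential sketch in Python that fills the edit-distance matrix D one anti-diagonal at a time. This is the textbook k-differences dynamic program with the loop order rearranged, under the assumption that the pattern has length m and the text has length n; it is not the authors' CREW-PRAM code, and the name `k_difference_matches` is hypothetical.

```python
from typing import List


def k_difference_matches(pattern: str, text: str, k: int) -> List[int]:
    """Return every end position j (1-based) where the pattern matches a
    substring of text ending at j with at most k edit operations."""
    m, n = len(pattern), len(text)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i  # matching i pattern characters against nothing costs i
    # Row 0 stays 0 because a match may start at any text position.
    for s in range(2, m + n + 1):  # anti-diagonal: i + j == s
        # Cells on one anti-diagonal depend only on the two previous
        # anti-diagonals, so a parallel machine could compute this inner
        # loop in a single step with one processor per row.
        for i in range(max(1, s - n), min(m, s - 1) + 1):
            j = s - i
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,         # skip a pattern character
                          D[i][j - 1] + 1,         # skip a text character
                          D[i - 1][j - 1] + cost)  # match or substitute
    return [j for j in range(n + 1) if D[m][j] <= k]


print(k_difference_matches("abc", "xabcx", 1))
```

There are m+n−1 anti-diagonals in the loop, which is consistent with a linear number of parallel steps when the pattern is no longer than the text.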

13.

O(n) routing in rearrangeable networks

《Journal of Systems Architecture》2000,46(6):529-542

In (2n−1)-stage rearrangeable networks, the routing time for any arbitrary permutation is Ω(n²), compared to a propagation delay of only O(n). Here, we attempt to identify the sets of permutations which are routable in O(n) time in these networks. We define four classes of self-routable permutations for the Benes network. An O(n) algorithm is presented that identifies whether a permutation P belongs to one of the proposed self-routable classes and, if so, also generates the necessary control vectors for routing P. Therefore, both the identification and the switch-setting problems are resolved in O(n) time by this algorithm. It covers all the permutations that are self-routable by any one of the proposed techniques. Some interesting relationships are also explored among these four classes of permutations, by applying the concept of ‘group-transformations’ [N. Das, B.B. Bhattacharya, J. Dattagupta, Hierarchical classification of permutation classes in multistage interconnection networks, IEEE Trans. Comput. (1993) 665–677] on these permutations. The concepts developed here for the Benes network can easily be extended to a class of (2n−1)-stage networks which are topologically equivalent to the Benes network. As a result, the set of permutations routable in a (2n−1)-stage rearrangeable network in a time comparable to its propagation delay has been extended to a large extent.

14.

Fast sorting algorithms on a linear array with a reconfigurable pipelined bus system

Datta A. Soundaralakshmi S. Owens R. 《Parallel and Distributed Systems, IEEE Transactions on》2002,13(3):212-222

We present two fast algorithms for sorting on a linear array with a reconfigurable pipelined bus system (LARPBS), one of the recently proposed parallel architectures based on optical buses. In our first algorithm, we sort N numbers in O(log N log log N) worst-case time using N processors. In our second algorithm, we sort N numbers in O((log log N)²) worst-case time using N^(1+ε) processors, for any fixed ε such that 0 < ε < 1. Our algorithms are based on a novel deterministic sampling scheme for merging two sorted arrays of length N each in O(log log N) time on an LARPBS with N processors. To our knowledge, the previous best sorting algorithm on this architecture has a running time of O((log N)²) using N processors.

15.

Performing permutations on interconnection networks by regularly changing switch states

Wei Kuang Lai 《Parallel and Distributed Systems, IEEE Transactions on》2000,11(8):829-837

In this paper, we present an algorithm for performing permutations of messages on multistage interconnection networks. Permutations of messages are needed in many parallel algorithms. The proposed algorithm is feasible for any network that can connect each input to each output using a set of N nonblocking connections, where N is the number of ports on the network. Messages are segmented into N submessages that are sent independently in each time step. For any permutation, the settings of switches are changed with fixed patterns. Partitioning of the network into independent subnetworks is also supported, each capable of simultaneously routing a different permutation.

16.

Adaptive binary sorting schemes and associated interconnection networks

Chien M.V. Yavuz Oruc A. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(6):561-572

Many routing problems in parallel processing, such as concentration and permutation problems, can be cast as sorting problems. In this paper, we consider the problem of sorting on a new model, called an adaptive sorting network. We show that any sequence of n bits can be sorted on this model in O(lg² n) bit-level delay using O(n) constant fanin gates. This improves the cost complexity of K.E. Batcher's binary sorters (1968) by a factor of O(lg² n) while matching their sorting time. The only other network that can sort binary sequences in O(n) cost is the network version of columnsort algorithm, but this requires excessive pipelining. In addition, using binary sorters, we construct permutation networks with O(n lg n) bit-level cost and O(lg³ n) bit-level delay. These results provide the asymptotically least-cost practical concentrators and permutation networks to date. We note, of course, that the well-known AKS sorting network has O(lg n) sorting time and O(n lg n) cost, but the constants hidden in these complexities are so large that our complexities outperform those of the AKS sorting network until n becomes extremely large.

17.

Summation and routing on a partitioned optical passive stars network with large group size

Amitava Datta Subbiah Soundaralakshmi 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(12):1275-1285

In a partitioned optical passive stars (POPS) network, n=dg processors are divided into g groups of d processors each, and such a POPS network is denoted by POPS(d,g). There is an optical passive star (OPS) coupler between every pair of groups. Hence, a POPS(d,g) requires g² couplers. It is likely that, in a practical system, the number of couplers will be less than the number of processors, i.e., d>√n>g, and the number of groups will be smaller than the number of processors in a group. Hence, it is important to design fast algorithms for basic operations on such POPS networks with large group size. We present fast algorithms for data sum, prefix sum, and permutation routing on a POPS(d,g) such that d>√n>g. Our data sum and prefix sum algorithms improve upon the best known algorithms for these problems designed by Sahni (2000). Permutation routing can be solved on a POPS network by simulating a hypercube sorting algorithm. Our algorithm for permutation routing is more efficient compared to this simulated hypercube sorting algorithm.
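
The "large group size" condition can be unpacked with a short calculation. The derivation below uses only n = dg with d, g > 0 from the abstract; it shows that "fewer couplers than processors", "d exceeds √n", and "√n exceeds g" are all the same condition, d > g.

```latex
\begin{aligned}
g^2 < n = dg \;&\Longleftrightarrow\; g < d,\\
d > \sqrt{n} \;&\Longleftrightarrow\; d^2 > dg \;\Longleftrightarrow\; d > g,\\
\sqrt{n} > g \;&\Longleftrightarrow\; dg > g^2 \;\Longleftrightarrow\; d > g.
\end{aligned}
```

For example, POPS(8,2) has n = 16 processors and g² = 4 couplers, and 8 > 4 > 2.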

18.

Routing permutations on baseline networks with node-disjoint paths

Yang Y. Wang J. 《Parallel and Distributed Systems, IEEE Transactions on》2005,16(8):737-746

Permutation is a frequently used communication pattern in parallel and distributed computing systems and telecommunication networks. Node-disjoint routing has important applications in guided wave optical interconnects where the optical "crosstalk" between messages passing the same switch should be avoided. In this paper, we consider routing arbitrary permutations on an optical baseline network (or reverse baseline network) with node-disjoint paths. We first prove the equivalence between the set of admissible permutations (or semipermutations) of a baseline network and that of its reverse network based on a step-by-step permutation routing. We then show that an arbitrary permutation can be realized in a baseline network (or a reverse baseline network) with node-disjoint paths in four passes, which beats the existing results [M. Vaez et al., (2000)], [G. Maier et al., (2001)] that a permutation can be realized in an n × n banyan network with node-disjoint paths in O(n^(1/2)) passes. This represents the currently best-known result for the number of passes required for routing an arbitrary permutation with node-disjoint paths in unique-path multistage networks. Unlike other unique-path MINs (such as omega networks or banyan networks), only baseline networks have been found to possess such a four-pass routing property. We present routing algorithms in both self-routing style and central-controlled style. Different from the recent work in [Y. Yang et al., (2003)], which also gave a four-pass node-disjoint routing algorithm for permutations, the new algorithm is efficient in transmission time for messages of any length, while the algorithm in [Y. Yang et al., (2003)] can work efficiently only for long messages. Comparisons with previous results demonstrate that routing in a baseline network proposed in this paper could be a better choice for routing permutations due to its lowest hardware cost and near-optimal transmission time.

19.

An efficient parallel recognition algorithm for bipartite-permutation graphs

Chang-Wu Yu Gen-Huey Chen 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(1):3-10

We present a parallel recognition algorithm for bipartite-permutation graphs. The algorithm can be executed in O(log n) time on the CRCW PRAM if O(n³/log n) processors are used, or O(log² n) time on the CREW PRAM if O(n³/log² n) processors are used. Chen and Yesha (1993) have presented another CRCW PRAM algorithm that takes O(log² n) time if O(n³) processors are used. Compared with Chen and Yesha's algorithm, our algorithm requires either less time and fewer processors on the same machine model, or fewer processors on a weaker machine model. Our algorithm can also be applied to determine if two bipartite-permutation graphs are isomorphic.
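
One way to read the comparison is through total work, taken here as time multiplied by processor count. This is a convenience measure assumed for illustration; the abstract itself only states the separate time and processor bounds.

```latex
\begin{aligned}
\text{this paper, CRCW:}\quad & O(\log n)\cdot O\!\left(n^3/\log n\right) = O(n^3),\\
\text{this paper, CREW:}\quad & O(\log^2 n)\cdot O\!\left(n^3/\log^2 n\right) = O(n^3),\\
\text{Chen and Yesha, CRCW:}\quad & O(\log^2 n)\cdot O(n^3) = O(n^3\log^2 n).
\end{aligned}
```

Under this measure both variants of the new algorithm save a factor of log² n over the earlier CRCW algorithm.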

20.

A new self-routing multicast network

Yuanyuan Yang Jianchao Wang 《Parallel and Distributed Systems, IEEE Transactions on》1999,10(12):1299-1316

In this paper, we propose a design for a new self-routing multicast network which can realize arbitrary multicast assignments between its inputs and outputs without any blocking. The network design uses a recursive decomposition approach and is based on the binary radix sorting concept. All functional components of the network are reverse banyan networks. Specifically, the new multicast network is recursively constructed by cascading a binary splitting network and two half-size multicast networks. The binary splitting network, in turn, consists of two recursively constructed reverse banyan networks. The first reverse banyan network serves as a scatter network and the second reverse banyan network serves as a quasisorting network. The advantage of this approach is that it provides a way to self-route multicast assignments through the network and makes it possible to reuse part of the network to reduce the network cost. The new multicast network compares favorably with previously proposed multicast networks. It uses O(n log² n) logic gates, and has O(log² n) depth and O(log² n) routing time, where the unit of time is a gate delay. By reusing part of the network, the feedback implementation of the network can further reduce the network cost to O(n log n).
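
The recursive decomposition (one binary splitting stage followed by two half-size multicast networks) can be sketched functionally in Python. The code below models only what each level delivers, copying a message when its destinations fall in both halves; it does not model reverse banyan networks, scatter or quasisorting stages, or gate-level cost. The names `Request` and `route_multicast` are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Request = Tuple[str, Set[int]]  # (payload, set of destination outputs)


def route_multicast(requests: List[Request], lo: int, hi: int) -> Dict[int, str]:
    """Deliver each payload to its destinations within outputs lo..hi-1."""
    requests = [(p, dests) for p, dests in requests if dests]  # drop idle inputs
    if hi - lo == 1:
        # A valid multicast assignment sends at most one message to each output.
        assert len(requests) <= 1, "two messages target the same output"
        return {lo: requests[0][0]} if requests else {}
    mid = (lo + hi) // 2
    # Binary splitting: a message whose destinations span both halves is copied.
    upper = [(p, {x for x in dests if x < mid}) for p, dests in requests]
    lower = [(p, {x for x in dests if x >= mid}) for p, dests in requests]
    delivered = route_multicast(upper, lo, mid)       # half-size network
    delivered.update(route_multicast(lower, mid, hi))  # half-size network
    return delivered


print(route_multicast([("A", {0, 3}), ("B", {1, 2})], 0, 4))
```

Each level of the recursion inspects one more bit of the destination addresses, which is the binary radix sorting idea mentioned in the abstract.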