期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

《Journal of Systems Architecture》2000,46(6):529-542

In (2n−1)-stage rearrangeable networks, the routing time for any arbitrary permutation is Ω(n²) compared to its propagation delay O(n) only. Here, we attempt to identify the sets of permutations, which are routable in O(n) time in these networks. We define four classes of self-routable permutations for Benes network. An O(n) algorithm is presented here, that identifies if any permutation P belongs to one of the proposed self-routable classes, and if yes, it also generates the necessary control vectors for routing P. Therefore, the identification, as well as the switch setting, both problems are resolved in O(n) time by this algorithm. It covers all the permutations that are self-routable by anyone of the proposed techniques. Some interesting relationships are also explored among these four classes of permutations, by applying the concept of ‘group-transformations’ [N. Das, B.B. Bhattacharya, J. Dattagupta, Hierarchical classification of permutation classes in multistage interconnection networks, IEEE Trans. Comput. (1993) 665–677] on these permutations. The concepts developed here for Benes network, can easily be extended to a class of (2n−1)-stage networks, which are topologically equivalent to Benes network. As a result, the set of permutations routable in a (2n−1)-stage rearrangeable network, in a time comparable to its propagation delay has been extended to a large extent. 相似文献

2.

The deflection self-routing Delta network: a dynamically fault-tolerant high-radix multistage interconnection network

Jae-Hyun Park 《The Journal of supercomputing》2011,55(3):432-447

High-radix multistage interconnection networks are popular interconnection technologies for parallel supercomputers and cluster computers. In this paper, we presented a new dynamically fault-tolerant high-radix multistage interconnection network using a fully-adaptive self-routing. To devise the fully-adaptive self-routing for recovering the misrouting around link faults in such network, we introduce an abstract algebraic analysis of the topological structure of the high-radix Delta network. The presented interconnection network provides multiple paths by using all the links of all the stages of the network. We also present a mathematical analysis of the reliability of the interconnection network for quantitative comparison against other networks. The MTTF of 64×64 network proposed is 2.2 times greater than that of the cyclic Banyan network. The hardware cost of the proposed network is half that of the cyclic Banyan network and the 2D ring-Banyan network. 相似文献

3.

Randomized self-routing algorithms for Clos networks

Abdou Youssef 《Computers & Electrical Engineering》1993,19(6):419-429

As the VLSI technology makes large crossbar switches affordable, Clos networks have become a feasible option of large interconnection networks. However, to make these networks practical and useful, efficient routing algorithms need to be developed. This paper will develop and study several randomized routing algorithms for Clos networks. The algorithms are based on the idea that if the first column of Clos is set to some configuration somehow, then the resulting network becomes self-routed using the destination addresses. Each of the randomized algorithms sets the first column to a configuration selected by a random process. The algorithms are then self-routed and take no computation time to set the switches. Probabilistic analysis and simulation measurements of the communication delay of permutation routing are conducted. It is shown that the communication delay of any permutation is 3–6 cycles in networks of up to 1024 processors. Although other routing algorithms route arbitrary permutations in one cycle over Clos/Benes networks and 2 cycles over δ networks, these algorithms take prohibitively large times to compute the appropriate switch settings, while our randomized algorithms are self-routed and spend NO time on computing the switch settings. This makes our algorithms superior to any universal nonrandomized routing algorithm for Clos/Benes networks or δ networks. The speed, universality and ease of implementation of our randomized algorithms make Clos networks highly attractive for large parallel computer systems. 相似文献

4.

A three-stage construction for multiconnection networks

Christian Ronse 《Acta Informatica》1983,20(2):197-206

Summary A multiconnection network of size N is a switching network with N inputs and N outputs which realizes multiconnections, i.e., connections between the N inputs and N outputs in such a way that every output is connected to exactly one input, but an input can be connected to an arbitrary number of outputs. That network is complete if it can realize all N ^N multiconnections. This structure generalizes the permutation network. We consider here the design of multiconnection networks by a three-stage Clos network using complete substitution networks as its building cells and we show that the resulting multiconnection network is complete if and only if the cells in the middle stage have size 2. Moreover, we describe the control algorithm for such a network. This leads to the design of cellular multiconnection networks of arbitrary size with a relatively simple control algorithm. 相似文献

5.

Limited width parallel prefix circuits

David A. Carlson Binay Sugla 《The Journal of supercomputing》1990,4(2):107-129

In this paper, we present lower and upper bounds on the size of limited width, bounded and unbounded fan-out parallel prefix circuits. The lower bounds on the sizes of such circuits are a function of the depth, width, and number of inputs. The size requirement of an N input bounded fan-out parallel prefix circuit having limited width W and extra depth k (the difference between allowed and minimum possible depth) is shown to be (N log₂ W/2^k + N) for k log₂ W. This implies that insisting on minimum depth causes the circuit size to be nonlinear, while as little as log₂log₂ W of extra depth can possibly reduce the size to linear. Also, we show that there is a clear difference between the two cases of bounded and unbounded fan-out by proving the size of a limited width, unbounded fan-out parallel prefix circuit lies between a lower bound of ((2 + 2^1–k/3)N) and an upper bound of O((2 + 2^1–k)N).Uniform, systolic constructions of limited width parallel prefix circuits are provided here and shown to be asymptotically optimal. By associating the width of the circuit with the number of processors and the fan-out capabilities of the circuit with the interconnection structure of a multiprocessor, time- and processor-efficient algorithms may be developed. 相似文献

6.

Metacube—a versatile family of interconnection networks for extremely large-scale supercomputers

Yamin Li Shietung Peng Wanming Chu 《The Journal of supercomputing》2010,53(2):329-351

In the next decade, the high-performance supercomputers will consist of several millions of CPUs. The interconnection networks in such supercomputers play an important role for achieving high performance. In this paper, we introduce the Metacube (MC), a versatile family of interconnection network that can connect an extremely large number of nodes with a small number of links per node and keep the diameter rather low. An MC network has a 2-level hypercube structure. An MC(k,m) network can connect 2^{2^km+k}2^{2^{k}m+k} nodes with m+k links per node, where k is the dimension of the high-level hypercubes (classes) and m is the dimension of the low-level hypercubes (clusters). An MC is a symmetric network with short diameter, easy and efficient routing and broadcasting similar to that of the hypercube. However, the MC network can connect millions of nodes with up to 6 links per node. An MC(2,3) with 5 links per node has 16,384 nodes and an MC(3,3) with 6 links per node has 134,217,728 nodes. We describe the MC network’s structure, topological properties, routing and broadcasting algorithms, and the Hamiltonian cycle embedding in the Metacube networks. 相似文献

7.

The Cayley network of expanded Pancake graphs

《国际计算机数学杂志》2012,89(11):1371-1378

Signed permutation group has important applications in genome rearrangement as well as the construction of networks. In this paper, we propose a new interconnection network named extended Pancake graph, we investigate its topological properties, and give a routing algorithm with the diameter upper bounded by 2n?1. Some embedding properties are also derived. In conclusion, we present a comparison of some familiar networks with the Cayley graph EP_n. 相似文献

8.

A Study of Permutation Networks: New Designs and Some Generalizations

《Journal of Parallel and Distributed Computing》1994,22(2):359-366

Permutation networks have been used in the literature to model interprocessor and processor-memory interconnections in parallel computers. This paper introduces new permutation network designs and generalizes the notion of a permutation network to provide a more flexible model of such interconnections. The new designs are based concentrators and superconcentrators, and for n inputs they can be optimized to obtain self-routing permutation networks with O(n lg n) cost, O(lg n) depth, and O(lg²n) routing time. The main feature of these new network designs is that they do not require complex routing schemes such as Clos networks since they are inherently self-routing. Generalizations of these designs are also given to obtain permutation networks in which the numbers of inputs and outputs may be different, and/or the maximum number of parallel routes between inputs and outputs can be less than the number of inputs or outputs, or both. For n inputs, αn outputs, and O(n^ϵ) parallel routes, where 0 < α ≤ 1, 0 < ϵ < 1, these generalized designs can be optimized to have permutation networks with O(n) cost, O(lg n), depth, and O(lg²n) routing time. It is shown that the previously known designs, such as Clos networks, result in inferior realizations when compared with these new designs. 相似文献

9.

Optimization of a subclass of conjunctive queries

Joachim Biskup Pratul Dublish Yehoshua Sagiv 《Acta Informatica》1995,32(1):1-26

The optimization problem for a subclass of conjunctive queries which is formed by the union of the class of fan-out free queries and a subclass of typed fan-out queries is investigated. The typed fan-out queries in this class are obtained from simple tableaux by allowing atmost one attribute to violate the simple-tablau property. The optimization problem for several restricted subsets of typed fan-out queries is already known to be NP-hard. It is shown that the queries under consideration possess several useful properties which are then used to obtain an O(n ²) optimization algorithm based on the implication graph technique. The optimization of typed fan-out queries, obtained from simple tableaux by allowing atmost two attributes to violate the simple tableau property, is shown to be NP-hard. The optimization of simple tableaux in the presence of functional dependencies is also investigated and is shown to be NP-hard.Supported by DFG grant Bi 311/1-2; present address: School of Computer & Systems Sciences, Jawaharlal Nehru University, New Delhi 110067, India.Supported by NSF grant IRI-87-22886, AFOSR grant 88-0266 and a grant from IBM Corp. 相似文献

10.

The design of complete encryption networks using cryptographically equivalent permutations

F. Ayoub 《Computers & Security》1983,2(3):261-267

A design procedure for complete substitution-permutation encryption networks is presented. The cryptographically important property of completeness is achieved after three iterations, the minimum possible number for all networks of size N = i · n² (i = 1,2,…,n), where n is the size of the substitution function used. The permutation stage is constructed by choosing a single member of a class of cryptographically equivalent permutations for all the network rounds, hence having the advantage of simplifying the network implementation. An algorithm for generating members of the class of cryptographically equivalent permutations is given. 相似文献

11.

On the Computational Power of Max-Min Propagation Neural Networks

Estévez Pablo A. Okabe Yoichi 《Neural Processing Letters》2004,19(1):11-23

We investigate the computational power of max-min propagation (MMP) neural networks, composed of neurons with maximum (Max) or minimum (Min) activation functions, applied over the weighted sums of inputs. The main results presented are that a single-layer MMP network can represent exactly any pseudo-Boolean function F:{0,1}ⁿ [0,1], and that two-layer MMP neural networks are universal approximators. In addition, it is shown that several well-known fuzzy min-max (FMM) neural networks, such as Simpson's FMM, are representable by MMP neural networks. 相似文献

12.

A unified framework for off-line permutation routing in parallel networks

Marc Baumslag Fred Annexstein 《Theory of Computing Systems》1991,24(1):233-251

In this paper we present a general strategy for finding efficient permutation routes in parallel networks. Among the popular parallel networks to which the strategy applies are mesh networks, hypercube networks, hypercube-derivative networks, ring networks, and star networks. The routes produced are generally congestion-free and take a number of routing steps that is within a small constant factor of the diameter of the network. Our basic strategy is derived from an algorithm that finds (in polynomial time) efficient permutation routes for aproduct network, G×H, given efficient permutation routes forG andH. We investigate the use of this algorithm for routingmultiple permutations and extend its applicability to a wide class of graphs, including several families ofCayley graphs. Finally, we show that our approach can be used to find efficient permutation routes among the remaining live nodes infaulty networks.This research was supported in part by a grant from the NSF, Grant No. CCR-88-12567. 相似文献

13.

On the maximal connected component of hypercube with faulty vertices

《国际计算机数学杂志》2012,89(5):515-525

In evaluating an interconnection network, it is indispensable to estimate the size of the maximal connected components of the underlying graph when the network begins to lose processors. Hypercube is one of the most popular interconnection networks. This article addresses the maximal connected components of an n-dimensional cube with faulty processors. We first prove that an n-cube with a set F of at most 2n???3 failing processors has a component of size ≥2ⁿ???|F|???1. We then prove that an n-cube with a set F of at most 3n???6 missing processors has a component of size ≥2ⁿ???|F|???2. 相似文献

14.

Hierarchical Hypercube Networks (HHN) for Massively Parallel Computers

Sang-Kyun Yun Kyu Ho Park 《Journal of Parallel and Distributed Computing》1996,37(2):194

The hypercube is one of the most widely used topologies because it provides small diameter and embedding of various interconnection networks. For very large systems, however, the number of links needed with the hypercube may become prohibitively large. In this paper, we propose a hierarchical interconnection network based on hypercubes called hierarchical hypercube network (HHN) for massively parallel computers. The HHN has a smaller number of links than the comparable hypercube and in particular, when we construct networks with 2^Knodes, the node degree of HHN with the minimum node degree isO([formula]) while that of hypercube isO(K). Regardless of its smaller node degree, many parallel algorithms can be executed in HHN with the same time complexity as in the hypercube. 相似文献

15.

A new approach to model virtual channels in interconnection networks

N. Alzeidi A. Khonsari 《Journal of Computer and System Sciences》2007,73(8):1121-1130

Dealing with virtual channels has always been a critical issue in developing analytical performance models for interconnection networks. Almost all previous studies relied on a method proposed by Dally to capture the effect of virtual channels multiplexing in the performance of interconnection networks. This paper presents a new method to model the effect of virtual channel multiplexing in high-speed wormhole-switched interconnection networks. Dally's method loses its accuracy as the traffic load increases due to blocking nature of wormhole-switched networks. Our new method is based on a finite capacity queue, M/G/1/V and comparing to Dally's method achieves a higher degree of accuracy under low, moderate and high traffic loads. Furthermore, its simplicity eases its employment under different network conditions and setup. The presented model is validated by means of an event driven simulator and a detailed comparison with Dally's method is presented. 相似文献

16.

Hypercube sandwich approach to conferencing

Joanne F. Houlahan Lenore J. Cowen Gerald M. Masson 《The Journal of supercomputing》1996,10(3):271-283

This paper presents a novel cascaded conference network that provides distributed processing and signal transmission among members of disjoint sets of generic send/receive devices called conferees. It assumes an online request model in which idle groups of conferees may request the formation of a conference interconnection. Once a conference is established, all conferees remain connected until the entire conference is dissolved. The Hypercube Sandwich Network (HSN) consists of two components. A bidirectional permutation network is used for routing purposes to and from a hypercube of special processing elements for the purpose of conference formation. The HSN achieves strictly nonblocking performance for N conferees using O(Nlog N) processing elements, and this is shown to be tight to within a log ^1/4 N factor. Previous constructions required a quadratic number of processing elements for strictly nonblocking performance or could only provide wide-sense nonblocking conferencing. If the stronger requirement is made that the communication delay is logarithmic in the conference size, a simple algorithm is presented for wide-sense nonblocking conferencing in an HSN with O(N log N) processing elements.An earlier version of this paper was presented at the 1995 International Conference on Parallel Processing Techniques and Applications. 相似文献

17.

Complexity and Approximations for Multimessage Multicasting

Teofilo F. Gonzalez 《Journal of Parallel and Distributed Computing》1998,55(2):129

We consider multimessage multicasting over thenprocessor complete (or fully connected) static network (MM_C). First we present a linear time algorithm that constructs for every degreedproblem instance a communication schedule with total communication time at mostd², wheredis the maximum number of messages that each processor may send or receive. Then we present degreedproblem instances such that all their communication schedules have total communication time at leastd². We observe that our lower bound applies when the fan-out (maximum number of processors receiving any given message) is huge, and thus the number of processors is also huge. Since this environment is not likely to arise in the near future, we turn our attention to the study of important subproblems that are likely to arise in practice. We show that when each message has fan-outk=1 theMM_Cproblem corresponds to the makespan openshop preemptive scheduling problem which can be solved in polynomial time and show that fork?2 our problem is NP-complete and remains NP-complete even when forwarding is allowed. We present an algorithm to generate a communication schedule with total communication time 2d−1 for any degreedproblem instance with fan-outk=2. Our main result is anO(q·d·e) time algorithm, wheree?nd(the input length), with an approximation bound ofqd+k^1/q(d−1), for any integerqsuch thatk>q?2. Our algorithms are centralized and require all the communication information ahead of time. Applications where all of this information is readily available include iterative algorithms for solving linear equations, and most dynamic programming procedures. The Meiko CS-2 machine and computer systems with processors communicating via dynamic permutation networks whose basic switches can act as data replicators (e.g.,nbynBenes network with 2 by 2 switches that can also act as data replicators) will also benefit from our results at the expense of doubling the number of communication phases. 相似文献

18.

STCSP — structured temporal constraint satisfaction problems

Mira Balaban Tzachi Rosen 《Annals of Mathematics and Artificial Intelligence》1999,25(1-2):35-67

Temporal Constraint Satisfaction Problems (TCSP) is a well-known approach for representing and processing temporal knowledge. Important properties of the knowledge can be inferred by computing the minimal networks of TCSPs. Consistency and feasible values are immediately obtained; computing solutions can be assisted. Yet, in general, computing the minimal network of a disjunctive TCSP is intractable. The minimal network approach requires computation of the full network in order to answer a query. In this paper we characterize TCSPs for which subsets of the minimal network can be computed without having to compute the whole network. The partial computation is enabled by decomposition of the problem into a tree of sub-problems that share at most pairs of time points. Such decompositions are termed sim/2-tree decompositions. For TCSPs that have sim/2-tree decompositions, minimal constraints of input propositions can be computed by independent computations of the minimal networks of the sub-problems at most twice. It is also shown that the sim/2-tree characterization is a minimal set of conditions. The sim/2-tree decomposition extends former results about decomposition of a TCSP into bi-connected components. An algorithm for identifying a sim/2-tree decomposition of a TCSP is provided as well. Finally, the sim/2-tree decomposition is generalized in an inductive manner, which enables components of a decomposition to be further decomposed. For that purpose a model of Structured Temporal Constraint Satisfaction Problems (STCSP⁽ⁿ⁾, 0 ⩽ n), where STCSP⁽⁰⁾ is simply TCSP, STCSP⁽¹⁾ is a set of STCSP⁽⁰⁾s, and in general, STCSP⁽ⁿ⁾ for 1 ⩽ n is a set of STCSP^{(n − 1)}s, is introduced. This revised version was published online in June 2006 with corrections to the Cover Date. 相似文献

19.

Scheduling a Single Server in a Two-machine Flow Shop 总被引：1，自引：0，他引：1

T. C. E. Cheng M. Y. Kovalyov 《Computing》2003,70(2):167-180

We study the problem of scheduling a single server that processes n jobs in a two-machine flow shop environment. A machine dependent setup time is needed whenever the server switches from one machine to the other. The problem with a given job sequence is shown to be reducible to a single machine batching problem. This result enables several cases of the server scheduling problem to be solved in O(n log n) by known algorithms, namely, finding a schedule feasible with respect to a given set of deadlines, minimizing the maximum lateness and, if the job processing times are agreeable, minimizing the total completion time. Minimizing the total weighted completion time is shown to be NP-hard in the strong sense. Two pseudopolynomial dynamic programming algorithms are presented for minimizing the weighted number of late jobs. Minimizing the number of late jobs is proved to be NP-hard even if setup times are equal and there are two distinct due dates. This problem is solved in O(n ³) time when all job processing times on the first machine are equal, and it is solved in O(n ⁴) time when all processing times on the second machine are equal. Received November 20, 2001; revised October 18, 2002 Published online: January 16, 2003 相似文献

20.

Universality of iterated networks

R. F. Chamberlain C. M. Fiduccia 《Theory of Computing Systems》1994,27(5):381-430

The multistage shuffle-exchange network is generalized, by replacing the perfect shuffle with an arbitrary permutation , in order to pass all permutations using aminimal number of stages. The universality of such a network is shown to be equivalent to theprimitivity of a related regular digraph, thus showing that universality is decidable in polynomial time. Extensive computational results are presented that characterizeall nonisomorphic minimal networks with up to 12 inputs. In addition, strong evidence is presented relating the minimal number of network stages needed to achieve universality totwice the index of primitivity of the corresponding digraph. 相似文献