首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
We give an optimal algorithm that broadcasts on an n-dimensional hypercube in O(n/ log2 (n+1)) routing steps with wormhole, e-cube routing and all-port communication. Previously, the best algorithm of P.K. McKinley and C. Trefftz (1993) requires [n/2] routing steps. We also give routing algorithms that achieve tight time bounds for n ⩽7  相似文献   

2.
A new approach to broadcast in wormhole-routed two- and three-dimensional torus networks is proposed. The underlying network is assumed to support only deterministic, dimension-ordered unicast routing. The approach extends the graph theoretical concept of dominating nodes by accounting for the relative distance-insensitivity of the wormhole routing switching strategy. The proposed algorithm also takes advantage of an all-port communication architecture, which allows each node to simultaneously transmit messages on different outgoing channels. The resulting broadcast operation is based on a tree structure that uses multiple levels of extended dominating nodes(EDNs). Performance results are presented that confirm the advantage of this method over other approaches  相似文献   

3.
A unicast-based fault-tolerant multicasting method is proposed for hypercubes, which can still work well when the system contains enough faults. A multicast message may be unable to reach a destination if Hamming distance between the destination and the multicast source is large enough. A multicast message fails if any one of the destinations is unreachable from the source. An effective destination ordering scheme of the destinations is proposed for one-port systems first, it is extended to all-port systems for unicast-based fault-tolerant multicasting. Unreachable destinations from the source based on the local safety information are forwarded to a reachable destination, where the multicast message can be routed reliably. Destination ordering is completed based on Hamming distance. A multiple round p-cube routing scheme is presented for a deadlock-free fault-tolerant routing for each unicast step in hypercubes, where the same virtual channel is used for each round of p-cube routing. Sufficient simulation results are presented by comparing with the previous methods.  相似文献   

4.
Multicast is one of the most frequently used collective communication operations in multi-core SoC platforms. Bus as the traditional interconnect architecture for SoC development has been highly efficient in delivering multicast messages. Since the bus is non-scalable, it can not address the bandwidth requirements of the large SoCs. The networks on-chip (NoCs) emerged as a scalable alternative to address the increasing communication demands of such systems. However, due to its hop-to-hop communication, the NoCs may not be able to deliver multicast operations as efficiently as buses do. Adopting multi-port routers has been an approach to improve the performance of the multicast operations in interconnection networks. This paper presents a novel analytical model to compute communication latency of the multicast operation in wormhole-routed interconnection networks employing asynchronous multi-port routers scheme. The model is applied to the Quarc NoC and its validity is verified by comparing the model predictions against the results obtained from a discrete-event simulator developed using OMNET++.  相似文献   

5.
In a mesh multicomputer, submeshes are allocated to perform jobs according to processor allocation schemes, with each task assigned to occupy processors of one submesh with an appropriate size. To assign regions for incoming tasks, task compaction is needed to produce a large contiguous free region. The overhead of task compaction relies mainly on designing an efficient task migration scheme. This paper investigates task migration schemes in 2D wormhole-routed mesh multicomputers with an all-port communication model. Two constraints are given between two submeshes for task migration. Two task migration schemes that follow one of the constraints in 2D mesh multicomputers are then developed. In addition, the proposed schemes are proven to be deadlock-free and congestion-free. Finally, performance analysis is adopted to compare the proposed task migration schemes.  相似文献   

6.
A distributed routing algorithm for faulty hypercubes is described. This algorithm uses a directed depth-first approach to find a path between the sender and receiver of a message whenever at least one non-faulty path exists. We show that, when an arbitrary number of elements of the hypercube can be faulty, the algorithm always routes messages using fewer than 2N hops, whereN is the number of nodes in the hypercube. This performance is shown to be within a factor of two of the optimal worst-case routing efficiency. Through foult simulations, we show that, even when up to half of the elements in the cube are faulty, complete the analysis, we prove that our algorithm is deadlock-free. Finally, we present two extensions of the algorithm. The first uses local storage to reduce the overhead of the algorithm while the second allows reliable broadcasting in the presence of an arbitrary number of faults.Supported in part by the National Science Foundation under Grant CCR-9010547.Supported in part by the National Science Foundation Instrumentation Grant CDA-8820627.  相似文献   

7.
Networks-on-Chip (NoC) emerged to address the technological and design issues related to development of large systems-on-chip (SoCs). Due to diversity of the application's performance requirements, most NoC architectures offer supports for quality of service (QoS). Also, to utilize the available bandwidth efficiently, they might implement mechanisms for delivering collective communication operations. This paper presents an analytical model to predict the average latency of wormhole-routed prioritized broadcast communication in NoCs. The model assumes that the network uses all-port routers scheme and offers differentiated services-based QoS. To verify the analysis, the model predictions are compared against the results obtained from a discrete-event simulator developed using OMNET++.  相似文献   

8.
All-to-all communication is one of the most dense collective communication patterns and occurs in many important applications in parallel and distributed computing. In this paper, we present a new all-to-all broadcast algorithm in multidimensional all-port mesh and torus networks. We propose a broadcast pattern which ensures a balanced traffic load in all dimensions in the network so that the all-to-all broadcast algorithm can achieve a very tight near-optimal transmission time. The algorithm also takes advantage of overlapping of message switching time and transmission time, and the total communication delay asymptotically matches the lower bound of all-to-all broadcast. Finally, the algorithm is conceptually simple and symmetrical for every message and every node so that it can be easily implemented in hardware and achieves the near-optimum in practice  相似文献   

9.
All-to-all (ATA) reliable broadcast is the problem of reliably distributing information from every node to every other node in point-to-point interconnection networks. A good solution to this problem is essential for clock synchronization, distributed agreement, etc. We propose a novel solution in which the reliable broadcasts from individual nodes are interleaved in such a manner that no two packets contend for the same link at any given time-this type of method is particularly suited for systems which use virtual cut-through or wormhole routing for fast communication between nodes. Our solution, called the IHC Algorithm, can be used on a large class of regular interconnection networks including regular meshes and hypercubes. By adjusting a parameter η referred to as the interleaving distance, we can flexibly decrease the link utilization of the IHC algorithm (for normal traffic) at the expense of an increase in the time required for ATA reliable broadcast. We compare the IHC algorithm to several other possible virtual cut-through solutions and a store-and-forward solution. The IHC algorithm with the minimum value of η is shown to be optimal in minimizing the execution time of ATA reliable broadcast when used in a dedicated mode (with no other network traffic)  相似文献   

10.
A new approach to the design of collective communication operations in wormhole-routed mesh networks is described. The approach extends the concept of dominating sets in graph theory by accounting for the relative distance-insensitivity of the wormhole switching strategy and by taking advantage of a multiport communication architecture, which allows each node to simultaneously transmit messages on different outgoing channels. Collective communication operations are defined in terms of sets of extended dominating nodes (EDNs). The nodes in a set of EDNs can deliver (receive) messages to (from) a different, larger set of nodes in a single message-passing step under dimension-ordered wormhole routing and without channel contention among messages. The EDN model can be applied to different collective operations in 2D and 3D mesh networks. The authors focus on EDN-based broadcast and global combine operations. Performance evaluation results are presented that confirm the advantage of this approach over other methods  相似文献   

11.
Multinode broadcast (MNB) in a hypercube and in a ring network of processors is considered. It is assumed that the lengths of the packets that are broadcast are not fixed, but are distributed according to some probabilistic rule, and the optimal times required to execute the MNB are compared for variable and for fixed packet lengths. For large hypercubes, it is shown, under very general probabilistic assumptions on the packet lengths, that the MNB is completed in essentially the same time as when the packet lengths are fixed. In particular, the MNB is completed by time (1+δ)Ts with probability at least 1-ϵ, for any positive ϵ and δ, where T s is the optimal time required to execute the MNB when the packet lengths are fixed at their mean, provided that the size of the hypercube is large enough. In the case of the ring, it is proved that the average time required to execute a MNB when the packet lengths are exponentially distributed exceeds by a factor of ln n the corresponding time for the case there the packet lengths are fixed at their mean, where n is the number of nodes of the ring  相似文献   

12.
《Parallel Computing》1988,6(2):225-233
Fast Fourier Transforms are a widely-used and powerful tool for the analysis and solution of many problems. They have been used in such diverse areas as medicine, acoustics, image processing, system design and many other fields. By transforming the data the problem may be simpler, more tractable or more efficiently solved and for many applications (e.g. speech processing) the data may be much more easily understood in the transform domain. Therefore fast algorithms for implementing transforms are vital for any powerful computer.This paper describes the implementation of a Fast Fourier Transform on a 64-node INTEL hypercube and shows how the hypercube architecture may be efficiently used. Usually the FFT is only a part of the solution process and the data on the hypercube has to be arranged in a certain manner for the efficient solution of the whole problem. A common way is for the data to be distributed according to a Gray code, so that neighbouring points in the domain are in neighbouring processors. We present a number of results on Gray codes which characterise a certain family of Gray codes, and show that Fast Fourier Transforms on data distributed among processors according to a Gray code can also be efficiently implemented on the hypercube.  相似文献   

13.
Multicast communication, in which the same message is delivered from a source node to an arbitrary number of destination nodes, is being increasingly demanded in parallel computing. System supported multicast services can potentially offer improved performance, increased functionality, and simplified programming, and may in turn be used to support various higher-level operations for data movement and global process control. This paper presents efficient algorithms to implement multicast communication in wormhole-routed direct networks, in the absence of hardware multicast support, by exploiting the properties of the switching technology. Minimum-time multicast algorithms are presented for n-dimensional meshes and hypercubes that use deterministic, dimension-ordered routing of unicast messages. Both algorithms can deliver a multicast message to m-1 destinations in [log 2 m] message passing steps, while avoiding contention among the constituent unicast messages. Performance results of implementations on a 64-node nCUBE-2 hypercube and a 168-node Symult 2010 2-D mesh are given  相似文献   

14.
Most MPC networks use wormhole routing to reduce the effect of path length on communication time. Researchers have exploited this by designing ingenious algorithms to speed collective communication. Many projects have addressed the design of efficient collective communication algorithms for wormhole-routed systems. By exploiting the relative distance-insensitivity of wormhole routing, these new algorithms often differ fundamentally from their store-and-forward counterparts. We examine software and hardware approaches to implementing collective communication operations. Although we emphasize methods in which the underlying architecture is a direct network, such as a hypercube or mesh, as opposed to an indirect switch-based network, several approaches apply to systems of either type. We illustrate several issues arising in this research area and describe the major classes of algorithms proposed to solve these problems  相似文献   

15.
This paper addresses the problem of fault-tolerant multicast routing in wormholerouted multicomputers.A new pseudo-cycle-based routing method is presented for constructing deadlock-free multicast routing algorithms.With at most two virtual channels this technique can be applied to any connected networks with arbitrary topologies.Simulation results show that this technique results in negligible performance degradation even in the presence of a large number of faulty nodes.  相似文献   

16.
This paper presents several recursive heuristic methods for multicasting in all-port dimension-ordered wormhole-routed hypercubes. The methods described are stepwise contention-free and are primarily designed to reduce the number of communication steps required to complete the multicast operation. Comparisons show that the number of steps can be significantly reduced compared to depth contention-free solutions previously described. These methods are also shown to be source-controlled depth contention-free and can be considered a generalization of the broadcast method described previously, which is the most efficient method known  相似文献   

17.
All-to-all personalized communication, or complete exchange, is at the heart of numerous applications in parallel computing. It is one of the most dense communication patterns. In this paper, we consider this problem in a torus of any dimension with the wormhole-routing capability. We propose complete exchange algorithms that use optimal numbers of phases (if each side of the tori is a multiple of eight) or asymptotically optimal numbers of phases (otherwise). Interestingly, in order to achieve this, we only make weak assumptions-that a node is capable of sending and receiving at most one message at a time, and the network is capable of supporting the dimension-ordered (or e-cube) minimum routing  相似文献   

18.
This paper presents efficient algorithms that implement one-to-many, or multicast, communication in wormhole-routed torus networks. By exploiting the properties of the switching technology and the use of virtual channels, a minimum-time multicast algorithm is presented for n-dimensional torus networks that use deterministic, dimension-ordered routing of unicast messages. The algorithm can deliver a multicast message to m-1 destinations in [log2 m] message-passing steps, while avoiding contention among the constituent unicast messages. Performance results of a simulation study on torus networks with up to 4096 nodes are also given  相似文献   

19.
The capability of multidestination wormhole allows a message to be propagated along any valid path in a wormhole-routed network conforming to the underlying base routing scheme. The multicast on the path-based routing model is highly dependent on the spatial locality of destinations participating in multicasting. In this paper, we propose two proximity grouping schemes for efficient multicast in wormhole-routed mesh networks with multidestination capability by exploiting the spatial locality of the destination set. The first grouping scheme, graph-based proximity grouping, is proposed to group the destinations together with locality to construct several disjoint sub-meshes. This is achieved by modeling the proximity grouping problem to graph partitioning problem. The second one, pattern-based proximity grouping, is proposed by the pattern classification schemes to achieve the goal of the proximity grouping. By simulation results, we show the routing performance gains over the traditional Hamiltonian-path routing scheme.  相似文献   

20.
In a mesh multicomputer, performing jobs needs to schedule submeshes according to some processor allocation scheme. In order to assign the incoming jobs to a free submesh, a task compaction scheme is needed to generate a larger contiguous free region. The overhead of compaction depends on the efficiency of the task migration scheme. In this paper, two simple task migration schemes are first proposed in n-dimensional mesh multicomputers with supporting dimension-ordered wormhole routing in one-port communication model. Then, a hybrid scheme which combines advantages of the two schemes is discussed. Finally, we evaluate the performance of all of these proposed approaches.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号