首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
We give an optimal algorithm that broadcasts on an n-dimensional hypercube in O(n/ log2 (n+1)) routing steps with wormhole, e-cube routing and all-port communication. Previously, the best algorithm of P.K. McKinley and C. Trefftz (1993) requires [n/2] routing steps. We also give routing algorithms that achieve tight time bounds for n ⩽7  相似文献   

2.
A new broadcasting method is presented for hypercubes with wormhole routing mechanism. The communication model assumed allows an n-dimensional hypercube to have at most n concurrent 110 communications along its ports. It further assumes a distance insensitivity of (n+1) with no intermediate reception capability for the nodes along the communication path. The approach is based on determination of the set of nodes (called stations) in the hypercube such that for any node in the network there is a station at distance of at most 1. Once stations are identified, parallel disjoint paths are formed from the source to all stations. The broadcasting is accomplished first by sending the message to all stations which will in turn inform the rest of the nodes of the message. To establish node-disjoint paths between the source node and all stations, we introduce a new routing strategy. We prove that multicasting can be done in one routing step as long as the number of destination nodes are at most n in an n-dimensional hypercube. The number of broadcasting steps using our routing is equal to or smaller than that obtained in an earlier work; this number is optimal for all hypercube dimensions n⩽12, except for n=10  相似文献   

3.
This paper presents efficient algorithms that implement one-to-many, or multicast, communication in wormhole-routed torus networks. By exploiting the properties of the switching technology and the use of virtual channels, a minimum-time multicast algorithm is presented for n-dimensional torus networks that use deterministic, dimension-ordered routing of unicast messages. The algorithm can deliver a multicast message to m-1 destinations in [log2 m] message-passing steps, while avoiding contention among the constituent unicast messages. Performance results of a simulation study on torus networks with up to 4096 nodes are also given  相似文献   

4.
Multicast is one of the most frequently used collective communication operations in multi-core SoC platforms. Bus as the traditional interconnect architecture for SoC development has been highly efficient in delivering multicast messages. Since the bus is non-scalable, it can not address the bandwidth requirements of the large SoCs. The networks on-chip (NoCs) emerged as a scalable alternative to address the increasing communication demands of such systems. However, due to its hop-to-hop communication, the NoCs may not be able to deliver multicast operations as efficiently as buses do. Adopting multi-port routers has been an approach to improve the performance of the multicast operations in interconnection networks. This paper presents a novel analytical model to compute communication latency of the multicast operation in wormhole-routed interconnection networks employing asynchronous multi-port routers scheme. The model is applied to the Quarc NoC and its validity is verified by comparing the model predictions against the results obtained from a discrete-event simulator developed using OMNET++.  相似文献   

5.
In a mesh multicomputer, submeshes are allocated to perform jobs according to processor allocation schemes, with each task assigned to occupy processors of one submesh with an appropriate size. To assign regions for incoming tasks, task compaction is needed to produce a large contiguous free region. The overhead of task compaction relies mainly on designing an efficient task migration scheme. This paper investigates task migration schemes in 2D wormhole-routed mesh multicomputers with an all-port communication model. Two constraints are given between two submeshes for task migration. Two task migration schemes that follow one of the constraints in 2D mesh multicomputers are then developed. In addition, the proposed schemes are proven to be deadlock-free and congestion-free. Finally, performance analysis is adopted to compare the proposed task migration schemes.  相似文献   

6.
In this paper, we propose an efficient multipath multicast routing algorithm in wormhole-routed 2D torus networks. We first introduce a hamiltonian cycle model for exploiting the feature of torus networks. Based on this model, we find a hamiltonian cycle in torus networks. Then, an efficient multipath multicast routing algorithm with hamiltonian cycle model (mulitpath-HCM) is presented. The proposed multipath multicast routing algorithm utilizes communication channels more uniformly in order to reduce the path length of the routing messages, making the multicasting more efficient. Simulation results show that the multicast latency of the proposed multipath-HCM routing algorithm is superior to that of fixed and dual-path routing algorithms.  相似文献   

7.
All-to-all personalized communication, or complete exchange, is at the heart of numerous applications in parallel computing. It is one of the most dense communication patterns. In this paper, we consider this problem in a torus of any dimension with the wormhole-routing capability. We propose complete exchange algorithms that use optimal numbers of phases (if each side of the tori is a multiple of eight) or asymptotically optimal numbers of phases (otherwise). Interestingly, in order to achieve this, we only make weak assumptions-that a node is capable of sending and receiving at most one message at a time, and the network is capable of supporting the dimension-ordered (or e-cube) minimum routing  相似文献   

8.
A new approach to the design of collective communication operations in wormhole-routed mesh networks is described. The approach extends the concept of dominating sets in graph theory by accounting for the relative distance-insensitivity of the wormhole switching strategy and by taking advantage of a multiport communication architecture, which allows each node to simultaneously transmit messages on different outgoing channels. Collective communication operations are defined in terms of sets of extended dominating nodes (EDNs). The nodes in a set of EDNs can deliver (receive) messages to (from) a different, larger set of nodes in a single message-passing step under dimension-ordered wormhole routing and without channel contention among messages. The EDN model can be applied to different collective operations in 2D and 3D mesh networks. The authors focus on EDN-based broadcast and global combine operations. Performance evaluation results are presented that confirm the advantage of this approach over other methods  相似文献   

9.
Most machines of the last generation of distributed memory parallel computers possess specific routers which are used to exchange messages between nonneighboring nodes in the network. Among the several technologies, wormhole routing is usually preferred because it allows low channel-setup time and reduces the dependency between latency and internode distance. However, wormhole routing is very susceptible to deadlock because messages are allowed to hold many resources while requesting others. Therefore, designing deadlock-free routing algorithms using few hardware facilities is a major problem for wormhole-routed networks. In this paper, we describe a general theoretical framework for the study of deadlock-free routing functions. We give a general definition of what can be a routing function. This definition captures many specific definitions of the literature (e.g., vertex dependent, input-dependent, source-dependent, path-dependent etc.). Using our definition, we give a necessary and sufficient condition which characterizes deadlock-free routing functions. Our theory embraces, at a high level, most of the theories related to deadlock avoidance in wormhole-routed networks previously derived in the literature. In particular, it applies not only to one-to-one routing, but also to one-to-many routing. The latter paradigm is used to solve the multicast problem with the path-based or tree-based facility  相似文献   

10.
In this paper, we propose an efficient barrier synchronization scheme on networks with arbitrary topologies. We first present a distributed method in building a barrier routing tree. The barrier messages can be delivered adaptively according to the hierarchy of the established barrier tree to void congestion and faulty nodes in the network. We then propose a new technique, called bandwidth-preempting technique, for a blocked barrier message to preempt a channel occupied by a data message so that the latency of a barrier message can be controlled without affecting much of the overall system performance. We also propose an analytical performance model and present simulation results for the performance evaluation of the proposed scheme. Performance evaluations show that the proposed scheme outperforms the existing algorithms for barrier synchronization  相似文献   

11.
In this paper, we propose an efficient algorithm to find a collision-free time slot schedule in a time division multiple access frame. In order to minimize the system delay, the optimal schedule must be defined as the one that has the minimum frame length and provides the maximum slot utilization. The proposed algorithm is based on the sequential vertex coloring algorithm. Numerical examples and comparisons with the algorithm in previous research have shown that the proposed algorithm can find near-optimal solutions in respect of the system delay.Scope and purposeAn ad-hoc network was introduced in order to apply packet switching communication to a shared radio channel. Using a radio channel as the broadcast medium to interconnect users, an ad-hoc network provides flexible data communication services among a large number of geographically distributed, possibly mobile, radio units. In an ad-hoc network, since all users share a single channel by multiple access protocol, unconstrained transmission may lead to the time overlap of two or more packet receptions, called collision, resulting in damaged useless packets at the destination. Collided packets increase the system delay because they must be retransmitted. Therefore, the transmission for each station must be scheduled to avoid any collision, that is, collision-free transmission should be guaranteed. The time division multiple access (TDMA) technology can be used to schedule collision-free transmission. In this paper, we propose an efficient algorithm to find a collision-free time slot schedule in a TDMA ad-hoc network.  相似文献   

12.
Networks-on-Chip (NoC) emerged to address the technological and design issues related to development of large systems-on-chip (SoCs). Due to diversity of the application's performance requirements, most NoC architectures offer supports for quality of service (QoS). Also, to utilize the available bandwidth efficiently, they might implement mechanisms for delivering collective communication operations. This paper presents an analytical model to predict the average latency of wormhole-routed prioritized broadcast communication in NoCs. The model assumes that the network uses all-port routers scheme and offers differentiated services-based QoS. To verify the analysis, the model predictions are compared against the results obtained from a discrete-event simulator developed using OMNET++.  相似文献   

13.
Multicast communication, in which the same message is delivered from a source node to an arbitrary number of destination nodes, is being increasingly demanded in parallel computing. System supported multicast services can potentially offer improved performance, increased functionality, and simplified programming, and may in turn be used to support various higher-level operations for data movement and global process control. This paper presents efficient algorithms to implement multicast communication in wormhole-routed direct networks, in the absence of hardware multicast support, by exploiting the properties of the switching technology. Minimum-time multicast algorithms are presented for n-dimensional meshes and hypercubes that use deterministic, dimension-ordered routing of unicast messages. Both algorithms can deliver a multicast message to m-1 destinations in [log 2 m] message passing steps, while avoiding contention among the constituent unicast messages. Performance results of implementations on a 64-node nCUBE-2 hypercube and a 168-node Symult 2010 2-D mesh are given  相似文献   

14.
This paper focuses on efficient multicasting in wormhole-routed networks. A trip-based model is proposed to support adaptive, distributed, and deadlock-free multiple multicast on any network with arbitrary topology using at most two virtual channels per physical channel. This model significantly generalizes the path-based model proposed earlier which works only for Hamiltonian networks and cannot be applicable to networks with arbitrary topology resulted due to system faults. Fundamentals of the trip-based model, including the necessary and sufficient condition to be deadlock-free, and the use of appropriate number of virtual channels to avoid deadlock are investigated. The potential of this model is illustrated by applying it to hypercubes with faulty nodes. Simulation results indicate that the proposed model can implement multiple multicast on faulty hypercubes with negligible performance degradation  相似文献   

15.
This paper addresses the problem of fault-tolerant multicast routing in wormholerouted multicomputers.A new pseudo-cycle-based routing method is presented for constructing deadlock-free multicast routing algorithms.With at most two virtual channels this technique can be applied to any connected networks with arbitrary topologies.Simulation results show that this technique results in negligible performance degradation even in the presence of a large number of faulty nodes.  相似文献   

16.
In this paper, we introduce a new approach to deadlock-free routing in wormhole-routed networks called the message flow model. This method may be used to develop deterministic, partially-adaptive, and fully-adaptive routing algorithms for wormhole-routed networks with arbitrary topologies. We first establish the necessary and sufficient condition for deadlock free routing, based on the analysis of the message flow on each channel. We then use the model to develop new adaptive routing algorithms for 2D meshes  相似文献   

17.
Multistage interconnection networks are a popular class of interconnection architecture for constructing scalable parallel computers (SPCs). The focus of this paper is on the multistage network system which supports wormhole routed turnaround routing. Existing machines characterized by such a system model include the IBM SP-1 and SP-2, TMC CM-5, and Meiko CS-2. Efficient collective communication among processor nodes is critical to the performance of SPCs. A system-level multicast service, in which the same message is delivered from a source node to an arbitrary number of destination nodes, is fundamental in supporting collective communication primitives including the application-level broadcast, reduction, and barrier synchronization. This paper addresses how to efficiently implement multicast services in wormhole-routed multistage networks, in the absence of hardware multicast support, by exploiting the properties of the turnaround switching technology. An optimal multicast algorithm is proposed. The results of implementations on a 64-node SP-1 show that the proposed algorithm significantly outperforms the application-level broadcast primitives provided by currently existing collective communication libraries including the public domain MPI  相似文献   

18.
The capability of multidestination wormhole allows a message to be propagated along any valid path in a wormhole-routed network conforming to the underlying base routing scheme. The multicast on the path-based routing model is highly dependent on the spatial locality of destinations participating in multicasting. In this paper, we propose two proximity grouping schemes for efficient multicast in wormhole-routed mesh networks with multidestination capability by exploiting the spatial locality of the destination set. The first grouping scheme, graph-based proximity grouping, is proposed to group the destinations together with locality to construct several disjoint sub-meshes. This is achieved by modeling the proximity grouping problem to graph partitioning problem. The second one, pattern-based proximity grouping, is proposed by the pattern classification schemes to achieve the goal of the proximity grouping. By simulation results, we show the routing performance gains over the traditional Hamiltonian-path routing scheme.  相似文献   

19.
Given a wireless ad hoc network with a specified source node that has to broadcast messages to all other nodes in the network, the minimum energy broadcast (MEB) problem seeks a broadcast scheme for this network with minimum energy consumption. The MEB problem is NP-Hard. This paper describes a hybrid approach to the MEB problem combining a genetic algorithm with a local search heuristic. We have compared our hybrid approach against the best heuristic approaches known for this problem. Our approach outperformed all these approaches and emerged as the best.  相似文献   

20.
《Parallel Computing》1997,23(13):1909-1936
This paper presents a global reduction algorithm for wormhole-routed 2D meshes. Well-known reduction algorithms that are optimized for short vectors have complexity O(M log N), where N = n × n is the number of nodes, and M the vector length. Algorithms suitable for long vectors have complexity O(√N + M). Previously known asymptotically optimal algorithms with complexity O(log N + M) incur inherent network contention among constituent messages. The proposed algorithm adapts to the given vector length, resulting in complexities O(M log N) for short vectors, O(log N + M) for medium-sized vectors, and O(√N + M) for sufficiently long vectors. The O(√N + M) version is preferred to the O(log N + M) version for long vectors, due to its small coefficient associated with M, the dominating factor for such vectors. The algorithm is contention-free in a synchronous environment. Under asynchronous execution models, depth contention (contention among message-passing steps) may occur. However, simulation studies show that the effect of depth contention on the actual performance is negligible.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号