首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
Parallel computing on networks of workstations is fast becoming a cost-effective high-performance computing alternative to MPPs. Such a computing environment typically consists of processing nodes interconnected through a switch-based irregular network. Many of the problems that were solved for regular networks have to be solved anew for these systems. One such problem is that of efficient multicast communication. In this paper, we propose two broad categories of schemes for efficient multicasting in such irregular networks: network interface-based (NI-based) and switch-based. The NI-based multicasting schemes use the network interface of intermediate destinations for absorbing and retransmitting messages to other destinations in the multicast tree. In contrast, the switch-based multicasting schemes use hardware support for packet replication at the switches of the network and a concept known as multidestination routing to convey a multicast message from one source to multiple destinations. We first present alternative schemes for efficient multipacket forwarding at the NI and derive an optimal k-binomial multicast tree for multipacket NI-based multicast. We then propose two switch-based multicasting schemes that differ in the power of the encoding scheme and the complexity of the decoding logic at the switches. These multicasting schemes use path-based multidestination worms that can cover all nodes connected to switches along a valid unicast path and tree-based multidestination worms that can cover entire destination sets in a single phase using one worm, respectively. For each scheme, we describe the associated header encoding and decoding operation, the method for deriving multidestination worms that cover arbitrary multicast destination sets, and the multicasting scheme using the derived multidestination worms  相似文献   

2.
Multidestination message passing has been proposed as an attractive mechanism for efficiently implementing multicast and other collective operations on direct networks. However, applying this mechanism to switch-based parallel systems is nontrivial. In this paper, we propose alternative switch architectures with differing buffer organizations to implement multidestination worms on switch-based parallel systems. First, we discuss issues related to such implementation (deadlock-freedom, replication mechanisms, header encoding, and routing). Next, we demonstrate how an existing central-buffer-based switch architecture supporting unicast message passing can be enhanced to accommodate multidestination message passing. Similarly, implementing multidestination worms on an input-buffer-based switch architecture is discussed, and two architectural alternatives are presented that reduce the wiring complexity in a practical switch implementation. The central-buffer-based and input-buffer-based implementations are evaluated against each other, as well as against the corresponding software-based schemes. Simulation experiments under a range of traffic (multiple multicast, bimodal, varying degree of multicast, and message length) and system size are used for evaluation. The study demonstrates the superiority of the central-buffer-based switch architecture. It also indicates that under bimodal traffic the central-buffer-based hardware multicast implementation affects background unicast traffic less adversely compared to a software-based multicast implementation. These results show that multidestination message passing can be applied easily and effectively to switch-based parallel systems to deliver good multicast and collective communication performance  相似文献   

3.
This paper presents a new approach to minimize node contention while performing multiple multicast/broadcast on wormhole k-ary n-cube networks with overlapped destination sets. The existing multicast algorithms in the literature deliver poor performance under multiple multicast because these algorithms have been designed with only single multicast in mind. The new algorithms introduced in this paper do not use any global knowledge about the respective destination sets of the concurrent multicasts. Instead, only local information and a source-specific partitioning approach are used. For systems supporting unicast message-passing, a new SPUmesh (Source-Partitioned Umesh) algorithm is proposed and is shown to be superior than the conventional Umesh algorithm for multiple multicast. Two different algorithms, SQHL (Source-Quadrant Hierarchical Leader) and SCHL (Source-Centered Hierarchical Leader), are proposed for systems with multidestination message-passing and shown to be superior than the HL scheme. All of these algorithms perform 1) 5-10 times faster than the existing algorithms under multiple multicast and 2) as fast as existing algorithms under single multicast. Furthermore, the SCHL scheme demonstrates that the latency of multiple multicast can, in fact, be reduced as the degree of multicast increases beyond a certain number. Thus, these algorithms demonstrate significant potential to be used for designing fast and scalable collective communication libraries on current and future generation wormhole systems  相似文献   

4.
This paper proposes multidestination message passing on wormhole k-ary n-cube networks using a new base-routing-conformed-path (BRCP) model. This model allows both unicast (single-destination) and multidestination messages to co-exist in a given network without leading to deadlock. The model is illustrated with several common routing schemes (deterministic, as well as adaptive), and the associated deadlock-freedom properties are analyzed. Using this model, a set of new algorithms for popular collective communication operations, broadcast and multicast, are proposed and evaluated. It is shown that the proposed algorithms can considerably reduce the latency of these operations compared to the Umesh (unicast-based multicast) and the Hamiltonian path-based schemes. A very interesting result that is presented shows that a multicast can be implemented with reduced or near-constant latency as the number of processors participating in the multicast increases beyond a certain number. It is also shown that the BRCP model can take advantage of adaptivity in routing schemes to further reduce the latency of these operations. The multidestination mechanism and the BRCP model establish a new foundation to provide fast and scalable collective communication support on wormhole-routed systems  相似文献   

5.
1IntroductionMulticastcommunication,whichreferstothedeliveryofamessagefromasinglesourcenodetoanumberofdestinationnodes,isfrequentlyusedindistributed-memoryparallelcomputersystemsandnetworks[1].Efficientimplementationofmulticastcommunicationiscriticaltotheperformanceofmessage-basedscalableparallelcomputersandswitch-basedhighspeednetworks.Switch-basednetworksorindirectnetworks,basedonsomevariationsofmultistageiDterconnectionnetworks(MINs),haveemergedasapromisingnetworkajrchitectureforconstruct…  相似文献   

6.
This paper presents efficient algorithms that implement one-to-many, or multicast, communication in wormhole-routed torus networks. By exploiting the properties of the switching technology and the use of virtual channels, a minimum-time multicast algorithm is presented for n-dimensional torus networks that use deterministic, dimension-ordered routing of unicast messages. The algorithm can deliver a multicast message to m-1 destinations in [log2 m] message-passing steps, while avoiding contention among the constituent unicast messages. Performance results of a simulation study on torus networks with up to 4096 nodes are also given  相似文献   

7.
This paper presents an efficient routing and flow control mechanism to implement multidestination message passing in wormhole networks. The mechanism is a variation of tree-based multicast with pruning to recover from deadlocks and it is well suited for distributed shared-memory multiprocessors (DSMs) with hardware cache coherence. It does not require any preprocessing of multicast messages reducing notably the software overhead required to send a multicast message. Also, it allows messages to use any deadlock-free routing function. The new scheme has been evaluated by simulation using synthetic loads. It achieves multicast latency reductions of 30% on average. Also it was compared with other multicast mechanisms proving its benefits. Finally, it can be easily implemented in hardware with minimal changes to existing unicast wormhole routers.  相似文献   

8.
The capability of multidestination wormhole allows a message to be propagated along any valid path in a wormhole-routed network conforming to the underlying base routing scheme. The multicast on the path-based routing model is highly dependent on the spatial locality of destinations participating in multicasting. In this paper, we propose two proximity grouping schemes for efficient multicast in wormhole-routed mesh networks with multidestination capability by exploiting the spatial locality of the destination set. The first grouping scheme, graph-based proximity grouping, is proposed to group the destinations together with locality to construct several disjoint sub-meshes. This is achieved by modeling the proximity grouping problem to graph partitioning problem. The second one, pattern-based proximity grouping, is proposed by the pattern classification schemes to achieve the goal of the proximity grouping. By simulation results, we show the routing performance gains over the traditional Hamiltonian-path routing scheme.  相似文献   

9.
Multicast communication, in which the same message is delivered from a source node to an arbitrary number of destination nodes, is being increasingly demanded in parallel computing. System supported multicast services can potentially offer improved performance, increased functionality, and simplified programming, and may in turn be used to support various higher-level operations for data movement and global process control. This paper presents efficient algorithms to implement multicast communication in wormhole-routed direct networks, in the absence of hardware multicast support, by exploiting the properties of the switching technology. Minimum-time multicast algorithms are presented for n-dimensional meshes and hypercubes that use deterministic, dimension-ordered routing of unicast messages. Both algorithms can deliver a multicast message to m-1 destinations in [log 2 m] message passing steps, while avoiding contention among the constituent unicast messages. Performance results of implementations on a 64-node nCUBE-2 hypercube and a 168-node Symult 2010 2-D mesh are given  相似文献   

10.
基于IP组播的应用问题探讨   总被引:2,自引:2,他引:2  
组播通信是近年来得到迅速推广的一种通信技术,它能弥补单播通信和广播通信的不足。分析了3种组播类型的特点及相关应用,讨论了组播服务对带宽和延时的要求,最后分析了组播的会话管理、可靠数据传输以及异类接收者支持问题,并提出了共享式学习和本地恢复两种解决异类接收者支持问题的方法。  相似文献   

11.
This paper develops an efficient and scalable multicast scheme for high-quality multimedia distribution. The traditional IP multicast, a pure network-layer solution, is bandwidth efficient in data delivery but not scalable in managing the multicast tree. The more recent overlay multicast establishes the data-dissemination structure at the application layer; however, it induces redundant traffic at the network layer. We propose an application-oriented multicast (AOM) protocol, which exploits the application-network cross-layer design. With AOM, each packet carries explicit destinations information, instead of an implicit group address, to facilitate the multicast data delivery; each router leverages the unicast IP routing table to determine necessary multicast copies and next-hop interfaces. In our design, all the multicast membership and addressing information traversing the network is encoded with bloom filters for low storage and bandwidth overhead. We theoretically prove that the AOM service model is loop-free and incurs no redundant traffic. The false positive performance of the bloom filter implementation is also analyzed. Moreover, we show that the AOM protocol is a generic design, applicable for both intra-domain and inter-domain scenarios with either symmetric or asymmetric routing.  相似文献   

12.
A unicast-based fault-tolerant multicasting method is proposed for hypercubes, which can still work well when the system contains enough faults. A multicast message may be unable to reach a destination if Hamming distance between the destination and the multicast source is large enough. A multicast message fails if any one of the destinations is unreachable from the source. An effective destination ordering scheme of the destinations is proposed for one-port systems first, it is extended to all-port systems for unicast-based fault-tolerant multicasting. Unreachable destinations from the source based on the local safety information are forwarded to a reachable destination, where the multicast message can be routed reliably. Destination ordering is completed based on Hamming distance. A multiple round p-cube routing scheme is presented for a deadlock-free fault-tolerant routing for each unicast step in hypercubes, where the same virtual channel is used for each round of p-cube routing. Sufficient simulation results are presented by comparing with the previous methods.  相似文献   

13.
不断增长的诸如电信会议和视频点播等多媒体应用,需要Internet有效地提供高性能的组播支持,为此,人们设计了重叠网络,用以支持不断增长的组播应用.在重叠网络中,信息包的复制过程是由专门的组播服务节点 (Multicast Service Node,MSN) 来完成的.因此,急需对MSN的基本体系结构,特别是MSN的排队和调度方案进行设计,从而使其提供有效的组播支持.为MSN设计了一个具有高性能的排队方案,它能够同时支持单播和组播,而且能以合理的代价实现高链路效率.方法是使用动态按需依向量排队(per-vector queuing on-demand).仿真研究表明,与其它已有的排队方案相比,采用该排队方案的交换机能实现更大的网络吞吐量,具有更高的链路效率和更少的网络延时.  相似文献   

14.
Network-on-Chip (NoC) devices have been widely used in multiprocessor systems. In recent years, NoC-based Deep Neural Network (DNN) accelerators have been proposed to connect neural computing devices using NoCs. Such designs dramatically reduce off-chip memory accesses of these platforms. However, the large number of one-to-many packet transfers significantly degrade performance with traditional unicast channels. We propose a multicast mechanism for a NoC-based DNN accelerator called Multicast Mechanism for NoC-based Neural Network accelerator (MMNNN). To do so, we propose a tree-based multicast routing algorithm with excellent scalability and the ability to minimize the number of packets in the network. We also propose a router architecture for single-flit packets. Our proposed router transfers flits to multiple destinations in a single process and has no head-of-line blocking issue, offering higher throughput and lower latency than traditional wormhole router architectures. Simulation results show that our proposed multicast mechanism offers excellent performance in classification latency, average packet latency, and energy consumption.  相似文献   

15.
Multicast is an important collective communication in scalable parallel computers. One efficient scheme to perform multicast is multidestination messaging[8]. In multidestination messaging, destination nodes of a multicast are partitioned into disjoint groups. Nodes in each group are reached with a multidestination message that conforms to the base routing algorithm of the system. A systematic way of partitioning the nodes is critical to the efficiency of multidestination messaging. In this paper we propose a node grouping method, called turn grouping, for partitioning the destination nodes in a multicast. Turn grouping is general in the sense that it supports any base routing algorithm derivable from the turn model [5]. Given such a base routing algorithm and the corresponding prohibited turns, turn grouping can systematically produce a proper schedule for multicasting the message. We evaluated the performance of turn grouping using three typical turn model-based routing algorithms. The simulation results show that our approach performs better than the Umesh [12] and the Hamiltonian-path [8] algorithms.  相似文献   

16.
This paper addresses the problem of one-to-many, or multicast, communication in wormhole-routed,n-dimensional torus networks. The proposed methods are designed for systems that support intermediate reception, which permits multidestination messages to be pipelined through several nodes, depositing a copy at each node. A key issue in the design of such systems is the routing function, which must support both unicast and multicast traffic while preventing deadlock among messages. An efficient, deadlock-free routing function is developed and used as a basis for a family of multicast algorithms. TheS-torusmulticast algorithm uses a single multidestination message to perform an arbitrary multicast operation. TheM-torusalgorithm is a generalized multiphase multicast algorithm, in which a combination of multidestination messages is used to perform a multicast in one or more communication steps. Two specific instances of the M-torus algorithm, theMd-torusandMu-torusmulticast algorithms, are presented. These algorithms produce contention-free multicast operations and are deadlock-free under all combinations of network traffic. A simulation study compares the performance of the different multicast algorithms, and implementation issues are discussed. The results of this research are applicable to the design of architectures for both wormhole-routed massively parallel computers and high-speed local area networks with wormhole-routed switch fabrics.  相似文献   

17.
Large-scale distributed shared-memory multiprocessors (DSMs) provide a shared address space by physically distributing the memory among different processors. A fundamental DSM communication problem that significantly affects scalability is an increase in remote memory latency as the number of system nodes increases. Remote memory latency, caused by accessing a memory location in a processor other than the one originating the request, includes both communication latency and remote memory access latency over I/O and memory buses. The proposed architecture reduces remote memory access latency by increasing connectivity and maximizing channel availability for remote communication. It also provides efficient and fast unicast, multicast, and broadcast capabilities, using a combination of aggressively designed multiplexing techniques. Simulations show that this architecture provides excellent interconnect support for a highly scalable, high-bandwidth, low-latency network.  相似文献   

18.
In this paper, a new class of optical multistage interconnection network (MIN) architecture is presented, which is constructed utilizing a modularization approach rather than the traditional recursive or fixed exchange pattern methods. The modified architecture consists of an input module, an output module, two point-to-point (PTP) modules, and one modified multicast/broadcast (M/B) module(s). We also implement the multicast/broadcast module with WDM technique, which reduces the hardware cost required for multicast and the re-computation cost for a new connection. We show that it has the best application flexibility and provides multicast function without imposing significant negative impacts on the whole network. A new multicast connection pattern is also proposed in this paper, which makes it practical and economical to apply amplification in space-division networks. Compared with existing multicast architectures, this new architecture with Dilated Benes PTP modules has better performance in terms of system SNR, the number of switch elements, and system attenuation in point-to-point connections. Moreover, the multicast/broadcast module adopts wavelength division multiplexing (WDM) technique to increase its multicast/broadcast assignment. As a result, given m available distinguished wavelengths, one M/B module can support at most m M/B requests at the same time. The new proposed M/B module with WDM is more practical and economical to apply amplification in space-division networks.  相似文献   

19.
《Performance Evaluation》2006,63(4-5):395-422
We develop methods for quantifying the gain of using multicast or a combination of broadcast and unicast for transmitting popular content in telecommunication networks. The gain is evaluated in a single link, such as the radio interface of a cellular network. Two approaches to define such a gain are presented: one based on the average link occupancy, and another based on the blocking probability. The developed methods enable determining how popular the most popular contents should be in order to justify the use of multicast or a combination of broadcast and unicast from a performance point of view.  相似文献   

20.
The paper discusses a multicast mechanism using propagation trees. It guarantees the total ordering (including causal ordering) of messages in multiple groups. The mechanism introduces a concept of meta-groups (a subset of a multicast group) and organizes meta-groups into propagation trees. Compared with the existing propagation tree mechanisms, this mechanism has the following advantages: 1) Greater parallelism. Messages can be sent to destinations by using broadcast networks. 2) Less message cost and less latency time. It takes less network communication to multicast a message and less time to have the message delivered to all the destinations. 3) More flexibility to dynamic membership changes and higher reliability for message propagation. It does not need to restructure propagation trees when there is a change in membership, and a site failure does not stop the message propagation to its descendants in the tree  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号