
1.

Efficient broadcast and multicast on multistage interconnection networks using multiport encoding

Sivaram R. Panda D.K. Stunkel C.B. 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(10):1004-1028

This paper proposes a new approach for implementing fast multicast and broadcast in unidirectional and bidirectional multistage interconnection networks (MINs) with multiport encoded multidestination worms. For a MIN with n stages, such worms use n header flits each. One flit is used for each stage of the network and it indicates the output ports to which a multicast message needs to be replicated. A multiport encoded worm with degrees of replication (d₁, d₂, ..., d_n), 1⩽d_i⩽k, for the respective stages is capable of covering (d₁×d₂×...×d_n) destinations with a single communication start-up. In this paper, a switch architecture is proposed for implementing multidestination worms without deadlock. Three grouping algorithms of varying complexity are presented to derive the associated multiport encoded worms for a multicast to an arbitrary set of destinations. Using these worms, a multinomial tree-based scheme is proposed to implement the multicast. This scheme significantly reduces broadcast/multicast latency compared to schemes using unicast messages. Simulation studies for both unidirectional and bidirectional MIN systems indicate that improvement in broadcast/multicast latency up to a factor of four is feasible using the new approach. Interestingly, this approach is able to implement multicast with reduced latency as the number of destinations increases beyond a certain number. Compared to implementing unicast messages, this approach requires little additional logic at the switches. Thus, the scheme demonstrates significant potential for implementing efficient collective communication operations on current and future MIN-based systems.
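The coverage claim in this abstract (one worm reaching d₁×d₂×...×d_n destinations) can be illustrated with a minimal Python sketch. It is not the paper's switch logic or grouping algorithm. It assumes a hypothetical n-stage MIN of k×k switches with destination-tag routing, where the base-k digit i of a destination address selects the output port at stage i; the function names `header_flits` and `covered_destinations` are invented for this illustration.

```python
from itertools import product


def header_flits(port_sets, k):
    """Encode each stage's selected output ports as a k-bit string (assumed flit layout)."""
    return ["".join("1" if p in ports else "0" for p in range(k)) for ports in port_sets]


def covered_destinations(port_sets, k):
    """List the destination addresses reached by one multiport encoded worm.

    Assumption: destination-tag routing, so choosing one port per stage
    spells out the base-k digits of a destination address.
    """
    dests = []
    for digits in product(*[sorted(ports) for ports in port_sets]):
        addr = 0
        for d in digits:
            addr = addr * k + d  # append the next base-k digit
        dests.append(addr)
    return sorted(dests)


# Example: k = 4 ports, n = 3 stages, replication degrees (2, 1, 3).
ports = [{0, 1}, {2}, {0, 1, 3}]
print(header_flits(ports, 4))          # ['1100', '0010', '1101']
print(covered_destinations(ports, 4))  # [8, 9, 11, 24, 25, 27]
```

With degrees of replication (2, 1, 3) the worm covers 2×1×3 = 6 destinations after a single start-up, matching the product formula in the abstract. An arbitrary destination set generally cannot be written as one such product, which is presumably why the paper needs grouping algorithms to split a multicast into several worms.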

2.

Asynchronous tree-based multicasting in wormhole-switched MINs

Varavithya V. Mohapatra P. 《Parallel and Distributed Systems, IEEE Transactions on》1999,10(11):1159-1178

Multicast operation is an important operation in multicomputer communication systems and can be used to support several collective communication operations. A significant performance improvement can be achieved by supporting multicast operations at the hardware level. We propose an asynchronous tree-based multicasting (ATBM) technique for multistage interconnection networks (MINs). The deadlock issues in tree-based multicasting in MINs are analyzed first to examine the main causes of deadlocks. An ATBM framework is developed in which deadlocks are prevented by serializing the initiations of tree operations that have a potential to create deadlocks. These tree operations are identified through a grouping algorithm. The ATBM approach is not only simple to implement but also provides good communication performance using minimal overheads in terms of additional hardware requirements and synchronization delay. Using the ATBM framework, algorithms are developed for both unidirectional and bidirectional multistage interconnection networks. The performances of the proposed algorithms are evaluated through simulation experiments. The results indicate that the proposed hardware-based ATBM scheme reduces the communication latency when compared to the software multicasting approach proposed earlier.

3.

Efficient and Scalable Hardware-Based Multicast in Fat-Tree Networks

Coll Salvador Mora Francisco J. Duato Jose Petrini Fabrizio 《Parallel and Distributed Systems, IEEE Transactions on》2009,20(9):1285-1298

This article presents an efficient and scalable mechanism to overcome the limitations of collective communication in switched interconnection networks in the presence of faults. Considering that current trends in supercomputing are moving toward massively parallel computers, with many thousands of components, reliability becomes a challenge. In such a scenario, fat-tree networks that provide hardware support for collective communication suffer from serious performance degradation due to the presence of even a single faulty node. This paper describes a new mechanism to provide high-performance collective communication in such situations. The feasibility of the proposed technique is formally demonstrated. We present the design of a new hardware-based routing algorithm for multicast that is at the base of our proposal. The proposed mechanism is implemented and experimentally evaluated. Our experimental results show that hardware-based multicast trees provide an efficient and scalable solution for collective communication in fat-tree networks, significantly outperforming traditional solutions.

4.

Notification and multicast networks for synchronization and coherence

John B. Andrews Carl J. Beckmann David K. Poulsen 《Journal of Parallel and Distributed Computing》1992,15(4)

This paper presents two different multistage interconnection network designs for shared-memory multiprocessors that provide unrestricted multicast and notification capabilities. The networks allow efficient synchronization and communication because they conserve network bandwidth by eliminating polling and by performing multicast to multiple recipient processors, as opposed to broadcast or individual messages per recipient processor. Simulation results show that the use of these networks not only decreases synchronization overhead, but also increases network performance for nonsynchronization traffic. The hardware complexity of these schemes is reasonable, making them practical for real systems. Their use in supporting efficient directory-based update or invalidate cache coherence is also discussed.

5.

A Novel Multistage Network Architecture with Multicast and Broadcast Capability

Hao Tian Ajay K. Katangur Jiling Zhong Yi Pan 《The Journal of supercomputing》2006,35(3):277-300

In this paper, a new class of optical multistage interconnection network (MIN) architecture is presented, which is constructed utilizing a modularization approach rather than the traditional recursive or fixed exchange pattern methods. The modified architecture consists of an input module, an output module, two point-to-point (PTP) modules, and one modified multicast/broadcast (M/B) module(s). We also implement the multicast/broadcast module with the WDM technique, which reduces the hardware cost required for multicast and the re-computation cost for a new connection. We show that it has the best application flexibility and provides multicast function without imposing significant negative impacts on the whole network. A new multicast connection pattern is also proposed in this paper, which makes it practical and economical to apply amplification in space-division networks. Compared with existing multicast architectures, this new architecture with Dilated Benes PTP modules has better performance in terms of system SNR, the number of switch elements, and system attenuation in point-to-point connections. Moreover, the multicast/broadcast module adopts the wavelength division multiplexing (WDM) technique to increase its multicast/broadcast assignment. As a result, given m available distinguished wavelengths, one M/B module can support at most m M/B requests at the same time. With the new M/B module with WDM, it is more practical and economical to apply amplification in space-division networks.

6.

PeerCast: Churn-resilient end system multicast on heterogeneous overlay networks

Jianjun Zhang Ling Liu Lakshmish Ramaswamy Calton Pu 《Journal of Network and Computer Applications》2008,31(4):821-850


7.

TTPM – An efficient deadlock-free algorithm for multicast communication in 2D torus networks

M.G. A.A. M.A. K. 《Journal of Systems Architecture》2008,54(10):919-928

A torus network has become increasingly important to multicomputer design because of its many features including scalability, low bandwidth and fixed degree of nodes. A multicast communication is a significant operation in multicomputer systems and can be used to support several other collective communication operations. This paper presents an efficient algorithm, TTPM, to find a deadlock-free multicast wormhole routing in two-dimensional torus parallel machines. The introduced algorithm is designed such that messages can be sent to any number of destinations within two start-up communication phases; hence the name Torus Two Phase Multicast (TTPM) algorithm. An efficient routing function is developed and used as a basis for the introduced algorithm. Also, TTPM allows some intermediate nodes that are not in the destination set to perform multicast functions. This feature allows flexibility in multicast path selection and therefore improves the performance. Performance results of a simulation study on torus networks are discussed to compare TTPM algorithm with a previous algorithm.

8.

Multicast communication in multicomputer networks

Lin X. Ni L.M. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(10):1105-1117

Efficient routing of messages is a key to the performance of multicomputers. Multicast communication refers to the delivery of the same message from a source node to an arbitrary number of destination nodes. While multicast communication is highly demanded in many applications, most of the existing multicomputers do not directly support this service; rather it is indirectly supported by multiple one-to-one or broadcast communications, which result in more network traffic and a waste of system resources. The authors study routing evaluation criteria for multicast communication under different switching technologies. Multicast communication in multicomputers is formulated as a graph theoretical problem. Depending on the evaluation criteria and switching technologies, they study three optimal multicast communication problems, which are equivalent to the finding of the following three subgraphs: optimal multicast path, optimal multicast cycle, and minimal Steiner tree, where the interconnection of a multicomputer defines a host graph. They show that all these optimization problems are NP-complete for the popular 2D-mesh and hypercube host graphs. Heuristic multicast algorithms for these routing problems are proposed.

9.

Communication modeling of multicast in all-port wormhole-routed NoCs

Mahmoud Moadeli Wim Vanderbauwhede 《Journal of Systems and Software》2010,83(8):1327-1336

Multicast is one of the most frequently used collective communication operations in multi-core SoC platforms. The bus, the traditional interconnect architecture for SoC development, has been highly efficient in delivering multicast messages. Since the bus is non-scalable, it cannot address the bandwidth requirements of large SoCs. Networks-on-chip (NoCs) emerged as a scalable alternative to address the increasing communication demands of such systems. However, due to their hop-to-hop communication, NoCs may not be able to deliver multicast operations as efficiently as buses do. Adopting multi-port routers has been an approach to improve the performance of the multicast operations in interconnection networks. This paper presents a novel analytical model to compute communication latency of the multicast operation in wormhole-routed interconnection networks employing an asynchronous multi-port router scheme. The model is applied to the Quarc NoC and its validity is verified by comparing the model predictions against the results obtained from a discrete-event simulator developed using OMNET++.

10.

Deadlock-free multicast wormhole routing in 2-D mesh multicomputers

Xiaola Lin McKinley P.K. Ni L.M. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(8):793-804

Multicast communication services, in which the same message is delivered from a source node to an arbitrary number of destination nodes, are being provided in new-generation multicomputers. Broadcast is a special case of multicast in which a message is delivered to all nodes in the network. The nCUBE-2, a wormhole-routed hypercube multicomputer, provides hardware support for broadcast and a restricted form of multicast in which the destinations form a subcube. However, the broadcast routing algorithm adopted in the nCUBE-2 is not deadlock-free. In this paper, four multicast wormhole routing strategies for 2-D mesh multicomputers are proposed and studied. All of the algorithms are shown to be deadlock-free. These are the first deadlock-free multicast wormhole routing algorithms ever proposed. A simulation study has been conducted that compares the performance of these multicast algorithms under dynamic network traffic conditions in a 2-D mesh. The results indicate that a dual-path routing algorithm offers performance advantages over tree-based, multipath, and fixed-path algorithms.

11.

Stateless key distribution for secure intra and inter-group multicast in mobile wireless network

《Computer Networks》2007,51(15):4303-4321

Group communication has become an important component in wireless networks. In this paper, we focus on the environments in which multiple groups coexist in the system, and both intra and inter-group multicast traffic must be protected by secret keys. We propose a mechanism that integrates polynomials with stateless secret updates to achieve personal key share distribution and efficient key refreshment during group changes. The proposed mechanism distributes keys via true broadcast. Compared to previous approaches, the proposed mechanism has the following advantages: (1) The adoption of symmetric encryption/decryption for multicast traffic matches the limited processing capability of wireless nodes. (2) The stateless feature of key distribution matches the properties of mobile wireless networks including frequent topology changes and temporary connection disruptions. (3) Special mechanisms are designed to reduce the communication overhead during key updates and provide protection against both intra and inter-group impersonation. The storage, computation, and communication overhead of the proposed mechanism is investigated. Analysis and simulation are conducted to demonstrate the improvements over previous approaches.

12.

Interleaved all-to-all reliable broadcast on meshes and hypercubes

Sunggu Lee Shin K.G. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(5):449-458

All-to-all (ATA) reliable broadcast is the problem of reliably distributing information from every node to every other node in point-to-point interconnection networks. A good solution to this problem is essential for clock synchronization, distributed agreement, etc. We propose a novel solution in which the reliable broadcasts from individual nodes are interleaved in such a manner that no two packets contend for the same link at any given time; this type of method is particularly suited for systems which use virtual cut-through or wormhole routing for fast communication between nodes. Our solution, called the IHC Algorithm, can be used on a large class of regular interconnection networks including regular meshes and hypercubes. By adjusting a parameter η referred to as the interleaving distance, we can flexibly decrease the link utilization of the IHC algorithm (for normal traffic) at the expense of an increase in the time required for ATA reliable broadcast. We compare the IHC algorithm to several other possible virtual cut-through solutions and a store-and-forward solution. The IHC algorithm with the minimum value of η is shown to be optimal in minimizing the execution time of ATA reliable broadcast when used in a dedicated mode (with no other network traffic).

13.

An application-layer multicast routing algorithm supporting multi-point interaction synchronization

Jiang Shan Zheng Qinghua Liu Jun Du Haipeng 《Journal of Software (软件学报)》2011,22(5):972-985

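This paper proposes the idea of solving the interaction synchronization problem in multi-source multicast routing without using a synchronization controller, and builds on this idea to achieve low-delay multicast. The main contributions include: (1) establishing a synchronization model for multi-point interaction and proving multicast routing theorems that support multi-point interaction synchronization; (2) proposing an effective, low-delay application-layer multicast routing algorithm that supports multi-point interaction synchronization; (3) analyzing the performance of the new algorithm and of existing related algorithms using mathematical methods...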

14.

Research and design of a high-performance interconnection network switch (cited by: 1; self-citations: 0; citations by others: 1)

Wang Dawei Cao Zheng Liu Xinchun You Dingshan Sun Ninghui 《Journal of Computer Research and Development (计算机研究与发展)》2008,45(12)

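The high-performance interconnection network switch is a core component of high-performance computer systems. Scientific computing, the upper-layer application of high-performance computers, requires the switch not only to offer low latency and high bandwidth but also to provide hardware-level support for collective communication such as broadcast, multicast, and synchronization operations. The HyperLink switch, an important component of the interconnection network of the Dawning 5000 (曙光5000) computer system, has a single-stage latency of 38.4 ns and an aggregate bandwidth of 160 Gbps, and can support 16 multicast groups and 16 synchronization groups simultaneously. Under ideal conditions, multicast and synchronization operations across 1024 nodes complete within 2 μs, greatly accelerating scientific computing. To evaluate the performance of the HyperLink switch, a cycle-accurate simulation model was built. Simulation shows that for a 16-port input-buffered switch, three virtual channels give the best cost-performance ratio; with an MTU of 1 KB, a 4 KB input buffer is enough to reach the maximum unicast throughput. A theoretical analysis compares multi-rail and single-rail networks of equal network bandwidth and shows that the former effectively reduces network latency and can therefore provide higher network throughput than the latter. The performance of HyperLink multicast and barrier operations is analyzed with the LogP model; the analysis shows that the HyperLink switch scales well and can support systems of up to several thousand nodes.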

15.

Research and design of a multicast communication scheme for 6LoWPAN networks

Sun Jiahao Wang Chengcheng Tang Daoxian Li Yuehua 《Computer Engineering & Science (计算机工程与科学)》2021,43(2):288-294

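As the demand for new applications of wireless sensor networks (WSNs) keeps growing, 6LoWPAN, the low-rate wireless personal area network standard that implements IPv6 communication over IEEE 802.15.4, is an ideal solution for connecting WSNs to the Internet and achieving all-IP communication. This paper proposes a multicast communication scheme for 6LoWPAN networks. By self-constructing MAC addresses, the scheme adds multicast support to existing 6LoWPAN networks. The completed design reduces the delay with which in-group nodes receive gateway data under multicast, as well as the cost that out-of-group nodes incur in processing irrelevant data. Analysis of the results shows that the node communication delay under this multicast scheme is 15.13% of the delay under unicast, and that the data-processing efficiency of out-of-group nodes is 39.02% higher than under broadcast. The scheme delivers the expected functionality and performance: 6LoWPAN nodes can dynamically join and leave multicast groups and receive messages sent within the group.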

16.

A new approach to improve reliability of the multistage interconnection networks

《Computers & Electrical Engineering》2014,40(8):348-374

Because of their cost-effectiveness, multistage interconnection networks are widely used in parallel multiprocessor systems to make a connection among the processors and memory modules. One of the most important requirements for these communication systems is reliability. Adding a number of stages to these networks is one of the main approaches to promote this issue. Despite its modest cost and ease of implementation, this approach improves the reliability only to a small extent, which is not desirable, especially for large-scale systems. In this paper, we propose a new approach to improve reliability of the networks, called reducing nodes. Extensive reliability analyses from two major perspectives, terminal and broadcast, demonstrate that this idea can achieve a tremendous advantage over the aforementioned approach.

17.

Multiple multicast with minimized node contention on wormhole k-ary n-cube networks

Kesavan R. Panda D.K. 《Parallel and Distributed Systems, IEEE Transactions on》1999,10(4):371-393

This paper presents a new approach to minimize node contention while performing multiple multicast/broadcast on wormhole k-ary n-cube networks with overlapped destination sets. The existing multicast algorithms in the literature deliver poor performance under multiple multicast because these algorithms have been designed with only single multicast in mind. The new algorithms introduced in this paper do not use any global knowledge about the respective destination sets of the concurrent multicasts. Instead, only local information and a source-specific partitioning approach are used. For systems supporting unicast message-passing, a new SPUmesh (Source-Partitioned Umesh) algorithm is proposed and is shown to be superior to the conventional Umesh algorithm for multiple multicast. Two different algorithms, SQHL (Source-Quadrant Hierarchical Leader) and SCHL (Source-Centered Hierarchical Leader), are proposed for systems with multidestination message-passing and shown to be superior to the HL scheme. All of these algorithms perform 1) 5-10 times faster than the existing algorithms under multiple multicast and 2) as fast as existing algorithms under single multicast. Furthermore, the SCHL scheme demonstrates that the latency of multiple multicast can, in fact, be reduced as the degree of multicast increases beyond a certain number. Thus, these algorithms demonstrate significant potential to be used for designing fast and scalable collective communication libraries on current and future generation wormhole systems.

18.

A note on distributed multicast routing in point-to-point networks

《Computers & Operations Research》2001,28(12):1149-1164

The distributed algorithm for a multicast connection set-up, based on the ‘cheapest insertion’ heuristic, is reviewed. The multicast routing problem is translated into a Steiner tree problem in point-to-point networks where nodes have only a limited knowledge about the network. A solution is proposed in which the time complexity and the amount of information exchanged between network nodes are proportional to the number of members of the multicast group. The Steiner tree is constructed by means of a distributed table-passing algorithm. The analysis of the algorithm presented, backed up by simulation results, confirms its superiority over the algorithm based on ‘waving technique’. Scope and purpose: Multicasting is a mechanism used in communication networks that allows distribution of information from a single source to multiple destinations. The problem of finding a multicast connection for a static group of communicating entities in connection-oriented point-to-point network can be formulated in graph theory as a minimum Steiner tree problem. Due to the NP-completeness of the Steiner tree problem, multicast routing algorithms are based on heuristics. The diversity of network environments and the lack of centralised information about network topology require an effective distribution of the multicast routing algorithms among the network nodes. This article presents an alternative to the distributed algorithm proposed by Rugelj and Klavzar that implements the same heuristics for the construction of a minimum cost multicast connection in point-to-point networks. The present algorithm constitutes a substantial improvement over that previously proposed with regard to running time and the amount of the information exchanged between network nodes.
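To make the ‘cheapest insertion’ idea concrete, here is a minimal centralised Python sketch. It is not the distributed table-passing algorithm of the article: it assumes full knowledge of the topology, whereas the article assumes nodes with limited knowledge. It shows only the heuristic as commonly described: start from one group member and repeatedly attach the group member that is cheapest to reach from the current tree. The function name `cheapest_insertion_tree` and the adjacency-dictionary graph format are assumptions made for this illustration.

```python
import heapq


def cheapest_insertion_tree(graph, members):
    """Approximate a Steiner tree spanning `members`.

    graph: {node: {neighbour: cost}} for an undirected graph with positive costs.
    Returns (set of edges as frozensets, total cost).
    """
    members = list(members)
    tree_nodes = {members[0]}
    tree_edges = set()
    remaining = set(members[1:])
    while remaining:
        # Multi-source Dijkstra: every node already in the tree starts at distance 0.
        dist = {v: 0 for v in tree_nodes}
        prev = {}
        heap = [(0, v) for v in sorted(tree_nodes)]
        heapq.heapify(heap)
        target = None
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue  # stale heap entry
            if u in remaining:
                target = u  # cheapest group member to insert next
                break
            for v, w in graph[u].items():
                if d + w < dist.get(v, float("inf")):
                    dist[v] = d + w
                    prev[v] = u
                    heapq.heappush(heap, (d + w, v))
        if target is None:
            raise ValueError("a group member is unreachable")
        # Splice the connecting path into the tree.
        node = target
        while node not in tree_nodes:
            parent = prev[node]
            tree_edges.add(frozenset((node, parent)))
            tree_nodes.add(node)
            node = parent
        remaining -= tree_nodes
    cost = sum(graph[a][b] for a, b in (tuple(e) for e in tree_edges))
    return tree_edges, cost


# Small hypothetical network; multicast group {A, C, D}.
net = {
    "A": {"B": 1, "C": 3},
    "B": {"A": 1, "C": 1, "D": 2},
    "C": {"A": 3, "B": 1, "D": 1},
    "D": {"B": 2, "C": 1},
}
edges, cost = cheapest_insertion_tree(net, ["A", "C", "D"])
print(sorted(sorted(e) for e in edges), cost)  # [['A', 'B'], ['B', 'C'], ['C', 'D']] 3
```

In this example the tree uses the non-member node B as a relay, which is what distinguishes a Steiner tree from a spanning tree over the group members alone.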

19.

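A new distributed approximation algorithm for the minimum connected dominating set (cited by: 32; self-citations: 0; citations by others: 32)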

Peng Wei Lu Xicheng 《Chinese Journal of Computers (计算机学报)》2001,24(3):254-258

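Broadcast is widely used in computer networks to solve a range of network problems, so designing efficient broadcast algorithms is an important topic. This paper proposes an approximation algorithm that computes the minimum connected dominating set of a network in a distributed manner, and gives a proof of its correctness. The algorithm only requires network nodes to hold local network state information, and it scales well. It automatically forms a virtual backbone in the network, which provides an efficient communication basis for broadcast and routing operations. Simulation results show that the connected dominating sets found by the proposed algorithm are small and that the algorithm applies well to general networks as well as mobile ad hoc networks.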

20.

Using optimistic atomic broadcast in transaction processing systems (cited by: 4; self-citations: 0; citations by others: 4)

Kemme B. Pedone F. Alonso G. Schiper A. Wiesmann M. 《Knowledge and Data Engineering, IEEE Transactions on》2003,15(4):1018-1032

Atomic broadcast primitives are often proposed as a mechanism to allow fault-tolerant cooperation between sites in a distributed system. Unfortunately, the delay incurred before a message can be delivered makes it difficult to implement high performance, scalable applications on top of atomic broadcast primitives. Recently, a new approach has been proposed for atomic broadcast which, based on optimistic assumptions about the communication system, reduces the average delay for message delivery to the application. We develop this idea further and show how applications can take even more advantage of the optimistic assumption by overlapping the coordination phase of the atomic broadcast algorithm with the processing of delivered messages. In particular, we present a replicated database architecture that employs the new atomic broadcast primitive in such a way that communication and transaction processing are fully overlapped, providing high performance without relaxing transaction correctness.