
The Interconnection Network and Message Mechanism of the Sunway Exascale Prototype System
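Cite this article: GAO Jian-Gang, LU Hong-Sheng, HE Wang-Quan, REN Xiu-Jiang, CHEN Shu-Ping, SI Tian-Hao, ZHOU Zhou, HU Shu-Kai, YU Kang, WEI Di. The Interconnection Network and Message Mechanism of the Sunway Exascale Prototype System[J]. Chinese Journal of Computers, 2021, 44(1): 222-234.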
Authors: GAO Jian-Gang  LU Hong-Sheng  HE Wang-Quan  REN Xiu-Jiang  CHEN Shu-Ping  SI Tian-Hao  ZHOU Zhou  HU Shu-Kai  YU Kang  WEI Di
Affiliation (all authors): National Research Center of Parallel Computer Engineering and Technology, Beijing 100190
Funding: This work was supported by a National Key Research and Development Program project
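Abstract: This paper describes the interconnection network and message mechanism of the Sunway exascale prototype system. The prototype is the third-generation machine of the Sunway family, following Sunway Blue Light and Sunway Taihu Light. As the prototype of an exascale computer, it has a peak performance of 3.13 PFlops. One of its most distinctive features is its use of 28 Gbps transmission technology, on which two new chips were designed and developed: the new-generation Sunway high-radix router and the Sunway high-performance network interface. Building on the traditional fat tree, a dual-rail generalized-tree topology was designed; novel Sunway message primitives and a message library were defined and implemented; and a dual-rail out-of-order message mechanism based on dynamic switching at packet granularity was realized. Communication performance improved by a factor of four over the Sunway Taihu Light interconnection network, laying the foundation for developing the interconnection network of the Sunway exascale computer.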

Keywords: multi-rail network  generalized tree  high-radix router  routing algorithm  network interface  message engine  message library

The Interconnection Network and Message Mechanism of Sunway Exascale Prototype System
GAO Jian-Gang, LU Hong-Sheng, HE Wang-Quan, REN Xiu-Jiang, CHEN Shu-Ping, SI Tian-Hao, ZHOU Zhou, HU Shu-Kai, YU Kang, WEI Di. The Interconnection Network and Message Mechanism of Sunway Exascale Prototype System[J]. Chinese Journal of Computers, 2021, 44(1): 222-234.
Authors:GAO Jian-Gang  LU Hong-Sheng  HE Wang-Quan  REN Xiu-Jiang  CHEN Shu-Ping  SI Tian-Hao  ZHOU Zhou  HU Shu-Kai  YU Kang  WEI Di
Affiliation:(National Research Center of Parallel Computer Engineering and Technology,Beijing 100190)
Abstract: The high-performance interconnection network is one of the main components of a high-performance computing system. It connects the computing nodes, storage nodes, and I/O devices and carries the communication among all nodes in the system. Many parallel applications on such systems need to exchange data between different nodes (between computing nodes, between computing nodes and I/O nodes, and between computing nodes and storage nodes), which places high demands on the communication latency and bandwidth of the interconnection network. Many high-performance computing systems have therefore adopted customized interconnection networks to meet application requirements. A customized network can be matched to the design requirements of the system and optimized for communication latency and bandwidth, so that it better meets the system's varied communication requirements and thereby improves the actual running performance of parallel applications. Interconnection network design is an important means of improving communication performance. At the same time, the message mechanism has a large influence on communication performance: even with the same topology and routers, different message mechanisms can still produce large differences. The customized character of such networks is largely reflected in the ability to customize message mechanisms. Each customized network has its own message mechanism and defines its own message protocol to meet its own special communication needs. The high-performance interconnection network and message mechanism are studied here with the aim of independent control. On the road to exascale systems, communication performance must keep pace with fast-developing computing capability. The world's top supercomputers mainly use Mellanox InfiniBand, Cray Aries, or Intel Omni-Path, and employ 25 Gbps transmission technology to implement their interconnection networks. The networks of the top domestic supercomputers, such as “Sunway Taihu Light” and “Tianhe 2”, are built on 14 Gbps transmission. This paper introduces the interconnection network and message mechanism of the Sunway exascale prototype system. The prototype is the third-generation supercomputer of the Sunway family, after Sunway Blue Light and Sunway Taihu Light. As a pre-research project for the exascale system, it has a peak performance of 3.13 PFlops. Its interconnection network is built on two innovative Sunway chips, the Sunway high-radix router chip and the Sunway high-performance network interface chip, both based on 28 Gbps transmission technology. In addition, a generalized fat-tree network topology is developed; an out-of-order message mechanism with dynamic packet-interleaved transmission over two rails is implemented; and efficient Sunway message verbs and a message library are designed. The communication performance of the prototype system improves by a factor of four compared with Sunway Taihu Light, which lays a solid technical foundation for the Sunway exascale system. The prototype achieves breakthroughs in the key technologies of 28 Gbps transmission, the high-radix router, the high-performance network interface, and a highly efficient and reliable network architecture. Furthermore, a new generation of the Sunway network chipset is designed, and the network of the Sunway exascale prototype system is constructed. These results all contribute to the design of the domestic exascale supercomputer. The research achieves its goal of innovative design for the exascale system by constructing a large-scale verification system, mastering the techniques of the new interconnection network architecture, and carrying out tests based on domestic components and parts.
Keywords: multi-rail network  generalized fat-tree topology  high-radix router chip  routing algorithm  network interface  message engine  message library
This article is indexed in VIP (维普), Wanfang Data (万方数据), and other databases.