
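Challenges and design of the monitoring subsystem in exascale high-performance computer systems
Cite this article: YUAN Yuan, LI Shi-jie, XING Jian-ying, JIANG Ju-ping. Challenges and design of the monitoring subsystem in exascale high-performance computer systems [J]. Computer Engineering & Science (计算机工程与科学), 2021, 43(8): 1366-1375.
Authors: YUAN Yuan  LI Shi-jie  XING Jian-ying  JIANG Ju-ping
Affiliation: (College of Computer Science and Technology, National University of Defense Technology, Changsha, Hunan 410073)
Funding: National Key Research and Development Program of China (2018YFB0204301)
Abstract: As the assembly density of exascale high-performance computer systems multiplies and the number of nodes keeps growing, the monitoring subsystem faces major challenges in scalability, reliability, serviceability, and efficient operation and maintenance. To address these challenges, this paper presents the design approach of the monitoring subsystem from four aspects: architecture, network, functionality, and operation and maintenance. A prototype system verifies the feasibility and advantages of some of these designs, which can substantially support the construction of future exascale systems.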

Keywords: exascale high-performance computer system  monitoring subsystem  scalability  reliability
Received: 2020-06-17
Revised: 2020-09-18

Monitoring subsystem for exascale HPC systems: Challenges and design
YUAN Yuan, LI Shi-jie, XING Jian-ying, JIANG Ju-ping. Monitoring subsystem for exascale HPC systems: Challenges and design [J]. Computer Engineering & Science, 2021, 43(8): 1366-1375.
Authors: YUAN Yuan  LI Shi-jie  XING Jian-ying  JIANG Ju-ping
Affiliation: (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)
Abstract: High-performance computer (HPC) systems built for future exascale computing require a several-fold increase in assembly density, along with a continued expansion in the number of nodes. This poses major challenges for the HPC monitoring subsystem in terms of scalability, reliability, serviceability, and efficient operation and maintenance. In response to these challenges, this paper presents the design of the monitoring subsystem from four aspects: architecture, network, functionality, and operation and maintenance. A prototype system verifies the feasibility and advantages of some of these designs, which can substantially support the construction of future exascale HPC systems.
Keywords: exascale high-performance computer system  monitoring subsystem  scalability  reliability
This article is indexed in Wanfang Data (万方数据) and other databases.