Optimization of HPL on Complex Heterogeneous Computing System



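Funding:	Strategic Priority Research Program of the Chinese Academy of Sciences (Category C) (XDC01030200); National Key Research and Development Program of China (2018YFB0204404, 2016YFB0200601); National Natural Science Foundation of China (11871455, 11971016)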

Received:	2019-08-20
Revised:	2019-12-05

LI Lei-Sheng, YANG Wen-Hao, MA Wen-Jing, ZHANG Ya, ZHAO Hui, ZHAO Hai-Tao, LI Hui-Yuan, SUN Jia-Chang. Optimization of HPL on Complex Heterogeneous Computing System[J]. Journal of Software, 2021, 32(8): 2307-2318

Authors:	LI Lei-Sheng, YANG Wen-Hao, MA Wen-Jing, ZHANG Ya, ZHAO Hui, ZHAO Hai-Tao, LI Hui-Yuan, SUN Jia-Chang

Affiliation:	Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China;State Key Laboratory of Computer Science(Institute of Software, Chinese Academy of Sciences), Beijing 100190, China

Abstract:	Mainstream supercomputers increasingly adopt heterogeneous systems with accelerators. As the floating-point performance of accelerators keeps rising, the other components of a computing node, including the CPU, memory, bus, and network, as well as the system architecture, must keep pace. High performance Linpack (HPL) is the traditional benchmark for evaluating high performance computers, and complex heterogeneous systems bring both opportunities and challenges to HPL benchmarking. For heterogeneous supercomputers, this paper proposes a new task-partitioning scheme between the CPU and the accelerators, using balance point theory to guide the optimization of HPL. A look-ahead algorithm is proposed to coordinate the CPU and the accelerators, as well as a contiguous row-swap algorithm, enabling the CPU, the accelerators, and the network to work in parallel. In addition, new panel factorization and row-swap implementations are designed for accelerator-equipped systems, improving both the effectiveness and the efficiency of accelerator usage. With four GPUs per computing node, an HPL efficiency of 79.51% is achieved on a single node.
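The result above is stated as an HPL efficiency. As a minimal sketch, efficiency is the measured performance Rmax divided by the theoretical peak Rpeak, and HPL itself derives Rmax from the problem size N and the wall-clock time using the operation count 2/3 N^3 + 3/2 N^2. The function names are hypothetical, and the assumption that a heterogeneous node's peak is the CPU peak plus the peaks of all its GPUs is illustrative rather than taken from the paper.

```python
def hpl_gflops(n: int, seconds: float) -> float:
    """Gflop/s as HPL reports it: (2/3 n^3 + 3/2 n^2) operations / time."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


def hpl_efficiency(n: int, seconds: float, peak_gflops: float) -> float:
    """Efficiency = Rmax / Rpeak, returned as a fraction (0.5 means 50%)."""
    return hpl_gflops(n, seconds) / peak_gflops


def node_peak_gflops(cpu_peak: float, gpu_peak: float, gpus_per_node: int) -> float:
    """Hypothetical Rpeak of one node.

    Assumption: the node peak counts the CPU peak plus every GPU's peak;
    the paper's exact accounting is not stated in this abstract.
    """
    return cpu_peak + gpu_peak * gpus_per_node
```

For a four-GPU node one would call `hpl_efficiency(n, seconds, node_peak_gflops(cpu_peak, gpu_peak, 4))` with N and the time from an actual run and the peaks from the hardware specifications; a return value of 0.7951 corresponds to the 79.51% figure in the abstract.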

Keywords:	complex heterogeneous system; balance point theory; panel factorization acceleration; contiguous row-swap algorithm
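Panel factorization and row swapping, named in the keywords, are steps of the dense LU factorization with partial pivoting that HPL performs. For orientation, a serial right-looking blocked version factors a panel of nb columns, swaps pivot rows, solves for the block row of U, and updates the trailing matrix with a matrix multiply. This pure-Python sketch shows only that skeleton under simplifying assumptions (one process, whole rows swapped immediately, no look-ahead, no GPU offload); `lu_blocked` is a hypothetical name, and this is not the paper's accelerated implementation.

```python
from typing import List

Matrix = List[List[float]]


def lu_blocked(a: Matrix, nb: int) -> List[int]:
    """Factor a (n x n, row-major lists) in place so that P*A = L*U.

    On return the strict lower triangle of `a` holds L (unit diagonal
    implied) and the upper triangle holds U. The returned list `piv`
    maps row i of the factored matrix to original row piv[i].
    """
    n = len(a)
    piv = list(range(n))
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)  # current panel: columns k0 .. k1-1

        # 1) Panel factorization: unblocked LU of columns k0..k1-1 with
        #    partial pivoting; eliminations touch only the panel columns.
        for j in range(k0, k1):
            p = max(range(j, n), key=lambda i: abs(a[i][j]))
            if a[p][j] == 0.0:
                raise ZeroDivisionError("matrix is singular")
            if p != j:
                # 2) Row swap. For simplicity the whole row (L part, panel,
                #    and trailing part) is exchanged at once; HPL applies the
                #    swaps to the rest of the matrix separately, per panel.
                a[j], a[p] = a[p], a[j]
                piv[j], piv[p] = piv[p], piv[j]
            for i in range(j + 1, n):
                a[i][j] /= a[j][j]  # multiplier, stored as an L entry
                for c in range(j + 1, k1):
                    a[i][c] -= a[i][j] * a[j][c]

        # 3) Block row of U: U12 = inv(L11) * A12 by forward substitution
        #    with the unit lower-triangular diagonal block L11.
        for j in range(k0, k1):
            for i in range(j + 1, k1):
                for c in range(k1, n):
                    a[i][c] -= a[i][j] * a[j][c]

        # 4) Trailing update A22 -= L21 * U12: the matrix multiply (DGEMM)
        #    that accounts for most of the floating-point work.
        for i in range(k1, n):
            for j in range(k0, k1):
                lij = a[i][j]
                for c in range(k1, n):
                    a[i][c] -= lij * a[j][c]
    return piv
```

Swapping Python row objects makes each exchange cheap here; in HPL the matrix is stored column-major and distributed over a process grid, so the same exchanges touch strided memory and cross process boundaries, which is why row swapping and panel factorization are worth optimizing on accelerator systems.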
