
Received:	2022-06-16
Revised:	2023-01-16

Xiao Qian, Zhao Meijia, Li Mingfan, Shen Li, Chen Junshi, Zhou Wenhao, Wang Fei, An Hong. A Dataflow Computing System for New Generation of Domestic Heterogeneous Many-Core Processors[J]. Journal of Computer Research and Development, 2023, 60(10): 2405-2417. DOI: 10.7544/issn1000-1239.202220562

Authors:	Xiao Qian, Zhao Meijia, Li Mingfan, Shen Li, Chen Junshi, Zhou Wenhao, Wang Fei, An Hong

Affiliations:	1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026; 2. National Supercomputing Center in Wuxi, Wuxi, Jiangsu 214100; 3. Department of Computer Science and Technology, Tsinghua University, Beijing 100084

Abstract:	Today, scientific research has moved from the era of computational science into the era of data science. Discovering laws in massive data and breaking through bottlenecks in scientific development are the main goals of the data science paradigm. At the same time, high performance computers (HPC) are paying increasing attention to intelligent (AI) computing power. Integrating AI algorithms with traditional high performance computing methods (HPC+AI) is better suited to solving practical scientific problems in the era of data science, and makes full use of the intelligent computing power of high performance computers. However, supporting HPC+AI applications on domestic HPC systems, especially those built on the new-generation domestic heterogeneous many-core processor sw26010pro, poses many challenges. In this paper, we propose swFLOWpro, a dataflow computing system for domestic heterogeneous many-core processors. The system supports building dataflow programs through the TensorFlow interface, provides many-core acceleration that is transparent to users, and implements a two-level parallel strategy designed from the perspective of the whole processor. On the sw26010pro processor, swFLOWpro achieves single core group (CG) many-core speedups of up to 545 times for typical core computations (OPs) and up to 346 times for typical deep learning models. Training the ResNet50 model in parallel on all 6 CGs of one processor achieves a speedup of 4.96 times over a single CG, a parallel efficiency of 82.6%. Experiments show that swFLOWpro supports the efficient execution of dataflow programs, represented by deep learning, on domestic heterogeneous many-core processors.

Keywords:	dataflow; deep learning; heterogeneous many-core; swFLOWpro system; high performance computing
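The multi-CG result in the abstract can be checked against the standard definition of parallel efficiency, E = S / p, where S is the speedup over a single core group and p is the number of core groups used. The Python sketch below is our own illustration under that assumption, not code from swFLOWpro; the function name `parallel_efficiency` is hypothetical. With S = 4.96 and p = 6 it gives about 0.8267, which matches the reported 82.6% if the paper truncated rather than rounded to one decimal place (our reading, not a statement from the authors).

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Parallel efficiency E = S / p.

    speedup: measured speedup S relative to one worker (here, one core group).
    workers: number of workers p used in the parallel run (here, core groups).
    Assumes the standard definition; the paper may define it differently.
    """
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers


if __name__ == "__main__":
    # Figures taken from the abstract: 4.96x speedup on 6 CGs versus 1 CG.
    eff = parallel_efficiency(4.96, 6)
    print(f"{eff:.4f}")  # 0.8267
```

For an ideal run (S = 6 on 6 CGs) the function returns 1.0, i.e. 100% efficiency, so the gap to 1.0 measures the overhead of spreading ResNet50 training across the whole chip.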
