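Automatic Parallel Code Generation for the Sunway Heterogeneous Architecture
Cite this article: TAO Xiao-Han, ZHU Yu, PANG Jian-Min, ZHAO Jie, XU Jin-Long. Automatic Parallel Code Generation for the Sunway Heterogeneous Architecture[J]. Journal of Software, 2023, 34(4): 1570-1593.
Authors: TAO Xiao-Han  ZHU Yu  PANG Jian-Min  ZHAO Jie  XU Jin-Long
Affiliation: Information Engineering University, Zhengzhou 450001, Henan; State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, Henan
Funding: National Natural Science Foundation of China (61702546)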
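Abstract: Heterogeneous architectures are becoming the mainstream in high-performance computing. Compared with homogeneous multicore architectures, however, their hardware structure and memory hierarchy are more complex, which makes programs harder to write. An advanced optimizing compiler can help developers produce more efficient code and reduce the complexity of program development. Through abstract analysis, the polyhedral compilation model represents a program as polyhedra in space, which allows multiple loop transformations to be combined with hardware mapping and code to be generated for a specific architecture. This work designs and implements an automatic parallel code generation system for the domestic Sunway heterogeneous architecture; the system uses a source-to-source compilation mode and is built on the polyhedral compilation model. It maps the computation of a program onto the hardware according to the characteristics of the Sunway heterogeneous architecture and automatically manages data transfers and memory space. The experiments use the linear algebra test cases from the Polybench suite. The results show that the heterogeneous parallel code generated by the system runs correctly on the Sunway heterogeneous platform and exploits its performance effectively, reaching an average speedup of 539.16× when 64 threads are used to accelerate the computation.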

Keywords: Sunway heterogeneous architecture  polyhedral model  parallel computing  code generation
Received: 2021-11-25
Revised: 2022-02-02

Parallel Code Generation for Sunway Heterogeneous Architecture
TAO Xiao-Han, ZHU Yu, PANG Jian-Min, ZHAO Jie, XU Jin-Long. Parallel Code Generation for Sunway Heterogeneous Architecture[J]. Journal of Software, 2023, 34(4): 1570-1593.
Authors: TAO Xiao-Han  ZHU Yu  PANG Jian-Min  ZHAO Jie  XU Jin-Long
Affiliation:Information Engineering University, Zhengzhou 450001, China;State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China
Abstract: Heterogeneous architectures are dominating the realm of high-performance computing. However, these architectures also complicate programming because their hardware and memory hierarchies are more complex than those of homogeneous architectures. One of the most promising solutions to this issue is to use optimizing compilers, which help programmers develop high-performance code for target machines and thereby reduce the difficulty of programming. The polyhedral model is widely studied for its ability to generate effective code and its portability to various targets, which it achieves by first converting a program into an intermediate representation and then combining loop transformations with hardware binding strategies. This paper presents a source-to-source parallel code generator that uses the polyhedral model to target the domestic heterogeneous architecture of the Sunway machine. In particular, the computation is deployed automatically onto the Sunway architecture, and data transfers and memory are managed automatically, minimizing the amount of data movement between the management processing element and the computing processing elements of the target. The experiments are conducted on 13 linear algebra applications extracted from the Polybench benchmarks. The experimental results show that the proposed approach generates effective code executable on the Sunway heterogeneous architecture, providing a mean speedup of 539.16× on 64 threads over the sequential implementation executed on a management processing element.
Keywords:Sunway heterogeneous architecture  polyhedral model  parallel computing  code generation
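
The abstract's idea of combining a loop transformation with a hardware mapping can be illustrated with a small sketch. The Python below is not the paper's generator and does not use any Sunway interface; it only assumes a tiled matrix multiplication whose output tiles are assigned round-robin to a few workers, each computing in a local buffer to mimic copy-in, compute, copy-out. All names (`TILE`, `NUM_WORKERS`, `matmul_tiled`, and so on) are hypothetical, and the workers run one after another rather than in parallel.

```python
# Illustrative sketch only: tiling a matrix multiplication and assigning tiles
# to workers. All names here are hypothetical and are not taken from the paper
# or from any Sunway toolchain.

TILE = 2          # assumed tile edge length
NUM_WORKERS = 4   # assumed worker count, standing in for computing processing elements


def matmul_naive(A, B):
    """Reference triple loop: C[i][j] += A[i][k] * B[k][j]."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C


def tile_origins(n, p, tile=TILE):
    """Enumerate the (row, col) origin of every output tile."""
    return [(i0, j0) for i0 in range(0, n, tile) for j0 in range(0, p, tile)]


def assign_tiles(origins, workers=NUM_WORKERS):
    """Round-robin mapping of output tiles to workers."""
    plan = [[] for _ in range(workers)]
    for idx, origin in enumerate(origins):
        plan[idx % workers].append(origin)
    return plan


def compute_tile(A, B, i0, j0, tile=TILE):
    """Compute one output tile in local buffers, mimicking worker-local memory."""
    n, m, p = len(A), len(B), len(B[0])
    rows = range(i0, min(i0 + tile, n))
    cols = range(j0, min(j0 + tile, p))
    # Copy in only the rows of A and the columns of B that this tile needs.
    a_local = [A[i][:] for i in rows]
    b_local = [[B[k][j] for j in cols] for k in range(m)]
    c_local = [[0] * len(cols) for _ in rows]
    for ii in range(len(a_local)):
        for jj in range(len(cols)):
            for k in range(m):
                c_local[ii][jj] += a_local[ii][k] * b_local[k][jj]
    return c_local


def matmul_tiled(A, B, tile=TILE, workers=NUM_WORKERS):
    """Process each worker's tile list in turn and copy results back into C."""
    n, p = len(A), len(B[0])
    C = [[0] * p for _ in range(n)]
    for worker_tiles in assign_tiles(tile_origins(n, p, tile), workers):
        for i0, j0 in worker_tiles:
            c_local = compute_tile(A, B, i0, j0, tile)
            for ii, row in enumerate(c_local):
                for jj, value in enumerate(row):
                    C[i0 + ii][j0 + jj] = value
    return C
```

Calling `matmul_tiled(A, B)` on small integer matrices should give the same result as `matmul_naive(A, B)`; the tiled version only changes the order of the iterations and which worker owns them, which is the kind of decision a polyhedral code generator makes automatically.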