
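Porting and optimization of GROMACS 2020 on ROCm platform
Cite this article: ZHANG Yu-zhou, CAO Wu-di, BU Jing-de, TAN Guang-ming, JI Qing. Porting and optimization of GROMACS 2020 on ROCm platform[J]. 计算机工程与科学 (Computer Engineering & Science), 2021, 43(11): 1901-1909.
Authors: ZHANG Yu-zhou (张驭洲)  CAO Wu-di (曹武迪)  BU Jing-de (卜景德)  TAN Guang-ming (谭光明)  JI Qing (吉青)
Affiliation: (1. Joint Laboratory of Advanced Computing for Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190; 2. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190)
Funding: National Key Research and Development Program of China (2018YFB0204400)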
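Abstract: GROMACS is a widely used open-source molecular dynamics simulation package that currently relies mainly on CUDA for acceleration on NVIDIA GPUs. ROCm is an open-source high-performance heterogeneous computing platform. Using HIP, the programming language of the ROCm platform, this work presents the first complete port of the GROMACS 2020 series to ROCm. On an MI50 GPU, with a complex ionic-liquid simulation as the target case, the ported code was profiled with the GPU profiling tool rocprof. Guided by the hardware characteristics of the MI50, the bonded-force kernel, the PME kernel for electrostatic forces, and the short-range nonbonded-force kernel were optimized in turn. After optimization, the target case runs about 2.8 times faster than the initial port, outperforms the original GROMACS OpenCL code on the MI50 by 60.5%, and achieves a speedup of about 2.7 times over the CPU-only version. In single-node tests of two other representative cases and in multi-node scalability tests of the ionic-liquid case, the optimized code also showed good performance gains, indicating that the optimizations are reasonably general.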

Keywords: molecular dynamics  GROMACS  ROCm  application porting  performance optimization
Received: 2020-09-18
Revised: 2020-12-02

Porting and optimization of GROMACS 2020 on ROCm platform
ZHANG Yu-zhou, CAO Wu-di, BU Jing-de, TAN Guang-ming, JI Qing. Porting and optimization of GROMACS 2020 on ROCm platform[J]. Computer Engineering & Science, 2021, 43(11): 1901-1909.
Authors: ZHANG Yu-zhou  CAO Wu-di  BU Jing-de  TAN Guang-ming  JI Qing
Affiliation: (1. Joint Laboratory of Advanced Computing for Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190; 2. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China)
Abstract: GROMACS is a widely used open-source molecular dynamics simulation package that currently relies mainly on CUDA for acceleration on NVIDIA GPUs. ROCm is an open-source high-performance heterogeneous computing platform. Based on HIP, the programming language of the ROCm platform, this paper implements the first complete port of the GROMACS 2020 series to the ROCm platform. On an MI50 GPU, with a complex ionic liquid simulation as the target case, the ported code was profiled using the GPU performance analysis tool rocprof. According to the hardware characteristics of the MI50, the bonded force kernel, the PME kernel for electrostatic forces, and the short-range nonbonded force kernel were optimized in turn. After optimization, the target case runs about 2.8 times faster than the initial ported version. On the MI50, the optimized code outperforms the original GROMACS OpenCL code by 60.5% and achieves a speedup of about 2.7 times over the CPU-only version. In single-node tests of two other representative cases and in the multi-node scalability test of the ionic liquid case, the optimized code also achieved good performance improvements, which indicates that the optimizations have a degree of generality.
Keywords: molecular dynamics  GROMACS  ROCm (Radeon Open Compute)  application porting  performance optimization
This article is indexed in Wanfang Data (万方数据) and other databases.