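Application of CPU-GPU Cooperative Computing in Heterogeneous Parallel MOC Neutron Transport Calculation
Cite this article:Song Peitao, Zhang Zhijian, Zhang Qian, Liang Liang, Zhao Qiang. Application of CPU-GPU Cooperative Computing in Heterogeneous Parallel MOC Neutron Transport Calculation[J]. Nuclear Power Engineering (核动力工程), 2020, 41(4): 17-21. DOI: 10.13832/j.jnpe.2020.04.0017
Authors:Song Peitao  Zhang Zhijian  Zhang Qian  Liang Liang  Zhao Qiang
Affiliation (all five authors):National Defense Key Discipline Laboratory of Nuclear Safety and Simulation Technology, Harbin Engineering University (哈尔滨工程大学核安全与仿真技术国防重点学科实验室), Harbin 150001
Abstract:The method of characteristics (MOC) can accurately solve the neutron transport equation for arbitrary geometries, but it converges slowly and requires long computing times. Based on spatial domain decomposition and characteristic-ray parallelization, this study uses the MPI+OpenMP/CUDA programming model to implement a two-dimensional heterogeneous parallel MOC algorithm for central processing unit-graphics processing unit (CPU-GPU) heterogeneous systems. To make full use of the CPU and GPU computing resources in a heterogeneous system and achieve CPU-GPU cooperative computing, a dynamic task allocation model is proposed that assigns computing tasks according to the computational capabilities of the CPU and GPU. Numerical verification shows that the program has good computational accuracy; that the dynamic task allocation model gives the best task allocation for the hardware performance; and that, on 5 heterogeneous nodes (20 GPUs in total), CPU-GPU cooperative computing improves the overall efficiency of the program by up to 14% relative to the MPI+CUDA parallel mode.

Keywords:heterogeneous parallel computing   method of characteristics   neutron transport calculation   GPU   CUDA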

Study on Heterogeneous Computing for MOC Neutron Transport Calculation with CPU-GPU Concurrent Calculation
Song Peitao,Zhang Zhijian,Zhang Qian,Liang Liang,Zhao Qiang. Study on Heterogeneous Computing for MOC Neutron Transport Calculation with CPU-GPU Concurrent Calculation[J]. Nuclear Power Engineering, 2020, 41(4): 17-21. DOI: 10.13832/j.jnpe.2020.04.0017
Authors:Song Peitao  Zhang Zhijian  Zhang Qian  Liang Liang  Zhao Qiang
Abstract:The Method of Characteristics (MOC) can accurately solve the neutron transport equation in arbitrary geometry. However, the MOC suffers from two drawbacks: slow convergence and long computing times. Based on spatial domain decomposition and ray parallelization, a parallel 2D MOC algorithm was implemented with the MPI+OpenMP/CUDA programming model to leverage the computing power of Central Processing Unit-Graphics Processing Unit (CPU-GPU) heterogeneous high-performance computing systems. In addition, a dynamic workload partitioning scheme was proposed to make efficient use of all CPU and GPU resources. The workload is assigned to the CPU and GPU according to their computational capabilities, and all CPUs and GPUs perform the calculation concurrently. The numerical results demonstrate that the parallel algorithm maintains the desired accuracy. Meanwhile, the dynamic workload partitioning scheme can provide the optimal workload partition based on the runtime performance. As a result, an improvement of about 14% in overall performance is observed relative to the MPI+CUDA parallelization when the CPU-GPU heterogeneous computation is performed on 5 heterogeneous nodes (20 GPUs in total). 
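The abstract states that the workload is divided between CPU and GPU according to their computational capabilities and runtime performance, but it does not give the partitioning rule. The following is a minimal sketch of one common way to do this, assuming each device's throughput is estimated from the time it took in the previous transport sweep and the next split is made proportional to those throughputs. The function names, arguments, and the proportional rule itself are hypothetical illustrations, not the authors' model.

```python
def update_gpu_fraction(cpu_time, gpu_time, gpu_fraction):
    """Return a GPU share that would equalize CPU and GPU sweep times.

    cpu_time, gpu_time: seconds each device spent on its share of the
        characteristic rays in the last sweep (hypothetical measurements).
    gpu_fraction: share of the rays the GPU handled in that sweep.
    Assumes the cost of a sweep scales linearly with the number of rays.
    """
    if not 0.0 < gpu_fraction < 1.0:
        raise ValueError("gpu_fraction must lie strictly between 0 and 1")
    if cpu_time <= 0.0 or gpu_time <= 0.0:
        raise ValueError("measured times must be positive")
    # Throughput: fraction of the total rays processed per second.
    gpu_rate = gpu_fraction / gpu_time
    cpu_rate = (1.0 - gpu_fraction) / cpu_time
    # Shares proportional to throughput give equal finishing times.
    return gpu_rate / (gpu_rate + cpu_rate)


def split_rays(n_rays, gpu_fraction):
    """Split n_rays into (GPU count, CPU count) for a given GPU share."""
    n_gpu = round(n_rays * gpu_fraction)
    return n_gpu, n_rays - n_gpu


if __name__ == "__main__":
    # Start from an even split, then rebalance after one timed sweep.
    fraction = update_gpu_fraction(cpu_time=9.0, gpu_time=1.0, gpu_fraction=0.5)
    print(fraction, split_rays(1000, fraction))
```

Under these assumptions, a GPU that finishes half of the rays in 1 s while the CPU needs 9 s for the other half would be given about 90% of the rays in the next sweep, at which point both devices would be expected to finish at the same time. Repeating the update every few sweeps lets the split follow the measured performance rather than a fixed ratio.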
This article is indexed in CNKI, Wanfang Data (万方数据), and other databases.