
Study on Optimization of Parallel Efficiency of CPU-GPU Heterogeneous Parallelization for MOC Neutron Transport Calculation
Cite this article: SONG Peitao, ZHANG Zhijian, LIANG Liang, ZHANG Qian, ZHAO Qiang. Study on Optimization of Parallel Efficiency of CPU-GPU Heterogeneous Parallelization for MOC Neutron Transport Calculation[J]. Atomic Energy Science and Technology, 2019, 53(11): 2209-2217. DOI: 10.7538/yzk.2019.youxian.0416
Authors: SONG Peitao, ZHANG Zhijian, LIANG Liang, ZHANG Qian, ZHAO Qiang
Affiliation: Fundamental Science on Nuclear Safety and Simulation Technology Laboratory, Harbin Engineering University, Harbin 150001, Heilongjiang, China
Abstract: CPU-GPU heterogeneous systems offer a way to accelerate fine-grained whole-core neutron transport calculations with the method of characteristics (MOC). Building on a 2D MOC heterogeneous parallel algorithm implemented for a CPU-GPU heterogeneous system, a performance analysis model was proposed and used to identify the main factors that limit the parallel efficiency of the algorithm. To address these factors, the transport sweep was overlapped with data movement, which improved the overall parallel efficiency. The numerical results show that the code achieves good accuracy. Data movement, comprising MPI communication and data copies between CPU and GPU, is the main factor limiting the parallel efficiency of the heterogeneous parallel algorithm. With the transport sweep and data movement overlapped, both the overall performance and the strong scaling efficiency improve: on 5 heterogeneous nodes (20 GPUs in total), the overall performance improves by about 8% and the strong scaling efficiency rises from 87% to 95%. Compared with CPU-only parallel computation, 4 CPU-GPU heterogeneous nodes outperform 20 CPU nodes.

Keywords: heterogeneous parallelization; method of characteristics; neutron transport calculation; GPU; CUDA
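
As a rough illustration of why hiding data movement behind the transport sweep can raise strong scaling efficiency, the toy model below compares a node that computes and then transfers with one that does both at once. It is a sketch only and is not the performance analysis model of the paper: the function names, the assumption of a fixed per-node transfer cost, and all numbers are hypothetical.

```python
def step_time(compute, transfer, overlap):
    """Wall time of one sweep on one node in this toy model.

    Without overlap the node computes, then moves data (costs add up).
    With perfect overlap the shorter phase is hidden behind the longer one.
    """
    return max(compute, transfer) if overlap else compute + transfer


def sweep_time(total_work, n_nodes, transfer, overlap):
    """Fixed total work split evenly over n_nodes (strong scaling).

    Assumption: the data-movement cost per node does not shrink with n_nodes.
    """
    return step_time(total_work / n_nodes, transfer, overlap)


def strong_scaling_efficiency(t_one, t_n, n_nodes):
    """Speedup over the one-node run divided by the number of nodes."""
    return t_one / (n_nodes * t_n)


# Hypothetical inputs in arbitrary time units, chosen only for illustration.
TOTAL_WORK = 100.0
TRANSFER = 2.0
NODES = 5

eff_serial = strong_scaling_efficiency(
    sweep_time(TOTAL_WORK, 1, TRANSFER, overlap=False),
    sweep_time(TOTAL_WORK, NODES, TRANSFER, overlap=False),
    NODES,
)
eff_overlap = strong_scaling_efficiency(
    sweep_time(TOTAL_WORK, 1, TRANSFER, overlap=True),
    sweep_time(TOTAL_WORK, NODES, TRANSFER, overlap=True),
    NODES,
)
print(f"without overlap: {eff_serial:.3f}, with overlap: {eff_overlap:.3f}")
```

In this model the non-overlapped efficiency drops as nodes are added because the constant transfer cost becomes a larger share of each node's step, while the overlapped run stays efficient as long as the per-node compute time exceeds the transfer time.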