Study on Optimization of Parallel Efficiency of CPU-GPU Heterogeneous Parallelization for MOC Neutron Transport Calculation

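Cite this article:	SONG Peitao, ZHANG Zhijian, LIANG Liang, ZHANG Qian, ZHAO Qiang. Study on Optimization of Parallel Efficiency of CPU-GPU Heterogeneous Parallelization for MOC Neutron Transport Calculation[J]. Atomic Energy Science and Technology (原子能科学技术), 2019, 53(11): 2209-2217. DOI: 10.7538/yzk.2019.youxian.0416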

Authors:	SONG Peitao, ZHANG Zhijian, LIANG Liang, ZHANG Qian, ZHAO Qiang

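Affiliation:	Fundamental Science on Nuclear Safety and Simulation Technology Laboratory, Harbin Engineering University, Harbin 150001, Heilongjiang, China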

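Abstract (translated from the Chinese):	CPU-GPU heterogeneous systems offer a way to accelerate high-fidelity whole-core calculations with the method of characteristics (MOC). Building on an implementation of a 2D MOC heterogeneous parallel algorithm for CPU-GPU systems, a performance analysis model is proposed and the main factors affecting the parallel efficiency of the algorithm are identified. To address these factors, the transport calculation and data movement are overlapped so that each hides the other, which raises the overall parallel efficiency of the algorithm. The numerical results show that: the code attains good accuracy; data movement (MPI communication and data copies between CPU and GPU) is the main factor affecting parallel efficiency; with overlap in place, both code performance and strong scaling efficiency improve; with 5 heterogeneous nodes (20 GPUs in total), overall efficiency improves by up to 8% and strong scaling efficiency rises from 87% to 95%; and 4 CPU-GPU heterogeneous nodes outperform 20 CPU nodes in overall performance.
Keywords:	heterogeneous parallelization; method of characteristics; neutron transport calculation; GPU; CUDA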
Study on Optimization of Parallel Efficiency of CPU-GPU Heterogeneous Parallelization for MOC Neutron Transport Calculation

SONG Peitao, ZHANG Zhijian, LIANG Liang, ZHANG Qian, ZHAO Qiang. Study on Optimization of Parallel Efficiency of CPU-GPU Heterogeneous Parallelization for MOC Neutron Transport Calculation[J]. Atomic Energy Science and Technology, 2019, 53(11): 2209-2217. DOI: 10.7538/yzk.2019.youxian.0416

Authors:	SONG Peitao, ZHANG Zhijian, LIANG Liang, ZHANG Qian, ZHAO Qiang

Affiliation:	Fundamental Science on Nuclear Safety and Simulation Technology Laboratory, Harbin Engineering University, Harbin 150001, China

Abstract:	CPU-GPU heterogeneous systems offer a way to accelerate whole-core method of characteristics (MOC) neutron transport calculations. Based on a 2D MOC heterogeneous parallel algorithm implemented for CPU-GPU heterogeneous systems, a performance analysis model was proposed to identify the factors that significantly affect the algorithm's parallel efficiency. Guided by this analysis, the overall parallel efficiency was improved by overlapping the transport sweep with data movement. The numerical results demonstrate that the parallel algorithm maintains the desired accuracy. Data movement, which includes MPI communication and data copies between CPU and GPU, is the main factor affecting the parallel efficiency of the heterogeneous parallel algorithm. Overlapping the transport sweep with data movement improves both the overall performance and the strong scaling efficiency. When 5 heterogeneous nodes (20 GPUs in total) are used, overall performance improves by about 8% and strong scaling efficiency rises from 87% to 95%. Compared with CPU-based parallelization, 4 CPU-GPU heterogeneous nodes outperform 20 CPU nodes in overall performance.

Keywords:	heterogeneous parallelization; method of characteristics; neutron transport calculation; GPU; CUDA
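
Illustrative note: the abstract attributes the efficiency gain to overlapping the transport sweep with data movement. The toy model below is only a sketch of why hiding data movement can raise strong scaling efficiency; it is not the performance analysis model proposed in the paper. The function names, the `hidden_fraction` parameter, and all timings are hypothetical and chosen for illustration.

```python
def iteration_time(t_sweep, t_move, hidden_fraction):
    """Wall time of one iteration in a toy two-stage model (assumption).

    hidden_fraction = 0.0 means sweep and data movement run back to back;
    hidden_fraction = 1.0 means the shorter stage is fully hidden by the longer.
    """
    if not 0.0 <= hidden_fraction <= 1.0:
        raise ValueError("hidden_fraction must lie in [0, 1]")
    hidden = hidden_fraction * min(t_sweep, t_move)
    return t_sweep + t_move - hidden


def strong_scaling_efficiency(t_ref, n_ref, t_n, n):
    """Strong scaling efficiency relative to a reference run on n_ref nodes."""
    return (t_ref * n_ref) / (t_n * n)


if __name__ == "__main__":
    # Hypothetical timings in seconds, not taken from the paper.
    t_single_node = 120.0          # sweep time on 1 node, no inter-node traffic
    nodes = 4
    t_sweep = t_single_node / nodes
    t_move = 4.0                   # MPI plus host-device copies per iteration

    for fraction in (0.0, 0.5, 1.0):
        t = iteration_time(t_sweep, t_move, fraction)
        eff = strong_scaling_efficiency(t_single_node, 1, t, nodes)
        print(f"hidden_fraction={fraction:.1f}  time={t:.1f} s  efficiency={eff:.3f}")
```

In this sketch the efficiency approaches 1 only when data movement is fully hidden; on real hardware the overlap is partial, which is consistent with the reported efficiency staying below 100% after optimization.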
