Gyrokinetic particle-in-cell optimization on emerging multi- and manycore platforms
Authors: Kamesh Madduri, Eun-Jin Im, Khaled Z. Ibrahim, Samuel Williams, Stéphane Ethier, Leonid Oliker
Affiliation: a Computational Research Division, Lawrence Berkeley National Laboratory, CA 94720, United States
b School of Computer Science, Kookmin University, Seoul 136-702, Republic of Korea
c Princeton Plasma Physics Laboratory, Princeton, NJ 08543, United States
Abstract: The next decade of high-performance computing (HPC) systems will see a rapid evolution and divergence of multi- and manycore architectures as power and cooling constraints limit increases in microprocessor clock speeds. Understanding efficient optimization methodologies on diverse multicore designs in the context of demanding numerical methods is one of the greatest challenges faced today by the HPC community. In this work, we examine the efficient multicore optimization of GTC, a petascale gyrokinetic toroidal fusion code for studying plasma microturbulence in tokamak devices. For GTC's key computational components (charge deposition and particle push), we explore efficient parallelization strategies across a broad range of emerging multicore designs, including the recently released Intel Nehalem-EX, the AMD Opteron Istanbul, and the highly multithreaded Sun UltraSparc T2+. We also present the first study on tuning gyrokinetic particle-in-cell (PIC) algorithms for graphics processors, using the NVIDIA C2050 (Fermi). Our work discusses several novel optimization approaches for gyrokinetic PIC, including mixed-precision computation, particle binning and decomposition strategies, grid replication, SIMDized atomic floating-point operations, and effective GPU texture memory utilization. Overall, we achieve significant performance improvements of 1.3–4.7× on these complex PIC kernels, despite the inherent challenges of data dependency and locality. Our work also points to several architectural and programming features that could significantly enhance PIC performance and productivity on next-generation architectures.
Keywords: Particle-in-cell; Multicore; Manycore; Code optimization; Graphic processing units; Fermi
This article is indexed in ScienceDirect and other databases.
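
Illustrative note: the abstract names grid replication as one way to handle the data dependency in charge deposition, where many particles scatter-add into the same grid cells. The sketch below is a deliberately simplified illustration of that idea and is not GTC's code. It assumes a 1D periodic grid with linear (cloud-in-cell) weighting instead of GTC's gyro-averaged deposition on a toroidal grid, and it runs the "workers" one after another rather than on real threads. The function names `deposit_charge` and `deposit_replicated` and all parameters are hypothetical.

```python
import math


def deposit_charge(positions, charges, num_cells, dx=1.0):
    """Scatter particle charge onto a 1D periodic grid with linear weighting.

    Simplifying assumption: 1D cloud-in-cell deposition, not the gyrokinetic
    deposition used by GTC.
    """
    grid = [0.0] * num_cells
    for x, q in zip(positions, charges):
        s = x / dx                       # position in units of the cell size
        left = math.floor(s)             # index of the grid point to the left
        w_right = s - left               # fraction of charge for the right point
        grid[left % num_cells] += q * (1.0 - w_right)
        grid[(left + 1) % num_cells] += q * w_right
    return grid


def deposit_replicated(positions, charges, num_cells, num_workers, dx=1.0):
    """Grid replication: each worker deposits into its own private grid copy.

    Because no two workers write to the same array, no atomic updates or
    locks are needed during deposition. The copies are summed afterwards.
    Workers are simulated sequentially here to keep the sketch short.
    """
    replicas = [
        deposit_charge(positions[w::num_workers], charges[w::num_workers],
                       num_cells, dx)
        for w in range(num_workers)
    ]
    # Reduction step: add the private copies cell by cell.
    return [sum(cell_values) for cell_values in zip(*replicas)]


positions = [0.25, 1.5, 3.75, 2.0]
charges = [1.0, 1.0, 1.0, 1.0]
direct = deposit_charge(positions, charges, num_cells=4)
replicated = deposit_replicated(positions, charges, num_cells=4, num_workers=2)
```

In this sketch both routes yield the same grid up to floating-point rounding, and the total deposited charge equals the sum of the particle charges. The trade-off the technique implies is extra memory for the grid copies plus a reduction pass, in exchange for conflict-free updates during deposition.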