Scalability of parallel spatial direct numerical simulations on Intel hypercube and IBM SP1 and SP2 |
| |
Authors: | Ronald D. Joslin, Ulf R. Hanebutte, Mohammad Zubair |
| |
Affiliation: | (1) Flow Modeling and Control Branch, NASA Langley Research Center, Hampton, Virginia 23681; (2) Reactor Analysis Division, Argonne National Laboratory, Argonne, Illinois 60439; (3) IBM Thomas J. Watson Research Center, Yorktown Heights, New York 10598 |
| |
Abstract: | The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube and IBM SP1 and SP2 parallel computers are documented. Spatially evolving disturbances associated with laminar-to-turbulent transition in boundary-layer flows are computed with the PSDNS code, and the feasibility of using the PSDNS to perform transition studies on these computers is examined. The results indicate that the PSDNS approach can be effectively parallelized on a distributed-memory parallel machine by remapping the distributed data structure during the course of the calculation. Scalability information is provided so that computational costs can be estimated as the number of grid points changes. As the number of processors increases, slower-than-linear speedups are achieved even with optimized (machine-dependent library) routines. This slower-than-linear speedup occurs because the computational cost is dominated by the FFT routine, which itself yields less-than-ideal speedups. With appropriate compile options and optimized library routines, the serial code achieves 52–56 Mflops on a single node of the SP1 (45 percent of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a real-world simulation consisting of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP supercomputer. For the same simulation, 32 nodes of the SP1 and SP2 are required to reach the performance of a Cray C-90. A 32-node SP1 (SP2) configuration is 2.9 (4.6) times faster than a Cray Y/MP for this simulation, while the hypercube is roughly 2 times slower than the Y/MP for this application. |
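The abstract's slower-than-linear speedup from an FFT-dominated cost can be illustrated with a simple Amdahl's-law estimate. The sketch below is not from the paper; the 90 percent parallel fraction is a purely hypothetical number chosen for illustration.

```python
# Hedged sketch: Amdahl's-law estimate of why a code dominated by an
# imperfectly scaling routine (here, the FFT) shows slower-than-linear
# speedup. The fractions are hypothetical, not measurements from the paper.

def amdahl_speedup(p, parallel_fraction):
    """Speedup on p processors when only `parallel_fraction` of the
    serial runtime scales ideally and the remainder stays serial."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / p)

# Suppose (hypothetically) 90% of the runtime is in parallelizable FFT work.
for p in (8, 32):
    print(f"{p:2d} processors -> speedup {amdahl_speedup(p, 0.90):.2f}")
```

Even with 90 percent of the work parallelized, the model predicts well under 32x speedup on 32 processors, consistent with the sub-linear scaling the authors report.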
| |
Keywords: | Spatial direct numerical simulations; incompressible viscous flows; spectral methods; finite differences; parallel computing |
|