Ultrahigh-performance FFTs for the CRAY-2 and CRAY Y-MP supercomputers |
| |
Authors: | David A. Carlson |
| |
Affiliation: | (1) Supercomputing Research Center, 17100 Science Drive, 20715-4300 Bowie, MD, USA |
| |
Abstract: | In this paper a set of techniques for improving the performance of the fast Fourier transform (FFT) algorithm on modern vector-oriented supercomputers is presented. Single-processor FFT implementations based on these techniques are developed for the CRAY-2 and the CRAY Y-MP, and it is shown that they achieve higher performance than previously measured on these machines. The techniques include (1) using gather/scatter operations to maintain optimum-length vectors throughout all stages of small- to medium-sized FFTs, (2) using efficient radix-8 and radix-16 inner loops, which allow a large number of vector loads/stores to be overlapped, and (3) prefetching twiddle factors as vectors so that on the CRAY-2 they can later be fetched from local memory in parallel with common memory accesses. Performance results for Fortran implementations using these techniques demonstrate that they are faster than Cray's library FFT routine CFFT2. The actual speedups obtained, which depend on the size of the FFT being computed and the supercomputer being used, range from about 5% to over 300%. |
| |
Keywords: | Fast Fourier transform; supercomputer; vector processing |
This article is indexed in SpringerLink and other databases. |
|
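Technique (1) in the abstract, keeping vector length constant through every FFT stage by means of gather/scatter, can be illustrated with a small sketch. The following is not the paper's Fortran code: it is a plain radix-2 decimation-in-time FFT in Python, written under the assumption that list comprehensions over index vectors are an acceptable stand-in for hardware gather/scatter. The function names (`bit_reverse`, `fft_gather_scatter`, `naive_dft`) are hypothetical and chosen only for this illustration. The twiddle table is also built once up front, which loosely mirrors the idea of preparing twiddle factors as vectors, without modeling CRAY-2 local memory.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_gather_scatter(x):
    """Radix-2 DIT FFT in which every stage works on vectors of length n/2.

    Illustrative sketch only: index vectors select the operands (gather)
    and place the results (scatter), so the vector length does not shrink
    or fragment as the butterfly span changes from stage to stage.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1

    # Bit-reversal permutation of the input.
    a = [0j] * n
    for i, value in enumerate(x):
        a[bit_reverse(i, bits)] = complex(value)

    # Twiddle factors computed once, before the stage loop.
    w = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]

    half = 1  # distance between the two inputs of a butterfly
    while half < n:
        stride = n // (2 * half)
        # Index vectors covering ALL n/2 butterflies of this stage.
        top = [(j // half) * 2 * half + (j % half) for j in range(n // 2)]
        bot = [t + half for t in top]
        tw = [(j % half) * stride for j in range(n // 2)]

        # Gather: two operand vectors of length n/2.
        u = [a[i] for i in top]
        v = [a[i] * w[k] for i, k in zip(bot, tw)]

        # Scatter: write butterfly results back in place.
        for t, b, p, q in zip(top, bot, u, v):
            a[t] = p + q
            a[b] = p - q
        half *= 2
    return a


def naive_dft(x):
    """O(n^2) reference DFT, used only to check the sketch."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]
```

In a conventional loop nest the inner loop length at a given stage is either `half` or `n // (2 * half)`, so some stages run on very short vectors. Here each stage handles a single vector of `n // 2` butterflies regardless of `half`, at the cost of building the index vectors. Comparing `fft_gather_scatter(x)` against `naive_dft(x)` on a short power-of-two input is a quick way to check the sketch.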