Post-pass partitioning of signal processing programs |
| |
Authors: | Chris J. Newburn, John Paul Shen |
| |
Affiliation: | (1) Department of Electrical and Computer Engineering, Carnegie Mellon University, USA |
| |
Abstract: | Symmetric multiprocessor (SMP) systems are increasingly common, not only as high-throughput servers, but also as a vehicle for executing a single application in parallel in order to reduce its execution latency. This article presents Pedigree, a compilation tool that employs a new partitioning heuristic based on the program dependence graph (PDG). Pedigree creates overlapping, potentially interdependent threads, each executing on a subset of the SMP processors that matches the thread’s available parallelism. A unified framework is used to build threads from procedures, loop nests, loop iterations, and smaller constructs. Pedigree does not require any parallel language support; it is a post-compilation tool that reads in object code. The SDIO Signal and Data Processing Benchmark Suite has been selected as an example of real-time, latency-sensitive code. Its coarse-grained data flow parallelism is naturally exploited by Pedigree to achieve speedups of 1.63×/2.13× (mean/max) and 1.71×/2.41× on two and four processors, respectively. There is roughly a 20% improvement over existing techniques that exploit only data parallelism. By exploiting the unidirectional flow of data for coarse-grained pipelining, the synchronization overhead is typically limited to less than 6% for a synchronization latency of 100 cycles, and less than 2% for 10 cycles. This research was supported by ONR contract numbers N00014-91-J-1518 and N00014-96-1-0347. We would like to thank the Pittsburgh Supercomputing Center for use of their Alpha systems. |
| |
Keywords: | Post-pass partitioning, threading, multiprocessing, compiler, PDG, retargetable, Pedigree |
This article is indexed in SpringerLink and other databases. |
|