Automatically Partitioning Threads for Multithreaded Architectures |
| |
Affiliation: | 1. Energy Efficient Embedded Systems (EEES) Lab – DEI, University of Bologna, Italy;2. Center for Mind/Brain Sciences, University of Trento, Italy;3. Energy Efficient Embedded Digital Architectures (E3DA) Unit – ICT Center, Fondazione Bruno Kessler, Italy;4. Integrated System Laboratory ETH, Zurich, Switzerland;2. Information Technology and Services Accenture, Turin, Italy |
| |
Abstract: | There is an enormous amount of parallelism exposed to fine-grain multithreaded architectures to cover latencies. It is a demanding task for a multithreading programmer to manage such a degree of parallelism by hand. To use multithreaded architectures efficiently it is essential to have compiler support for automatically partitioning programs into threads. This paper solves a fundamental problem in compiling for multithreaded architectures, automatically partitioning a program into threads. The focus of such partitioning is to overlap the remote communication latency and minimize the total execution time. We first formulate the partitioning problem based on a multithreaded execution cost model. Then, we prove such a formulation is NP-hard. Therefore, we propose two heuristic thread-partitioning methods to solve this problem in practice. The advanced partitioning algorithm is a novel extension of list scheduling, and it takes advantage of the cost model to generate near-optimum partitioning results. The remote-path-based partitioning algorithm is a simplified version of the advanced one but it is easy for compiler implementation. The two partitioning algorithms were implemented respectively in a thread partitioning testbed and a research EARTH-C compiler. The experimental results show that both partitioning algorithms are effective to generate efficient threaded code, and code generated by the compiler is comparable to hand-written code. |
| |
Keywords: | |
本文献已被 ScienceDirect 等数据库收录! |
|