Optimal code parallelization using unimodular transformations |
| |
Authors: | Michael L Dowling |
| |
Affiliation: | Abteilung für Mathematische Optimierung, Institut für Angewandte Mathematik, Carolo-Wilhelmina Universität zu Braunschweig, Pockelsstrasse 14, D-3300, Braunschweig, Germany |
| |
Abstract: | ![]() Lamport's parallelization algorithm (cf. [7]) is generalized to a broader class of loops, and the complexity of the transformation process has been estimated. It is shown that every loop can be parallelized using methods similar to those in [7]; moreover, they also have the property that all their inner loops are devoid of data dependencies, and so are fully parallelizable. Unfortunately, without restricting the nature of the loop to be parallelized, the negative solution to Hilbert's tenth problem (cf. [3]) can be applied to show that the parallelizing transformations are not computable. The class of affine loops was therefore introduced. This class is more general than that considered by Lamport, and it is shown that parallelizing transformations for affine loops are computable. In general, however, the complexity estimates for finding such loops suggest that the parallelization procedure will take longer than executing the original loop sequentially. It is further shown that, if the loop satisfies an additional, nondegeneracy condition, then the loop can be efficiently transformed.Finally, although more generally applicable, these methods are best applied to vectorization problems. |
| |
Keywords: | Parallel algorithms Multiprocessing Integer programming NP-completeness Scheduling Vectorization Problem insolubility |
本文献已被 ScienceDirect 等数据库收录! |
|