1.

COMPILE TIME PARTITIONING OF NESTED LOOP ITERATION SPACES WITH NON-UNIFORM DEPENDENCES*

《International Journal of Parallel, Emergent and Distributed Systems》2012,27(1-3):113-141

In this paper we address the problem of partitioning nested loops with non-uniform (irregular) dependence vectors. Parallelizing and partitioning nested loops requires efficient inter-iteration dependence analysis. Although many methods exist for nested loop partitioning, most of them perform poorly when parallelizing nested loops with irregular dependences. Unlike nested loops with uniform dependences, these loops have a complicated dependence pattern that forms a non-uniform dependence vector set. We apply the results of classical convex theory and principles of linear programming to iteration spaces and show the correspondence between minimum dependence distance computation and iteration space tiling. Cross-iteration dependences are analyzed by forming an Integer Dependence Convex Hull (IDCH). Every integer point in this IDCH corresponds to a dependence vector in the iteration space of the nested loops. A simple way to compute minimum dependence distances from the dependence distance vectors of the extreme points of the IDCH is presented. Using these minimum dependence distances, the iteration space can be tiled. Iterations within a tile can be executed in parallel, and the different tiles can then be executed with proper synchronization. We demonstrate that our technique gives much better speedup and extracts more parallelism than the existing techniques.
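
As a rough illustration of the tiling idea in this abstract (not the paper's IDCH computation), the sketch below assumes a one-dimensional loop in which iteration `i` reads the value written by iteration `i - dist(i)`. The distance function `example_dist` and all helper names are hypothetical. If every dependence distance is at least `dmin`, two iterations inside a tile of width `dmin` cannot depend on each other, so a tile can run in parallel while the tiles themselves run in order.

```python
def example_dist(i):
    # Hypothetical non-uniform, strictly positive dependence distance (3..6).
    return 3 + (i % 4)

def min_dependence_distance(n, dist):
    """Smallest distance among iterations whose source iteration lies in range."""
    distances = [dist(i) for i in range(n) if i - dist(i) >= 0]
    return min(distances) if distances else n

def tiles(n, width):
    """Split iterations 0..n-1 into consecutive tiles of the given width."""
    return [range(start, min(start + width, n)) for start in range(0, n, width)]

def run_sequential(n, dist):
    a = list(range(n))
    for i in range(n):
        src = i - dist(i)
        if src >= 0:
            a[i] = a[src] + 1
    return a

def run_tiled(n, dist):
    a = list(range(n))
    width = max(1, min_dependence_distance(n, dist))
    for tile in tiles(n, width):
        # Every source index lies in an earlier tile, so all updates of this
        # tile can be computed independently (here: from the same snapshot)
        # and committed together, which models parallel execution of the tile.
        updates = {i: a[i - dist(i)] + 1 for i in tile if i - dist(i) >= 0}
        for i, value in updates.items():
            a[i] = value
    return a
```

Under these assumptions the tiled execution produces the same array as the sequential loop; the barrier between tiles stands in for the "proper synchronization" mentioned above.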

2.

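Redundant Computation Partitioning for Communication Optimization on Distributed-Memory Systems  Total citations: 1, self-citations: 1, citations by others: 1

Chen Li (陈莉), Zhang Zhaoqing (张兆庆), Feng Xiaobing (冯晓兵) 《Chinese Journal of Computers (计算机学报)》2003,26(2):180-187

For sequences of parallel loop nests, this paper proposes a communication optimization method based on redundant computation partitioning. Using data-flow analysis, it gives a general method for determining the amount of redundant computation in each loop nest and, on that basis, describes how redundant computation partitioning is implemented and when it should be applied. For programs with regular dependences, the paper also proposes an efficient implementation of redundant computation partitioning. The technique has been implemented in a parallelizing compiler, and experimental results show that it clearly outperforms traditional communication optimization techniques.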

3.

Statement-Level Communication-Free Partitioning Techniques for Parallelizing Compilers  Total citations: 3, self-citations: 3, citations by others: 0

Shih Kuei-Ping, Sheu Jang-Ping, Huang Chua-Huang 《The Journal of supercomputing》2000,15(3):243-269

This paper addresses the problem of communication-free partitioning of iteration spaces and data spaces along hyperplanes. To find more possible communication-free hyperplane partitions, we treat statements within a loop body as separate schedulable units. Instead of using information about data dependence distance or direction vectors, our technique explicitly formulates array references as transformations from statement-iteration spaces to data spaces. Based on these transformations, necessary and sufficient conditions for the feasibility of communication-free partitioning along hyperplanes are proposed. This approach can be applied to all programs with an imperfectly nested loop or sequences of imperfectly nested loops whose array references are affine functions of outer loop indices or loop-invariant variables. The proposed approach is more practical than existing methods in finding data and computation distribution patterns that allow the processors of a multicomputer to execute fully in parallel without any interprocessor communication.

4.

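Global Automatic Data Partitioning for SIMD Machines

Lin Jin (林进), Zhu Ningning (朱宁宁), Zhang Zhaoqing (张兆庆), Qiao Ruliang (乔如良) 《Chinese Journal of Computers (计算机学报)》1999,22(6):596-602

This paper proposes a global automatic data partitioning algorithm for SIMD machines. The algorithm handles multiple imperfectly nested loop nests whose array subscripts are linear expressions of the loop variables. First, data and iteration mappings are used to abstract the communication patterns that arise in the computation; formal conditions for recognizing regular communication patterns are then given. Next, a data-iteration graph containing alignment information and the corresponding communication costs is built, and on top of this graph a heuristic algorithm is proposed to compute good data and iteration distributions so as to optimize the communication cost between processing elements. By analyzing the mappings of the multiple arrays involved in multiple loop nests …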

5.

Processor preallocation and load balancing of DOALL loops

Gary W. Elsesser, Viet N. Ngo, Sourav Bhattacharya, Wei-Tek Tsai 《The Journal of supercomputing》1994,8(2):135-161

Load balance is important because it may affect the speedup attained through the concurrent execution of loop iterations on a parallel processor. We study loop load balance in the context of the well-known Perfect benchmarks. Several static and dynamic characteristics of the Perfect benchmark DOALL loops are observed and interpreted. The late arrival of processors is noted as a major source of load imbalance. This observation suggested the idea of processor preallocation. An analytic cost model is presented and the advantages of processor preallocation are demonstrated by experimental evaluation on a CRAY Y-MP8 under the Unicos operating system.

6.

Automatic partitioning of parallel loops and data arrays for distributed shared-memory multiprocessors

Agarwal A., Kranz D.A., Natarajan V. 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(9):943-962

Presents a theoretical framework for automatically partitioning parallel loops to minimize cache coherency traffic on shared-memory multiprocessors. While several previous papers have looked at hyperplane partitioning of iteration spaces to reduce communication traffic, the problem of deriving the optimal tiling parameters for minimal communication in loops with general affine index expressions has remained open. Our paper solves this open problem by presenting a method for deriving an optimal hyperparallelepiped tiling of iteration spaces for minimal communication in multiprocessors with caches. We show that the same theoretical framework can also be used to determine optimal tiling parameters for both data and loop partitioning in distributed memory multicomputers. Our framework uses matrices to represent iteration and data space mappings and the notion of uniformly intersecting references to capture temporal locality in array references. We introduce the notion of data footprints to estimate the communication traffic between processors and use linear algebraic methods and lattice theory to compute precisely the size of data footprints. We have implemented this framework in a compiler for Alewife, a distributed shared-memory multiprocessor.

7.

EIGENVECTORS-BASED PARALLELISATION OF NESTED LOOPS WITH AFFINE DEPENDENCES

《International Journal of Parallel, Emergent and Distributed Systems》2012,27(3):227-248

This paper presents a method for parallelising nested loops with affine dependences. The data dependences of a program are represented exactly using a dependence matrix rather than an imprecise dependence abstraction. By a careful analysis of the eigenvectors and eigenvalues of the dependence matrix, we detect the parallelism inherent in the program, partition the iteration space of the program into sequential and parallel regions, and generate parallel code to execute these regions. For a class of programs considered in the paper, the proposed method can expose more coarse-grain and fine-grain parallelism than a hyperplane-based loop transformation.

8.

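Parallel Loop Partitioning for Optimized Data Transfer on SIMD Machines  Total citations: 2, self-citations: 1, citations by others: 2

Lin Jin (林进), Zhang Zhaoqing (张兆庆), Zhu Mingfa (祝明发) 《Chinese Journal of Computers (计算机学报)》1998,21(7):577-585

This paper presents a theoretical framework for loop partitioning on SIMD machines with distributed local memory, aimed at optimizing the data transfer required during computation. The framework uses matrices to represent the iteration space, the data space, and array access expressions. We introduce the notion of data transfer and build a simple, effective data transfer model to estimate the cost of moving data between global memory and local memory. Finally, for a given loop nest, we give a loop partitioning algorithm that produces optimized loop blocks, minimizing the data transfer cost of the loop nest and greatly reducing the synchronization overhead between data transfer and computation. The experimental results demonstrate …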

9.

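A Multi-Dimensional Parallelism Identification Method for Nested Loops on Many-Core Processors*

Li Yingying (李颖颖), Pang Jianmin (庞建民), Li Yanbing (李雁冰), Zhai Shengwei (翟胜伟) 《Application Research of Computers (计算机应用研究)》2018,35(11)

Existing parallelism identification methods have shortcomings when applied to many-core processors: if the loop dimension chosen for parallelization has few iterations, severe load imbalance can result. To address this problem, the paper proposes a multi-dimensional parallelism identification method for many-core processors. When existing methods cannot achieve good load balance, it selects several dimensions of the nested loop for parallelization and merges the iteration spaces of those dimensions before partitioning the work, reducing the impact of load imbalance on parallel efficiency. The method has been implemented in the automatic parallelization system developed by the authors' research group, and in practice it improves the parallel execution efficiency of some applications on many-core processors.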
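
The load-balance argument in this abstract can be sketched with a small calculation. The code below is only an illustration under stated assumptions, not the authors' system: it assumes a two-level loop nest in which both dimensions are parallel and every iteration costs the same, and the function names are hypothetical. It compares the per-worker load when only the outer dimension is divided among workers with the load when the two dimensions are merged into one flat iteration space first.

```python
def block_partition(num_items, workers):
    """Contiguous block sizes; the first (num_items % workers) workers get one extra."""
    base, extra = divmod(num_items, workers)
    return [base + (1 if w < extra else 0) for w in range(workers)]

def outer_only_loads(n_outer, n_inner, workers):
    # Only the outer dimension is parallel: each worker receives whole rows.
    return [rows * n_inner for rows in block_partition(n_outer, workers)]

def collapsed_loads(n_outer, n_inner, workers):
    # Both dimensions are merged into one space of n_outer * n_inner iterations.
    return block_partition(n_outer * n_inner, workers)

def unflatten(k, n_inner):
    """Recover the (outer, inner) loop indices from a flat iteration number."""
    return divmod(k, n_inner)
```

For example, with 6 outer iterations, 100 inner iterations, and 64 workers, dividing only the outer dimension leaves 58 workers idle while six workers each run 100 iterations; after merging, the busiest worker runs 10 iterations.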

10.

Loop recreation for thread‐level speculation on multicore processors

Lin Gao, Jingling Xue, Tin‐Fook Ngai 《Software》2010,40(1):45-72

Inter‐iteration dependences in loops can hinder loop‐level parallelism. For some loops, existing thread‐level speculation techniques fail to expose their inherent loop‐level parallelism, because some inter‐iteration dependences are too costly to synchronize, predict, pre‐compute and isolate. This paper presents a compiler technique called loop recreation to change the nature of some dependences (by turning some inter‐iteration dependences into intra‐iteration ones and vice versa) in a loop so that the inter‐iteration dependences in the transformed loop are less costly to enforce at runtime than those in the original loop. We present an algorithm for finding an optimal loop recreation transformation with respect to a simple misspeculation cost model and demonstrate the performance advantages of loop recreation over two recent techniques for multicore systems running nine representative irregular applications. Copyright © 2009 John Wiley & Sons, Ltd.