
1.

Chen Li, Zhang Zhaoqing, Feng Xiaobing 《计算机学报》 (Chinese Journal of Computers) 2003,26(2):180-187

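For sequences of parallel loop nests, this paper proposes a communication optimization method based on redundant-computation partitioning. Using data-flow analysis, it gives a general method for determining the amount of redundant computation in each loop nest, and on that basis presents how redundant-computation partitioning is implemented and how to decide when to apply it. For programs with regular dependences, the paper also proposes an efficient implementation of redundant-computation partitioning. The technique has been implemented in a parallel compiler, and experimental results show clear advantages over traditional communication optimization techniques.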

2.

Approaches Based on Permutations for Partitioning Sparse Matrices on Multiprocessors

E. M. Garzón, I. García 《The Journal of supercomputing》2005,34(1):41-61

This paper introduces new approaches to the data distribution-partition problem for sparse matrices in a multiprocessor environment. In this work, the data partition problem of a sparse matrix is modeled as a Min-Max Problem subject to the uniformity constraint when the goal is to balance the load for both sparse and dense operations. This problem is NP-Complete, and two heuristic solutions (ABO and GPB) are proposed. The key to ABO and GPB is determining the permutation of rows/columns of the input sparse matrix that yields a sorted matrix with a homogeneous density of nonzero elements. Because the proposed methods are heuristic, they are validated by a comparative study of the parallel efficiency of two types of problems (sparse and mixed) when ABO, GPB, Block, Cyclic and MRD data distributions are applied. This work has been partially supported by the Spanish CICYT through grant TIC2002-00228.

3.

An Improved Graph Partitioning Model for Parallel Computing

Ma Yonggang, Tan Guozhen, Yang Jixiang, Pan Dong 《小型微型计算机系统》 (Journal of Chinese Computer Systems) 2011,32(3)

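Graph partitioning has been applied successfully in many fields, but when it is used for parallel computing, communication volume is measured by the edge cut. The main drawback is that the edge cut does not accurately represent communication volume, and the graph partitioning model does not account for how communication latency and the distribution of communication overhead affect parallel performance. This paper proposes an improved graph partitioning model that integrates several factors affecting parallel performance (communication latency, the maximum local communication overhead, and the overall communication overhead) into a single unified cost function. The model overcomes some shortcomings of the edge-cut metric and, by adjusting weighting parameters, can handle different optimization objectives and emphasize the influence of different factors on parallel performance.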
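As a rough illustration of the idea in the abstract above, the sketch below contrasts the edge cut with a weighted cost that combines three terms. The abstract does not give the actual cost function, so everything here is an assumption made for illustration: the proxies (message count for latency, per-partition send volume for local overhead, total send volume for overall overhead) and the names `partition_costs`, `alpha`, `beta` and `gamma` are hypothetical.

```python
from collections import defaultdict


def partition_costs(edges, part, alpha=1.0, beta=1.0, gamma=1.0):
    """Compare the edge cut with a hypothetical weighted communication cost.

    edges: dict mapping (u, v) to an edge weight (undirected graph).
    part:  dict mapping each vertex to its partition id.
    alpha, beta, gamma: assumed weights for the three cost terms.
    """
    edge_cut = 0
    remote = defaultdict(set)  # vertex -> remote partitions that need its data
    for (u, v), weight in edges.items():
        if part[u] != part[v]:
            edge_cut += weight
            remote[u].add(part[v])
            remote[v].add(part[u])

    volume = defaultdict(int)  # partition -> number of vertex copies it sends
    peers = defaultdict(set)   # partition -> partitions it sends messages to
    for vertex, targets in remote.items():
        volume[part[vertex]] += len(targets)
        peers[part[vertex]] |= targets

    max_messages = max((len(p) for p in peers.values()), default=0)  # latency proxy
    max_volume = max(volume.values(), default=0)  # worst local overhead proxy
    total_volume = sum(volume.values())           # overall overhead proxy
    cost = alpha * max_messages + beta * max_volume + gamma * total_volume
    return {
        "edge_cut": edge_cut,
        "max_messages": max_messages,
        "max_volume": max_volume,
        "total_volume": total_volume,
        "cost": cost,
    }


# Star graph: vertex 0 sits alone in partition 0, its three neighbours in partition 1.
star_edges = {(0, 1): 1, (0, 2): 1, (0, 3): 1}
star_part = {0: 0, 1: 1, 2: 1, 3: 1}
print(partition_costs(star_edges, star_part))
```

In the star example the edge cut counts three cut edges, while vertex 0 only needs to be sent to partition 1 once; this is the kind of mismatch between edge cut and actual communication that the abstract refers to.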

4.

Reducing Cache Conflicts by Multi-Level Cache Partitioning and Array Elements Mapping

Chang Chih-Yung, Sheu Jang-Ping, Chen Hsi-Chiuen 《The Journal of supercomputing》2002,22(2):197-219

This article presents an algorithm to reduce cache conflicts and improve cache locality. The proposed algorithm analyzes the locality reference space for each reference pattern, partitions the multi-level cache into several parts of different sizes, and then maps array data onto the scheduled cache positions to eliminate cache conflicts. A greedy method for rearranging array variables in declaration statements is also developed, to reduce the memory overhead of mapping arrays onto a partitioned cache. In addition, loop tiling and the proposed schemes are combined to exploit opportunities for both temporal and spatial reuse. Atom is used as a tool to simulate the behavior of a direct-mapped cache and to demonstrate that the approach is effective at reducing the number of cache conflicts and exploiting cache locality. Experimental results reveal that applying the cache partitioning scheme can greatly reduce cache conflicts and thus save program execution time in both single-level and multi-level cache hierarchies.
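To make the notion of a cache conflict concrete, here is a minimal sketch of a direct-mapped cache simulation. It is not the authors' partitioning or mapping algorithm; it only shows the simpler, well-known effect of shifting one array's base address (inter-array padding). The cache geometry, element size and the function names `simulate_direct_mapped` and `interleaved_trace` are assumptions chosen for illustration.

```python
def simulate_direct_mapped(addresses, line_size=32, num_lines=256):
    """Count misses for a byte-address trace on a direct-mapped cache."""
    tags = [None] * num_lines  # memory block currently held by each cache line
    misses = 0
    for addr in addresses:
        block = addr // line_size
        index = block % num_lines  # direct mapping: one possible line per block
        if tags[index] != block:
            misses += 1
            tags[index] = block
    return misses


def interleaved_trace(base_a, base_b, n, elem_size=8):
    """Addresses touched by a loop that reads a[i] and then b[i] for i in range(n)."""
    trace = []
    for i in range(n):
        trace.append(base_a + i * elem_size)
        trace.append(base_b + i * elem_size)
    return trace


CACHE_BYTES = 32 * 256  # line_size * num_lines
N = 1024                # each array occupies exactly one cache size (8192 bytes)

# Arrays exactly one cache size apart: a[i] and b[i] always share a cache line.
conflicting = simulate_direct_mapped(interleaved_trace(0, CACHE_BYTES, N))
# Shifting b by one cache line removes the ping-pong evictions.
padded = simulate_direct_mapped(interleaved_trace(0, CACHE_BYTES + 32, N))
print(conflicting, padded)
```

With these assumed parameters every access misses in the first layout, while the shifted layout leaves only the compulsory misses for each newly touched line.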

5.

Requantization-Based Data Partitioning of DCT Coefficients in Video Coding

Xie Jianfeng, Du Jianhong 《计算机工程》 (Computer Engineering) 2004,30(12):147-148,161

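Data partitioning splits a video stream into separate parts for transmission. Traditional data partitioning algorithms use frequency-domain layering based on the DCT coefficients. The algorithm proposed in this paper instead requantizes the DCT coefficients: the requantized coefficients form the base-layer data, and the remainder left after requantization forms the enhancement-layer data. Experiments show that, compared with frequency-domain layering, the algorithm provides better image quality.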
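The base-layer and enhancement-layer split described above can be sketched as follows. The abstract does not specify the quantizer, so this assumes integer coefficient levels, a single coarse step `base_step`, and truncation toward zero; the function names are hypothetical.

```python
def requantize(level, step):
    """Coarsely requantize one integer coefficient, truncating toward zero."""
    quotient = abs(level) // step
    return quotient if level >= 0 else -quotient


def split_layers(coeffs, base_step):
    """Split coefficients into base-layer indices and enhancement-layer remainders."""
    base = [requantize(c, base_step) for c in coeffs]
    enhancement = [c - b * base_step for c, b in zip(coeffs, base)]
    return base, enhancement


def reconstruct(base, base_step, enhancement=None):
    """Rebuild coefficients from the base layer, optionally refined by the enhancement layer."""
    coarse = [b * base_step for b in base]
    if enhancement is None:
        return coarse
    return [c + e for c, e in zip(coarse, enhancement)]


coeffs = [52, -13, 7, 0, -3, 1]  # made-up coefficient levels for one block
base, enhancement = split_layers(coeffs, base_step=8)
print(base, enhancement)
print(reconstruct(base, 8))               # base layer only: coarse approximation
print(reconstruct(base, 8, enhancement))  # both layers: original levels
```

A receiver that gets only the base layer still obtains a coarse version of every coefficient, whereas frequency-domain layering would drop the high-frequency coefficients entirely.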

6.

A Computation+Communication Load Balanced Loop Partitioning Method for Distributed Memory Systems

Santosh Pande, Tareq Bali 《Journal of Parallel and Distributed Computing》1999,58(3):251

Due to a significant communication overhead of sending and receiving data, the loop partitioning approaches on distributed memory systems must guarantee not just the computation load balance but computation+communication load balance. The previous approaches in loop partitioning have achieved a communication-free, computation load balanced iteration space partitioning solution for a limited subset of DOALL loops. But a large category of DOALL loops inevitably result in communication, and the trade-offs between computation and communication must be carefully analyzed for these loops in order to balance out the combined computation time and communication overheads. In this work, we describe a partitioning approach based on the above motivation for the general cases of DOALL loops. Our goal is to achieve a computation+communication load balanced partitioning through static data and iteration space distribution. Our approach first performs partitioning of iteration and data spaces of a loop nest by analyzing communication and parallelism; it then performs architecture-dependent analysis to adjust the granularity of partitions, load balance each partition with respect to total computation+communication, and then performs mapping of partitions onto the available number of processors. This multiphase partitioning method works as follows. First, the code partitioning phase analyzes the references in the body of the DOALL loop nest and determines a set of directions for reducing a larger degree of communication by trading a lesser degree of parallelism. The partitioning is carried out in the iteration space of the loop by cyclically following a set of direction vectors such that the data references are maximally localized and reused, eliminating a larger communication volume than parallelism. We then perform data space partitioning based on a new "larger partition owns" rule to minimize the communication overhead for a compute-intensive partition by localizing its references relatively more than a smaller non-compute-intensive partition. A partition interaction graph is then constructed, which is used by the architecture-dependent analysis phase to merge the partitions to achieve granularity adjustment, computation+communication load balance, and mapping on the actual number of available processors. Relevant theory and algorithms are developed along with a performance evaluation on the Cray T3D.

7.

A Graph Partitioning Model for Parallel Computing Based on 0-1 Programming

Ma Yonggang, Tan Guozhen, Wang Qidong 《小型微型计算机系统》 (Journal of Chinese Computer Systems) 2011,32(8)

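Graph partitioning is widely used in many areas of science and engineering, but when it is applied to task assignment in parallel computing it represents data dependences with an undirected graph, which limits its applicability (for example, an undirected graph cannot represent applications with rectangular or asymmetric dependence structures). To overcome this drawback, we distinguish the direction of each data dependence (that is, each edge distinguishes the sender of a communication from its receiver), model the problem as a 0-1 program, and solve it with the NEOS server, a service commonly used over the Internet for solving optimization problems. Experiments on several data sets show that the 0-1 programming approach outperforms the popular multilevel method for graph partitioning.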

8.

Partitioning and mapping of parallel programs by self-organization

Hans-Ulrich Heiss, Marcus Dormanns 《Concurrency and Computation》1996,8(9):685-706

To execute a parallel program on a multicomputer system, the tasks of the program have to be mapped to the particular processors of the parallel machine. The aim of the mapping is twofold: (i) to achieve a balanced load on the processors (partitioning problem) and (ii) to keep communication delays low by placing communicating tasks close together (mapping). Since both the communication structure of the program and the interconnection structure of the parallel machine can be represented as graphs, the mapping problem can be regarded as a graph embedding problem to minimize communication costs. As a new heuristic approach to this NP-hard problem, we apply Kohonen's self-organizing maps to establish a topology-preserving embedding. Experimental results are presented and compared to other approaches to this problem. The most attractive feature of our new method is that it parallelizes extremely well.

9.

Research on Parallel Optimization for CPU/GPU Heterogeneous Platforms

Yang Fangju 《电脑编程技巧与维护》 (Computer Programming Skills & Maintenance) 2012,(18):4-7,67

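CPU/GPU heterogeneous systems have great development potential, and an in-depth study of parallel optimization on CPU/GPU heterogeneous platforms makes it possible to maximize the overall computing capability of the system. Optimizing how tasks are divided between the CPU and the GPU balances their loads, which improves the utilization of computing resources and shortens the execution time of computing tasks; optimizing how GPU threads are partitioned lets the GPU run faster. Together, these improve overall system performance.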
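One simple way to picture the CPU/GPU load balancing mentioned above is a static split proportional to measured throughput. This is a sketch under strong assumptions, not the method of the article: work items are independent and equally costly, the rates `cpu_rate` and `gpu_rate` (items per second) are known in advance, and data transfer time is ignored. All names are hypothetical.

```python
def split_work(n_items, cpu_rate, gpu_rate):
    """Split n_items so that CPU and GPU finish at roughly the same time."""
    if cpu_rate <= 0 or gpu_rate <= 0:
        raise ValueError("rates must be positive")
    n_gpu = round(n_items * gpu_rate / (cpu_rate + gpu_rate))
    n_cpu = n_items - n_gpu
    return n_cpu, n_gpu


def makespan(n_cpu, n_gpu, cpu_rate, gpu_rate):
    """Completion time when both devices work concurrently on their shares."""
    return max(n_cpu / cpu_rate, n_gpu / gpu_rate)


# Assumed rates: the GPU processes items four times faster than the CPU.
n_cpu, n_gpu = split_work(1000, cpu_rate=100.0, gpu_rate=400.0)
print(n_cpu, n_gpu, makespan(n_cpu, n_gpu, 100.0, 400.0))
print(makespan(500, 500, 100.0, 400.0))  # naive even split, for comparison
```

Under these assumptions the proportional split lets both devices finish together, whereas an even split leaves the GPU idle while the CPU finishes its share.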

10.

Data Partitioning for Parallel Spatial Join Processing (cited by 1: 0 self-citations, 1 by others)

Xiaofang Zhou, David J. Abel, David Truffet 《GeoInformatica》1998,2(2):175-204

The cost of spatial join processing can be very high because of the large sizes of spatial objects and the computation-intensive spatial operations. While parallel processing seems a natural solution to this problem, it is not clear how spatial data can be partitioned for this purpose. Various spatial data partitioning methods are examined in this paper. A framework combining the data-partitioning techniques used by most parallel join algorithms in relational databases and the filter-and-refine strategy for spatial operation processing is proposed for parallel spatial join processing. Object duplication caused by multi-assignment in spatial data partitioning can result in extra CPU cost as well as extra communication cost. We find that the key to overcoming this problem is to preserve spatial locality in task decomposition. In this paper we show that a near-optimal speedup can be achieved for parallel spatial join processing using our new algorithms.
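The object duplication that the abstract above attributes to multi-assignment can be illustrated with a uniform grid. This is a minimal sketch, not the paper's algorithms: it assumes objects are approximated by axis-aligned bounding rectangles `(xmin, ymin, xmax, ymax)` and that each grid cell is one partition; `assign_to_cells` and `duplication_factor` are hypothetical names.

```python
from collections import defaultdict


def assign_to_cells(rects, cell_size):
    """Assign each rectangle to every grid cell it overlaps (multi-assignment)."""
    cells = defaultdict(list)
    for obj_id, (xmin, ymin, xmax, ymax) in enumerate(rects):
        for cx in range(int(xmin // cell_size), int(xmax // cell_size) + 1):
            for cy in range(int(ymin // cell_size), int(ymax // cell_size) + 1):
                cells[(cx, cy)].append(obj_id)
    return cells


def duplication_factor(cells, n_objects):
    """Average number of partitions that hold a copy of each object."""
    return sum(len(ids) for ids in cells.values()) / n_objects


rects = [
    (1, 1, 2, 2),   # lies inside a single cell
    (4, 4, 6, 6),   # straddles a cell corner, so it is copied to four cells
    (9, 1, 11, 3),  # straddles one vertical boundary, so it is copied to two cells
]
cells = assign_to_cells(rects, cell_size=5)
print(dict(cells))
print(duplication_factor(cells, len(rects)))
```

Every extra copy is filtered and shipped again, which is the extra CPU and communication cost the abstract mentions; partitions that follow the spatial clustering of the data cut fewer objects.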