
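Because a computer accesses its local memory far faster than it accesses the memory of a remote computer over a network, a key problem in speeding up parallel computation in a distributed-memory environment is how to distribute the data a program references so that each local computation touches only locally stored data (that is, a communication-free data distribution). For array subscript expressions that are linear, this paper extends an array linear-partitioning technique based on the hyperplane concept from linear algebra and gives a complete algorithm for computing the data partition.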
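As a rough illustration of the hyperplane idea mentioned in this abstract (not the paper's algorithm), the sketch below assigns array element x to the block with index normal · x and checks whether every loop iteration touches elements from a single block. The loop, the offsets, and the function names `block_id` and `is_communication_free` are assumptions made up for this example.

```python
from itertools import product

def block_id(normal, point):
    """Index c of the hyperplane normal . x = c that contains the point."""
    return sum(n * x for n, x in zip(normal, point))

def is_communication_free(normal, offsets, bounds):
    """True if each iteration references elements lying on one hyperplane.

    offsets: constant offsets of the array references, so an iteration `it`
             touches the elements it + off for every off in offsets.
    bounds:  half-open (lo, hi) range for each loop index.
    """
    for it in product(*(range(lo, hi) for lo, hi in bounds)):
        ids = {
            block_id(normal, tuple(i + o for i, o in zip(it, off)))
            for off in offsets
        }
        if len(ids) > 1:  # this iteration would need data from two blocks
            return False
    return True

# Hypothetical loop nest: for i, j in 1..5: A[i][j] = A[i-1][j-1] + 1
offsets = [(0, 0), (-1, -1)]
bounds = [(1, 6), (1, 6)]

diag_ok = is_communication_free((1, -1), offsets, bounds)  # blocks are diagonals i - j = c
row_ok = is_communication_free((1, 0), offsets, bounds)    # blocks are rows i = c
```

Partitioning along diagonals keeps both references of each iteration in the same block, while partitioning by rows does not, because A[i-1][j-1] lives in the previous row.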

8.

A Parallel Lexical Analyzer: Research on Parallel Compilation Theory, Part 1

邹恒明《计算机学报》1990,13(12):940-945

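This paper proposes a theory of vector transformations and, on that basis, designs a lexical analyzer. The lexical analyzer has been partially implemented on the KJ8920 mainframe computer.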

9.

A Compile-Time Partial Scheduling Strategy for Optimizing the Granularity of Parallel Graph Rewriting

田新民王鼎兴《计算机学报》1992,15(11):838-847

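An efficient implementation of parallel graph rewriting must reduce the overhead of frequently creating, switching, and synchronizing rewriting tasks. To this end, this paper proposes a compile-time technique for optimizing rewriting granularity, called compile-time partial scheduling. Its core idea is to analyze the total-order property and execution semantics of the rewriting nodes and to construct, at compile time, coarse-grained sequential rewriting bodies that preserve the original execution semantics. Within the formal framework defined in the paper, we establish the safety conditions for compile-time partial scheduling and give a rigorous proof. Experimental results show that compile-time partial scheduling effectively increases rewriting granularity, reduces the number of rewriting tasks by 30 to 60%, and preserves the safety of the computation.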

10.

Research on Compilation Techniques for Parallel Inference Machines

黄志毅胡守仁《计算机科学》1990,17(3):10-15

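Automatic mode recognition, data dependence analysis, exploitation of AND-parallelism, handling of side effects, granularity analysis of parallelism, handling of concurrent languages, and extension of the WAM instruction set are among the problems faced when compiling for parallel inference machines. This paper discusses each of these problems together with our own work on them, and outlines the prospects for research on compilation techniques for parallel inference machines.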

11.

Automatic Program Parallelization Targeting MPP Fortran

唐新春郭克榕《计算机研究与发展》1996,33(8):566-573

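MPP Fortran is a data-parallel language that Cray introduced for the Cray T3D MPP system, a machine with distributed memory and global addressing. This paper first describes the main features of MPP Fortran and then, using the language as an example, analyzes and discusses the basic elements and key techniques of automatic program parallelization for MPP systems.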

12.

A Data Distribution Strategy with Data-Rearrangement Communication

董春丽赵荣彩韩林张亚《计算机应用研究》2007,24(7):19-21

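This paper discusses a loop-level data distribution strategy suited to both distributed and shared memory. The method supports the communication caused by data rearrangement.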

13.

Minimal data dependence abstractions for loop transformations: Extended version

Yi-Qing Yang Corinne Ancourt François Irigoin 《International journal of parallel programming》1995,23(4):359-388

Many abstractions of program dependences have already been proposed, such as the Dependence Distance, the Dependence Direction Vector, the Dependence Level or the Dependence Cone. These different abstractions have different precisions. The minimal abstraction associated with a transformation is the abstraction that contains the minimal amount of information necessary to decide when such a transformation is legal. Minimal abstractions for loop reordering and unimodular transformations are presented. As an example, the dependence cone, which approximates dependences by a convex cone of the dependence distance vectors, is the minimal abstraction for unimodular transformations. It also contains enough information for legally applying all loop reordering transformations and finding the same set of valid mono- and multi-dimensional linear schedules as the dependence distance set.
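The legality question this abstract refers to can be illustrated with the standard test for unimodular transformations: a transformation matrix T is legal when T·d stays lexicographically positive for every dependence distance vector d. The sketch below uses exact distance vectors rather than the dependence cone studied in the paper; the distance set and the helper names are assumptions chosen for illustration.

```python
def apply(T, d):
    """Multiply the integer matrix T (a list of rows) by the vector d."""
    return tuple(sum(t * x for t, x in zip(row, d)) for row in T)

def lex_positive(v):
    """True if the first nonzero component of v is positive."""
    for x in v:
        if x > 0:
            return True
        if x < 0:
            return False
    return False  # the zero vector is not lexicographically positive

def is_legal(T, distances):
    """A transformation is legal if every transformed distance stays lex-positive."""
    return all(lex_positive(apply(T, d)) for d in distances)

# Hypothetical dependence distance vectors of a 2-deep loop nest.
distances = [(1, -1), (0, 1)]

interchange = [[0, 1], [1, 0]]        # swap the two loops
skew = [[1, 0], [1, 1]]               # skew the inner loop by the outer one
skew_then_swap = [[1, 1], [1, 0]]     # skewing followed by interchange

interchange_ok = is_legal(interchange, distances)
skew_ok = is_legal(skew, distances)
skew_then_swap_ok = is_legal(skew_then_swap, distances)
```

Here plain interchange reverses the dependence (1, -1) and is rejected, while skewing first makes the interchange legal.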

14.

An Interleaving Transformation for Parallelizing Reductions for Distributed-Memory Parallel Machines

Wu Jan-Jan 《The Journal of supercomputing》2000,15(3):321-339

Reduction operations frequently appear in algorithms. Due to their mathematical invariance properties (assuming that round-off errors can be tolerated), it is reasonable to ignore ordering constraints on the computation of reductions in order to take advantage of the computing power of parallel machines. One obvious and widely used compilation approach for reductions is syntactic pattern recognition. Either the source language includes explicit reduction operators, or certain specific loops are recognized as equivalent to known reductions. Once such patterns are recognized, hand-optimized code for the reductions is incorporated in the target program. The advantage of this approach is simplicity. However, it imposes restrictions on the reduction loops: no data dependence other than that caused by the reduction operation itself is allowed in the reduction loops. In this paper, we present a parallelizing technique, interleaving transformation, for distributed-memory parallel machines. This optimization exploits parallelism embodied in reduction loops through a combination of data dependence analysis and region analysis. Data dependence analysis identifies the loop structures and the conditions that can trigger this optimization. Region analysis divides the iteration domain into a sequential region and an order-insensitive region. Parallelism is achieved by distributing the iterations in the order-insensitive region among multiple processors. We use a triangular solver as an example to illustrate the optimization. Experimental results on various distributed-memory parallel machines, including the Connection Machine CM-5, the nCUBE, the IBM SP-2, and a network of Sun workstations, are reported.
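The basic observation in this abstract, that an order-insensitive reduction can be split across processors, can be sketched as follows. This is the generic partial-sum pattern with a cyclic distribution of iterations, not the paper's interleaving transformation; the processors are simulated sequentially, and `cyclic_partial_sums` is a name invented for this example. Integers are used so that reordering cannot change the result through round-off.

```python
def cyclic_partial_sums(values, num_procs):
    """Give iteration i to processor i % num_procs and sum each share locally."""
    partial = [0] * num_procs
    for p in range(num_procs):
        # Processor p handles iterations p, p + num_procs, p + 2 * num_procs, ...
        for i in range(p, len(values), num_procs):
            partial[p] += values[i]
    return partial

values = list(range(1, 11))               # 1, 2, ..., 10
partials = cyclic_partial_sums(values, 3)  # one local sum per simulated processor
total = sum(partials)                      # final combine step
```

On a real distributed-memory machine the final combine would be a collective operation rather than a local `sum`.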

15.

Optimizing Matrix Multiplication Programs by Combining Models with Iterative Compilation

陆平静王正华车永刚《计算机工程与科学》2009,31(Z1)

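The gap between the sustained performance that high-performance computing applications achieve and the peak performance of the machine keeps widening, which significantly constrains the development of high-performance computing. Program transformation, which applies optimizing transformations that adapt a program to the characteristics of the machine architecture, improves actual execution performance and is one effective way to address this problem. Many high-level program transformations take numeric parameters whose values must be chosen carefully to obtain the best performance. Traditional compilers select these parameters with simple models, which cope poorly with increasingly complex hardware platforms and applications. Iterative compilation generates different program versions and runs them on the actual hardware to evaluate candidate values of the key optimization parameters and pick those that yield the best performance. It clearly outperforms static methods, but its large optimization overhead limits its applicability. For matrix multiplication programs, this paper proposes an optimization method that combines a performance model with iterative compilation: a performance model built from empirical knowledge of the machine architecture and the program constrains the optimization space, and a genetic algorithm accelerates the search for good solutions within that space. Experimental results show that the method achieves better optimization results at lower overhead.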
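A minimal sketch of the model-plus-measurement idea described in this abstract is given below, under several assumptions: the only tuned parameter is a square tile size, the "model" is a crude rule that three tiles must fit in cache, and the genetic algorithm of the paper is replaced by exhaustive timing of the surviving candidates. All function names and the cache size are hypothetical.

```python
import time

def tiled_matmul(A, B, n, tile):
    """Multiply two n x n matrices (lists of lists) using square tiles."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        a = A[i][k]
                        for j in range(jj, min(jj + tile, n)):
                            C[i][j] += a * B[k][j]
    return C

def model_filter(tiles, cache_bytes, elem_bytes=8):
    """Crude model: keep tile sizes whose three tiles fit in the cache."""
    return [t for t in tiles if 3 * t * t * elem_bytes <= cache_bytes]

def pick_tile(n, tiles, cache_bytes):
    """Time each tile size that survives the model and return the fastest."""
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[float(i - j) for j in range(n)] for i in range(n)]
    best = None
    for t in model_filter(tiles, cache_bytes):
        start = time.perf_counter()
        tiled_matmul(A, B, n, t)
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best[1]:
            best = (t, elapsed)
    return best  # (tile size, seconds), or None if the model rejects everything

candidates = model_filter([4, 8, 16, 32, 64, 128], cache_bytes=32 * 1024)
best_tile, seconds = pick_tile(32, [4, 8, 16, 32, 64, 128], cache_bytes=32 * 1024)
```

The model prunes the candidates before any timing run, which is where the overhead saving comes from; which tile wins depends on the machine, so no particular result should be expected.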

16.

Computation Partitioning of Parallel Loops Based on Canonical Partition Sets

黄其军杨建武余华山许卓群《软件学报》2003,14(3):362-368

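Computation partitioning is one of the most important problems in parallel compilation. For parallel loops whose data distribution is already fixed, this paper proposes a computation partitioning algorithm based on canonical sets, and discusses in detail how to obtain the canonical sets and how to select the optimal scheme by jointly weighing communication and load balance. Experiments show that, for parallel loops, the algorithm is simpler and more effective than several earlier algorithms, and that the p_HPF compiler, which uses it, achieves good speedup and efficiency on data-parallel applications. The compiler has been applied in the petroleum industry.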

17.

Polyhedral Representation Techniques and Their Application to Program Performance Optimization

陆平静车永刚束尧王正华《计算机工程与科学》2008,30(9):137-140

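Polyhedral representation provides a unified way to express program transformations and compositions of program transformations, which helps the search for the best transformation. This paper first introduces and evaluates several typical polyhedral representation methods and describes in detail the polyhedral representation model proposed by Cohen. It then introduces and evaluates applications of polyhedral representation to program performance optimization, especially iterative compilation. Finally, it looks ahead to future directions for polyhedral representation techniques in iterative compilation.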

18.

Optimal Fine and Medium Grain Parallelism Detection in Polyhedral Reduced Dependence Graphs

Alain Darte Frédéric Vivien 《International journal of parallel programming》1997,25(6):447-496

This paper presents an optimal algorithm for detecting fine or medium grain parallelism in nested loops whose dependences are described by an approximation of distance vectors by polyhedra. In particular, this algorithm is optimal for the classical approximation by direction vectors. This result generalizes, to the case of several statements, Wolf and Lam's algorithm, which is optimal for a single statement. Our algorithm relies on a dependence uniformization process and on parallelization techniques related to systems of uniform recurrence equations. It can also be viewed as a combination of both Allen and Kennedy's algorithm and Wolf and Lam's algorithm.