Similar literature
1.
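To address two major difficulties in applying automatic unimodular transformations, namely how to automatically find a suitable unimodular transformation matrix that parallelizes a multiply nested loop, and how to handle the non-constant reduction dependence distances that hinder computation of the unimodular matrix, this paper first proposes a technique that, for a given constant distance matrix, automatically finds a suitable unimodular transformation matrix that parallelizes the loop. It then proposes representing array reduction dependences as minimal constant distance vectors, so that multiply nested loops with reduction dependences can also use automatic unimodular transformation. This provides a theoretical basis for making automatic unimodular transformation practical.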

2.
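To address two major difficulties in applying automatic unimodular transformations, namely how to automatically find a suitable unimodular transformation matrix that parallelizes a multiply nested loop, and how to handle the non-constant reduction dependence distances that hinder computation of the unimodular matrix, this paper first proposes a technique that, for a given constant distance matrix, automatically finds a suitable unimodular transformation matrix that parallelizes the loop. It then proposes representing array reduction dependences as minimal constant distance vectors, so that multiply nested loops with reduction dependences can also use automatic unimodular transformation. This provides a theoretical basis for making automatic unimodular transformation practical.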
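
The legality condition behind this kind of search can be sketched in a few lines. The following is a minimal illustration, not the paper's algorithm: it assumes a two-deep loop nest with constant distance vectors and uses a brute-force search over small integer matrices. All function names (`is_legal`, `inner_parallel`, `find_transform`) and the `bound` parameter are hypothetical.

```python
from itertools import product

def mat_vec(T, d):
    """Multiply a matrix (tuple of rows) by a distance vector."""
    return tuple(sum(t * x for t, x in zip(row, d)) for row in T)

def det2(T):
    """Determinant of a 2x2 integer matrix."""
    (a, b), (c, e) = T
    return a * e - b * c

def lex_positive(v):
    """True if the first nonzero component of v is positive."""
    for x in v:
        if x != 0:
            return x > 0
    return False

def is_legal(T, distances):
    """A transformation is legal if every transformed distance vector
    stays lexicographically positive."""
    return all(lex_positive(mat_vec(T, d)) for d in distances)

def inner_parallel(T, distances):
    """If the outer loop carries every dependence (first component > 0),
    the inner loop has no loop-carried dependence and can run in parallel."""
    return all(mat_vec(T, d)[0] > 0 for d in distances)

def find_transform(distances, bound=2):
    """Brute-force search (an assumption made for illustration) for a 2x2
    unimodular matrix, entries in [-bound, bound], that makes the inner
    loop parallel."""
    for a, b, c, e in product(range(-bound, bound + 1), repeat=4):
        T = ((a, b), (c, e))
        if abs(det2(T)) == 1 and inner_parallel(T, distances):
            return T
    return None
```

For the distance vectors (1, 0) and (0, 1), the identity matrix is legal but leaves a dependence on the inner loop, while the skewing matrix ((1, 1), (0, 1)) moves both dependences to the outer loop.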

3.
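A method for increasing parallel granularity and improving data access locality using unimodular transformations   Total citations: 3, self-citations: 0, citations by others: 3  
This paper proposes a method that uses loop transformations to increase the parallel granularity of loops and improve the locality of their data accesses. The method exploits certain properties of the dependence vector set of a given doubly nested loop: it merges several iterations that differ in the outer loop variable but share the same inner loop variable into a single node of the folded iteration space, while keeping the parallelism of the inner loop unchanged, thereby increasing the parallel granularity of the loop. For the more general case, the paper discusses how to determine, from the dependence vector set of a given loop, a unimodular transformation of the iteration space that both makes the inner loop parallelizable and enlarges the loop granularity. Because loop transformations may worsen data access locality, the paper also proposes first merging inner-loop iterations, then transforming the iteration space according to the dependence vector set after merging, and folding the iteration space. The method is an extension of the Wavefront loop parallelization method.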
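
For readers unfamiliar with the Wavefront method that this abstract extends, the following minimal sketch shows the basic idea on an assumed example recurrence, `a[i][j] = a[i-1][j] + a[i][j-1]`. It does not implement the folding or merging steps of the paper. The function names are hypothetical.

```python
def sequential(n, m):
    """Reference version: the original doubly nested loop."""
    a = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a

def wavefront(n, m):
    """Same computation, reordered by wavefronts t = i + j.

    Both dependences, (1, 0) and (0, 1), point from wavefront t - 1 to
    wavefront t, so the iterations inside one wavefront are independent.
    The inner loop runs in reverse order here only to show that its
    order does not matter."""
    a = [[1] * (m + 1) for _ in range(n + 1)]
    for t in range(2, n + m + 1):              # sequential outer loop
        lo, hi = max(1, t - m), min(n, t - 1)  # keep 1 <= i <= n, 1 <= j <= m
        for i in reversed(range(lo, hi + 1)):  # parallelizable inner loop
            j = t - i
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a
```

The reordering corresponds to the skewing matrix ((1, 1), (0, 1)) applied to the iteration vector (i, j), under the assumptions above.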

4.
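The shortcomings of existing parallel association rule mining algorithms are analyzed, and an improved multi-core parallel optimization algorithm for association rule mining is proposed. The algorithm modifies the compressed matrix of the Apriori algorithm and, on a multi-core platform, uses OpenMP and TBB to apply loop parallelization and task-allocation parallelization to the serial program, so as to parallelize association rule mining as fully as possible.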

5.
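In parallelizing compilation, loop transformation is the main way to exploit program parallelism, but non-perfectly nested loops with complex control flow often cannot be parallelized effectively. Drawing on experience from analyzing benchmarks and from implementing transformations of complex non-perfectly nested loops in the automatic parallelization system AFT, this article describes the characteristics of such transformations and their application in parallelizing compilation.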

6.
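For multiply nested loops in which the upper and lower bounds of the inner iterations are affine functions of the outer iterations, an optimization strategy for searching for systolic transformations and a corresponding automated algorithm are introduced.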

7.
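Existing communication optimization algorithms cannot make an automatically parallelizing MPI compiler generate message-passing programs with satisfactory speedup. To address this, a communication optimization algorithm based on reordering transformations and loop distribution is proposed. Using the given interprocedural side-effect sets and reordering rules based on moving mpi_wait/mpi_irecv, the algorithm applies reordering transformations and loop distribution in an orderly way to enlarge, as far as is safe, the window in which communication overlaps computation in point-to-point non-blocking communication, so that the compiler generates message-passing code in which more computation overlaps communication. Experimental results show that the algorithm hides more point-to-point non-blocking communication overhead and clearly improves the speedup of message-passing programs.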

8.
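Parallel multigrid solvers for unstructured grids   Total citations: 2, self-citations: 0, citations by others: 2  
Li Zongzhe, Wang Zhenghua, Yao Lu, Cao Wei. 《软件学报》 (Journal of Software), 2013, 24(2): 391-404
As an efficient solver for unstructured grids, the multigrid method has excellent time and space properties in both its serial and parallel implementations. Starting from the discretization of the governing equations, this paper describes the workflow of parallel numerical simulation on unstructured grids, points out that multigrid methods are mainly used to solve the large-scale algebraic systems produced by time-marching schemes, outlines the basic structure of the algorithm's implementation, and analyzes why it is efficient. Next, it surveys research on geometric multigrid and algebraic multigrid, with emphasis on the hot topics in their parallelization. It also summarizes the smoothers used by multigrid solvers in practical unstructured-grid applications, then lists some open-source software projects for unstructured grids and briefly describes their functionality. Finally, it points out several key problems and future research directions for parallel multigrid solvers in unstructured-grid applications.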

9.
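Loop optimization plays an important role in improving cache performance, exposing program parallelism, and reducing loop execution overhead, and proving the correctness of modern compilers with loop optimizations has become a challenging problem in trusted compilation. Formally proving a full-fledged optimizing compiler is essentially infeasible. An alternative is not to prove the optimizing compiler itself, but to formally prove the correctness of the compiled object before and after each loop transformation. A novel method is proposed for proving the correctness of loop optimizations based on an extended logical transformation system, μTS. μTS extends the logical transformation system TS with several derived rules. After the source and target programs are converted by predicate abstraction into the formal language Radl, the derived rules of μTS can prove the correctness of common loop transformations, such as loop fusion, loop distribution, loop interchange, loop reversal, loop splitting, loop peeling, loop adjustment, loop unrolling, loop tiling, loop unswitching, and loop-invariant code motion. Since a loop optimization can be regarded as a composition of a series of loop transformations, μTS can prove the correctness of loop optimizations. To support automated proofs of loop optimization correctness and to produce evidence, an auxiliary proof algorithm is also proposed. Finally, the method is explained in detail through a typical example, and the results show that it is effective. The method offers important guidance for designing highly trusted optimizing compilers.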

10.
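The fast wavelet transform is an important problem in digital signal processing. This work studies parallel wavelet algorithms, reducing the size of the convolution operations in the wavelet transform and improving parallel efficiency during the transform, in order to achieve fast parallel computation of the wavelet transform. Substituting the FFT matrix into the computation eliminates synchronous communication during parallel computation and reduces the number of multiplications. A theoretical analysis of the algorithm shows that, for short data segments, the new algorithm can reduce multiplications by 50% to 75%. Comparative tests on two different platforms demonstrate that the algorithm is advanced and effective. The FFT-matrix-based parallel wavelet transform is a stable and effective parallel algorithm for the classical wavelet transform.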

11.
Many abstractions of program dependences have already been proposed, such as the Dependence Distance, the Dependence Direction Vector, the Dependence Level or the Dependence Cone. These different abstractions have different precisions. The minimal abstraction associated to a transformation is the abstraction that contains the minimal amount of information necessary to decide when such a transformation is legal. Minimal abstractions for loop reordering and unimodular transformations are presented. As an example, the dependence cone, which approximates dependences by a convex cone of the dependence distance vectors, is the minimal abstraction for unimodular transformations. It also contains enough information for legally applying all loop reordering transformations and finding the same set of valid mono- and multi-dimensional linear schedules as the dependence distance set.

12.
Linear transformations are widely used to vectorize and parallelize loops. A subset of these transformations are unimodular transformations. When a unimodular transformation is used, the exact bounds of the transformed loop nest are easily computed and the steps of the loops are equal to 1. Unimodular loop transformations have been widely used since they permit the implementation of many useful loop transformations. Recently, nonunimodular transformations have been proposed to reduce communication requirements or to use the memory hierarchy efficiently. The methods used for unimodular transformations do not work in the case of nonunimodular transformations, since they do not produce the exact bounds of the transformed loop nest. In this paper, we present a method for nested loop transformation which gives the exact bounds for both unimodular and nonunimodular transformations. The basic idea is to use the Hermite Normal Form (HNF) of the transformation matrix.
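
The difficulty with nonunimodular transformations can be seen on a small assumed example. The sketch below does not implement the HNF-based method of the paper. It only compares, for a 2x2 integer matrix applied to an n-by-n iteration space, the set of transformed iterations with the set of all integer points inside the transformed region. The function name `image_and_polytope_points` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def image_and_polytope_points(T, n):
    """Return (image, polytope) for a nonsingular 2x2 integer matrix T.

    image:    T applied to every iteration (i, j) with 0 <= i, j < n.
    polytope: every integer point of the transformed region, i.e. every
              integer (x, y) whose rational preimage lies in the box.
    """
    (a, b), (c, d) = T
    det = a * d - b * c
    image = {(a * i + b * j, c * i + d * j)
             for i, j in product(range(n), repeat=2)}
    xs = [p[0] for p in image]
    ys = [p[1] for p in image]
    polytope = set()
    for x, y in product(range(min(xs), max(xs) + 1),
                        range(min(ys), max(ys) + 1)):
        # Exact preimage through the adjugate: T^{-1} = adj(T) / det.
        i = Fraction(d * x - b * y, det)
        j = Fraction(-c * x + a * y, det)
        if 0 <= i <= n - 1 and 0 <= j <= n - 1:
            polytope.add((x, y))
    return image, polytope
```

With a unimodular matrix such as ((1, 1), (0, 1)) the two sets coincide, so unit-step loops over the transformed bounds visit exactly the right points. With ((2, 0), (0, 1)), whose determinant is 2, the transformed region contains integer points that no original iteration maps to, so unit-step loops would execute spurious iterations.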

13.
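Reduction recognition and its unimodular transformation   Total citations: 1, self-citations: 0, citations by others: 1  
Recognizing array reductions is an effective way to improve the capability of parallelizing compilers, and unimodular transformation is an important means of exploiting program parallelism. However, the special nature of the dependences between reduction statements hinders the application of unimodular transformations. Starting from the essential characteristics of the dependences caused by reduction statements, this paper analyzes the interaction between reduction statements and unimodular transformations and proposes a concrete method for applying unimodular transformations in the presence of reduction statements.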

14.
Loop transformations, such as loop interchange, reversal and skewing, have been unified under linear matrix transformations. A legal transformation matrix is usually generated based upon distance vectors or direction vectors. Unfortunately, for some nested loops, distance vectors may not be computable and direction vectors, on the other hand, may not contain useful information. We propose the use of linear equations or inequalities of distance vectors to approximate data dependence. This approach is advantageous since (1) many loops having no constant distance vectors have very simple equations of distance vectors; (2) these equations contain more information than direction vectors do, thus the chance of exploiting potential parallelism is improved. In general, the equations or inequalities that approximate the data dependence of a given nested loop are not unique, hence classification is discussed for the purpose of loop transformation. Efficient algorithms are developed to generate all kinds of linear equations of distance vectors for a given nested loop. The issue of how to obtain a desired transformation matrix from those equations is also addressed.

15.
A general method for the identification of the independent subsets in loops with constant dependence vectors is presented. It is shown that the dependence relation remains invariant under a unimodular transformation. Then a unimodular transformation is used to bring the dependence matrix into a form where the independent subsets are obtained by a direct and inexpensive partitioning algorithm. This leads to a procedure for the automatic conversion of a serial loop into a nest of parallel DO-ALL loops. Another unimodular transformation results in an algorithm to label the dependent iterations of an n-fold nested loop in O(n^2) time. This provides a multithreaded dynamic scheduling scheme requiring only one fork and one join primitive.
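
What an "independent subset" means can be shown by brute force on a small iteration space. The sketch below is not the unimodular partitioning algorithm of the abstract. It swaps in a plain union-find over all iterations, which is far more expensive but yields the same kind of partition for a finite loop nest with constant dependence vectors. The function name `independent_subsets` is hypothetical.

```python
from itertools import product

def independent_subsets(shape, distances):
    """Group the iterations of a rectangular loop nest into subsets that
    share no dependence with each other (brute-force union-find)."""
    points = list(product(*(range(n) for n in shape)))
    parent = {p: p for p in points}

    def find(p):
        # Follow parent links to the representative, halving the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for p in points:
        for d in distances:
            q = tuple(x + y for x, y in zip(p, d))
            if q in parent:                 # q is inside the iteration space
                parent[find(p)] = find(q)   # p and q are dependent: merge

    groups = {}
    for p in points:
        groups.setdefault(find(p), []).append(p)
    return sorted(groups.values())
```

For a 4-by-4 nest with dependence vectors (2, 0) and (0, 2), the iterations split into four subsets by the parity of (i, j), and each subset can run as an independent task.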

16.
Many of the applications of polynomial matrices in real world systems require column‐ or diagonally‐reduced polynomial matrices. If a given polynomial matrix is not column‐ or diagonally‐reduced, Callier or Wolowich algorithms, which use unimodular transformations, can be applied for column‐ or diagonal‐reduction, respectively, as a pre‐processing step in the applications. However, Callier and Wolowich algorithms may be unstable, from a numerical viewpoint, because they use elementary column and row operations. The purpose of this paper is to present sufficient conditions for existence of a constant orthogonal transformation of the given polynomial matrix so that it becomes column‐ or diagonally‐reduced. Copyright © 2008 John Wiley and Sons Asia Pte Ltd and Chinese Automatic Control Society

17.
Lamport's parallelization algorithm (cf. [7]) is generalized to a broader class of loops, and the complexity of the transformation process has been estimated. It is shown that every loop can be parallelized using methods similar to those in [7]; moreover, they also have the property that all their inner loops are devoid of data dependencies, and so are fully parallelizable. Unfortunately, without restricting the nature of the loop to be parallelized, the negative solution to Hilbert's tenth problem (cf. [3]) can be applied to show that the parallelizing transformations are not computable. The class of affine loops was therefore introduced. This class is more general than that considered by Lamport, and it is shown that parallelizing transformations for affine loops are computable. In general, however, the complexity estimates for finding such loops suggest that the parallelization procedure will take longer than executing the original loop sequentially. It is further shown that, if the loop satisfies an additional, nondegeneracy condition, then the loop can be efficiently transformed.

Finally, although more generally applicable, these methods are best applied to vectorization problems.


18.
It is shown how self-resolving clauses like symmetry or transitivity, or even clauses like condensed detachment, can faithfully be deleted from the clause set, thus eliminating or at least reducing recursiveness and circularity in clause sets. Possible applications are reducing the search space for automated theorem provers, eliminating loops in Prolog programs, parallelizing simple closure computation algorithms, and supporting automated complexity analysis.

