首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Linear loop transformations and tiling are known to be very effective for enhancing locality of reference in perfectly-nested loops. However, they cannot be applied directly to imperfectly-nested loops. Some compilers attempt to convert imperfectly-nested loops into perfectly-nested loops by using statement sinking, loop fusion, etc., and then apply locality enhancing transformations to the resulting perfectly-nested loops, but the approaches used are fairly ad hoc and may fail even for simple programs. In this paper, we present a systematic approach for synthesizing transformations to enhance locality in imperfectly-nested loops. The key idea is to embed the iteration space of each statement into a special iteration space called the product space. The product space can be viewed as a perfectly-nested loop nest, so embedding generalizes techniques like statement sinking and loop fusion which are used in ad hoc ways in current compilers to produce perfectly-nested loops from imperfectly-nested ones. In contrast to these ad hoc techniques however, our embeddings are chosen carefully to enhance locality. The product space can itself be transformed to increase locality further, after which fully permutable loops can be tiled. The final code generation step may produce imperfectly-nested loops as output if that is desirable. We present experimental evidence for the effectiveness of this approach, using dense numerical linear algebra benchmarks, relaxation codes, and the tomcatv code from the SPEC benchmarks.  相似文献   

2.
利用U模变换增加并行粒度与改善数据访问局部性的方法   总被引:3,自引:0,他引:3  
提出了一种利用循环变换增加循环并行粒度,改善循环数据访问局部性的方法,该方法利用了给定二重循环的相关向量集的某些性质,将外层循环变量不同而内层循环变量相等的若干次迭代合并,成为折叠后迭代空间的一个结点,并且保持内层循环的并行性不变,从而达到增加循环并行粒度的目的。对于更普遍的情况,该文讨论了如何根据给定循环的循环向量集,确定一个U模变换对迭代空间进行变换,达到内层循环可并行和扩大循环粒度两个目的,针对循环变换中数据访问局部性可能变差的问题,该文提出了对内层循环先合并,根据合并后的相关向量集变换迭代空间,以及折叠迭代空间的方法,该文的方法是Wavefront循环并行化方法的一种扩展。  相似文献   

3.
基于线性表出的非奇异循环变换局部性优化方法   总被引:1,自引:0,他引:1  
夏军  戴华东  杨学军 《计算机学报》2003,26(12):1609-1620
开发程序的局部性是当今并行编译优化研究的重点之一,而程序变换是开发程序时间局部性和空间局部性的重要手段之一.该文提出了一种新的利用非奇异循环变换来优化程序局部性的局部性优化方法,即基于线性表出的循环变换.该方法利用一组最少的线性无关向量组来线性表出数组访问的下标表达式,并据此构造非奇异变换矩阵来优化数组访问的时间局部性和空间局部性.该方法能充分开发数组访问的时间局部性,能简便地确定是否能对数组访问进行时间局部性或空间局部性优化,并能对给定的嵌套循环同时进行时间局部性和空间局部性优化.实验结果表明了该文所提出的基于线性表出的非奇异循环变换局部性优化方法是有效的.  相似文献   

4.
Loop tiling (or loop blocking) is a well-known loop transformation to improve temporal locality in nested loops which perform matrix computations. When targeting caches that have low associativities, one of the key challenges for loop tiling is to simultaneously minimize capacity misses and conflict misses. This paper analyzes the effect of the tile size and the array-dimension size on capacity misses and conflict misses. The analysis supports the approach of combining tile-size selection (to minimize capacity misses) with array padding (to minimize conflict misses).  相似文献   

5.
The exploitation of today's high-performance computer systems requires the effective use of parallelism in many forms and at numerous levels. This survey article discusses program analysis and restructuring techniques that target parallel architectures. We first describe various categories of architectures that are oriented toward parallel computation models: vector architectures, shared-memory multiprocessors, massively parallel machines, message-passing architectures, VLIWs, and multithreaded architectures. We then describe a variety of optimization techniques that can be applied to sequential programs to effectively utilize the vector and parallel processing units. After an overview of basic dependence analysis, we present restructuring transformations on DO loops targeted both to vectorization and to concurrent execution, interprocedural and pointer analysis, task scheduling, instruction-level parallelization, and compiler-assisted data placement. We conclude that although tremendous advances have been made in dependence theory and in the development of a toolkit of transformations, parallel systems are used most effectively when the programmer interacts in the optimization process.  相似文献   

6.
数据网格已逐步在科学研究领域得到应用.提高数据网格的性能以适应分布式数据管理已经成为研究数据网格的一个热点.提出了网格局部性的概念,分析了网格局部性对数据网格性能的影响,并从增强网格局部性的角度对数据网格的性能进行优化,提出了综合跳一扩散副本替换策略(jump-DRP)和参考生物外激素的任务调度策略(JARIP).实验结果表明,考虑了网格局部性因素的jump-DRP与JARIP的策略组合提高了网格平台的任务处理性能,并对各类应用背景及任务的复杂程度具有鲁棒性.  相似文献   

7.
Microprocessor speed has been growing exponentially faster than memory system speed in the recent past. This paper explores the long term implications of this trend. We define scalable locality, which measures our ability to apply ever faster processors to increasingly large problems (just as scalable parallelism measures our ability to apply more numerous processors to larger problems). We provide an algorithm called time skewing that derives an execution order and storage mapping to produce any desired degree of locality, for certain programs that can be made to exhibit scalable locality. Our approach is unusual in that it derives the transformation from the algorithm's dataflow (a fundamental characteristic of the algorithm) instead of searching a space of transformations of the execution order and array layout used by the programmer (artifacts of the expression of the algorithm). We provide empirical results for data sets using L2 cache, main memory, and virtual memory.  相似文献   

8.
In this paper we describe a COBOL-program restructuring tool currently under development. The tool is constructed using program transformations executed by the TAMPR program transformation system. We discuss the COBOL knowledge embodied in the transformations and how they restructure an example COBOL program developed in the mid–1970s. While the tool is not yet a robust commercial product, early use for restructuring COBOL programs demonstrates the power and flexibility of this transformation–based approach.  相似文献   

9.
Hardware and software cache optimizations are active fields of research, that have yielded powerful but occasionally complex designs and algorithms. The purpose of this paper is to investigate the performance of combined through simple software and hardware optimizations. Because current caches provide little flexibility for exploiting temporal and spatial locality, two hardware modifications are proposed to support these two kinds of locality. Spatial locality is exploited by using large virtual cache lines which do not exhibit the performance flaws of large physical cache lines. Temporal locality is exploited by minimizing cache pollution with a bypass mechanism that still allows to exploit spatial locality. Subsequently, it is shown that simple software informations on the spatial/temporal locality of array references, as provided by current data locality optimization algorithms, can be used to increase cache performance significantly. The performance and design tradeoffs of the proposed mechanisms are discussed, Software-assisted caches are also shown to provide a very convenient support for further enhancement of data locality optimizations.  相似文献   

10.
给出与平台无关的局部性量化方法,从空间局部性和时间局部性2个角度,量化SPEC2000测试基准程序,以及这些程序的数据段、代码段和堆栈段。时间和空间局部性组成的二维局部性分布直观地展示了基准测试程序的局部性。实验结果表明,程序数据局部性主要由堆段的局部性决定,堆段的局部性最差,栈的局部性最优。  相似文献   

11.
面对日益增长的图像数据库,为用户提供一个简洁高效的搜索和浏览解决方案成为一个紧迫而且充满挑战的问题.图像聚类技术可以在许多方面为此提供帮助,例如图像数据预处理、用户界面设计.以及对搜索结果的聚类等.在众多聚类算法中,谱聚类(spectral clustering)方法由于能够解决复杂分布数据的聚类问题,以及接近全局最优的性能,成为近年来广受关注的一种方法.然而,目前存在的谱聚类方法,譬如normalized cut在处理新增数据点的聚类时,计算复杂度很高.提出了一种新的聚类算法——保局聚类.保局聚类在拥有许多非线性谱聚类方法优点的同时,又具有独特的数学特性——能提供显式的映射函数.这为在原数据集和新增数据集上进行高效的聚类提供了可能.实验结果显示,保局聚类比K均值聚类和主成分分析后的K均值聚类效果要好.实验同样显示,保局聚类与normalized cut效果可比,而前者更加高效.  相似文献   

12.
一种利用数据融合来提高局部性和减少伪共享的方法   总被引:6,自引:0,他引:6  
某些应用程序不能通过数组内元素的重排优化获得性能提高 .针对这一问题 ,该文扩展了数组之间数据重组优化方法 ,着重分析了将多个数组的数据按一定方式进行融合来提高局部性和减少伪共享优化方法的特性 .文章针对几种典型的数组关联模式 ,提出了相应的数据融合方法 ,并建立了一组粗略的性能代价判别规则 ,以指导编译器有选择地融合数组以提高程序的全局优化效果 .根据在多个平台上的测试结果 ,该文还分析了数据融合优化方法在不同体系结构上的性能可移植性 ,并将体系结构特征加入到性能代价判别规则中 ,使得此优化方法能适用于不同的体系结构 .测试结果表明 ,数据融合优化方法对提高某些应用程序的性能 ,尤其是其在软件DSM体系结构上的性能 ,是非常有效的  相似文献   

13.
可重用本体模块的抽取是本体重用的一个关键环节.与传统工程应用中使用的基于本体层次的结构化方法抽取本体模块相比,使用逻辑的方法能充分利用本体提供的语义信息,抽取的本体模块更具完整性和正确性.在研究保守扩展的本体模块理论基础上,根据Grau B C提出的()本地性规则,提出并证明了描述逻辑()对应的语义本地性规则和句法本地性规则,为基于该规则抽取可重用本体模块提供了理论基础.  相似文献   

14.
基于投影分层技术的嵌套循环空间局部性优化方法   总被引:3,自引:0,他引:3  
从数据访问轨迹入手,探讨了利用数据变换来改善数据访问局部性的本质,提出了一种新的优化数据访问的投影分层技术以及基于它的数据变换框架.该框架主要利用投影技术来优化数据访问的空间局部性,并同时利用数据分层技术来解决因投影而带来的数据重叠问题.该数据变换框架不仅能处理仿射数组下标,而且还能处理许多非仿射的更复杂的数组下标,同时它还能简单直接地确定数据元素的最优存储布局以及优化数据访问的数据变换短阵,并能使访问间距尽量小.实验结果表明它是有效的.  相似文献   

15.
可重用本体模块的抽取是本体重用的一个关键环节。与传统工程应用中使用的基于本体层次的结构化方法抽取本体模块相比,使用逻辑的方法能充分利用本体提供的语义信息,抽取的本体模块更具完整性和正确性。在研究保守扩展的本体模块理论基础上,根据Grau B C提出的 SHOJQ 本地性规则,提出并证明了描述逻辑SHJF对应的语义本地性规则和句法本地性规则,为基于该规则抽取可重用本体模块提供了理论基础。  相似文献   

16.
Program restructuring in a segmented virtual memory system is examined on the base of empirical data. The measure of connectivity is based on the decrease of the space-time integral caused by combining the two segments. A heuristic clustering algorithm, based on the use of effectively reduced memory reference strings, is presented. It turns out that performance problems in a segmented system are created by the large number of small segments. Restructuring is able to improve the performance of the system essentially; the optimal performance is reached with values of the memory control parameter which are one magnitude smaller than those applicable to original programs. The clusters formed do not correspond to program phases, but to smaller units of program execution; thus the clustering result supports a hierarchical model of program locality. The restructuring process is reasonably insensitive both to the memory control parameters and to the input data used.  相似文献   

17.
完备鉴别保局投影人脸识别算法   总被引:15,自引:0,他引:15  
为了充分利用保局总体散布主元空间内的鉴别信息进行人脸识别,提出了一种完备鉴别保局投影(complete discriminant locality preserving projections,简称CDLPP)人脸识别算法.鉴于Fisher鉴别分析和保局投影已经被广泛的应用于人脸识别,完备鉴别保局投影(locality preserving projections,简称LPP)算法将这两者结合起来,分析了保局类内散布、类间散布和总体散布的主元空间和零空间内包含的鉴别信息.该算法采用奇异值分解(singular value decomposition,简称SVD),去除了不含任何鉴别信息的保局总体散布的零空间;分别在保局类内散布的主元空间和零空间提取规则鉴别特征和不规则鉴别特征;用串联的方式在特征层融合规则鉴别特征和不规则鉴别特征形成完备的鉴别特征进行人脸识别.在ORL库、FERET子库和PIE子库上的大量识别实验充分表明了完备鉴别保局投影算法的性能优于线性鉴别分析、保局投影和鉴别保局投影等现有的子空间人脸识别算法,验证了算法的有 效性.  相似文献   

18.
Graph embedding (GE) is a unified framework for dimensionality reduction techniques. GE attempts to maximally preserve data locality after embedding for face representation and classification. However, estimation of true data locality could be severely biased due to limited number of training samples, which trigger overfitting problem. In this paper, a graph embedding regularization technique is proposed to remedy this problem. The regularization model, dubbed as Locality Regularization Embedding (LRE), adopts local Laplacian matrix to restore true data locality. Based on LRE model, three dimensionality reduction techniques are proposed. Experimental results on five public benchmark face datasets such as CMU PIE, FERET, ORL, Yale and FRGC, along with Nemenyi Post-hoc statistical of significant test attest the promising performance of the proposed techniques.  相似文献   

19.
一种基于Schur分解的正交鉴别局部保持投影方法   总被引:2,自引:0,他引:2       下载免费PDF全文
人脸识别是模式识别领域中的一项重要的研究课题。到目前为止,已经提出了许多方法来处理人脸的识别问题。最近,许多流形学习算法被提出并且成功地应用于人脸识别当中。这些流形学习方法能够保持人脸图像数据的局部结构,同时,还可以发现人脸的非线性结构。在这些流形学习方法中,局部保持投影方法(LPP)是最有效的方法之一。基于LPP方法,提出了一种新的人脸识别方法——基于Schur分解的正交鉴别局部保持投影方法(ODLPPS)。与LPP方法相比,ODLPPS 把类间散度与类内散度之差的信息融入到LPP的目标函数中并且获得了正交的基向量。在ORL和Yale 人脸数据库上的实验结果表明,该方法在识别性能上优于一些已经存在的方法,如eigenface,Fisherface,LPP 和orthogonal LPP(OLPP)。  相似文献   

20.
This paper presents source-level transformations that improve the performance of programs using synchronous and asynchronous message passing primitives, including remote call to an active process (rendezvous). It also discusses the applicability of these transformations to shared memory and distributed environments. The transformations presented reduce the need for context switching, simplify the specific form of communication, and/or reduce the complexity of the given form of communication. One additional transformation actually increases the number of processes as well as the number of context switches to improve program performance. These transformations are shown to be generalizable. Results of hand-applying the transformations to SR programs indicate reductions in execution time exceeding 90% in many cases. The transformations also apply to many commonly occurring synchronization patterns and to other concurrent programming languages such as Ada and Concurrent C. The long term goal of this effort is to include such transformations as an otpimization step, performed automatically by a compiler.This work was supported by NSF under Grant Number CCR88-10617.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号