期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Automatic tuning of whole applications using direct search and a performance-based transformation system

Apan Qasem Ken Kennedy John Mellor-Crummey 《The Journal of supercomputing》2006,36(2):183-196

In many cases, simple analytical models used by traditional compilers are no longer able to yield effectively optimized code for complex programs because of the enormous complexity of processor architectures. A promising alternative approach for optimizing applications effectively has been the use of search-based empirical methods. The success of empirically tuned library generators such as ATLAS has shown that this strategy can be effective for domain-specific programs. However, to date there has been no general-purpose tool for effective empirical optimization of whole programs. The main obstacle to this approach has been the need for evaluating a prohibitively large number of alternative program variants. To address this problem, we have developed a prototype tool for automatic application tuning that uses loop-level performance feedback and a direct search strategy to guide search for the best set of optimization parameters. Experiments on four different architectures show that direct search can be an effective technique for finding good values for transformation parameters in a reasonable time. This material is based on work supported by the Department of Energy under Contract Nos. 03891-001-99-4G, 74837-001-03 49, 86192-001-04 49, and/or 12783-001-05 49 from the Los Alamos National Laboratory. 相似文献

2.

An experiment with inline substitution

Keith D. Cooper Mary W. Hall Linda Torczon 《Software》1991,21(6):581-601

This paper describes an experiment undertaken to evaluate the effectiveness of inline substitution as a method of improving the running time of compiled code. Our particular interests are in the interaction between inline substitution and aggressive code optimization. To understand this relationship, we used commercially available FORTRAN optimizing compilers as the basis for our study. This paper reports on the effectiveness of the various compilers at optimizing the inlined code. We examine both the runtime performance of the resulting code and the compile-time performance of the compilers. This work can be viewed as a study of the effectiveness of inlining in modern optimizers; alternatively, it can be viewed as one data point on the overall effectiveness of modern optimizing compilers. We discovered that, with optimizing FORTRAN compilers, (1) object-code growth from inlining is substantially smaller than source-code growth, (2) compile-time growth from inlining is smaller than source-code growth, and (3) the compilers we tested were not able to capitalize consistently on the opportunities presented by inlining. 相似文献

3.

Model-driven auto-scaling of green cloud computing infrastructure

Brian DoughertyAuthor Vitae Jules White^{Author Vitae} 《Future Generation Computer Systems》2012,28(2):371-378

Cloud computing can reduce power consumption by using virtualized computational resources to provision an application’s computational resources on demand. Auto-scaling is an important cloud computing technique that dynamically allocates computational resources to applications to match their current loads precisely, thereby removing resources that would otherwise remain idle and waste power. This paper presents a model-driven engineering approach to optimizing the configuration, energy consumption, and operating cost of cloud auto-scaling infrastructure to create greener computing environments that reduce emissions resulting from superfluous idle resources. The paper provides four contributions to the study of model-driven configuration of cloud auto-scaling infrastructure by (1) explaining how virtual machine configurations can be captured in feature models, (2) describing how these models can be transformed into constraint satisfaction problems (CSPs) for configuration and energy consumption optimization, (3) showing how optimal auto-scaling configurations can be derived from these CSPs with a constraint solver, and (4) presenting a case study showing the energy consumption/cost reduction produced by this model-driven approach. 相似文献

4.

A program auto-parallelizer based on the component technology of optimizing compiler construction

A. Yu. Drozdov S. V. Novikov 《Programming and Computer Software》2009,35(6):321-339

This paper describes a program auto-parallelizer that is based on the component approach to constructing optimizing compilers; the parallelizer is included in the technological chain of gcc. Details of using analytical and optimization components for constructing an auto-parallelizer and a parallelization algorithm using the OpenMP library are considered. Finally, we discuss the results of operation of the auto-parallelizer in terms of performance on a subset of problems in the Spec2006 and NAS parallel benchmarks packages. 相似文献

5.

结合模型和迭代编译优化矩阵相乘程序

陆平静王正华车永刚《计算机工程与科学》2009,31(Z1)

高性能计算应用程序获得的持续性能与机器峰值性能的差距日益扩大,很大程度上制约着高性能计算的发展。程序变换通过对程序进行适应机器体系结构特征的优化变换,提高程序实际执行性能,是解决该问题的有效途径之一。很多高级程序变换均具有数值参数,为了获得最优性能,需要仔细选择参数的值。传统的编译器使用简单的模型选择这些参数,难以适应日趋复杂的硬件平台和应用程序。迭代编译通过生成不同的程序版本并在实际硬件评估上运行程序,来评估关键优化参数的值并决定能够产生最优性能的值,显著优于静态方法,但巨大的优化开销限制了其应用范围。本文针对矩阵相乘程序提出一种结合性能模型和迭代编译的优化方法,利用基于对机器体系结构和程序的经验知识构造性能模型约束优化空间,并使用遗传算法加速在优化空间中寻找优秀解的过程。实验结果表明,该方法可以较低的开销获得更优的性能优化效果。相似文献

6.

Model-driven design space exploration for multi-robot systems in simulation

Harbin James Gerasimou Simos Matragkas Nicholas Zolotas Thanos Calinescu Radu Alpizar Santana Misael 《Software and Systems Modeling》2023,22(5):1665-1688

Multi-robot systems are increasingly deployed to provide services and accomplish missions whose complexity or cost is too high for a single robot to achieve on its own. Although multi-robot systems offer increased reliability via redundancy and enable the execution of more challenging missions, engineering these systems is very complex. This complexity affects not only the architecture modelling of the robotic team but also the modelling and analysis of the collaborative intelligence enabling the team to complete its mission. Existing approaches for the development of multi-robot applications do not provide a systematic mechanism for capturing these aspects and assessing the robustness of multi-robot systems. We address this gap by introducing ATLAS, a novel model-driven approach supporting the systematic design space exploration and robustness analysis of multi-robot systems in simulation. The ATLAS domain-specific language enables modelling the architecture of the robotic team and its mission and facilitates the specification of the team’s intelligence. We evaluate ATLAS and demonstrate its effectiveness in three simulated case studies: a healthcare Turtlebot-based mission and two unmanned underwater vehicle missions developed using the Gazebo/ROS and MOOS-IvP robotic platforms, respectively.

相似文献

7.

Model-driven generative development of measurement software

Martin Monperrus Jean-Marc J��z��quel Benoit Baudry Jo?l Champeau Brigitte Hoeltzener 《Software and Systems Modeling》2011,10(4):537-552

Metrics offer a practical approach to evaluate properties of domain-specific models. However, it is costly to develop and maintain measurement software for each domain-specific modeling language. In this paper, we present a model-driven and generative approach to measuring models. The approach is completely domain-independent and operationalized through a prototype that synthesizes a measurement infrastructure for a domain-specific modeling language. This model-driven measurement approach is model-driven from two viewpoints: (1) it measures models of a domain-specific modeling language; (2) it uses models as unique and consistent metric specifications, with respect to a metric specification metamodel which captures all the necessary concepts for model-driven specifications of metrics. The benefit from applying the approach is evaluated by four case studies. They indicate that this approach significantly eases the measurement activities of model-driven development processes. 相似文献

8.

Global Register Allocation for SIMD Multiprocessors

下载免费PDF全文

Benjamin HAO David PEARSON Richard ZIPPEL 《计算机科学技术学报》1996,11(3):222-236

It is relatively clear how to map regular,repetitive or grid oriented computations onto SIMD architectures.It is not so clear,however,how to do this for irregular computations even though there may be significant amounts of intrinsic parallelism in branch free code.We study compilation techniques for this type of code when targeted to SIMD computers and illustrate their use on a simple model architecture.In this paper,we present one of the compilation techniques,global register allocation,we have developed for SIMD computers,and demonstrate that it can effectively allocate registers for parallelizing irregular computations in branch free code.This technique is an extension and a modification of the register allocation via graph coloring approach used by sequential compilers.Our performance results validate our method. 相似文献

9.

Parallelizing irregular and pointer-based computations automatically: Perspectives from logic and constraint programming

Manuel Hermenegildo 《Parallel Computing》2000,26(13-14)

Irregular computations pose some of the most interesting and challenging problems in automatic parallelization. Irregularity appears in certain kinds of numerical problems and is pervasive in symbolic applications. Such computations often use dynamic data structures, which make heavy use of pointers. This complicates all the steps of a parallelizing compiler, from independence detection to task partitioning and placement. Starting in the mid 80s there has been significant progress in the development of parallelizing compilers for logic programming (and more recently, constraint programming) resulting in quite capable parallelizers. The typical applications of these paradigms frequently involve irregular computations, and make heavy use of dynamic data structures with pointers, since logical variables represent in practice a well-behaved form of pointers. This arguably makes the techniques used in these compilers potentially interesting. In this paper, we introduce in a tutorial way, some of the problems faced by parallelizing compilers for logic and constraint programs and provide pointers to some of the significant progress made in the area. In particular, this work has resulted in a series of achievements in the areas of inter-procedural pointer aliasing analysis for independence detection, cost models and cost analysis, cactus-stack memory management, techniques for managing speculative and irregular computations through task granularity control and dynamic task allocation (such as work-stealing schedulers), etc. 相似文献

10.

Java性能优化技术综述

王会进龙舜《小型微型计算机系统》2008,29(4):720-725

Java由于其简单、面向对象、独立于硬件体系结构、安全等特点在各种应用领域内获得广泛的应用,但在很多情况下其运行性能仍有待提高.优化Java应用的运行性能成为当前业界迫切要解决的问题和当前研究的热点.本文简要回顾了当前在Java性能优化方面的最新研究成果,对其中的关键技术进行了深入探讨,并结合作者的经验提出对未来发展的一些看法. 相似文献

11.

AutoConfig: 面向深度学习编译优化的自动配置机制

张洪滨周旭林邢明杰武延军赵琛《软件学报》2024,35(6)

随着深度学习模型和硬件架构的快速发展,深度学习编译器已经被广泛应用.目前,深度学习模型的编译优化和调优的方法主要依赖基于高性能算子库的手动调优和基于搜索的自动调优策略.然而,面对多变的目标算子和多种硬件平台的适配需求,高性能算子库往往需要为各种架构进行多次重复实现.此外,现有的自动调优方案也面临着搜索开销大和缺乏可解释性的挑战.为了解决上述问题,本文提出了AutoConfig,一种面向深度学习编译优化的自动配置机制.针对不同的深度学习计算负载和特定的硬件平台,AutoConfig可以构建具备可解释性的优化算法分析模型,采用静态信息提取和动态开销测量的方法进行综合分析,并基于分析结果利用可配置的代码生成技术自动完成算法选择和调优.本文创新性地将优化分析模型与可配置的代码生成策略相结合,不仅保证了性能加速效果,还减少了重复开发的开销,同时简化了调优过程.在此基础上,本文进一步将AutoConfig集成到深度学习编译器Buddy Compiler中,对矩阵乘法和卷积的多种优化算法建立分析模型,并将自动配置的代码生成策略应用在多种SIMD硬件平台上进行评估.实验结果验证了AutoConfig在代码生成策略中有效地完成了参数配置和算法选择.与经过手动或自动优化的代码相比,由AutoConfig生成的代码可达到相似的执行性能,并且无需承担手动调优的重复实现开销和自动调优的搜索开销. 相似文献

12.

模型驱动的嵌入式系统设计与性能优化 总被引：3，自引：0，他引：3

栾静顾君忠《计算机工程与应用》2006,42(14):114-117

直接从规范需求描述入手,研究了嵌入式系统设计中的模型映射、代码自动生成、协同验证和性能优化等关键技术问题,提出了以模型驱动的嵌入式系统软硬件协同设计方法,使面向应用的嵌入式系统设计,在不同抽象层次上同步设计与验证,经过性能优化后得到RTL级SystemC代码表示的虚拟原型,介绍了一个应用实例。相似文献

13.

EGGS: Sparsity-Specific Code Generation

Xuan Tang Teseo Schneider Shoaib Kamil Aurojit Panda Jinyang Li Daniele Panozzo 《Computer Graphics Forum》2020,39(5):209-219

Sparse matrix computations are among the most important computational patterns, commonly used in geometry processing, physical simulation, graph algorithms, and other situations where sparse data arises. In many cases, the structure of a sparse matrix is known a priori, but the values may change or depend on inputs to the algorithm. We propose a new methodology for compile-time specialization of algorithms relying on mixing sparse and dense linear algebra operations, using an extension to the widely-used open source Eigen package. In contrast to library approaches optimizing individual building blocks of a computation (such as sparse matrix product), we generate reusable sparsity-specific implementations for a given algorithm, utilizing vector intrinsics and reducing unnecessary scanning through matrix structures. We demonstrate the effectiveness of our technique on a benchmark of artificial expressions to quantitatively evaluate the benefit of our approach over the state-of-the-art library Intel MKL. To further demonstrate the practical applicability of our technique we show that our technique can improve performance, with minimal code changes, for mesh smoothing, mesh parametrization, volumetric deformation, optical flow, and computation of the Laplace operator. 相似文献

14.

Quantum probabilistically cloning and computation

Ting Gao Fengli Yan Zhixi Wang Youcheng Li 《Frontiers of Computer Science in China》2008,2(2):179-189

In this article we make a review on the usefulness of probabilistically cloning and present examples of quantum computation tasks for which quantum cloning offers an advantage which cannot be matched by any approach that does not resort to it. In these quantum computations, one needs to distribute quantum information contained in states about which we have some partial information. To perform quantum computations, one uses state-dependent probabilistic quantum cloning procedure to distribute quantum information in the middle of a quantum computation. And we discuss the achievable efficiencies and the efficient quantum logic network for probabilistic cloning the quantum states used in implementing quantum computation tasks for which cloning provides enhancement in performance. 相似文献

15.

A framework for reliability-aware embedded system design on multiprocessor platforms

Jia Huang Simon Barner Andreas Raabe Christian Buckl Alois Knoll 《Microprocessors and Microsystems》2014

相似文献

16.

Parallelizing Operational Weather Forecast Models for Portable and Fast Execution

Bernardo Rodriguez Leslie Hart Tom Henderson 《Journal of Parallel and Distributed Computing》1996,37(2):159

This paper describes a high-level library (The Nearest Neighbor Tool, NNT) that has been used to parallelize operational weather prediction models. NNT is part of the Scalable Modeling System (SMS), developed at the Forecast Systems Laboratory (FSL). Programs written in NNT rely on SMS's run-time system and port between a wide range of computing platforms, performing well in multiprocessor systems. We show, using examples from operational weather models, how large Fortran 77 codes can be parallelized using NNT. We compare the ease of programmability of NNT and High Performance Fortran (HPF). We also discuss optimizations like data movement overlap (in interprocessor communication and I/O operations), and the minimization of data exchanges through the use of redundant computations. We show that although HPF provides a simpler programming interface, NNT allows for program optimizations that increase performance considerably and still keeps a simple user interface. These optimizations have proven essential to run weather prediction models in real time, and HPF compilers should incorporate them in order to meet operational demands. Throughout the paper, we present performance results of weather models running on a network of workstations, the Intel Paragon, and the SGI Challenge. Finally, we study the cost of programming global address space architectures with NNT's local address space paradigm. 相似文献

17.

Handling Global Constraints in Compiler Strategy

Erven Rohou François Bodin Christine Eisenbeis André Seznec 《International journal of parallel programming》2000,28(4):325-345

To achieve high-performance on processors featuring ILP, most compilers apply locally a set of heuristics. This leads to a potentially high-performance on separate code fragments. Unfortunately, most optimizations also increase code size, which may lead to a global net performance loss. In this paper, we propose a Global Constraints-Driven Strategy (GCDS) for guiding code optimization. When using GCDS, the final code optimization decision is taken according to global criteria rather than local criteria. For instance, such criteria might be performance, code size, instruction cache behavior, etc. The performance/code size trade-off is a particularly important problem for embedded systems. We show how GCDS can be used to master code size while optimizing performance. 相似文献

18.

Using profile information to assist classic code optimizations

Pohua P. Chang Scott A. Mahlke Wen-Mei W. Hwu 《Software》1991,21(12):1301-1321

This paper describes the design and implementation of an optimizing compiler that automatically generates profile information to assist classic code optimizations. This compiler contains two new components, an execution profiler and a profile-based code optimizer, which are not commonly found in traditional optimizing compilers. The execution profiler inserts probes into the input program, executes the input program for several inputs, accumulates profile information and supplies this information to the optimizer. The profile-based code optimizer uses the profile information to expose new optimization opportunities that are not visible to traditional global optimization methods. Experimental results show that the profile-based code optimizer significantly improves the performance of production C programs that have already been optimized by a high-quality global code optimizer. 相似文献

19.

alto: a link‐time optimizer for the Compaq Alpha

Robert Muth Saumya K. Debray Scott Watterson Koen De Bosschere 《Software》2001,31(1):67-101

Traditional optimizing compilers are limited in the scope of their optimizations by the fact that only a single function, or possibly a single module, is available for analysis and optimization. In particular, this means that library routines cannot be optimized to specific calling contexts. Other optimization opportunities, exploiting information not available before link time, such as addresses of variables and the final code layout, are often ignored because linkers are traditionally unsophisticated. A possible solution is to carry out whole‐program optimization at link time. This paper describes alto , a link‐time optimizer for the Compaq Alpha architecture. It is able to realize significant performance improvements even for programs compiled with a good optimizing compiler with a high level of optimization. The resulting code is considerably faster than that obtained using the OM link‐time optimizer, even when the latter is used in conjunction with profile‐guided and inter‐file compile‐time optimizations. Copyright © 2001 John Wiley & Sons, Ltd. 相似文献

20.

A new approach to pointer analysis for assignments

下载免费PDF全文

黄波臧斌宇等《计算机科学技术学报》2001,16(3):0-0

Pointer analysis is a technique to identify at copile-time the potential values of the pointer expressions in a program,which promises significant benefits for optimzing and parallelizing complilers.In this paper,a new approach to pointer analysis for assignments is presented.In this approach,assignments are classified into three categories:pointer assignments,structure(union)assignents and normal assignments which don‘t affect the point-to information.Pointer analyses for these three kinds of assignments respectively make up the integrated algorithm.When analyzing a pointer assigemtn.a new method called expression expansion is used to calculate both the left targets and the right targets.The integration of recursive data structure analysis into pointer analysis is a significant originality of this paper,which uniforms the pointer analysis for heap variables and the pointer analysis for stack variables.This algorithm is implemented in Agassiz,an analyzing tool for C programs developed by Institute of Parallel Processing,Fudan University,Its accuracy and effectiveness are illustrated by experimental data. 相似文献