Similar Documents
20 similar documents found.
1.
In this paper we present our experience in developing an optimizing compiler for general-purpose computation on graphics processing units (GPGPU) based on the Cetus compiler framework. The input to our compiler is a naive GPU kernel procedure, which is functionally correct but written without any consideration for performance. Our compiler applies a set of optimization techniques to the naive kernel and generates an optimized GPU kernel, supporting kernels that use either global memory or texture memory. The implementation is facilitated by the Cetus source-to-source compiler infrastructure, in which each code transformation is called a pass. We classify the passes used in our work into two categories: functional passes and optimization passes. The functional passes translate input kernels into the desired intermediate representation, which explicitly represents memory access patterns and thread configurations. A series of optimization passes then improve the performance of the kernels by adapting them to the target GPGPU architecture. Our experiments show that the optimized code achieves very high performance, either superior or very close to that of highly fine-tuned libraries.
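For illustration, a minimal sketch of the two pass categories the abstract describes, with hypothetical pass names and a toy dictionary IR; it is not the authors' Cetus implementation, only the pipeline shape (functional passes normalize the IR, optimization passes then transform it):

```python
# Hypothetical sketch of a two-stage pass pipeline; names and the IR are
# illustrative, not the authors' Cetus code.

class Pass:
    def run(self, ir: dict) -> dict:
        raise NotImplementedError

class NormalizeAccessPatterns(Pass):
    """Functional pass: annotate each array access with its stride."""
    def run(self, ir):
        ir["accesses"] = [{"array": a, "stride": 1} for a in ir.get("arrays", [])]
        return ir

class CoalesceGlobalLoads(Pass):
    """Optimization pass: mark unit-stride accesses as coalesced."""
    def run(self, ir):
        for acc in ir.get("accesses", []):
            acc["coalesced"] = (acc["stride"] == 1)
        return ir

def compile_kernel(ir, functional_passes, optimization_passes):
    # Functional passes build the normalized IR first, then optimizations run.
    for p in functional_passes + optimization_passes:
        ir = p.run(ir)
    return ir

kernel_ir = {"name": "matmul", "arrays": ["A", "B", "C"]}
print(compile_kernel(kernel_ir, [NormalizeAccessPatterns()], [CoalesceGlobalLoads()]))
```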

2.
A certifying compiler takes a source language program and produces object code, as well as a certificate that can be used to verify that the object code satisfies desirable properties, such as type safety and memory safety. Certifying compilation helps to increase both compiler robustness and program safety. Compiler robustness is improved since some compiler errors can be caught by checking the object code against the certificate immediately after compilation. Program safety is improved because the object code and certificate alone are sufficient to establish safety: even if the object code and certificate are produced on an unknown machine by an unknown compiler and sent over an untrusted network, safe execution is guaranteed as long as the code and certificate pass the verifier. Existing work in certifying compilation has addressed statically generated code. In this paper, we extend this to code generated at run time. Our goal is to combine certifying compilation with run-time code generation to produce programs that are both fast and verifiably safe. To achieve this goal, we present two new languages with explicit run-time code generation constructs: Cyclone, a type-safe dialect of C, and TAL/T, a type-safe assembly language. We have designed and implemented a system that translates a safe C program into Cyclone, which is then compiled to TAL/T, and finally assembled into executable object code. This paper focuses on our overall approach and the front end of our system; details about TAL/T will appear in a subsequent paper.

3.
4.
We present the results of our experience in introducing modularity into the programming language Pascal in order to aid the creation and use of library modules. Our system performs the symbolic linking of source language modules, producing a single Pascal text ready for compilation; performing the link phase before compilation moves interface consistency checks earlier, and suggests a possible improvement to program development systems. Our extension is implemented in a preprocessor, which ensures complete compatibility with any standard Pascal compiler. In this paper we examine the main features of some high-level programming languages that support modularization and data abstraction, along with some experiences in introducing modularity into Pascal; on this basis we describe our choices in detail. The design and implementation details are discussed and some examples are presented.

5.
Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem
The problem of merging multiple databases of information about common entities is frequently encountered in KDD and decision support applications in large commercial and government organizations. The problem we study is often called the Merge/Purge problem and is difficult to solve both in scale and accuracy. Large repositories of data typically contain numerous duplicate entries about the same entities that are difficult to cull together without an intelligent equational theory that identifies equivalent items by a complex, domain-dependent matching process. We have developed a system for accomplishing this data cleansing task and demonstrate its use for cleansing lists of names of potential customers in a direct marketing application. Our results for statistically generated data are shown to be accurate and effective when processing the data multiple times using a different key for sorting on each successive pass. Combining the results of the individual passes using transitive closure produces far more accurate results at lower cost. The system provides a rule programming module that is easy to program and quite good at finding duplicates, especially in an environment with massive amounts of data. This paper details improvements in our system and reports on a successful implementation for a real-world database that conclusively validates the results we previously achieved for statistically generated data.
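A small sketch of the multi-pass sorted-neighborhood idea described above: each pass sorts on a different key and compares records inside a sliding window, and a union-find computes the transitive closure over the independent per-pass matches. The window size and matching rule are placeholders, not the paper's domain-dependent equational theory:

```python
# Illustrative multi-pass sorted-neighborhood duplicate detection with
# transitive closure via union-find; the match rule is a toy stand-in.

parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path compression
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

def similar(r1, r2):
    # Placeholder rule: same last name and same zip code.
    return r1["last"] == r2["last"] and r1["zip"] == r2["zip"]

def one_pass(records, key, window=3):
    ordered = sorted(records, key=key)
    for i, rec in enumerate(ordered):
        for other in ordered[max(0, i - window):i]:
            if similar(rec, other):
                union(rec["id"], other["id"])   # merge duplicate groups

records = [
    {"id": 1, "last": "Smith", "zip": "10001", "first": "J."},
    {"id": 2, "last": "Smith", "zip": "10001", "first": "John"},
    {"id": 3, "last": "Jones", "zip": "02139", "first": "A."},
]
# Independent passes with different sort keys, combined through union-find.
one_pass(records, key=lambda r: (r["last"], r["first"]))
one_pass(records, key=lambda r: (r["zip"], r["last"]))
print({r["id"]: find(r["id"]) for r in records})  # records 1 and 2 share a group
```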

6.
7.
Ordered attributed grammars are defined as a large subclass of the semantically well-defined attributed grammars proposed by Knuth. An attributed grammar is ordered if for each symbol a partial order over the associated attributes can be given such that, in any context of the symbol, the attributes are evaluable in an order that includes that partial order. The definition does not refer to a predefined strategy for attribute evaluation, e.g. several passes from left to right; for each attributed grammar evaluable by any predefined evaluation strategy, such an order exists. The ordering property can be checked by an algorithm whose running time is polynomial in the size of the input grammar. Visit-sequences are computed from the attribute dependencies given by an ordered attributed grammar. They describe the control flow of an algorithm for attribute evaluation which can be part of an automatically generated compiler.
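As a toy illustration of the ordering idea (not the checking algorithm itself), once a dependency relation over a symbol's attributes is known, one concrete evaluation order can be obtained by topological sort; the attribute names and dependencies below are hypothetical:

```python
# Toy example: derive one evaluation order for a symbol's attributes from a
# dependency relation. An ordered attributed grammar guarantees such an
# order exists in every context of the symbol.

from graphlib import TopologicalSorter

# attribute -> attributes it depends on (hypothetical example)
deps = {
    "env":  set(),            # inherited
    "type": {"env"},          # synthesized from env
    "code": {"type", "env"},  # synthesized last
}
print(list(TopologicalSorter(deps).static_order()))  # ['env', 'type', 'code']
```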

8.
Data-flow analysis is an important part of a compiler, and incremental analysis is of practical value in program development environments and in interprocedural optimizing compilers: when a program changes, data-flow information can be maintained incrementally rather than recomputed from scratch for every small modification. This paper presents an incremental elimination-style data-flow algorithm. It is based on the path-compression algorithm and has the same complexity and the same generality as that algorithm (it applies to irreducible flow graphs and to cases where the flow functions are incomplete), while conveniently maintaining the existing data-flow information as the program changes.
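A generic worklist sketch of the incremental goal described above: after an edit, only the changed nodes are re-seeded instead of re-running the whole analysis. Note this illustrates incrementality with a worklist solver, not the paper's elimination/path-compression algorithm:

```python
# Worklist sketch of incremental data-flow maintenance (monotone, set-union
# facts). After an edit, only the node whose flow function changed is
# re-seeded; untouched facts are reused.

def solve(cfg, flow, in_facts, seeds):
    work = list(seeds)
    while work:
        node = work.pop()
        out = flow[node](in_facts[node])
        for succ in cfg.get(node, []):
            merged = in_facts[succ] | out        # union as the meet, for sets
            if merged != in_facts[succ]:
                in_facts[succ] = merged
                work.append(succ)
    return in_facts

cfg = {"entry": ["a"], "a": ["b"], "b": []}
flow = {"entry": lambda s: s | {"x"}, "a": lambda s: s, "b": lambda s: s}
facts = {n: set() for n in cfg}
solve(cfg, flow, facts, seeds=["entry"])   # initial analysis
flow["a"] = lambda s: s | {"y"}            # program edit changes node a
solve(cfg, flow, facts, seeds=["a"])       # incremental: re-seed only a
print(facts)                               # b now also holds 'y'
```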

9.
We present the implementation of a Prolog system composed of interpreter and compiler. The originality of our work consists in the adoption of a new framework to realize the main components of the system. The framework relies on new mechanisms, called sleepers. With their help we have developed a complete Prolog interpreter in which all the control activities, from backtracking up to last-call optimization, are performed by the sleeper mechanism. We have also produced a Prolog compiler by using a philosophy and tactics that are completely independent of hardware constraints; it exploits an incremental and abstract implementation technique, based on a delayed non-local execution protocol. Our approach to Prolog system implementation has been extremely useful both in terms of software design and overall performance.

10.
尹清华  李赣生  李莹 《软件学报》1997,8(3):229-234
Generic units are one of the major features of the Ada language. Most Ada compilers are multi-pass: generic units are often handled by a separate preprocessing scan, and some compilers perform several additional scans just to process generics. This paper proposes a new method for handling generics in a one-pass Ada compiler; it has been applied in the PC/UNIX ADA-Z compiler system with good results.

11.
《Knowledge》2002,15(1-2):3-11
The complexity of modern digital systems requires complex design entry methods, and language-based design is therefore often an appealing alternative to schematics. Language-based design entry supports high-level design transformations through formal, executable specifications of traditional compiler-construction problems, its main advantages being modularity and declarative notation. In this paper, this idea is exploited within a powerful compiler construction system, and a methodology is given for designing formal, executable high-level hardware manipulators. In effect, this methodology stands as a meta-level between hardware transformations and their implementation, and can be valuable for the fast evaluation of new ideas and techniques.

12.
13.
This paper presents an experimental implementation of a self-applicable partial evaluator in Prolog, used for compiler generation and compiler generator generation. The partial evaluator is an extension of a simple meta-interpreter for Prolog programs, and its self-application is straightforward because of its simplicity. A method of incremental compilation is also described as a promising application of the partial evaluator for knowledge-based systems.

14.
Frequent itemset mining (FIM) is a popular data mining problem arising in many fields, such as commodity recommendation in the retail industry, log analysis in web search, and query recommendation (related search). A large number of FIM algorithms have been proposed to improve performance, including parallelized algorithms for processing large data volumes. Incremental FIM algorithms have also been proposed to deal with incremental database updates; however, most of these incremental algorithms have low parallelism, causing low efficiency on huge databases. This paper presents two parallel incremental FIM algorithms, IncMiningPFP and IncBuildingPFP, implemented on the MapReduce framework. IncMiningPFP preserves the FP-tree mining results of the original pass and utilizes them for incremental calculations. In particular, we propose a method to generate a partial FP-tree in the incremental pass in order to avoid unnecessary mining work; further, some of the incremental parallel tasks can be omitted when the inserted transactions contain few items. IncBuildingPFP preserves the CanTrees built in the original pass and then adds new transactions to them during the incremental passes. Our experimental results show that IncMiningPFP achieves significant speedup over PFP (Parallel FP-Growth) and a sequential incremental algorithm (CanTree) for most incremental input databases, and that IncBuildingPFP achieves it in the remaining cases.
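A minimal sketch of the incremental principle both algorithms share: results of the original pass are preserved and only the inserted transactions are scanned. It covers 1-itemset counting only; the actual algorithms additionally preserve FP-tree/CanTree structure and run on MapReduce:

```python
# Sketch of incremental frequent-item counting: the original pass's counts
# are kept, and an incremental pass scans only the newly inserted
# transactions. Illustrative only; not IncMiningPFP/IncBuildingPFP.

from collections import Counter

def count_items(transactions):
    c = Counter()
    for t in transactions:
        c.update(set(t))
    return c

original = [["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
counts = count_items(original)            # original pass, preserved

inserted = [["a", "b"], ["a", "d"]]       # incremental database update
counts += count_items(inserted)           # incremental pass scans only new data

min_support = 3
print({item for item, n in counts.items() if n >= min_support})  # {'a','b','c'}
```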

15.
The incorporation of global program analysis into recent compilers for Constraint Logic Programming (CLP) languages has greatly improved the efficiency of compiled programs. We present a global analyser based on abstract interpretation. Unlike traditional optimizers, whose designs tend to be ad hoc, the analyser has been designed with flexibility in mind. The analyser is incremental, allowing substantial program transformations by a compiler without requiring redundant re-computation of analysis data. The analyser is also generic in that it can perform a large number of different program analyses. Furthermore, the analyser has an object-oriented design, enabling it to be adapted to different applications easily and allowing it to be used with various CLP languages with simple modifications. As an example of this generality, we sketch the use of the analyser in two different applications involving two distinct CLP languages: an optimizing compiler for CLP(R) programs and an application for detecting occur-check problems in Prolog programs.
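As a toy illustration of the kind of analysis such a generic framework can be instantiated with (this is not the paper's CLP analyser), a tiny abstract interpretation over a sign domain:

```python
# Minimal abstract interpretation over the sign domain {'-', '0', '+', 'T'}
# (T = unknown). Sound but deliberately imprecise; purely illustrative.

def sign(n):
    return "+" if n > 0 else "-" if n < 0 else "0"

def abs_add(a, b):
    if a == "0": return b
    if b == "0": return a
    return a if a == b else "T"          # mixed signs lose precision

def abs_mul(a, b):
    if "0" in (a, b): return "0"
    if "T" in (a, b): return "T"
    return "+" if a == b else "-"

# Analyse x = 3, y = -2, z = x * y + x without running the program:
env = {"x": sign(3), "y": sign(-2)}
env["z"] = abs_add(abs_mul(env["x"], env["y"]), env["x"])
print(env)   # {'x': '+', 'y': '-', 'z': 'T'}: sound, though imprecise for z
```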

16.
We present a Matlab implementation of topology optimization for fluid flow problems in the educational computer code PolyTop (Talischi et al. 2012b). The underlying formulation is the well-established porosity approach of Borrvall and Petersson (2003), wherein a dissipative term is introduced to impede the flow in the solid (non-fluid) regions. Polygonal finite elements are used to obtain a stable low-order discretization of the governing Stokes equations for incompressible viscous flow. As a result, the same mesh represents the design field as well as the velocity and pressure fields that characterize its response. Owing to the modular structure of PolyTop, incorporating new physics, in this case modeling fluid flow, involves changes that are limited mainly to the analysis routine. We provide several numerical examples to illustrate the capabilities and use of the code. To illustrate the modularity of the present approach, we extend the implementation to accommodate alternative formulations and cost functions. These include topology optimization formulations where both viscosity and inverse permeability are functions of the design; and flow control where the velocity at a certain location in the domain is maximized in a prescribed direction.
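For reference, a sketch of the porosity formulation mentioned above, in our own notation rather than PolyTop's: the Stokes equations gain a Brinkman-type dissipative term α(ρ)u that impedes flow in solid regions, with the inverse permeability interpolated between a small fluid value and a large solid value:

```latex
% Porosity (Brinkman) formulation sketch; notation ours, not PolyTop's code.
\[
  -\mu\,\Delta\mathbf{u} + \alpha(\rho)\,\mathbf{u} + \nabla p = \mathbf{f},
  \qquad \nabla\cdot\mathbf{u} = 0,
\]
\[
  \alpha(\rho) \;=\; \alpha_{\max} + \left(\alpha_{\min}-\alpha_{\max}\right)
  \rho\,\frac{1+q}{\rho+q},
\]
% rho = 1 (fluid) gives alpha near alpha_min; rho = 0 (solid) gives
% alpha_max, penalizing velocity in non-fluid regions; q > 0 controls
% the convexity of the interpolation.
```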

17.
戴彩艳  陈崚  胡孔法 《计算机科学》2018,45(Z6):442-446, 464
For the problem of community mining in bipartite networks, we propose a community mining algorithm based on modularity increments. The algorithm initially treats each vertex as a community of its own, with its own label. Vertices in one part copy their labels and propagate them to vertices in the other part, placing those vertices in the same community; vertices in the other part then do the same. This process is iterated until convergence. During label propagation, the edge with the largest modularity increment is selected for transmission, so that the overall modularity keeps increasing. Experiments on real data sets show that the proposed algorithm produces high-quality community partitions of bipartite networks.
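A toy sketch of the propagation scheme just described, alternating between the two parts of a bipartite network until the labels stabilize. The paper selects the edge with the largest modularity increment; here a simple most-frequent-neighbor-label rule stands in for that gain function:

```python
# Toy label propagation on a bipartite network. Every vertex starts in its
# own community; the two parts alternately adopt labels from each other.

edges = [(0, "a"), (0, "b"), (1, "a"), (1, "b"), (2, "c"), (3, "c")]
left = {u for u, _ in edges}
right = {v for _, v in edges}
labels = {v: v for v in left | right}          # every vertex its own label

def neighbors(v):
    return [b if a == v else a for a, b in edges if v in (a, b)]

def propagate(part):
    changed = False
    for v in part:
        nbr_labels = [labels[n] for n in neighbors(v)]
        best = max(set(nbr_labels), key=nbr_labels.count)   # stand-in gain
        if labels[v] != best:
            labels[v] = best
            changed = True
    return changed

while propagate(left) | propagate(right):      # alternate parts until stable
    pass
print(labels)   # 0, 1, "a", "b" share one label; 2, 3, "c" share another
```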

18.
We present the design and evaluation of a new data-race-detection technique. Our technique executes at runtime rather than post-mortem, and handles unmodified shared-memory applications that run on top of CVM, a software distributed shared memory system. We do not assume explicit associations between synchronization and shared data, and require neither compiler support nor program source. Instead, we use a binary code re-writer to instrument instructions that may access shared memory. The most novel aspect of our system is that we are able to use information from the underlying memory system implementation in order to reduce the number of comparisons made at runtime. We present an experimental evaluation of our techniques by using our system to look for data races in five common shared-memory programs. We quantify the effect of several optimizations to the basic technique: data-flow analysis, instrumentation batching, runtime code modification, and instrumentation inlining. Our system correctly found races in three of the five programs, including two from a standard benchmark suite. The slowdown of this debugging technique averages less than a factor of 2.5 for our applications.

19.
We present a novel Hybrid Analysis (HA) technology which can efficiently and seamlessly integrate all static and run-time analysis of memory references into a single framework that is capable of performing all data dependence analysis and can generate the necessary information for most associated memory-related optimizations. We use HA to perform automatic parallelization by extracting run-time assertions from any loop and generating appropriate run-time tests that range from a low-cost scalar comparison to a full, reference-by-reference run-time analysis. Moreover, we can order the run-time tests in increasing order of complexity (overhead) and thus incur only the minimum necessary overhead. We accomplish this both by extending compile-time interprocedural (IP) analysis techniques and by incorporating speculative run-time techniques when necessary. Our solution bridges inexpensive compile-time techniques and exhaustive run-time techniques through a continuum of simple to complex solutions. We have implemented our framework in the Polaris compiler by introducing an innovative intermediate representation called RT_LMAD and a run-time library that can operate on it. Based on the experimental results obtained to date, we hope to automatically parallelize most, and possibly all, PERFECT codes, a significant accomplishment.
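An illustrative cascade in the spirit of the cost-ordered run-time tests described above, with placeholder tests rather than the paper's RT_LMAD-based checks: a cheap O(1) bounds comparison settles the common case, and an exhaustive reference-by-reference check is the fallback:

```python
# Cascading run-time dependence tests, cheapest first. The tests are toy
# stand-ins for the paper's run-time checks.

def scalar_bounds_test(a_range, b_range):
    # O(1): the two accessed address ranges do not overlap at all.
    return a_range[1] < b_range[0] or b_range[1] < a_range[0]

def reference_by_reference_test(a_refs, b_refs):
    # Exhaustive fallback: compare the full sets of accessed addresses.
    return not (set(a_refs) & set(b_refs))

def is_parallelizable(a_refs, b_refs):
    a_range = (min(a_refs), max(a_refs))
    b_range = (min(b_refs), max(b_refs))
    if scalar_bounds_test(a_range, b_range):   # cheap test settles most cases
        return True
    return reference_by_reference_test(a_refs, b_refs)

print(is_parallelizable([0, 2, 4], [5, 7, 9]))   # True via the O(1) test
print(is_parallelizable([0, 4, 8], [2, 6, 10]))  # True via the exhaustive test
```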

20.
A novel hardware architecture for extracting region boundaries in two raster-scan passes through a binary image is presented. The first pass gathers statistics regarding the size of each object contour. This information is used by the hardware to dynamically allocate off-chip memory for storage of boundary codes. In the second raster pass the same architecture constructs lists of grid-joint codes to represent the perimeter pixels of each object. These codes, referred to variously as crack codes or raster-chain codes in the literature, are later decoded by the hardware to reproduce the ordered sequence of coordinates surrounding each object. This list of coordinates is useful for a variety of shape recognition and manipulation algorithms that utilize boundary information. We present results of software simulations of the VLSI architecture, along with measurements of the coding efficiency of the basic algorithm and estimates of the overall complexity of a proposed VLSI chip.
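A small sketch of the decoding step the abstract mentions: walking a chain of crack codes from a starting grid joint back into the ordered coordinate sequence. The direction encoding here is hypothetical (0 = right, 1 = down, 2 = left, 3 = up):

```python
# Decode a crack-code (raster-chain) boundary into ordered coordinates.
# Direction encoding is an assumption for illustration, not the chip's.

MOVES = {0: (1, 0), 1: (0, 1), 2: (-1, 0), 3: (0, -1)}

def decode_crack_codes(start, codes):
    """Walk the chain of unit moves from the starting grid joint."""
    x, y = start
    coords = [(x, y)]
    for c in codes:
        dx, dy = MOVES[c]
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords

# A 2x2 square object traced clockwise from its top-left grid joint:
print(decode_crack_codes((0, 0), [0, 0, 1, 1, 2, 2, 3, 3]))
# -> [(0,0), (1,0), (2,0), (2,1), (2,2), (1,2), (0,2), (0,1), (0,0)]
```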
