Similar Documents
1.
UC is a data-parallel extension of C designed for scientific computations on synchronous and asynchronous parallel architectures. The primary constructs of the language include sets, reductions, and parallel and asynchronous composition. Its communication model is that of a globally addressable memory, with no syntactic distinction between local and remote data references. Unlike most existing data-parallel languages, UC programs may be synchronized at multiple levels of granularity, from strict expression-level synchronization to coarser statement- or function-level synchronization. This paper describes the language and its implementation on the Connection Machine CM-2. Experimental measurements comparing the performance of code produced by the UC compiler with that of programs written in commercial parallel languages such as CM Fortran, C*, and *Lisp are also presented.

2.
Among the various memory consistency models, the sequential consistency (SC) model is the most intuitive and makes it easiest for programmers to reason about their parallel programs. Nevertheless, processor designers often choose to support relaxed memory consistency models, because their weaker ordering constraints allow more instructions to be reordered and enable higher performance. Programs running on machines that support weaker consistency models can be transformed into ones in which SC is enforced. The compiler does this by computing a minimal set of memory access pairs whose ordering guarantees SC, and it inserts memory fences to ensure that these pairs are not reordered. Unfortunately, such fences can slow the program down significantly. We observe that the ordering the compiler strives to enforce for this minimal set of accesses is typically already satisfied in the normal course of program execution: a study we conducted on programs with compiler-inserted fences shows that only 8% of the executed fence instances are actually necessary to ensure SC. Motivated by this study, we propose a conditional fence mechanism, C-Fence, which uses compiler information to decide dynamically whether a stall is needed at each fence, stalling only when necessary. Our experiments with the SPLASH-2 benchmarks show that, with C-Fences and a centralized active table, programs can be transformed to enforce SC with only a 12% slowdown, compared with a 43% slowdown using ordinary fence instructions. Our approach requires very little hardware support (<350 bytes of on-chip storage) and avoids speculation and its associated costs.
Furthermore, to reduce the contention in the centralized active table that arises as the number of processors grows, we also design a distributed active table, which further improves the performance of C-Fence on larger numbers of processors.
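The dynamic decision a C-Fence makes can be modeled in a few lines. The Python sketch below is an assumption-laden illustration, not the authors' hardware design: it assumes the compiler tags each fence with an ID and a set of "associate" fences whose concurrent execution could violate SC, and that the active table maps a processor ID to the fence it is currently executing. The names `must_stall`, `enter_fence`, and `exit_fence` are hypothetical, and real hardware would stall until the associate's pending accesses complete rather than return a flag.

```python
# Hypothetical software model of the conditional-fence (C-Fence) check.
# associates: fence ID -> set of fence IDs that conflict with it (from the compiler)
# active_table: processor ID -> fence ID that processor is currently executing


def must_stall(proc, fence_id, associates, active_table):
    """Return True only if another processor is inside an associate fence."""
    conflicting = associates.get(fence_id, set())
    for other_proc, active_fence in active_table.items():
        if other_proc != proc and active_fence in conflicting:
            return True
    return False  # no associate active: the fence behaves like a no-op


def enter_fence(proc, fence_id, associates, active_table):
    """Decide whether to stall, then record this fence as active."""
    stall = must_stall(proc, fence_id, associates, active_table)
    active_table[proc] = fence_id
    return stall


def exit_fence(proc, active_table):
    """Remove the processor's entry once its ordered accesses have completed."""
    active_table.pop(proc, None)


if __name__ == "__main__":
    # F1 and F2 guard the two sides of a conflicting access pair.
    associates = {"F1": {"F2"}, "F2": {"F1"}}
    table = {}
    print(enter_fence(0, "F1", associates, table))  # False: nothing else active
    print(enter_fence(1, "F2", associates, table))  # True: F1 is active on proc 0
    exit_fence(0, table)
    exit_fence(1, table)
    print(enter_fence(1, "F2", associates, table))  # False again
```

In this model a fence costs a stall only when an associate fence is simultaneously active elsewhere, which is the intuition behind the abstract's observation that most executed fence instances are unnecessary.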

3.
SIGNAL belongs to the family of synchronous languages, which are widely used in the design of safety-critical real-time systems such as avionics, space systems, and nuclear power plants. This paper reports a compiler prototype for SIGNAL. In contrast to the existing SIGNAL compiler, we propose a new intermediate representation, S-CGA (a variant of clocked guarded actions), so that more synchronous programs can be integrated into our compiler prototype in the future. We present the front end of the compiler, i.e., the translation from SIGNAL to S-CGA, and mechanize the proof of its semantics preservation in the Coq theorem prover. We also present the back end of the compiler, including sequential code generation and multithreaded code generation with time-predictable properties. With the rising importance of multi-core processors in safety-critical embedded systems and cyber-physical systems (CPS), there is a growing need for model-driven generation of multithreaded code and for mapping that code onto multi-core platforms. We propose a time-predictable multi-core architecture model in the Architecture Analysis and Design Language (AADL) and map the multithreaded code onto this model.

4.
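Object-oriented languages have great potential for the design and development of large-scale parallel software. This paper presents the object-oriented, coarse-grained dataflow parallel model and the overall design of OOCPCS, an object-oriented C++ parallel compilation system that we designed for a networked environment, and discusses several of its key implementation techniques.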

5.
This paper describes a program transformation technique for functional languages called removing partial parametrization. Once functional programs are transformed into equivalent ones without partial parametrization, each function is applied to exactly as many arguments as it has formal parameters. This enables a new way of improving the efficiency of functional-language implementations: designing the compiler around the properties of programs without partial parametrization. We have used this method in the environment-based implementation of the functional programming language LK. Programs run considerably faster and consume less memory than under a traditional implementation.
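To make "removing partial parametrization" concrete, here is a hedged Python sketch with hypothetical names. A two-argument function that is partially applied is replaced by a new function that always applies the original to its full argument count. It covers only the easy case in which the supplied argument is a constant; it is not the LK implementation, and handling arguments that are run-time values needs additional machinery that this sketch does not show.

```python
import functools


def add(x, y):
    """A function with two formal parameters."""
    return x + y


def increment_all_before(xs):
    # Before: `add` is partially applied to one argument, so the call that
    # eventually runs it supplies fewer arguments than `add` declares.
    return list(map(functools.partial(add, 1), xs))


def add_1(y):
    # After: a new one-parameter function (hypothetical name) in which `add`
    # is always applied to exactly two arguments.
    return add(1, y)


def increment_all_after(xs):
    return list(map(add_1, xs))


print(increment_all_before([1, 2, 3]))  # [2, 3, 4]
print(increment_all_after([1, 2, 3]))   # [2, 3, 4]
```

Because every call site in the transformed program is saturated, a compiler can assume a fixed argument count per function. Presumably this is the kind of property the abstract says the LK compiler is designed around.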

6.
7.
The generation of compiled code for expressions in programming languages such as Icon, which support goal-directed evaluation in addition to traditional control structures, presents more of a challenge than code generation for traditional imperative programming languages. This paper describes a code-generation technique for translating Icon programs into a traditional high-level language. Translations into both Pascal and C are discussed; however, any target language that provides function parameters and recursion is sufficient. The technique described here has been used in the implementation of an optimizing compiler for Icon.
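One well-known way to translate goal-directed evaluation using only function parameters and recursion is with success continuations. The Python sketch below is an assumption, not necessarily the paper's exact scheme. Each Icon expression becomes a function that receives a `succeed` callback: producing a value means calling `succeed(value)`, and failing means returning without calling it. Backtracking into a generator happens when `succeed` returns. Python closures stand in for the nested procedures or explicit environment parameters that a Pascal or C translation would need, and all function names are hypothetical.

```python
def to(lo, hi, succeed):
    """Icon `lo to hi`: generate lo, lo+1, ..., hi, using recursion only."""
    if lo <= hi:
        succeed(lo)              # produce a value
        to(lo + 1, hi, succeed)  # resumed when the consumer backtracks


def less_than(a, b, succeed):
    """Icon `a < b`: on success it produces b; otherwise it fails silently."""
    if a < b:
        succeed(b)


def every_product(results):
    """Icon: every put(results, (1 to 3) * (1 to 2))"""
    to(1, 3, lambda a: to(1, 2, lambda b: results.append(a * b)))


def every_filtered(results):
    """Icon: every put(results, 5 < (1 to 8))"""
    to(1, 8, lambda i: less_than(5, i, results.append))


r = []
every_product(r)
print(r)  # [1, 2, 2, 4, 3, 6]

f = []
every_filtered(f)
print(f)  # [6, 7, 8]
```

In the second example, the comparison fails for 1 through 5, so control falls back into the `to` generator, which tries the next value. This is goal-directed evaluation without any explicit backtracking stack.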

8.
For Part I, see ibid., pp. 170–80. In Part I, we presented a binding environment for the AND and OR parallel execution of logic programs. This environment was instrumental in making a compiler for the AND and OR parallel execution of logic programs machine independent. In this paper, we describe a compiler based on the Reduce-OR process model (ROPM) for the parallel execution of Prolog programs, and report its performance on five parallel machines: the Encore Multimax, the Sequent Symmetry, the NCUBE 2, the Intel i860 hypercube, and a network of Sun workstations. The compiler is part of a machine-independent parallel Prolog development system built on top of the Chare kernel, a run-time environment for parallel programming, and it runs unchanged on these multiprocessors. In keeping with the objectives behind the ROPM, the compiler supports both OR and independent AND parallelism in Prolog programs and is suitable for execution on both shared- and nonshared-memory machines. We discuss the performance of the Prolog compiler in some detail and describe how grain size can be used to deliver performance within 10% of the underlying sequential Prolog compiler on one processor, with linear scaling as the number of processors increases on problems exhibiting sufficient parallelism. The loose coupling between the parallel and sequential components makes it possible to use the best available sequential compiler as the sequential component of our compiler.

9.
In recent years, the GPU (graphics processing unit) has evolved into an extremely powerful and flexible processor and now represents an attractive platform for general-purpose computation. Changes to the design and programmability of GPUs provide the opportunity to perform general-purpose computation on a GPU (GPGPU). Even though many programming languages, software tools, and libraries have been proposed to facilitate GPGPU programming, the unusual and specialized programming model of the GPU remains a significant barrier to writing GPGPU programs. In this paper, we introduce a novel compiler-based approach to GPGPU programming. Compiler directives are used to label code fragments that are to be executed on the GPU. Our GPGPU compiler, Guru, converts the labeled code fragments into ISO-compliant C code containing the appropriate OpenGL and Cg API calls. A native C compiler can then compile this code into an executable for the GPU. Our compiler is built on the Open64 compiler infrastructure. Preliminary experimental results on selected benchmarks show that our compiler produces significant performance improvements for programs that exhibit a high degree of data parallelism.

10.
One interpretive approach for handling concurrency is to provide an interpreter instance for each executing language‐level process. Such an approach has mainly been applied to concurrent implementations of logic and functional languages. This paper describes the use of this approach in constructing an interpreter for an imperative, distributed programming language from an existing compiler and run‐time support system (RTS). Primary design goals were to exploit the existing compiler to the extent possible as well as to have minimal impact on the RTS used to support concurrency. We have been successful in meeting these goals. Additionally, performance results show our interpreter's execution times compare favorably to the times required for compilation, linkage, and execution of small programs or programs with a significant number of calls to the RTS; on such programs, our interpreter's performance also compares favorably to that of the standard Java implementation. However, for larger programs and programs with fewer calls to the underlying RTS, the conventional compiler‐based implementation outperforms the interpreter implementation. For many distributed programs in which network costs dominate, the performances of the two implementations differ little. Copyright © 2001 John Wiley & Sons, Ltd.

11.
Mark Rain. Software, 1981, 11(3): 225-235
The MARY/2 language implemented at Penobscot Research Center differs from previous MARY implementations in several respects. These differences significantly increase the difficulty of implementing a compiler. Similar constructs have appeared in recent language proposals such as those for Ada. The methods used in the MARY/2 compiler should be useful in compilers for these and other future languages. This paper discusses the language constructs that are the source of the difficulty, the implementation methods actually used, possible trade-offs, and the character of the programs that these constructs facilitate.

12.
The incorporation of global program analysis into recent compilers for Constraint Logic Programming (CLP) languages has greatly improved the efficiency of compiled programs. We present a global analyser based on abstract interpretation. Unlike traditional optimizers, whose designs tend to be ad hoc, the analyser has been designed with flexibility in mind. The analyser is incremental, allowing substantial program transformations by a compiler without requiring redundant re-computation of analysis data. The analyser is also generic in that it can perform a large number of different program analyses. Furthermore, the analyser has an object-oriented design, enabling it to be adapted to different applications easily and allowing it to be used with various CLP languages with simple modifications. As an example of this generality, we sketch the use of the analyser in two different applications involving two distinct CLP languages: an optimizing compiler for CLP(R) programs and an application for detecting occur-check problems in Prolog programs. © 1998 John Wiley & Sons Ltd.
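As a much smaller illustration of the abstract-interpretation principle behind this analyser (not the authors' incremental, generic, CLP-aware design), the following Python sketch runs a simplified groundness analysis over Prolog-style equations. The abstract value is the set of variables known to be ground, and the analysis iterates until that set stops changing (a fixpoint). The equation encoding and the name `definitely_ground` are hypothetical. Only Herbrand equations are modeled, and the arithmetic constraints of CLP(R) are not.

```python
def definitely_ground(initially_ground, equations):
    """Simplified groundness analysis by fixpoint iteration.

    Abstract domain: the set of variables known to be ground.
    `equations` is a list of (X, {Y1, ..., Yn}) pairs, each standing for a
    Herbrand unification X = f(Y1, ..., Yn) that is assumed to succeed.
    A constant on the right-hand side is written as (X, set()).
    """
    ground = set(initially_ground)
    changed = True
    while changed:  # the set only grows, so this terminates
        changed = False
        for lhs, rhs in equations:
            # All arguments ground -> the constructed term is ground.
            if rhs <= ground and lhs not in ground:
                ground.add(lhs)
                changed = True
            # A ground term unified with f(...) grounds every argument.
            if lhs in ground and not rhs <= ground:
                ground |= rhs
                changed = True
    return ground


# X = f(Y, Z), Z = a, with Y ground on entry.
print(sorted(definitely_ground({"Y"}, [("X", {"Y", "Z"}), ("Z", set())])))
# ['X', 'Y', 'Z']
```

Information flows in both directions through each equation, which is why a single left-to-right pass is not enough and the loop must iterate to a fixpoint.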

13.
This paper presents source-level transformations that improve the performance of programs using synchronous and asynchronous message-passing primitives, including remote calls to an active process (rendezvous). It also discusses the applicability of these transformations to shared-memory and distributed environments. The transformations reduce the need for context switching, simplify the specific form of communication, and/or reduce the complexity of the given form of communication. One additional transformation actually increases the number of processes, as well as the number of context switches, in order to improve program performance. These transformations are shown to be generalizable. Results of hand-applying the transformations to SR programs indicate reductions in execution time exceeding 90% in many cases. The transformations also apply to many commonly occurring synchronization patterns and to other concurrent programming languages such as Ada and Concurrent C. The long-term goal of this effort is to include such transformations as an optimization step performed automatically by a compiler. This work was supported by NSF under Grant Number CCR88-10617.

14.
An increasing number of programming languages, such as Fortran 90, HPF, and APL, provide a rich set of intrinsic array functions and array expressions. These constructs, which constitute an important part of data-parallel languages, provide excellent opportunities for compiler optimizations. Synthesizing consecutive array operations or array expressions into a composite access function of the source arrays at compile time has been shown (A. T. Budd, ACM Trans. Programm. Lang. Syst. 6 (July 1984), 297–313; G. H. Hwang et al., in “Proc. of ACM SIGPLAN Conference on Principles and Practice of Parallel Programming, 1995,” pp. 112–122) to be an effective scheme for optimizing programs on flat shared-memory parallel architectures. It remains to be studied, however, how the synthesis scheme can be incorporated into the optimization of HPF-like programs on distributed-memory machines, taking communication costs into account. In this paper, we propose solutions to this issue. We first show how to perform array operation synthesis on HPF programs, and we demonstrate its performance benefits on distributed-memory machines with real applications. In addition, to prevent a situation we call “synthesis anomaly,” we present an optimal solution to guide the array synthesis process on distributed-memory machines. Because this optimization problem is NP-hard, we further develop a practical strategy that compilers can use on distributed-memory machines with HPF programs. Our synthesis engine is implemented as a Web-based tool called Syntool, and experimental results show significant performance improvements over the base codes for HPF code fragments from real applications on parallel machines. Our experiments were performed on three distributed-memory machines: an 8-node DEC Alpha Farm, a 16-node IBM SP-2, and a 16-node nCUBE/2.
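Here is a sketch of what synthesizing consecutive array operations into a composite access function means. Consider the Fortran 90/HPF sequence `B = TRANSPOSE(A); C = CSHIFT(B, SHIFT=1, DIM=1)` on an n×n array. A positive shift along dimension 1 gives C(i, j) = B(mod(i + SHIFT - 1, n) + 1, j). With 0-based indices this becomes C[i][j] = B[(i + 1) % n][j], and since B[p][q] = A[q][p], the composite is C[i][j] = A[j][(i + 1) % n]. The Python below uses hypothetical helper names, handles square arrays only, and does not model data distribution or communication. It checks that the synthesized form matches step-by-step evaluation.

```python
def transpose(a):
    """TRANSPOSE for a square list-of-lists array: result[i][j] = a[j][i]."""
    n = len(a)
    return [[a[j][i] for j in range(n)] for i in range(n)]


def cshift_dim1(b, shift):
    """CSHIFT along the first dimension (0-based): result[i][j] = b[(i+shift) % n][j]."""
    n = len(b)
    return [[b[(i + shift) % n][j] for j in range(n)] for i in range(n)]


def naive(a):
    """Evaluate the operations one after another, materializing temporary B."""
    return cshift_dim1(transpose(a), 1)


def synthesized(a):
    """Evaluate the composite access function directly: no temporary array."""
    n = len(a)
    return [[a[j][(i + 1) % n] for j in range(n)] for i in range(n)]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(naive(A))        # [[2, 5, 8], [3, 6, 9], [1, 4, 7]]
print(synthesized(A))  # [[2, 5, 8], [3, 6, 9], [1, 4, 7]]
```

On a shared-memory machine the synthesized version saves the temporary array and one pass over the data. On distributed memory, the abstract notes that the composite access pattern also changes which elements must be communicated, which is where the "synthesis anomaly" can arise.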

15.
Starting from the seminal work of Volpano and Smith, there has been growing evidence that type systems may be used to enforce confidentiality of programs through non-interference. However, most type systems operate on high-level languages and calculi, and “low-level languages have not received much attention in studies of secure information flow” (Sabelfeld and Myers, [Language-based information-flow security. IEEE Journal on Selected Areas in Communications 2003; 21:5–19]). We therefore introduce an information flow type system for a low-level language featuring jumps and calls, and show that the type system enforces termination-insensitive non-interference. Furthermore, information flow type systems for low-level languages should relate appropriately to their counterparts for high-level languages. We therefore introduce a compiler from a high-level imperative programming language to our low-level language, and show that the compiler preserves information flow types.
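To show the core rules such type systems enforce, here is a hedged Python checker for a small structured while-language in the spirit of Volpano and Smith's system. It does not cover the paper's low-level language with jumps and calls. It uses a two-point lattice LOW ⊑ HIGH and a program-counter level `pc` that rises inside branches and loops guarded by HIGH data, so that both explicit and implicit flows into LOW variables are rejected. The tuple-based AST and all names are hypothetical. Loops guarded by HIGH data are accepted, so any leak through non-termination is ignored, which is the termination-insensitive setting.

```python
LOW, HIGH = 0, 1  # two-point security lattice: LOW <= HIGH


def join(*levels):
    """Least upper bound; the join of no levels is LOW."""
    return max(levels, default=LOW)


def expr_level(reads, env):
    """An expression's level is the join of the levels of the variables it reads."""
    return join(*(env[v] for v in reads))


def check(stmts, env, pc=LOW):
    """Return True if the statement list is typable under program counter `pc`.

    Hypothetical AST:
      ("assign", x, {vars read})
      ("if", {vars read}, then_stmts, else_stmts)
      ("while", {vars read}, body_stmts)
    """
    for s in stmts:
        kind = s[0]
        if kind == "assign":
            _, x, reads = s
            # Explicit flow (expression) and implicit flow (pc) must fit into x.
            if join(expr_level(reads, env), pc) > env[x]:
                return False
        elif kind == "if":
            _, reads, then_s, else_s = s
            branch_pc = join(pc, expr_level(reads, env))
            if not (check(then_s, env, branch_pc) and check(else_s, env, branch_pc)):
                return False
        elif kind == "while":
            _, reads, body = s
            if not check(body, env, join(pc, expr_level(reads, env))):
                return False
        else:
            raise ValueError(f"unknown statement {kind!r}")
    return True


env = {"h": HIGH, "l": LOW}
print(check([("assign", "l", {"h"})], env))                        # False: explicit flow
print(check([("if", {"h"}, [("assign", "l", set())], [])], env))   # False: implicit flow
print(check([("assign", "h", {"l"})], env))                        # True
```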

16.
The multi-core trend is widening the gap between programming languages and hardware. Programs must take parallelism into account to improve performance. Unfortunately, current mainstream programming languages fail to provide suitable abstractions to do so. The most common pattern relies on mutexes to ensure mutual exclusion between concurrent accesses to shared memory. However, this model is error-prone and scales poorly due to its lack of modularity. Recent research proposes atomic sections as an alternative: the user simply delimits portions of code that should be free from interference, and the responsibility for ensuring interference freedom is left either to the compiler or to the run-time system. To provide enough modularity, atomic sections must be nestable, and threads must be able to fork inside an atomic section. In this paper we focus on the semantics of programming languages providing these features. More precisely, without being tied to a specific programming language, we consider program traces satisfying some basic well-formedness conditions. Our main contribution is a precise definition of atomicity and well-synchronisation, and the proof that the latter implies the strong form of the former. A formalisation of our results in the Coq proof assistant is described.

17.
Parallelism Analysis Based on a Generalized Method-Invocation Model
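This paper presents a method for analyzing parallelism between object methods that takes into account polymorphism and object-reference aliasing in object-oriented languages; the method is used for parallelism analysis when parallelizing object-oriented programs. The paper first gives a generalized method-invocation model and then, based on this model, presents algorithms for expression simplification and for intraprocedural and interprocedural analysis. These algorithms compute the definition and use sets of variables for use in parallelism analysis. A simple example given in the paper distinguishes this work from related work. The technique has been implemented in JAPS-II, a Java parallelizing compiler developed by the authors.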
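Once the definition (write) and use (read) sets of two method calls are known, a common way to decide whether the calls may run in parallel is Bernstein's conditions. The abstract does not state the paper's exact criterion, so the following Python sketch is an assumption. Abstract locations such as `"o1.x"` stand for object fields after alias analysis: if two references `p` and `q` may point to the same object, both map to the same location, which makes the test conservative. The `Effect` type and `may_run_in_parallel` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Effect:
    """Definition (write) and use (read) sets of one method call."""
    defs: set = field(default_factory=set)
    uses: set = field(default_factory=set)


def may_run_in_parallel(a: Effect, b: Effect) -> bool:
    """Bernstein's conditions: no write/read, read/write, or write/write overlap."""
    return not (a.defs & b.uses or a.uses & b.defs or a.defs & b.defs)


# p.setX(...) writes o1.x and reads o1.y; q.setX(...) writes o2.x and reads o1.y.
call_a = Effect(defs={"o1.x"}, uses={"o1.y"})
call_b = Effect(defs={"o2.x"}, uses={"o1.y"})
print(may_run_in_parallel(call_a, call_b))  # True: shared reads only

# If q may alias p's object, its write maps to o1.x as well.
call_c = Effect(defs={"o1.x"}, uses=set())
print(may_run_in_parallel(call_a, call_c))  # False: write/write conflict on o1.x
```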

18.
Originally developed with a single language in mind, the JVM is now targeted by numerous programming languages, because its automatic memory management, just-in-time compilation, and adaptive optimizations make it an attractive execution platform. However, the garbage collector, the just-in-time compiler, and other optimizations and heuristics were designed primarily with the performance of Java programs in mind. Consequently, many of the languages targeting the JVM, especially dynamically typed ones, suffer from performance problems that cannot simply be solved on the JVM side. In this article, we aim to contribute to the understanding of the character of the workloads imposed on the JVM by both dynamically typed and statically typed JVM languages. To this end, we introduce a new set of dynamic metrics for workload characterization, along with an easy-to-use toolchain to collect them. We apply the toolchain to applications written in six JVM languages (Java, Scala, Clojure, Jython, JRuby, and JavaScript) and discuss the findings. Given the recently identified importance of inlining for the performance of Scala programs, we also analyze the inlining behavior of the HotSpot JVM when executing bytecode originating from different JVM languages. As a result, we identify several traits in the non-Java workloads that represent potential opportunities for optimization. © 2015 The Authors. Software: Practice and Experience Published by John Wiley & Sons Ltd.

19.
The inadequacies of conventional parallel languages for programming multicomputers are identified. The C* language is briefly reviewed, and a compiler that translates C* programs into C programs suitable for compilation and execution on a hypercube multicomputer is presented. Results illustrating the efficiency of executing data-parallel programs on a hypercube multicomputer are reported: they show the speedup achieved by three hand-compiled C* programs executing on an N-Cube 3200 multicomputer. The first two programs, Mandelbrot set calculation and matrix multiplication, have a high degree of parallelism and a simple control structure; for these, the C* compiler can generate relatively straightforward code with performance comparable to hand-written C code. Results for a C* program that performs Gaussian elimination with partial pivoting are also presented and discussed.

20.
In this paper, we introduce Continuation Passing C (CPC), a programming language for concurrent systems in which native and cooperative threads are unified and presented to the programmer as a single abstraction. The CPC compiler uses a compilation technique, based on the CPS transform, that yields efficient code and an extremely lightweight representation for contexts. We provide a proof of the correctness of our compilation scheme. In particular, we show that lambda-lifting, a common compilation technique for functional languages, is also correct in an imperative language like C, under some conditions enforced by the CPC compiler. The current CPC compiler is mature enough to write substantial programs such as Hekate, a highly concurrent BitTorrent seeder. Our benchmark results show that CPC is as efficient as the most efficient thread libraries available, while using significantly less space.
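The abstract mentions lambda-lifting and the conditions under which it remains correct in an imperative language. The Python sketch below uses hypothetical function names. It first shows the transformation itself, in which a nested function's free variable becomes an extra parameter. It then shows one standard reason such conditions are needed once variables are mutable. It does not reproduce CPC's actual conditions or its CPS transform.

```python
# Before lambda-lifting: `scale` is nested and has a free variable `factor`.
def scale_all_nested(xs, factor):
    def scale(x):
        return x * factor
    return [scale(x) for x in xs]


# After lambda-lifting: the function moves to top level and its free
# variable becomes an extra parameter passed at every call site.
def scale_lifted(x, factor):
    return x * factor


def scale_all_lifted(xs, factor):
    return [scale_lifted(x, factor) for x in xs]


# Pitfall in an imperative setting: the nested function *writes* its free
# variable, so the update is shared with the enclosing function.
def bump_nested():
    n = 0

    def bump():
        nonlocal n
        n += 1

    bump()
    bump()
    return n  # 2


# Naively lifting `bump` by passing `n` by value loses the updates.
def bump_lifted(n):
    n += 1  # modifies only the local copy


def bump_naively_lifted_caller():
    n = 0
    bump_lifted(n)
    bump_lifted(n)
    return n  # 0: the caller never sees the increments


print(scale_all_nested([1, 2, 3], 2), scale_all_lifted([1, 2, 3], 2))  # [2, 4, 6] [2, 4, 6]
print(bump_nested(), bump_naively_lifted_caller())  # 2 0
```

The pure case lifts safely. The mutating case shows why a compiler for an imperative language must restrict which variables it lifts, or must pass a shared location instead of a copy.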


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417