
1.

BDL: a specialized language for per-object reactive control

Bertrand F. Augeraud M. 《IEEE Transactions on Software Engineering》1999,25(3):347-362

The problem of describing the concurrent behavior of objects in object-oriented languages is addressed. The approach taken is to let methods be the behavior units whose synchronization is controlled separately from their specification. Our proposal is a domain-specific language called BDL for expressing constraints on this control and actually implementing its enforcement. We propose a model where each object includes a so-called “execution controller”, programmed in BDL. This cleanly separates what the methods do (the object processes) from the circumstances in which they are allowed to do it (the control). The object controller ensures that scheduling constraints between the object's methods are met. Aggregate objects can be controlled in terms of their components. This language has a convenient formal base. Thus, using BDL expressions, behavioral properties of objects or groups of interesting objects can be verified. Our approach allows, for example, deadlock detection or verification of safety properties, while maintaining a reasonable code size for the running controller. A compiler from BDL has been implemented, automatically generating controller code as an Esterel program, i.e., in a reactive programming language. From this code, the Esterel compiler, in turn, generates an automaton on which verifications are done. Then this automaton is translated into C code to be executed. This multistage process typifies the method for successful use of a domain-specific language. This also allows high-level concurrent programming

2.

A Vectorizing Compiler for Multimedia Extensions (total citations: 6; self-citations: 0; citations by others: 6)

N. Sreraman R. Govindarajan 《International journal of parallel programming》2000,28(4):363-400

In this paper, we present an implementation of a vectorizing C compiler for Intel's MMX (Multimedia Extension). This compiler identifies data parallel sections of the code using scalar and array dependence analysis. To enhance the scope for application of the subword semantics, our compiler performs several code transformations. These include strip mining, scalar expansion, grouping and reduction, and distribution. Thereafter, inline assembly instructions corresponding to the data parallel sections are generated. We have used the Stanford University Intermediate Format (SUIF), a public domain compiler tool, for our implementation. We evaluated the performance of the code generated by our compiler for a number of benchmarks. Initial performance results reveal that our compiler-generated code produces a reasonable performance improvement (speedup of 2 to 6.5) over the code generated without the vectorizing transformations/inline assembly. In certain cases, the performance of the compiler-generated code is within 85% of the hand-tuned code for the MMX architecture.
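Strip mining, the first transformation named in the abstract above, can be illustrated with a minimal sketch. This is not the paper's compiler output: the lane count of eight 8-bit subwords per 64-bit register, the wraparound byte addition, and the function names `add_scalar` and `add_strip_mined` are all assumptions chosen for illustration.

```python
LANES = 8  # assumption: eight 8-bit subwords fit in one 64-bit register


def add_scalar(a, b):
    """Original loop: one 8-bit wraparound addition per iteration."""
    return [(x + y) & 0xFF for x, y in zip(a, b)]


def add_strip_mined(a, b, lanes=LANES):
    """Same computation after strip mining into strips of `lanes` elements."""
    n = len(a)
    out = [0] * n
    full = n - n % lanes  # number of elements covered by complete strips

    # Outer loop walks over strips. The inner loop has a fixed trip count,
    # so a vectorizer could replace it with a single packed instruction.
    for start in range(0, full, lanes):
        for i in range(start, start + lanes):
            out[i] = (a[i] + b[i]) & 0xFF

    # Cleanup loop handles the leftover elements that do not fill a strip.
    for i in range(full, n):
        out[i] = (a[i] + b[i]) & 0xFF
    return out


if __name__ == "__main__":
    a = list(range(20))
    b = [250] * 20
    print(add_strip_mined(a, b) == add_scalar(a, b))
```

The strip-mined version computes the same result as the scalar loop; only the loop structure changes, which is what exposes fixed-width groups to subword instructions.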

3.

A micro-scheduling method supporting directed cyclic graphs (total citations: 1; self-citations: 0; citations by others: 1)

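Wen Yanzhi, Lian Ruiqi, Wu Chengyong, Feng Xiaobing, Zhang Zhaoqing 《Journal of Computer Research and Development》 (计算机研究与发展) 2005,42(3):387-393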

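Instruction scheduling is an important optimization phase in a compiler. Making full use of architecture-specific processor resources and exploiting program parallelism, so as to improve compiler optimization performance and make the code more adaptable, has long been one of the difficult research problems in instruction scheduling. Micro-scheduling has already achieved some results, but it has not supported the directed cyclic graphs produced by software pipelining. This paper proposes and implements, in ORC, a micro-scheduling method for the IA-64 architecture that supports directed cyclic graphs. The method effectively reduces program execution cycles and pipeline stalls, and achieves fairly satisfactory compiler optimization performance.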

4.

ESUIF: An Open Esterel Compiler

Stephen A. Edwards 《Electronic Notes in Theoretical Computer Science》2002,65(5)

I describe a new compiler infrastructure for imperative synchronous languages such as Esterel and ECL. Built on the SUIF 2 system, it includes a new intermediate representation for this class of languages that has simple semantics designed for easy implementation in hardware or software. I describe the structure of this new compiler, the intermediate representation, and how Esterel source is translated into this intermediate representation.

5.

State your position

Borja C.A. Mirats Tur J.M. Gordillo J.L. 《Robotics & Automation Magazine, IEEE》2009,16(2):82-90

Autonomous vehicles (AVs) and mobile robots are developed to carry out specific tasks autonomously, ideally with no human intervention at all [1], [2]. These kinds of vehicles, whether AVs or mobile robots, have generated great interest in recent years, for instance, because of their capacity to work in remote or harmful environments where humans cannot enter because of the extreme operating conditions [3]. We differentiate a mobile robot (designed from scratch for a specific task) from an AV (a commercial vehicle with proper sensory and control systems added so as to be autonomous). However, in the sequel, for the sake of simplicity, we will use the two terms interchangeably. Several applications are found in the literature, varying from material transportation in a common industrial environment [4] to the exciting exploration of a far planet surface [5], [6].

6.

Automatically Partitioning Threads for Multithreaded Architectures

《Journal of Parallel and Distributed Computing》1999,58(2):159-189

There is an enormous amount of parallelism exposed to fine-grain multithreaded architectures to cover latencies. It is a demanding task for a multithreading programmer to manage such a degree of parallelism by hand. To use multithreaded architectures efficiently, it is essential to have compiler support for automatically partitioning programs into threads. This paper solves a fundamental problem in compiling for multithreaded architectures: automatically partitioning a program into threads. The focus of such partitioning is to overlap the remote communication latency and minimize the total execution time. We first formulate the partitioning problem based on a multithreaded execution cost model. Then, we prove that this formulation is NP-hard. Therefore, we propose two heuristic thread-partitioning methods to solve this problem in practice. The advanced partitioning algorithm is a novel extension of list scheduling, and it takes advantage of the cost model to generate near-optimum partitioning results. The remote-path-based partitioning algorithm is a simplified version of the advanced one, but it is easy to implement in a compiler. The two partitioning algorithms were implemented, respectively, in a thread partitioning testbed and a research EARTH-C compiler. The experimental results show that both partitioning algorithms are effective at generating efficient threaded code, and that code generated by the compiler is comparable to hand-written code.

7.

A performance debugging tool for high performance Fortran programs

Takashi Suzuoka Jaspal Subhlok Thomas Gross 《Concurrency and Computation》1997,9(10):927-945

Parallel languages allow the programmer to express parallelism at a high level. The management of parallelism and the generation of interprocessor communication is left to the compiler and the runtime system. This approach to parallel programming is particularly attractive if a suitable widely accepted parallel language is available. High Performance Fortran (HPF) has emerged as the first popular machine independent parallel language, and remarkable progress has been made towards compiling HPF efficiently. However, the performance of HPF programs is often poor and unpredictable, and obtaining adequate performance is a major stumbling block that must be overcome if HPF is to gain widespread acceptance. The programmer is often in the dark about how to improve the performance of an HPF program since poor performance can be attributed to a variety of reasons, including poor choice of algorithm, limited use of parallelism, or an inefficient data mapping. This paper presents a profiling tool that allows the programmer to identify the regions of the program that execute inefficiently, and to focus on the potential causes of poor performance. The central idea is to distinguish the code that is executing efficiently from the code that is executing poorly. Efficient code uses all processors of a parallel system to make progress, while inefficient code causes processors to wait, execute replicated code, idle, communicate, or perform compiler bookkeeping. We designate the latter code as non-scalable, since adding more processors generally does not lead to improved performance for such code. By analogy, the former code is called scalable. The tool presented here separates a program into scalable and non-scalable components and identifies the causes of non-scalability of different components. We show that compiler information is the key to dividing the execution times into logical categories that are meaningful to the programmer. 
We present the design and implementation of a profiler that is integrated with Fx, a compiler for a variant of HPF. The paper includes two examples that demonstrate how the data reported by the profiler are used to identify and resolve performance bugs in parallel programs. © 1997 John Wiley & Sons, Ltd.

8.

Fixing Letrec: A Faithful Yet Efficient Implementation of Scheme's Recursive Binding Construct

Oscar Waddell Dipanwita Sarkar R. Kent Dybvig 《Higher-Order and Symbolic Computation》2005,18(3-4):299-326

A Scheme letrec expression is easily converted into more primitive constructs via a straightforward transformation given in the Revised⁵ Report. This transformation, unfortunately, introduces assignments that can impede the generation of efficient code. This article presents a more judicious transformation that preserves the semantics of the revised report transformation and also detects invalid references and assignments to left-hand-side variables, yet enables the compiler to generate efficient code. A variant of letrec that enforces left-to-right evaluation of bindings is also presented and shown to add virtually no overhead. A preliminary version of this article was presented at the 2002 Workshop on Scheme and Functional Programming [15].

9.

Performance debugging of Esterel specifications

Lei Ju Bach Khoa Huynh Abhik Roychoudhury Samarjit Chakraborty 《Real-Time Systems》2012,48(5):570-600

Synchronous languages like Esterel have been widely adopted for designing reactive systems in safety-critical domains such as avionics. Specifications written in Esterel are based on the underlying “synchrony hypothesis”, which needs to be validated when Esterel specifications get compiled to real implementations (such as C code). In this work, we present a model-driven and architecture-aware timing analysis framework for C code generated from Esterel and executed on general-purpose processors. By integrating model-level information into the traditional timing analysis, we can efficiently compute accurate time estimates via systematically eliminating a large number of infeasible paths in the generated code. Experimental results show that with our proposed intermediate representation level infeasible path analysis in the model compilation, we obtain up to 16.1% tighter WCET estimates compared to the traditional assembly code level infeasible path detection with substantially less analysis time. Furthermore, by maintaining the traceability links between Esterel specifications and the generated C code, we are able to map the time-critical computations at the C-level back to the Esterel-level.

10.

Increasing autonomy of UAVs

How J.P. Fraser C. Kulling K.C. Bertuccelli L.F. 《Robotics & Automation Magazine, IEEE》2009,16(2):43-51

Unmanned aerial vehicles (UAVs) are acquiring an increased level of autonomy as more complex mission scenarios are envisioned [1]. For example, UAVs are being used for intelligence, surveillance, and reconnaissance missions as well as to assist humans in the detection and localization of wildfires [2], tracking of moving vehicles along roads [3], [4], and performing border patrol missions [5]. A critical component for networks of autonomous vehicles is the ability to detect and localize targets of interest in a dynamic and unknown environment. The success of these missions hinges on the ability of the algorithms to appropriately handle the uncertainty in the information of the dynamic environment and the ability to cope with the potentially large amounts of communicated data that will need to be broadcast to synchronize information across networks of vehicles. Because of their relative simplicity, centralized mission management algorithms have previously been developed to create a conflict-free task assignment (TA) across all vehicles. However, these algorithms are often slow to react to changes in the fleet and environment and require high bandwidth communication to ensure a consistent situational awareness (SA) from distributed sensors and also to transmit detailed plans back to those sensors.

11.

Scanning parameterized polyhedron using Fourier-Motzkin elimination

Marc Le Fur 《Concurrency and Computation》1996,8(6):445-460

The paper presents two algorithms for computing a control structure whose execution enumerates the integer vectors of a parameterized polyhedron defined in a given context. Both algorithms reconsider the successive projection method, based on Fourier-Motzkin pairwise elimination, defined by Ancourt and Irigoin. The way redundant constraints are removed in their algorithm is revisited in order to improve both the computation time of the enumeration code for higher order polyhedrons and the execution time of that code. The algorithms presented here are at the root of the code generation in the HPF compiler PANDORE developed at IRISA, France; a comparison of these algorithms with the one defined by Ancourt and Irigoin is given in the class of polyhedrons manipulated by the PANDORE compiler.
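The projection step that the abstract above builds on, Fourier-Motzkin pairwise elimination, can be sketched in a few lines. This is a textbook version under stated assumptions, not either of the paper's algorithms: it handles no parameters and removes no redundant constraints, and the row encoding `(coeffs, bound)` meaning `sum(coeffs[i] * x[i]) <= bound`, as well as the name `eliminate`, are illustrative choices.

```python
from fractions import Fraction


def eliminate(rows, k):
    """Project out variable x[k] from a system of inequalities.

    Each row is (coeffs, bound) and stands for sum(coeffs[i] * x[i]) <= bound.
    Returns rows in which the coefficient of x[k] is always zero.
    """
    pos, neg, rest = [], [], []
    for coeffs, bound in rows:
        c = coeffs[k]
        if c > 0:
            pos.append((coeffs, bound))   # upper bound on x[k]
        elif c < 0:
            neg.append((coeffs, bound))   # lower bound on x[k]
        else:
            rest.append((coeffs, bound))  # does not mention x[k]

    out = list(rest)
    # Pair every upper bound with every lower bound; scaling the two rows so
    # that x[k] has coefficients +1 and -1 makes it cancel when they are added.
    for pc, pb in pos:
        for nc, nb in neg:
            sp = Fraction(1) / pc[k]
            sn = Fraction(1) / -nc[k]
            coeffs = [sp * p + sn * n for p, n in zip(pc, nc)]
            out.append((coeffs, sp * pb + sn * nb))
    return out


if __name__ == "__main__":
    # Triangle 0 <= j <= i <= 3 over the variables x = [i, j].
    rows = [([0, -1], 0),   # -j <= 0
            ([-1, 1], 0),   # j - i <= 0
            ([1, 0], 3)]    # i <= 3
    for coeffs, bound in eliminate(rows, 1):
        print([int(c) for c in coeffs], int(bound))
```

Eliminating `j` from the triangle leaves bounds on `i` alone, which would serve as the outer loop bounds of a scanning loop nest; the original rows involving `j` give the inner bounds.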

12.

Path Analysis and Renaming for Predicated Instruction Scheduling

Lori Carter Beth Simon Brad Calder Larry Carter Jeanne Ferrante 《International journal of parallel programming》2000,28(6):563-588

Increases in instruction level parallelism are needed to exploit the potential parallelism available in future wide issue architectures. Predicated execution is an architectural mechanism that increases instruction level parallelism by removing branches and allowing simultaneous execution of multiple paths of control, only committing instructions from the correct path. In order for the compiler to expose and use such parallelism, traditional compiler data-flow and path analysis needs to be extended to predicated code. In this paper, we motivate the need for renaming and for predicates that reflect path information. We present Predicated Static Single Assignment (PSSA), which uses renaming and introduces Full-Path Predicates to remove false dependences and enable aggressive predicated optimization and instruction scheduling. We demonstrate the usefulness of PSSA for Predicated Speculation and Control Height Reduction. These two predicated code optimizations, used during instruction scheduling, reduce the dependence length of the critical paths through a predicated region. Our results show that using PSSA to enable speculation and control height reduction reduces execution time by 12 to 68%.

13.

A MATLAB subset to C compiler targeting embedded systems

João Bispo João M. P. Cardoso 《Software》2017,47(2):249-272

This paper describes MATISSE, a compiler able to translate a MATLAB subset to C targeting embedded systems. MATISSE uses LARA, an aspect‐oriented programming language, to specify additional information and transformations to the input MATLAB code, for example, insertion of code for initialization of variables, and specification of types and shapes of variables. The compiler is being developed bearing in mind flexibility, multitarget and multitoolchain support, allowing for the generation of several implementations in C from the same reference code in MATLAB. In this paper, we also present a number of techniques being employed in MATLAB to C compilation, such as element‐wise mapping operations, matrix views, weak types, and intrinsics. We validate these techniques using MATISSE and a set of representative benchmarks. More specifically, we evaluate the compiler with a set of 31 benchmarks using an embedded system board and a desktop computer. The results show speedups up to 1.8× by employing information provided by LARA aspects, when compared with C code generated without additional user information. When compared with the execution time of the original code running on MATLAB, the execution time of the generated C code achieved a geometric mean speedup of 13×. Copyright © 2016 John Wiley & Sons, Ltd.

14.

Type-based initialization analysis of a synchronous dataflow language

Jean-Louis Colaço Marc Pouzet 《International Journal on Software Tools for Technology Transfer (STTT)》2004,6(3):245-255

One of the appreciated features of the synchronous dataflow approach is that a program defines a perfectly deterministic behavior. But the use of the delay primitive leads to undefined values at the first cycle; thus a dataflow program is really deterministic only if it can be shown that such undefined values do not affect the behavior of the system. This paper presents an initialization analysis that guarantees the deterministic behavior of programs. This property being undecidable in general, the paper proposes a safe approximation of the property, precise enough for most dataflow programs. This analysis is a one-bit analysis – expressions are either initialized or uninitialized – and is defined as an inference-type system with subtyping constraints. This analysis has been implemented in the Lucid Synchrone compiler and in a new Scade-Lustre prototype compiler at Esterel Technologies. The analysis gives very good results in practice.
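The idea of a one-bit analysis can be sketched on a toy expression language. This is a much-simplified illustration, not the paper's inference system with subtyping constraints: the tuple-based expression encoding, the operator names `pre` (delay) and `init` (standing for `a -> b`), and the function `analyze` are all assumptions made for the example.

```python
INIT, UNINIT = 0, 1  # one bit per expression: defined at the first cycle or not


def analyze(expr, env):
    """Return INIT or UNINIT for a toy dataflow expression.

    Expressions are tuples (hypothetical encoding):
      ("const", value), ("var", name), ("pre", e),
      ("init", a, b)   meaning a -> b,
      ("op", name, e1, e2, ...)   a pointwise operator.
    """
    kind = expr[0]
    if kind == "const":
        return INIT
    if kind == "var":
        return env[expr[1]]
    if kind == "pre":
        # The delayed stream must itself be initialized; the delay then
        # yields a value that is undefined at the first cycle.
        if analyze(expr[1], env) != INIT:
            raise ValueError("pre applied to a possibly uninitialized stream")
        return UNINIT
    if kind == "init":
        first = analyze(expr[1], env)
        analyze(expr[2], env)  # b is not read at the first cycle, but check it
        return first
    if kind == "op":
        # A pointwise result is uninitialized as soon as one argument is.
        return max(analyze(arg, env) for arg in expr[2:])
    raise ValueError(f"unknown expression kind: {kind!r}")


if __name__ == "__main__":
    env = {"n": INIT}
    step = ("op", "+", ("pre", ("var", "n")), ("const", 1))
    print(analyze(step, env))                          # pre(n) + 1
    print(analyze(("init", ("const", 0), step), env))  # 0 -> pre(n) + 1
```

In this sketch `pre(n) + 1` is classified as uninitialized, while guarding it as `0 -> pre(n) + 1` makes the whole expression initialized; a nested `pre(pre(n))` is rejected.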

15.

Code compaction for parallel architectures

Kasi Anantha Fred Long 《Software》1990,20(6):537-554

There are two principal methods used to exploit the parallelism available on a parallel machine: the program to be executed can be optimized by hand, or the program can be automatically converted to parallel machine code by a compiler. The first method usually derives parallelism at the procedure level; a parallel program is written in a high-level language and typically has various modules executing in parallel. By contrast, the compiler methodically transforms the program into parallel code using various transformations, such as code movement. The automatic conversion of a program to parallel code is called compaction or parallelization. This paper describes the evolution of a new compaction program and presents a new algorithm for determining legal code movements. A simulator of the target architecture was used to estimate the execution times of a sample suite of programs before and after compaction. The results verify that substantial advantages arise from applying this compaction technique.

16.

Two decades of live coding and debugging of virtual machines through simulation

Daniel Ingalls Eliot Miranda Clément Béra Elisa Gonzalez Boix 《Software》2020,50(9):1629-1650

OpenSmalltalk-VM is a virtual machine (VM) for languages in the Smalltalk family (e.g., Squeak and Pharo), which is itself written in a subset of Smalltalk that can easily be translated to C. VM development is done in Smalltalk, an activity we call “simulation.” The production VM is then derived by translating the core VM code to C. As a result, two execution models coexist: simulation, where the Smalltalk code is executed on top of a Smalltalk VM, and production, where the same code is compiled to an executable through a C compiler. The whole VM execution can be simulated: the heap is represented as a huge byte array, the VM code is executed as Smalltalk, and the native code generated by the just-in-time (JIT) compiler is executed by a processor simulator. All the Smalltalk development tools, such as the debugger, are then available while simulating. In addition, in simulation, it is also possible to use debugging features such as single stepping in the machine code generated by the JIT compiler. The Smalltalk development tools combined with the simulation debugging features provide developers with a productive environment in which to extend and debug the VM. In this article, we detail the VM simulation infrastructure and report our experiences developing and debugging VM features within it such as the garbage collector and the JIT compiler.

17.

Translation and Run-Time Validation of Optimized Code

Lenore Zuck Amir Pnueli Yi Fang Benjamin Goldberg Ying Hu 《Electronic Notes in Theoretical Computer Science》2002,70(4):179-200

The paper presents approaches to the validation of optimizing compilers. The emphasis is on aggressive and architecture-targeted optimizations which try to obtain the highest performance from modern architectures, in particular EPIC-like micro-processors. Rather than verify the compiler, the approach of translation validation performs a validation check after every run of the compiler, producing a formal proof that the produced target code is a correct implementation of the source code. First we survey the standard approach to validation of optimizations which preserve the loop structure of the code (though they may move code in and out of loops and radically modify individual statements), present a simulation-based general technique for validating such optimizations, and describe a tool, VOC-64, which implements this technique. For more aggressive optimizations which, typically, alter the loop structure of the code, such as loop distribution and fusion, loop tiling, and loop interchanges, we present a set of permutation rules which establish that the transformed code satisfies all the implied data dependencies necessary for the validity of the considered transformation. We describe the necessary extensions to VOC-64 in order to validate these structure-modifying optimizations. Finally, the paper discusses preliminary work on run-time validation of speculative loop optimizations, which involves using run-time tests to ensure the correctness of loop optimizations whose correctness neither the compiler nor compiler-validation techniques can guarantee. Unlike compiler validation, run-time validation has not only the task of determining when an optimization has generated incorrect code, but also the task of recovering from the optimization without aborting the program or producing an incorrect result.
This technique has been applied to several loop optimizations, including loop interchange, loop tiling, and software pipelining, and appears to be quite promising.

18.

A comparison of two Fortran dialects for expressing parallel solutions for a problem in linear algebra

M. Clint J. S. Weston C. W. Bleakney 《Parallel Computing》1992,18(12):1325-1333

Recently, AMT has issued an extended version of Fortran-Plus [1] which allows software to be developed without the developer needing to take explicit account of the grid size of the target processor. Fortran-Plus and its extension, Fortran-Plus Enhanced [2], have been developed for use on the AMT DAP 510 array processor. This machine has 1024 processors arranged in a square grid with nearest neighbour and wraparound connections. It is interesting to enquire whether the performance of code generated by the Fortran-Plus Enhanced compiler is, for a particular application, superior to that generated by the Fortran-Plus compiler from a program which recognises and is tailored to fit the characteristic features of the DAP 510. In this paper the performances of two implementations of an algorithm for the eigensolution of real tridiagonal symmetric matrices are compared. The algorithm is characterised by its heavy use of matrix operations, all of which can be efficiently implemented on an array processor. Some of the constituent operations commonly occur in other applications while others are specific to the problem being addressed.

19.

Dynamic Event Generation for Runtime Checking using the JDI

Mark Brörkens Michael Möller 《Electronic Notes in Theoretical Computer Science》2002,70(4)

Approaches to runtime checking have to track the execution of a software system and therefore have to deal with generating and processing execution events. Often these techniques are applied at the code level – either by inserting new source code prior to the compilation or by modifying the target code, e.g. Java byte code, before running the program. The jassda [4,3] framework and tool enable runtime checking of Java programs against a CSP-like specification. For generating events it uses the Java Debug Interface (JDI), and thus no modifications to the code are necessary. Another advantage is that events are generated on demand, i.e. which events to generate for the current debug run is determined dynamically at runtime, without modifying the program itself. This paper shows how this event generation is done by the jassda framework.

20.

Using SPEC CPU2006 to evaluate the sequential and parallel code generated by commercial and open-source compilers

Aldea Sergio Llanos Diego R. González-Escribano Arturo 《The Journal of supercomputing》2012,59(1):486-498
