Similar Literature
 20 similar records found.
1.
Object-oriented languages have suffered from poor performance caused by frequent and slow dynamically-bound procedure calls. The best way to speed up a procedure call is to compile it out, but dynamic binding of object-oriented procedure calls without static receiver type information precludes inlining. Iterative type analysis and extended message splitting are new compilation techniques that extract much of the necessary type information and make it possible to hoist run-time type tests out of loops. Our system compiles code on-the-fly that is customized to the actual data types used by a running program. The compiler constructs a control flow graph annotated with type information by simultaneously performing type analysis and inlining. Extended message splitting preserves type information that would otherwise be lost by a control-flow merge by duplicating all the code between the merge and the place that uses the information. Iterative type analysis computes the types of variables used in a loop by repeatedly recompiling the loop until the computed types reach a fix-point. Together these two techniques enable our SELF compiler to split off a copy of an entire loop, optimized for the common-case types. By the time our SELF compiler generates code for the graph, it has eliminated many dynamically-dispatched procedure calls and type tests. The resulting machine code is twice as fast as that generated by the previous SELF compiler, four times faster than ParcPlace Systems Smalltalk-80, the fastest commercially available dynamically-typed object-oriented language implementation, and nearly half the speed of optimized C. Iterative type analysis and extended message splitting have cut the performance penalty for dynamically-typed object-oriented languages in half. This work has been generously supported by National Science Foundation Presidential Young Investigator Grant #CCR-8657631, and by Sun Microsystems, IBM, Apple Computer, Tandem Computers, NCR, Texas Instruments, the Powell Foundation, and DEC. This paper was originally published in the Proceedings of the ACM SIGPLAN '90 Conference on Programming Language Design and Implementation (SIGPLAN Notices, 25, 6 (1990) 150–160).
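A schematic sketch of the iterative type analysis idea described above (not the SELF compiler's actual implementation; the transfer function and type representation are invented for illustration): the analysis repeatedly re-analyzes a loop body until the inferred types of its variables stop growing, at which point the loop can be compiled once more against the stable, common-case types.

```python
# Hypothetical sketch: iterate type analysis over a loop body until the
# inferred variable types reach a fix-point.

def analyze_statement(stmt, types):
    """Transfer function: a statement maps a type env to (var, inferred types)."""
    new_types = dict(types)
    var, inferred = stmt(types)
    new_types[var] = new_types.get(var, set()) | inferred
    return new_types

def iterative_type_analysis(loop_body, initial_types):
    """Re-analyze the loop until no variable gains any new possible type."""
    types = dict(initial_types)
    while True:
        next_types = dict(types)
        for stmt in loop_body:
            next_types = analyze_statement(stmt, next_types)
        if next_types == types:      # fix-point reached
            return types
        types = next_types

# Tiny illustration: x starts as Int; one statement may turn it into a Float.
loop = [
    lambda env: ("x", {"Float"} if "Int" in env.get("x", set()) else set()),
]
print(iterative_type_analysis(loop, {"x": {"Int"}}))   # {'x': {'Int', 'Float'}}
```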

2.
Many embedded Java platforms execute two types of Java classes: those installed statically on the client device and those downloaded dynamically from service providers at run time. For higher performance, the static Java classes can be compiled into machine code by an ahead‐of‐time compiler (AOTC) on the server, and the translated machine code can be installed on the client device. Unfortunately, AOTC is not applicable to the dynamically downloaded classes. This paper proposes client‐AOTC (c‐AOTC), which performs AOTC on the client device using the just‐in‐time compiler (JITC) module installed on the device, obviating the JITC overhead and complementing the server‐AOTC. The machine code of a method translated by the JITC is cached in persistent memory on the device, and when the method is invoked again in a later run of the program, the machine code is loaded and executed directly without any translation overhead. A major issue in c‐AOTC is relocation, because some of the address constants embedded in the cached machine code are not correct when the machine code is loaded and used in a different run; those addresses must be corrected before they are used. Constant pool resolution and inlining complicate the relocation problem, and we propose solutions for both. The persistent memory overhead for saving the relocation information is also an issue, and we propose a technique to encode the relocation information and compress the machine code efficiently. We developed a c‐AOTC on Sun's CDC VM reference implementation, and our evaluation results indicate that c‐AOTC can improve performance significantly, by as much as an average of 12% for EEMBC and 4% for SpecJVM98, with a persistent memory overhead of 1% on average. Copyright © 2008 John Wiley & Sons, Ltd.
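A hedged sketch of the relocation idea behind c-AOTC (the record layout, field names, and resolver are invented for illustration, not taken from the paper): machine code cached in one run embeds absolute addresses that become stale in a later run, so the cache also stores, for each such constant, its offset and a symbolic target that can be re-resolved and patched when the code is loaded again.

```python
import struct
from dataclasses import dataclass

@dataclass
class RelocEntry:
    offset: int       # byte offset of a 4-byte address constant in the code
    symbol: str       # symbolic target, e.g. a resolved constant-pool entry

def save_cached_method(code: bytearray, relocs: list) -> bytes:
    """Serialize machine code plus relocation records for the persistent cache."""
    blob = struct.pack("<II", len(code), len(relocs)) + bytes(code)
    for r in relocs:
        sym = r.symbol.encode()
        blob += struct.pack("<IH", r.offset, len(sym)) + sym
    return blob

def load_cached_method(blob: bytes, resolve) -> bytearray:
    """Deserialize and patch each recorded address using this run's resolver."""
    code_len, n_relocs = struct.unpack_from("<II", blob, 0)
    pos = 8
    code = bytearray(blob[pos:pos + code_len])
    pos += code_len
    for _ in range(n_relocs):
        offset, sym_len = struct.unpack_from("<IH", blob, pos)
        pos += 6
        symbol = blob[pos:pos + sym_len].decode()
        pos += sym_len
        struct.pack_into("<I", code, offset, resolve(symbol))  # patch new address
    return code

# Usage: addresses differ between runs, but the patched code stays valid.
code = bytearray(b"\x00" * 8)
blob = save_cached_method(code, [RelocEntry(4, "java/lang/String")])
patched = load_cached_method(blob, resolve=lambda sym: 0x1234ABCD)
```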

3.
We present the design and evaluation of a new data-race-detection technique. Our technique executes at runtime rather than post-mortem, and handles unmodified shared-memory applications that run on top of CVM, a software distributed shared memory system. We do not assume explicit associations between synchronization and shared data, and require neither compiler support nor program source. Instead, we use a binary code re-writer to instrument instructions that may access shared memory. The most novel aspect of our system is that we are able to use information from the underlying memory system implementation in order to reduce the number of comparisons made at runtime. We present an experimental evaluation of our techniques by using our system to look for data races in five common shared-memory programs. We quantify the effect of several optimizations to the basic technique: data flow analysis, instrumentation batching, runtime code modification, and instrumentation inlining. Our system correctly found races in three of the five programs, including two from a standard benchmark suite. The slowdown of this debugging technique averages less than a factor of 2.5 for our applications.
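As a stand-in illustration of instrumentation-based race detection, here is the classic lockset (Eraser-style) check rather than the paper's CVM-specific technique: every instrumented access refines the set of locks consistently held around a shared location, and an empty set signals a possible race.

```python
from collections import defaultdict

class LocksetChecker:
    def __init__(self):
        self.held = defaultdict(set)    # thread id -> locks currently held
        self.candidates = {}            # address   -> candidate lockset

    def acquire(self, tid, lock):
        self.held[tid].add(lock)

    def release(self, tid, lock):
        self.held[tid].discard(lock)

    def access(self, tid, addr):
        """Instrumented load/store: intersect the candidate lockset for addr."""
        if addr not in self.candidates:
            self.candidates[addr] = set(self.held[tid])
        else:
            self.candidates[addr] &= self.held[tid]
        if not self.candidates[addr]:
            print(f"possible data race on {addr:#x} (thread {tid})")

checker = LocksetChecker()
checker.acquire(1, "L"); checker.access(1, 0x1000); checker.release(1, "L")
checker.access(2, 0x1000)   # no common lock -> reported as a possible race
```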

4.
Frequently invoked large functions are common in non‐numeric applications. These large functions present challenges to modern compilers not only because they require more time and resources at compilation time, but also because they may prevent optimizations such as function inlining. Often large portions of the code in a hot function fhost are executed much less frequently than fhost itself. Partial inlining is a natural solution to the problems caused by including cold code segments that are seldom executed into hot functions that are frequently invoked. When applying partial inlining, a compiler outlines cold statements from a hot function fhost. After outlining, fhost becomes smaller and thus can be easily inlined. This paper presents Ablego, a framework for function outlining and partial inlining that includes several innovations: (1) an abstract‐syntax‐tree‐based analysis and transformation to form cold regions for outlining; (2) a set of flexible heuristics to control the aggressiveness of function outlining; (3) several possible function outlining strategies; (4) explicit variable spilling, a new technique that overcomes negative side‐effects of function outlining. With the proper strategy, partial inlining improves performance by up to 5.75%. A performance study also suggests that partial inlining's effect on enabling more aggressive inlining is limited. The performance improvement from partial inlining actually comes from better code placement and better code generation. Copyright © 2006 John Wiley & Sons, Ltd.
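A minimal sketch of the outlining step behind partial inlining (a generic illustration, not the Ablego implementation; names and the hotness threshold are invented): statements whose execution counts fall below a threshold are moved into a new cold helper function and replaced by a call, so the remaining hot part is small enough for an ordinary inliner to inline.

```python
from dataclasses import dataclass

@dataclass
class Stmt:
    text: str
    exec_count: int

def partially_outline(name, body, hot_threshold):
    """Split a function body into a hot part and an outlined cold helper.
    A real outliner must also preserve control flow and pass live variables
    to the helper (the 'explicit variable spilling' the paper describes)."""
    hot, cold = [], []
    for s in body:
        (hot if s.exec_count >= hot_threshold else cold).append(s)
    cold_name = f"{name}__cold"     # hypothetical naming scheme
    hot_src = [s.text for s in hot] + ([f"{cold_name}()"] if cold else [])
    cold_src = [s.text for s in cold]
    return hot_src, cold_name, cold_src

hot_src, cold_name, cold_src = partially_outline(
    "handle_request",
    [Stmt("parse()", 100_000), Stmt("log_error()", 3), Stmt("reply()", 100_000)],
    hot_threshold=1_000,
)
print(hot_src)    # ['parse()', 'reply()', 'handle_request__cold()']
print(cold_src)   # ['log_error()']
```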

5.
This paper describes an experimental message-driven programming system for fine-grain multicomputers. The initial target architecture is the J-machine designed at MIT. This machine combines a unique collection of architectural features that include fine-grain processes, on-chip associative memory, and hardware support for process synchronization. The programming system uses these mechanisms via a simple message-driven process model that blurs the distinction between processes and messages: messages correspond to processes that are executed elsewhere in the network. This model allows code and data to be distributed across the computers in the machine, and is supported at every stage of the program development cycle. The prototype system we have developed includes a basic set of programming tools to support the model; these include a compiler, linker, archiver, loader and microkernel. Although the concepts are language independent, our prototype system is based on GNU C.

6.
Inlining Optimization for the IXP Network Processor
Inlining is an effective compiler optimization that eliminates function call overhead by embedding the function body directly at the call site. However, the specialized architecture of network processors places new demands on inlining, and new techniques are needed to supplement traditional inlining so that it better fits this architecture. This paper describes how critical-path extraction and iterative compilation are used to extend and adapt traditional inlining to the IXP architecture. Experimental results show that the improved inlining optimization effectively raises the performance of the network system.
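A hedged sketch of the critical-path-guided idea described above (the data structures, profiles, and budget are illustrative assumptions, not taken from the paper): inlining is restricted to call sites that lie on the most frequently executed path, so code size stays within the tight instruction-store limits typical of network processors.

```python
def critical_path(call_graph, freq, entry):
    """Greedily follow the most frequently taken call edge from the entry point."""
    path, node, seen = [entry], entry, {entry}
    while True:
        callees = [c for c in call_graph.get(node, []) if c not in seen]
        if not callees:
            return path
        nxt = max(callees, key=lambda c: freq.get((node, c), 0))
        path.append(nxt)
        seen.add(nxt)
        node = nxt

def select_inline_candidates(call_graph, freq, sizes, entry, size_budget):
    """Inline only call sites on the critical path, hottest first, within a code-size budget."""
    path = critical_path(call_graph, freq, entry)
    edges = sorted(zip(path, path[1:]), key=lambda e: freq.get(e, 0), reverse=True)
    chosen, used = [], 0
    for caller, callee in edges:
        if used + sizes[callee] <= size_budget:
            chosen.append((caller, callee))
            used += sizes[callee]
    return chosen

cg = {"rx": ["classify", "drop"], "classify": ["forward"], "forward": []}
freq = {("rx", "classify"): 9000, ("rx", "drop"): 10, ("classify", "forward"): 9000}
print(select_inline_candidates(cg, freq, {"classify": 40, "forward": 30, "drop": 5},
                               "rx", size_budget=64))   # [('rx', 'classify')]
```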

7.
Current programming languages do not offer adequate abstractions to discover and compose heterogeneous objects over unreliable networks. This forces programmers to discover objects one by one, compose them manually, and keep track of their individual connectivity state at all times. In this paper we propose Ambient Contracts, a novel programming abstraction to deal with the difficulties of composing objects connected over unreliable networks. Ambient Contracts provide declarative heterogeneous group discovery and composition while dealing with the unreliability of the network. An ambient contract allows runtime verification and enforcement of the messages sent between the participants in the contract. The use of our abstraction significantly reduces the code base and allows programmers to focus on the core functionality of their application. Our claims are reinforced by comparing the implementation of an example scenario in our contracts with a Java implementation using M2MI.

8.
9.

The most recent and advanced implementation of constraint handling rules (CHR) is introduced in a logic programming language. The Prolog implementation consists of a runtime system and a compiler. The runtime system utilizes attributed variables for the realization of the constraint store with efficient retrieval and update mechanisms. Rules describing the interactions between constraints are compiled into Prolog clauses by a compiler, the core of which comprises a small number of compact code generating templates in the form of definite clause grammar rules.

10.
This paper introduces a new load balancing and communication minimizing heuristic used in the Inverse Remote Procedure Call (IRPC) system. While the paper briefly describes the IRPC system, the focus is on the new IRPC assignment heuristic. The IRPC compiler maps a distributed program to a graph that represents program objects and their dependencies (due to invocations and parameter passing) as nodes and edges, respectively. In the graph, the system preserves conditional and iterative flows, records network transmission and execution costs, and marks nodes that have to reside at specific network sites. The graph is then partitioned by the heuristic to derive a (sub)optimal node assignment to network sites minimizing load imbalance and network data transport. The resulting program partition is then reflected in the physical object distribution, and remote and local object communication is transparently implemented. The compiler and run-time system use efficient implementation techniques such as type prediction, inlining, splitting and subprogram passing. The last of these allows remote code to be copied to local data, as an alternative to copying data to the remote site, whenever this will reduce network data transport. The IRPC graph partitioning heuristic operates in time O(E(log d + l + log M)), where M is the number of network sites, E is the number of communication edges, and d is the maximum degree of a node; l is a parameter of the algorithm, and can vary between 1 and N, where N is the number of communicating objects. This complexity is more nearly independent of M, and considerably better in terms of E and N, than that of previously known related algorithms, such as A*, which employs backtracking and is potentially exponential, or the max-flow/min-cut class of network flow algorithms or heuristics, which tend to be at least Ω(MN²E); and it can be made (by choosing l appropriately) as efficient as even such fast heuristics as heaviest-edge-first, minimal communication, and Kernighan–Lin. In an extensive quantitative evaluation, the heuristic has been demonstrated to perform very well, giving on average 75% traffic cost reductions for over 95% of the programs when compared to random partitioning, and outperforming in cost reduction and actual execution time the three aforementioned fast heuristics, even with a large l. Thus, to the best of our knowledge, this is the first report of a well-performing assignment heuristic that is both essentially linear in the number of communication edges and better than existing, established heuristics of no better complexity.
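A rough sketch in the spirit of the assignment problem described above. This is a plain heaviest-edge-first greedy heuristic, not the paper's IRPC algorithm: objects are nodes, communication traffic gives edge weights, some nodes are pinned to specific sites, and heavy edges are co-located first while keeping the number of objects per site within a capacity bound.

```python
def assign_objects(nodes, edges, pinned, num_sites, capacity):
    """nodes: ids; edges: (u, v, traffic) triples; pinned: {node: site}."""
    site_of = dict(pinned)
    load = {s: 0 for s in range(num_sites)}
    for _, s in pinned.items():
        load[s] += 1
    # Heaviest communication edges first: try to place both endpoints together.
    for u, v, _traffic in sorted(edges, key=lambda e: e[2], reverse=True):
        for a, b in ((u, v), (v, u)):
            if a in site_of and b not in site_of and load[site_of[a]] < capacity:
                site_of[b] = site_of[a]
                load[site_of[a]] += 1
    # Anything still unplaced goes to the least-loaded site.
    for n in nodes:
        if n not in site_of:
            s = min(load, key=load.get)
            site_of[n] = s
            load[s] += 1
    return site_of

edges = [("ui", "db", 500), ("ui", "cache", 300), ("db", "log", 20)]
print(assign_objects(["ui", "db", "cache", "log"], edges,
                     pinned={"db": 0}, num_sites=2, capacity=3))
# {'db': 0, 'ui': 0, 'cache': 0, 'log': 1}
```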

11.
Multiprocessor execution of functional programs
Functional languages have recently gained attention as vehicles for programming in a concise and elegant manner. In addition, it has been suggested that functional programming provides a natural methodology for programming multiprocessor computers. This paper describes research that was performed to demonstrate that multiprocessor execution of functional programs on current multiprocessors is feasible, and results in a significant reduction in their execution times. Two implementations of the functional language ALFL were built on commercially available multiprocessors. Alfalfa is an implementation on the Intel iPSC hypercube multiprocessor, and Buckwheat is an implementation on the Encore Multimax shared-memory multiprocessor. Each implementation includes a compiler that performs automatic decomposition of ALFL programs and a run-time system that supports their execution. The compiler is responsible for detecting the inherent parallelism in a program, and decomposing the program into a collection of tasks, called serial combinators, that can be executed in parallel. The abstract machine model supported by Alfalfa and Buckwheat is called heterogeneous graph reduction, which is a hybrid of graph reduction and conventional stack-oriented execution. This model supports parallelism, lazy evaluation, and higher-order functions while at the same time making efficient use of the processors in the system. The Alfalfa and Buckwheat runtime systems support dynamic load balancing, interprocessor communication (if required), and storage management. A large number of experiments were performed on Alfalfa and Buckwheat for a variety of programs. The results of these experiments, as well as the conclusions drawn from them, are presented. This research was supported in part by National Science Foundation grants DCR-8302018 and DCR-8521451, by a DARPA subcontract with SDC/Unisys, and by gifts from Burroughs Austin Research Center and the Intel Corporation.

12.
This paper examines various issues that pertain to the access-control mechanism of the program component manager. We examine such questions as what constitutes an access-right, what kind of information about managed resources does one need, how revocation keys, right-sets, and exception conditions are handled, etc. It is argued that most of these issues can be handled by the compiler with no explicit programming required. This simplifies the task of programming and enhances reliability. A possible method for handling these issues which was adopted in our implementation is also described.

13.
Simulation models involve the concepts of time and space. In designing a distributed simulation programming system, introducing a temporal construct results in a specification language for describing a changing world, while introducing a spatial construct makes it possible to coordinate multiple, simultaneous, nondeterministic activities. In this paper, we present a new distributed logic programming model and discuss its implementation. A distributed program is represented by a virtual space—a set of processes which are logical representations of system objects, and is evaluated with respect to virtual time—a temporal coordinate which is used to measure computational progress and specify synchronization. The major focus of the implementation is the ability to accomplish global backtracking. The proposed implementation collects global knowledge through interprocess communication, controls global backtracking in a distributed fashion according to virtual time and dependency relations, and captures the heuristic that earlier synchronizations may make subsequent synchronizations more likely to succeed. Compared with other distributed logic programming systems, our system provides a simpler syntax, well-defined semantics, and an efficient implementation.

14.
Types in current programming languages specify constant sets of messages always acceptable throughout the lifetime of the types’ instances. However, especially in concurrent object-oriented systems, the acceptability of messages often changes with the objects’ states. We propose a typed concurrent object calculus where static type checking ensures that users send only acceptable messages although message acceptability may change dynamically. The programmer specifies in types predictable state changes and dependences of message acceptance on states; a compiler infers the needed state information. This state inference has polynomial time complexity and can be used together with subtyping.
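A sketch of the underlying idea (a generic typestate check, not the paper's calculus; the protocol table and checker are invented for illustration): a type lists, per abstract state, which messages are acceptable and which state each message leads to, and a static checker walks a straight-line sequence of sends and rejects any message the current state cannot accept.

```python
FILE_PROTOCOL = {
    "Closed": {"open": "Open"},
    "Open":   {"read": "Open", "close": "Closed"},
}

def check_sends(protocol, start_state, messages):
    """Statically simulate the declared state machine over a sequence of sends."""
    state = start_state
    for msg in messages:
        allowed = protocol[state]
        if msg not in allowed:
            raise TypeError(f"message '{msg}' not acceptable in state '{state}'")
        state = allowed[msg]
    return state

print(check_sends(FILE_PROTOCOL, "Closed", ["open", "read", "close"]))  # Closed
# check_sends(FILE_PROTOCOL, "Closed", ["read"])  # would raise TypeError
```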

15.
A pointer logic and certifying compiler
Proof-Carrying Code brings two big challenges to the research field of programming languages. One is to seek more expressive logics or type systems to specify or reason about the properties of low-level or high-level programs. The other is to study the technology of certifying compilation in which the compiler generates proofs for programs with annotations. This paper presents our progress in the above two aspects. A pointer logic was designed for PointerC (a C-like programming language) in our research. As an extension of Hoare logic, our pointer logic expresses the change of pointer information for each statement in its inference rules to support program verification. Meanwhile, based on the ideas from CAP (Certified Assembly Programming) and SCAP (Stack-based Certified Assembly Programming), a reasoning framework was built to verify the properties of object code in a Hoare style. And a certifying compiler prototype for PointerC was implemented based on this framework. The main contribution of this paper is the design of the pointer logic and the implementation of the certifying compiler prototype. In our certifying compiler, the source language contains rich pointer types and operations and also supports dynamic storage allocation and deallocation.

16.
Starting from the seminal work of Volpano and Smith, there has been growing evidence that type systems may be used to enforce confidentiality of programs through non-interference. However, most type systems operate on high-level languages and calculi, and “low-level languages have not received much attention in studies of secure information flow” (Sabelfeld and Myers, [Language-based information-flow security. IEEE Journal on Selected Areas in Communications 2003; 21:5–19]). Therefore, we introduce an information flow type system for a low-level language featuring jumps and calls, and show that the type system enforces termination-insensitive non-interference. Furthermore, information flow type systems for low-level languages should appropriately relate to their counterparts for high-level languages. Therefore, we introduce a compiler from a high-level imperative programming language to our low-level language, and show that the compiler preserves information flow types.
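A toy sketch of termination-insensitive information-flow checking (illustrative only, for a tiny high-level imperative language rather than the paper's low-level language with jumps and calls; the statement encoding and labels are invented): an assignment is rejected when data labeled High, or a High branch condition carried in the pc label, could flow into a Low variable.

```python
LOW, HIGH = 0, 1   # two-point security lattice: LOW below HIGH

def expr_label(read_vars, env):
    """Label of an expression = join of the labels of the variables it reads."""
    return max((env[v] for v in read_vars), default=LOW)

def check(stmt, env, pc=LOW):
    kind = stmt[0]
    if kind == "assign":                      # ("assign", target, [read vars])
        _, target, reads = stmt
        if max(expr_label(reads, env), pc) > env[target]:
            raise TypeError(f"illegal flow into low variable '{target}'")
    elif kind == "if":                        # ("if", [cond vars], then, else)
        _, cond, then_b, else_b = stmt
        branch_pc = max(pc, expr_label(cond, env))
        for s in then_b + else_b:
            check(s, env, branch_pc)          # raise pc inside High branches
    elif kind == "seq":
        for s in stmt[1]:
            check(s, env, pc)

env = {"secret": HIGH, "pub": LOW, "out": LOW}
check(("assign", "out", ["pub"]), env)        # accepted: Low-to-Low flow
# check(("if", ["secret"], [("assign", "out", [])], []), env)  # implicit flow: rejected
```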

17.
A Study of Programming Interfaces for Transactional Memory Parallel Programs
Programming interfaces for transactional-memory parallel programs fall into three forms, depending on how and at what level they are implemented: library APIs, language extensions, and compiler directives. Taking RSTM, Intel's C/C++ software transactional memory compiler prototype, and OpenTM as examples, this paper discusses the characteristics of the three kinds of transactional memory programming interfaces, extends and refines the OpenTM programming interface, and looks ahead to the future development of such interfaces.
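An illustrative sketch of the "library API" style of STM interface mentioned above. This is a toy, coarse-grained design and is not RSTM, the Intel STM compiler, or OpenTM: a transaction buffers writes in a private log, validates the versions it read, and publishes its writes atomically at commit time, retrying on conflict.

```python
import threading

class TVar:
    def __init__(self, value):
        self.value, self.version = value, 0

_commit_lock = threading.Lock()

class Transaction:
    def __init__(self):
        self.reads, self.writes = {}, {}
    def read(self, tvar):
        if tvar in self.writes:
            return self.writes[tvar]
        self.reads.setdefault(tvar, tvar.version)
        return tvar.value
    def write(self, tvar, value):
        self.writes[tvar] = value

def atomic(body, retries=100):
    """Run body(txn) transactionally: validate read versions, commit, else retry."""
    for _ in range(retries):
        txn = Transaction()
        result = body(txn)
        with _commit_lock:
            if all(tvar.version == v for tvar, v in txn.reads.items()):
                for tvar, value in txn.writes.items():
                    tvar.value, tvar.version = value, tvar.version + 1
                return result
    raise RuntimeError("transaction kept failing validation")

# Usage: transfer between two accounts without exposing intermediate states.
a, b = TVar(100), TVar(0)
atomic(lambda t: (t.write(a, t.read(a) - 30), t.write(b, t.read(b) + 30)))
print(a.value, b.value)   # 70 30
```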

18.
A general strategy is presented for multiprocessing that combines programming technique, machine architecture, and performance estimation. The programmer decomposes an application into manipulations of protocol-based programming primitives (protocols) using Plans and scenarios from software engineering. The programmer may select from generic protocols, which include shared-memory locations and messages, or may build his own. A system architecture that supports efficient emulation of protocols is presented, along with a method of estimating program performance based on network characteristics. Results are given from a protocol-based operating system on the 64-processor BTL Hypercube multiprocessor.

19.
C++ uses inheritance as a substitute for subtype polymorphism. We give examples where this makes the type system too inflexible. We then describe a conservative language extension that allows a programmer to define an abstract type hierarchy independent of any implementation hierarchies, to retroactively abstract over an implementation, and to decouple subtyping from inheritance. This extension gives the user more of the flexibility of dynamic typing while retaining the efficiency and security of static typing. With default implementations and views, flexible mechanisms are provided for implementing an abstract type by different concrete class types. We first show how the language extension can be implemented in a preprocessor to a C++ compiler, and then detail and analyse the efficiency of an implementation we directly incorporated in the GNU C++ compiler.
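As an analogy only (in Python rather than the paper's C++ extension), structural Protocol types play the role of an abstract type hierarchy defined independently of any implementation hierarchy: an existing class can satisfy an abstract type retroactively, without being modified to inherit from it.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Printable(Protocol):          # abstract type: no implementation inherits it
    def render(self) -> str: ...

class Invoice:                      # pre-existing class, knows nothing of Printable
    def __init__(self, total: float) -> None:
        self.total = total
    def render(self) -> str:
        return f"Invoice: {self.total:.2f}"

def show(item: Printable) -> None:  # client code typed against the abstract type only
    print(item.render())

show(Invoice(19.99))                            # accepted: subtyping without inheritance
print(isinstance(Invoice(1.0), Printable))      # True at runtime as well
```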

20.