Similar literature
 20 similar articles found (search time: 734 ms)
1.
This paper describes a system for compressed code generation. The code of applications is partitioned into time‐critical and non‐time‐critical code. Critical code is compiled to native code, and non‐critical code is compiled to a very dense virtual instruction set which is executed on a highly optimized interpreter. The system employs dictionary‐based compression by means of superinstructions which correspond to patterns of frequently used base instructions. The code compression system is designed for the Philips TriMedia VLIW processor. The interpreter is pipelined to achieve a high interpretation speed. The pipeline consists of three stages: fetch, decode, and execute. While one instruction is being executed, the next instruction is decoded, and the next one after that is fetched from memory. On a TriMedia VLIW with a load latency of three cycles and a jump latency of four cycles, the interpreter achieves a peak performance of four cycles per instruction and a sustained performance of 6.27 cycles per instruction. Experiments are described that demonstrate the compression quality of the system and the execution speed of the pipelined interpreter: the compressed code was found to be about five times more compact than native TriMedia code, at the cost of a slowdown of about eight times. Copyright © 1999 John Wiley & Sons, Ltd.
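
The superinstruction idea in this abstract can be illustrated with a minimal sketch. It assumes a toy opcode list and fuses only the single most frequent adjacent pair; the function names and the `load_add` opcode are hypothetical, and the sketch ignores jump targets, which a real compressor would have to respect. It is not the system described in the paper.

```python
from collections import Counter


def most_common_pair(program):
    """Return the most frequent adjacent opcode pair, or None if no pair repeats."""
    pairs = Counter(zip(program, program[1:]))
    if not pairs:
        return None
    pair, count = pairs.most_common(1)[0]
    return pair if count > 1 else None


def fuse(program, pair, super_op):
    """Replace non-overlapping occurrences of `pair` with the single opcode `super_op`."""
    out, i = [], 0
    while i < len(program):
        if i + 1 < len(program) and (program[i], program[i + 1]) == pair:
            out.append(super_op)   # one dense instruction stands for two base ones
            i += 2
        else:
            out.append(program[i])
            i += 1
    return out


# Hypothetical base-instruction stream.
program = ["load", "add", "store", "load", "add", "jump", "load", "add"]
pair = most_common_pair(program)
compressed = fuse(program, pair, "load_add")
```

Here eight base instructions shrink to five once the pair is fused; the interpreter would then need one extra routine that performs both base operations.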

2.
Binary translation attempts to emulate one instruction set with another on the same or different platforms. This important technique is widely used in modern software. Vector and floating‐point instructions are widely used in many applications, including multimedia, graphics, and gaming. Although these instructions are usually simulated with software in a binary translator, it is important to support them such that the host single‐instruction, multiple‐data (SIMD) and floating‐point hardware is efficiently used during emulation. We report our design and implementation of the emulation of ARM Neon and vector floating point (VFP) instructions in the machine‐code‐to‐low‐level‐virtual‐machine (MC2LLVM) binary translator. The Neon and VFP instructions are first translated into carefully chosen sequences of LLVM intermediate representation (IR), and later, the IR sequences are optimized and translated into the host native binary by the existing LLVM backend. Because MC2LLVM makes use of the vector and floating‐point types in LLVM IR, the generated host native binary can take full advantage of the vector and floating‐point functional units, if present, of the host machine. To be fully compliant with Neon and VFP instruction sets, all the features are supported, including the flush‐to‐zero mode, default not a number mode, and floating‐point exceptions. The experimental results show that code generated by MC2LLVM with the Neon and VFP extensions achieves an average speedup of 1.174× in SPEC 2006 benchmark suites and exhibits a floating‐point throughput of 12.05× in LINPACK, compared with code generated by MC2LLVM without the Neon and VFP extensions. Furthermore, MC2LLVM is 3.36× faster than QEMU for processing Neon/VFP instructions. Copyright © 2016 John Wiley & Sons, Ltd.

3.
Portable mobile code is often executed by a host virtual machine using just‐in‐time compilation. In this context, the compilation time in the host virtual machine is critical. This compilation time can be reduced if optimizations are performed ahead‐of‐time before distribution of the mobile code. Unfortunately, the portable nature of mobile code limits ahead‐of‐time optimizations to those that are machine‐independent. This work examines the effect of machine‐independent optimizations on the performance of mobile code applications. All experiments use the SafeTSA Format, a mobile code format that is based on Static Single Assignment Form (SSA Form). The experiments, which are performed on both the PowerPC and IA32 architectures, indicate that the effects of performing classical machine‐independent optimizations are, in fact, quite machine‐dependent. Nevertheless, the results demonstrate that applying such optimizations in a mobile code system can be beneficial. Copyright © 2009 John Wiley & Sons, Ltd.

4.
The Dalvik virtual machine (VM) is an integral component used to execute applications in Android, which is one of the leading operating systems for mobile devices. The Dalvik VM is an interpreter and is equipped with a trace‐based just‐in‐time compiler for enhancing the execution performance of frequently executed paths, or traces. However, traces generated by the Dalvik VM can be stopped in a conditional branch or a method call/return, which means that these traces usually have a short lifetime, decreasing the effectiveness of the compiler optimizations applied to them. Furthermore, the just‐in‐time compiler applies only a few simple optimizations because of performance considerations. In this article we present a traces‐to‐region (T2R) framework that extends traces to regions and statically compiles these regions into native binaries so as to improve the execution of Android applications. The T2R framework involves three main stages: (i) the profiling stage, in which the run‐time trace information of an application is extracted; (ii) the compilation stage, in which regions are constructed from the extracted traces and are statically compiled into a native binary; and (iii) the execution stage, in which the compiled binary is loaded into the code cache when the application starts to execute. Experiments performed on an Android tablet demonstrated that the T2R framework was effective in improving the execution performance of applications by 10.5–16.2% and decreasing the size of the code cache by 4.6–28.5%. Copyright © 2015 John Wiley & Sons, Ltd.

5.
This paper illustrates a precise method for comparing microcomputer instruction sets for the purpose of language implementation. The standard of comparison is the Concurrent Pascal interpreter for the PDP 11/45 minicomputer. The paper identifies the most frequent virtual instructions and their implementation on the PDP 11/45 computer. It then compares the speed and size of interpreters implemented on various 16-bit microcomputers. It also includes the process switching times of these computers.

6.
We present a study of the static structure of real Java bytecode programs. A total of 1132 Java jar‐files were collected from the Internet and analyzed. In addition to simple counts (number of methods per class, number of bytecode instructions per method, etc.), structural metrics such as the complexity of control‐flow and inheritance graphs were computed. We believe this study will be valuable in the design of future programming languages and virtual machine instruction sets, as well as in the efficient implementation of compilers and other language processors. Copyright © 2006 John Wiley & Sons, Ltd.

7.
SimpleScalar: an infrastructure for computer system modeling   Total citations: 8, self-citations: 0, citations by others: 8
T. Austin, E. Larson, D. Ernst. Computer, 2002, 35(2): 59-67
Designers can execute programs on software models to validate a proposed hardware design's performance and correctness, while programmers can use these models to develop and test software before the real hardware becomes available. Three critical requirements drive the implementation of a software model: performance, flexibility, and detail. Performance determines the amount of workload the model can exercise given the machine resources available for simulation. Flexibility indicates how well the model is structured to simplify modification, permitting design variants or even completely different designs to be modeled with ease. Detail defines the level of abstraction used to implement the model's components. The SimpleScalar tool set provides an infrastructure for simulation and architectural modeling. It can model a variety of platforms ranging from simple unpipelined processors to detailed dynamically scheduled microarchitectures with multiple-level memory hierarchies. SimpleScalar simulators reproduce computing device operations by executing all program instructions using an interpreter. The tool set's instruction interpreters also support several popular instruction sets, including Alpha, PPC, x86, and ARM.

8.
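To extend the functionality of embedded configuration software, a C-like scripting language is introduced. A compiler is designed to translate scripts into intermediate code; using intermediate code speeds up program execution and simplifies the design of the script interpreter. A script virtual machine modeled on a microprocessor architecture is proposed to interpret the intermediate code at run time. The virtual machine consists mainly of a program memory, an instruction decoder, an arithmetic unit, a program counter, a controller, and a dynamic container. The dynamic container is the key design element: it allocates memory dynamically and releases it automatically, which makes it suitable for embedded operating systems. Experiments and tests show that the script-interpreting virtual machine meets the requirements of embedded configuration software design.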
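
A processor-like script virtual machine of the kind this abstract outlines might be sketched as follows. The instruction names, the tuple encoding, and the use of a plain dictionary in place of the paper's dynamic container are all assumptions made for illustration; the abstract does not specify any of them.

```python
class ScriptVM:
    """Toy interpreter laid out like a small processor (illustrative only)."""

    def __init__(self, program):
        self.program = program    # program memory: list of (opcode, operand...) tuples
        self.pc = 0               # program counter
        self.stack = []           # operand stack feeding the arithmetic unit
        self.variables = {}       # assumed stand-in for the dynamic container
        self.running = True

    def alu(self, opcode, a, b):
        """Arithmetic unit: apply one binary operation."""
        if opcode == "ADD":
            return a + b
        if opcode == "SUB":
            return a - b
        if opcode == "MUL":
            return a * b
        raise ValueError(f"unknown ALU operation: {opcode}")

    def step(self):
        """Controller: fetch, decode, and execute a single instruction."""
        opcode, *operands = self.program[self.pc]   # fetch and decode
        self.pc += 1
        if opcode == "PUSH":
            self.stack.append(operands[0])
        elif opcode == "LOAD":
            self.stack.append(self.variables[operands[0]])
        elif opcode == "STORE":
            self.variables[operands[0]] = self.stack.pop()
        elif opcode in ("ADD", "SUB", "MUL"):
            b = self.stack.pop()
            a = self.stack.pop()
            self.stack.append(self.alu(opcode, a, b))
        elif opcode == "JNZ":                       # jump if popped value is nonzero
            if self.stack.pop() != 0:
                self.pc = operands[0]
        elif opcode == "HALT":
            self.running = False
        else:
            raise ValueError(f"unknown opcode: {opcode}")

    def run(self):
        while self.running and self.pc < len(self.program):
            self.step()
        return self.variables


# Intermediate code for: n = 3; total = 0; while n != 0: total += n; n -= 1
program = [
    ("PUSH", 3), ("STORE", "n"),
    ("PUSH", 0), ("STORE", "total"),
    ("LOAD", "total"), ("LOAD", "n"), ("ADD",), ("STORE", "total"),   # address 4
    ("LOAD", "n"), ("PUSH", 1), ("SUB",), ("STORE", "n"),
    ("LOAD", "n"), ("JNZ", 4),
    ("HALT",),
]
result = ScriptVM(program).run()
```

Keeping the fetch/decode step separate from the arithmetic unit mirrors the component list in the abstract; in Python the dictionary gets dynamic allocation and automatic release for free, which is exactly the part an embedded implementation would have to build itself.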

9.
Debugging multi‐language software systems requires examining and executing these systems at multiple levels of abstraction. Embedded systems, for example, often comprise a mix of assembly language device drivers and C language control code. Embedded systems increasingly utilize Java to support dynamic loading and run‐time reconfiguration. The RTEEM (Research version of the Tcl Environment for Extensible Modeling) debugger employs three design patterns in solving the problems of multi‐language embedded system debugging. The Reflective virtual machine (VM) pattern models a language‐neutral virtual machine abstraction, with language‐specific interfaces extending this abstraction. Reflection allows a debugger to inspect and control a target VM. The Chain of Responsibility is a classic pattern used to arrange language‐specific debugger command interpreters in a delegation chain. All interpreters share a single command syntax, but each interpreter adapts commands to its language abstraction by interacting with its language‐specific VM view. Composite is another classic pattern, used to combine objects into tree structures. RTEEM employs it to aggregate VM debugger chains into a hierarchy that supports uniform command syntax for debugging threads, processes, multiprocessor systems, and compositions of these entities. This paper illustrates how combining two classic design patterns with the VM abstraction as a pattern results in an architecture that is powerful and flexible in adapting to the debugging needs of heterogeneous, distributed embedded systems. Copyright © 2004 John Wiley & Sons, Ltd.
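
The delegation chain of language-specific command interpreters mentioned in this abstract can be sketched with the classic Chain of Responsibility pattern. The class name, command names, and string results below are hypothetical and are not taken from RTEEM, which is Tcl-based; the sketch only shows how a shared command syntax can be routed to the first interpreter that understands a command.

```python
class CommandInterpreter:
    """One link in the chain: handles its own commands, delegates the rest."""

    def __init__(self, language, commands, successor=None):
        self.language = language          # abstraction level this link serves
        self.commands = set(commands)     # commands this link understands
        self.successor = successor        # next interpreter in the chain, if any

    def handle(self, command, argument):
        if command in self.commands:
            return f"{self.language}: {command} {argument}"
        if self.successor is None:
            raise ValueError(f"no interpreter handles {command!r}")
        return self.successor.handle(command, argument)


# Hypothetical chain ordered from the highest to the lowest abstraction level.
assembly = CommandInterpreter("assembly", {"print-register", "step-instruction"})
c_level = CommandInterpreter("C", {"print-variable", "step-line"}, successor=assembly)
java_level = CommandInterpreter("Java", {"print-object"}, successor=c_level)

handled_by_c = java_level.handle("print-variable", "count")
handled_by_asm = java_level.handle("print-register", "r0")
```

The caller always talks to the head of the chain, so adding another language level means adding one more link rather than changing the command syntax.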

10.
This paper describes a new method for code space optimization for interpreted languages called LZW‐CC. The method is based on a well‐known and widely used compression algorithm, LZW, which has been adapted to compress executable program code represented as bytecode. Frequently occurring sequences of bytecode instructions are replaced by shorter encodings for newly generated bytecode instructions. The interpreter for the compressed code is modified to recognize and execute those new instructions. When applied to systems where a copy of the interpreter is supplied with each user program, space is saved not only by compressing the program code but also by automatically removing the unused implementation code from the interpreter. The method's implementation within two compiler systems for the programming languages Haskell and Java is described and implementation issues of interest are presented, notably the recalculations of target jumps and the automated tailoring of the interpreter to program code. Applying LZW‐CC to nhc98 Haskell results in bytecode size reduction by up to 15.23% and executable size reduction by up to 11.9%. Java bytecode is reduced by up to 52%. The impact of compression on execution speed is also discussed; the typical speed penalty for Java programs is between 1.8 and 6.6%, while most compressed Haskell executables run faster than the original. Copyright © 2008 John Wiley & Sons, Ltd.
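
For orientation, here is a minimal sketch of plain LZW applied to a sequence of opcodes. It is textbook LZW, not the LZW‐CC adaptation: it ignores operands and jump-target recalculation, and the opcode names are hypothetical. Each code above the base alphabet plays the role of a newly generated instruction that stands for a sequence of base instructions.

```python
def lzw_compress(opcodes, alphabet):
    """Compress an opcode sequence; return (codes, table of sequence -> code)."""
    table = {(op,): i for i, op in enumerate(alphabet)}   # single opcodes first
    codes, current = [], ()
    for op in opcodes:
        candidate = current + (op,)
        if candidate in table:
            current = candidate               # keep extending the known sequence
        else:
            codes.append(table[current])      # emit code for the longest match
            table[candidate] = len(table)     # register a new "instruction"
            current = (op,)
    if current:
        codes.append(table[current])
    return codes, table


def expand(codes, table):
    """Expand codes back to base opcodes using the table built by lzw_compress."""
    sequences = {code: seq for seq, code in table.items()}
    return [op for code in codes for op in sequences[code]]


opcodes = ["push", "push", "add", "push", "push", "add"]
codes, table = lzw_compress(opcodes, ["push", "add"])
restored = expand(codes, table)
```

In this run, code 2 stands for the sequence `push push`, so an interpreter extended with a routine for code 2 could execute it directly instead of expanding it first.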

11.
Iain Milne, Glenn Rowe. Software, 2005, 35(15): 1477-1493
Although the principles of writing compilers and interpreters are well known, we have found that the ideas needed to develop an interpreter for the express purpose of allowing direct interaction with the running code do not yet appear to have been published in an academic context. We describe a programming method that can be used for the production of an interpreter for common object‐oriented languages such as C++, Java and C#. The main purpose of the interpreter is to parse short, relatively simple programs and allow direct interaction between the user and the running code. Such a system is useful for projects such as OGRE, which is an educational tool allowing students to visualize in three‐dimensional graphics the state of a program as it runs. The interpreter works by first parsing the source code and building up a data structure capable of representing the program's source code in a form that can be used to both run the program and extract detailed information from the running program. This extraction allows for novel uses of the interpreter, such as forming the basis for a visualization system that must display and provide such information to the user as they watch their executing program. This paper considers the construction of such an interpreter specifically for C++, but the principles should be the same for other similar languages such as Java and C#. We cover the main tasks required of the programmer to create and use the data structure, highlighting areas such as its design, initial construction during parsing, and techniques required to use it for interpretation. These include the ability for the data structure to intelligently clone subsets of itself when multiple copies of one of its elements are required by the running program, how it handles C++'s complicated function overloading and overriding rules, and how inheritance and polymorphism can be supported. Copyright © 2005 John Wiley & Sons, Ltd.

12.
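Implementation of automatic generation of scripting-language interfaces for C/C++ code   Total citations: 1, self-citations: 0, citations by others: 1
Scripting languages are a powerful tool for developing flexible scientific software. Developers, however, often face the problem of how to integrate compiled C/C++ code into an interpreter. To address this, an extensible compiler called the Interface Generator (IG) was designed. IG's main task is to integrate compiled C/C++ code into a scripting-language interpreter. The paper therefore focuses on solving this integration problem.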

13.
14.
Philipp Adler, Wolfram Amme. Software, 2014, 44(10): 1223-1249
Most constrained systems use interpreters to run mobile programs written in Java. Such interpreters are designed to minimize resource usage and often do not allow mobile code in the devices to be changed. For this reason, runtime optimization is typically not supported, even though it is completely feasible. In this paper, we propose optimistic optimization as a concept for improving application performance in restricted interpreter environments. In an optimistic optimization, a mobile program is restructured speculatively during code generation. This requires that it is possible to undo such optimizations, at runtime, if an incorrect use is detected or the set of available classes has changed when compared with compile time. Experimental results show that interpreted applications using optimistic optimizations tend to run faster when compared with their conventionally optimized counterparts. Compared with standard load elimination, reductions in runtimes of up to 9% for optimistic load elimination and up to 23% for the combined optimization were achieved. On average, performance improved by 1.87% for optimistic load elimination and 3.7% for the combined optimization. Copyright © 2013 John Wiley & Sons, Ltd.

15.
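Instruction-set emulation by interpretation is an effective way to solve binary compatibility problems. How the steps of interpretation are organized has a major effect on interpreter performance. The centralized organization is inefficient, while the more efficient threaded organization cannot be applied to CISC instruction sets because their decoding is too complex. This paper proposes a hybrid threaded interpretation technique based on DICache. DICache implements efficient dynamic pre-decoding in hardware, converting source instructions into an intermediate representation; interpretation routines access DICache quickly to achieve threaded interpretation of a CISC instruction set. The technique is evaluated on an interpreter with IA-32 as the source and a VLIW as the target, using test programs from SPEC INT2000. The results show that the method significantly improves interpreter performance.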
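
A software analogue of the pre-decoding idea can be sketched as a cache of decoded instructions keyed by address. This is only an illustration under stated assumptions: the paper's DICache is a hardware mechanism, the toy variable-length encoding below is invented, and the Python loop uses central dispatch rather than true threaded dispatch. What it does show is that each instruction is decoded once and then reused on every later visit.

```python
def decode(memory, pc):
    """Decode one variable-length instruction; return (op, operand, next_pc)."""
    opcode = memory[pc]
    if opcode == 0x00:
        return ("halt", None, pc + 1)
    if opcode == 0x01:                       # PUSH imm8 (two bytes)
        return ("push", memory[pc + 1], pc + 2)
    if opcode == 0x03:                       # DEC: decrement top of stack
        return ("dec", None, pc + 1)
    if opcode == 0x04:                       # JNZ addr8: jump if top of stack != 0
        return ("jnz", memory[pc + 1], pc + 2)
    raise ValueError(f"unknown opcode {opcode:#x} at address {pc}")


def run(memory):
    """Interpret `memory`, decoding each address at most once."""
    cache = {}                               # address -> pre-decoded instruction
    stats = {"decoded": 0, "executed": 0}
    stack, pc = [], 0
    while True:
        entry = cache.get(pc)
        if entry is None:                    # miss: decode and remember
            entry = decode(memory, pc)
            cache[pc] = entry
            stats["decoded"] += 1
        op, operand, next_pc = entry
        stats["executed"] += 1
        if op == "halt":
            return stack, stats
        if op == "push":
            stack.append(operand)
        elif op == "dec":
            stack[-1] -= 1
        elif op == "jnz" and stack[-1] != 0:
            next_pc = operand
        pc = next_pc


# PUSH 3; loop: DEC; JNZ loop; HALT
memory = [0x01, 3, 0x03, 0x04, 2, 0x00]
stack, stats = run(memory)
```

The loop body runs three times, so eight instructions execute while only four decodes take place; the gap widens with longer-running loops.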

16.
A new instruction adapts LZ77 compression for use inside running programs. The instruction economically references and reuses code fragments that are too small to package as conventional subroutines. The compressed code is interpreted directly, with neither prior nor on‐the‐fly decompression. Hardware implementations seem plausible and could benefit both memory‐constrained and more conventional systems. The method is extremely simple. It has been added to a pre‐existing, bytecoded instruction set, and it added only 10 lines of C to the bytecode interpreter. It typically cuts code size by a third; that is, typical compression ratios are roughly 0.67×. More ambitious compressors are available, but they are more complex, which retards adoption. The current method offers a useful trade‐off compared with these more complex systems. Copyright © 2005 John Wiley & Sons, Ltd.
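
The idea of interpreting LZ77-style back-references directly can be sketched with a toy stack interpreter. The `echo` name, its `(offset, length)` operand format, and the restriction to straight-line code are assumptions for illustration; the abstract does not give the actual encoding, and the original was written in C.

```python
def run(code):
    """Interpret `code`, where ("echo", offset, length) re-executes `length`
    instructions starting `offset` instructions before the echo itself."""
    stack = []

    def execute(start, count):
        for pc in range(start, start + count):
            instr = code[pc]
            op = instr[0]
            if op == "push":
                stack.append(instr[1])
            elif op == "add":
                b = stack.pop()
                a = stack.pop()
                stack.append(a + b)
            elif op == "mul":
                b = stack.pop()
                a = stack.pop()
                stack.append(a * b)
            elif op == "echo":
                _, offset, length = instr
                execute(pc - offset, length)   # reuse earlier code in place
            else:
                raise ValueError(f"unknown opcode: {op}")

    execute(0, len(code))
    return stack


# Uncompressed form: push 2, push 3, add, push 2, push 3, add, mul (7 instructions).
compressed = [("push", 2), ("push", 3), ("add",), ("echo", 3, 3), ("mul",)]
result = run(compressed)
```

The five-instruction program computes (2 + 3) * (2 + 3) without ever materializing the seven-instruction original, which is the sense in which no decompression step is needed.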

17.
The programming language LISP is usually implemented via an interpreter, and a compiler is added later as a LISP program. However, all such production compilers known to the authors produce explicit instructions for the given computer being used. This paper describes the development of a portable LISP compiler in the sense that only Standard LISP functions are used in its definition and the output is a sequence of abstract machine codes, easily mapped to instruction sequences on current computers. The resulting code is quite efficient, demonstrating once again the maxim that most compiler optimization is largely machine independent.

18.
Increases in instruction level parallelism are needed to exploit the potential parallelism available in future wide issue architectures. Predicated execution is an architectural mechanism that increases instruction level parallelism by removing branches and allowing simultaneous execution of multiple paths of control, only committing instructions from the correct path. In order for the compiler to expose and use such parallelism, traditional compiler data-flow and path analysis needs to be extended to predicated code. In this paper, we motivate the need for renaming and for predicates that reflect path information. We present Predicated Static Single Assignment (PSSA), which uses renaming and introduces Full-Path Predicates to remove false dependences and enable aggressive predicated optimization and instruction scheduling. We demonstrate the usefulness of PSSA for Predicated Speculation and Control Height Reduction. These two predicated code optimizations used during instruction scheduling reduce the dependence length of the critical paths through a predicated region. Our results show that using PSSA to enable speculation and control height reduction reduces execution time by 12% to 68%.

19.
A way to host a full general purpose virtual machine (VM) interpreter on a very small microcontroller platform is described. This machine provides a comprehensive set of general and enhanced functionality efficiently by abstracting the VM instruction set. Measurements were made on the execution of software programs in the virtual machine while running on the target platform in order to demonstrate the machine's capabilities. Additionally, multitasking capabilities were added to the baseline and found to perform efficiently within the VM. The results proved to be satisfactory and demonstrate that a robust virtual machine can be made available for very small embedded platforms based on simple microcontrollers, such as those that are widely found in aerospace applications.

20.
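To improve the security of software packers (shells), a virtual-machine-based packing technique is studied. A compiler in the virtual machine translates the original bytecode into pseudo-instructions; at execution time, the virtual machine's interpreter translates the pseudo-instructions back into the original bytecode. The pseudo-instructions were designed, the compiler was implemented, an interpreter was added to the PE file, and a new PE header, section table, and sections were constructed. Experiments show that virtual-machine-based packing improves software security.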
