首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
The problem of data dependences detection in codes based on dynamic data structures, is crucial to various compiler optimizations. The approach presented in this paper focuses on detecting data dependences induced by heap-directed pointers on loops that access dynamic data structures. Knowledge about the shape of the data structure accessible from a heap-directed pointer provides critical information for disambiguating heap accesses originating from it. The new approach is based on a previously developed shape analysis that maintains topological information of the connections among the different nodes (memory locations) in the data structure. As a novelty, our approach carries out abstract interpretation of the statements being analyzed, annotating memory locations with read/write information. This information will be later used in a very accurate data dependence test which we describe in this paper. We also discuss its application to several different benchmarks.  相似文献   

2.
扩充内存在长时间高速数据采集中的应用   总被引:1,自引:0,他引:1  
本文讨论了IBM-PC微机扩充内存的访问方法,编制了Microsoft C语言直接访问扩充内存的汇编语言接口函数,并将其应用于某型液体火箭发动机振动信号分析系统数据采集。  相似文献   

3.
We give an in-depth introduction to the design of our functional array programming language SaC, the main aspects of its compilation into host machine code, and its parallelisation based on multi-threading. The language design of SaC aims at combining high-level, compositional array programming with fully automatic resource management for highly productive code development and maintenance. We outline the compilation process that maps SaC programs to computing machinery. Here, our focus is on optimisation techniques that aim at restructuring entire applications from nested compositions of general fine-grained operations into specialised coarse-grained operations. We present our implicit parallelisation technology for shared memory architectures based on multi-threading and discuss further optimisation opportunities on this level of code generation. Both optimisation and parallelisation rigorously exploit the absence of side-effects and the explicit data flow characteristic of a functional setting.  相似文献   

4.
陈志锋  李清宝  张平  丁文博 《软件学报》2016,27(12):3172-3191
内核恶意软件对操作系统的安全造成了严重威胁.现有的内核恶意软件检测方法主要从代码角度出发,无法检测代码复用、代码混淆攻击,且少量检测数据篡改攻击的方法因不变量特征有限导致检测能力受限.针对这些问题,提出了一种基于数据特征的内核恶意软件检测方法,通过分析内核运行过程中内核数据对象的访问过程,构建了内核数据对象访问模型;然后,基于该模型讨论了构建数据特征的过程,采用动态监控和静态分析相结合的方法识别内核数据对象,利用EPT监控内存访问操作构建数据特征;最后讨论了基于数据特征的内核恶意软件检测算法.在此基础上,实现了内核恶意软件检测原型系统MDS-DCB,并通过实验评测MDS-DCB的有效性和性能.实验结果表明:MDS-DCB能够有效检测内核恶意软件,且性能开销在可接受的范围内.  相似文献   

5.
Uwe Kastens  William Waite 《Software》2017,47(11):1597-1631
Classical strategies for matching identifier uses with declarations cannot handle the complexities of modern languages: arbitrarily qualified superclass names, cyclic dependence among lookup operations, and contextual access constraints. We have developed a language‐independent algorithm and supporting data structure that overcome these problems. A well‐defined interface allows introduction of arbitrary code to enforce language‐specific constraints within the basic lookup operations. This paper explains the limitations of the classical strategies, presents the concepts on which our approach is based, and showcases an implementation based on attribute grammars. We explore the major issues through a series of examples and show how one can deal with those issues in a general framework. Many of the issues are specific to a particular language, and in those cases, we explain the solutions that our general interface supports. Although attribute grammars simplify the task of incorporating the model into a compiler, the model itself is completely independent of attribute grammars. We validated our model by using an implementation to process programs in several representative languages. In particular, we mechanically compared the results produced by that implementation with those produced by the Java SE 8 compiler on complete Java programs that are in general use. Performance data obtained during this processing show that our implementation is efficient. Copyright © 2017 John Wiley & Sons, Ltd.  相似文献   

6.
Attacks on smart cards can only be based on a black box approach where the code of cryptographic primitives and operating system are not accessible. To perform hardware or software attacks, a white box approach providing access to the binary code is more efficient. In this paper, we propose a methodology to discover the romized code whose access is protected by the virtual machine. It uses a hooked code in an indirection table. We gained access to the real processor, thus allowing us to run a shell code written in 8051 assembly language. As a result, this code has been able to dump completely the ROM of a Java Card operating system. One of the issues is the possibility to reverse the cryptographic algorithm and all the embedded countermeasures. Finally, our attack is evaluated on different cards from distinct manufacturers.  相似文献   

7.
Memory accesses introduce big‐time overhead and power consumption because of the performance gap between processors and main memory. This paper describes and evaluates a technique, loop scheduling with memory access reduction (LSMAR), that replaces hidden redundant load operations with register operations in loop kernels and performs partial scheduling for newly generated register operations subject to register constraints. By exploiting data dependence of memory access operations, the LSMAR technique can effectively reduce the number of memory accesses of loop kernels, thereby improving timing performance. The technique has been implemented into the Trimaran compiler and evaluated using a set of benchmarks from DSPstone and MiBench on the cycle‐accurate simulator of the Trimaran infrastructure. The experimental results show that when the LSMAR technique is applied, the number of memory accesses can be reduced by 18.47% on average over the benchmarks when it is not applied. The measurements also indicate that the optimizations only lead to an average 1.41% increase in code size. With such small code size expansion, the technique is more suitable for embedded systems compared with prior work.Copyright © 2013 John Wiley & Sons, Ltd.  相似文献   

8.
利用扩展内存实现实时数据采集   总被引:1,自引:1,他引:0  
本文讨论了PC系列机的扩展存储器管理方法。基于扩展内存管理程序EMM386,EXE,编制了使用扩展内存的通用汇编语言函数集,并提供了与TurboC语言的接口。并以某飞行模拟转台控制系统中的实时数据采集为例,给出了使用扩展内存时的混合语言编程方法。  相似文献   

9.
S. Saxena  J. A. Field 《Software》1985,15(3):277-303
This paper discusses a method for developing efficient and portable software for 8-bit microprocessors used in real-time applications. The technique used is to design an ‘intermediate level language’ (ILL) which defines low-level primitives to support the real-time application programming and the constructs of high level languages. Thus, the high level language (HLL) program goes through two stages of translation; first to the ILL code and then to the machine code of a microprocessor. The ILL instruction set developed bridges the gap between high level languages and the poor instruction set of microprocessors. This allows the development of optimized and portable code for the microprocessors. The ILL operations, data types, data organization, control structures, synchronization, communication and multi-tasking facilities are described. The effectiveness of this technique is shown by comparing the code generated by the ILL approach with the code available for a sample real-time application written directly in assembly level language.  相似文献   

10.
内存数据库是支持高性能信息处理的一种方法,对于内存数据库而言,数据库的存储结构与存取方法是关键。本文给出了一种内存数据库组织与存取的改进图论方法,并对该方法给几个基本的查询处理操作所能带来的优势进行了探讨,最后从存储空间和操作执行时间两方面进行了性能分析,结果显示改进的图论方法能提供很好的性能。  相似文献   

11.
基于GCC的VLIW编译系统研究   总被引:1,自引:1,他引:0  
VLIW机器在单个机器周期中同时发射并执行多个的并行操作,从而获得较高的指令级并行度,这些操作之间的依赖分析和调度工作则被完全交给相应的编译器执行,因此VLIW的并行性能能否充分发挥取决于VLIW体系结构相关编译器的质量。GNU开发的GCC是被最广泛使用的编译系统之一,它具有多语言、多平台支持的能力和开放的结构,能够运用各种成熟的常规编译优化技术生成高效的代码。文章分析了VLIW及GCC的结构特点,提出了一种基于GCC的VLIW编译系统设计方案,利用GCC进行RTL中间代码一级的体系结构无关优化和少量体系结构相关优化,在汇编代码一级针对VLIW结构进行体系结构相关的优化,从而充分利用GCC的成熟编译技术快速开发高效的VLIW多语言编译系统。  相似文献   

12.
用C语言编写DSP软件时,优化设计尤为重要。近年来提出了多种针对DSP代码生成阶段的偏移分配优化算法,这些算法通过调整局部变量在存储器中的布局来提高变量地址的计算效率。该文提出一种将微粒群算法与遗传算法相结合的算法(类PSO算法),对变量访问序列中各变量地址的分配进行优化,使计算地址所需的代码数量最小,从而减少程序的运行时间,提高DSP的工作效率。  相似文献   

13.
Asynchronous method calls have been proposed to better integrate object orientation with distribution. In the language, asynchronous method calls are combined with so-called processor release points in order to allow concurrent objects to adapt local scheduling to network delays in a very flexible way. However, asynchronous method calls complicate the type analysis by decoupling input and output information for method calls, which can be tracked by a type and effect system. Interestingly, backwards type analysis simplifies the effect system considerably and allows analysis in a single pass. This paper presents a kernel language with asynchronous method calls and processor release points, a novel mechanism for local memory deallocation related to asynchronous method calls, an operational semantics in rewriting logic for the language, and a type and effect system for backwards analysis. Source code is translated into runtime code as an effect of the type analysis, automatically inserting inferred type information in method invocations and operations for local memory deallocation in the process. We establish a subject reduction property, showing in particular that method lookup errors do not occur at runtime and that the inserted deallocation operations are safe.  相似文献   

14.
通用的高级程序设计语言的编译器,比如C的编译器,不会为VLIW处理器的特殊功能部件自动生成代码。通常通过汇编语言来使用这些特殊功能部件,但是这个方案有着它的不足。笔者提出了一种新的方法来解决这些问题。定义了一种可视化并行建模语言VRTL-P,使用它来描述不同操作间逻辑上的可并行性。笔者还实现了一个VRTL-P的在线分析器,它可以根据VLIW处理器的具体实现来判断一组操作是否可以拼装到一条VLIW的指令中。还进一步研究了从VRTL-P生成目标代码和仿真执行VRTL-P的方法。通过使用这些技术,可以为VLIW处理器的特殊功能部件生成高质量的代码,并且可以提高软件的生产率。  相似文献   

15.
申威众核片上多级存储层次是缓解众核“访存墙”的重要结构.完全由软件管理的SPM结构和片上RMA通信机制给应用性能提升带来很多机会,但也给应用程序开发优化与移植提出了很大挑战.为充分挖掘片上存储层次特点提升应用程序性能,同时减轻用户编程优化负担,本文提出了一种多级存储层次访存与通信融合的编译优化方法.该方法首先设计了融合编译指示,将程序高层信息传递给编译器.其次构建了编译优化收益模型并设计了启发式循环优化方案迭代求解框架,并由编译器完成循环优化方案的求解和优化代码的变换.通过编译生成的DMA和RMA批量数据传输操作,将较低存储层次空间中高访问延迟的核心数据批量缓冲进低访问延迟的更高存储层次空间中.在三个典型测试用例上进行了优化实验测试与分析,结果表明本文所提出的优化在性能上与手工优化相当,较未优化版程序性能有显著提升.  相似文献   

16.
分析了Linux 内核模块特点,针对内核模块中二进制指令执行时带来的访存错误,设计了一种针对内核模块的静态检测方法。通过模拟内核模块中指令的执行,并比较访存指令请求与相关内存区域信息,静态检测方法目标是找出代码对内存的非法访问,并对可疑的访存行为发出警告。针对 ARM 处理器平台,给出了静态检测方法的具体实现,并对内核模块中的访存错误就行了检测验证。实验表明,静态检测方法能够有效找出包括地址越界访问、读未初始化内存、访问已释放内存等访存错误,本文的静态检测方法达到了预期的检测效果。  相似文献   

17.
Many simulations in the physical sciences are expressed in terms of rectilinear arrays of variables. It is attractive to develop such simulations for use in 1‐, 2‐, 3‐ or arbitrary physical dimensions and also in a manner that supports exploitation of data‐parallelism on fast modern processing devices. We report on data layouts and transformation algorithms that support both conventional and data‐parallel memory layouts. We present our implementations expressed in both conventional serial C code as well as in NVIDIA's Compute Unified Device Architecture concurrent programming language for use on general purpose graphical processing units. We discuss: general memory layouts; specific optimizations possible for dimensions that are powers‐of‐two and common transformations, such as inverting, shifting and crinkling. We present performance data for some illustrative scientific applications of these layouts and transforms using several current GPU devices and discuss the code and speed scalability of this approach. Copyright © 2010 John Wiley & Sons, Ltd.  相似文献   

18.
The iteration space of a loop nest is the set of all loop iterations bounded by the loop limits. Tiling the iteration space can effectively exploit the available parallelism, which is essential to multiprocessor compiling and pipelined architecture design. Another improvement brought by tiling is the better data locality that can dramatically reduce memory access and, consequently, the relevant memory access energy consumptions. However, previous studies on tiling were based on the data dependence, thus arrays without dependencies such as input arrays (data streams) were not considered. In this paper, we extend the tiling exploration to also accommodate those dependence-free arrays, and propose a stream-conscious tiling scheme for off-chip memory access optimization. We show that input arrays are as important, if not more, as the arrays with data dependencies when the focus is on memory access optimization instead of parallelism extraction. Our approach is verified on TI’s low power C55X DSP with popular multimedia applications, exhibiting off-chip memory access reduction by 67% on average over the traditional iteration space tiling.  相似文献   

19.
An instruction set is given for an abstract machine which uses a pushdown stack as its principal memory. The proposed instructions serve the similar purposes of (1) defining the dynamic semantics of programming languages by describing the operations of programs on the abstract machine and (2) describing an intermediate language to be used in compiling programming languages into machine language. It is shown how the intermediate language can be used in the translation of the programming languages ADA, FORTRAN and PASCAL into IBM 360 assembly language and advantages over other intermediate languages such as three-address code and P-code.  相似文献   

20.
We present compiler analyses and optimizations for explicitly parallel programs that communicate through a shared address space. Any type of code motion on explicitly parallel programs requires a new kind of analysis to ensure that operations reordered on one processor cannot be observed by another. The analysis, calledcycle detection, is based on work by Shasha and Snir and checks for cycles among interfering accesses. We improve the accuracy of their analysis by using additional information fromsynchronization analysis, which handles post–wait synchronization, barriers, and locks. We also make the analysis efficient by exploiting the common code image property of SPMD programs. We make no assumptions on the use of synchronization constructs: our transformations preserve program meaning even in the presence of race conditions, user-defined spin locks, or other synchronization mechanisms built from shared memory. However, programs that use linguistic synchronization constructs rather than their user-defined shared memory counterparts will benefit from more accurate analysis and therefore better optimization. We demonstrate the use of this analysis for communication optimizations on distributed memory machines by automatically transforming programs written in a conventional shared memory style into a Split-C program, which has primitives for nonblocking memory operations and one-way communication. The optimizations includemessage pipelining, to allow multiple outstanding remote memory operations, conversion of two-way to one-way communication, and elimination of communication through data reuse. The performance improvements are as high as 20–35% for programs running on a CM-5 multiprocessor using the Split-C language as a global address layer. Even larger benefits can be expected on machines with higher communication latency relative to processor speed.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号