首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
In modern processors, instructions to perform operations are often produced before it becomes known that this is required. Such an expedient, which is called speculative execution, helps to reveal parallelism at the instruction level. In the EPIC architectures, the speculative execution is completely controlled by the compiler, which makes it possible to avoid using complex hardware mechanisms for supporting speculative instruction production. Moreover, the idea of the speculative execution can be used by the compiler in machine-independent optimizations. The paper describes a scheme of construction of the speculative optimization that is based on the selection of properties of the control flow and data flow that are important from the optimization standpoint and on the estimation of the probabilities of their fulfillment. The probabilities found are used for searching and constructing advantageous speculative and bookkeeping transformations. For optimizations that include only speculative movements of instructions upwards along the control flow graph, on the basis of the suggested scheme, a method has been developed that includes algorithms for finding probabilities of data and control dependences, for estimating benefit of speculative movements, and for constructing a recovery code. On the basis of this method, an algorithm for the speculative scheduling of instructions for the Intel Itanium architecture has been developed and implemented. Specific features of its implementation and experimental results are described.  相似文献   

2.
This paper describes a new method for code space optimization for interpreted languages called LZW‐CC . The method is based on a well‐known and widely used compression algorithm, LZW , which has been adapted to compress executable program code represented as bytecode. Frequently occurring sequences of bytecode instructions are replaced by shorter encodings for newly generated bytecode instructions. The interpreter for the compressed code is modified to recognize and execute those new instructions. When applied to systems where a copy of the interpreter is supplied with each user program, space is saved not only by compressing the program code but also by automatically removing the unused implementation code from the interpreter. The method's implementation within two compiler systems for the programming languages Haskell and Java is described and implementation issues of interest are presented, notably the recalculations of target jumps and the automated tailoring of the interpreter to program code. Applying LZW‐CC to nhc98 Haskell results in bytecode size reduction by up to 15.23% and executable size reduction by up to 11.9%. Java bytecode is reduced by up to 52%. The impact of compression on execution speed is also discussed; the typical speed penalty for Java programs is between 1.8 and 6.6%, while most compressed Haskell executables run faster than the original. Copyright © 2008 John Wiley & Sons, Ltd.  相似文献   

3.
In a recent study, we discovered that many single load/store operations in embedded applications can be parallelized and thus encoded simultaneously in a single‐instruction multiple‐data instruction, called the multiple load/store (MLS) instruction. In this work, we investigate the problem of utilizing MLS instructions to produce optimized machine code, and propose an effective approach to the problem. Specifically, we formalize the MLS problem, that is, the problem of maximizing the use of MLS instructions with an unlimited register file size. Based on this analysis, we show that we can solve the problem efficiently by translating it into a variant of the problem finding a maximum weighted path cover in a dynamic weighted graph. To handle a more realistic case of the finite size of the register file, our solution is then extended to take into account the constraints of register sequencing in MLS instructions and the limited register resource available in the target processor. We demonstrate the effectiveness of our approach experimentally by using a set of benchmark programs. In summary, our approach can reduce the number of loads/stores by 13.3% on average, compared with the code generated from existing compilers. The total code size reduction is 3.6%. This code size reduction comes at almost no cost because the overall increase in compilation time as a result of our technique remains quite minimal. Copyright © 2007 John Wiley & Sons, Ltd.  相似文献   

4.
In this work, we introduce multi‐column graph convolutional networks (MGCNs), a deep generative model for 3D mesh surfaces that effectively learns a non‐linear facial representation. We perform spectral decomposition of meshes and apply convolutions directly in the frequency domain. Our network architecture involves multiple columns of graph convolutional networks (GCNs), namely large GCN (L‐GCN), medium GCN (M‐GCN) and small GCN (S‐GCN), with different filter sizes to extract features at different scales. L‐GCN is more useful to extract large‐scale features, whereas S‐GCN is effective for extracting subtle and fine‐grained features, and M‐GCN captures information in between. Therefore, to obtain a high‐quality representation, we propose a selective fusion method that adaptively integrates these three kinds of information. Spatially non‐local relationships are also exploited through a self‐attention mechanism to further improve the representation ability in the latent vector space. Through extensive experiments, we demonstrate the superiority of our end‐to‐end framework in improving the accuracy of 3D face reconstruction. Moreover, with the help of variational inference, our model has excellent generating ability.  相似文献   

5.
Many embedded Java platforms execute two types of Java classes: those installed statically on the client device and those downloaded dynamically from service providers at run time. For achieving higher performance, the static Java classes can be compiled into machine code by ahead‐of‐time compiler (AOTC) in the server, and the translated machine code can be installed on the client device. Unfortunately, AOTC cannot be applicable to the dynamically downloaded classes. This paper proposes client‐AOTC (c‐AOTC), which performs AOTC on the client device using the just‐in‐time compiler (JITC) module installed on the device, obviating the JITC overhead and complementing the server‐AOTC. The machine code of a method translated by JITC is cached on a persistent memory of the device, and when the method is invoked again in a later run of the program, the machine code is loaded and executed directly without any translation overhead. A major issue in c‐AOTC is relocation because some of the address constants embedded in the cached machine code are not correct when the machine code is loaded and used in a different run; those addresses should be corrected before they are used. Constant pool resolution and inlining complicate the relocation problem, and we propose our solutions. The persistent memory overhead for saving the relocation information is also an issue, and we propose a technique to encode the relocation information and compress the machine code efficiently. We developed a c‐AOTC on Sun's CDC VM reference implementation, and our evaluation results indicate that c‐AOTC can improve the performance significantly, as much as an average of 12% for EEMBC and 4% for SpecJVM98, with a persistent memory overhead of 1% on average. Copyright © 2008 John Wiley & Sons, Ltd.  相似文献   

6.
7.
针对代码与模型之间的不一致性问题,提出了一种基于UML模型和Java代码之间的一致性检测方法.首先,对UML类图和时序图进行形式化描述,并提出时序调用图(SD-CG)这一概念,在此基础上完成类的关联关系到关联属性的转换以及UML时序图到时序调用图SD-CG的转换;其次,通过方法调用图CG来表达类方法之间的调用关系,从而反映代码动态行为,由此通过对Java源代码的词法分析与语法分析,可获得类的信息及方法调用图CG;然后设计了UML模型与Java源代码间一致性检测算法,包括对类间静态信息以及时序调用图SD-CG与方法调用图CG间的一致性检测;最后,通过开发UML模型与Java源代码一致性检测工具,验证了所提出的方法是可行有效的.  相似文献   

8.
Implementing a concurrent programming language such as Java by means of a translator to an existing language is attractive as it provides portability over all platforms supported by the host language and reduces development time—as many low‐level tasks can be delegated to the host compiler. The C and C++ programming languages are popular choices for many language implementations due to the availability of efficient compilers on a wide range of platforms. For garbage‐collected languages, however, they are not a perfect match as no support is provided for accurately discovering pointers to heap‐allocated data on thread stacks. We evaluate several previously published techniques and propose a new mechanism, lazy pointer stacks, for performing accurate garbage collection in such uncooperative environments. We implemented the new technique in the Ovm Java virtual machine with our own Java‐to‐C/C++ compiler using GCC as a back‐end compiler. Our extensive experimental results confirm that lazy pointer stacks outperform existing approaches: we provide a speedup of 4.5% over Henderson's accurate collector with a 17% increase in code size. Accurate collection is essential in the context of real‐time systems, we thus validate our approach with the implementation of a real‐time concurrent garbage collection algorithm. Copyright © 2009 John Wiley & Sons, Ltd.  相似文献   

9.
Eric Bruneton  Michel Riveill 《Software》2001,31(13):1237-1264
This article presents a middleware platform architecture whose goals, motivated by the needs of a real‐world application, are the following: separation of functional and non‐functional code in applications, composition of non‐functional properties, and modularity and extensibility of the middleware platform itself. This architecture is inspired by the Enterprise Java Beans platform, and uses a new object composition model to separate and compose the non‐functional properties. In order to evaluate this architecture, we have implemented the JavaPod platform which we have used to implement a prototype of the application that motivated our goals. The results of these experiments show that our goals can indeed be achieved with our architecture. Copyright © 2001 John Wiley & Sons, Ltd.  相似文献   

10.
In object programming languages, the Visitor design pattern allows separation of algorithms and data structures. When applying this pattern to tree‐like structures, programmers are always confronted with the difficulty of making their code evolve. One reason is that the code implementing the algorithm is interwound with the code implementing the traversal inside the visitor. When implementing algorithms such as data analyses or transformations, encoding the traversal directly into the algorithm turns out to be cumbersome as this type of algorithm only focuses on a small part of the data‐structure model (e.g., program optimization). Unfortunately, typed programming languages like Java do not offer simple solutions for expressing generic traversals. Rewrite‐based languages like ELAN or Stratego have introduced the notion of strategies to express both generic traversal and rule application control in a declarative way. Starting from this approach, our goal was to make the notion of strategic programming available in a widely used language such as Java and thus to offer generic traversals in typed Java structures. In this paper, we present the strategy language SL that provides programming support for strategies in Java. Copyright © 2012 John Wiley & Sons, Ltd.  相似文献   

11.
Fault attack represents one of the serious threats against Java Card security. It consists of physical perturbation of chip components to introduce faults in the code execution. A fault may be induced using a laser beam to impact opcodes and operands of instructions. This could lead to a mutation of the application code in such a way that it becomes hostile. Any successful attack may reveal a secret information stored in the card or grant an undesired authorisation. We propose a methodology to recognise, during the development step, the sensitive patterns to the fault attack in the Java Card applications. It is based on the concepts from text categorisation and machine learning. In fact, in this method, we represented the patterns using opcodes n-grams as features, and we evaluated different machine learning classifiers. The results show that the classifiers performed poorly when classifying dangerous sensitive patterns, due to the imbalance of our data-set. The number of dangerous sensitive patterns is much lower than the number of not dangerous patterns. We used resampling techniques to balance the class distribution in our data-set. The experimental results indicated that the resampling techniques improved the accuracy of the classifiers. In addition, our proposed method reduces the execution time of sensitive patterns classification in comparison to the SmartCM tool. This tool is used in our study to evaluate the effect of faults on Java Card applications.  相似文献   

12.
Graph-based malware detection using dynamic analysis   总被引:1,自引:0,他引:1  
We introduce a novel malware detection algorithm based on the analysis of graphs constructed from dynamically collected instruction traces of the target executable. These graphs represent Markov chains, where the vertices are the instructions and the transition probabilities are estimated by the data contained in the trace. We use a combination of graph kernels to create a similarity matrix between the instruction trace graphs. The resulting graph kernel measures similarity between graphs on both local and global levels. Finally, the similarity matrix is sent to a support vector machine to perform classification. Our method is particularly appealing because we do not base our classifications on the raw n-gram data, but rather use our data representation to perform classification in graph space. We demonstrate the performance of our algorithm on two classification problems: benign software versus malware, and the Netbull virus with different packers versus other classes of viruses. Our results show a statistically significant improvement over signature-based and other machine learning-based detection methods.  相似文献   

13.
Conventional remote sensing classification algorithms assume that the data in each class can be modelled using a multivariate Gaussian distribution. As this assumption is often not valid in practice, conventional algorithms do not perform well. In this paper, we present an independent component analysis (ICA)‐based approach for unsupervised classification of multi/hyperspectral imagery. ICA used for a mixture model estimates the data density in each class and models class distributions with non‐Gaussian (sub‐ and super‐Gaussian) probability density functions, resulting in the ICA mixture model (ICAMM) algorithm. Independent components and the mixing matrix for each class are found using an extended information‐maximization algorithm, and the class membership probabilities for each pixel are computed. The pixel is allocated to the class having maximum class membership probability to produce a classification. We apply the ICAMM algorithm for unsupervised classification of images obtained from both multispectral and hyperspectral sensors. Four feature extraction techniques are considered as a preprocessing step to reduce the dimensionality of the hyperspectral data. The results demonstrate that the ICAMM algorithm significantly outperforms the conventional K‐means algorithm for land cover classification produced from both multi‐ and hyperspectral remote sensing images.  相似文献   

14.
Java supports the monitor construct for language‐level synchronization in the context of multi‐threading. This paper introduces the lightweight monitor, an efficient user‐level monitor implementation. The lightweight monitor is useful for single‐threaded Java programs as well as for multi‐threaded Java programs with little lock contention. A 32‐bit lock is embedded in each object for efficient lock access while other monitor data structures are managed using a hash table. We highly optimized the lock manipulation code, which is translated and inlined by a just‐in‐time (JIT) compiler. In the most probable cases, only nine SPARC instructions are spent for lock acquisition and five instructions are spent for lock release. Our experimental results indicate that the lightweight monitor is faster than the monitor implementation in the SUN JDK 1.2 RC1 by up to 21 times in the absence of lock contention and by up to seven times in the presence of lock contention. Copyright © 2004 John Wiley & Sons, Ltd.  相似文献   

15.
16.
In here we consider a technique to automatically extract three-argument instructions from sequential arithmetic code. The instructions include: multiply and add, three argument additions and three argument multiplications (MUL3). The proposed solution combines a height reduction technique that generates three-argument instructions and a VLIW scheduling that can benefit from these instructions. The proposed height reduction technique is based on a known theoretical algorithm (MRK) that in some cases can evaluate an algebraic circuit faster than its depth. We modified the MRK algorithm to generate less instructions and emit VLIW instructions. The modified MRK algorithm was implemented in the LLVM compiler and the potential usefulness was measured. Our results show that for arithmetic benchmarks the proposed technique can improve the VLIW scheduling while emitting three-argument instructions. The contribution of this work includes: the modified MRK algorithm as a new technique for height reduction optimizations and the study of the potential usefulness of three-argument instructions. Though our results are for a non existing hardware they show the usefulness of adding such instructions to VLIW CPUs. Note that a previous research showed that MUL3 can be executed as fast as MUL2.  相似文献   

17.
This work presents a static method implemented in a compiler for extracting high instruction level parallelism for the 32-bit QueueCore, a queue computation-based processor. The instructions of a queue processor implicitly read and write their operands, making instructions short and the programs free of false dependencies. This characteristic allows the exploitation of maximum parallelism and improves code density. Compiling for the QueueCore requires a new approach since the concept of registers disappears. We propose a new efficient code generation algorithm for the QueueCore. For a set of numerical benchmark programs, our compiler extracts more parallelism than the optimizing compiler for an RISC machine by a factor of 1.38. Through the use of QueueCore’s reduced instruction set, we are able to generate 20% and 26% denser code than two embedded RISC processors.  相似文献   

18.
肖成龙  林军  王珊珊  王宁 《计算机应用》2018,38(7):2024-2031
针对在高层次综合(HLS)过程中性能提升、功耗降低困难等问题,提出了一种面向高层次综合的自定义指令自动识别方法。在高层次综合过程之前实现对自定义指令的枚举和选择,从而为高层次综合提供通用的自定义指令识别方法。首先,将高层次源代码转换为控制数据流图(CDFG),实现了对源代码的预处理;其次,基于控制数据流图内的数据流图(DFG),采用子图枚举算法以自底而上的方式枚举出所有连通凸子图,有效提高了用户可灵活修改约束条件的能力;然后,分别从面积、性能和代码量三个角度考虑,利用子图选择算法选择部分最佳子图作为最终的自定义指令;最后,用所选的自定义指令重新生成新代码作为高层次综合工具的输入。与传统高层次综合相比,采用基于出现频率的模式选择可平均减少19.1%的面积,采用基于关键路径的子图选择可平均减少22.3%的时延。此外,与TD算法相比,所提算法的枚举效率平均提升70.8%。实验结果表明,自定义指令自动识别方法使高层次综合在电路设计中能够显著地提升性能,减少面积和代码量。  相似文献   

19.
Communication networks form the backbone of our society. Topology control algorithms optimize the topology of such communication networks. Due to the importance of communication networks, a topology control algorithm should guarantee certain required consistency properties (e.g., connectivity of the topology), while achieving desired optimization properties (e.g., a bounded number of neighbors). Real-world topologies are dynamic (e.g., because nodes join, leave, or move within the network), which requires topology control algorithms to operate in an incremental way, i.e., based on the recently introduced modifications of a topology. Visual programming and specification languages are a proven means for specifying the structure as well as consistency and optimization properties of topologies. In this paper, we present a novel methodology, based on a visual graph transformation and graph constraint language, for developing incremental topology control algorithms that are guaranteed to fulfill a set of specified consistency and optimization constraints. More specifically, we model the possible modifications of a topology control algorithm and the environment using graph transformation rules, and we describe consistency and optimization properties using graph constraints. On this basis, we apply and extend a well-known constructive approach to derive refined graph transformation rules that preserve these graph constraints. We apply our methodology to re-engineer an established topology control algorithm, kTC, and evaluate it in a network simulation study to show the practical applicability of our approach.  相似文献   

20.
Interceptors are an emerging middleware technology enabling the addition of specific network‐oriented capabilities to distributed applications. By exploiting interceptors, developers can register code within interception points, extending the basic middleware mechanisms with specific functionality, e.g. authentication, flow control, caching, etc. Notably, these extensions can be achieved without modifying either the application or the middleware code. In this paper we report the results of our experiences with CORBA request portable interceptors. In particular, we point out (i) the basic mechanisms implementable by these interceptors, i.e. request redirection and piggybacking and (ii) we analyze their limitations. We then propose a proxy‐based technique to overcome the interceptors' limitations. Successively, we present a performance analysis carried out on three Java‐CORBA platforms currently implementing the portable interceptors specification. Finally, we conclude our work with a case study in which portable interceptors are used to implement the fault‐tolerant CORBA client invocation semantic without impacting on the client application code and on the CORBA ORB. We also release fragments of Java code for implementing the described techniques. Copyright © 2003 John Wiley & Sons, Ltd.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号