Similar Articles
20 similar articles found.
1.
S. Baskiyar, Computing, 2002, 69(4): 273-289
Dynamic type checking and method binding slow pure object-oriented programs such as those written in Smalltalk. Dynamic method lookup is needed to support method overriding and type-method conformance. We have developed a platform-independent technique that implements method overriding efficiently for pure object-oriented languages. The technique binds non-overridden method calls at compile time. Overridden method calls are also resolved (and bound) statically using simple inferences, and their bindings are corrected dynamically using an Overridden Method Dictionary. Using simulations, we demonstrate that this technique reduces the average method lookup delay by 35%-70% relative to the Berkeley Smalltalk implementation. The technique has very low compile-time and memory overhead, needs no specialized hardware, and allows shared code pages in concurrent programs. For programs that will be reused, a follow-up compilation can generate efficient code once development is complete. Received November 7, 2001; revised March 27, 2002; published online October 24, 2002.
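The abstract does not give the dictionary's interface, so the following Python sketch is our own illustration of the idea, with all names invented for the example: calls are bound to a statically inferred target, and a small per-class table corrects the binding only when the receiver's class actually overrides the method.

```python
class OverriddenMethodDictionary:
    """Maps (class name, method name) to an overriding implementation."""
    def __init__(self):
        self._table = {}

    def register(self, cls_name, method_name, impl):
        self._table[(cls_name, method_name)] = impl

    def lookup(self, cls_name, method_name):
        return self._table.get((cls_name, method_name))

def call(omd, receiver, method_name, static_target):
    # Use the compile-time binding unless the dictionary records an override.
    impl = omd.lookup(type(receiver).__name__, method_name) or static_target
    return impl(receiver)

class Shape: pass
class Circle(Shape): pass

def shape_area(self): return 0
def circle_area(self): return 3.14159

omd = OverriddenMethodDictionary()
omd.register("Circle", "area", circle_area)     # Circle overrides area
print(call(omd, Shape(), "area", shape_area))   # 0: static binding kept
print(call(omd, Circle(), "area", shape_area))  # 3.14159: binding corrected
```

Non-overridden calls never touch the table, which is how the scheme avoids the full dynamic lookup on the common path.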

2.
This paper presents Safe Self-Scheduling (SSS), a new scheme for scheduling parallel loops whose iteration execution times vary and are not known at compile time. The scheme assumes a shared memory space. SSS combines static scheduling with dynamic scheduling and draws favorable advantages from each. First, it reduces dynamic scheduling overhead by statically scheduling a major portion of the loop iterations. Second, it balances the workload with a simple and efficient self-scheduling scheme based on a new measure, the smallest critical chore size. Experimental results indicate that SSS outperforms other scheduling schemes. In the experiment on Gauss-Jordan, an application well suited to static scheduling, SSS is the only self-scheduling scheme that outperforms the static scheduling scheme, indicating that SSS achieves a balanced workload with a very small amount of overhead. This research has been supported in part by the National Science Foundation under Contract No. CCR-9210568.
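As a concrete reading of the scheme, here is a minimal sketch, assuming a 50% statically scheduled portion and a fixed smallest critical chore size; both numbers are illustrative stand-ins for values the paper derives from the workload.

```python
import threading

def safe_self_schedule(n_iters, n_procs, body,
                       static_fraction=0.5, min_chore=4):
    boundary = int(n_iters * static_fraction)  # iterations scheduled statically
    next_iter = boundary                       # next dynamically scheduled iteration
    lock = threading.Lock()

    def worker(pid):
        nonlocal next_iter
        # Static part: an even share of the pre-assigned block of iterations.
        for i in range(pid * boundary // n_procs,
                       (pid + 1) * boundary // n_procs):
            body(i)
        # Dynamic part: self-schedule the rest in shrinking chunks that
        # never drop below the smallest critical chore size.
        while True:
            with lock:
                if next_iter >= n_iters:
                    return
                start = next_iter
                chunk = max((n_iters - start) // (2 * n_procs), min_chore)
                next_iter = min(start + chunk, n_iters)
                end = next_iter
            for i in range(start, end):
                body(i)

    threads = [threading.Thread(target=worker, args=(p,)) for p in range(n_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

safe_self_schedule(1000, 4, body=lambda i: None)  # trivial loop body
```

The static half costs no synchronization at all; only the tail of the loop pays the locked fetch, which is the overhead reduction the abstract describes.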

3.
APL in its present interpretive implementations provides an ideal tool for program development. The interpretive overhead, however, often prohibits running large production programs. Our results indicate that the overhead of checking type and size compatibility of the arguments of APL primitive functions during execution can be substantially avoided without loss of error detection. Using a large sample of syntactically correct APL programs, we performed a statistical analysis of the conformability checks required for each function, and compiled, for each type of conformability check in each program, statistics on the percentage of checks that could be eliminated by static analysis and verification. The programs analysed contained 1,454 primitive functions. Of the 2,412 rank, length, domain and value checks required by a 'naïve' interpreter, 1,966 (80 per cent) can be validated statically. Index checks were required only thirty-one times, but essentially all must be evaluated at run time. This suggests that compiling APL programs to conventional target code may be feasible as a high-performance implementation whenever the cost of such translation does not dominate.
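The payoff of such static validation can be pictured with a toy example (ours, not the paper's): once operand shapes are known statically, the rank and length checks for an element-wise primitive are discharged once, and the generated code runs with no per-call checks.

```python
def check_conformable(shape_a, shape_b):
    """Rank and length check for an element-wise dyadic primitive."""
    if len(shape_a) != len(shape_b):
        raise TypeError("rank error")
    if shape_a != shape_b:
        raise TypeError("length error")

def compile_elementwise_add(shape_a, shape_b):
    check_conformable(shape_a, shape_b)   # validated statically, once
    def run(a, b):                        # no per-call checks remain
        return [x + y for x, y in zip(a, b)]
    return run

add_3 = compile_elementwise_add((3,), (3,))
print(add_3([1, 2, 3], [4, 5, 6]))        # [5, 7, 9]
```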

4.
Many embedded Java platforms execute two types of Java classes: those installed statically on the client device and those downloaded dynamically from service providers at run time. For higher performance, the static Java classes can be compiled into machine code by an ahead-of-time compiler (AOTC) on the server, and the translated machine code installed on the client device. Unfortunately, AOTC is not applicable to dynamically downloaded classes. This paper proposes client-AOTC (c-AOTC), which performs AOTC on the client device using the just-in-time compiler (JITC) module installed on the device, obviating the JITC overhead and complementing server-AOTC. The machine code of a method translated by the JITC is cached in persistent memory on the device; when the method is invoked again in a later run of the program, the machine code is loaded and executed directly, without any translation overhead. A major issue in c-AOTC is relocation: some of the address constants embedded in the cached machine code are not correct when the code is loaded and used in a different run, and those addresses must be corrected before they are used. Constant pool resolution and inlining complicate the relocation problem, and we propose solutions for both. The persistent memory overhead of saving the relocation information is also an issue, so we propose a technique to encode the relocation information and compress the machine code efficiently. We developed c-AOTC on Sun's CDC VM reference implementation, and our evaluation indicates that c-AOTC improves performance significantly, by an average of 12% for EEMBC and 4% for SpecJVM98, with an average persistent memory overhead of 1%. Copyright © 2008 John Wiley & Sons, Ltd.
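To make the relocation idea concrete, here is a schematic Python sketch; the data layout and helper names are our assumptions, not the CDC VM's. The cache stores the code together with the offsets of embedded address constants, and each slot is patched when the code is loaded into a new run.

```python
import struct

def save_compiled_method(code_bytes, reloc_offsets):
    # Persist the machine code plus the offsets of embedded address constants.
    return {"code": bytes(code_bytes), "relocs": list(reloc_offsets)}

def load_compiled_method(cached, resolve_address):
    # Patch every recorded 32-bit address slot for the current run.
    code = bytearray(cached["code"])
    for off in cached["relocs"]:
        old = struct.unpack_from("<I", code, off)[0]
        struct.pack_into("<I", code, off, resolve_address(old))
    return bytes(code)

cached = save_compiled_method(b"\x90\x10\x00\x00\x00\x90", reloc_offsets=[1])
patched = load_compiled_method(cached, lambda old: old + 0x1000)
print(patched.hex())  # address constant at offset 1 rebased by 0x1000
```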

5.
6.
Most current compiler projects for distributed-memory architectures leave the critical and time-consuming problem of finding performance-efficient data distributions and profitable program transformations for a given parallel program almost entirely to the programmer. Performance estimators provide critical performance information to both programmers and parallelizing compilers; the most crucial part of this involves determining the communication overhead induced by a program. In this paper, we present a practical approach to compile-time estimation of communication costs for regular codes, including analytical methods to model the number of messages exchanged, the data volume transferred, the transfer time, and network contention. To achieve high estimation accuracy, our estimator aggressively exploits compiler analysis and optimization information. Machine parameters and problem size are assumed to be known at compile time. We conducted a variety of experiments to validate the estimation accuracy and the ability to support both the programmer and the compiler in performance tuning of parallel programs. We believe that our approach can be applied automatically to a large class of regular codes.
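A first-order version of such an analytical model might look as follows; the formula (messages × latency + volume / bandwidth, scaled by a contention factor) is a common textbook approximation, not the paper's exact equations.

```python
def comm_cost(n_messages, bytes_total, latency_s, bandwidth_bps, contention=1.0):
    # messages * latency captures per-message startup cost;
    # volume / bandwidth captures transfer time; contention >= 1 scales both.
    return (n_messages * latency_s + bytes_total / bandwidth_bps) * contention

# E.g. 1,000 messages totalling 8 MB over a 100 MB/s link with 5 us startup:
print(comm_cost(1000, 8 * 2**20, 5e-6, 100e6, contention=1.2))
```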

7.
Computer Networks, 2007, 51(2): 480-495
One of the most difficult tasks in software development is that implementing a feature forces the programmer through a laborious and error-prone process of modifying the programs of other features. The programs of different features become entangled in the same reusable program units of the programming language, making them difficult to verify, maintain and reuse. We show that if (C1) the features interact, (C2) they are executed by the same process, and (C3) they are implemented in a programming language that requires the programmer to specify execution flows, program entanglement is inevitable and the problem cannot be solved by software design alone. Applications with interacting features are common, including those that require exception handling. The feature language extensions (FLX) are a set of programming language constructs designed to let the programmer develop interacting features as separate and reusable program modules even though the features interact. The programmer uses FLX to specify non-procedural program units, organize the program units into reusable features and integrate features into executable feature packages, developing each feature against a model instead of the code of other features. FLX supports an automatic procedure for detecting the interaction condition among features; the programmer then resolves the interaction in a feature package without changing feature code. FLX features and feature packages are reusable: the programmer may package different combinations of them and resolve their interactions differently to meet different user needs. An FLX-to-Java compiler has been implemented; our experience of using it has been very positive.

8.
In this paper we present a novel approach, based on integrating static program analysis and simulation techniques, to performance prediction of message-passing programs. PS, a simulator of PVM applications developed in recent years by our research group, is fed with traces collected by executing the parallel program to be analyzed in quasi-concurrent mode on a single workstation. Since this process is typically a non-negligible part of the simulation complexity, we have devised a technique based on static analysis and code restructuring that significantly speeds up trace generation. We show how, by statically analyzing and restructuring the program, it is possible to obtain a simplified version of the code (the "shrinked code") that can be run to collect a reduced version of the traces.

9.
Java just-in-time compilers often compile only hot methods, because compilation overhead is part of the running time. This requires precise and efficient hot spot detection: distinguishing hot methods from cold ones, detecting them as early as possible, and paying only a small detection overhead. Hot spot detection is especially important in embedded applications because they behave more like the start-up phase of a regular application, where methods are not executed heavily and the hot methods are therefore not definite. Because a long-running method is likely to be a hot method, we can detect a hot method by measuring its running time during interpretation. However, precise measurement of running time during execution is too expensive, especially in embedded systems, so many counter-based heuristics have been proposed to estimate it, such as Oracle's HotSpot heuristic. Although the overhead of these heuristics is low, they do not estimate running time precisely, which may lead to imprecise hot spot detection. This paper proposes a new hot spot detection heuristic called flow-sensitive runtime estimation, which estimates running time more precisely than others with relatively low overhead. It dynamically counts only important bytecode instructions, yet obtains a precise count of all interpreted bytecode instructions with a simple arithmetic calculation. We also propose a static analysis technique to predict those hot methods that consume a large amount of execution time once invoked, so that they can be compiled at their first invocation. Our experimental results show that these techniques improve performance by an average of 7.4% over the HotSpot heuristic for the benchmarks when they run once, which is often regarded as start-up phase behavior. Even for real embedded Java applications, such as digital TV Java Xlet applications, our techniques improve user response time by an average of 7.1%. Copyright © 2015 John Wiley & Sons, Ltd.
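The counting trick can be illustrated with a toy sketch (block structure and sizes invented for the example): the interpreter increments one counter per basic block, and the exact number of interpreted bytecodes falls out of a weighted sum over statically known block sizes.

```python
# Static information: bytecodes per basic block, known at method load time.
static_block_sizes = {"entry": 12, "loop_body": 30, "exit": 5}
block_counts = {b: 0 for b in static_block_sizes}   # updated dynamically

def on_block_entry(block):
    # The only dynamic instrumentation: one increment per block entry.
    block_counts[block] += 1

def estimated_bytecodes():
    # Simple arithmetic recovers the total instruction count exactly.
    return sum(block_counts[b] * static_block_sizes[b]
               for b in static_block_sizes)

on_block_entry("entry")
for _ in range(100):
    on_block_entry("loop_body")
on_block_entry("exit")
print(estimated_bytecodes())   # 12 + 100*30 + 5 = 3017
```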

10.
Software product lines (SPLs) are diverse systems developed using a dual engineering process: (a) family engineering defines the commonality and variability among all members of the SPL, and (b) application engineering derives specific products from the common foundation combined with a variable selection of features. The number of derivable products in an SPL can thus be exponential in the number of features. This inherent complexity poses two main challenges for modelling: firstly, the formalism used for modelling SPLs needs to be modular and scalable; secondly, it should ensure that all products behave correctly by supporting efficient analysis and verification of complex models. In this paper, we propose to integrate an established modelling formalism (Petri nets) with software product line engineering. To this end, we extend Petri nets to Feature Nets. While Petri nets provide a framework for formally modelling and verifying single software systems, Feature Nets offer the same benefits for software product lines. We show how SPLs can be modelled in an incremental, modular fashion using Feature Nets, provide a Feature Nets variant that supports modelling dynamic SPLs, and propose an analysis method for SPLs modelled as Feature Nets. By facilitating the construction of a single model that captures the various behaviours exhibited by the products in an SPL, we take a significant step towards efficient and practical quality assurance methods for software product lines.
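A toy encoding (ours, not the paper's formal definition) conveys the flavor: each transition carries an application condition over features and fires only for products whose feature selection satisfies it.

```python
def enabled(transition, marking, features):
    # A transition fires only if its feature guard holds for this product
    # and all of its input places hold a token.
    return transition["guard"](features) and all(
        marking[p] > 0 for p in transition["pre"])

def fire(transition, marking):
    for p in transition["pre"]:
        marking[p] -= 1
    for p in transition["post"]:
        marking[p] += 1

t_log = {"pre": ["ready"], "post": ["logged"],
         "guard": lambda fs: "Logging" in fs}     # application condition
marking = {"ready": 1, "logged": 0}

if enabled(t_log, marking, features={"Logging"}):
    fire(t_log, marking)
print(marking)  # {'ready': 0, 'logged': 1}; without Logging, nothing fires
```

One net thus captures the behaviour of every product: restricting the guards to a concrete feature selection projects out that product's plain Petri net.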

11.
A software product line (SPL) is a family of related programs of a domain. The programs of an SPL are distinguished in terms of features, which are end-user-visible characteristics of programs. Based on a selection of features, stakeholders can derive tailor-made programs that satisfy functional requirements. Beyond functional requirements, different application scenarios raise the need to optimize non-functional properties of a variant. The diversity of application scenarios leads to heterogeneous optimization goals with respect to non-functional properties (e.g., performance- vs. footprint- vs. energy-optimized variants). Hence, an SPL has to satisfy different and sometimes contradictory requirements regarding non-functional properties. Usually, the actually required non-functional properties are not known before product derivation and can vary for each application scenario and customer. Allowing stakeholders to derive optimized variants therefore requires measuring non-functional properties after the SPL is developed. Unfortunately, the high variability provided by SPLs complicates the measurement and optimization of non-functional properties because of the large variant space. With SPL Conqueror, we provide a holistic approach for optimizing non-functional properties in SPL engineering. We show how non-functional properties can be qualitatively specified and quantitatively measured in the context of SPLs, and discuss the variant-derivation process in SPL Conqueror that reduces the effort of computing an optimal variant. We demonstrate the applicability of our approach by means of nine case studies from a broad range of application domains (e.g., database management and operating systems). Moreover, we show that SPL Conqueror is implementation- and language-independent by using SPLs implemented with different mechanisms, such as conditional compilation and feature-oriented programming.

12.
Our earlier work reported a Threshold Scheduling Method for compile-time mapping of functional parallelism onto distributed-memory systems. This paper discusses run-time issues in efficiently supporting that functional parallelism with minimal overhead, through a combination of compile-time and run-time ownership analysis. At compile time, the code generation phase determines, through ownership analysis, whether a local copy of a live definition of a variable needed by a task is available on a given processor. When ownership cannot be resolved at compile time, appropriate code is generated to perform the analysis at run time. Code generation is carried out so that all processors carry the same copy of the compiled program, with each processor's individual code isolated and the universally owned code replicated on all processors to minimize run-time overhead. The run-time system maintains static and dynamic ownership information at every processor to avoid communication overhead for ownership information. We demonstrate the approach by incorporating it in a compiler targeting the parallel functional language Sisal (Streams and Iterations in a Single Assignment Language) to Intel Touchstone i860 systems. Several benchmarks demonstrate the viability of the approach.

13.
Accurately Selecting Block Size at Runtime in Pipelined Parallel Programs
Loops that contain cross-processor data dependencies, known as DOACROSS loops, are often found in scientific programs. Efficiently parallelizing such loops is important yet nontrivial. One useful parallelization technique for DOACROSS loops is pipelining, where each processor (node) performs its computation in blocks; after each block, it sends data to the next node in the pipeline. The amount of computation performed before sending a message is called the block size; its choice, although difficult to make statically, is important for efficient execution. This paper describes a flexible runtime approach to choosing the block size. Rather than relying on static workload estimation, our system takes measurements during the first two iterations of a program and uses the results to build an execution model and choose an appropriate block size which, unlike a static choice, may be nonuniform. To increase the accuracy of the chosen block size, the execution model takes intra- and inter-node performance into account. Notably, our system finds an effective block size automatically, without the experimentation that is necessary when using a statically chosen block size. Performance on a network of workstations shows that programs using our runtime analysis outperform those using static block sizes by as much as 18% when the workload is unbalanced. When the workload is balanced, competitive performance is achieved as long as the initial overhead is sufficiently amortized.
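As an illustration of the block-size trade-off the system navigates, the sketch below evaluates a simple uniform pipeline model over candidate block sizes. The cost formula and parameters are our assumptions; the paper's model is richer (nonuniform blocks, intra- and inter-node effects).

```python
def pick_block_size(n_iters, n_nodes, t_iter, t_msg):
    """t_iter: measured seconds per iteration; t_msg: per-message overhead."""
    best_b, best_t = 1, float("inf")
    for b in range(1, n_iters + 1):
        n_blocks = -(-n_iters // b)                  # ceil division
        stage = b * t_iter + t_msg                   # one block on one node
        total = (n_nodes - 1 + n_blocks) * stage     # pipeline fill + steady state
        if total < best_t:
            best_b, best_t = b, total
    return best_b

# Parameters as if measured during the first iterations of the program:
print(pick_block_size(n_iters=1000, n_nodes=8, t_iter=1e-4, t_msg=2e-3))
```

Small blocks fill the pipeline quickly but pay the message overhead often; large blocks amortize messages but stall downstream nodes. The model's minimum sits between the two extremes.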

14.
Multi-core based systems are ubiquitous in data centers. Efficient exploitation of the hardware parallelism supported by such systems is imperative on multiple fronts: minimizing latency and power consumption and maximizing throughput. This in turn calls for advanced program analysis and optimization. Call graphs have long been used to this end. Although several static call graph extraction techniques have been proposed in the past, these techniques cannot be applied to analyze programs already running in production. Likewise, existing dynamic call graph extraction tools have limited use in production owing to, among other limitations, the lack of support for capturing the wall-clock time spent in the functions of a given program and the lack of means to analyze the call graph information captured at run time. In this paper, we present a Pin-based dynamic call graph extraction framework called Trin-Trin. The framework enables extraction of complete, precise and dynamic call graphs, and can be used seamlessly with already running applications. Furthermore, an analytics engine is provided to facilitate advanced program analysis; for example, the different multithreading contexts of any function can be extracted in a demand-driven fashion. We evaluate the overhead of Trin-Trin using several Unix utilities, applications from the industry-standard SPEC CINT2006 and CFP2006 benchmark suites, and Yahoo! properties. Additionally, we present a case study illustrating how Trin-Trin can be used to analyze performance bottlenecks and performance regressions.
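Trin-Trin instruments native binaries through Pin, but the two capabilities highlighted above (caller-callee edges plus wall-clock time per function) can be mimicked in a few lines of Python for intuition; this is our own stand-in, not the framework's implementation.

```python
import collections
import sys
import time

edges = collections.Counter()            # (caller, callee) -> call count
wall = collections.defaultdict(float)    # function -> cumulative wall time (s)
stack = [("<root>", time.perf_counter())]

def profiler(frame, event, arg):
    if event == "call":
        callee = frame.f_code.co_name
        edges[(stack[-1][0], callee)] += 1
        stack.append((callee, time.perf_counter()))
    elif event == "return" and len(stack) > 1:
        name, start = stack.pop()
        wall[name] += time.perf_counter() - start

def work():
    total = 0
    for i in range(100000):
        total += i * i
    return total

sys.setprofile(profiler)   # Python-level stand-in for Pin instrumentation
work()
sys.setprofile(None)
print(dict(edges))         # {('<root>', 'work'): 1}
print(dict(wall))          # {'work': <seconds>}
```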

15.
The increased transistor count resulting from ever-decreasing feature sizes has enabled the design of architectures containing many small but efficient processing units (cores). At the same time, many new applications have evolved with varying performance requirements. The fixed architecture of multi-core platforms often fails to accommodate the inherently diverse requirements of different applications. We present a dynamically reconfigurable multi-core architecture that detects program phase changes at runtime and adapts to the changing program behavior by reconfiguring itself. We introduce simple but efficient performance counters to monitor vital parameters of reconfigurable architectures, and present static, dynamic and adaptive techniques for reconfiguring the architecture. Our evaluation of the proposed reconfigurable architecture using an adaptive reconfiguration technique shows an improvement of up to 23% for multi-threaded applications and up to 27% for multiprogrammed workloads over statically chosen architectures, and up to 41% over the baseline SMP configuration.

16.
Software product lines (SPLs) are software platforms that can be readily reconfigured for different project requirements. A key part of an SPL is a model that captures the rules for reconfiguring the software. SPLs commonly use feature models to capture configuration rules, with each SPL configuration represented as a selection of features from the feature model. Invalid SPL configurations can arise from feature conflicts introduced via staged or parallel configuration, or from changes to the constraints in a feature model. When invalid configurations are created, a method is needed to automatically diagnose the errors and repair the feature selections. This paper provides two contributions to research on automated configuration of SPLs. First, it shows how configurations and feature models can be transformed into constraint satisfaction problems to automatically diagnose errors and repair invalid feature selections. Second, it presents empirical results from diagnosing configuration errors in feature models ranging in size from 100 to 5,000 features. The results of our experiments show that our CSP-based diagnostic technique scales up to models with thousands of features.
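A miniature version of the CSP encoding (brute force, with an invented four-feature model) shows the shape of the diagnosis problem: features become boolean variables, model rules become constraints, and a repair is the smallest set of selections to flip. The paper targets real CSP solvers and far larger models.

```python
from itertools import combinations

FEATURES = ["base", "gui", "cli", "net"]

def valid(cfg):
    return (cfg["base"]                           # root is mandatory
            and (cfg["gui"] or cfg["cli"])        # or-group under base
            and not (cfg["gui"] and cfg["cli"])   # alternative (xor) group
            and (not cfg["net"] or cfg["gui"]))   # net requires gui

def diagnose(cfg):
    """Return the smallest set of features to flip so cfg becomes valid."""
    for k in range(len(FEATURES) + 1):
        for flips in combinations(FEATURES, k):
            fixed = {f: (not v if f in flips else v) for f, v in cfg.items()}
            if valid(fixed):
                return flips
    return None

print(diagnose({"base": True, "gui": True, "cli": True, "net": True}))
# ('cli',): deselecting cli repairs the xor-group conflict
```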

17.
The array programming paradigm adopts multidimensional arrays as the fundamental data structures of computation. Array operations process entire arrays instead of single elements. This makes array programs highly expressive and introduces data parallelism in a natural way. Array programming imposes non-trivial structural constraints on the ranks, shapes, and element values of arrays; a prominent example of violating such constraints is an out-of-bounds array access. Usually, these constraints are enforced by means of run-time checks. Both the run-time overhead inflicted by dynamic constraint checking and the uncertainty of proper program evaluation are undesirable. We propose a novel type system for array programs based on dependent types. Our type system makes dynamic constraint checks obsolete and guarantees orderly evaluation of well-typed programs. We employ integer vectors of statically unknown length to index array types, and show how constraints on these vectors are resolved using a suitable reduction to integer scalars. Our presentation is based on a functional array calculus that captures the essence of the paradigm without the legacy and obfuscation of a fully-fledged array programming language.
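Python has no dependent types, so the sketch below only mimics the payoff, under that caveat: the index/shape constraint is discharged once, when a specialized function is constructed, leaving the applied function free of per-call bounds checks. All names are illustrative.

```python
def typed_take(n, length):
    # The constraint 0 <= n <= length is discharged here, at specialization
    # time, playing the role of the static type check.
    if not (0 <= n <= length):
        raise TypeError(f"take {n} violates 0 <= n <= {length}")
    def run(vec):
        assert len(vec) == length  # shape promised by the (static) type
        return vec[:n]             # in bounds by construction; no check needed
    return run

take3_of5 = typed_take(3, 5)       # constraint solved once
print(take3_of5([1, 2, 3, 4, 5]))  # [1, 2, 3]
```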

18.
GOP is a graph-oriented programming model that aims to provide high-level abstractions for configuring and programming cooperative parallel processes. With GOP, the programmer can configure the logical structure of a parallel/distributed program by constructing a logical graph to represent the communication and synchronization between the local programs in a distributed processing environment. This paper describes a visual programming environment, called VisualGOP, for the design, coding, and execution of GOP programs. VisualGOP applies visual techniques to provide the programmer with automated and intelligent assistance throughout the program design and construction process. It provides a graphical interface with support for interactive graph drawing and editing, visual programming functions, and automation facilities for program mapping and execution. VisualGOP is a generic programming environment independent of programming languages and platforms; GOP programs constructed under VisualGOP can run in heterogeneous parallel/distributed systems. Copyright © 2005 John Wiley & Sons, Ltd.

19.
A software product line (SPL) is an approach used to develop a range of software products with a high degree of similarity. In this approach, a feature model is usually used to keep track of similarities and differences. Over time, as modifications are made to the SPL, inconsistencies with the feature model can arise. The first approach to dealing with these inconsistencies is refactoring. Refactoring consists of small steps which, when accumulated, may lead to large-scale changes in the SPL, resulting in features being added to or eliminated from it. In this paper, we propose a framework for refactoring SPLs that helps keep them consistent with the feature model. After some introductory remarks, we describe a formal model for representing the feature model. We formally express various refactoring patterns applicable to the feature model and the SPL, and then introduce an algorithm for finding them in the SPL. Finally, we use a real-world case study of an SPL to illustrate the applicability of the framework.

20.
In most database systems, the values of many important run-time parameters of the system, the data, or the query are unknown at query optimization time. Parametric query optimization attempts to identify at compile time several execution plans, each of which is optimal for a subset of all possible values of the run-time parameters. The goal is that at run time, when the actual parameter values are known, the appropriate plan can be identified with essentially no overhead. We present a general formulation of this problem and study it primarily for the buffer size parameter. We adopt randomized algorithms as the main approach to this style of optimization and enhance them with a sideways information passing feature that increases their effectiveness in the new task. Experimental results show that these enhanced algorithms optimize queries for large numbers of buffer sizes in the same time needed by their conventional versions for a single buffer size, without much sacrifice in output quality and with essentially zero run-time overhead. Edited by S. Zdonik. Received June 1993 / Accepted April 1996.
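The run-time contract can be pictured as a lookup table produced at compile time; the cutoffs and plan names below are invented for the example. Optimization yields one plan per parameter region, and run time reduces plan choice to a binary search.

```python
import bisect

# Compile time: one optimal plan per buffer-size interval, e.g. found by
# randomized search run over each candidate region.
cutoffs = [64, 1024]                # buffer-size breakpoints (pages)
plans = ["nested-loop join",        # buffers <  64
         "sort-merge join",         # 64 <= buffers < 1024
         "hash join"]               # buffers >= 1024

def choose_plan(buffer_pages):
    """Run time: O(log n) lookup, no re-optimization."""
    return plans[bisect.bisect_right(cutoffs, buffer_pages)]

print(choose_plan(32), "|", choose_plan(256), "|", choose_plan(4096))
```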
