1.

Lü Feng (吕锋), Bao Gang (鲍刚) 《Computer Engineering and Design》(计算机工程与设计) 2008,29(17)

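Remote method invocation (RMI) in Java is one of the main technologies for building distributed systems, but in practice too many repeated remote method calls degrade program performance. The paper presents an RMI method based on client-side caching: a local cache on the client stores the stubs and results that have already been used. When the client needs to call a remote method, it first checks whether the cache holds the required stub or result; if it does, the value is returned directly from the cache and no RMI call is made over the network. A performance comparison shows that adding a client-side cache effectively reduces the number of repeated RMI calls, which improves program response time and reduces the network bandwidth consumed.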
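
The caching idea in this abstract can be illustrated with a minimal sketch. It is written in Python rather than Java for brevity and is not the authors' implementation: `CachingClient`, `fake_remote`, and the `remote_calls` counter are hypothetical names, the "remote" side is a local function standing in for a network round trip, and only results (not stubs) are cached.

```python
from typing import Any, Callable, Dict, Hashable, Tuple


class CachingClient:
    """Client-side result cache in front of a (hypothetical) remote-call function."""

    def __init__(self, remote_call: Callable[..., Any]) -> None:
        self._remote_call = remote_call  # stands in for the network round trip
        self._results: Dict[Tuple[str, Tuple[Hashable, ...]], Any] = {}
        self.remote_calls = 0            # number of calls that actually left the client

    def invoke(self, method: str, *args: Hashable) -> Any:
        key = (method, args)
        if key in self._results:         # cache hit: answer locally, no remote call
            return self._results[key]
        self.remote_calls += 1           # cache miss: perform the remote call once
        result = self._remote_call(method, *args)
        self._results[key] = result
        return result


def fake_remote(method: str, *args: int) -> int:
    """Toy 'server' used only for this demonstration."""
    if method == "add":
        return sum(args)
    raise ValueError(f"unknown method: {method}")


client = CachingClient(fake_remote)
first = client.invoke("add", 2, 3)   # goes to the "server"
second = client.invoke("add", 2, 3)  # served from the local cache
```

A cache like this is only safe for methods whose results do not change between calls; the abstract does not describe an invalidation policy, so none is assumed here.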

2.

Compilers for instruction-level parallelism

Schlansker M. Conte T.M. Dehnert J. Ebcioglu K. Fang J.Z. Thompson C.L. 《Computer》1997,30(12):63-69

Discovering and exploiting instruction level parallelism in code will be key to future increases in microprocessor performance. What technical challenges must compiler writers meet to better use ILP? Instruction level parallelism allows a sequence of instructions derived from a sequential program to be parallelized for execution on multiple pipelined functional units. If industry acceptance is a measure of importance, ILP has blossomed. It now profoundly influences the design of almost all leading edge microprocessors and their compilers. Yet the development of ILP is far from complete, as research continues to find better ways to use more hardware parallelism over a broader class of applications.

3.

Java in high performance environments

Ghahramani B. Pauley M.A. 《Computer》2003,36(9):109-111

Java programs are executed by a Java virtual machine (JVM), which interprets intermediate compiled bytecode that is nominally platform independent. Although early versions of Java interpreted unoptimized bytecode in a relatively unsophisticated manner, recent developments including static analysis, just-in-time compilation, JVM optimization, and instruction-level optimizations have improved execution efficiency. Consequently, Java is now competitive with C and C++ for some applications and on some platforms. Despite Java's increasing popularity, there is a lingering perception that deficiencies in the language make it unsuitable for high-performance computing. In this paper we address some of those deficiencies and discuss the suitability of using Java in a distributed environment.

4.

Statement-Level Communication-Free Partitioning Techniques for Parallelizing Compilers

Shih Kuei-Ping Sheu Jang-Ping Huang Chua-Huang 《The Journal of supercomputing》2000,15(3):243-269

This paper addresses the problem of communication-free partition of iteration spaces and data spaces along hyperplanes. To find more possible communication-free hyperplane partitions, we treat statements within a loop body as separate schedulable units. Instead of using the information about data dependence distance or direction vectors, our technique explicitly formulates array references as transformations from statement-iteration spaces to data spaces. Based on these transformations, we propose necessary and sufficient conditions for a communication-free partition along hyperplanes to be feasible. This approach can be applied to all programs with an imperfectly nested loop or sequences of imperfectly nested loops, whose array references are affine functions of outer loop indices or loop invariant variables. The proposed approach is more practical than existing methods in finding the data and computation distribution patterns that allow processors to execute fully in parallel on multicomputers without any interprocessor communication.

5.

Development of DSL Compilers for Specialized Processors

Sovetov P. N. 《Programming and Computer Software》2021,47(7):541-554

Modern computer systems often include specialized processors that are programmed in domain-specific languages. The compiler-in-the-loop technology, which assumes...

6.

Compilers and Computer Architecture

Wulf W.A. 《Computer》1981,14(7):41-47

An examination of the relation between architecture and compiler design leads to several principles which can simplify compilers and improve the object code they produce.

7.

Remodularizing Java programs for improved locality of feature implementations in source code

Andrzej Olszak Bo Nørregaard Jørgensen 《Science of Computer Programming》2012,77(3):131-151

Explicit traceability between features and source code is known to help programmers to understand and modify programs during maintenance tasks. However, the complex relations between features and their implementations are not evident from the source code of object-oriented Java programs. Consequently, the implementations of individual features are difficult to locate, comprehend, and modify in isolation. In this paper, we present a novel remodularization approach that improves the representation of features in the source code of Java programs. Both forward and reverse restructurings are supported through on-demand bidirectional restructuring between feature-oriented and object-oriented decompositions. The approach includes a feature location phase based on tracing of program execution, a feature representation phase that reallocates classes into a new package structure based on single-feature and multi-feature packages, and an annotation-based reverse transformation of code. Case studies performed on two open-source projects indicate that our approach requires relatively little manual effort and reduces tangling and scattering of feature implementations in the source code.

8.

Optimizing graph algorithms for improved cache performance

Park J.-S. Penner M. Prasanna V.K. 《Parallel and Distributed Systems, IEEE Transactions on》2004,15(9):769-782

We develop algorithmic optimizations to improve the cache performance of four fundamental graph algorithms. We present a cache-oblivious implementation of the Floyd-Warshall algorithm for the fundamental graph problem of all-pairs shortest paths by relaxing some dependencies in the iterative version. We show that this implementation achieves the lower bound on processor-memory traffic of $\Omega(N^3/\sqrt{C})$, where N and C are the problem size and cache size, respectively. Experimental results show that this cache-oblivious implementation achieves more than a sixfold improvement in real execution time over the iterative implementation with the usual row major data layout, on three state-of-the-art architectures. Second, we address Dijkstra's algorithm for the single-source shortest paths problem and Prim's algorithm for the minimum spanning tree problem. For these algorithms, we demonstrate up to a twofold improvement in real execution time by using a simple cache-friendly graph representation, namely adjacency arrays. Finally, we address the matching algorithm for bipartite graphs. We show performance improvements of two to three times in real execution time by making the algorithm initially work on subproblems to generate a suboptimal solution and then solving the whole problem using the suboptimal solution as a starting point. Experimental results are shown for the Pentium III, UltraSPARC III, Alpha 21264, and MIPS R12000 machines.
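
The "adjacency arrays" representation mentioned in this abstract can be sketched as follows. This is an illustration under assumptions, not the paper's code: the function names and the three-array layout (`offsets`, `targets`, `weights`) are chosen here, and Python lists will not reproduce the cache behaviour the paper measures; the point is only the contiguous layout that a Dijkstra loop scans sequentially.

```python
import heapq
from typing import List, Tuple


def build_adjacency_arrays(
    n: int, edges: List[Tuple[int, int, float]]
) -> Tuple[List[int], List[int], List[float]]:
    """Pack directed weighted edges (u, v, w) into three flat arrays."""
    offsets = [0] * (n + 1)
    for u, _, _ in edges:
        offsets[u + 1] += 1              # count outgoing edges per vertex
    for i in range(n):
        offsets[i + 1] += offsets[i]     # prefix sums give each vertex's slice
    targets = [0] * len(edges)
    weights = [0.0] * len(edges)
    cursor = offsets[:-1].copy()         # next free slot for each vertex
    for u, v, w in edges:
        targets[cursor[u]] = v
        weights[cursor[u]] = w
        cursor[u] += 1
    return offsets, targets, weights


def dijkstra(
    n: int, offsets: List[int], targets: List[int], weights: List[float], source: int
) -> List[float]:
    """Single-source shortest paths; neighbours of u are one contiguous slice."""
    dist = [float("inf")] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                     # stale heap entry
        for k in range(offsets[u], offsets[u + 1]):
            v, nd = targets[k], d + weights[k]
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 5.0)]
offsets, targets, weights = build_adjacency_arrays(4, edges)
distances = dijkstra(4, offsets, targets, weights, source=0)  # [0.0, 3.0, 1.0, 8.0]
```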

9.

Adaptive Optimizing Compilers for the 21st Century

Keith D. Cooper Devika Subramanian Linda Torczon 《The Journal of supercomputing》2002,23(1):7-22

Historically, compilers have operated by applying a fixed set of optimizations in a predetermined order. We call such an ordered list of optimizations a compilation sequence. This paper describes a prototype system that uses biased random search to discover a program-specific compilation sequence that minimizes an explicit, external objective function. The result is a compiler framework that adapts its behavior to the application being compiled, to the pool of available transformations, to the objective function, and to the target machine. This paper describes experiments that attempt to characterize the space that the adaptive compiler must search. The preliminary results suggest that optimal solutions are rare and that local minima are frequent. If this holds true, biased random searches, such as a genetic algorithm, should find good solutions more quickly than simpler strategies, such as hill climbing.
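
As a rough illustration of a biased random search over compilation sequences, the sketch below runs a very small generic genetic algorithm. Everything in it is an assumption made for the example: the pass names in `PASSES`, the made-up objective `toy_cost` (a real system would compile and measure the program), and the parameters. It is not the prototype described in the abstract.

```python
import random
from typing import Callable, List, Sequence, Tuple

PASSES = ["inline", "cse", "dce", "licm", "unroll"]  # hypothetical pass names
Cost = Callable[[Sequence[str]], float]


def toy_cost(seq: Sequence[str]) -> float:
    """Made-up objective: lower is better; stands in for a measured cost."""
    seq = list(seq)
    cost = 10.0
    if "licm" in seq and "unroll" in seq and seq.index("licm") < seq.index("unroll"):
        cost -= 3.0
    if "cse" in seq and "dce" in seq and seq.index("cse") < seq.index("dce"):
        cost -= 2.0
    cost += 0.5 * (len(seq) - len(set(seq)))  # penalize repeated passes
    return cost


def genetic_search(
    cost: Cost, length: int, pop_size: int, generations: int, rng: random.Random
) -> Tuple[List[str], float]:
    """Tiny genetic algorithm; requires length >= 2 and pop_size >= 4."""
    population = [[rng.choice(PASSES) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        survivors = population[: pop_size // 2]  # bias: keep the cheaper half
        children: List[List[str]] = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, length)
            child = a[:cut] + b[cut:]                          # one-point crossover
            child[rng.randrange(length)] = rng.choice(PASSES)  # point mutation
            children.append(child)
        population = survivors + children
    best = min(population, key=cost)
    return best, cost(best)


best_sequence, best_cost = genetic_search(
    toy_cost, length=5, pop_size=8, generations=20, rng=random.Random(0)
)
```

Because the cheaper half of each generation is carried over unchanged, the best cost never gets worse from one generation to the next.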

10.

A Divide-and-Conquer Algorithm for Irregular Redistribution in Parallelizing Compilers

Wang Hui Guo Minyi Wei Daming 《The Journal of supercomputing》2004,29(2):157-170

In order to achieve better load balancing, it is necessary to solve irregular block redistribution problems, which are different from regular block-cyclic redistribution. High Performance Fortran version 2 (HPF-2) provides irregular distribution functionalities, such as GEN_BLOCK and INDIRECT. This paper develops an efficient algorithm that attempts to obtain near optimal scheduling while satisfying the conditions of minimal message size of total steps and the minimal number of steps for irregular array redistribution. The algorithm intends to decrease the computation costs by dividing the whole block into sub-blocks and solving the sub-problems accordingly, and then merging them together to get final results. Simulation results show that our algorithm has performance comparable to a relocation algorithm developed previously (H. Yook and M. Park. Proceedings of the IASTED International Conference Parallel and Distributed Computing and Systems, Nov. 3–6, MIT, Boston, USA, 1999).
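
To make the redistribution problem concrete, the sketch below computes which messages are needed when a one-dimensional array moves from one GEN_BLOCK-style distribution (a list of per-processor block sizes) to another. It covers only this first step; the scheduling of messages into steps, which is the paper's contribution, is not attempted. The function name and the `(source, destination, size)` tuple format are assumptions made for the example.

```python
from itertools import accumulate
from typing import List, Tuple


def redistribution_messages(
    src_sizes: List[int], dst_sizes: List[int]
) -> List[Tuple[int, int, int]]:
    """Return (source processor, destination processor, element count) triples."""
    if sum(src_sizes) != sum(dst_sizes):
        raise ValueError("both distributions must cover the same number of elements")
    src_ends = list(accumulate(src_sizes))  # exclusive end index of each source block
    dst_ends = list(accumulate(dst_sizes))  # exclusive end index of each target block
    messages: List[Tuple[int, int, int]] = []
    i = j = start = 0
    while i < len(src_ends) and j < len(dst_ends):
        end = min(src_ends[i], dst_ends[j])  # overlap of the two current blocks
        if end > start:
            messages.append((i, j, end - start))
        start = end
        if src_ends[i] == end:
            i += 1
        elif dst_ends[j] == end:
            j += 1
    return messages


# 15 elements over 3 processors, before and after redistribution
msgs = redistribution_messages([7, 3, 5], [4, 6, 5])
# msgs == [(0, 0, 4), (0, 1, 3), (1, 1, 3), (2, 2, 5)]
```

Triples whose source and destination are the same processor are local copies rather than network messages.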

11.

Arithmetically improved algorithmic performance

M. Mascagni W. L. Miranker 《Computing》1985,35(2):153-175

An augmented set of floating-point arithmetic operations which includes the accurate inner product can be routinely employed with benefit in some standard iterative numerical algorithms. Benefits include the requirement of fewer iterations for achieving computational convergence criteria and more accurate results for a given number of iterations. Not all algorithms are benefited, but favorable results have been obtained for the QR algorithm, the conjugate gradient algorithm and the separating hyperplane algorithm.
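
The difference between an ordinary floating-point inner product and an accurate one can be seen in a small sketch. Here the accurate version is emulated with exact rational arithmetic from Python's `fractions` module and rounded once at the end; that is an assumption made for illustration, not the arithmetic unit the paper relies on, and the function names are hypothetical.

```python
from fractions import Fraction
from typing import Sequence


def naive_dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Inner product with a rounding error after every multiply and add."""
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
    return total


def accurate_dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Inner product accumulated exactly, then rounded once to a float."""
    exact = sum((Fraction(a) * Fraction(b) for a, b in zip(x, y)), Fraction(0))
    return float(exact)


x = [1e16, 1.0, -1e16]
y = [1.0, 1.0, 1.0]
lost = naive_dot(x, y)     # 0.0: the 1.0 is absorbed when added to 1e16
kept = accurate_dot(x, y)  # 1.0: the exact sum is rounded only once
```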

12.

Test Generation for Compilers and Other Formal Text Processors

Zelenov S. V. Zelenova S. A. Kossatchev A. S. Petrenko A. K. 《Programming and Computer Software》2003,29(2):104-111

A concept of automated test suite generation for testing compilers and other formal text processors is suggested. An approach based on the generation of tests from models is used.

13.

PETRA: Performance Evaluation Tool for Modern Parallelizing Compilers

Dheya Mustafa Rudolf Eigenmann 《International journal of parallel programming》2015,43(4):549-571

14.

The design and performance of real-time Java middleware

Corsaro A. Schmidt D.C. 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(11):1155-1167

More than 90 percent of all microprocessors are now used for real-time and embedded applications. The behavior of these applications is often constrained by the physical world. It is therefore important to devise higher-level languages and middleware that meet conventional functional requirements, as well as dependably and productively enforce real-time constraints. We provide two contributions to the study of languages and middleware for real-time and embedded applications. We first describe the architecture of jRate, which is an open-source ahead-of-time-compiled implementation of the RTSJ middleware. We then show performance results obtained using RTJPerf, which is an open-source benchmarking suite that systematically compares the performance of RTSJ middleware implementations. We show that, while research remains to be done to make RTSJ a bullet-proof technology, the initial results are promising. The performance and predictability of jRate provide a baseline for what can be achieved by using ahead-of-time compilation. Likewise, RTJPerf enables researchers and practitioners to evaluate the pros and cons of RTSJ middleware systematically as implementations mature.

15.

Efficient Symbolic Analysis for Parallelizing Compilers and Performance Estimators (cited by: 1; self-citations: 1; citations by others: 1)

Fahringer Thomas 《The Journal of supercomputing》1998,12(3):227-252

Symbolic analysis is of paramount importance for parallelizing compilers and performance estimators to examine symbolic expressions with program unknowns such as machine and problem sizes and to solve queries based on systems of constraints (equalities and inequalities). This paper describes novel techniques for counting the number of solutions to a system of constraints, simplifying systems of constraints, computing lower and upper bounds of symbolic expressions, and determining the relationship between symbolic expressions. All techniques target wide classes of linear and non-linear symbolic expressions and systems of constraints. Our techniques have been implemented and are used as part of a parallelizing compiler and a performance estimator to support analysis and optimization of parallel programs. Various examples and experiments demonstrate the effectiveness of our symbolic analysis techniques.

16.

JEETuningExpert: A software assistant for improving Java Enterprise Edition application performance (cited by: 1; self-citations: 0; citations by others: 1)

Marco Crasso Alejandro Zunino Leonardo Moreno Marcelo Campo 《Expert systems with applications》2009,36(9):11718-11729

Designing a JEE (Java Enterprise Edition)-based enterprise application capable of achieving its performance objectives is rather hard. Predicting the performance of this type of system at the design level is difficult and sometimes not viable, because this requires having precise knowledge of the expected load conditions and the underlying software infrastructure. Besides, the requirement for rapid time-to-market leads to postponing performance tuning until systems are developed, packaged and running. In this paper we present a novel approach for automatically detecting performance problems in JEE-based applications and, in turn, suggesting courses of action to correct them. The idea is to allow developers to smoothly identify and eradicate performance anti-patterns by automatically analyzing execution traces. The approach has been implemented as a tool called JEETuningExpert, and validated using three well-known JEE reference applications. Specifically, we evaluated the effectiveness of JEETuningExpert for detecting performance problems, and measured both the overhead imposed by online monitoring of each application and the improvements achieved after following the suggested corrective actions. These results empirically showed that the refactored applications are 40.08%, 76.94% and 61.13% faster, on average.

17.

Guest Editorial: Parallel Systems and Compilers

Valentina Salapura Michael Gschwind Jens Knoop 《International journal of parallel programming》2012,40(1):1-3

18.

Design and performance analysis of a distributed Java Virtual Machine

Surdeanu M. Moldovan D. 《Parallel and Distributed Systems, IEEE Transactions on》2002,13(6):611-627

This paper introduces DISK, a distributed Java Virtual Machine for networks of heterogeneous workstations. Several research issues are addressed. A novelty of the system is its object-based, multiple-writer memory consistency protocol (OMW). The correctness of the protocol and its Java compliance is demonstrated by comparing the nonoperational definitions of release consistency, the consistency model implemented by OMW, with the Java Virtual Machine memory consistency model (JVMC), as defined in the Java Virtual Machine Specification. An analytical performance model was developed to study and compare the design trade-offs between OMW and the lazy invalidate release consistency (LI) protocols as a function of the number of processors, network characteristics, and application types. The DISK system has been implemented and runs on a network of 16 Pentium III computers interconnected by a 100 Mbps Ethernet network. Experiments performed with two applications, parallel matrix multiplication and the traveling salesman problem, confirm the analytical model.

19.

Java for real-time

Kelvin Nilsen 《Real-Time Systems》1996,11(2):197-205

Java is a new programming language publicly released by Sun Microsystems in May of 1995 with hopes of revolutionizing the software industry. The popular press has responded with numerous articles touting the language's benefits. Since many of the applications which Java is intended to serve have real-time characteristics, we have recently undertaken to develop a set of standard extensions to provide Java programmers with the ability to describe the real-time requirements of their Java applications. This brief report summarizes the issues that have influenced the design of Real-Time Java and provides an overview of its current embodiment.

20.

Analysis of Optimization Techniques in C Compilation (C编译中的优化技术分析)

Zhong Wei (钟卫), Wu Yu (吴雨) 《Journal of Computer Research and Development》(计算机研究与发展) 1992,29(12):26-32