Similar Literature
20 similar documents found
1.
Parallel languages allow the programmer to express parallelism at a high level. The management of parallelism and the generation of interprocessor communication is left to the compiler and the runtime system. This approach to parallel programming is particularly attractive if a suitable, widely accepted parallel language is available. High Performance Fortran (HPF) has emerged as the first popular machine-independent parallel language, and remarkable progress has been made towards compiling HPF efficiently. However, the performance of HPF programs is often poor and unpredictable, and obtaining adequate performance is a major stumbling block that must be overcome if HPF is to gain widespread acceptance. The programmer is often in the dark about how to improve the performance of an HPF program since poor performance can be attributed to a variety of reasons, including poor choice of algorithm, limited use of parallelism, or an inefficient data mapping. This paper presents a profiling tool that allows the programmer to identify the regions of the program that execute inefficiently, and to focus on the potential causes of poor performance. The central idea is to distinguish the code that is executing efficiently from the code that is executing poorly. Efficient code uses all processors of a parallel system to make progress, while inefficient code causes processors to wait, execute replicated code, idle, communicate, or perform compiler bookkeeping. We designate the latter code as non-scalable, since adding more processors generally does not lead to improved performance for such code. By analogy, the former code is called scalable. The tool presented here separates a program into scalable and non-scalable components and identifies the causes of non-scalability of different components. We show that compiler information is the key to dividing the execution times into logical categories that are meaningful to the programmer. We present the design and implementation of a profiler that is integrated with Fx, a compiler for a variant of HPF. The paper includes two examples that demonstrate how the data reported by the profiler are used to identify and resolve performance bugs in parallel programs. © 1997 John Wiley & Sons, Ltd.
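As an illustration of the scalable/non-scalable split described above, the C/MPI sketch below (a toy, not the Fx profiler's actual instrumentation) separates each process's wall-clock time into compute time and time spent waiting at a barrier; `do_local_work` is a hypothetical, deliberately unbalanced kernel.

```c
#include <mpi.h>
#include <stdio.h>

/* Hypothetical compute kernel, deliberately unbalanced across ranks. */
static void do_local_work(int rank) {
    volatile double x = 0.0;
    for (long i = 0; i < 1000000L * (rank + 1); i++) x += i * 1e-9;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double t0 = MPI_Wtime();
    do_local_work(rank);
    double t_compute = MPI_Wtime() - t0;   /* "scalable" time: the rank makes progress */

    t0 = MPI_Wtime();
    MPI_Barrier(MPI_COMM_WORLD);
    double t_wait = MPI_Wtime() - t0;      /* "non-scalable" time: idle waiting */

    printf("rank %d: compute %.3f s, wait %.3f s\n", rank, t_compute, t_wait);
    MPI_Finalize();
    return 0;
}
```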

2.
A programming environment to support the development and use of engineering applications is presented. The environment provides uniform support for a set of Pascal-class languages in which engineering and scientific applications are commonly written. The environment includes a dynamic multilanguage interpreter-debugger to aid in the interactive development of applications. For the application and user, the environment provides a graphical program interface based on the concept of a software control panel. Through a control panel, the user may interactively modify program parameters and exercise fine-grain control over program execution. The environment also includes a graphical design tool for constructing executable block diagrams based on standard application programs. The control-panel tool is integrated with the design tool to provide a uniform interface to all levels of program execution.

3.
This paper describes a graph tool originally developed for the Faust environment. Faust is a scientific program development environment being implemented at the Center for Supercomputing Research and Development at the University of Illinois at Urbana-Champaign. The graph tool comprises two major components: the Graph Manager, which implements an abstract graph data type, and the Graph Browser, which handles the details of displaying a subgraph of a graph created through the Graph Manager. The Graph Browser displays graph views, where a graph view is a subgraph of its parent graph. The concept of graph views is analogous to the concept of views in the traditional database sense. Several graph views may simultaneously exist for a single parent graph, where each view's subgraph depends on the context of the application requesting the view. Goals of the graph tool, GMB, included providing an abstract graph data type for general use and animating graphs efficiently.
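A minimal C sketch of the view concept, assuming nothing about GMB's real interface: a view holds only a membership set over its parent graph, so several views can coexist without copying the graph.

```c
#include <stdlib.h>

typedef struct { int from, to; } Edge;

typedef struct {
    int   n_nodes;
    Edge *edges;
    int   n_edges;
} Graph;

/* A view is a subgraph defined by node membership in a parent graph. */
typedef struct {
    const Graph *parent;   /* views share, never copy, the parent graph */
    char        *member;   /* member[v] != 0 iff node v is in the view  */
} GraphView;

/* An edge belongs to a view iff both of its endpoints do. */
static int view_has_edge(const GraphView *gv, Edge e) {
    return gv->member[e.from] && gv->member[e.to];
}
```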

4.
The PEPPHER (EU FP7 project) component model defines the notion of component, interface and meta-data for homogeneous and heterogeneous parallel systems. In this paper, we describe and evaluate the PEPPHER composition tool, which explores the application’s components and their implementation variants, generates the necessary low-level code that interacts with the runtime system, and coordinates the native compilation and linking of the various code units to compose the overall application code to optimize performance. We discuss the concept of smart containers and its benefits for reducing dispatch overhead, exploiting implicit parallelism across component invocations and runtime optimization of data transfers. In an experimental evaluation with several applications, we demonstrate that the composition tool provides a high-level programming front-end while effectively utilizing the task-based PEPPHER runtime system (StarPU) underneath for different usage scenarios on GPU-based systems.
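The C sketch below illustrates the variant-dispatch idea behind such a composition tool; the component name, the variants, and the size threshold are all invented for illustration, and a real PEPPHER build would generate glue like this from component meta-data and delegate scheduling to StarPU.

```c
#include <stddef.h>

typedef void (*vector_scale_fn)(double *x, size_t n, double a);

static void vector_scale_cpu(double *x, size_t n, double a) {
    for (size_t i = 0; i < n; i++) x[i] *= a;
}

/* Stand-in for a GPU implementation variant; a real variant would
 * transfer data and launch a kernel. */
static void vector_scale_gpu(double *x, size_t n, double a) {
    vector_scale_cpu(x, n, a);   /* placeholder body */
}

/* Tool-generated glue would consult meta-data and runtime measurements
 * instead of this hard-coded threshold. */
void vector_scale(double *x, size_t n, double a) {
    vector_scale_fn impl = (n > 100000) ? vector_scale_gpu : vector_scale_cpu;
    impl(x, n, a);
}
```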

5.
Efficient performance tuning of parallel programs is often hard. Optimization is often done after the program is written, as a last effort to increase performance. With sequential programs, each (executed) code segment affects the completion time. In the case of a parallel program executed on a multiprocessor this is not always true, due to dependencies between the different threads. Thus, certain code segments of the execution may not affect the completion time of the program, and optimization of such code segments will not increase the performance. In this paper we present an approach to optimize performance by finding the extended critical path of the multithreaded program. The extended critical path analysis is a generalization of critical path analysis in the sense that it also deals with more threads than processors. We have implemented the extended critical path analysis in a performance optimization tool. The tool allows the user to determine the extended critical path of a multithreaded application written for the Solaris operating system for any number of processors, based on execution on a single-processor workstation.
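To make the baseline concrete, here is a toy C version of ordinary critical-path analysis over a task graph with known durations and dependences; the extended analysis in the paper additionally accounts for running more threads than processors, which this sketch omits.

```c
#include <stdio.h>

#define N 5
/* dur[i] = execution time of task i; dep[i][j] != 0 means task i depends
 * on task j. Tasks are assumed numbered in topological order, with
 * task N-1 the unique sink. */
static const double dur[N] = {2, 3, 1, 4, 2};
static const int dep[N][N] = {
    {0,0,0,0,0},
    {1,0,0,0,0},
    {1,0,0,0,0},
    {0,1,1,0,0},
    {0,0,0,1,0},
};

int main(void) {
    double finish[N];
    for (int i = 0; i < N; i++) {
        double start = 0;                    /* earliest start: after all deps */
        for (int j = 0; j < i; j++)
            if (dep[i][j] && finish[j] > start) start = finish[j];
        finish[i] = start + dur[i];
    }
    printf("critical path length: %.1f\n", finish[N-1]);  /* prints 11.0 */
    return 0;
}
```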

6.
The development of a new parallel framework for integrated modeling of tokamak plasmas is a primary objective of the SciDAC Framework Architecture for Core-Edge Transport Simulations (FACETS) project. The FACETS code will be used to predict the performance of tokamak discharges and to optimize tokamak discharge scenarios. Novel parallel numerical algorithms and solvers have been developed in the FACETS project in order to simulate the multi-scale dynamics of tokamak plasmas. The status of development of modules for anomalous transport in the FACETS code is described in this paper. Mechanisms used for coupling 1D anomalous transport in the plasma core with 2D transport in the plasma edge (in the near-separatrix and scrape-off-layer regions) are considered. Results of the first verification studies, based on predictive modeling of several analytical and experimental equilibria, are presented.
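Schematically, the core-edge coupling mechanism amounts to an iteration in which the two solvers exchange boundary data each coupling step; the C sketch below uses invented stand-in solvers and is not FACETS code.

```c
#include <stdio.h>

/* Invented stand-ins for the real 1D core and 2D edge transport solvers;
 * each takes the other's boundary value and returns its own. */
static double core_step(double edge_boundary) { return 0.9 * edge_boundary + 1.0; }
static double edge_step(double core_boundary) { return 0.5 * core_boundary; }

int main(void) {
    double core_to_edge = 0.0, edge_to_core = 0.0;
    for (int step = 0; step < 50; step++) {       /* coupling iterations */
        core_to_edge = core_step(edge_to_core);   /* core sees edge data at the separatrix */
        edge_to_core = edge_step(core_to_edge);   /* edge sees the updated core profile    */
    }
    printf("coupled boundary values: %.4f / %.4f\n", core_to_edge, edge_to_core);
    return 0;
}
```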

7.
Although considerable technology has been developed for debugging and developing sequential programs, producing verifiably correct parallel code is a much harder task. In view of the large number of possible scheduling sequences, exhaustive testing is not a feasible method for determining whether a given parallel program is correct; nor have there been sufficient theoretical developments to allow the automatic verification of parallel programs. PTOOL, a tool being developed at Rice University in collaboration with users at Los Alamos National Laboratory, provides an alternative mechanism for producing correct parallel code. PTOOL is a semi-automatic tool for detecting implicit parallelism in sequential Fortran code. It uses vectorizing compiler techniques to identify dependences preventing the parallelization of sequential regions. According to the model supported by PTOOL, a programmer should first implement and test his program using traditional sequential debugging techniques. Then, using PTOOL, he can select loop bodies that can be safely executed in parallel. At Los Alamos, we have been interested in examining the role of dependence-analysis tools in the parallel programming process. Therefore, we have used PTOOL as a static debugging tool to analyze parallel Fortran programs. Our experiences using PTOOL lead us to conclude that dependence-analysis tools are useful to today's parallel programmers. Dependence-analysis is particularly useful in the development of asynchronous parallel code. With a tool like PTOOL, a programmer can guarantee that processor scheduling cannot affect the results of his parallel program. If a programmer wishes to implement a partially parallelized region through the use of synchronization primitives, however, he will find that dependence analysis is less useful. While a dependence-analysis tool can greatly simplify the task of writing synchronization code, the ultimate responsibility of correctness is left to the programmer.This work was performed under the auspices of the U.S. Department of Energy.
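The kind of dependence PTOOL detects is easy to show; the two loops below (in C for brevity, though PTOOL analyzes Fortran) differ precisely in whether a dependence is carried across iterations.

```c
/* No loop-carried dependence: every iteration touches distinct data,
 * so the iterations may safely run in parallel. */
void independent(double *a, const double *b, int n) {
    for (int i = 0; i < n; i++)
        a[i] = 2.0 * b[i];
}

/* a[i] reads a[i-1]: a flow dependence carried by the loop, so this
 * loop cannot be parallelized as written. */
void carried(double *a, int n) {
    for (int i = 1; i < n; i++)
        a[i] = a[i-1] + 1.0;
}
```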

8.
To address the inconsistency between program code and requirement models that arises late in software development (maintenance/evolution), this paper focuses on reverse requirements engineering, and in particular on algorithms for detecting behavioral differences between changed code and the original requirement model. Building on model/code transformation techniques and an analysis of model/code comparison principles, we design a discrete bidirectional parallel detection algorithm that is 2N times faster (N being the number of paths) than the earlier continuous unidirectional serial detection algorithm. Using this algorithm, we develop RCCT, a graphical requirements/code comparison tool, and integrate it into the integrated requirements modeling system (RMTS), unifying animated modeling, property checking, model transformation, and requirements/code difference detection in one system. Finally, an electronic funds transfer case study demonstrates how the tool is used, and test programs show that the discrete bidirectional parallel algorithm is not only more efficient than the original algorithm but also more reliable.

9.
Program optimization is an important step toward improving runtime efficiency, and profiling is the first step of optimization. For sequential languages, profiling code is inserted automatically by the compiler via a command-line switch, but most parallel-language compilers lack this capability. Taking a portable dynamic profiler for a parallel C++ language as its example, this paper addresses the problem from two angles: it first presents a general method for implementing a portable dynamic profiler, and then analyzes an instrumentation tool for pC++.
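The basic instrumentation idea is a probe wrapped around each profiled routine that records entry and exit times. A self-contained C sketch of that idea (illustrative only, not the pC++ instrumentation interface):

```c
#include <stdio.h>
#include <time.h>

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

static double total_in_kernel = 0.0;   /* accumulated profile datum */

/* The routine being profiled. */
static void kernel(int n) {
    volatile double x = 0;
    for (int i = 0; i < n; i++) x += i;
}

/* The probe an instrumenting compiler would insert around each call. */
static void kernel_instrumented(int n) {
    double t0 = now_sec();
    kernel(n);
    total_in_kernel += now_sec() - t0;
}

int main(void) {
    for (int i = 0; i < 100; i++) kernel_instrumented(100000);
    printf("time in kernel: %.6f s\n", total_in_kernel);
    return 0;
}
```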

10.
The performance of an applications program running on a parallel machine is affected by several factors such as the algorithm, the programming language, the compiler, and the operating system. Performance evaluation of parallel machines requires quick and easy-to-use analysis of large amounts of data. This paper describes a performance evaluation tool built for Monsoon, a multithreaded multiprocessor machine built by Motorola in collaboration with MIT. The tool offers integrated data collection, analysis, and visualization and is designed to be simple but powerful. Software layers built on top of simple hardware monitors offer a flexible, yet nonintrusive performance evaluation tool. Examples of successful use of the tool by both systems and applications programmers are included.

11.
To improve software project management and address chaotic software version management and a manual code review process that is cumbersome and whose results are hard to control, this paper studies version management tools and code review tools and describes their application in software project management. Based on practical project management needs, it presents an implementation that combines the version management tool Subversion with the code review tool Reviewboard, describes the installation and deployment of both tools, and gives a software development workflow based on the combined system.

12.
The message passing interface (MPI) is a standard used by many parallel scientific applications. It offers the advantage of a smoother migration path for porting applications from high performance computing systems to the Grid. In this paper Grid-enabled tools and libraries for developing MPI applications are presented. The first is MARMOT, a tool that checks the adherence of an application to the MPI standard. The second is PACX-MPI, an implementation of the MPI standard optimized for Grid environments. Besides the efficient development of the program, an optimal execution is of paramount importance for most scientific applications. We therefore discuss not only performance on the level of the MPI library, but also several application specific optimizations, e.g., for a sparse, parallel equation solver and an RNA folding code, like latency hiding, prefetching, caching and topology-aware algorithms.
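Of the application-specific optimizations mentioned, latency hiding is easy to sketch with standard MPI nonblocking calls: post the communication, do independent work, then wait. Buffer sizes, tags, and the rank-pairing scheme below are illustrative.

```c
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int peer = rank ^ 1;                 /* pair ranks 0-1, 2-3, ... */

    double halo[64], send[64], local[1024];
    for (int i = 0; i < 64; i++) send[i] = rank;
    for (int i = 0; i < 1024; i++) local[i] = 1.0;

    if (peer < size) {
        MPI_Request rreq, sreq;
        MPI_Irecv(halo, 64, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &rreq);
        MPI_Isend(send, 64, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &sreq);

        for (int i = 0; i < 1024; i++)   /* overlap: work not needing the halo */
            local[i] *= 2.0;

        MPI_Wait(&rreq, MPI_STATUS_IGNORE);   /* halo is now safe to use */
        MPI_Wait(&sreq, MPI_STATUS_IGNORE);
    }
    MPI_Finalize();
    return 0;
}
```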

13.
赵博  赵荣彩  徐金龙  高伟 《计算机科学》2015,42(1):50-53,58
To fully exploit the computing power of high-performance computers, ease the burden on programmers of designing and writing parallel programs, and enlarge the set of usable software, we design and implement an interactive approach that mines the vectorizable statements in a program, optimizes the vectorized statements in the generated code, and improves the execution efficiency of that code. The approach is significant for exploiting the computing power of high-performance machines, improving system usability, and broadening the range of applications, while also providing effective assistance and tool support. The progressive, intelligent, backtracking vectorization tuning framework analyzes and transforms the sequential program submitted by the user, applying sequential program analysis, data dependence analysis, and vectorization analysis; it then transforms and optimizes the program according to the analysis results and automatically generates the final vectorized code. By analyzing the latent parallelism in a sequential program and automatically transforming it into an equivalent vectorized form, the approach greatly simplifies the programmer's work.
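The distinction such a tool must draw is illustrated by the two C loops below: the first has independent iterations and is safely vectorizable, while the second carries a dependence and must be transformed first (the function names are invented).

```c
/* Independent iterations: a compiler (or an interactive tool) can
 * safely vectorize this loop. restrict asserts no aliasing. */
void scale(float * restrict a, const float * restrict b, int n) {
    for (int i = 0; i < n; i++)
        a[i] = 2.0f * b[i] + 1.0f;
}

/* a[i] reads a[i-1]: a loop-carried dependence, so this prefix sum
 * is not vectorizable as written. */
void prefix(float *a, int n) {
    for (int i = 1; i < n; i++)
        a[i] += a[i-1];
}
```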

14.
15.
JCrasher: an automatic robustness tester for Java
JCrasher is an automatic robustness testing tool for Java code. JCrasher examines the type information of a set of Java classes and constructs code fragments that will create instances of different types to test the behavior of public methods under random data. JCrasher attempts to detect bugs by causing the program under test to ‘crash’, that is, to throw an undeclared runtime exception. Although in general the random testing approach has many limitations, it also has the advantage of being completely automatic: no supervision is required except for off-line inspection of the test cases that have caused a crash. Compared to other similar commercial and research tools, JCrasher offers several novelties: it transitively analyzes methods, determines the size of each tested method's parameter space and selects parameter combinations, and therefore test cases, at random, taking into account the time allocated for testing; it defines heuristics for determining whether a Java exception should be considered a program bug or whether the JCrasher-supplied inputs have violated the code's preconditions; it includes support for efficiently undoing all the state changes introduced by previous tests; it produces test files for JUnit, a popular Java testing tool; and it can be integrated in the Eclipse IDE. Copyright © 2004 John Wiley & Sons, Ltd.

16.
Small organisations can now have access to high raw processing power using networks of workstations (NOW) as parallel computing platforms. Software Distributed Shared Memory (software DSM) packages have been developed to facilitate the programming of such systems. However, because of the high interprocess latencies in a NOW, the performance of a software DSM application is more susceptible to the partitioning of the problem than might be expected. This paper presents an approach for a tool to visualise the execution of a program in a way that highlights performance bottlenecks. The tool associates identified bottlenecks with the corresponding source code lines in order to determine what piece of code is the cause of poor performance. The visualisation technique is demonstrated in two case studies. They clearly show that the visualisation is indeed useful and provides an effective way to acquire an understanding of what characterises an application's sharing behaviour.
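A classic instance of the sharing behaviour such a visualisation exposes: two threads update logically independent counters that happen to live in the same coherence unit (a cache line here, a whole page in a page-based software DSM), so the unit ping-pongs between them on every write. The pthreads sketch below is illustrative; padding the counters apart removes the bottleneck.

```c
#include <pthread.h>
#include <stdio.h>

/* a and b are logically independent, but they share a cache line (or,
 * in a page-based software DSM, a page), so updates ping-pong the unit. */
static struct { long a; long b; } counters;

static void *bump_a(void *arg) {
    (void)arg;
    for (long i = 0; i < 10000000L; i++) counters.a++;
    return NULL;
}

static void *bump_b(void *arg) {
    (void)arg;
    for (long i = 0; i < 10000000L; i++) counters.b++;
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, bump_a, NULL);
    pthread_create(&t2, NULL, bump_b, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("a=%ld b=%ld\n", counters.a, counters.b);
    return 0;
}
```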

17.
Aspect-oriented programming (AOP) has been successfully applied to application code thanks to techniques such as Java bytecode instrumentation. Unfortunately, with existing AOP frameworks for Java such as AspectJ, aspects cannot be woven into the standard Java class library. This restriction is particularly unfortunate for aspects that would benefit from comprehensive aspect weaving with complete method coverage, such as profiling or debugging aspects. In this article we present MAJOR, a new tool for comprehensive aspect weaving, which ensures that aspects are woven into all classes loaded in a Java Virtual Machine, including those in the standard Java class library. MAJOR includes the pluggable module CARAJillo, which supports efficient access to a complete and customizable calling context representation. We validate our approach with three case studies. Firstly, we use MAJOR to weave existing profiling aspects that would otherwise generate incomplete profiles. Secondly, we introduce an aspect for memory leak detection that also benefits from comprehensive weaving. Thirdly, we present an aspect subsuming the functionality of ReCrash, an existing tool based on low-level bytecode instrumentation techniques that generates unit tests to reproduce program failures. Our aspect-based tools are concisely implemented in a few lines of code, and leverage MAJOR and CARAJillo for comprehensive aspect weaving and for efficient access to calling context information.

18.
Today's massively parallel machines are typically message-passing systems consisting of hundreds or thousands of processors. Implementing parallel applications efficiently in this environment is a challenging task, and poor parallel design decisions can be expensive to correct. Tools and techniques that allow the fast and accurate evaluation of different parallelization strategies would significantly improve the productivity of application developers and increase throughput on parallel architectures. This paper investigates one of the major issues in building tools to compare parallelization strategies: determining what type of performance models of the application code and of the computer system are sufficient for a fast and accurate comparison of different strategies. The paper is built around a case study employing the performance prediction tool (PerPreT) to predict performance of the parallel spectral transform shallow water model code (PSTSWM) on the Intel Paragon. PSTSWM is a parallel application code that was designed to evaluate different parallel strategies for the spectral transform method as it is used in climate modeling and weather forecasting. Multiple parallel algorithms and algorithm variants are embedded in the code. PerPreT uses a relatively simple algebraic model to predict execution time for SPMD (single program multiple data) parallel applications. Applications are modeled through parameterized formulae for communication and computation, where the parameters include the problem size, the number of processors used to execute the program, and system characteristics (e.g. setup times for communication, link bandwidth and sustained computing performance per processor). In this paper we describe performance models that predict the performance of the different algorithms in PSTSWM accurately enough to allow them to be compared, establishing the feasibility of such a demanding application of performance modeling. We also discuss issues in generating and validating the performance models, emphasizing the practical importance of tools such as PerPreT in such studies. © 1998 John Wiley & Sons, Ltd.
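The shape of such an algebraic model is easy to state: predicted time is a computation term plus a communication term, parameterized by problem size n, processor count p, and machine constants. The C sketch below uses invented message counts, message sizes, and machine constants; it is not PerPreT's actual formula set or measured Paragon values.

```c
#include <stdio.h>

double predict_time(double n, double p,
                    double flops_per_elem,  /* work per element            */
                    double mflops,          /* sustained MFLOP/s per CPU   */
                    double t_setup,         /* per-message setup time (s)  */
                    double bandwidth) {     /* link bandwidth (MB/s)       */
    double t_comp = (n * flops_per_elem) / (p * mflops * 1e6);
    double msgs   = p;                      /* illustrative message count  */
    double bytes  = 8.0 * n / p;            /* illustrative message size   */
    double t_comm = msgs * (t_setup + bytes / (bandwidth * 1e6));
    return t_comp + t_comm;
}

int main(void) {
    /* Sweep processor counts to compare strategies, as a prediction
     * tool would, using made-up parameters. */
    for (int p = 1; p <= 64; p *= 2)
        printf("p=%2d  predicted %.4f s\n", p,
               predict_time(1e7, p, 10, 50, 1e-4, 100));
    return 0;
}
```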

19.
Portability, efficiency, and ease of coding are all important considerations in choosing the programming model for a scalable parallel application. The message-passing programming model is widely used because of its portability, yet some applications are too complex to code in it while also trying to maintain a balanced computation load and avoid redundant computations. The shared-memory programming model simplifies coding, but it is not portable and often provides little control over interprocessor data transfer costs. This paper describes an approach, called Global Arrays (GAs), that combines the better features of both other models, leading to both simple coding and efficient execution. The key concept of GAs is that they provide a portable interface through which each process in a MIMD parallel program can asynchronously access logical blocks of physically distributed matrices, with no need for explicit cooperation by other processes. We have implemented the GA library on a variety of computer systems, including the Intel Delta and Paragon, the IBM SP-1 and SP-2 (all message passers), the Kendall Square Research KSR-1/2 and the Convex SPP-1200 (nonuniform access shared-memory machines), the CRAY T3D (a globally addressable distributed-memory computer), and networks of UNIX workstations. We discuss the design and implementation of these libraries, report their performance, illustrate the use of GAs in the context of computational chemistry applications, and describe the use of a GA performance visualization tool.(An earlier version of this paper was presented at Supercomputing'94.)
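A minimal GA usage sketch, written against the C bindings as commonly documented (GA_Initialize, NGA_Create, NGA_Put); required headers and initialization details (e.g. whether MA_init is needed) vary between GA versions, so treat this as an assumption-laden outline rather than a verified program.

```c
#include <mpi.h>
#include "ga.h"        /* header names may differ between GA versions; */
#include "macdecls.h"  /* some versions also require MA_init() first.  */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    GA_Initialize();

    int dims[2]  = {1000, 1000};
    int chunk[2] = {-1, -1};                    /* let GA pick the distribution */
    int g_a = NGA_Create(C_DBL, 2, dims, "A", chunk);

    /* Any process can write a logical block of the distributed matrix
     * without cooperation from its owners -- the key GA property. */
    double block[10 * 10];
    for (int i = 0; i < 100; i++) block[i] = 1.0;
    int lo[2] = {0, 0}, hi[2] = {9, 9}, ld[1] = {10};
    if (GA_Nodeid() == 0)
        NGA_Put(g_a, lo, hi, block, ld);
    GA_Sync();

    GA_Destroy(g_a);
    GA_Terminate();
    MPI_Finalize();
    return 0;
}
```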

20.
This paper describes the design and implementation of Panorama, a parallel debugger for MIMD message-passing computers. Programmers can readily adapt Panorama to new parallel platforms and extend it to include their own views of a target program. The system comes with three built-in graphical program views, and it also includes a software tool to help programmers design and implement new views. Panorama avoids detailed dependence on target architectures by using the base debugger supplied by each hardware vendor to carry out low-level debugging tasks such as setting breakpoints and examining data. Since the interfaces and capabilities of base debuggers vary, we have developed a strategy that models interactions between Panorama and base debuggers. The model separates general-purpose code from the special-case functions that handle specific debugger characteristics. The resulting system is easy to adapt and free from the clutter of conditionally-executed, special-case code.
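The separation strategy can be pictured as a table of function pointers: all vendor-specific behavior lives behind one interface, so the general-purpose code never branches on the target. All names in this C sketch are invented, not Panorama's own.

```c
#include <stdio.h>

/* One adapter per vendor base debugger; only these functions are
 * special-cased, everything above them is general-purpose. */
typedef struct {
    const char *name;
    int (*set_breakpoint)(const char *file, int line);
    int (*read_variable)(const char *var, char *out, int outlen);
} BaseDebugger;

static int gdb_set_bp(const char *file, int line) {
    printf("break %s:%d\n", file, line);   /* would drive the real debugger */
    return 0;
}
static int gdb_read(const char *var, char *out, int outlen) {
    snprintf(out, outlen, "print %s", var);
    return 0;
}

static const BaseDebugger gdb_adapter = { "gdb", gdb_set_bp, gdb_read };

int main(void) {
    const BaseDebugger *dbg = &gdb_adapter;  /* selected per target platform */
    dbg->set_breakpoint("solver.c", 42);
    char buf[64];
    dbg->read_variable("residual", buf, sizeof buf);
    puts(buf);
    return 0;
}
```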
