期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Scalable dynamic Monitoring,Analysis and Tuning Environment for parallel applications

P. Caymes-Scutari A. Morajko T. Margalef E. Luque 《Journal of Parallel and Distributed Computing》2010

Parallel/distributed systems are continuously growing. This allows and enables the scalability of the applications, either by considering bigger problems in the same period of time or by solving the problem in a shorter time. In consequence, the methodologies, approaches and tools related to parallel paradigm should be brought up to date to support the increasing requirements of the applications and the users. MATE (Monitoring, Analysis and Tuning Environment) provides automatic and dynamic tuning for parallel/distributed applications. The tuning decisions are made according to performance models, which provide a fast means to decide what to improve in the execution. However, MATE presents some bottlenecks as the application grows, due to the fact that the analysis process is made in a full centralized manner. In this work, we propose a new approach to make MATE scalable. In addition, we present the experimental results and the analysis to validate the proposed approach against the original one. 相似文献

2.

多计算机的自动插桩与监测系统

苏铭宋宗宇王华《计算机工程与应用》2002,38(4):79-82

在设计大规模的并行应用程序时,如何使多处理器的利用率达到最优,这对程序设计人员来讲是一个很大的挑战。一般说来,由于应用程序在运行时性能上的缺陷,计算资源得不到充分利用。因此,迫切需要对应用程序进行“性能调试”,即在正确性的基础上,通过揭示这些缺陷,对程序进行细调而提高程序性能。在这篇文章里,介绍了一个软件工具包—自动插桩和监测系统(theAutomatedInstrumentationandMonitoringSystem),它集程序插桩、运行监测和性能分析为一体,支持在多处理器上对并行应用程序进行性能评估。文章首先论述了一些建立性能调试工具的基本问题;然后,详细描述AIMS系统的体系结构以及在利用AIMS工具包进行性能调试工具的开发中的经验;最后,使用两个例子详细地描述使用AIMS系统进行性能调试的过程。相似文献

3.

Empirical Optimization for a Sparse Linear Solver: A Case Study

Yoon-Ju Lee Pedro C. Diniz Mary W. Hall Robert Lucas 《International journal of parallel programming》2005,33(2-3):165-181

This paper describes initial experiences with semi-automated performance tuning of a sparse linear solver in LS-DYNA, a large, widely used engineering application. Through a collection of tools supporting empirical optimization, we alleviate the burden of performance tuning for mapping today’s sophisticated engineering software to increasingly complex hardware platforms. We describe a tool that automatically isolates code segments to create benchmark subsets for the purposes of performance tuning. We present a collection of automatically generated empirical results that demonstrate the sensitivity of the application’s performance to optimization parameters. Through this case study, we demonstrate the importance of developing automatic performance tuning support for performance-sensitive applications. 相似文献

4.

Model‐based MPI‐IO tuning with Periscope tuning framework

Weifeng Liu Michael Gerndt Bin Gong 《Concurrency and Computation》2016,28(1):3-20

For many parallel applications, I/O performance is a major bottleneck. MPI‐IO, defined by the MPI forum, can help parallel applications overcome the performance and portability limitations of existing parallel I/O interfaces. Although autotuning has been used to improve the performance of computing kernels, MPI‐IO autotuning has rarely been studied. To automate MPI‐IO performance tuning, we designed and implemented an automatic tuner. The tuner relies on the Periscope tuning framework for transparently passing hints to the MPI‐IO library and for automatically collecting performance data. Unlike computational code, each MPI‐IO function takes a relatively long time to complete. Thus, exhaustively searching through the entire parameter space is impractical. So we developed a performance model that can direct us to shorten the tuning time. Copyright © 2015 John Wiley & Sons, Ltd. 相似文献

5.

GMATE: Dynamic Tuning of Parallel Applications in Grid Environment

Genaro Costa Anna Sikora Josep Jorba Tomàs Margalef 《Journal of Grid Computing》2014,12(2):371-398

Performance is a main issue in parallel application development. Dynamic tuning is a technique that changes certain applications’ parameters on-line to improve their performance adapting the execution to actual conditions. To perform that, it is necessary to collect measurements, analyze application behavior and carry out tuning actions during the application execution. Computational Grids present proclivity for dynamic changes in the environment during the application execution. Therefore, dynamic tuning tools are necessary to reach the expected performance indexes of applications on those environments. This paper addresses the dynamic tuning of parallel/distributed applications on Computational Grids. We analyze Grid environments to determine their characteristics and we present the development of dynamic tuning tool GMATE enabled for such environments. The performance analysis is based on performance models that indicate how to improve the application execution. A particular problem which provokes performance bottlenecks is the load imbalance in Master/Worker applications. A heuristic to dynamically tune granularity of work and number of workers is proposed. Finally, we describe the experimental validation of the performance model and its applicability on a set of real parallel applications. 相似文献

6.

Development and performance analysis of real‐world applications for distributed and parallel architectures

T. Fahringer P. Blaha A. Hssinger J. Luitz E. Mehofer H. Moritsch B. Scholz 《Concurrency and Computation》2001,13(10):841-868

Several large real‐world applications have been developed for distributed and parallel architectures. We examine two different program development approaches. First, the usage of a high‐level programming paradigm which reduces the time to create a parallel program dramatically but sometimes at the cost of a reduced performance; a source‐to‐source compiler, has been employed to automatically compile programs—written in a high‐level programming paradigm—into message passing codes. Second, a manual program development by using a low‐level programming paradigm—such as message passing—enables the programmer to fully exploit a given architecture at the cost of a time‐consuming and error‐prone effort. Performance tools play a central role in supporting the performance‐oriented development of applications for distributed and parallel architectures. SCALA—a portable instrumentation, measurement, and post‐execution performance analysis system for distributed and parallel programs—has been used to analyze and to guide the application development, by selectively instrumenting and measuring the code versions, by comparing performance information of several program executions, by computing a variety of important performance metrics, by detecting performance bottlenecks, and by relating performance information back to the input program. We show several experiments of SCALA when applied to real‐world applications. These experiments are conducted for a NEC Cenju‐4 distributed‐memory machine and a cluster of heterogeneous workstations and networks. Copyright © 2001 John Wiley & Sons, Ltd. 相似文献

7.

Performance measurement,visualization and modeling of parallel and distributed programs using the AIMS toolkit

Jerry Yan Sekhar Sarukkai Pankaj Mehra 《Software》1995,25(4):429-461

Writing large-scale parallel and distributed scientific applications that make optimum use of the multiprocessor is a challenging problem. Typically, computational resources are underused due to performance failures in the application being executed. Performance-tuning tools are essential for exposing these performance failures and for suggesting ways to improve program performance. In this paper, we first address fundamental issues in building useful performance-tuning tools and then describe our experience with the AIMS toolkit for tuning parallel and distributed programs on a variety of platforms. AIMS supports source-code instrumentation, run-time monitoring, graphical execution profiles, performance indices and automated modeling techniques as ways to expose performance problems of programs. Using several examples representing a broad range of scientific applications, we illustrate AIMS' effectiveness in exposing performance problems in parallel and distributed programs. 相似文献

8.

Automatic performance debugging of SPMD-style parallel programs

Xu LiuAuthor Vitae Jianfeng ZhanAuthor Vitae Kunlin ZhanAuthor Vitae Dan Meng^{Author Vitae} 《Journal of Parallel and Distributed Computing》2011,71(7):925-937

Automatic performance debugging of parallel applications includes two main steps: locating performance bottlenecks and uncovering their root causes for performance optimization. Previous work fails to resolve this challenging issue in two ways: first, several previous efforts automate locating bottlenecks, but present results in a confined way that only identifies performance problems with a priori knowledge; second, several tools take exploratory or confirmatory data analysis to automatically discover relevant performance data relationships, but these efforts do not focus on locating performance bottlenecks or uncovering their root causes.The simple program and multiple data (SPMD) programming model is widely used for both high performance computing and Cloud computing. In this paper, we design and implement an innovative system, AutoAnalyzer, that automates the process of debugging performance problems of SPMD-style parallel programs, including data collection, performance behavior analysis, locating bottlenecks, and uncovering their root causes. AutoAnalyzer is unique in terms of two features: first, without any prior knowledge, it automatically locates bottlenecks and uncovers their root causes for performance optimization; second, it is lightweight in terms of the size of performance data to be collected and analyzed. Our contributions are three-fold: first, we propose two effective clustering algorithms to investigate the existence of performance bottlenecks that cause process behavior dissimilarity or code region behavior disparity, respectively; meanwhile, we present two searching algorithms to locate bottlenecks; second, on the basis of the rough set theory, we propose an innovative approach to automatically uncover root causes of bottlenecks; third, on the cluster systems with two different configurations, we use two production applications, written in Fortran 77, and one open source code—MPIBZIP2 (http://compression.ca/mpibzip2/), written in C++, to verify the effectiveness and correctness of our methods. For three applications, we also propose an experimental approach to investigating the effects of different metrics on locating bottlenecks. 相似文献

9.

Faust: an integrated environment for parallel programming

Guarna V.A. Jr. Gannon D. Jablonowski D. Malony A.D. Gaur Y. 《Software, IEEE》1989,6(4):20-27

相似文献

10.

Distributed performance monitoring: methods, tools, andapplications

Hofmann R. Klar R. Mohr B. Quick A. Siegle M. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(6):585-598

相似文献

11.

Modeling the performance of parallel applications using model selection techniques

D. R. Martínez V. Blanco J. C. Cabaleiro T. F. Pena F. F. Rivera 《Concurrency and Computation》2014,26(2):586-599

Nowadays, parallel architectures are changing so fast that there is a need for scalable and efficient tools to analyze and predict the performance of parallel applications. Analytical models are proved to be a useful approximation for characterizing parallel algorithms, but developing accurate analytical models is a hard issue, and, in general, they provide coarse performance predictions due to their intrinsic lack of accuracy. In this paper, we describe in detail the Tools for Instrumentation and Analysis (TIA) framework, an easy‐to‐use tool that automatically obtains accurate performance models by means of analytical expressions. This framework automatizes most of its internal tasks, reducing opportunities for human error, and it only requires the user to focus on the metrics and execution parameters that might influence the performance, those that should be considered in the modeling process. Its main advantage over other tools is that TIA uses model selection techniques that allow the automation of the modeling process. As a case of study, the use of TIA to obtain analytical models of different implementations of the broadcast collective communication in a cluster of multicores is shown. The results obtained by TIA are evaluated and compared with theoretical approaches based on the LogGP model. Copyright © 2013 John Wiley & Sons, Ltd. 相似文献

12.

Care HPS: A high performance simulation tool for parallel and distributed agent-based modeling

《Future Generation Computer Systems》2017

Parallel and distributed simulation is a powerful tool for developing complex agent-based simulation. Complex simulations require parallel and distributed high performance computing solutions. It is necessary because their sequential solutions are not able to give answers in a feasible total execution time. Therefore, for the advance of computing science, it is important that High Performance Computing (HPC) techniques and solutions be proposed and studied. In literature, we can find some agent-based modeling and simulation tools that use HPC. However, none of these tools are designed to enable the HPC expert to be able to propose new techniques and solutions without great effort. In this paper, we introduce Care High Performance Simulation (HPS), which is a scientific instrument that enables researchers to: (1) develop techniques and solutions of high performance distributed simulations for agent-based models; and, (2) study, design and implement complex agent-based models that require HPC solutions. Care HPS was designed to easily and quickly develop new agent-based models. It was also designed to extend and implement new solutions for the main issues of parallel and distributed solutions such as: synchronization, communication, load and computing balancing, and partitioning algorithms. We conducted some experiments with the aim of showing the completeness and functionality of Care HPS. As a result, we show that Care HPS can be used as a scientific instrument for the advance of the agent-based parallel and distributed simulations field. 相似文献

13.

Fault tolerance for data parallel programs

C. Bertolli M. Vanneschi 《Concurrency and Computation》2011,23(6):595-632

The main issues when supporting fault tolerance based on checkpointing and rollback recovery for High‐Performance applications are related to the scalability of the introduced support, the possibility of analyzing the induced overhead and, in more general terms, the optimization of the trade‐off between failure‐free and recovery performances. In this paper we describe our contribution in fault tolerance for high‐level structured parallelism models. We take a different viewpoint w.r.t. existing contributions, by introducing a methodology to derive interesting properties to support fault tolerance. We show how to apply this methodology to a general data parallel model, deriving useful properties to introduce a class of checkpointing protocols. Thanks to this methodology, this class of protocols is not affected by the described issues. We exemplify two checkpointing protocols and the related rollback recovery techniques. For each protocol we also derive cost models statically describing the failure‐free performance, which can be used for performance tuning or to target some Quality of Service parameter. To assess the innovation of the results we analytically and experimentally compare the introduced protocols with two literature protocols. Results show that while the protocols introduced in this paper permit the definition of cost models and have a good scalability, the literature protocols do not always have these properties. Copyright © 2010 John Wiley & Sons, Ltd. 相似文献

14.

POEMS: end-to-end performance design of large parallel adaptive computational systems

Adve V.S. Bagrodia R. Browne J.C. Deelman E. Dube A. Houstis E.N. Rice J.R. Sakellariou R. Sundaram-Stukel D.J. Teller P.J. Vernon M.K. 《IEEE transactions on pattern analysis and machine intelligence》2000,26(11):1027-1048

The POEMS project is creating an environment for end-to-end performance modeling of complex parallel and distributed systems, spanning the domains of application software, runtime and operating system software, and hardware architecture. Toward this end, the POEMS framework supports composition of component models from these different domains into an end-to-end system model. This composition can be specified using a generalized graph model of a parallel system, together with interface specifications that carry information about component behaviors and evaluation methods. The POEMS Specification Language compiler will generate an end-to-end system model automatically from such a specification. The components of the target system may be modeled using different modeling paradigms and at various levels of detail. Therefore, evaluation of a POEMS end-to-end system model may require a variety of evaluation tools including specialized equation solvers, queuing network solvers, and discrete event simulators. A single application representation based on static and dynamic task graphs serves as a common workload representation for all these modeling approaches. Sophisticated parallelizing compiler techniques allow this representation to be generated automatically for a given parallel program. POEMS includes a library of predefined analytical and simulation component models of the different domains and a knowledge base that describes performance properties of widely used algorithms. The paper provides an overview of the POEMS methodology and illustrates several of its key components. The modeling capabilities are demonstrated by predicting the performance of alternative configurations of Sweep3D, a benchmark for evaluating wavefront application technologies and high-performance, parallel architectures. 相似文献

15.

一种基于深度学习的性能分析框架设计与实现

冯赟龙刘勇何王全《计算机工程与科学》2018,40(6):984-991

高性能计算系统的体系结构日益复杂和现有性能分析工具的智能程度不足,导致高性能计算应用的程序性能分析和优化的成本代价日益高昂。所幸,人工智能领域目前取得了重要进展,其中深度学习技术发挥了重要作用,它给性能分析工具的智能化带来了契机。提出一种基于深度学习的程序性能智能分析框架,其核心思想是将程序的性能分析问题抽象成可用机器学习技术描述的分类问题,使用处理器支持的PMU采集分类所需的性能数据并标准化,使用簇评估技术结合簇的实际含义确定性能问题类别,通过稀疏编码自动学习性能数据特征并构建性能问题分类模型。在神威太湖之光超级计算机上实现了程序性能分析框架原型。实验结果表明,该性能分析方法能够直观地指导程序员快速把握当前应用最为突出的性能瓶颈问题,提高应用优化的效率,降低用户调优代码的成本。相似文献

16.

Performance analysis of a distributed processing system — a case study

JS Saini EJ Zaluska 《Microprocessors and Microsystems》1985,9(4):184-190

A cost-effective computer-aided methodology is established for the performance analysis of distributed processing systems. Software tools (based on probabilistic and simulation models) to support this methodology are briefly described. The methodology and software tools are illustrated by applying them to the analysis of a local area network. An instrumentation system is used to monitor a prototype network and to assess the accuracy of the models. The models are shown to be good approximations. Although the case study considers the analysis of a file-server system, the software tools are quite general and may be used for other types of distributed processing system. 相似文献

17.

Modelling test data for performance evaluation of large parallel database machines

Chris Bates Innes Jelly Jon Kerridge 《Distributed and Parallel Databases》1996,4(1):5-23

Parallel servers offer improved processing power for relational database systems and provide system scalability. In order to support the users of these systems, new ways of assessing the performance of such machines are required. If these assessments are to show how the machines perform under commercial workloads they need to be based upon models which have a real commercial basis. This paper shows how a realistic model of a financial application has been developed and how a set of tools has been created which allow the implementation of the model on any commercial database system. The tools allow the generation of large quantities of test data in a manner which renders it amenable to subsequent independent analysis. The test data thus generated forms the basis for the performance tuning of parallel database machines.Recommended by: Patrick Valduriez 相似文献

18.

Reliable self‐deployment of distributed cloud applications

下载免费PDF全文

Xavier Etchevers Gwen Salaün Fabienne Boyer Thierry Coupaye Noel De Palma 《Software》2017,47(1):3-20

Cloud applications consist of a set of interconnected software elements distributed over several virtual machines, themselves hosted on remote physical servers. Most existing solutions for deploying such applications require human intervention to configure parts of the system, do not conform to functional dependencies among elements that must be respected when starting them, and do not handle virtual machine failures that can occur when deploying an application. This paper presents a self‐deployment protocol that was designed to automatically configure a set of software elements to be deployed on different virtual machines. This protocol works in a decentralized way, that is, there is no need for a centralized server. It also starts the software elements in a certain order, respecting important architectural invariants. This protocol supports virtual machine and network failures and always succeeds in deploying an application when faced with a finite number of failures. Designing such highly parallel management protocols is difficult; therefore, formal modeling techniques and verification tools were used for validation purposes. The protocol was implemented in Java and was used to deploy industrial applications. Copyright © 2016 John Wiley & Sons, Ltd. 相似文献

19.

Visualizing a hierarchy of performance models for software systems

Reda A. Ammar Carolyn Pe Rosiene 《Software》1993,23(3):293-315

相似文献

20.

Distributed volunteer computing for solving ensemble learning problems

《Future Generation Computer Systems》2016

The volunteer computing paradigm, along with the tailored use of peer-to-peer communication, has recently proven capable of solving a wide area of data-intensive problems in a distributed scenario. The Mining@Home framework is based on these paradigms and it has been implemented to run a wide range of distributed data mining applications. The efficiency and scalability of the architecture can be fully exploited when the overall task can be partitioned into distinct jobs that may be executed in parallel, and input data can be reused, which naturally leads to the use of data cachers. This paper explores the opportunities offered by Mining@Home for coping with the discovery of classifiers through the use of the bagging approach: multiple learners are used to compute models from the same input data, so as to extract a final model with high statistical accuracy. Analysis focuses on the evaluation of experiments performed in a real distributed environment, enriched with simulation assessment–to evaluate very large environments–and with an analytical investigation based on the iso-efficiency methodology. An extensive set of experiments allowed to analyze a number of heterogeneous scenarios, with different problem sizes, which helps to improve the performance by appropriately tuning the number of workers and the number of interconnected domains. 相似文献