Similar Documents (20 results)
1.
When designing large-scale parallel applications, making optimal use of multiple processors is a major challenge for programmers. In general, computational resources go underutilized because of performance defects that surface only when the application runs. There is therefore a pressing need for "performance debugging": starting from a correct program, exposing these defects and fine-tuning the code to improve its performance. This article introduces a software toolkit, the Automated Instrumentation and Monitoring System (AIMS), which integrates program instrumentation, run-time monitoring, and performance analysis to support the performance evaluation of parallel applications on multiprocessors. The article first discusses some fundamental issues in building performance-debugging tools; it then describes the AIMS architecture in detail, together with experience gained using the AIMS toolkit to develop performance-debugging tools; finally, two examples walk through the performance-debugging process with AIMS.
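AIMS inserts its probes automatically; as general background (not AIMS's actual mechanism or API), the standard MPI profiling interface (PMPI) is one common way such tools interpose on communication calls. A minimal sketch in C:

```c
/* Minimal sketch of MPI call interposition via the standard PMPI
   profiling interface -- background illustration only, not AIMS code. */
#include <mpi.h>
#include <stdio.h>

int MPI_Send(const void *buf, int count, MPI_Datatype type,
             int dest, int tag, MPI_Comm comm)
{
    double t0 = MPI_Wtime();               /* timestamp before the real call */
    int rc = PMPI_Send(buf, count, type, dest, tag, comm);
    double t1 = MPI_Wtime();

    int rank;
    PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
    /* Emit one trace record; a real monitor buffers and flushes these. */
    fprintf(stderr, "rank %d: MPI_Send -> %d, %d elements, %.6f s\n",
            rank, dest, count, t1 - t0);
    return rc;
}
```

Linking a wrapper like this ahead of the MPI library makes every MPI_Send in the application produce a timed trace record without any change to the application source.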

2.
Large-scale scientific and engineering computation problems are usually complex, and consequently the development of parallel programs for solving them is a difficult task. In this paper, we describe the graph-oriented programming (GOP) model and environment for building and evaluating parallel applications. The GOP model provides higher-level abstractions for message-passing parallel programming, and the software environment offers tools that ease the programmer's work of parallelizing, writing, and deploying scientific and engineering computing applications. We discuss the motivations and various issues in developing the model and the software environment, present the design of the system architecture and its components, and describe the evaluation of the environment, implemented on top of MPI, with a sample parallel scientific application program. With the support of the high-level abstractions provided by the proposed GOP environment, programming of parallel applications on various parallel architectures can be greatly simplified.

3.
This paper gives an overview of two related tools that we have developed to provide more accurate measurement and modelling of the performance of message-passing communication and application programs on distributed memory parallel computers. MPIBench uses a very precise, globally synchronised clock to measure the performance of MPI communication routines. It can generate probability distributions of communication times, not just the average values produced by other MPI benchmarks. This yields useful insights into the MPI communication performance of parallel computers, in particular how performance is affected by network contention. The Performance Evaluating Virtual Parallel Machine (PEVPM) provides a simple, fast and accurate technique for modelling and predicting the performance of message-passing parallel programs. It uses a virtual parallel machine to simulate the execution of the parallel program. The effects of network contention can be accurately modelled by sampling from the probability distributions generated by MPIBench. These tools are particularly useful on clusters with commodity Ethernet networks, where relatively high latencies, network congestion and TCP problems can significantly affect communication performance, which is difficult to model accurately using other tools. Experiments with example parallel programs demonstrate that PEVPM gives accurate performance predictions on commodity clusters. We also show that modelling communication performance using average times rather than sampling from probability distributions can give misleading results, particularly for programs running on a large number of processors.
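MPIBench itself relies on a precise, globally synchronised clock; the sketch below (our illustration, not MPIBench code) shows only the core idea of keeping every individual sample so a probability distribution can be built, using a plain two-rank ping-pong:

```c
/* Sketch: collect a distribution of point-to-point times instead of a
   single average (assumes exactly 2 MPI ranks; illustrative only). */
#include <mpi.h>
#include <stdio.h>

#define NITER 1000

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char buf[1024] = {0};
    static double t[NITER];
    MPI_Barrier(MPI_COMM_WORLD);          /* rough common starting point */

    for (int i = 0; i < NITER; i++) {
        double t0 = MPI_Wtime();
        if (rank == 0) {
            MPI_Send(buf, (int)sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, (int)sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, (int)sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, (int)sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
        t[i] = MPI_Wtime() - t0;          /* keep every sample, not the mean */
    }

    if (rank == 0)                        /* dump raw samples for a histogram */
        for (int i = 0; i < NITER; i++)
            printf("%.9f\n", t[i]);

    MPI_Finalize();
    return 0;
}
```

A histogram of these raw round-trip samples exposes the contention-induced tail that an average hides.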

4.
Throughout much of the parallel processing community there is the sense that writing software for distributed-memory parallel processors is subject to a 'no pain, no gain' rule: that in order to reap the benefits of parallel computation one must first suffer the pain of converting the application to run on a parallel machine. We believe this is the result of inadequate programming tools and not a problem inherent to parallel processing. We will show that one can parallelize real scientific applications and obtain good performance with little effort if the right tools are used. Our vehicle for this demonstration is a 6000-line DNA and protein sequence comparison application that we have implemented in Mentat, an object-oriented parallel processing system for both parallel and distributed architectures. We briefly describe the application and present performance information for both the Mentat version and a hand-coded parallel version of the application.

5.
Several large real-world applications have been developed for distributed and parallel architectures. We examine two different program development approaches. The first is the use of a high-level programming paradigm, which reduces the time to create a parallel program dramatically, but sometimes at the cost of reduced performance; here, a source-to-source compiler has been employed to automatically compile programs written in a high-level programming paradigm into message-passing codes. The second is manual program development using a low-level programming paradigm such as message passing, which enables the programmer to fully exploit a given architecture at the cost of a time-consuming and error-prone effort. Performance tools play a central role in supporting the performance-oriented development of applications for distributed and parallel architectures. SCALA, a portable instrumentation, measurement, and post-execution performance analysis system for distributed and parallel programs, has been used to analyze and guide the application development: by selectively instrumenting and measuring the code versions, comparing performance information across several program executions, computing a variety of important performance metrics, detecting performance bottlenecks, and relating performance information back to the input program. We show several experiments with SCALA applied to real-world applications, conducted on a NEC Cenju-4 distributed-memory machine and on a cluster of heterogeneous workstations and networks.

6.

Modern scientific research challenges require new technologies, integrated tools, and reusable, complex experiments on distributed computing infrastructures, but above all they require computing power for efficient data processing and analysis. Container technologies have emerged as a new paradigm for addressing such compute-intensive scientific application problems: they deploy easily in a reasonable amount of time and require few computational resources, which makes them well suited to the task. Containers are considered lightweight virtualization solutions. They enable performance isolation and flexible deployment of complex, parallel, and high-performance systems, and they have gained popularity for modernizing and migrating scientific applications in computing-infrastructure management; they also reduce processing time. In this paper, we first give an overview of virtualization and containerization technologies. We discuss the taxonomies of containerization technologies in the literature, and then provide a new one that covers and completes those proposed so far. We identify the most important application domains of containerization and their technological progress. Furthermore, we discuss the performance metrics used in most containerization techniques. Finally, we point out research gaps in aspects of containerization technology that require more research.


7.
Parallel Computing, 1997, 22(13): 1747-1770
To provide high-level graphical support for PVM (Parallel Virtual Machine) based program development, a complex programming environment (GRADE) is being developed. GRADE currently provides tools to construct, execute, debug, monitor and visualize message-passing parallel programs. It offers a high-level graphical programming abstraction mechanism to construct parallel applications by introducing a new graphical language called GRAPNEL. GRADE also provides the programmer with the same graphical user interface during the program design and debugging stages. A distributed debugging engine (DDBG) assists the user in debugging GRAPNEL programs on distributed memory computer architectures. Tape/PVM and PROVE support the performance monitoring and visualization of parallel programs developed in the GRADE environment.

8.
Parallel and distributed simulation is a powerful tool for developing complex agent-based simulations. Complex simulations require parallel and distributed high-performance computing solutions, because sequential solutions cannot deliver answers within a feasible total execution time. For the advancement of computing science, it is therefore important that High Performance Computing (HPC) techniques and solutions be proposed and studied. In the literature we can find agent-based modeling and simulation tools that use HPC, but none of them are designed so that an HPC expert can propose new techniques and solutions without great effort. In this paper, we introduce Care High Performance Simulation (HPS), a scientific instrument that enables researchers to: (1) develop techniques and solutions for high-performance distributed simulations of agent-based models; and (2) study, design, and implement complex agent-based models that require HPC solutions. Care HPS was designed so that new agent-based models can be developed easily and quickly, and so that new solutions can be implemented for the main issues of parallel and distributed simulation: synchronization, communication, load and computing balance, and partitioning algorithms. We conducted experiments aimed at showing the completeness and functionality of Care HPS. The results show that Care HPS can serve as a scientific instrument for advancing the field of parallel and distributed agent-based simulation.

9.
Distributed data mining on grids: services, tools, and applications
Data mining algorithms are widely used today for the analysis of large corporate and scientific datasets stored in databases and data archives. Industry, science, and commerce often need to analyze very large datasets maintained over geographically distributed sites by using the computational power of distributed and parallel systems. The grid can play a significant role in providing effective computational support for distributed knowledge discovery applications. For the development of data mining applications on grids we designed a system called the Knowledge Grid. This paper describes the Knowledge Grid framework and presents the toolset it provides for implementing distributed knowledge discovery. The paper discusses how to design and implement data mining applications using the Knowledge Grid tools, starting from searching grid resources, composing software and data components, and executing the resulting data mining process on a grid. Some performance results are also discussed.

10.
Workflows are used to orchestrate data-intensive applications in many different scientific domains. Workflow applications typically communicate data between processing steps using intermediate files. When tasks are distributed, these files are either transferred from one computational node to another, or accessed through a shared storage system. As a result, the efficient management of data is a key factor in achieving good performance for workflow applications in distributed environments. In this paper we investigate some of the ways in which data can be managed for workflows in the cloud. We ran experiments using three typical workflow applications on Amazon's EC2 cloud computing platform. We discuss the various storage and file systems we used, describe the issues and problems we encountered deploying them on EC2, and analyze the resulting performance and cost of the workflows.

11.
High-speed, wide-area networks have made it both possible and desirable to interconnect geographically distributed applications that control distributed collections of scientific data, remote scientific instruments and high-performance computer systems. Historically, performance analysis has focused on monolithic applications executing on large, stand-alone, parallel systems. In such a domain, measurement, postmortem analysis and code optimization suffice to eliminate performance bottlenecks and optimize applications. Distributed visualization, data mining and analysis tools allow scientists to collaboratively analyze and understand complex phenomena. Likewise, real-time performance measurement and immersive performance display systems (i.e. systems providing large stereoscopic displays of complex data) enable collaborating groups to interact with executing software, tuning its behavior to meet research and performance goals. To satisfy these demands, the authors designed Virtue, a prototype system that integrates collaborative, immersive performance visualization with real-time performance measurement and adaptive control of applications on computational grids. These tools enable physically distributed users to explore and steer the behavior of complex software in real time and to analyze and optimize distributed application dynamics.

12.
This paper presents a parallel file object environment to support distributed array storage in shared-nothing distributed computing environments. Our environment enables programmers to extend the concept of array distributions from the memory level to the file level, allowing parallel I/O that follows the distribution of objects in an application. When objects are read and/or written by multiple applications using different distributions, we present a novel scheme that helps programmers select the data distribution pattern that minimizes remote data movement when storing array objects on distributed file systems. Our selection scheme is, to the best of our knowledge, the first work that attempts to optimize distribution patterns in secondary storage for HPF-like programs across applications. This is especially important for the class of problems called multi-disciplinary optimization (MDO) problems. Our test bed is an 8-node DEC Farm connected with an Ethernet, FDDI, or ATM switch. Our experimental results with scientific applications show not only that our parallel file system provides aggregate bandwidth, but also that our selection scheme effectively reduces communication traffic in the system.
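To make the selection criterion concrete, here is a small illustration of ours (not the paper's actual algorithm): counting how many array elements would cross nodes if an array written with a BLOCK distribution over p nodes were read back CYCLIC. A selection scheme of this kind would pick the layout minimizing such counts.

```c
/* Count elements whose owner differs between a BLOCK and a CYCLIC
   distribution of an n-element array over p nodes -- a toy version of
   the remote-data-movement cost the selection scheme minimizes. */
long remote_moves_block_to_cyclic(long n, int p)
{
    long moves = 0;
    long block = (n + p - 1) / p;             /* ceiling block size */
    for (long i = 0; i < n; i++) {
        int owner_block  = (int)(i / block);  /* owner under BLOCK  */
        int owner_cyclic = (int)(i % p);      /* owner under CYCLIC */
        if (owner_block != owner_cyclic)
            moves++;                          /* element crosses nodes */
    }
    return moves;
}
```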

13.
Parallelism has become a way of life for many scientific programmers. A significant challenge in bringing the power of parallel machines to these programmers is providing them with a suite of software tools similar to the tools that sequential programmers currently utilize. Unfortunately, writing correct parallel programs remains a challenging task. In particular, automatic or semi-automatic testing tools for parallel programs are lacking. This paper takes a first step in developing an approach to providing all-uses coverage for parallel programs. A testing framework and theoretical foundations for structural testing are presented, including test data adequacy criteria and their hierarchy, formulation and illustration of all-uses testing problems, classification of all-uses test cases for parallel programs, and both theoretical and empirical results regarding what can be achieved with all-uses coverage for parallel programs.
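As general background on the criterion (not the paper's framework): all-uses coverage requires exercising at least one definition-clear path from each definition of a variable to each of its uses; in parallel programs these def-use pairs can also span processes through messages or shared data. A minimal sequential illustration in C:

```c
/* Illustration of def-use pairs. All-uses coverage demands a test
   exercising the path from the definition of x to each of its uses. */
int f(int a)
{
    int x = a * 2;        /* definition of x */
    if (a > 0)
        return x + 1;     /* use 1 of x: needs a test with a > 0  */
    else
        return x - 1;     /* use 2 of x: needs a test with a <= 0 */
}
```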

14.
As the size of large-scale computer systems increases, their mean time between failures is becoming significantly shorter than the execution time of many current scientific applications, so applications must tolerate hardware failures to run to completion. Conventional rollback-recovery protocols redo the computation of the crashed process since the last checkpoint on a single processor; as a result, their recovery time is no less than the time between the last checkpoint and the crash. In this paper, we propose a new application-level fault-tolerant approach for parallel applications called the Fault-Tolerant Parallel Algorithm (FTPA), which provides fast self-recovery: when fail-stop failures occur and are detected, all surviving processes recompute the workload of the failed processes in parallel. FTPA, however, requires the user to be involved in fault tolerance. To ease FTPA implementation, we developed Get it Fault-Tolerant (GiFT), a source-to-source precompiler tool that automates it. We evaluate the performance of FTPA with parallel matrix multiplication and five kernels of the NAS Parallel Benchmarks on a cluster system with 1,024 CPUs. The experimental results show that FTPA outperforms the traditional checkpointing approach.
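The core recovery idea, sketched in C under our own assumptions (hypothetical helper names, not the paper's GiFT-generated code): the surviving ranks block-partition the failed rank's iteration range and redo it concurrently.

```c
/* Sketch of FTPA-style parallel recomputation (illustrative only;
   `redo_iteration` and the index arguments are hypothetical). After a
   fail-stop failure, each of the `nsurv` surviving ranks takes one
   block of the failed rank's iteration range [lo, hi). */
void recover_failed_range(long lo, long hi, int my_surv_idx, int nsurv,
                          void (*redo_iteration)(long))
{
    long n = hi - lo;                      /* failed rank's workload */
    long chunk = (n + nsurv - 1) / nsurv;  /* ceiling block size     */
    long begin = lo + (long)my_surv_idx * chunk;
    long end   = begin + chunk < hi ? begin + chunk : hi;

    for (long i = begin; i < end; i++)
        redo_iteration(i);                 /* recompute one lost iteration */
}
```

With the survivors sharing the lost work, recomputation takes roughly 1/(P-1) of the time a single processor would need, which is where the fast self-recovery comes from.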

15.
Message passing interface (MPI) is the de facto standard for writing parallel scientific applications on distributed-memory systems. Performance prediction of MPI programs on current or future parallel systems can help to find system bottlenecks or to optimize programs. To effectively analyze and predict the performance of a large and complex MPI program, an efficient and accurate communication model is highly needed. A series of communication models have been proposed, such as the LogP model family, which assume th...
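For reference, the basic cost terms of the standard LogP model (general background, not this paper's refinement): with network latency L, per-message CPU overhead o, per-message gap g, and P processors, a single small message and a pipelined sequence of k messages cost roughly:

```latex
% Standard LogP point-to-point costs (background, not the paper's model)
T_{1\,\mathrm{msg}}  = o_{\mathrm{send}} + L + o_{\mathrm{recv}} \approx L + 2o
T_{k\,\mathrm{msgs}} = L + 2o + (k - 1)\,\max(g, o)
```

The gap term dominates for long message sequences, which is why models in this family separate CPU overhead from network injection rate.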

16.
Automatic performance debugging of parallel applications includes two main steps: locating performance bottlenecks and uncovering their root causes for performance optimization. Previous work falls short of resolving this challenging issue in two ways: first, several previous efforts automate locating bottlenecks but present results in a confined way that only identifies performance problems with a priori knowledge; second, several tools apply exploratory or confirmatory data analysis to automatically discover relevant performance data relationships, but do not focus on locating performance bottlenecks or uncovering their root causes. The single program, multiple data (SPMD) programming model is widely used for both high performance computing and Cloud computing. In this paper, we design and implement an innovative system, AutoAnalyzer, that automates the process of debugging performance problems of SPMD-style parallel programs, including data collection, performance behavior analysis, locating bottlenecks, and uncovering their root causes. AutoAnalyzer is unique in two respects: first, without any prior knowledge, it automatically locates bottlenecks and uncovers their root causes for performance optimization; second, it is lightweight in terms of the size of performance data to be collected and analyzed. Our contributions are three-fold: first, we propose two effective clustering algorithms to investigate the existence of performance bottlenecks that cause process behavior dissimilarity or code region behavior disparity, respectively, along with two searching algorithms to locate bottlenecks; second, on the basis of rough set theory, we propose an innovative approach to automatically uncovering the root causes of bottlenecks; third, on cluster systems with two different configurations, we use two production applications written in Fortran 77 and one open-source code, MPIBZIP2 (http://compression.ca/mpibzip2/), written in C++, to verify the effectiveness and correctness of our methods. For the three applications, we also propose an experimental approach to investigating the effects of different metrics on locating bottlenecks.

17.
Novel approaches are presented for designing performance measurement systems for parallel and distributed programs. The first approach involves unifying performance information into a single, regular structure that reflects the structure of the programs under measurement. The authors define a hierarchical model for the execution of parallel and distributed programs as a framework for performance measurement; performance information can then be presented at different levels of detail in the hierarchy. The second approach is based on the development of automatic guidance techniques that can direct users to the location of performance problems in the program. Guidance information from such techniques supplies facts about problems in the program and provides possible answers for further improving program efficiency. A performance measurement system, called IPS, has been developed as a prototype of the authors' model and design. Some of the test results from IPS are also discussed.

18.
Parallel/distributed systems are continuously growing, which allows applications to scale, either by tackling bigger problems in the same period of time or by solving the same problem in a shorter time. Consequently, the methodologies, approaches, and tools of the parallel paradigm must be kept up to date to support the increasing requirements of applications and users. MATE (Monitoring, Analysis and Tuning Environment) provides automatic and dynamic tuning for parallel/distributed applications. Tuning decisions are made according to performance models, which provide a fast means of deciding what to improve in the execution. However, MATE develops bottlenecks of its own as the application grows, because its analysis process is fully centralized. In this work, we propose a new approach to making MATE scalable, and we present experimental results and analysis validating the proposed approach against the original one.

19.
This paper describes problems, challenges, and opportunities for intelligent simulation of physical systems. Prototype intelligent simulation tools have been constructed for interpreting massive data sets from physical fields and for designing engineering systems. We identify the characteristics of intelligent simulation and describe several concrete application examples. These applications, which include weather data interpretation, distributed control optimization, and spatio-temporal diffusion-reaction pattern analysis, demonstrate that intelligent simulation tools are indispensable for the rapid prototyping of application programs in many challenging scientific and engineering domains.

20.
Reconfigurable MPSoCs (Multiprocessor Systems-on-Chip) can be viable for certain niche applications where the flexibility of FPGAs (Field-Programmable Gate Arrays) and software is needed and a small number of units rules out other silicon options. However, their design complexity is very high and raises additional problems: the definition of a suitable programming model, an efficient memory organization, and the need for ways to optimize application performance. In this paper, we propose a complete development process that addresses these problems by complementing the current SoC (System-on-Chip) development process with additional steps to support parallel programming and software optimization. This work systematically explains the problems encountered, and their solutions, in realizing an FPGA-based MPSoC following our flow, and offers tools and techniques for developing parallel applications for such systems.
