1.

An on‐line performance visualization technology

Aleksandar M. Bakić Matt W. Mutka Diane T. Rover 《Software》2003,33(15):1447-1469

A new software technology for on‐line performance analysis and the visualization of complex parallel and distributed systems is presented. Often heterogeneous, these systems need capabilities for the flexible integration and configuration of performance analysis and visualization. Our technology is based on an object‐oriented framework for the rapid prototyping and development of distributable visual objects. The visual objects consist of two levels, a platform/device‐specific low level and an analysis‐ and visualization‐specific high level. We have developed a very high‐level markup language called VOML and a compiler for the component‐based development of high‐level visual objects. The VOML is based on a software architecture for on‐line event processing and performance visualization called EPIRA. The technology lends itself to constructing high‐level visual objects from globally distributed component definitions. Details of the technology and tools used, as well as how an example visual object can be rapidly prototyped from several reusable components, are presented. Copyright © 2003 John Wiley & Sons, Ltd.

2.

An application-transparent performance analysis method for distributed systems

马晓晨 孔小利 《计算机工程与应用》2008,44(17):107-110

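Performance analysis of distributed systems is widely recognized as a hard problem. Much previous work analyzes application performance using application-specific data and methods, which usually requires modifying the application code to obtain the necessary execution information. This paper presents a new method for analyzing performance problems in distributed systems: it collects communication information through dynamic probes and diagnoses performance problems by analyzing the communication patterns of the distributed system. Experiments show that the method is general and efficient.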

3.

Performance measurement, visualization and modeling of parallel and distributed programs using the AIMS toolkit

Jerry Yan Sekhar Sarukkai Pankaj Mehra 《Software》1995,25(4):429-461

Writing large-scale parallel and distributed scientific applications that make optimum use of the multiprocessor is a challenging problem. Typically, computational resources are underused due to performance failures in the application being executed. Performance-tuning tools are essential for exposing these performance failures and for suggesting ways to improve program performance. In this paper, we first address fundamental issues in building useful performance-tuning tools and then describe our experience with the AIMS toolkit for tuning parallel and distributed programs on a variety of platforms. AIMS supports source-code instrumentation, run-time monitoring, graphical execution profiles, performance indices and automated modeling techniques as ways to expose performance problems of programs. Using several examples representing a broad range of scientific applications, we illustrate AIMS' effectiveness in exposing performance problems in parallel and distributed programs.

4.

Utilizing commodity hardware and software to distribute a real‐world application: maximizing reuse while improving performance

Michael Davis Randy Smith Brandon Dixon Allen Parrish David Cordes 《Software》2005,35(7):621-641

Commodity computing hardware continues to increase performance while decreasing price. This combination is driving a renewed interest in parallel and distributed computing. In this study, we examine the performance of an existing application in a ten‐node computing cluster using commodity off‐the‐shelf components. The application is a statistical analysis software package that processes categorical data used by state public safety programs. The study examines various network topologies and focuses on minimizing the software modifications required to distribute the application. We conclude that parallel computing using commodity components is an effective mechanism to increase the performance of real‐world applications especially when the underlying application architectures have the flexibility to support efficient reuse of the existing code. Copyright © 2005 John Wiley & Sons, Ltd.

5.

Design and implementation of an instrumentation-based performance analysis method for parallel programs

马桂杰 蒋昌俊 刘吟 王忱 《计算机应用研究》2007,24(10):225-228

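This paper describes the design and implementation of ParaVT, a parallel debugging and performance analysis tool for heterogeneous environments. The tool analyzes the source code of a parallel program and uses automatic instrumentation templates to insert user code for debugging and performance analysis. This enables breakpoint debugging and the collection of performance parameters, with the aim of further optimizing the program design.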

6.

Model-driven monitoring support for the multi-view performance analysis of parallel embedded applications

J. García J. Entrialgo F. J. Suárez D. F. García 《Performance Evaluation》2000,39(1-4):81-98

This paper describes an approach to carry out performance analysis of parallel embedded applications. The approach is based on measurement, but in addition, the idea of driving the measurement process (application instrumentation and monitoring) by a behavioral model is introduced. Using this model, highly comprehensible performance information can be collected. The whole approach is based on this behavioral model, one instrumentation method and two tools, one for monitoring and the other for visualization and analysis. Each of these is briefly described, and the steps to carry out performance analysis using them are clearly defined. They are explained by means of a case study. Finally, one method to evaluate the intrusiveness of the monitoring approach is proposed, and the intrusiveness results for the case study are presented.

7.

OpenMP compiler for distributed memory architectures

WANG Jue HU ChangJun ZHANG JiLin & LI JianJiang (School of Information Engineering, University of Science and Technology Beijing, Beijing, China) 《中国科学:信息科学(英文版)》2010,(5):932-944

OpenMP is an emerging industry standard for shared memory architectures. While OpenMP has advantages in ease of use and incremental programming, message passing is still the most widely used programming model for distributed memory architectures. How to extend OpenMP effectively to distributed memory architectures has been an active research topic. This paper proposes an OpenMP system, called KLCoMP, for distributed memory architectures. Based on the partially replicating shared arrays memory model, we propose ...

8.

Zorilla: a peer‐to‐peer middleware for real‐world distributed systems

Niels Drost Rob V. van Nieuwpoort Jason Maassen Frank J. Seinstra Henri E. Bal 《Concurrency and Computation》2011,23(13):1506-1521

The inherently complex nature of current distributed computing architectures hinders the widespread adoption of these systems for mainstream use. In general, users have access to a highly heterogeneous set of compute resources, which may include clusters, grids, desktop grids, clouds, and other compute platforms. This heterogeneity is especially problematic when running parallel and distributed applications. Software is needed that easily combines as many resources as possible into one coherent computing platform. In this paper, we introduce Zorilla: peer‐to‐peer (P2P) middleware that creates a single distributed environment from any available set of compute resources. Zorilla imposes minimal requirements on the resources used, is platform independent, and does not rely on central components. In addition to providing functionality on bare resources, Zorilla can exploit locally available middleware. Zorilla explicitly supports distributed and parallel applications, and allows resources from multiple sites to cooperate in a single computation. Zorilla makes extensive use of both virtualization and P2P techniques. We will demonstrate how virtualization and P2P combine into a simple design, while enhancing functionality and ease of use. Together, these techniques bring our goal a step closer: transparent, easy use of resources, even on very heterogeneous distributed systems. Copyright © 2011 John Wiley & Sons, Ltd.

9.

Grid applications on distributed memory architectures: Implementation and evaluation

Karl Solchenbach 《Parallel Computing》1988,7(3):341-356

It was shown in the paper of Solchenbach and Trottenberg (in this special issue) that grid algorithms are inherently parallel and that parallel grid algorithms for regular grids can be efficiently implemented on dm-mp systems using the concept of grid partitioning.

In this paper, we demonstrate that grid applications can be implemented quite easily on dm-mp systems if a hardware-independent process system exists and convenient tools (such as the SUPRENUM mapping and communications library) are available.

The evaluation of parallel grid algorithms shows that the multiprocessor speedup and efficiency for single grid applications depend on the communication/calculation performance ratio of the hardware, on the communication/calculation ratio of the algorithms, and on the process size. The efficiency of parallel multigrid algorithms additionally depends on the number of nodes.
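
The dependence on process size and on the communication/calculation ratio can be illustrated with a toy model. The sketch below is not taken from the paper: it assumes a 2D grid partitioned into square n × n subgrids, a halo exchange on four sides per iteration, and hypothetical per-point costs `t_calc` and `t_comm`. The function name is likewise invented for illustration.

```python
def grid_partition_efficiency(n, t_calc, t_comm, halo_sides=4):
    """Estimate parallel efficiency for one process owning an n x n subgrid.

    Assumed model (not from the paper): per iteration, computation costs
    n * n * t_calc and boundary exchange costs halo_sides * n * t_comm.
    Efficiency is the fraction of time spent computing.
    """
    calc = n * n * t_calc            # work grows with the subgrid area
    comm = halo_sides * n * t_comm   # communication grows with the perimeter
    return calc / (calc + comm)


if __name__ == "__main__":
    # Larger subgrids amortize communication over more computation.
    for n in (8, 32, 128):
        print(n, grid_partition_efficiency(n, t_calc=1.0, t_comm=10.0))
```

In this model, efficiency rises with the subgrid size n and falls as `t_comm / t_calc` grows, which matches the qualitative dependence stated in the abstract.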

10.

Composing and scheduling service‐oriented applications in time‐triggered distributed real‐time Java environments

Iria Estévez‐Ayres Pablo Basanta‐Val Marisol García‐Valls 《Concurrency and Computation》2014,26(1):152-193

During the last decade, the number of distributed application domains with temporal requirements has grown significantly. This creates a need to explore new concepts and paradigms that allow, on the one hand, the development of dynamic and flexible distributed applications and, on the other hand, the reuse of code. Service‐oriented paradigms have been successfully applied to distributed environments, increasing their flexibility and allowing the reuse of their components. In addition, distributed real‐time Java technologies have been shown to be a good candidate for deploying real‐time distributed applications. This paper presents a model for service‐oriented applications on a time‐triggered distributed real‐time Java environment, focusing on the definition of the temporal model of an application and its schedulability, and applies and evaluates this model in real‐time service‐oriented composition algorithms. Copyright © 2012 John Wiley & Sons, Ltd.

11.

Methodology for predicting performance of distributed and parallel systems

Rakesh Kushwaha 《Performance Evaluation》1993,18(3):189-204

This paper describes an accurate and efficient method to model and predict the performance of distributed/parallel systems. Various performance measures, such as the expected user response time, the system throughput and the average server utilization, can be easily estimated using this method. The methodology is based on known product form queueing network methods, with some additional approximations. The method is illustrated by evaluating performance of a multi-client multi-server distributed system. A system model is constructed and mapped to a probabilistic queueing network model which is used to predict its behavior. The effects of user think time and various design parameters on the performance of the system are investigated by both the analytical method and computer simulation. The accuracy of the former is verified. The methodology is applied to identify the bottleneck server and to establish proper balance between clients and servers in distributed/parallel systems.
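
The abstract does not say which product-form algorithm or approximations are used. As a rough illustration of the kind of calculation involved, the sketch below applies textbook exact Mean Value Analysis (MVA) to a closed single-class network with user think time. The function name, the service demands, and the think time are all assumed values, not figures from the paper.

```python
def mva(demands, think_time, n_users):
    """Exact Mean Value Analysis for a closed, single-class product-form network.

    demands: service demand (seconds per request) at each queueing server.
    think_time: average user think time in seconds.
    n_users: number of circulating clients (must be >= 1).
    Returns (response_time, throughput, utilizations).
    """
    queue = [0.0] * len(demands)  # mean queue lengths with zero users
    for n in range(1, n_users + 1):
        # Arrival theorem: an arriving request sees the network with n - 1 users.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        throughput = n / (think_time + response)
        # Little's law applied to each server.
        queue = [throughput * r for r in residence]
    utilizations = [throughput * d for d in demands]
    return response, throughput, utilizations


if __name__ == "__main__":
    # Hypothetical two-server system: demands of 0.1 s and 0.2 s, 1 s think time.
    r, x, u = mva([0.1, 0.2], think_time=1.0, n_users=20)
    bottleneck = max(range(len(u)), key=lambda i: u[i])
    print(r, x, u, bottleneck)
```

In this model the server with the highest utilization is the bottleneck, and throughput can never exceed the reciprocal of the largest service demand.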

12.

A performance visualization method for parallel programs on heterogeneous platforms

郑晓薇 顾慧 《计算机工程与设计》2010,31(4)

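To make it easier to analyze the performance of parallel programs on heterogeneous platforms, a performance visualization model for heterogeneous environments is designed, building on a study of visualization techniques and of a parallel computing and control display platform. Based on the characteristics of this model, and using monitoring-code instrumentation and post-mortem analysis of performance data, the paper gives concrete methods and an implementation for acquiring, converting, and plotting parallel performance data. This provides a simple way to collect and convert parallel performance data across platforms. Experimental results show that the method is feasible and effective for visualizing parallel performance data in heterogeneous environments.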

13.

Strategies for improving the performance of PVM parallel programs in a networked PC environment

尚月强 《计算机工程与设计》2007,28(13):3100-3102,3129

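Network parallel computing is one of the most important directions in the development of parallel and distributed computing. Drawing on concrete numerical experiments, this paper discusses several important factors that affect the performance of PVM parallel programs in PVM-based network parallel numerical computing under the Windows operating system: load balancing, communication overhead, network performance, task granularity, the number of processors, accuracy requirements, and processor memory capacity. It then proposes corresponding strategies for improving the performance of PVM parallel programs so that problems can be solved quickly and efficiently.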

14.

Revisiting conservative time synchronization protocols in parallel and distributed simulation

S. De Munck K. Vanmechelen J. Broeckhove 《Concurrency and Computation》2014,26(2):468-490

Computer simulations have become an indispensable tool for the empirical study of large‐scale systems. The timely simulation of these systems, however, is not without its challenges. Simulators have to be able to harness the full computational power of modern multicore architectures through parallel execution and overcome the memory limitations of a single computer. In this paper, we evaluate the performance of a parallel and distributed simulator using several conventional time synchronization protocols executed on modern multicore hardware. In addition, we comprehensively analyze a hybrid approach, combining two traditional protocols, increasing robustness, and enabling improved performance in a wider range of simulation scenarios. Finally, an adaptive algorithm to automatically configure this hybrid protocol is introduced and evaluated, eliminating manual user intervention and further improving robustness with respect to varying simulation conditions. Copyright © 2013 John Wiley & Sons, Ltd.

15.

PLEIADES: An Internet‐based parallel/distributed system

D. Koulopoulos K. Papoutsis G. Goulas E. Housos 《Software》2002,32(11):1035-1049

LAN‐based clusters of computers have been used for computational purposes for several years with significant success and acceptance. The introduction of the Internet infrastructure as the interconnection medium of the cluster allows for additional flexibility and transparency of such systems. PLEIADES is an Internet‐based parallel/distributed system whose purpose is to allow users to use distant computational resources in order to form virtual clusters. In addition, PLEIADES can be used as a computational infrastructure service provider for applications in need of computational resources. PLEIADES uses a tiered architecture with particular emphasis on the existence of a middle tier, whose task is to assist in the communication between the interface and the resource management tiers. The existence of the middle tier allows for the creation of an open system that is able to easily integrate with new resource management platforms and tools. Since the use of a mature resource management system for parallel/distributed computing was a prerequisite of the PLEIADES architecture, the Condor resource management environment was used. The design and implementation characteristics of PLEIADES together with some experimental uses of the system are also presented. Copyright © 2002 John Wiley & Sons, Ltd.

16.

Visual programming support for graph‐oriented parallel/distributed processing

Fan Chan Jiannong Cao Alvin T. S. Chan Kang Zhang 《Software》2005,35(15):1409-1439

GOP is a graph‐oriented programming model which aims at providing high‐level abstractions for configuring and programming cooperative parallel processes. With GOP, the programmer can configure the logical structure of a parallel/distributed program by constructing a logical graph to represent the communication and synchronization between the local programs in a distributed processing environment. This paper describes a visual programming environment, called VisualGOP, for the design, coding, and execution of GOP programs. VisualGOP applies visual techniques to provide the programmer with automated and intelligent assistance throughout the program design and construction process. It provides a graphical interface with support for interactive graph drawing and editing, visual programming functions and automation facilities for program mapping and execution. VisualGOP is a generic programming environment independent of programming languages and platforms. GOP programs constructed under VisualGOP can run in heterogeneous parallel/distributed systems. Copyright © 2005 John Wiley & Sons, Ltd.

17.

A development process for parallel programs and the design of its supporting tools

崔焕庆 《计算机工程与设计》2007,28(17):4079-4081,4088

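Parallel program development mostly follows a "develop, execute, verify and analyze" process, which leads to long development cycles and low efficiency, yet correctness and high performance are the primary requirements for using parallel programs. To address this, the paper proposes a development process that spans algorithm design, program development, and result analysis, and that allows correctness verification and performance analysis to be carried out together. It gives principles and methods for designing fairly complete computer-aided development tools and develops a prototype tool for message-passing parallel programming. Experiments show that the process and methods improve the efficiency of parallel program development and simplify the programmer's work.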

18.

Automatic performance debugging of SPMD-style parallel programs

Xu Liu Jianfeng Zhan Kunlin Zhan Dan Meng 《Journal of Parallel and Distributed Computing》2011,71(7):925-937

Automatic performance debugging of parallel applications includes two main steps: locating performance bottlenecks and uncovering their root causes for performance optimization. Previous work fails to resolve this challenging issue in two ways: first, several previous efforts automate locating bottlenecks, but present results in a confined way that only identifies performance problems with a priori knowledge; second, several tools take exploratory or confirmatory data analysis to automatically discover relevant performance data relationships, but these efforts do not focus on locating performance bottlenecks or uncovering their root causes.

The single program, multiple data (SPMD) programming model is widely used for both high performance computing and Cloud computing. In this paper, we design and implement an innovative system, AutoAnalyzer, that automates the process of debugging performance problems of SPMD-style parallel programs, including data collection, performance behavior analysis, locating bottlenecks, and uncovering their root causes. AutoAnalyzer is unique in terms of two features: first, without any prior knowledge, it automatically locates bottlenecks and uncovers their root causes for performance optimization; second, it is lightweight in terms of the size of performance data to be collected and analyzed.

Our contributions are three-fold: first, we propose two effective clustering algorithms to investigate the existence of performance bottlenecks that cause process behavior dissimilarity or code region behavior disparity, respectively; meanwhile, we present two searching algorithms to locate bottlenecks; second, on the basis of the rough set theory, we propose an innovative approach to automatically uncover root causes of bottlenecks; third, on the cluster systems with two different configurations, we use two production applications, written in Fortran 77, and one open source code, MPIBZIP2 (http://compression.ca/mpibzip2/), written in C++, to verify the effectiveness and correctness of our methods. For three applications, we also propose an experimental approach to investigating the effects of different metrics on locating bottlenecks.

19.

Dynamic software updates for parallel high‐performance applications

Dong Kwan Kim Eli Tilevich Calvin J. Ribbens 《Concurrency and Computation》2011,23(4):415-434

Despite using multiple concurrent processors, a typical high‐performance parallel application is long‐running, taking hours, even days to arrive at a solution. To modify a running high‐performance parallel application, the programmer has to stop the computation, change the code, redeploy, and enqueue the updated version to be scheduled to run, thus wasting not only the programmer's time, but also expensive computing resources. To address these inefficiencies, this article describes how dynamic software updates (DSU) can be used to modify a parallel application on the fly, thus saving the programmer's time and using expensive computing resources more productively. The net effect of updating parallel applications dynamically can reduce the total time that elapses between posing a problem and arriving at a solution, otherwise known as time‐to‐discovery. To explore the benefits of dynamic updates for high performance applications, this article takes a two‐pronged approach. First, we describe our experiences of building and evaluating a system for dynamically updating applications running on a parallel cluster. We then review a large body of literature describing the existing state of the art in DSU and point out how this research can be applied to high‐performance applications. Our experimental results indicate that DSU have the potential to become a powerful tool in reducing time‐to‐discovery for high‐performance parallel applications. Copyright © 2010 John Wiley & Sons, Ltd.

20.

OpenMP‐oriented applications for distributed shared memory architectures

Ami Marowka Zhenying Liu Barbara Chapman 《Concurrency and Computation》2004,16(4):371-384

The rapid rise of OpenMP as the preferred parallel programming paradigm for small‐to‐medium scale parallelism could slow unless OpenMP can show capabilities for becoming the model‐of‐choice for large scale high‐performance parallel computing in the coming decade. The main stumbling block for the adaptation of OpenMP to distributed shared memory (DSM) machines, which are based on architectures like cc‐NUMA, stems from the lack of capabilities for data placement among processors and threads for achieving data locality. The absence of such a mechanism causes remote memory accesses and inefficient cache memory use, both of which lead to poor performance. This paper presents a simple software programming approach called copy‐inside–copy‐back (CC) that exploits the data privatization mechanism of OpenMP for data placement and replacement. This technique enables one to distribute data manually without taking away control and flexibility from the programmer and is thus an alternative to the automatic and implicit approaches. Moreover, the CC approach improves on the OpenMP‐SPMD style of programming, making the development process of an OpenMP application more structured and simpler. The CC technique was tested and analyzed using the NAS Parallel Benchmarks on SGI Origin 2000 multiprocessor machines. This study shows that OpenMP improves performance of coarse‐grained parallelism, although a fast copy mechanism is essential. Copyright © 2004 John Wiley & Sons, Ltd.