首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Debuggers play an important role in developing parallel applications. They are used to control the state of many processes, to present distributed information in a concise and clear way, to observe the execution behavior, and to detect and locate programming errors. More sophisticated debugging systems also try to improve understanding of global execution behavior and intricate details of a program. In this paper we describe the design and implementation of SPiDER, which is an interactive source‐level debugging system for both regular and irregular High‐Performance Fortran (HPF) programs. SPiDER combines a base debugging system for message‐passing programs with a high‐level debugger that interfaces with an HPF compiler. SPiDER, in addition to conventional debugging functionality, allows a single process of a parallel program to be expected or the entire program to be examined from a global point of view. A sophisticated visualization system has been developed and included in SPiDER to visualize data distributions, data‐to‐processor mapping relationships, and array values. SPiDER enables a programmer to dynamically change data distributions as well as array values. For arrays whose distribution can change during program execution, an animated replay displays the distribution sequence together with the associated source code location. Array values can be stored at individual execution points and compared against each other to examine execution behavior (e.g. convergence behavior of a numerical algorithm). Finally, SPiDER also offers limited support to evaluate the performance of parallel programs through a graphical load diagram. SPiDER has been fully implemented and is currently being used for the development of various real‐world applications. Several experiments are presented that demonstrate the usefulness of SPiDER. Copyright © 2002 John Wiley & Sons, Ltd.  相似文献   

2.
Since its release, the Java programming language has attracted considerable attention from the high‐performance computing (HPC) community because of its portability, high programming productivity, and built‐in multithreading and networking support. As a consequence, several initiatives have been taken to develop a high‐performance Java message‐passing library to program distributed memory architectures, such as clusters. The performance of Java message‐passing applications relies heavily on the communications performance. Thus, the design and implementation of low‐level communication devices that support message‐passing libraries is an important research issue in Java for HPC. MPJ Express is our Java message‐passing implementation for developing high‐performance parallel Java applications. Its public release currently contains three communication devices: the first one is built using the Java New Input/Output (NIO) package for the TCP/IP; the second one is specifically designed for the Myrinet Express library on Myrinet; and the third one supports thread‐based shared memory communications. Although these devices have been successfully deployed in many production environments, previous performance evaluations of MPJ Express suggest that the buffering layer, tightly coupled with these devices, incurs a certain degree of copying overhead, which represents one of the main performance penalties. This paper presents a more efficient Java message‐passing communications device, based on Java Input/Output sockets, that avoids this buffering overhead. Moreover, this device implements several strategies, both in the communication protocol and in the HPC hardware support, which optimizes Java message‐passing communications. In order to evaluate its benefits, this paper analyzes the performance of this device comparatively with other Java and native message‐passing libraries on various high‐speed networks, such as Gigabit Ethernet, Scalable Coherent Interface, Myrinet, and InfiniBand, as well as on a shared memory multicore scenario. The reported communication overhead reduction encourages the upcoming incorporation of this device in MPJ Express ( http://mpj‐express.org ). Copyright © 2011 John Wiley & Sons, Ltd.  相似文献   

3.
Multi‐core processors offer a huge potential of parallelism but pose a challenge of program development for achieving high performance in real applications. We compare three popular parallel programming models—POSIX threads (Pthreads), OpenMP, and Threading Building Blocks (TBB)—regarding their use for multi‐core systems. We analyze how these models can be employed for implementing various parallelizations of a real‐world application from the area of medical imaging, and we conduct extensive runtime experiments to measure performance. Our main contribution is a comprehensive comparison of Pthreads, OpenMP, and TBB with respect to the following criteria: program development effort, programming style, level of abstraction, and runtime performance on multi‐cores. Copyright © 2010 John Wiley & Sons, Ltd.  相似文献   

4.
Developing high‐quality, error‐free message‐passing concurrent programs is not trivial. Although a number of different primitives with associated semantics are available to assist such development, they often increase the complexity of the testing process. In this paper, we extend our previous test model for message‐passing programs and present new structural testing criteria, taking into account additional features used in this paradigm, such as collective communication, non‐blocking sends, distinct semantics for non‐blocking receives, and persistent operations. Our new model also recognizes that sender primitives cannot always be matched with every receive primitive. This improvement allows us to remove statically a significant number of infeasible synchronization edges that would otherwise have to be analyzed later by the tester. In this paper, the test model is presented using the Message‐Passing Interface standard; however, our new model has been designed to be flexible, and it can be configured to support a range of different message‐passing environments or languages. We have carried out case studies showing the applicability of the new test model to represent message‐passing programs and also to reveal errors, mainly those errors related to inter‐process communication. In addition to increasing the number of features supported by the test model, we have also reduced the overall cost of testing significantly. Our case studies suggest that the number of synchronization edges can be reduced by up to 93%, mainly by eliminating infeasible edges between unmatchable communication primitives. The main contribution of the paper is to present a more flexible test model that provides improved coverage for message‐passing programs and at the same time reduces the cost of testing significantly. Copyright © 2012 John Wiley & Sons, Ltd.  相似文献   

5.
PC机群上JIAJIA与MPI的比较   总被引:3,自引:2,他引:3       下载免费PDF全文
对JIAJIA和MPI (message passing interface)是进行了比较.JIAJIA和MPI分别代表共享存储和消息传递的编程模式.MPI显式进行数据传输,编程复杂;JIAJIA由底层维护数据一致性,并附加提供简单的消息传递函数,编程容易、灵活.JIAJIA分配共享内存时开销较大,初始化时间比MPI长.提出了一个关于并行加速比与进程数目之间关系的近似经验公式,推出JIAJIA和MPI性能差距随着进程数目的增多而增大的结论.测试结果表明,大部分应用程序的JIAJIA和MPI版本的并行性能差距不超过10%.对于通信量很小的应用程序,其JIAJIA和MPI的性能差距较小,而通信量本身较大的应用程序,其JIAJIA和MPI的性能差距主要取决于运行时产生的实际通信量.  相似文献   

6.
曙光1000A上消息传递与共享存储的比较   总被引:12,自引:2,他引:12  
分布式共享存储虽然有易于编程的优点,但往往被认为效率不高、完全由软件实现的分布式共享存储系统(又称为虚拟共享存储系统)更是如此,文中以典型的消息传递系统PVM与分布式共享存储系统JIAJIA粉列,报这两种并行程序设计环境的特点,并用7个应用程序在曙光1000A上分别比较了这两个系统的性能,实验3结果表明,JIAJIA的与PV玎当,但基于JIAJIA的并行程序设计却比PVN简单得多。  相似文献   

7.
Parallel programming is orders of magnitudes more complex than writing sequential programs. This is particularly true for programming distributed memory multiprocessor architectures based on message passing programming models. Apart from understanding the sequential parts of the parallel program, new degrees of freedom lead to additional problems. Understanding the synchronization and communication behavior of parallel programs is the most critical issue in programming distributed memory multiprocessors. The paper describes methods and tools for visualization and animation of the dynamic execution of parallel programs. Based on an evaluation and classification of existing visualization environments, the visualization and animation tool VISTOP (VISualization TOol for Parallel Systems) is presented as part of the integrated tool environment TOPSY S (TOols for Parallel SYStems) for programming distributed memory multiprocessors. VISTOP supports the interactive on-line visualization of message passing programs based on various views; in particular, a process graph based concurrency view for detecting synchronization and communication bugs.  相似文献   

8.
《Computer Languages》1996,22(2-3):181-192
An effective resolution multiprocessor can be built from distributed processing, logic programming, and interface elements. Widely used, portable, components can be modularly composed into a portable parallel system that displays good resistance to premature obsolescence by software evolution. A virtual multiprocessor offering common message passing and configuration services integrates a distributed mesh of sequential resolution engines. Users configure and control the resolution engines and virtual multiprocessor through a GUI using an embedded command language to drive its facilities. Prolog programs either explicitly control parallel execution through message passing or would have to rely on program transformation techniques to extract parallelism implicitly.  相似文献   

9.
在分布式存储系统上,MPI已被证实是理想的并行程序设计模型。MPI是基于消息传递的并行编程模型,进程间的通信是通过调用库函数来实现的,因此MPI并行程序中,通信部分代码的效率对该并行程序的性能有直接的影响。通过用集群通信函数替代点对点通信函数以及通过派生数据类型和建立新通信域这两种方式,两次改进DNS的MPI并行程序实现,并通过实验给出一个优化MPI并行程序的一般思路与方法。  相似文献   

10.
Writing large-scale parallel and distributed scientific applications that make optimum use of the multiprocessor is a challenging problem. Typically, computational resources are underused due to performance failures in the application being executed. Performance-tuning tools are essential for exposing these performance failures and for suggesting ways to improve program performance. In this paper, we first address fundamental issues in building useful performance-tuning tools and then describe our experience with the AIMS toolkit for tuning parallel and distributed programs on a variety of platforms. AIMS supports source-code instrumentation, run-time monitoring, graphical execution profiles, performance indices and automated modeling techniques as ways to expose performance problems of programs. Using several examples representing a broad range of scientific applications, we illustrate AIMS' effectiveness in exposing performance problems in parallel and distributed programs.  相似文献   

11.
有效的消息通讯是提高分布存储器并行计算机性能的关键因素.点对点通讯和广播通讯是2种常用的消息通讯方法,而多播通讯(Multicasting)是指从一个源节点同时给任意多个目标节点发送消息,这种通讯比点对点和广播2种方式更具一般性,适用于很多实际应用的需求.本文针对PAR95并行计算机的二维网格结构,提出一种基于网络分解的多播消息通讯方法,并比较了该方法与用多个点对点方法实现多播通讯的性能.  相似文献   

12.
13.
A model of parallel program that can be effectively interpreted on the development computer guaranteeing the possibility of a sufficiently precise prediction of real run time for a simulated parallel program at the prescribed computer system is studied. The model is worked out for parallel programs with explicit message passing written in the Java language with MPI library access and is included into the composition of ParJava environment. The model is obtained by transforming the program control tree that can be constructed for Java programs by modifying the abstract syntax tree. To model communication functions, the model LogGP is used which allows taking into consideration the specific character of the communication network of the distributed computer system.  相似文献   

14.
Processes are building blocks for the modeling of environments in which parallel and distributed processing occurs. In parallel programming they play the role of standard units (as do subroutines or procedures in sequential programming). Process communication and synchronization can be achieved either through shared variables (common address space) or by message transmission. It has been shown that the message transmission mechanism leads to a more general computational structure. In this paper we develop the beginnings of a methodology to deal with what we call message oriented programming. We note in passing that the methodology for programming with shared variables is well developed and shows a development leading from operational (automata oriented) constructs (semaphores) to high level programming constructs (critical regions and then monitors). Recent mathematical theories of message oriented programming deal with the subject from an operational point of view. However, the models are too far removed from the control and data structures of programs to guide the designer in constructing a process. To be able to bridge this gap between program specification and program implementation (expressed in the high level language that we use), we resort to a definitional specification technique based on the concept of abstract data type. The specification technique is used in conjunction with some useful design principles to illustrate our ideas via solutions to well-known problems. The two main principles we discuss are: differences (and needs) for definitions of process and message structures in a given application and the concepts of module strength and module coupling as put forward by Myers. Finally, we illustrate how the high level definitional technique leads us to a straightforward method for studying some properties of message oriented programs. We give an example of proving the deadlock freeness of a solution to the consumer and producer problem.This work was supported by a research grant (to C. J. Lucena) from the J. S. Guggenheim Memorial Foundation and also by the Natural Sciences and Engineering Research Council of Canada.  相似文献   

15.
《Parallel Computing》1997,22(13):1747-1770
To provide high-level graphical support for PVM (Parallel Virtual Machine) based program development, a complex programming environment (GRADE) is being developed. GRADE currently provides tools to construct, execute, debug, monitor and visualize message-passing parallel programs. It offers a high-level graphical programming abstraction mechanism to construct parallel applications by introducing a new graphical language called GRAPNEL. GRADE also provides the programmer with the same graphical user interface during the program design and debugging stages. A distributed debugging engine (DDBG) assists the user in debugging GRAPNEL programs on distributed memory computer architectures. Tape/PVM and PROVE support the performance monitoring and visualization of parallel programs developed in the GRADE environment.  相似文献   

16.
Commodity computing hardware continues to increase performance while decreasing price. This combination is driving a renewed interest in parallel and distributed computing. In this study, we examine the performance of an existing application in a ten‐node computing cluster using commodity off‐the‐shelf components. The application is a statistical analysis software package that processes categorical data used by state public safety programs. The study examines various network topologies and focuses on minimizing the software modifications required to distribute the application. We conclude that parallel computing using commodity components is an effective mechanism to increase the performance of real‐world applications especially when the underlying application architectures have the flexibility to support efficient reuse of the existing code. Copyright © 2005 John Wiley & Sons, Ltd.  相似文献   

17.
Message passing interface (MPI) is the de facto standard in writing parallel scientific applications on distributed memory systems. Performance prediction of MPI programs on current or future parallel systems can help to find system bottleneck or optimize programs. To effectively analyze and predict performance of a large and complex MPI program, an efficient and accurate communication model is highly needed. A series of communication models have been proposed, such as the LogP model family, which assume th...  相似文献   

18.
Zippy: A Framework for Computation and Visualization on a GPU Cluster   总被引:1,自引:0,他引:1  
Due to its high performance/cost ratio, a GPU cluster is an attractive platform for large scale general‐purpose computation and visualization applications. However, the programming model for high performance general‐purpose computation on GPU clusters remains a complex problem. In this paper, we introduce the Zippy frame‐work, a general and scalable solution to this problem. It abstracts the GPU cluster programming with a two‐level parallelism hierarchy and a non‐uniform memory access (NUMA) model. Zippy preserves the advantages of both message passing and shared‐memory models. It employs global arrays (GA) to simplify the communication, synchronization, and collaboration among multiple GPUs. Moreover, it exposes data locality to the programmer for optimal performance and scalability. We present three example applications developed with Zippy: sort‐last volume rendering, Marching Cubes isosurface extraction and rendering, and lattice Boltzmann flow simulation with online visualization. They demonstrate that Zippy can ease the development and integration of parallel visualization, graphics, and computation modules on a GPU cluster.  相似文献   

19.
Nested parallelism appears naturally in many applications. It is required whenever a function performing parallel statements needs to call a subroutine using parallelism. A particular case occurs when the function is recursive. Nested parallelism is to parallel programming as basic as nested loops to sequential programming. Despite this, most existing parallel languages do not provide this feature. This paper presents a new methodology to expand message passing libraries (MPL) with nested parallelism. The tool to support the methodology has processor virtualization, load balancing, pipeline parallelism and collective operations, among other features. The computational results prove that the performance obtained is comparable to that obtained using classical message passing programs. Since the methodology does not force the programmer to leave the MPL environment, all the efficiency and portability of the MPL model is preserved. Copyright © 1999 John Wiley & Sons, Ltd.  相似文献   

20.
This article proposes a programming language called “Espace” for parallel and distributed computation. In general, it is difficult to code a distributed, parallel program due to multi-threading, message passing, managing clients, and so on. Espace involves a few simple syntax rules added to Java. Developers do not need to know how to write a parallel, distributed program source code in detail. This work applies Espace to parallelize an evolutionary computation program, and shows that the Espace compiler allows the conversion of an evolutionary computation program written in Java into a distributed, parallel system by adding a few words to the program.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号