Similar Literature
20 similar documents retrieved.
1.
The Message‐Passing Interface (MPI) is commonly used to write parallel programs for distributed memory parallel computers. MPI‐CHECK is a tool developed to aid in the debugging of MPI programs that are written in free or fixed format Fortran 90 and Fortran 77. This paper presents the methods used in MPI‐CHECK 2.0 to detect many situations where actual and potential deadlocks occur when using blocking and non‐blocking point‐to‐point routines as well as when using collective routines. Copyright © 2002 John Wiley & Sons, Ltd.
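For illustration only (this example is not taken from the paper): the classic "head-to-head" exchange below is exactly the kind of potential deadlock such checkers report. Both ranks block in MPI_Send before reaching their matching receives, so completion depends on whether the MPI library happens to buffer the messages internally; MPI_Sendrecv removes the hazard.

```c
/* Hypothetical example (not from the paper): a "head-to-head" exchange
 * that MPI deadlock checkers such as MPI-CHECK flag. Both ranks block in
 * MPI_Send; the program only completes if the MPI library happens to
 * buffer the messages internally (a *potential* deadlock). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, peer, out = 42, in = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;                  /* assumes exactly 2 processes */

    /* Unsafe pattern: both ranks send first, then receive.
     *   MPI_Send(&out, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);
     *   MPI_Recv(&in, 1, MPI_INT, peer, 0, MPI_COMM_WORLD,
     *            MPI_STATUS_IGNORE);
     */

    /* Safe alternative: the combined call lets MPI schedule both sides. */
    MPI_Sendrecv(&out, 1, MPI_INT, peer, 0,
                 &in,  1, MPI_INT, peer, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d received %d\n", rank, in);
    MPI_Finalize();
    return 0;
}
```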

2.
High performance scientific computing software is of critical international importance as it supports scientific explorations and engineering. Software development in this area is highly challenging owing to the use of parallel/distributed programming methods and complex communication and synchronization libraries. There is very little use of formal methods to debug software in this area, given that the scientific computing community and the formal methods community have not traditionally worked together. The Utah Gauss project combines expertise from scientific computing and formal methods in addressing this problem. We currently focus on MPI programs, which are the kind that run on over 60% of the world's supercomputers. These are programs written in C / C++ / FORTRAN employing message passing concurrency supported by the Message Passing Interface (MPI) library. Large-scale MPI programs also employ shared memory threads to manage concurrency within smaller task sub-groups, capitalizing on the recent availability of small-scale (e.g. single-chip) shared memory multiprocessors; such mixed programming styles can result in additional bugs. MPI libraries themselves can be buggy as they strive to implement complex requirements employing aggressive techniques such as multi-threading. We have built a model extractor that extracts from MPI C programs a formal model consisting of communicating processes represented in Microsoft's Zing modeling language. MPI library functions are also being modeled in Zing. This allows us to run formal analysis on the models to detect bugs in the MPI programs being analyzed. Our preliminary results and future plans are described; in addition, our contribution is to expose the special needs of this area and suggest specific avenues for problem-driven advances in software model-checking applied to scientific computing software development and verification.

3.
Parallel programs present features such as concurrency, communication and synchronization that make testing a challenging activity. Because of these characteristics, the direct application of traditional testing is not always possible, and adequate testing criteria and tools are necessary. In this paper we investigate the challenges of validating message‐passing parallel programs and present a set of specific testing criteria. We introduce a family of structural testing criteria based on a test model. The model captures control and data flow of the message‐passing programs by considering their sequential and parallel aspects. The criteria provide a coverage measure that can be used for evaluating the progress of the testing activity and also provide guidelines for the generation of test data. We also describe a tool, called ValiPar, which supports the application of the proposed testing criteria. Currently, ValiPar is configured for parallel virtual machine (PVM) and message‐passing interface (MPI). Results of the application of the proposed criteria to MPI programs are also presented and analyzed. Copyright © 2008 John Wiley & Sons, Ltd.

4.
In this paper, we describe various methods of deriving a parallel version of Stone's Strongly Implicit Procedure (SIP) for solving sparse linear equations arising from finite difference approximation to partial differential equations (PDEs). Sequential versions of this algorithm have been very successful in solving semi‐conductor, heat conduction and flow simulation problems and an efficient parallel version would enable much larger simulations to be run. An initial investigation of various parallelizing strategies was undertaken using a version of high performance Fortran (HPF) and the best methods were reprogrammed using the MPI message passing libraries for increased efficiency. Early attempts concentrated on developing a parallel version of the characteristic wavefront computation pattern of the existing sequential SIP code. However, a red‐black ordering of grid points, similar to that used in parallel versions of the Gauss–Seidel algorithm, is shown to be far more efficient. The results of both the wavefront and red‐black MPI based algorithms are reported for various size problems and number of processors on a sixteen node IBM SP2. Copyright © 2001 John Wiley & Sons, Ltd.
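As a hedged sketch of the red-black idea only (Stone's SIP update itself is more involved than this), checkerboard colouring makes every update of one colour depend only on points of the other colour, so each half-sweep is fully parallel and easy to distribute:

```c
/* Minimal red-black Gauss-Seidel-style sweep on an (n x n) interior grid
 * (a sketch of the ordering idea only; the paper's SIP update differs).
 * Points with (i + j) even are "red", odd are "black": every red update
 * reads only black neighbours and vice versa, so all updates of one
 * colour are independent and can run in parallel. */
void red_black_sweep(double **u, double **f, int n, double h2) {
    for (int colour = 0; colour < 2; ++colour) {   /* 0 = red, 1 = black */
        for (int i = 1; i <= n; ++i) {
            int jstart = 2 - ((i + colour) % 2);   /* first j of this colour */
            for (int j = jstart; j <= n; j += 2) {
                u[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] +
                                  u[i][j-1] + u[i][j+1] - h2 * f[i][j]);
            }
        }
    }
}
```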

5.
The time‐dependent Maxwell equations are one of the most important approaches to describing dynamic or wide‐band frequency electromagnetic phenomena. A sequential finite‐volume, characteristic‐based procedure for solving the time‐dependent, three‐dimensional Maxwell equations had previously been implemented successfully in Fortran. Because of its large memory requirements and high demand on CPU time, the sequential code cannot be tested on large arrays, so it is essential to implement the code on a parallel computing system. In this paper, we discuss an efficient and scalable parallelization of the sequential Fortran time‐dependent Maxwell equations solver using High Performance Fortran (HPF). The background to the project, the theory behind the efficiency being achieved, the parallelization methodologies employed and the experimental results obtained on the Cray T3E massively parallel computing system are described in detail. Experimental runs show that the execution time is reduced drastically through parallel computing. The code is scalable up to 98 processors on the Cray T3E and has a performance similar to that of an MPI implementation. Based on the experimentation carried out in this research, we believe that a high‐level parallel programming language such as HPF is a fast, viable and economical approach to parallelizing many existing sequential codes which exhibit a lot of parallelism. Copyright © 2003 John Wiley & Sons, Ltd.

6.
In the 1990s the Message Passing Interface Forum defined MPI bindings for Fortran, C, and C++. With the success of MPI, these relatively conservative languages have continued to dominate in the parallel computing community. There are compelling arguments in favour of more modern languages like Java, including portability, better runtime error checking, modularity, and multi‐threading. But these arguments have not converted many HPC programmers, perhaps due to the scarcity of full‐scale scientific Java codes and the lack of evidence for performance competitive with C or Fortran. This paper tries to redress this situation by porting two scientific applications to Java. Both of these applications are parallelized using our thread‐safe Java messaging system—MPJ Express. The first application is the Gadget‐2 code, a massively parallel structure formation code for cosmological simulations. The second application uses the finite‐difference time‐domain method for simulations in the area of computational electromagnetics. We evaluate and compare the performance of the Java and C versions of these two scientific applications, and demonstrate that the Java codes can achieve performance comparable with legacy applications written in conventional HPC languages. Copyright © 2009 John Wiley & Sons, Ltd.

7.
The message-passing interface (MPI) has become the standard in achieving effective results when using the message passing paradigm of parallelization. Codes written using MPI are extremely portable and are applicable to both clusters and massively parallel computing platforms. Since MPI uses the single program, multiple data (SPMD) approach to parallelism, good performance requires careful tuning of the serial code as well as careful data and control flow analysis to limit communication. We discuss optimization strategies used and their degree of success to increase performance of an MPI-based unstructured finite element simulation code written in Fortran 90. We discuss performance results based on implementations using several modern massively parallel computing platforms including the SGI Origin 3800, IBM Nighthawk 2 SMP, and Cray T3E-1200.
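One standard strategy of the kind the abstract alludes to, shown here as an illustrative sketch rather than the paper's actual code, is to overlap halo exchange with interior computation using nonblocking MPI calls; the names (halo_send, interior work, etc.) are assumptions for the example:

```c
/* Hedged sketch of a common MPI tuning pattern for limiting the cost of
 * communication: post nonblocking halo exchanges, compute on interior
 * points that need no remote data while messages are in flight, then
 * wait and finish the boundary. */
#include <mpi.h>

void exchange_and_compute(double *halo_send, double *halo_recv, int n,
                          int left, int right, MPI_Comm comm) {
    MPI_Request reqs[4];
    MPI_Irecv(halo_recv,     n, MPI_DOUBLE, left,  0, comm, &reqs[0]);
    MPI_Irecv(halo_recv + n, n, MPI_DOUBLE, right, 1, comm, &reqs[1]);
    MPI_Isend(halo_send,     n, MPI_DOUBLE, left,  1, comm, &reqs[2]);
    MPI_Isend(halo_send + n, n, MPI_DOUBLE, right, 0, comm, &reqs[3]);

    /* ... compute on interior cells while the messages are in flight ... */

    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
    /* ... now compute on the boundary cells that read halo_recv ... */
}
```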

8.
Arrays that are distributed in a block-cyclic fashion are important for many applications in the computational sciences since they often lead to parallel algorithms with good load balancing properties. We consider the problem of redistributing such an array to a new block size. This operation is directly expressible in High Performance Fortran (HPF) and will arise in applications written in this language. Efficient message passing algorithms are given for the redistribution operation, expressed in the standardized message passing interface, MPI. The algorithms are analyzed and performance results from the IBM SP-1 and Intel Paragon are given and discussed. The results show that redistribution can be done in time comparable to other collective communication operations, such as broadcast and MPI_ALLTOALL.
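For context, a minimal sketch (not the paper's algorithm) of the ownership map that drives such a redistribution: under a block-cyclic distribution with block size b over P processes, global element i resides on process (i / b) mod P, so changing the block size reduces to an all-to-all exchange (e.g. MPI_Alltoallv) whose counts follow from this map. The sizes below are made-up examples.

```c
/* Sketch of the block-cyclic ownership map. Tallying, for every element,
 * its old and new owner yields exactly the per-destination message sizes
 * that would become the sendcounts[] of an MPI_Alltoallv. */
#include <stdio.h>

static int owner(long i, int b, int P) { return (int)((i / b) % P); }

int main(void) {
    const int P = 4, b_old = 3, b_new = 5;   /* example sizes */
    long counts[4][4] = {{0}};               /* counts[src][dst] */

    for (long i = 0; i < 60; ++i)
        counts[owner(i, b_old, P)][owner(i, b_new, P)]++;

    for (int s = 0; s < P; ++s)
        for (int d = 0; d < P; ++d)
            printf("proc %d -> proc %d : %ld elements\n", s, d, counts[s][d]);
    return 0;
}
```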

9.
Parallel simulation of parallel programs for large datasets has been shown to offer significant reduction in the execution time of many discrete event models. The paper describes the design and implementation of MPI-SIM, a library for the execution driven parallel simulation of task and data parallel programs. MPI-SIM can be used to predict the performance of existing programs written using MPI for message passing, or written in UC, a data parallel language, compiled to use message passing. The simulation models can be executed sequentially or in parallel. Parallel execution of the models is synchronized using a set of asynchronous conservative protocols. The paper demonstrates how protocol performance is improved by the use of application-level, runtime analysis. The analysis targets the communication patterns of the application. We show the application-level analysis for message passing and data parallel languages. We present the validation and performance results for the simulator for a set of applications that include the NAS Parallel Benchmark suite. The application-level optimization described in the paper yielded significant performance improvements in the simulation of parallel programs, and in some cases completely eliminated the synchronizations in the parallel execution of the simulation model.

10.
Three parallel implementations of a divide‐and‐conquer search algorithm (called SUDA2) for finding minimal unique itemsets (MUIs) are compared in this paper. The identification of MUIs is used by national statistics agencies for statistical disclosure assessment. The first parallel implementation adapts SUDA2 to a symmetric multi‐processor cluster using the message passing interface (MPI), which we call an MPI cluster; the second optimizes the code for the Cray MTA2 (a shared‐memory, multi‐threaded architecture) and the third uses a heterogeneous ‘group’ of workstations connected by LAN. Each implementation considers the parallel structure of SUDA2, and how the subsearch computation times and sequence of subsearches affect load balancing. All three approaches scale with the number of processors, enabling SUDA2 to handle larger problems than before. For example, the MPI implementation is able to achieve nearly two orders of magnitude improvement with 132 processors. Performance results are given for a number of data sets. Copyright © 2009 John Wiley & Sons, Ltd.

11.
The purpose of this paper is to investigate the scalability and performance of seven simple OpenMP test programs and to compare their performance with equivalent MPI programs on an SGI Origin 2000. Data distribution directives were used to make sure that the OpenMP implementation had the same data distribution as the MPI implementation. For the matrix‐times‐vector (test 5) and the matrix‐times‐matrix (test 7) tests, the syntax allowed in OpenMP 1.1 prevents OpenMP compilers from generating efficient code, since the reduction clause is not currently allowed for arrays. (This problem is corrected in OpenMP 2.0.) For the remaining five tests, the OpenMP version performed and scaled significantly better than the corresponding MPI implementation, except for the right shift test (test 2) for a small message. Copyright © 2001 John Wiley & Sons, Ltd.
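To illustrate the limitation (a sketch, not the paper's Fortran code): the matrix-times-vector test needs a reduction over the whole result vector. The C array-section syntax below requires OpenMP 4.5 or later; under OpenMP 1.1 each thread instead needed a hand-managed private copy of y merged at the end, which compilers could not optimize as well.

```c
/* Hedged C sketch of the matrix-times-vector reduction issue (test 5).
 * Parallelizing over columns means every thread adds into all of y[],
 * hence the whole array must be a reduction variable. The array-section
 * reduction below requires OpenMP 4.5+ in C. */
#include <omp.h>

void matvec(int n, const double *A, const double *x, double *y) {
    for (int i = 0; i < n; ++i) y[i] = 0.0;

    #pragma omp parallel for reduction(+ : y[:n])
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            y[i] += A[i * n + j] * x[j];
}
```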

12.
A Comparison of JIAJIA and MPI on PC Clusters
This paper compares JIAJIA with MPI (message passing interface); JIAJIA and MPI represent the shared-memory and message-passing programming models, respectively. MPI transfers data explicitly, which makes programming complex; JIAJIA maintains data consistency in the underlying layer and additionally provides simple message-passing functions, making programming easy and flexible. JIAJIA incurs considerable overhead when allocating shared memory, so its initialization time is longer than MPI's. An approximate empirical formula relating parallel speedup to the number of processes is proposed, from which it is concluded that the performance gap between JIAJIA and MPI widens as the number of processes grows. Test results show that for most applications the parallel performance gap between the JIAJIA and MPI versions is within 10%. For applications with little communication, the gap between JIAJIA and MPI is small; for communication-intensive applications, the gap depends mainly on the actual communication volume generated at run time.

13.
This paper considers a model of a parallel program that can be interpreted efficiently on an instrumental (host) computer, providing a sufficiently accurate prediction of the actual execution time of the parallel program on a given parallel computational system. The model is designed for parallel programs with explicit message passing written in Java with calls to the MPI library and is a part of the ParJava environment. The model is obtained by transforming the program control tree, which, for Java programs, can be constructed by modifying the abstract syntax tree. The communication functions are simulated on the basis of the LogGP model, which makes it possible to take into account specific features of the distributed computational system.
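For reference, the textbook LogGP cost of a single k-byte message (the standard formulation of the model the abstract names; ParJava's exact instantiation may differ) is

    T(k) = o_send + (k - 1)·G + L + o_recv

where L is the network latency, o_send and o_recv are the per-message CPU overheads on the sender and receiver, G is the gap per byte (the reciprocal of bandwidth), and the remaining parameter g bounds the rate at which consecutive short messages can be issued.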

14.
In this paper, we present Jcluster, an efficient Java parallel environment that provides some critical services, in particular automatic load balancing and high‐performance communication, for developing parallel applications in Java on a large‐scale heterogeneous cluster. In the Jcluster environment, we implement a task scheduler based on a transitive random stealing (TRS) algorithm. Performance evaluations show that, on a large‐scale cluster, the TRS‐based scheduler lets any idle node obtain a task from another node with far fewer steal attempts than random stealing (RS), a well‐known dynamic load‐balancing algorithm. For communication performance, using asynchronous multithreaded transmission, we implement a high‐performance PVM‐like and MPI‐like message‐passing interface in pure Java. The communication performance is evaluated across the Jcluster environment, LAM‐MPI and mpiJava on LAM‐MPI using the Java Grande Forum's pingpong benchmark. Copyright © 2005 John Wiley & Sons, Ltd.

15.
The Virtual Programming Laboratory (VPL) is a Web-based virtual programming environment built based on a client–server architecture. The system can be accessed on any platform (Unix, PC or Mac) using a standard Java-enabled browser. Software delivery over the Web imposes a novel set of constraints on design. We outline the tradeoffs in this design space, motivate the choices necessary to deliver an application, and detail the lessons learned in the process. We discuss the role of Java and other Web technologies in the realization of the design. VPL facilitates the development and execution of parallel programs. The initial prototype supports high-level parallel programming based on Fortran 90 and High Performance Fortran (HPF), as well as explicit low-level programming with the MPI message-passing interface. Supplementary Java-based platform-independent tools for data and performance visualization are an integral part of the VPL. Pablo SDDF trace files generated by the Pablo performance instrumentation system are used for post-mortem performance visualization. © 1997 John Wiley & Sons, Ltd.

16.
Wan Benting, Zhong Yuansheng, Chen Ming. Computer Engineering and Design, 2006, 27(22): 4207-4209, 4270
In heterogeneous environments, fast, secure and reliable file transfer between computers is essential, yet many MPI-based distributed parallel development environments provide no parallel file-transfer service between computers. This paper proposes an MPI-based parallel file-transfer service that supports multiple transfer tools: MPI-TFTS (message passing interface tools file transfer service). MPI-TFTS adopts a three-layer architecture and integrates multiple transfer tools, so that files can be transferred in parallel using either MPI communication primitives or existing transfer tools. When the environment is known, users can select the better transfer tool to reduce file-transfer time. Finally, parallel implementations of two file-transfer services are given.

17.
Current technologies allow efficient data collection by several sensors to produce an overall evaluation of the status of a cluster. However, no previous work of which we are aware analyzes the behavior of the parallel programs themselves in real time. In this paper, we compare different artificial intelligence techniques that can be used to implement a lightweight monitoring and analysis system for parallel applications on a cluster of Linux workstations. We study the accuracy and performance of deterministic and stochastic algorithms when we observe the flow of both library‐function and operating‐system calls of parallel programs written with C and MPI. We demonstrate that monitoring of MPI programs can be achieved in real time with high accuracy and, in some cases, with a false‐positive rate near 0%, and we show that the added computational load on each node is small. As an example, the monitoring of function calls using a hidden Markov model generates less than 5% overhead. The proposed system is able to automatically detect deviations of a process from its expected behavior in any node of the cluster, and thus it can be used as an anomaly detector, for performance monitoring to complement other systems, or as a debugging tool. Copyright © 2005 John Wiley & Sons, Ltd.
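As an illustrative sketch only (not the paper's system): monitoring call sequences with a hidden Markov model typically means scoring each observed window with the forward algorithm and flagging an anomaly when the per-symbol log-likelihood drops below a trained threshold. The model sizes here are made-up examples.

```c
/* Scaled forward algorithm: score a window of observed call IDs against
 * an HMM (pi = initial, A = transition, B = emission probabilities).
 * A low per-symbol log-likelihood signals a deviation from the process's
 * expected behavior. */
#include <math.h>

#define NSTATES 3
#define NSYMS   4   /* e.g. 4 distinct monitored library calls */

double per_symbol_loglik(const double pi[NSTATES],
                         const double A[NSTATES][NSTATES],
                         const double B[NSTATES][NSYMS],
                         const int *obs, int T) {
    double alpha[NSTATES], next[NSTATES], ll = 0.0;
    for (int s = 0; s < NSTATES; ++s) alpha[s] = pi[s] * B[s][obs[0]];
    for (int t = 1; t <= T; ++t) {
        double scale = 0.0;                 /* rescale to avoid underflow */
        for (int s = 0; s < NSTATES; ++s) scale += alpha[s];
        ll += log(scale);
        if (t == T) break;
        for (int s = 0; s < NSTATES; ++s) alpha[s] /= scale;
        for (int j = 0; j < NSTATES; ++j) {
            next[j] = 0.0;
            for (int s = 0; s < NSTATES; ++s) next[j] += alpha[s] * A[s][j];
            next[j] *= B[j][obs[t]];
        }
        for (int j = 0; j < NSTATES; ++j) alpha[j] = next[j];
    }
    return ll / T;    /* compare against a trained anomaly threshold */
}
```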

18.
This paper introduces DCDB, a parallel debugger designed and implemented for the Dawning cluster system. DCDB supports debugging parallel applications written in C or Fortran using either MPI or PVM, implements record/replay parallel debugging, supports cyclic debugging, and thereby resolves the nondeterminism problem of parallel programs during debugging. DCDB adopts a Client/Server/Client architecture with a friendly graphical user interface; the system is developed mainly in Java, giving it good portability and extensibility.

19.
Large scale parallel programming projects may become heterogeneous in both language and architectural model. We propose that skeletal programming techniques can alleviate some of the costs involved in designing and porting such programs, illustrating our approach with a simple program which combines shared memory and message passing code. We introduce Activity Graphs as a simple and practical means of capturing model independent aspects of the operational semantics of skeletal parallel programs. They are independent of low level details of parallel implementation and so can act as an intermediate layer for compilation to diverse underlying models. Activity graphs provide a notion of parallel activities, dependencies between activities, and the process groupings within which these take place. The compilation process uses a set of graph generators (templates) to derive the activity graph. We describe simple schemes for transforming activity graphs into message passing programs, targeting both MPI and BSP.

20.
Most message passing parallel programs employ logical process topologies with regular characteristics to support their computation. Since process topologies define the relationship between processes, they present an excellent opportunity for debugging. The primary benefit is that process behaviours can be correlated, allowing expected behaviour to be abstracted and identified, and undesirable behaviour reported. However, topology support is inadequate in most message passing parallel programming environments, including the popular Message Passing Interface (MPI) and the Parallel Virtual Machine (PVM). Programmers are forced to implement topology support themselves, increasing the possibility of introducing errors. This paper proposes a trace‐ and topology‐based approach to parallel program debugging, driven by four distinct types of specifications. Trace specifications allow trace data from a variety of sources and message passing libraries to be interpreted in an abstract manner, and topology specifications address the lack of explicit topology knowledge, whilst also facilitating the construction of user‐consistent views of the debugging activity. Loop specifications express topology‐consistent patterns of expected trace events, allowing conformance testing of associated trace data, and error specifications specify undesirable event interactions, including mismatched message sizes and mismatched communication pairs. Both loop and error specifications are simplified by having knowledge of the actual topologies being debugged. The proposed debugging framework enables a wealth of potential debugging views and techniques. Copyright © 2004 John Wiley & Sons, Ltd.
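For comparison, the topology support MPI does provide is the MPI_Cart_* family of virtual-topology calls, which name neighbours but, as the abstract argues, record nothing a debugger can later correlate traces against. A minimal sketch of what is available:

```c
/* Minimal sketch of MPI's built-in virtual-topology support. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int dims[2] = {0, 0}, periods[2] = {1, 1}, coords[2];
    int size, rank, left, right;
    MPI_Comm cart;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Dims_create(size, 2, dims);             /* factor size into a grid */
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);

    MPI_Comm_rank(cart, &rank);
    MPI_Cart_coords(cart, rank, 2, coords);
    MPI_Cart_shift(cart, 0, 1, &left, &right);  /* neighbours along dim 0 */
    printf("rank %d at (%d,%d): left=%d right=%d\n",
           rank, coords[0], coords[1], left, right);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}
```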
