Similar Literature
20 similar documents found (search time: 31 ms)
1.
We discuss the computational bottlenecks in molecular dynamics (MD) and describe the challenges in parallelizing the computation-intensive tasks. We present a hybrid algorithm using MPI (Message Passing Interface) with OpenMP threads for parallelizing a generalized MD computation scheme for systems with short-range interatomic interactions. The algorithm is discussed in the context of nano-indentation of chromium films with carbon indenters, using the Embedded Atom Method potential for Cr–Cr interactions and the Morse potential for Cr–C interactions. We study the performance of our algorithm for a range of MPI–thread combinations and find the performance to depend strongly on the computational task and on load sharing in the multi-core processor. The algorithm scaled poorly with pure MPI, and our hybrid schemes outperformed the pure message-passing scheme despite utilizing the same number of processors or cores in the cluster. The speed-up achieved by our algorithm compared favorably with that achieved by standard MD packages.
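As an illustration of the hybrid scheme this abstract describes, the sketch below combines MPI ranks with OpenMP threads around a short-range pair loop. It is a minimal sketch, not the authors' code: the atom count and the pair_force() kernel are hypothetical stand-ins for the EAM and Morse potentials.

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N_LOCAL 4096   /* atoms owned by this MPI rank (assumed) */

static double pair_force(int i, int j)   /* stand-in for the EAM/Morse kernels */
{
    return 1.0 / (double)(abs(i - j) + 1);
}

int main(int argc, char **argv)
{
    int provided, rank;
    /* FUNNELED: only the main thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = 0.0, total;

    /* OpenMP threads share the pair loop inside each MPI rank. */
    #pragma omp parallel for reduction(+:local) schedule(dynamic)
    for (int i = 0; i < N_LOCAL; i++)
        for (int j = i + 1; j < N_LOCAL; j++)   /* a neighbour list in practice */
            local += pair_force(i, j);

    /* One collective per step combines the per-rank partial sums. */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("total = %g\n", total);

    MPI_Finalize();
    return 0;
}
```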

2.
The Message-Passing Interface (MPI) is commonly used to write parallel programs for distributed memory parallel computers. MPI-CHECK is a tool developed to aid in the debugging of MPI programs that are written in free or fixed format Fortran 90 and Fortran 77. This paper presents the methods used in MPI-CHECK 2.0 to detect many situations where actual and potential deadlocks occur when using blocking and non-blocking point-to-point routines as well as when using collective routines. Copyright © 2002 John Wiley & Sons, Ltd.
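MPI-CHECK targets Fortran, but the class of bug it looks for is language-independent. The C sketch below shows the canonical case: two ranks that both enter a blocking MPI_Send may deadlock once messages exceed the library's internal buffering, and a combined MPI_Sendrecv removes the hazard.

```c
#include <mpi.h>

#define N (1 << 16)   /* large enough to defeat eager buffering */

static double out[N], in[N];

int main(int argc, char **argv)
{
    int rank, peer;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;   /* run with exactly 2 ranks */
    for (int i = 0; i < N; i++) out[i] = rank;

    /* Potential deadlock (the pattern a checker should flag):
     *   MPI_Send(out, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
     *   MPI_Recv(in,  N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
     * Both ranks block in MPI_Send and neither reaches MPI_Recv. */

    /* Safe alternative: the combined call cannot deadlock. */
    MPI_Sendrecv(out, N, MPI_DOUBLE, peer, 0,
                 in,  N, MPI_DOUBLE, peer, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}
```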

3.
MPI (Message Passing Interface) is one of the standards for message-passing parallel programming. This paper outlines the concept and components of MPI, focusing on the message-passing interface that supports parallel programming and on methods for designing parallel programs in the MPI environment. An example MPI parallel program is given to illustrate the MPI programming workflow and its relationship to ordinary serial program design.
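For readers unfamiliar with the flow this abstract outlines, here is a minimal C example of the same structure: initialise, determine rank and size, exchange one message, finalise. Apart from the MPI calls, it reads like an ordinary serial program.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, token = 42;

    MPI_Init(&argc, &argv);                 /* enter the MPI environment */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* who am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many of us? */

    if (rank == 0 && size > 1) {
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", token);
    }

    MPI_Finalize();                         /* leave the MPI environment */
    return 0;
}
```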

4.
Since its introduction in 1993, the Message Passing Interface (MPI) has become a de facto standard for writing High Performance Computing (HPC) applications on clusters and Massively Parallel Processors (MPPs). The recent emergence of multi-core processor systems presents a new challenge for established parallel programming paradigms, including those based on MPI. This paper presents a new Java messaging system called MPJ Express. Using this system, we exploit multiple levels of parallelism (messaging and threading) to improve application performance on multi-core processors. We refer to our approach as nested parallelism. This MPI-like Java library can support nested parallelism by using Java or Java OpenMP (JOMP) threads within an MPJ Express process. The practicality of this approach is assessed by porting to Java a massively parallel structure formation code from cosmology called Gadget-2. We introduce nested parallelism in the Java version of the simulation code and report good speed-ups. To the best of our knowledge, this is the first time this kind of hybrid parallelism has been demonstrated in a high-performance Java application.
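MPJ Express is a Java library, so no C code can show its API; what can be sketched in C is the startup negotiation that any "threads inside a message-passing process" design depends on. A sketch, assuming the standard MPI-2 thread-support levels:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;

    /* Request full thread support; the library may grant less. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    if (provided < MPI_THREAD_FUNNELED)
        fprintf(stderr, "warning: no usable thread support\n");

    /* ... spawn OpenMP or pthread workers inside this process here,
     *     mirroring JOMP threads inside an MPJ Express process ... */

    MPI_Finalize();
    return 0;
}
```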

5.
MPI for Python     
MPI for Python provides bindings of the Message Passing Interface (MPI) standard for the Python programming language and allows any Python program to exploit multiple processors. This package is constructed on top of the MPI-1 specification and defines an object-oriented interface which closely follows the MPI-2 C++ bindings. It supports point-to-point (sends, receives) and collective (broadcasts, scatters, gathers) communications of general Python objects. Efficiency has been tested in a Beowulf-class cluster and satisfactory results were obtained. MPI for Python is open source and available for download on the web (http://www.cimec.org.ar/python).
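mpi4py's collective operations track the underlying MPI standard closely, so a C sketch of the calls they wrap conveys the semantics: a broadcast distributes a value from the root, and a gather collects one result per rank. Buffer sizes here are illustrative.

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, value = 0;
    int *all = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) value = 7;                         /* root sets the data */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD); /* everyone gets it   */

    value += rank;                                    /* per-rank work      */
    if (rank == 0) all = malloc(size * sizeof(int));
    MPI_Gather(&value, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) free(all);
    MPI_Finalize();
    return 0;
}
```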

6.
We describe QSATS, a parallel code for performing variational path integral simulations of the quantum mechanical ground state of monatomic solids. QSATS is designed to treat Boltzmann quantum solids, in which individual atoms are permanently associated with distinguishable crystal lattice sites and undergo large-amplitude zero-point motions around these sites. We demonstrate the capabilities of QSATS by using it to compute the total energy and potential energy of hexagonal close-packed solid 4He at a fixed density.

Program summary

Program title: QSATS
Catalogue identifier: AEJE_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEJE_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 7329
No. of bytes in distributed program, including test data, etc.: 61 685
Distribution format: tar.gz
Programming language: Fortran 77
Computer: QSATS should execute on any distributed parallel computing system that has the Message Passing Interface (MPI) [1] libraries installed.
Operating system: Unix or Linux.
Has the code been vectorized or parallelized?: Yes, parallelized using MPI [1].
RAM: The memory requirements of QSATS depend on both the number of atoms in the crystal and the number of replicas in the variational path integral chain. For parameter sets A and C (described in the long write-up), approximately 4.5 Mbytes and 12 Mbytes, respectively, are required for data storage by QSATS (exclusive of the executable code).
Classification: 7.7, 16.13
External routines: Message Passing Interface (MPI) [1]
Nature of problem: QSATS simulates the quantum mechanical ground state for a monatomic crystal characterized by large-amplitude zero-point motions of individual (distinguishable) atoms around their nominal lattice sites.
Solution method: QSATS employs variational path integral quantum Monte Carlo techniques to project the system's ground state wave function out of a suitably chosen trial wave function.
Restrictions: QSATS neglects quantum statistical effects associated with the exchange of identical particles. As distributed, QSATS assumes that the potential energy function for the crystal is a pairwise additive sum of atom–atom interactions.
Additional comments: An auxiliary program, ELOC, is provided that uses the output generated by QSATS to compute both the crystal's ground state energy and the expectation value of the crystal's potential energy. End users can modify ELOC as needed to compute the expectation value of other coordinate-space observables.
Running time: QSATS requires roughly 3 hours to run a simulation using parameter set A on a cluster of 12 Xeon processors with clock speed 2.8 GHz. Roughly 15 hours are needed to run a simulation using parameter set C on the same cluster.
References:
[1] For information about MPI, visit http://www.mcs.anl.gov/mpi/.

7.
郭东亮  张立臣 《微机发展》2004,14(10):31-33,36
The Real-Time Message Passing Interface (MPI/RT) extends the Message Passing Interface (MPI) of high-performance computing into the real-time domain, supporting real-time communication and the development of real-time systems. This paper introduces the background of MPI/RT, the basic principles behind its development, and its key techniques and concepts. It describes how MPI/RT supports three types of communication (two-sided, one-sided, and zero-sided), three real-time paradigms (time-driven, event-driven, and priority-driven), and combinations of these paradigms. It discusses how MPI/RT specifies and satisfies the QoS requirements of high-performance real-time systems. Finally, a general process for developing high-performance real-time systems with MPI/RT is given.

8.
In the 1990s the Message Passing Interface Forum defined MPI bindings for Fortran, C, and C++. With the success of MPI, these relatively conservative languages have continued to dominate in the parallel computing community. There are compelling arguments in favour of more modern languages like Java, including portability, better runtime error checking, modularity, and multi-threading. But these arguments have not converted many HPC programmers, perhaps due to the scarcity of full-scale scientific Java codes and the lack of evidence for performance competitive with C or Fortran. This paper tries to redress this situation by porting two scientific applications to Java. Both of these applications are parallelized using our thread-safe Java messaging system, MPJ Express. The first application is the Gadget-2 code, which is a massively parallel structure formation code for cosmological simulations. The second application uses the finite-difference time-domain method for simulations in the area of computational electromagnetics. We evaluate and compare the performance of the Java and C versions of these two scientific applications, and demonstrate that the Java codes can achieve performance comparable with legacy applications written in conventional HPC languages. Copyright © 2009 John Wiley & Sons, Ltd.

9.
The major contribution of this paper is the application of modern analysis techniques to the important Message Passing Interface standard, work done in order to obtain information useful in designing both application programmer interfaces for object-oriented languages and message passing systems. Recognition of 'Design Patterns' within MPI is an important insight of this work. A further contribution is a comparative discussion of the design and evolution of three actual object-oriented designs for the Message Passing Interface (MPI-1) application programmer interface (API), two of which have influenced the standardization of C++ explicit parallel programming with MPI-2, and which strongly indicate the value of a priori object-oriented design and analysis of such APIs. Knowledge of design patterns is assumed herein. The discussion provided here includes systems developed at Mississippi State University (MPI++), the University of Notre Dame (OOMPI), and the merger of these systems that results in a standard binding within the MPI-2 standard. Commentary concerning additional opportunities for further object-oriented analysis and design of message passing systems and APIs, such as MPI-2 and MPI/RT, is offered in conclusion. Connecting modern software design and engineering principles to high performance computing programming approaches is a further new and important contribution of this work. Copyright © 2001 John Wiley & Sons, Ltd.

10.
The detection, in a modern interferometric detector like Virgo, of a gravitational wave signal from a coalescing binary stellar system is a computationally intensive task for both the on-line and off-line computer systems. A parallel computing scheme using the Message Passing Interface (MPI) is described. Performance results on a small-scale cluster are reported.

11.
In this paper, a parallel implementation of the Iterative Alternating Direction Explicit method of D'Yakonov (IADE-DY) for solving a 2-D telegraphic problem on a distributed system, using both the Message Passing Interface (MPI) and the Parallel Virtual Machine (PVM), is presented. The program is parallelized by a domain decomposition strategy, employing a Single Program Multiple Data (SPMD) model. The implementation is discussed in terms of parallel performance strategies and their analysis. The model overlaps communication with computation to avoid unnecessary synchronization; hence, the method yields significant speedup. The observed speedup as the mesh is refined lies in the range of 5–10%, and the improvements are documented in the tables and figures of our experiments. We present some analyses that are helpful for understanding speedup and efficiency. We conclude that efficiency depends strongly on the grid size, the number of blocks, and the number of processors, for both MPI and PVM. Different strategies to improve computational efficiency are proposed.
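The domain decomposition strategy described here has a standard communication skeleton: each rank owns a strip of the grid and swaps one-row "halos" with its neighbours before every explicit update. The sketch below shows that skeleton only; grid dimensions are assumed and the IADE-DY update itself is left as a comment.

```c
#include <mpi.h>

#define NX 256   /* local rows per rank, plus 2 halo rows (assumed) */
#define NY 256

int main(int argc, char **argv)
{
    static double u[NX + 2][NY];   /* rows 1..NX owned; 0 and NX+1 are halos */
    int rank, size, up, down;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    for (int step = 0; step < 100; step++) {
        /* Exchange boundary rows; MPI_PROC_NULL makes the edges no-ops. */
        MPI_Sendrecv(u[1],      NY, MPI_DOUBLE, up,   0,
                     u[NX + 1], NY, MPI_DOUBLE, down, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(u[NX],     NY, MPI_DOUBLE, down, 1,
                     u[0],      NY, MPI_DOUBLE, up,   1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* ... apply the explicit IADE-DY update to rows 1..NX here ... */
    }

    MPI_Finalize();
    return 0;
}
```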

12.
In this study, parallel computing technology is applied to the simulation of a wind turbine flow problem. A third-order Roe-type flux-limited splitting based on a preconditioning matrix, with an explicit time-marching method, is used to solve the Navier–Stokes equations. The original FORTRAN code was parallelized with the Message Passing Interface (MPI) and tested on a 64-CPU IBM SP2 parallel computer. The test results show a significant reduction in computing time, with a super-linear speed-up achieved for up to 32 CPUs on the IBM SP2. The speed-up is as high as 49 when using 64 IBM SP2 processors. The tests show the very promising potential of parallel processing to provide prompt simulation of current wind turbine problems.

13.
This paper describes the Real-Time Message Passing Interface (MPI/RT) standard. MPI/RT is a communication-layer middleware standard whose main goal is to provide quality-of-service (QoS) guarantees for data transfer over high-performance networks.

14.
This paper briefly introduces cluster systems and the principles by which they are used for parallel computing, focusing on the MPI parallel environment and its communication techniques. It analyzes the basic patterns in MPI parallel programs and the communication techniques they employ, and closes with an outlook on cluster systems built around the MPI parallel environment.
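One of the basic patterns such an overview typically covers is the master-worker scheme, sketched below: rank 0 distributes one task to each worker and collects the results. The task payload is a placeholder.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                          /* master */
        int result, done = 0;
        for (int w = 1; w < size; w++)
            MPI_Send(&w, 1, MPI_INT, w, 0, MPI_COMM_WORLD);  /* a task id */
        while (done < size - 1) {
            MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            done++;
        }
        printf("all workers finished\n");
    } else {                                  /* worker */
        int task, result;
        MPI_Recv(&task, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        result = task * task;                 /* stand-in computation */
        MPI_Send(&result, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```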

15.
We describe a parallelised version of the MOLDY molecular dynamics program. This Fortran code is aimed at systems which may be described by short-range potentials and specifically those which may be addressed with the embedded atom method. This includes a wide range of transition metals and alloys. MOLDY provides a range of options in terms of the molecular dynamics ensemble used and the boundary conditions which may be applied. A number of standard potentials are provided, and the modular structure of the code allows new potentials to be added easily. The code is parallelised using OpenMP and can therefore be run on shared memory systems, including modern multicore processors. Particular attention is paid to the updates required in the main force loop, where synchronisation is often required in OpenMP implementations of molecular dynamics. We examine the performance of the parallel code in detail and give some examples of applications to realistic problems, including the dynamic compression of copper and carbon migration in an iron–carbon alloy.
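The force-loop synchronisation issue mentioned above can be made concrete. MOLDY itself is Fortran; the C sketch below shows why updates need protection in a thread-parallel pair loop: even though each thread owns a range of outer indices i, another thread's inner index j can coincide with this i, so both accumulations are guarded with atomics (per-thread accumulation arrays with a final reduction are the common alternative).

```c
#include <omp.h>
#include <stdlib.h>

#define N 10000

int main(void)
{
    double *f = calloc(N, sizeof(double));   /* forces, zero-initialised */

    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < N; i++) {
        for (int j = i + 1; j < N; j++) {
            double fij = 1.0 / (double)(j - i + 1);   /* stand-in pair force */
            #pragma omp atomic
            f[i] += fij;      /* another thread's j may equal this i ...     */
            #pragma omp atomic
            f[j] -= fij;      /* ... and this j lies in another thread's i range */
        }
    }

    free(f);
    return 0;
}
```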

Program summary

Program title: MOLDY
Catalogue identifier: AEJU_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEJU_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GNU General Public License version 2
No. of lines in distributed program, including test data, etc.: 382 881
No. of bytes in distributed program, including test data, etc.: 6 705 242
Distribution format: tar.gz
Programming language: Fortran 95/OpenMP
Computer: Any
Operating system: Any
Has the code been vectorised or parallelized?: Yes. OpenMP is required for parallel execution
RAM: 100 MB or more
Classification: 7.7
Nature of problem: MOLDY addresses the problem of many atoms (of order 10^6) interacting via a classical interatomic potential on a timescale of microseconds. It is designed for problems where statistics must be gathered over a number of equivalent runs, such as measuring thermodynamic properties, diffusion, radiation damage, fracture, twinning deformation, nucleation and growth of phase transitions, sputtering, etc. In the vast majority of materials, the interactions are non-pairwise, and the code must be able to deal with many-body forces.
Solution method: Molecular dynamics involves integrating Newton's equations of motion. MOLDY uses Verlet (for good energy conservation) or predictor–corrector (for accurate trajectories) algorithms. It is parallelised using OpenMP. It also includes a static minimisation routine to find the lowest-energy structure. Boundary conditions for surfaces, clusters, grain boundaries, a thermostat (Nosé), a barostat (Parrinello–Rahman), and externally applied strain are provided. The initial configuration can be either a repeated unit cell or have all atoms given explicitly. Initial velocities are generated internally, but it is also possible to specify the velocity of a particular atom. A wide range of interatomic force models are implemented, including embedded atom, Morse, and Lennard-Jones. Thus the program is especially well suited to calculations on metals.
Restrictions: The code is designed for short-ranged potentials, and there is no Ewald sum. Thus for long-range interactions where all particles interact with all others, the order-N scaling will fail. Different interatomic potential forms require recompilation of the code.
Additional comments: There is a set of associated open-source analysis software for postprocessing and visualisation. This includes local crystal structure recognition and identification of topological defects.
Running time: A set of test modules for running time are provided. The code scales as order N. The parallelisation shows near-linear scaling with the number of processors in a shared-memory environment. A typical run of a few tens of nanometres for a few nanoseconds will run on a timescale of days on a multiprocessor desktop.

16.
Collective communication in MPI-based parallel computing and its application
The serial LSQR algorithm, which can efficiently solve large sparse matrix equations, is analyzed for parallelization. A parallel LSQR algorithm is then designed and implemented on a distributed-memory parallel system using the collective communication mechanisms of MPI, the portable message-passing standard. The parallel algorithm and program have been applied effectively to tomographic inversion of near-surface seismic models.
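The collective at the heart of a distributed LSQR-type solver is the global dot product: each rank holds a slice of the vectors and the partial sums are combined with MPI_Allreduce, since every rank needs the scalar for its next iteration. A minimal sketch with placeholder data:

```c
#include <mpi.h>
#include <stdio.h>

#define N_LOCAL 1000   /* vector elements per rank (assumed) */

int main(int argc, char **argv)
{
    double x[N_LOCAL], y[N_LOCAL], local = 0.0, global;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int i = 0; i < N_LOCAL; i++) { x[i] = 1.0; y[i] = 2.0; }
    for (int i = 0; i < N_LOCAL; i++) local += x[i] * y[i];

    /* Every rank needs the result to compute its next step. */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0) printf("global dot product = %g\n", global);
    MPI_Finalize();
    return 0;
}
```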

17.
In this paper, a new hybrid parallelisable low order algorithm, developed by the authors for multibody dynamics analysis, is implemented numerically on a distributed memory parallel computing system. The presented implementation can currently accommodate the general spatial motion of chain systems, but key issues for its extension to general tree and closed loop systems are discussed. Explicit algebraic constraints are used to increase coarse grain parallelism, and to study the influence of the dimension of the system constraint load equations on the computational efficiency of the algorithm in a real parallel implementation using the Message Passing Interface (MPI). The equation formulation parallelism and the linear system solution strategies used to reduce communication overhead are addressed. Numerical results indicate that the algorithm is scalable, that significant speed-up can be obtained, and that a quasi-logarithmic relation exists between the time needed for a function call and the number of processors used. This result agrees well with theoretical performance predictions. Numerical comparisons with results obtained from independently developed analysis codes have validated the correctness of the new hybrid parallelisable low order algorithm and demonstrated certain computational advantages.

18.
Skeletal parallel programming enables programmers to build a parallel program from ready-made components (parallel primitives) for which efficient implementations are known to exist, making both parallel program development and the parallelization process easier. Constructing efficient parallel programs is often difficult, however, due to difficulties in selecting a proper combination of parallel primitives and in implementing this combination without unnecessary creation and exchange of data among parallel primitives and processors. To overcome these difficulties, we propose a powerful and general parallel skeleton, accumulate, which can be used to naturally code efficient solutions to problems and can be efficiently implemented in parallel using the Message Passing Interface (MPI).
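A much-reduced illustration of the accumulate idea (not the authors' actual skeleton): each rank folds its local block, then the per-rank results are combined with a parallel prefix via MPI_Scan, showing the local-fold-plus-combine structure that such a skeleton packages for the programmer.

```c
#include <mpi.h>
#include <stdio.h>

#define B 4   /* block size per rank (assumed) */

int main(int argc, char **argv)
{
    int rank, prefix, local = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int block[B] = { rank, rank + 1, rank + 2, rank + 3 };  /* placeholder data */
    for (int i = 0; i < B; i++) local += block[i];          /* local fold */

    /* Inclusive prefix: running total up to and including this rank. */
    MPI_Scan(&local, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    printf("rank %d: local=%d prefix=%d\n", rank, local, prefix);
    MPI_Finalize();
    return 0;
}
```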

19.
The Message Passing Interface (MPI) allows a group of computers in a network to be specified as a cluster system, and it provides the routines for task activation and communication. Writing programs for a cluster system is a difficult job. In this paper the Message Passing Interface is presented. Parallel programs using WMPI, a version of MPI, to solve the pi (π) calculation, the quicksort algorithm, and the torsion problem are presented. The programs are written and compiled in Microsoft Visual C++.
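The pi computation mentioned here is the classic MPI example: the integral of 4/(1+x²) over [0,1] is split cyclically across ranks and the partial sums are combined with MPI_Reduce. The paper uses WMPI under Visual C++; the sketch below is plain MPI C.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int n = 1000000;   /* number of subintervals */
    int rank, size;
    double h, local = 0.0, pi;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    h = 1.0 / n;
    for (int i = rank; i < n; i += size) {   /* cyclic distribution */
        double x = h * (i + 0.5);
        local += 4.0 / (1.0 + x * x);
    }
    local *= h;

    MPI_Reduce(&local, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("pi ~= %.12f\n", pi);

    MPI_Finalize();
    return 0;
}
```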

20.
Analysis and comparison of MPI and PVM in network environments
The Message Passing Interface (MPI) and the Parallel Virtual Machine (PVM) are two widely used environments for distributed parallel computing over networks. This paper first introduces the origins and characteristics of each, and then analyzes and compares their capabilities in terms of portability, point-to-point communication, collective communication, resource management, and fault tolerance.
