1.
The use of coprocessors or accelerators such as graphics processing units (GPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high-performance computers, defined as machines with nodes containing more than one type of floating-point processor (e.g. CPU and GPU), are now becoming more prevalent due to these advantages. Although there has been extensive research into methods to use accelerators efficiently to improve the performance of molecular dynamics (MD) codes employing pairwise potential energy models, little is reported in the literature for models that include many-body effects. Three-body terms are required for many popular potentials such as MEAM, Tersoff, REBO, AIREBO, Stillinger–Weber, Bond-Order Potentials, and others. Because the per-atom simulation times are much higher for models incorporating 3-body terms, there is a clear need for efficient algorithms usable on hybrid high-performance computers. Here, we report a shared-memory force-decomposition for 3-body potentials that avoids memory conflicts to allow for a deterministic code with substantial performance improvements on hybrid machines. We describe modifications necessary for use in distributed-memory MD codes and show results for the simulation of water with Stillinger–Weber on the hybrid Titan supercomputer. We compare performance of the 3-body model to the SPC/E water model when using accelerators. Finally, we demonstrate that our approach can attain a speedup of 5.1 with acceleration on Titan for production simulations to study water droplet freezing on a surface.
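As an illustration of what such a conflict-free force decomposition can look like, the sketch below uses a simple harmonic-cosine angle potential as a stand-in for a real 3-body model (the potential, the toy coordinates, and the brute-force triplet enumeration are illustrative assumptions, not the authors' LAMMPS implementation). The point it demonstrates is that the thread responsible for atom i accumulates every triplet contribution acting on i and writes only to force[i], so there are no write conflicts, no atomics, and the summation order, and hence the result, is deterministic; the price is that each triplet is evaluated redundantly, once per atom it involves.

```cpp
// Minimal sketch of a shared-memory force decomposition for a 3-body term.
// Angle potential U = k*(cos(theta) - cos0)^2 with vertex v and ends j, k.
#include <array>
#include <cmath>
#include <cstdio>
#include <vector>

using Vec3 = std::array<double, 3>;

static Vec3 sub(const Vec3& a, const Vec3& b) { return {a[0]-b[0], a[1]-b[1], a[2]-b[2]}; }
static double dot(const Vec3& a, const Vec3& b) { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }

// Forces of one (vertex v, ends j, k) triplet on all three atoms.
static void angleForces(const Vec3& rv, const Vec3& rj, const Vec3& rk,
                        double kang, double cos0, Vec3& Fv, Vec3& Fj, Vec3& Fk) {
    Vec3 a = sub(rj, rv), b = sub(rk, rv);
    double la = std::sqrt(dot(a, a)), lb = std::sqrt(dot(b, b));
    double c = dot(a, b) / (la * lb);          // cos(theta)
    double pref = -2.0 * kang * (c - cos0);    // -dU/dcos(theta)
    for (int d = 0; d < 3; ++d) {
        double dc_dj = b[d] / (la * lb) - c * a[d] / (la * la);
        double dc_dk = a[d] / (la * lb) - c * b[d] / (lb * lb);
        Fj[d] = pref * dc_dj;
        Fk[d] = pref * dc_dk;
        Fv[d] = -(Fj[d] + Fk[d]);              // Newton's third law
    }
}

int main() {
    const double kang = 1.0, cos0 = -1.0 / 3.0;                    // tetrahedral angle
    std::vector<Vec3> pos = {{0,0,0}, {1,0,0}, {0,1,0}, {0,0,1}};  // toy system
    const int n = (int)pos.size();
    std::vector<Vec3> force(n, {0, 0, 0});

    // Force decomposition: the thread handling atom i writes only force[i].
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        Vec3 fi = {0, 0, 0}, Fv{}, Fj{}, Fk{};
        // Case 1: i is the vertex of the angle.
        for (int j = 0; j < n; ++j)
            for (int k = j + 1; k < n; ++k) {
                if (j == i || k == i) continue;
                angleForces(pos[i], pos[j], pos[k], kang, cos0, Fv, Fj, Fk);
                for (int d = 0; d < 3; ++d) fi[d] += Fv[d];
            }
        // Case 2: i is an end atom; v is the vertex, k the other end.
        for (int v = 0; v < n; ++v) {
            if (v == i) continue;
            for (int k = 0; k < n; ++k) {
                if (k == v || k == i) continue;
                angleForces(pos[v], pos[i], pos[k], kang, cos0, Fv, Fj, Fk);
                for (int d = 0; d < 3; ++d) fi[d] += Fj[d];        // force on i only
            }
        }
        force[i] = fi;  // single, conflict-free write
    }

    for (int i = 0; i < n; ++i)
        std::printf("F[%d] = (% .4f, % .4f, % .4f)\n", i, force[i][0], force[i][1], force[i][2]);
}
```

A production code would of course restrict the triplet loops with cutoff-based neighbor lists and run the outer loop on accelerator threads rather than an OpenMP pragma, but the per-atom write pattern stays the same.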
2.
D.C. Rapaport, Computer Physics Communications, 2011, 182(4): 926-934
Design considerations for molecular dynamics algorithms capable of taking advantage of the computational power of a graphics processing unit (GPU) are described. Accommodating the constraints of scalable streaming-multiprocessor hardware necessitates a reformulation of the underlying algorithm. Performance measurements demonstrate the considerable benefit and cost-effectiveness of such an approach, which produces a factor of 2.5 speed improvement over previous work for the case of the soft-sphere potential.
3.
4.
W. Michael Brown, Peng Wang, Steven J. Plimpton, Arnold N. Tharrington, Computer Physics Communications, 2011, 182(4): 898-911
The use of accelerators such as graphics processing units (GPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high-performance computers, machines with more than one type of floating-point processor, are now becoming more prevalent due to these advantages. In this work, we discuss several important issues in porting a large molecular dynamics code for use on parallel hybrid machines: (1) choosing a hybrid parallel decomposition that works on central processing units (CPUs) with distributed memory and accelerator cores with shared memory, (2) minimizing the amount of code that must be ported for efficient acceleration, (3) utilizing the available processing power from both multi-core CPUs and accelerators, and (4) choosing a programming model for acceleration. We present our solution to each of these issues for short-range force calculation in the molecular dynamics package LAMMPS; however, the methods can be applied in many molecular dynamics codes. Specifically, we describe algorithms for efficient short-range force calculation on hybrid high-performance machines. We describe an approach for dynamic load balancing of work between CPU and accelerator cores. We describe the Geryon library that allows a single code to compile with both CUDA and OpenCL for use on a variety of accelerators. Finally, we present results on a parallel test cluster containing 32 Fermi GPUs and 180 CPU cores.
5.
Trung Dac Nguyen, Carolyn L. Phillips, Joshua A. Anderson, Sharon C. Glotzer, Computer Physics Communications, 2011, (11): 2307-2313
Molecular dynamics (MD) methods compute the trajectory of a system of point particles in response to a potential function by numerically integrating Newton's equations of motion. Extending these basic methods with rigid body constraints enables composite particles with complex shapes such as anisotropic nanoparticles, grains, molecules, and rigid proteins to be modeled. Rigid body constraints are added to the GPU-accelerated MD package HOOMD-blue, version 0.10.0. The software can now simulate systems of particles, rigid bodies, or mixed systems in microcanonical (NVE), canonical (NVT), and isothermal-isobaric (NPT) ensembles. It can also apply the FIRE energy minimization technique to these systems. In this paper, we detail the massively parallel scheme that implements these algorithms and discuss how our design is tuned for the maximum possible performance. Two different case studies are included to demonstrate the performance attained, patchy spheres and tethered nanorods. In typical cases, HOOMD-blue on a single GTX 480 executes 2.5–3.6 times faster than LAMMPS executing the same simulation on any number of CPU cores in parallel. Simulations with rigid bodies may now be run with larger systems and for longer time scales on a single workstation than was previously even possible on large clusters.
6.
I.V. Morozov, A.M. Kazennov, R.G. Bystryi, G.E. Norman, V.V. Pisarev, V.V. Stegailov, Computer Physics Communications, 2011, (9): 1974-1978
We report on simulation techniques and benchmarks for molecular dynamics simulations of relaxation processes in solids and liquids using graphics processing units (GPUs). The implementation of a many-body potential such as the embedded atom method (EAM) on GPUs is discussed. Benchmarks obtained with the LAMMPS and HOOMD packages for simple Lennard-Jones liquids and for metals using EAM potentials are presented for both Intel CPUs and Nvidia GPUs. As an example, the crystallization rate of a supercooled Al melt is computed.
7.
Modern graphics processing units (GPUs) provide impressive computing resources, which can be accessed conveniently through the CUDA programming interface. We describe how GPUs can be used to considerably speed up molecular dynamics (MD) simulations for system sizes ranging up to about 1 million particles. Particular emphasis is put on the numerical long-time stability in terms of energy and momentum conservation, and caveats on limited floating-point precision are issued. Strict energy conservation over 10^8 MD steps is obtained by double-single emulation of the floating-point arithmetic in accuracy-critical parts of the algorithm. For the slow dynamics of a supercooled binary Lennard-Jones mixture, we demonstrate that the use of single floating-point precision may result in quantitatively and even physically wrong results. For simulations of a Lennard-Jones fluid, the described implementation shows speedup factors of up to 80 compared to a serial implementation for the CPU, and a single GPU was found to compare with a parallelised MD simulation using 64 distributed cores.
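The "double-single emulation" referred to above is a generic technique in which a double-precision-like value is represented as an unevaluated sum of two single-precision floats, built from error-free transformations. The sketch below is a minimal, self-contained illustration of that idea in plain C++ (not the paper's GPU kernels); it compares a naive single-precision sum, a double-single sum, and a double-precision reference. It must be compiled without value-changing floating-point optimizations (e.g. without -ffast-math), or the compensation terms can be optimized away.

```cpp
#include <cstdio>

// A "double-single" value: an unevaluated sum hi + lo of two floats,
// giving roughly twice the significand bits of a single float.
struct dsfloat { float hi, lo; };

// Error-free transformation (Knuth two-sum): hi + lo == a + b exactly.
static dsfloat twoSum(float a, float b) {
    float s = a + b;
    float bb = s - a;
    float e = (a - (s - bb)) + (b - bb);
    return {s, e};
}

// Accumulate a single float into a double-single accumulator.
static dsfloat dsAdd(dsfloat acc, float x) {
    dsfloat t = twoSum(acc.hi, x);
    t.lo += acc.lo;              // fold in the previously accumulated error
    return twoSum(t.hi, t.lo);   // renormalize so |lo| stays small
}

int main() {
    const int n = 10'000'000;
    const float x = 1.0e-4f;

    float plain = 0.0f;           // naive single-precision sum
    dsfloat ds = {0.0f, 0.0f};    // compensated double-single sum
    double ref = 0.0;             // double-precision reference

    for (int i = 0; i < n; ++i) {
        plain += x;
        ds = dsAdd(ds, x);
        ref += (double)x;
    }

    std::printf("single        : %.8f\n", plain);
    std::printf("double-single : %.8f\n", (double)ds.hi + (double)ds.lo);
    std::printf("double (ref)  : %.8f\n", ref);
    return 0;
}
```

Running this, the plain single-precision accumulator drifts visibly from the reference, while the double-single result stays close to the double-precision sum, which is the behaviour that accuracy-critical MD accumulations rely on.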
8.
Upakarasamy Lourderaj, Rui Sun, Swapnil C. Kohale, George L. Barnes, Wibe A. de Jong, Theresa L. Windus, William L. Hase, Computer Physics Communications, 2014
The interface for VENUS and NWChem, and the resulting software package for direct dynamics simulations are described. The coupling of the two codes is considered to be a tight coupling since the two codes are compiled and linked together and act as one executable with data being passed between the two codes through routine calls. The advantages of this type of coupling are discussed. The interface has been designed to have as little interference as possible with the core codes of both VENUS and NWChem. VENUS is the code that propagates the direct dynamics trajectories and, therefore, is the program that drives the overall execution of VENUS/NWChem. VENUS has remained an essentially sequential code, which uses the highly parallel structure of NWChem. Subroutines of the interface that accomplish the data transmission and communication between the two computer programs are described. Recent examples of the use of VENUS/NWChem for direct dynamics simulations are summarized.
9.
Daniel Valdez-Balderas, José M. Domínguez, Benedict D. Rogers, Alejandro J.C. Crespo, Journal of Parallel and Distributed Computing, 2013
Starting from the single graphics processing unit (GPU) version of the Smoothed Particle Hydrodynamics (SPH) code DualSPHysics, a multi-GPU SPH program is developed for free-surface flows. The approach is based on a spatial decomposition technique, whereby different portions (sub-domains) of the physical system under study are assigned to different GPUs. Communication between devices is achieved with the use of Message Passing Interface (MPI) application programming interface (API) routines. The use of the sorting algorithm radix sort for inter-GPU particle migration and sub-domain “halo” building (which enables interaction between SPH particles of different sub-domains) is described in detail. With the resulting scheme it is possible, on the one hand, to carry out simulations that could also be performed on a single GPU, but they can now be performed even faster than on one of these devices alone. On the other hand, accelerated simulations can be performed with up to 32 million particles on the current architecture, which is beyond the limitations of a single GPU due to memory constraints. A study of weak and strong scaling behaviour, speedups and efficiency of the resulting program is presented including an investigation to elucidate the computational bottlenecks. Last, possibilities for reduction of the effects of overhead on computational efficiency in future versions of our scheme are discussed.
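As a very small illustration of the decomposition described above, the sketch below shows only the halo-selection step for a one-dimensional slab decomposition: particles within one smoothing length of a sub-domain boundary are flagged for copying to the neighboring device. The geometry, variable names, and the choice to decompose along x are illustrative assumptions; the actual DualSPHysics multi-GPU scheme (with radix sort and MPI exchange) is considerably more involved.

```cpp
#include <cstdio>
#include <vector>

// 1D slab decomposition along x: this rank owns [xMin, xMax). Particles within
// one smoothing length h of a boundary interact with particles of the adjacent
// sub-domain and therefore must be copied into that neighbor's "halo".
struct Particle { double x, y, z; };

int main() {
    const double xMin = 0.0, xMax = 10.0;   // extent of this sub-domain
    const double h = 1.0;                   // SPH smoothing length (halo width)

    std::vector<Particle> local = {
        {0.3, 0.0, 0.0}, {4.9, 1.0, 0.0}, {9.6, 0.5, 0.2}, {9.9, 2.0, 1.0},
    };

    std::vector<int> sendLeft, sendRight;   // indices to pack for each neighbor
    for (int i = 0; i < (int)local.size(); ++i) {
        if (local[i].x < xMin + h) sendLeft.push_back(i);
        if (local[i].x >= xMax - h) sendRight.push_back(i);
    }

    // In a real multi-GPU code these index lists would be packed into buffers,
    // exchanged with the neighboring ranks via MPI, and unpacked as halo
    // particles on the receiving device before the force computation.
    std::printf("halo to left neighbor : %zu particles\n", sendLeft.size());
    std::printf("halo to right neighbor: %zu particles\n", sendRight.size());
    return 0;
}
```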
10.
Numerical methods for partial differential equations (including the finite difference and finite element methods), as well as many numerical methods for equations of mathematical physics, ultimately reduce to solving large systems of linear equations. Investigating fast, stable, and accurate solvers for large linear systems has therefore long been an active and particularly important topic in numerical computing. Among iterative methods, the conjugate gradient method is widely regarded as one of the best. Its main drawback, however, is that it applies only to linear systems whose coefficient matrix is symmetric positive definite, and conventional CPU implementations are very time-consuming. To address this, general large linear systems are solved by first transforming the coefficient matrix into a symmetric matrix and then applying a fast GPU-CUDA-based conjugate gradient method. Experimental results show that, in terms of efficiency, the GPU-CUDA-based conjugate gradient method runs fast: when the order of the linear system exceeds 3000, the speedup exceeds 14. In terms of solution accuracy and stability of the solution process, it is comparable to Gaussian elimination with partial pivoting. The GPU-CUDA-based fast conjugate gradient method is thus a fast and highly effective approach for solving general large systems of linear equations.
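The abstract states that a general coefficient matrix is transformed into a symmetric one before the GPU-CUDA conjugate gradient solver is applied, but does not spell out the transformation. One textbook way to obtain a symmetric positive definite system from a general full-column-rank matrix is to solve the normal equations A^T A x = A^T b; the CPU-only sketch below (not the paper's CUDA code, and only an assumed interpretation of the transformation) applies the conjugate gradient recurrence in that form (CGNR).

```cpp
#include <cstdio>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// y = M * x
static Vec matVec(const Mat& M, const Vec& x) {
    Vec y(M.size(), 0.0);
    for (size_t i = 0; i < M.size(); ++i)
        for (size_t j = 0; j < x.size(); ++j)
            y[i] += M[i][j] * x[j];
    return y;
}

// y = M^T * x
static Vec matTVec(const Mat& M, const Vec& x) {
    Vec y(M[0].size(), 0.0);
    for (size_t i = 0; i < M.size(); ++i)
        for (size_t j = 0; j < y.size(); ++j)
            y[j] += M[i][j] * x[i];
    return y;
}

static double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Conjugate gradient on the normal equations A^T A x = A^T b (CGNR):
// A^T A is symmetric positive definite when A has full column rank, so the
// standard CG recurrence applies even if A itself is non-symmetric.
static Vec cgnr(const Mat& A, const Vec& b, int maxIter = 1000, double tol = 1e-12) {
    const size_t n = A[0].size();
    Vec x(n, 0.0);
    Vec r = b;                      // r = b - A x  (starting from x = 0)
    Vec z = matTVec(A, r);          // z = A^T r, residual of the normal equations
    Vec p = z;
    double zz = dot(z, z);
    for (int it = 0; it < maxIter && zz > tol; ++it) {
        Vec Ap = matVec(A, p);
        double alpha = zz / dot(Ap, Ap);
        for (size_t i = 0; i < n; ++i) x[i] += alpha * p[i];
        for (size_t i = 0; i < r.size(); ++i) r[i] -= alpha * Ap[i];
        z = matTVec(A, r);
        double zzNew = dot(z, z);
        double beta = zzNew / zz;
        for (size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
        zz = zzNew;
    }
    return x;
}

int main() {
    // Small non-symmetric example A x = b with exact solution x = (1, 2, 3).
    Mat A = {{4, 1, 0}, {2, 5, 1}, {0, 3, 6}};
    Vec b = {6, 15, 24};
    Vec x = cgnr(A, b);
    std::printf("x = (%.6f, %.6f, %.6f)\n", x[0], x[1], x[2]);
    return 0;
}
```

The matrix-vector products dominate the cost of each CG iteration, which is why they are the natural candidates for CUDA kernels in a GPU implementation of this kind.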
11.
Dissipative particle dynamics (DPD) simulation is implemented on multiple GPUs using NVIDIA's Compute Unified Device Architecture (CUDA) in this paper. Data communication between the GPUs is handled with POSIX threads. Compared with the single-GPU implementation, this implementation provides faster computation and more storage space, allowing simulations of significantly larger systems. In benchmarks, the performance of the GPUs is compared with that of Material Studio running on a single CPU core; we achieve a speedup of more than 90x by using three C2050 GPUs to simulate an 80 × 80 × 80 system. The implementation is applied to a study of the dispersancy of lubricant succinimide dispersants. A series of simulations of lubricant–soot–dispersant systems is performed to study how factors such as dispersant concentration and interaction with the lubricant affect the dispersancy, and the simulation results are in agreement with the study in our present work.
12.
Heterogeneous systems with nodes containing more than one type of computation unit, e.g., central processing units (CPUs) and graphics processing units (GPUs), are becoming popular because of their low cost and high performance. In this paper, we have developed a Three-Level Parallelization Scheme (TLPS) for molecular dynamics (MD) simulation on heterogeneous systems. The scheme exploits multi-level parallelism combining (1) inter-node parallelism using spatial decomposition via message passing, (2) intra-node parallelism using spatial decomposition via dynamically scheduled multi-threading, and (3) intra-chip parallelism using multi-threading and short vector extensions in CPUs, and employing multiple CUDA threads in GPUs. By using a hierarchy of parallelism with optimizations such as intra-node communication hiding and memory optimizations in both CPUs and GPUs, we have implemented and evaluated an MD simulation on the petascale heterogeneous supercomputer TH-1A. The results show that MD simulations can be efficiently parallelized with our TLPS scheme and can benefit from the optimizations.
13.
Molecular docking and molecular dynamics (MD) simulations were used to study the interaction mechanism between ibuprofen and the fourth-generation hydroxyl-terminated dendrimer PAMAM-G4-OH and to examine the stability of the resulting complex. The results show that ibuprofen inserts into the cavities of the PAMAM-G4-OH dendrimer with its carboxyl group oriented toward the core, and that intermolecular van der Waals forces and hydrogen bonds contribute substantially during docking. In a 2000 ps MD simulation of the complex, the potential energy, total energy, and RMSD of the system decrease steadily toward equilibrium during the first 1000 ps and subsequently fluctuate within the ranges of −1639 to −1701 kcal/mol, −457.009 to −475.809 kcal/mol, and 0.487 to 0.607 Å, respectively; the structure of the complex, initially rather loose, becomes more compact after 2000 ps of simulation. In conclusion, the dominant interaction between PAMAM-G4-OH and ibuprofen is electrostatic, arising mainly between the negatively charged ion formed by deprotonation of the ibuprofen carboxyl group and the basic tertiary amine ions inside the dendrimer; after 2000 ps of MD simulation the system reaches a stable state.
14.
Weiguo Liu, Gerrit Voss, et al., Computer Physics Communications, 2008, 179(9): 634-641
Molecular dynamics is an important computational tool to simulate and understand biochemical processes at the atomic level. However, accurate simulation of processes such as protein folding requires a large number of both atoms and time steps. This in turn leads to huge runtime requirements. Hence, finding fast solutions is of highest importance to research. In this paper we present a new approach to accelerate molecular dynamics simulations with inexpensive commodity graphics hardware. To derive an efficient mapping onto this type of computer architecture, we have used the new Compute Unified Device Architecture programming interface to implement a new parallel algorithm. Our experimental results show that the graphics-card-based approach allows speedups of up to a factor of nineteen compared to the corresponding sequential implementation.
15.
Accelerating dissipative particle dynamics simulations on GPUs: Algorithms, numerics and applications
We present a scalable dissipative particle dynamics simulation code, fully implemented on Graphics Processing Units (GPUs) using a hybrid CUDA/MPI programming model, which achieves 10–30 times speedup on a single GPU over 16 CPU cores and almost linear weak scaling across a thousand nodes. A unified framework is developed within which the efficient generation of the neighbor list and maintaining particle data locality are addressed. Our algorithm generates strictly ordered neighbor lists in parallel, while the construction is deterministic and makes no use of atomic operations or sorting. Such a neighbor list leads to optimal data loading efficiency when combined with a two-level particle reordering scheme. A faster in situ generation scheme for Gaussian random numbers is proposed using precomputed binary signatures. We designed custom transcendental functions that are fast and accurate for evaluating the pairwise interaction. The correctness and accuracy of the code are verified through a set of test cases simulating Poiseuille flow and spontaneous vesicle formation. Computer benchmarks demonstrate the speedup of our implementation over the CPU implementation as well as strong and weak scalability. A large-scale simulation of spontaneous vesicle formation consisting of 128 million particles was conducted to further illustrate the practicality of our code in real-world applications.
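The deterministic, atomics-free construction described above is specific to that code, but the basic reason such a list can be deterministic is simple to illustrate: if each particle's row of the neighbor table is filled by exactly one thread, and candidate neighbors are scanned in a fixed order, the result is identical from run to run. The brute-force CPU sketch below is an illustrative stand-in for that pattern only, not the authors' cell-list-based GPU algorithm.

```cpp
#include <array>
#include <cstdio>
#include <vector>

using Vec3 = std::array<double, 3>;

int main() {
    const double rc = 1.2;                 // cutoff radius
    const double rc2 = rc * rc;
    const int maxNeigh = 32;               // fixed row width of the neighbor table

    std::vector<Vec3> pos = {
        {0.0, 0.0, 0.0}, {1.0, 0.0, 0.0}, {0.5, 0.9, 0.0},
        {3.0, 3.0, 3.0}, {3.5, 3.2, 3.1},
    };
    const int n = (int)pos.size();

    // Row-major neighbor table: neigh[i*maxNeigh + s] is the s-th neighbor of i.
    std::vector<int> neigh(n * maxNeigh, -1);
    std::vector<int> count(n, 0);

    // One (conceptual) thread per particle: only that thread writes row i, and
    // candidates are scanned in increasing index order, so the resulting list
    // is identical from run to run -- no atomics, no sorting required.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        int c = 0;
        for (int j = 0; j < n; ++j) {      // fixed scan order => determinism
            if (j == i) continue;
            double dx = pos[i][0] - pos[j][0];
            double dy = pos[i][1] - pos[j][1];
            double dz = pos[i][2] - pos[j][2];
            if (dx * dx + dy * dy + dz * dz < rc2 && c < maxNeigh)
                neigh[i * maxNeigh + c++] = j;
        }
        count[i] = c;
    }

    for (int i = 0; i < n; ++i) {
        std::printf("particle %d:", i);
        for (int s = 0; s < count[i]; ++s) std::printf(" %d", neigh[i * maxNeigh + s]);
        std::printf("\n");
    }
    return 0;
}
```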
16.
We present graphics processing unit (GPU) implementations of the numerical integration of three N-body models with long-range interactions of general interest: the Hamiltonian Mean Field, Ring, and two-dimensional self-gravitating models. We discuss the algorithms, speedups and errors using one and two GPUs. Speedups can be as high as 140 compared to a serial code, and the overall relative error in the total energy is of the same order of magnitude as for the CPU code. The number of particles used in the tests ranges from 10,000 to 50,000,000 depending on the model.
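For context, the Hamiltonian Mean Field model named above is commonly written in the literature as a system of N globally coupled rotors,

$$ H = \sum_{i=1}^{N} \frac{p_i^{2}}{2} + \frac{1}{2N} \sum_{i,j=1}^{N} \bigl[\, 1 - \cos(\theta_i - \theta_j) \,\bigr], $$

in which every rotor couples to every other rotor; the Ring and two-dimensional self-gravitating models share this all-to-all, long-range character, which is what makes such systems interesting targets for GPU parallelization.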
17.
Using chloramphenicol (CAP) as the template molecule and methacrylic acid (MAA), acrylic acid (AA), acrylamide (AM), and methyl methacrylate (MMA) as functional monomers, molecular dynamics simulations were used to study the interactions between CAP and the functional monomers in different pre-assembly systems and to examine the influence of the solvent (chloroform, methanol, acetonitrile) on the pre-assembly systems; the simulation results were verified experimentally, and the interaction distances between CAP and the functional monomers in the pre-assembly systems were then investigated. The results show that the simulations agree with the experiments: in methanol the strength of the CAP–monomer interaction follows the order MAA > AA > AM > MMA, while in acetonitrile the order is AM > MAA > AA > MMA. The distances between the interacting groups of CAP and the functional monomers lie in the range of 1.8 to 5.0, suggesting that a fairly strong non-covalent binding interaction may exist between them.
18.
19.
We present a molecular dynamics (MD) model system to quantitatively study nanoscopic wear of rough surfaces under two-body and three-body contact conditions with multiple abrasive particles. We describe how to generate a surface with a pseudo-random Gaussian topography which is periodically replicable, and we discuss the constraints on the abrasive particles that lead to certain wear conditions. We propose a post-processing scheme which, based on advection velocity, dynamically identifies the atoms in the simulation as either part of a wear particle, the substrate, or the sheared zone in between. This scheme is then justified from a crystallographic order point of view. We apply a distance-based contact zone identification scheme and outline a clustering algorithm which can associate each contact atom with the abrasive particle causing the respective contact zone. Finally, we show how the knowledge of each atom's zone affiliation and a time-resolved evaluation of the substrate topography leads to a breakdown of the asperity volume reduction into its components: the pit fill-up volume, the individual wear particles, the shear zone, and the sub-surface substrate compression. As an example, we analyze the time and pressure dependence of the wear volume contributions for two-body and three-body wear processes of a rough iron surface with rigid spherical and cubic abrasive particles.
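As a toy illustration of the distance-based contact-zone identification and the per-abrasive association mentioned above (a simplified interpretation, not the authors' actual post-processing scheme), the sketch below flags every substrate atom within a cutoff of any abrasive atom as a contact atom and assigns it to the abrasive particle owning the closest abrasive atom, which groups the contact atoms into one contact zone per abrasive particle.

```cpp
#include <array>
#include <cstdio>
#include <limits>
#include <vector>

using Vec3 = std::array<double, 3>;

struct AbrasiveAtom { Vec3 r; int particleId; };   // abrasive atom tagged by its particle

static double dist2(const Vec3& a, const Vec3& b) {
    double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
    return dx * dx + dy * dy + dz * dz;
}

int main() {
    const double rContact = 1.5;                   // distance-based contact criterion
    const double rc2 = rContact * rContact;

    // Toy data: a few substrate atoms and two abrasive particles (ids 0 and 1).
    std::vector<Vec3> substrate = {
        {0.0, 0.0, 0.0}, {1.0, 0.0, 0.0}, {5.0, 0.0, 0.0}, {9.0, 0.0, 0.0},
    };
    std::vector<AbrasiveAtom> abrasive = {
        {{0.5, 1.0, 0.0}, 0}, {{1.2, 1.1, 0.0}, 0},
        {{9.1, 1.0, 0.0}, 1},
    };

    // A substrate atom within rContact of ANY abrasive atom is a contact atom;
    // it is associated with the particle owning the closest abrasive atom.
    std::vector<int> contactZone(substrate.size(), -1);   // -1 = not in contact
    for (size_t i = 0; i < substrate.size(); ++i) {
        double best = std::numeric_limits<double>::max();
        for (const AbrasiveAtom& a : abrasive) {
            double d2 = dist2(substrate[i], a.r);
            if (d2 < rc2 && d2 < best) { best = d2; contactZone[i] = a.particleId; }
        }
    }

    for (size_t i = 0; i < substrate.size(); ++i)
        std::printf("substrate atom %zu -> contact zone %d\n", i, contactZone[i]);
    return 0;
}
```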
20.
Using a molecular dynamics simulation platform (ME), the properties and structural parameters of structure-H hydrate were calculated, and the effect of temperature on the structural stability of the hydrate crystal was examined by analyzing microscopic characteristics such as the final configuration of the crystal, the radial distribution function, the mean square displacement, the diffusion coefficient, and the potential energy of the simulated system. The simulation results show that the stability of the structure-H hydrate decreases with increasing temperature and the cage structures tend to decompose.