
1.

A space-time-ensemble parallel nudged elastic band algorithm for molecular kinetics simulation (total citations: 1; self-citations: 0; citations by others: 1)

Aiichiro Nakano 《Computer Physics Communications》2008,178(4):280-289

A scalable parallel algorithm has been designed to study long-time dynamics of many-atom systems based on the nudged elastic band method, which performs mutually constrained molecular dynamics simulations for a sequence of atomic configurations (or states) to obtain a minimum energy path between initial and final local minimum-energy states. A directionally heated nudged elastic band method is introduced to search for thermally activated events without the knowledge of final states, which is then applied to an ensemble of bands in a path ensemble method for long-time simulation in the framework of the transition state theory. The resulting molecular kinetics (MK) simulation method is parallelized with a space-time-ensemble parallel nudged elastic band (STEP-NEB) algorithm, which employs spatial decomposition within each state, while temporal parallelism across the states within each band and band-ensemble parallelism are implemented using a hierarchy of communicator constructs in the Message Passing Interface library. The STEP-NEB algorithm exhibits good scalability with respect to spatial, temporal and ensemble decompositions on massively parallel computers. The MK simulation method is used to study low strain-rate deformation of amorphous silica.

2.

A scalable parallel algorithm for large-scale reactive force-field molecular dynamics simulations (total citations: 1; self-citations: 0; citations by others: 1)

Ken-ichi Nomura Priya Vashishta 《Computer Physics Communications》2008,178(2):73-87

A scalable parallel algorithm has been designed to perform multimillion-atom molecular dynamics (MD) simulations, in which first principles-based reactive force fields (ReaxFF) describe chemical reactions. Environment-dependent bond orders associated with atomic pairs and their derivatives are reused extensively with the aid of linked-list cells to minimize the computation associated with atomic n-tuple interactions (n ≤ 4 explicitly and ≤ 6 due to chain-rule differentiation). These n-tuple computations are made modular, so that they can be reconfigured effectively with a multiple time-step integrator to further reduce the computation time. Atomic charges are updated dynamically with an electronegativity equalization method, by iteratively minimizing the electrostatic energy with the charge-neutrality constraint. The ReaxFF-MD simulation algorithm has been implemented on parallel computers based on a spatial decomposition scheme combined with distributed n-tuple data structures. The measured parallel efficiency of the parallel ReaxFF-MD algorithm is 0.998 on 131,072 IBM BlueGene/L processors for a 1.01 billion-atom RDX system.
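
The linked-list cells mentioned in this abstract are a standard way to limit pair searches to nearby atoms. The following is only a minimal serial sketch of a generic cell list, not the ReaxFF-MD code: the function names (`build_cells`, `neighbor_pairs`, `dist_sq`), the non-periodic box, and the choice of cell size equal to the cutoff are all assumptions made for illustration.

```python
import itertools
from collections import defaultdict


def dist_sq(p, q):
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def build_cells(points, cell_size):
    """Bin point indices into cubic cells keyed by integer cell coordinates."""
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        key = (int(x // cell_size), int(y // cell_size), int(z // cell_size))
        cells[key].append(idx)
    return cells


def neighbor_pairs(points, cutoff):
    """Return the set of index pairs (i, j), i < j, closer than the cutoff.

    With cell size equal to the cutoff, any two points within the cutoff
    lie in the same cell or in adjacent cells, so only 27 cells are
    inspected per cell instead of the whole system.
    """
    cells = build_cells(points, cutoff)
    cutoff_sq = cutoff * cutoff
    offsets = list(itertools.product((-1, 0, 1), repeat=3))
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in offsets:
            # .get avoids creating empty cells while iterating
            other = cells.get((cx + dx, cy + dy, cz + dz))
            if not other:
                continue
            for i in members:
                for j in other:
                    if i < j and dist_sq(points[i], points[j]) <= cutoff_sq:
                        pairs.add((i, j))
    return pairs
```

For roughly uniform density the work grows linearly with the number of atoms, whereas a brute-force pair search grows quadratically.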

3.

Embedded divide-and-conquer algorithm on hierarchical real-space grids: parallel molecular dynamics simulation based on linear-scaling density functional theory

Fuyuki Shimojo Rajiv K. Kalia Priya Vashishta 《Computer Physics Communications》2005,167(3):151-164

A linear-scaling algorithm has been developed to perform large-scale molecular-dynamics (MD) simulations, in which interatomic forces are computed quantum mechanically in the framework of the density functional theory. A divide-and-conquer algorithm is used to compute the electronic structure, where non-additive contribution to the kinetic energy is included with an embedded cluster scheme. Electronic wave functions are represented on a real-space grid, which is augmented with coarse multigrids to accelerate the convergence of iterative solutions and adaptive fine grids around atoms to accurately calculate ionic pseudopotentials. Spatial decomposition is employed to implement the hierarchical-grid algorithm on massively parallel computers. A converged solution to the electronic-structure problem is obtained for a 32,768-atom amorphous CdSe system on 512 IBM POWER4 processors. The total energy is well conserved during MD simulations of liquid Rb, showing the applicability of this algorithm to first principles MD simulations. The parallel efficiency is 0.985 on 128 Intel Xeon processors for a 65,536-atom CdSe system.

4.

A parallel implementation of the Wang-Landau algorithm

Lixin Zhan 《Computer Physics Communications》2008,179(5):339-344

The Wang-Landau algorithm is a flat-histogram Monte Carlo method that performs random walks in the configuration space of a system to obtain a close estimation of the density of states iteratively. It has been applied successfully to many research fields. In this paper, we propose a parallel implementation of the Wang-Landau algorithm on computers of shared memory architectures by utilizing the OpenMP API for distributed computing. This implementation is applied to Ising model systems with promising speedups. We also examine the effects on the running speed when using different strategies in accessing the shared memory space during the updating procedure. The allowance of data race is recommended in consideration of the simulation efficiency. Such treatment does not affect the accuracy of the final density of states obtained.
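
To make the random walk described in this abstract concrete, here is a minimal serial sketch of the basic Wang-Landau iteration. It is not the paper's OpenMP implementation. The test system (a one-dimensional Ising ring of 8 spins), the function names, and all parameter values (flatness threshold, initial and final modification factors, step cap) are assumptions chosen only to keep the example small.

```python
import math
import random


def ring_energy(spins):
    """Energy of a 1D Ising ring with unit coupling: E = -sum s_i * s_(i+1)."""
    n = len(spins)
    return -sum(spins[i] * spins[(i + 1) % n] for i in range(n))


def wang_landau(n_spins=8, ln_f_final=1e-3, flatness=0.8, seed=0,
                max_steps=2_000_000):
    """Estimate ln g(E) for an even-length Ising ring by a flat-histogram walk."""
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(n_spins)]
    energy = ring_energy(spins)
    # On a ring with an even number of spins, energies come in steps of 4.
    levels = range(-n_spins, n_spins + 1, 4)
    ln_g = {e: 0.0 for e in levels}
    hist = {e: 0 for e in levels}
    ln_f = 1.0  # logarithm of the modification factor
    steps = 0
    while ln_f > ln_f_final and steps < max_steps:
        i = rng.randrange(n_spins)
        # Energy change caused by flipping spin i
        d_e = 2 * spins[i] * (spins[i - 1] + spins[(i + 1) % n_spins])
        new_e = energy + d_e
        # Accept with probability min(1, g(E_old) / g(E_new))
        if rng.random() < math.exp(min(0.0, ln_g[energy] - ln_g[new_e])):
            spins[i] = -spins[i]
            energy = new_e
        ln_g[energy] += ln_f
        hist[energy] += 1
        steps += 1
        if steps % 1000 == 0:
            counts = list(hist.values())
            if min(counts) > flatness * sum(counts) / len(counts):
                # Histogram is flat enough: reset it and refine the factor
                hist = dict.fromkeys(hist, 0)
                ln_f /= 2.0
    return ln_g
```

Only differences of ln g are meaningful. For this ring the exact degeneracies are g(-8) = 2 and g(0) = 140, so ln g(0) - ln g(-8) should come out near ln 70.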

5.

A hybrid multi-loop genetic-algorithm/simplex/spatial-grid method for locating the optimum orientation of an adsorbed protein on a solid surface

Tao Wei Aiichiro Nakano 《Computer Physics Communications》2009,180(5):669-12081

Atomistic simulation of protein adsorption on a solid surface in an aqueous environment is computationally demanding; therefore, the determination of preferred protein orientations on the solid surface usually serves as an initial step in simulation studies. We have developed a hybrid multi-loop genetic-algorithm/simplex/spatial-grid method to search for low adsorption-energy orientations of a protein molecule on a solid surface. In this method, the surface and the protein molecule are treated as rigid bodies, whereas the bulk fluid is represented by spatial grids. For each grid point, an effective interaction region in the surface is defined by a cutoff distance, and the possible interaction energy between an atom at the grid point and the surface is calculated and recorded in a database. In searching for the optimum position and orientation, the protein molecule is translated and rotated as a rigid body with the configuration obtained from a previous molecular dynamics simulation. The orientation-dependent protein-surface interaction energy is obtained using the generated database of grid energies. The hybrid search procedure consists of two interlinked loops. In the first loop, A, a genetic algorithm (GA) is applied to identify promising regions for the global energy minimum, and a local optimizer based on the derivative-free Nelder-Mead simplex method is used to search for the lowest-energy orientation within the identified regions. In the second loop, B, a new population for the GA is generated and the competitive solution from loop A is improved. Switching between the two loops is adaptively controlled by the use of similarity analysis. We test the method for lysozyme adsorption on a hydrophobic hydrogen-terminated silicon (110) surface in implicit water (i.e., a continuum distance-dependent dielectric constant). The results show that the hybrid search method has faster convergence and better solution accuracy compared with the conventional genetic algorithm.

6.

An efficient parallel implementation of the smooth particle mesh Ewald method for molecular dynamics simulations

Kwang Jin Oh Yuefan Deng 《Computer Physics Communications》2007,177(5):426-431

This paper focuses on the implementation and the performance analysis of a smooth particle mesh Ewald method on several parallel computers. We present the details of the algorithms and our implementation that are used to optimize parallel efficiency on such parallel computers.

7.

Applications of critical temperature in minimizing functions of continuous variables with simulated annealing algorithm

Weiwei Cai 《Computer Physics Communications》2010,181(1):11-5767

The simulated annealing (SA) algorithm has been recognized as a powerful technique for minimizing complicated functions. However, a critical disadvantage of the SA algorithm is its high computational cost. Therefore, it is the goal of this paper to investigate the use of the critical temperature in SA to reduce its computational cost. This paper presents a systematic study of the critical temperature and its applications in the minimization of functions of continuous variables with the SA algorithm. Based on this study, a new algorithm was developed to exploit the unique feature of the critical temperature in SA. The new algorithm combines SA and local search to determine the global minimum effectively. Extensive tests on a variety of functions demonstrated that the new algorithm provides comparable performance to well-established SA techniques. Furthermore, the new algorithm also improves the determination of the starting temperature for the SA algorithm. The results obtained in this study are expected to be useful for improving the efficiency of SA algorithms, and for facilitating the development of temperature parallel SA algorithms.
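
The general idea of pairing SA with a local search can be sketched as follows. This is a generic illustration, not the paper's critical-temperature algorithm, whose details the abstract does not give. The geometric cooling schedule, the uniform move proposal, the coordinate pattern search used as the local optimizer, and every name and default value are assumptions.

```python
import math
import random


def simulated_annealing(f, x0, t_start=1.0, t_end=1e-3, cooling=0.95,
                        moves_per_t=50, step=0.5, seed=0):
    """Metropolis-style SA over continuous variables with geometric cooling."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    best, f_best = list(x), fx
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            cand = [xi + rng.uniform(-step, step) for xi in x]
            fc = f(cand)
            # Always accept downhill moves; accept uphill moves with
            # probability exp(-(fc - fx) / t)
            if fc <= fx or rng.random() < math.exp((fx - fc) / t):
                x, fx = cand, fc
                if fx < f_best:
                    best, f_best = list(x), fx
        t *= cooling
    return best, f_best


def pattern_search(f, x0, step=0.5, tol=1e-8):
    """Derivative-free local refinement: probe each coordinate, halve on failure."""
    x, fx = list(x0), f(x0)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                cand = list(x)
                cand[i] += delta
                fc = f(cand)
                if fc < fx:
                    x, fx, improved = cand, fc, True
        if not improved:
            step /= 2.0
    return x, fx


def minimize(f, x0, seed=0):
    """Coarse global exploration with SA, then local polishing of the best point."""
    x_sa, _ = simulated_annealing(f, x0, seed=seed)
    return pattern_search(f, x_sa)
```

The split reflects a common design choice: SA only has to reach the right basin, so it can stop at a relatively high temperature, and the inexpensive local search supplies the final precision.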

8.

Massively parallel quantum computer simulator

K. De Raedt H. De Raedt B. Trieu 《Computer Physics Communications》2007,176(2):121-136

We describe portable software to simulate universal quantum computers on massively parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, an SGI Altix 3700 and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as a benchmark for testing high-end parallel computers.

9.

Efficient implementation of parallel three-dimensional FFT on clusters of PCs

Daisuke Takahashi 《Computer Physics Communications》2003,152(2):144-150

In this paper, we propose a high-performance parallel three-dimensional fast Fourier transform (FFT) algorithm on clusters of PCs. The three-dimensional FFT algorithm can be altered into a block three-dimensional FFT algorithm to reduce the number of cache misses. We show that the block three-dimensional FFT algorithm improves performance by utilizing the cache memory effectively. We use the block three-dimensional FFT algorithm to implement the parallel three-dimensional FFT algorithm. We succeeded in obtaining performance of over 1.3 GFLOPS on an 8-node dual Pentium III 1 GHz PC SMP cluster.

10.

Collision-free spatial hash functions for structural analysis of billion-vertex chemical bond networks

Cheng Zhang Paulo S. Branicio Rajiv K. Kalia Ashish Sharma Priya Vashishta 《Computer Physics Communications》2006,175(5):339-347

State-of-the-art molecular dynamics (MD) simulations generate massive datasets involving billion-vertex chemical bond networks, which makes data mining based on graph algorithms such as K-ring analysis a challenge. This paper proposes an algorithm to improve the efficiency of ring analysis of large graphs, exploiting properties of K-rings and spatial correlations of vertices in the graph. The algorithm uses dual-tree expansion (DTE) and spatial hash-function tagging (SHAFT) to optimize computation and memory access. Numerical tests show nearly perfect linear scaling of the algorithm. A parallel implementation of the DTE + SHAFT algorithm also achieves high scalability. The algorithm has been successfully employed to analyze large MD simulations involving up to 500 million atoms.

11.

Lattice Boltzmann schemes for quantum applications

Sauro Succi 《Computer Physics Communications》2002,146(3):317-323

We review the basic ideas behind the quantum lattice Boltzmann equation (LBE), and present a few thoughts on the possible use of such an equation for simulating quantum many-body problems on both (parallel) electronic and quantum computers.

12.

Large scale atomistic polymer simulations using Monte Carlo methods for parallel vector processors

Alfred Uhlherr Stephen J. Leak Nadia E. Adam Per E. Nyberg Manolis Doxastakis Vlasis G. Mavrantzas Doros N. Theodorou 《Computer Physics Communications》2002,144(1):1-22

In this paper we discuss the implementation of advanced variable connectivity Monte Carlo (MC) simulation methods for studying large (>10⁵ atom) polymer systems at the atomic level. Such codes are intrinsically difficult to optimize since they involve a mixture of many different elementary MC steps, such as reptation, flip, end rotation, concerted rotation and volume fluctuation moves. In particular, connectivity altering MC moves, such as the recently developed directed end bridging (DEB) algorithm, are required in order to vigorously sample the configuration space. Techniques for effective vector implementation of such moves are described. We also show how a simple domain decomposition method can provide a general and efficient means of parallelizing these complex MC protocols. Benchmarks are reported for a 192,000 atom simulation of polydisperse linear polyethylene with an average chain length C₆₀₀₀, for simulations using 1 to 8 processors and a variety of MC protocols.

13.

Optimized multiple quantum MAS lineshape simulations in solid state NMR

William J. Brouwer Michael C. Davis Karl T. Mueller 《Computer Physics Communications》2009,180(10):1973-1982


14.

Parallel hybrid particle/finite volume algorithm for transported PDF methods employing sub-time stepping

B. Rembold M. Grass 《Computers & Fluids》2008,37(3):181-193

A previously presented hybrid finite volume/particle method for the solution of the joint-velocity-frequency-composition probability density function (JPDF) transport equation in complex 3D geometries is extended for parallel computing. The parallelization strategy is based on domain decomposition. The finite volume method (FVM) and the particle method (PM) are parallelized separately and the algorithm is fully synchronous. For the FVM a standard method based on transferring data in ghost cells is used. Moreover, a subdomain interior decomposition algorithm to efficiently solve the implicit time integration for hyperbolic systems is described. The parallelization of the PM is more complicated due to the use of a sub-time stepping algorithm for the particle trajectory integration. Here, each particle obeys its local CFL criterion, and the covered distances per global time step can vary significantly. Therefore, an efficient algorithm which deals with this issue and has minimum communication effort was devised and implemented. Numerical tests to validate the parallel vs. the serial algorithm are presented, in which the effectiveness of the subdomain interior decomposition for the implicit time integration was also investigated. A 3D dump-combustor configuration test case with about 2.5 × 10⁵ cells was used to demonstrate the good performance of the parallel algorithm. The hybrid algorithm scales well and the maximum speedup on 60 processors for this configuration was 50 (≈80% parallel efficiency).

15.

A Mathematica program for the two-step twelfth-order method with multi-derivative for the numerical solution of a one-dimensional Schrödinger equation

Zhongcheng Wang Yonghua Ge Yongming Dai Deyin Zhao 《Computer Physics Communications》2004,160(1):23-45

In this paper, we present the detailed Mathematica symbolic derivation and the program which is used to integrate a one-dimensional Schrödinger equation by a new two-step numerical method. We add the fourth- and sixth-order derivatives to raise the precision of the traditional Numerov's method from fourth order to twelfth order, and to expand the interval of periodicity from (0,6) to the one of (0,9.7954) and (9.94792,55.6062). In the program we use an efficient algorithm to calculate the first-order derivative and avoid unnecessarily repeated calculation resulting from the multi-derivatives. We use the well-known Woods-Saxon potential to test our method. The numerical test shows that the new method is superior to the previous lower-order ones not only in accuracy but also in efficiency. This program is especially suited to problems where high accuracy or a larger step size is required.

Program summary

Title of program: ShdEq.nb
Catalogue number: ADTT
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADTT
Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
Licensing provisions: none
Computer for which the program is designed and others on which it has been tested: The program has been designed for the microcomputer and been tested on the microcomputer.
Computers: IBM PC
Operating systems under which the program has been tested: Windows XP
Programming language used: Mathematica 4.2
Memory required to execute with typical data: 51 712 bytes
No. of bytes in distributed program, including test data, etc.: 45 381
No. of lines in distributed program, including test data, etc.: 7311
Distribution format: tar gzip file
CPC Program Library subprograms used: no
Nature of physical problem: Numerical integration of a one-dimensional or radial Schrödinger equation to find the eigenvalues for bound states and the phase shift for a continuum state.
Method of solution: Using a two-step twelfth-order method to integrate a Schrödinger equation numerically from both ends, together with the connecting conditions at the matching point, an eigenvalue for a bound state or a resonant state with a given phase shift can be found.
Restrictions on the complexity of the problem: The analytic form of the potential function and its high-order derivatives must be known.
Typical running time: Less than one second.
Unusual features of the program: Taking advantage of the high-order derivatives of the potential function and an efficient algorithm, the program can provide all the numerical solutions of a given Schrödinger equation, for either a bound or a resonant state, with very high precision and within a very short CPU time. The program can be applied to a very broad range of problems because the method has a very large interval of periodicity.
References:
[1] T.E. Simos, Proc. Roy. Soc. London A 441 (1993) 283.
[2] Z. Wang, Y. Dai, An eighth-order two-step formula for the numerical integration of the one-dimensional Schrödinger equation, Numer. Math. J. Chinese Univ. 12 (2003) 146.
[3] Z. Wang, Y. Dai, An twelfth-order four-step formula for the numerical integration of the one-dimensional Schrödinger equation, Internat. J. Modern Phys. C 14 (2003) 1087.
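
For orientation, the classical fourth-order Numerov recurrence that this entry takes as its starting point can be sketched in a few lines. This is the textbook two-step scheme only, not the twelfth-order multi-derivative method or the ShdEq.nb program. The function name `numerov`, its signature, and the units (ħ = m = 1, so that k(x) = 2(E − V(x))) are assumptions for illustration.

```python
import math


def numerov(k, x0, h, n_steps, y0, y1):
    """Integrate y''(x) = -k(x) * y(x) with the classical Numerov scheme.

    k       : callable returning k(x); for a Schrodinger equation in units
              with hbar = m = 1 this would be k(x) = 2 * (E - V(x))
    x0, h   : starting point and step size
    y0, y1  : starting values y(x0) and y(x0 + h)
    Returns the list [y(x0), y(x0 + h), ..., y(x0 + n_steps * h)].
    """
    ys = [y0, y1]
    c = h * h / 12.0
    for n in range(1, n_steps):
        k_prev = k(x0 + (n - 1) * h)
        k_cur = k(x0 + n * h)
        k_next = k(x0 + (n + 1) * h)
        # Two-step recurrence: only the two previous values are needed
        numerator = (2.0 * (1.0 - 5.0 * c * k_cur) * ys[n]
                     - (1.0 + c * k_prev) * ys[n - 1])
        ys.append(numerator / (1.0 + c * k_next))
    return ys


# Example: y'' = -y with y(0) = 0 has the solution y = sin(x)
demo = numerov(lambda x: 1.0, 0.0, 0.01, 100, 0.0, math.sin(0.01))
```

The global error of this scheme decreases as the fourth power of the step size, which is the baseline order that the abstract reports raising to twelve.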

16.

A package of Linux scripts for the parallelization of Monte Carlo simulations

Andreu Badal Josep Sempau 《Computer Physics Communications》2006,175(6):440-450


17.

Scalable and portable implementation of the fast multipole method on parallel computers

Shuji Ogata Rajiv K Kalia Aiichiro Nakano Priya Vashishta Satyavani Vemparala 《Computer Physics Communications》2003,153(3):445-461

A scalable and portable Fortran code is developed to calculate Coulomb interaction potentials of charged particles on parallel computers, based on the fast multipole method. The code has a unique feature to calculate microscopic stress tensors due to the Coulomb interactions, which is useful in constant-pressure simulations and local stress analyses. The code is applicable to various boundary conditions, including periodic boundary conditions in two and three dimensions, corresponding to slab and bulk systems, respectively. Numerical accuracy of the code is tested through comparison of its results with those obtained by the Ewald summation method and by direct calculations. Scalability tests show a parallel efficiency of 0.98 for 512 million charged particles on 512 IBM SP3 processors. The timing results on IBM SP3 are also compared with those on IBM SP4.

18.

Large scale fractal aggregates using the tunable dimension cluster-cluster aggregation

Oliver Vormoor 《Computer Physics Communications》2002,144(2):121-129

The tunable dimension cluster-cluster aggregation (tdCCA) [R. Thouy, R. Jullien, J. Phys. A: Math. Gen. 27 (1994) 2953] provides a computational model for creating fractal aggregates with a tunable fractal dimension. A straightforward implementation of this model requires a computational effort that scales as O(N_total⁴) in the number of particles N_total. With two minor changes to the algorithm, the computational effort can be reduced to O(N_total²), which allows an efficient parallel implementation of the tdCCA. On a modern parallel computer a fractal aggregate of one million particles has been built in less than 24 h.

19.

A cell multipole based domain decomposition algorithm for molecular dynamics simulation of systems of arbitrary shape

Pasupulati Lakshminarasimhulu Jeffry D. Madura 《Computer Physics Communications》2002,144(2):141-153

A domain decomposition algorithm for molecular dynamics simulation of atomic and molecular systems with arbitrary shape and non-periodic boundary conditions is described. The molecular dynamics program uses the cell multipole method for efficient calculation of long-range electrostatic interactions and a multiple time step method to facilitate bigger time steps. The system is enclosed in a cube and the cube is divided into a hierarchy of cells. The deepest-level cells are assigned to processors such that each processor has contiguous cells, and static load balancing is achieved by redistributing the cells so that each processor has approximately the same number of atoms. The resulting domains have irregular shapes and may have more than 26 neighbors. Atoms constituting bond angles and torsion angles may straddle more than two processors. An efficient strategy is devised for initial assignment and subsequent reassignment of such multiple-atom potentials to processors. At each step, computation is overlapped with communication, greatly reducing the effect of communication overhead on parallel performance. The algorithm is tested on a spherical cluster of water molecules, and on a hexasaccharide and an enzyme, both solvated by a spherical cluster of water molecules. In each case a spherical boundary containing oxygen atoms with only repulsive interactions is used to prevent evaporation of water molecules. The algorithm shows excellent parallel efficiency even for a small number of cells/atoms per processor.

20.

Scalable and portable visualization of large atomistic datasets

Ashish Sharma Rajiv K. Kalia Aiichiro Nakano Priya Vashishta 《Computer Physics Communications》2004,163(1):53-64

A scalable and portable code named Atomsviewer has been developed to interactively visualize a large atomistic dataset consisting of up to a billion atoms. The code uses a hierarchical view frustum-culling algorithm based on the octree data structure to efficiently remove atoms outside of the user's field-of-view. Probabilistic and depth-based occlusion-culling algorithms then select atoms that have a high probability of being visible. Finally, a multiresolution algorithm is used to render the selected subset of visible atoms at varying levels of detail. Atomsviewer is written in C++ and OpenGL, and it has been tested on a number of architectures including Windows, Macintosh, and SGI. Atomsviewer has been used to visualize tens of millions of atoms on a standard desktop computer and, in its parallel version, up to a billion atoms.

Program summary

Title of program: Atomsviewer
Catalogue identifier: ADUM
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADUM
Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
Computer for which the program is designed and others on which it has been tested: 2.4 GHz Pentium 4/Xeon processor, professional graphics card; Apple G4 (867 MHz)/G5, professional graphics card
Operating systems under which the program has been tested: Windows 2000/XP, Mac OS 10.2/10.3, SGI IRIX 6.5
Programming languages used: C++, C and OpenGL
Memory required to execute with typical data: 1 gigabyte of RAM
High speed storage required: 60 gigabytes
No. of lines in the distributed program including test data, etc.: 550 241
No. of bytes in the distributed program including test data, etc.: 6 258 245
Number of bits in a word: Arbitrary
Number of processors used: 1
Has the code been vectorized or parallelized: No
Distribution format: tar gzip file
Nature of physical problem: Scientific visualization of atomic systems
Method of solution: Rendering of atoms using computer graphic techniques, culling algorithms for data minimization, and levels-of-detail for minimal rendering
Restrictions on the complexity of the problem: None
Typical running time: The program is interactive in its execution
Unusual features of the program: None
References: The conceptual foundation and subsequent implementation of the algorithms are found in [A. Sharma, A. Nakano, R.K. Kalia, P. Vashishta, S. Kodiyalam, P. Miller, W. Zhao, X.L. Liu, T.J. Campbell, A. Haas, Presence: Teleoperators and Virtual Environments 12 (1) (2003)].