Similar articles
1.
A scalable parallel algorithm has been designed to study long-time dynamics of many-atom systems based on the nudged elastic band method, which performs mutually constrained molecular dynamics simulations for a sequence of atomic configurations (or states) to obtain a minimum energy path between initial and final local minimum-energy states. A directionally heated nudged elastic band method is introduced to search for thermally activated events without the knowledge of final states, which is then applied to an ensemble of bands in a path ensemble method for long-time simulation in the framework of the transition state theory. The resulting molecular kinetics (MK) simulation method is parallelized with a space-time-ensemble parallel nudged elastic band (STEP-NEB) algorithm, which employs spatial decomposition within each state, while temporal parallelism across the states within each band and band-ensemble parallelism are implemented using a hierarchy of communicator constructs in the Message Passing Interface library. The STEP-NEB algorithm exhibits good scalability with respect to spatial, temporal and ensemble decompositions on massively parallel computers. The MK simulation method is used to study low strain-rate deformation of amorphous silica.
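A minimal nudged elastic band (NEB) sketch may make the "mutually constrained" dynamics concrete: each intermediate image feels only the component of the true force perpendicular to the band plus the spring force along the band. The 2D model potential V(x, y) = (x² − 1)² + 2y² (minima at (±1, 0), saddle at (0, 0) with V = 1), the central-difference tangent, and the steepest-descent relaxation are all assumptions of this sketch; directional heating, path ensembles, and the MPI space-time-ensemble decomposition described above are not reproduced.

```python
import numpy as np

def potential(p):
    # hypothetical model surface: minima at (+-1, 0), saddle point at (0, 0) with V = 1
    x, y = p
    return (x**2 - 1.0)**2 + 2.0 * y**2

def gradient(p):
    x, y = p
    return np.array([4.0 * x * (x**2 - 1.0), 4.0 * y])

def nudged_elastic_band(start, end, n_images=11, k_spring=5.0, step=0.01, n_iter=3000):
    t = np.linspace(0.0, 1.0, n_images)
    path = np.outer(1.0 - t, start) + np.outer(t, end)
    path[:, 1] += 0.5 * np.sin(np.pi * t)      # deliberately bent initial band
    for _ in range(n_iter):
        new_path = path.copy()                  # the two end images stay fixed
        for i in range(1, n_images - 1):
            tau = path[i + 1] - path[i - 1]     # simple central-difference tangent
            tau /= np.linalg.norm(tau)
            g = gradient(path[i])
            # "nudging": keep only the perpendicular part of the true force ...
            f_perp = -(g - np.dot(g, tau) * tau)
            # ... and only the parallel part of the spring force
            stretch = (np.linalg.norm(path[i + 1] - path[i])
                       - np.linalg.norm(path[i] - path[i - 1]))
            f_spring = k_spring * stretch * tau
            new_path[i] = path[i] + step * (f_perp + f_spring)
        path = new_path
    return path

band = nudged_elastic_band(np.array([-1.0, 0.0]), np.array([1.0, 0.0]))
barrier = max(potential(p) for p in band)   # highest image approximates the saddle energy
```

Production NEB codes typically use an upwind ("improved") tangent and a faster optimizer; the plain central difference is kept here only for brevity.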

2.
A scalable parallel algorithm has been designed to perform multimillion-atom molecular dynamics (MD) simulations, in which first principles-based reactive force fields (ReaxFF) describe chemical reactions. Environment-dependent bond orders associated with atomic pairs and their derivatives are reused extensively with the aid of linked-list cells to minimize the computation associated with atomic n-tuple interactions (n ≤ 4 explicitly and n ≤ 6 due to chain-rule differentiation). These n-tuple computations are made modular, so that they can be reconfigured effectively with a multiple time-step integrator to further reduce the computation time. Atomic charges are updated dynamically with an electronegativity equalization method, by iteratively minimizing the electrostatic energy with the charge-neutrality constraint. The ReaxFF-MD simulation algorithm has been implemented on parallel computers based on a spatial decomposition scheme combined with distributed n-tuple data structures. The measured parallel efficiency of the parallel ReaxFF-MD algorithm is 0.998 on 131,072 IBM BlueGene/L processors for a 1.01 billion-atom RDX system.
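The electronegativity-equalization step can be written as a constrained minimization: with E(q) = Σᵢ (χᵢ qᵢ + ½ ηᵢ qᵢ²) + Σᵢ<ⱼ qᵢ qⱼ / rᵢⱼ and Σᵢ qᵢ = 0, setting the gradient of E + μ Σᵢ qᵢ to zero gives one linear system for the charges and the multiplier μ. The sketch below assumes a bare 1/r Coulomb term (ReaxFF uses a shielded interaction) and arbitrary units, and it solves the small dense system directly, whereas the code described above minimizes iteratively.

```python
import numpy as np

def eem_charges(positions, chi, eta):
    """Charges minimizing sum_i (chi_i q_i + eta_i q_i**2 / 2) + sum_{i<j} q_i q_j / r_ij
    subject to sum_i q_i = 0 (bare Coulomb, hypothetical units)."""
    n = len(chi)
    r = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    np.fill_diagonal(r, np.inf)              # 1/inf = 0: no self-Coulomb term
    a = np.zeros((n + 1, n + 1))
    a[:n, :n] = 1.0 / r + np.diag(eta)       # Hessian of E(q)
    a[:n, n] = 1.0                           # column for the Lagrange multiplier mu
    a[n, :n] = 1.0                           # neutrality row: sum_i q_i = 0
    rhs = np.concatenate([-chi, [0.0]])
    sol = np.linalg.solve(a, rhs)
    return sol[:n]                           # sol[n] holds mu

pos = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
q = eem_charges(pos, chi=np.array([0.0, 2.0]), eta=np.array([10.0, 10.0]))
```

In this two-atom example the more electronegative second atom acquires the negative charge, and the charges sum to zero by construction.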

3.
A linear-scaling algorithm has been developed to perform large-scale molecular-dynamics (MD) simulations, in which interatomic forces are computed quantum mechanically in the framework of the density functional theory. A divide-and-conquer algorithm is used to compute the electronic structure, where non-additive contribution to the kinetic energy is included with an embedded cluster scheme. Electronic wave functions are represented on a real-space grid, which is augmented with coarse multigrids to accelerate the convergence of iterative solutions and adaptive fine grids around atoms to accurately calculate ionic pseudopotentials. Spatial decomposition is employed to implement the hierarchical-grid algorithm on massively parallel computers. A converged solution to the electronic-structure problem is obtained for a 32,768-atom amorphous CdSe system on 512 IBM POWER4 processors. The total energy is well conserved during MD simulations of liquid Rb, showing the applicability of this algorithm to first principles MD simulations. The parallel efficiency is 0.985 on 128 Intel Xeon processors for a 65,536-atom CdSe system.
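How coarse grids accelerate an iterative real-space solver can be illustrated with a textbook multigrid V-cycle. This sketch assumes the 1D Poisson problem −u'' = f on (0, 1) with zero boundary values, weighted-Jacobi smoothing, full-weighting restriction, and linear interpolation; it is not the 3D Kohn–Sham solver of the paper.

```python
import numpy as np

def smooth(u, f, h, sweeps=3, omega=2.0 / 3.0):
    # weighted Jacobi for -u'' = f with zero Dirichlet boundaries
    for _ in range(sweeps):
        up = np.pad(u, 1)
        u = (1.0 - omega) * u + omega * 0.5 * (up[:-2] + up[2:] + h * h * f)
    return u

def residual(u, f, h):
    up = np.pad(u, 1)
    return f - (2.0 * u - up[:-2] - up[2:]) / (h * h)

def restrict(r):
    # full weighting: n fine points -> (n - 1) // 2 coarse points
    return 0.25 * (r[0:-2:2] + 2.0 * r[1:-1:2] + r[2::2])

def prolong(e_c):
    # linear interpolation: n_c coarse points -> 2 n_c + 1 fine points
    ep = np.pad(e_c, 1)
    e_f = np.empty(2 * len(e_c) + 1)
    e_f[1::2] = e_c
    e_f[0::2] = 0.5 * (ep[:-1] + ep[1:])
    return e_f

def v_cycle(u, f, h):
    if len(u) == 1:                                   # coarsest grid: 2u/h^2 = f
        return np.array([0.5 * h * h * f[0]])
    u = smooth(u, f, h)                               # damp high-frequency error
    r_c = restrict(residual(u, f, h))
    e_c = v_cycle(np.zeros_like(r_c), r_c, 2.0 * h)   # coarse-grid correction
    return smooth(u + prolong(e_c), f, h)

n = 127                                               # 2**7 - 1 interior points
x = np.linspace(0.0, 1.0, n + 2)[1:-1]
h = x[1] - x[0]
f = np.pi**2 * np.sin(np.pi * x)                      # exact solution u = sin(pi x)
u = np.zeros(n)
for _ in range(10):
    u = v_cycle(u, f, h)
```

Each V-cycle reduces the algebraic error by a roughly grid-independent factor, which is why the coarse levels make the iteration count insensitive to resolution.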

4.
The Wang-Landau algorithm is a flat-histogram Monte Carlo method that performs random walks in the configuration space of a system to iteratively obtain a close estimate of the density of states. It has been applied successfully in many research fields. In this paper, we propose a parallel implementation of the Wang-Landau algorithm on shared-memory computers using the OpenMP API. This implementation is applied to Ising model systems, with promising speedups. We also examine how different strategies for accessing the shared memory space during the update procedure affect the running speed. Allowing data races is recommended for simulation efficiency; this treatment does not affect the accuracy of the final density of states.
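The core Wang-Landau loop can be sketched serially: accept a move with probability min(1, g(E_old)/g(E_new)), raise ln g at the current energy by ln f, and halve ln f whenever the energy histogram is flat. The sketch assumes a 1D periodic Ising ring (chosen because its exact density of states, 2·C(N, k) for k domain walls, is easy to check), a flatness threshold of 0.8, and a hypothetical stopping value of ln f; the OpenMP version with shared updates and tolerated data races is not shown.

```python
import math
import random

def wang_landau_ring(n_spins=10, flatness=0.8, ln_f_final=1e-3, check_every=1000, seed=1):
    """Serial Wang-Landau estimate of ln g(k) for a periodic Ising ring (n_spins even).

    k is the number of domain walls, so E = -J * (n_spins - 2k)."""
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(n_spins)]
    k = sum(spins[i] != spins[(i + 1) % n_spins] for i in range(n_spins))
    levels = range(0, n_spins + 1, 2)           # a ring always has an even number of walls
    ln_g = {w: 0.0 for w in levels}
    hist = {w: 0 for w in levels}
    ln_f = 1.0
    while ln_f > ln_f_final:
        for _ in range(check_every):
            i = rng.randrange(n_spins)
            s, left, right = spins[i], spins[i - 1], spins[(i + 1) % n_spins]
            # flipping spin i toggles the wall status of its two bonds
            dk = (1 if left == s else -1) + (1 if right == s else -1)
            # accept with probability min(1, g(k_old) / g(k_new))
            if rng.random() < math.exp(min(0.0, ln_g[k] - ln_g[k + dk])):
                spins[i] = -s
                k += dk
            ln_g[k] += ln_f
            hist[k] += 1
        counts = list(hist.values())
        if min(counts) > flatness * sum(counts) / len(counts):
            ln_f /= 2.0                          # histogram is flat: refine the factor
            hist = {w: 0 for w in levels}
    # fix the arbitrary additive constant so that sum_k g(k) = 2**n_spins
    m = max(ln_g.values())
    ln_total = m + math.log(sum(math.exp(v - m) for v in ln_g.values()))
    return {w: v - ln_total + n_spins * math.log(2.0) for w, v in ln_g.items()}

estimate = wang_landau_ring()
```

In a shared-memory parallel version, several threads would walk concurrently and update the same ln g and histogram arrays; the abstract's finding is that leaving those updates unsynchronized costs little accuracy.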

5.
Atomistic simulation of protein adsorption on a solid surface in an aqueous environment is computationally demanding; therefore, determining the preferred protein orientations on the surface usually serves as an initial step in simulation studies. We have developed a hybrid multi-loop genetic-algorithm/simplex/spatial-grid method to search for low adsorption-energy orientations of a protein molecule on a solid surface. In this method, the surface and the protein molecule are treated as rigid bodies, whereas the bulk fluid is represented by spatial grids. For each grid point, an effective interaction region in the surface is defined by a cutoff distance, and the possible interaction energy between an atom at the grid point and the surface is calculated and recorded in a database. In searching for the optimum position and orientation, the protein molecule is translated and rotated as a rigid body, with the configuration obtained from a previous molecular dynamics simulation. The orientation-dependent protein-surface interaction energy is obtained from the generated database of grid energies. The hybrid search procedure consists of two interlinked loops. In the first loop (A), a genetic algorithm (GA) is applied to identify promising regions for the global energy minimum, and a local optimizer based on the derivative-free Nelder-Mead simplex method searches for the lowest-energy orientation within the identified regions. In the second loop (B), a new GA population is generated and the competitive solution from loop A is improved. Switching between the two loops is controlled adaptively by similarity analysis. We test the method on lysozyme adsorption on a hydrophobic hydrogen-terminated silicon (110) surface in implicit water (i.e., a continuum distance-dependent dielectric constant). The results show that the hybrid search method converges faster and gives more accurate solutions than the conventional genetic algorithm.
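The grid-energy database idea can be sketched as follows: atom-surface energies are tabulated once, and any rigid-body pose is then scored by rotating the molecule and interpolating in the table. This sketch assumes a flat, laterally uniform surface (so the table depends only on height), a hypothetical 9-3 wall potential, a 4-atom toy rigid body, and a crude random search over Euler angles in place of the paper's GA/simplex hybrid.

```python
import numpy as np

def wall_energy(z, eps=1.0, sigma=1.0):
    # hypothetical atom-surface interaction: 9-3 wall potential of a flat surface
    return eps * (2.0 / 15.0 * (sigma / z) ** 9 - (sigma / z) ** 3)

# "database": energies tabulated once on grid heights up to a cutoff of 10 sigma
Z_GRID = np.linspace(0.6, 10.0, 4001)
E_GRID = wall_energy(Z_GRID)

def rotation(alpha, beta, gamma):
    """Rotation matrix R = Rz(gamma) @ Ry(beta) @ Rx(alpha)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    rx = np.array([[1.0, 0.0, 0.0], [0.0, ca, -sa], [0.0, sa, ca]])
    ry = np.array([[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]])
    rz = np.array([[cg, -sg, 0.0], [sg, cg, 0.0], [0.0, 0.0, 1.0]])
    return rz @ ry @ rx

def orientation_energy(coords, angles, height):
    """Rotate the rigid body, put its lowest atom at `height`, and sum tabulated energies."""
    xyz = coords @ rotation(*angles).T
    z = xyz[:, 2] - xyz[:, 2].min() + height
    return np.interp(z, Z_GRID, E_GRID, right=0.0).sum()   # beyond the cutoff: zero

protein = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0],
                    [0.0, 1.5, 0.0], [0.0, 0.0, 1.5]])       # toy rigid body
rng = np.random.default_rng(0)
trials = rng.uniform(0.0, 2.0 * np.pi, size=(200, 3))
best = min(trials, key=lambda a: orientation_energy(protein, a, 1.0))
```

Because each pose costs only a rotation and a table lookup, a global optimizer can afford many evaluations.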

6.
This paper focuses on the implementation and performance analysis of the smooth particle mesh Ewald method on several parallel computers. We present the details of the algorithms and of our implementation, which were designed to optimize parallel efficiency on these computers.
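In smooth particle mesh Ewald, charges are interpolated onto a regular grid with cardinal B-splines Mₙ before the reciprocal-space sum is evaluated with FFTs. The sketch below shows the standard B-spline recursion and a 1D periodic charge-spreading step; the 3D tensor-product version, the FFT convolution, and the parallel decomposition studied in the paper are not shown, and the grid size and interpolation order are hypothetical.

```python
import math
import numpy as np

def bspline(u, n):
    """Cardinal B-spline M_n(u), support [0, n], via the recursion used in smooth PME."""
    if n == 2:
        return 1.0 - abs(u - 1.0) if 0.0 <= u <= 2.0 else 0.0
    return (u * bspline(u, n - 1) + (n - u) * bspline(u - 1.0, n - 1)) / (n - 1)

def spread_charges_1d(s, q, n_grid, order=4):
    """Spread charges q at scaled positions s (in grid units) onto a periodic 1D grid."""
    grid = np.zeros(n_grid)
    for s_i, q_i in zip(s, q):
        base = math.floor(s_i)
        frac = s_i - base
        for j in range(order):
            # grid point base - j receives weight M_n(frac + j)
            grid[(base - j) % n_grid] += q_i * bspline(frac + j, order)
    return grid

grid = spread_charges_1d([3.3, 7.9, 0.2], [1.0, -0.5, 2.0], n_grid=16)
```

The weights of each particle sum to one (the B-splines form a partition of unity), so the total charge on the grid equals the total particle charge.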

7.
The simulated annealing (SA) algorithm has been recognized as a powerful technique for minimizing complicated functions. However, a critical disadvantage of SA is its high computational cost. The goal of this paper is therefore to investigate the use of the critical temperature in SA to reduce this cost. This paper presents a systematic study of the critical temperature and its applications in minimizing functions of continuous variables with SA. Based on this study, a new algorithm was developed to exploit the unique features of the critical temperature in SA. The new algorithm combines SA and local search to determine the global minimum effectively. Extensive tests on a variety of functions demonstrated that the new algorithm performs comparably to well-established SA techniques. Furthermore, the new algorithm also improves the determination of the starting temperature for SA. The results of this study are expected to be useful for improving the efficiency of SA algorithms and for facilitating the development of temperature-parallel SA algorithms.
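The combination of SA with a local search can be sketched generically: a Metropolis walk under a geometric cooling schedule locates a promising basin, and a deterministic pattern search then refines the best point found. The test function (1D Rastrigin), starting temperature, cooling rate, and step sizes are hypothetical; the critical-temperature analysis that the paper uses to choose the schedule is not reproduced.

```python
import math
import random

def rastrigin(x):
    # multimodal test function with global minimum 0 at x = 0
    return x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x))

def local_search(f, x, step=0.1, tol=1e-9):
    """1D pattern search: move while a neighbour improves, otherwise halve the step."""
    fx = f(x)
    while step > tol:
        for cand in (x - step, x + step):
            fc = f(cand)
            if fc < fx:
                x, fx = cand, fc
                break
        else:
            step /= 2.0
    return x, fx

def anneal(f, x0, t0=10.0, t_end=1e-2, cooling=0.999, step=0.5, seed=0):
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best, f_best = x, fx
    t = t0
    while t > t_end:
        cand = x + rng.uniform(-step, step)
        fc = f(cand)
        # Metropolis criterion at temperature t
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < f_best:
                best, f_best = x, fx
        t *= cooling
    return local_search(f, best)            # refine the best point visited

x_best, f_best = anneal(rastrigin, 4.3)
```

The local search only guarantees the nearest local minimum; whether that is the global one depends on how well the annealing schedule explored the landscape.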

8.
We describe portable software to simulate universal quantum computers on massively parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, an SGI Altix 3700, and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling with the number of processors and suggest that the simulation software described in this paper may also serve as a benchmark for testing high-end parallel computers.
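A state-vector simulator stores 2ⁿ complex amplitudes and applies each single-qubit gate as a 2×2 matrix acting along one tensor axis. The sketch below shows that operation on a single node; the qubit-ordering convention is an assumption, and the distribution of the vector over processors is not shown. If each amplitude is a double-precision complex number (16 bytes, an assumption here), 36 qubits need 2³⁶ × 16 = 2⁴⁰ bytes, i.e. 1 TiB, which is in line with the memory figure quoted above.

```python
import numpy as np

def apply_single_qubit_gate(state, gate, target, n_qubits):
    """Apply a 2x2 unitary to qubit `target` of an n-qubit state vector.

    Convention assumed here: qubit 0 is the most significant bit of the basis index."""
    psi = state.reshape([2] * n_qubits)                  # one tensor axis per qubit
    psi = np.tensordot(gate, psi, axes=([1], [target]))  # contracted axis ends up first
    psi = np.moveaxis(psi, 0, target)                    # move it back to its place
    return psi.reshape(-1)

def state_vector_bytes(n_qubits, bytes_per_amplitude=16):
    # 16 bytes = one double-precision complex amplitude
    return bytes_per_amplitude * 2 ** n_qubits

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)

n = 3
psi = np.zeros(2 ** n, dtype=complex)
psi[0] = 1.0                                             # |000>
for q in range(n):
    psi = apply_single_qubit_gate(psi, H, q, n)          # uniform superposition
```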

9.
In this paper, we propose a high-performance parallel three-dimensional fast Fourier transform (FFT) algorithm for clusters of PCs. The three-dimensional FFT algorithm can be restructured into a block three-dimensional FFT algorithm to reduce the number of cache misses. We show that the block three-dimensional FFT algorithm improves performance by using the cache memory effectively, and we use it to implement the parallel three-dimensional FFT algorithm. We obtained a performance of over 1.3 GFLOPS on an 8-node cluster of dual Pentium III 1 GHz SMP PCs.
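The structure of a blocked 3D FFT can be sketched with NumPy: the transform factorizes into 1D FFTs along each axis, and the strided axis is processed a few columns at a time on a small contiguous copy, which is the access pattern that keeps the working set in cache in a compiled implementation. NumPy itself hides the cache effects, so this only illustrates the decomposition; the block size `nb` is a hypothetical parameter.

```python
import numpy as np

def fft3d_blocked(a, nb=4):
    """3D FFT as successive 1D FFTs; the strided axis-0 pass is done in blocks of nb columns."""
    a = np.fft.fft(a, axis=2)            # contiguous axis in C order
    a = np.fft.fft(a, axis=1)
    out = np.empty_like(a)
    for j in range(0, a.shape[2], nb):
        block = np.ascontiguousarray(a[:, :, j:j + nb])   # small contiguous working set
        out[:, :, j:j + nb] = np.fft.fft(block, axis=0)
    return out

x = np.random.default_rng(0).standard_normal((8, 6, 10))
y = fft3d_blocked(x, nb=4)
```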

10.
State-of-the-art molecular dynamics (MD) simulations generate massive datasets involving billion-vertex chemical bond networks, which makes data mining based on graph algorithms such as K-ring analysis a challenge. This paper proposes an algorithm that improves the efficiency of ring analysis of large graphs by exploiting properties of K-rings and spatial correlations of vertices in the graph. The algorithm uses dual-tree expansion (DTE) and spatial hash-function tagging (SHAFT) to optimize computation and memory access. Numerical tests show nearly perfect linear scaling of the algorithm, and a parallel implementation of the DTE + SHAFT algorithm also achieves high scalability. The algorithm has been successfully employed to analyze large MD simulations involving up to 500 million atoms.
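The abstract does not give the SHAFT hash function itself, so the following is only an illustration of the general idea of spatial hash tagging: a Morton (Z-order) key interleaves the bits of a vertex's cell indices, and sorting vertices by that key places spatially nearby vertices close together in memory. Nonnegative coordinates and at most 1024 cells per axis are assumptions of this sketch.

```python
import numpy as np

def part1by2(v):
    """Spread the low 10 bits of v so that two zero bits separate consecutive bits."""
    v &= 0x3FF
    v = (v | (v << 16)) & 0x030000FF
    v = (v | (v << 8)) & 0x0300F00F
    v = (v | (v << 4)) & 0x030C30C3
    v = (v | (v << 2)) & 0x09249249
    return v

def morton_key(ix, iy, iz):
    # interleave cell-index bits as ... z1 y1 x1 z0 y0 x0
    return part1by2(ix) | (part1by2(iy) << 1) | (part1by2(iz) << 2)

def spatial_order(positions, cell_size):
    """Vertex indices sorted so that spatially close vertices are close in memory."""
    cells = np.floor(positions / cell_size).astype(int)
    keys = [morton_key(*map(int, c)) for c in cells]
    return sorted(range(len(positions)), key=keys.__getitem__)

pts = np.array([[3.5, 0.0, 0.0], [0.1, 0.1, 0.1], [1.2, 0.0, 0.0]])
order = spatial_order(pts, cell_size=1.0)
```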

11.
We review the basic ideas behind the quantum lattice Boltzmann equation (LBE), and present a few thoughts on the possible use of such an equation for simulating quantum many-body problems on both (parallel) electronic and quantum computers.

12.
In this paper we discuss the implementation of advanced variable-connectivity Monte Carlo (MC) simulation methods for studying large (>10⁵ atoms) polymer systems at the atomic level. Such codes are intrinsically difficult to optimize because they involve a mixture of many different elementary MC steps, such as reptation, flip, end rotation, concerted rotation and volume fluctuation moves. In particular, connectivity-altering MC moves, such as the recently developed directed end bridging (DEB) algorithm, are required in order to sample the configuration space vigorously. Techniques for effective vector implementation of such moves are described. We also show how a simple domain decomposition method can provide a general and efficient means of parallelizing these complex MC protocols. Benchmarks are reported for a 192,000-atom simulation of polydisperse linear polyethylene with an average chain length of C₆₀₀₀, for simulations using 1 to 8 processors and a variety of MC protocols.
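Reptation ("slithering snake"), one of the elementary moves listed above, removes a bead from one chain end and regrows it at the other end. The sketch assumes a hypothetical bead chain with a fixed bond length and a hard-core overlap distance, so the Metropolis test reduces to an overlap check; the united-atom polyethylene force field, DEB, and the domain decomposition are not shown.

```python
import numpy as np

def reptation_move(chain, bond=1.0, min_dist=0.8, rng=None):
    """One reptation trial move for a hard-core bead chain.

    The returned chain may be reversed; bead order does not matter for a homopolymer."""
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < 0.5:
        chain = chain[::-1]                           # pick which end acts as the head
    d = rng.normal(size=3)
    head = chain[-1] + bond * d / np.linalg.norm(d)   # random direction, fixed bond length
    trial = np.vstack([chain[1:], head])              # tail bead removed, new head added
    # athermal chain: accept only if the new head overlaps no remaining bead
    if np.min(np.linalg.norm(trial[:-1] - head, axis=1)) < min_dist:
        return chain, False
    return trial, True

rng = np.random.default_rng(0)
chain = np.column_stack([np.arange(20.0), np.zeros(20), np.zeros(20)])  # straight chain
accepted = 0
for _ in range(500):
    chain, ok = reptation_move(chain, rng=rng)
    accepted += ok
```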

13.
14.
A previously presented hybrid finite volume/particle method for the solution of the joint-velocity-frequency-composition probability density function (JPDF) transport equation in complex 3D geometries is extended to parallel computing. The parallelization strategy is based on domain decomposition. The finite volume method (FVM) and the particle method (PM) are parallelized separately, and the algorithm is fully synchronous. For the FVM, a standard method based on transferring data in ghost cells is used. Moreover, a subdomain interior decomposition algorithm that efficiently solves the implicit time integration for hyperbolic systems is described. The parallelization of the PM is more complicated because a sub-time-stepping algorithm is used for the particle trajectory integration. Here, each particle obeys its local CFL criterion, so the distances covered per global time step can vary significantly. Therefore, an efficient algorithm that handles this issue with minimal communication effort was devised and implemented. Numerical tests validating the parallel algorithm against the serial one are presented, and the effectiveness of the subdomain interior decomposition for the implicit time integration is also investigated. A 3D dump-combustor configuration with about 2.5 × 10⁵ cells was used to demonstrate the good performance of the parallel algorithm. The hybrid algorithm scales well, and the maximum speedup on 60 processors for this configuration was 50 (≈80% parallel efficiency).
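Particle sub-time-stepping with a local CFL limit can be sketched for a single particle: it takes steps no longer than CFL·Δx/|u| until it has covered the global time step, so particles in fine cells or fast regions need more sub-steps than others. The callables `velocity` and `cell_size`, the CFL number, and the explicit Euler update are hypothetical; the cross-processor handling of particles that leave a subdomain mid-step is not shown.

```python
def advance_particle(x, velocity, cell_size, dt_global, cfl=0.5):
    """Advance one particle over dt_global with sub-steps obeying a local CFL limit.

    velocity(x) and cell_size(x) are hypothetical callables for the local flow field and mesh."""
    t, n_sub = 0.0, 0
    while t < dt_global:
        u = velocity(x)
        dt_local = cfl * cell_size(x) / max(abs(u), 1e-30)   # local CFL limit
        dt = min(dt_local, dt_global - t)                    # never overshoot the global step
        x += u * dt                                          # explicit Euler sub-step
        t += dt
        n_sub += 1
    return x, n_sub

x_end, n_coarse = advance_particle(0.0, lambda x: 2.0, lambda x: 0.1, dt_global=1.0)
_, n_fine = advance_particle(0.0, lambda x: 2.0, lambda x: 0.01, dt_global=1.0)
```

The varying sub-step counts are what make particle work, and hence load, uneven across subdomains.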

15.
In this paper, we present the detailed Mathematica symbolic derivation and the program used to integrate the one-dimensional Schrödinger equation with a new two-step numerical method. By adding fourth- and sixth-order derivatives, we raise the precision of the traditional Numerov method from fourth order to twelfth order and expand the interval of periodicity from (0, 6) to (0, 9.7954) and (9.94792, 55.6062). The program uses an efficient algorithm to calculate the first-order derivative and avoids the unnecessary repeated calculations that the multiple derivatives would otherwise cause. We use the well-known Woods–Saxon potential to test the method. The numerical tests show that the new method is superior to previous lower-order methods not only in accuracy but also in efficiency. The program is particularly suited to problems where high accuracy or a large step size is required.

Program summary

Title of program: ShdEq.nb
Catalogue number: ADTT
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADTT
Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
Licensing provisions: none
Computer for which the program is designed and others on which it has been tested: The program has been designed for, and tested on, a microcomputer.
Computers: IBM PC
Operating systems under which the program has been tested: Windows XP
Programming language used: Mathematica 4.2
Memory required to execute with typical data: 51 712 bytes
No. of bytes in distributed program, including test data, etc.: 45 381
No. of lines in distributed program, including test data, etc.: 7311
Distribution format: tar gzip file
CPC Program Library subprograms used: no
Nature of physical problem: Numerical integration of the one-dimensional or radial Schrödinger equation to find the eigenvalues of bound states and the phase shifts of continuum states.
Method of solution: A two-step twelfth-order method integrates the Schrödinger equation numerically from both ends; with the connecting conditions at the matching point, an eigenvalue for a bound state, or a resonant state with a given phase shift, can be found.
Restrictions on the complexity of the problem: The analytic form of the potential function and its high-order derivatives must be known.
Typical running time: Less than one second.
Unusual features of the program: Taking advantage of the high-order derivatives of the potential function and an efficient algorithm, the program can provide the numerical solutions of a given Schrödinger equation, for either bound or resonant states, with very high precision and within a very short CPU time. The program can be applied to a very broad range of problems because the method has a very large interval of periodicity.
References:
[1] T.E. Simos, Proc. Roy. Soc. London A 441 (1993) 283.
[2] Z. Wang, Y. Dai, An eighth-order two-step formula for the numerical integration of the one-dimensional Schrödinger equation, Numer. Math. J. Chinese Univ. 12 (2003) 146.
[3] Z. Wang, Y. Dai, An twelfth-order four-step formula for the numerical integration of the one-dimensional Schrödinger equation, Internat. J. Modern Phys. C 14 (2003) 1087.
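The twelfth-order formula above extends the classical Numerov method. For orientation, here is a sketch of that classical fourth-order Numerov recurrence for y'' = f(x) y (for the 1D Schrödinger equation, f = 2m(V − E)/ħ²), written in Python rather than the program's Mathematica; the extra derivative terms of the twelfth-order scheme are not included, and the test problem y'' = −y is a hypothetical check.

```python
import numpy as np

def numerov(f, x, y0, y1):
    """Classical Numerov integration of y'' = f(x) y on a uniform grid x,
    given the first two values y0 = y(x[0]) and y1 = y(x[1])."""
    h = x[1] - x[0]
    c = h * h / 12.0
    fx = f(x)
    y = np.empty_like(x)
    y[0], y[1] = y0, y1
    for n in range(1, len(x) - 1):
        # y_{n+1}(1 - c f_{n+1}) = 2 y_n (1 + 5c f_n) - y_{n-1}(1 - c f_{n-1})
        y[n + 1] = (2.0 * y[n] * (1.0 + 5.0 * c * fx[n])
                    - y[n - 1] * (1.0 - c * fx[n - 1])) / (1.0 - c * fx[n + 1])
    return y

x = np.linspace(0.0, np.pi / 2.0, 201)
y = numerov(lambda t: -np.ones_like(t), x, 0.0, np.sin(x[1]))   # y'' = -y, y = sin x
```

For bound states, the same recurrence would be run inward and outward and the energy adjusted until the logarithmic derivatives match, as in the method of solution described above.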

16.
17.
A scalable and portable Fortran code is developed to calculate Coulomb interaction potentials of charged particles on parallel computers, based on the fast multipole method. A unique feature of the code is that it calculates microscopic stress tensors due to the Coulomb interactions, which is useful in constant-pressure simulations and local stress analyses. The code is applicable to various boundary conditions, including periodic boundary conditions in two and three dimensions, corresponding to slab and bulk systems, respectively. The numerical accuracy of the code is tested by comparing its results with those obtained by the Ewald summation method and by direct calculations. Scalability tests show a parallel efficiency of 0.98 for 512 million charged particles on 512 IBM SP3 processors. The timing results on IBM SP3 are also compared with those on IBM SP4.
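The fast multipole method rests on replacing the field of a distant group of charges by a truncated multipole expansion. The sketch shows only that building block, Cartesian monopole, dipole, and quadrupole terms about one center compared with a direct sum; the hierarchical tree, local expansions, translation operators, periodic boundaries, and stress calculation of the code described above are not shown. Units with Coulomb constant 1 and the random test configuration are assumptions.

```python
import numpy as np

def multipole_moments(pos, q):
    """Monopole, dipole, and traceless quadrupole moments about the origin."""
    monopole = q.sum()
    dipole = q @ pos
    r2 = np.sum(pos**2, axis=1)
    quad = 3.0 * np.einsum("k,ki,kj->ij", q, pos, pos) - np.eye(3) * (q @ r2)
    return monopole, dipole, quad

def far_field_potential(moments, r):
    """Potential at a distant point r from the expansion truncated after the quadrupole."""
    monopole, dipole, quad = moments
    d = np.linalg.norm(r)
    return monopole / d + dipole @ r / d**3 + 0.5 * r @ quad @ r / d**5

rng = np.random.default_rng(0)
pos = rng.uniform(-0.5, 0.5, size=(50, 3))     # a compact cluster around the origin
q = rng.uniform(0.5, 1.5, size=50)
target = np.array([20.0, 5.0, -3.0])
direct = np.sum(q / np.linalg.norm(target - pos, axis=1))
approx = far_field_potential(multipole_moments(pos, q), target)
```

The truncation error falls off roughly as (a/R)³ for cluster size a and distance R, which is why FMM only applies expansions between well-separated cells.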

18.
The tunable dimension cluster-cluster aggregation (tdCCA) [R. Thouy, R. Jullien, J. Phys. A: Math. Gen. 27 (1994) 2953] provides a computational model for creating fractal aggregates with a tunable fractal dimension. A straightforward implementation of this model requires a computational effort scaling as O(N_total⁴) in the number of particles N_total. Two minor changes to the algorithm reduce the computational effort to O(N_total²) and allow an efficient parallel implementation of the tdCCA. On a modern parallel computer, a fractal aggregate of one million particles was built in less than 24 h.
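The key geometric condition in tunable-dimension aggregation can be derived from the parallel-axis theorem: merging clusters of N₁ and N₂ particles with radii of gyration R₁ and R₂ at center-of-mass separation Γ gives R² = (N₁R₁² + N₂R₂²)/N + N₁N₂Γ²/N², with N = N₁ + N₂. Imposing the scaling law N = k_f (R/a)^{D_f} then fixes Γ. The sketch below assumes point particles and a prefactor k_f = 1; Thouy and Jullien's original conventions may differ, and the O(N_total²) algorithmic changes are not shown.

```python
import math

def target_rg(n, d_f, k_f=1.0, a=1.0):
    # fractal scaling law N = k_f * (Rg / a) ** d_f, solved for Rg
    return a * (n / k_f) ** (1.0 / d_f)

def merge_distance(n1, rg1, n2, rg2, d_f, k_f=1.0, a=1.0):
    """Centre-of-mass separation giving the merged cluster the prescribed fractal dimension."""
    n = n1 + n2
    rg = target_rg(n, d_f, k_f, a)
    gamma2 = n * n * rg * rg / (n1 * n2) - n * rg1 * rg1 / n2 - n * rg2 * rg2 / n1
    if gamma2 < 0.0:
        raise ValueError("prescribed fractal dimension not reachable for these clusters")
    return math.sqrt(gamma2)

def merged_rg(n1, rg1, n2, rg2, gamma):
    # parallel-axis theorem for the radius of gyration of the union
    n = n1 + n2
    return math.sqrt((n1 * rg1**2 + n2 * rg2**2) / n + n1 * n2 * gamma**2 / n**2)

r8 = target_rg(8, 1.8)
g = merge_distance(8, r8, 8, r8, d_f=1.8)
```

An aggregation code would then place the two clusters at distance Γ along a random direction and reject placements that overlap.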

19.
A domain decomposition algorithm for molecular dynamics simulation of atomic and molecular systems with arbitrary shape and non-periodic boundary conditions is described. The molecular dynamics program uses the cell multipole method for efficient calculation of long-range electrostatic interactions and a multiple time step method to allow larger time steps. The system is enclosed in a cube, and the cube is divided into a hierarchy of cells. The deepest-level cells are assigned to processors such that each processor has contiguous cells, and static load balancing is achieved by redistributing the cells so that each processor has approximately the same number of atoms. The resulting domains have irregular shapes and may have more than 26 neighbors. Atoms constituting bond angles and torsion angles may straddle more than two processors. An efficient strategy is devised for the initial assignment and subsequent reassignment of such multiple-atom potentials to processors. At each step, computation is overlapped with communication, greatly reducing the effect of communication overhead on parallel performance. The algorithm is tested on a spherical cluster of water molecules, and on a hexasaccharide and an enzyme, each solvated by a spherical cluster of water molecules. In each case, a spherical boundary containing oxygen atoms with only repulsive interactions is used to prevent evaporation of water molecules. The algorithm shows excellent parallel efficiency even for a small number of cells or atoms per processor.
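The static load-balancing step can be sketched as a prefix-sum split: with the deepest-level cells listed in a fixed order (a space-filling-curve order is one assumption that keeps each run spatially compact), each processor receives a contiguous run of cells holding roughly total/P atoms. The function name and input format are hypothetical; the reassignment of multiple-atom potentials and the communication overlap are not shown.

```python
from bisect import bisect_left
from itertools import accumulate

def partition_cells(atoms_per_cell, n_procs):
    """Split an ordered list of cells into n_procs contiguous (start, end) runs
    with approximately equal atom counts."""
    prefix = list(accumulate(atoms_per_cell))      # atoms in cells 0..i
    total = prefix[-1]
    bounds = [0]
    for p in range(1, n_procs):
        target = total * p / n_procs
        # first cell whose prefix reaches the target closes processor p - 1's run
        bounds.append(bisect_left(prefix, target) + 1)
    bounds.append(len(atoms_per_cell))
    return [(bounds[p], bounds[p + 1]) for p in range(n_procs)]

uniform = partition_cells([4] * 8, 4)
skewed = partition_cells([10, 1, 1, 1, 1, 1, 1, 4], 2)
```

A dense cell can end up alone on a processor, which is how the irregular domain shapes mentioned above arise.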

20.
A scalable and portable code named Atomsviewer has been developed to interactively visualize large atomistic datasets consisting of up to a billion atoms. The code uses a hierarchical view frustum-culling algorithm based on the octree data structure to efficiently remove atoms outside the user's field of view. Probabilistic and depth-based occlusion-culling algorithms then select the atoms that have a high probability of being visible. Finally, a multiresolution algorithm renders the selected subset of visible atoms at varying levels of detail. Atomsviewer is written in C++ and OpenGL and has been tested on a number of platforms, including Windows, Macintosh, and SGI. Atomsviewer has been used to visualize tens of millions of atoms on a standard desktop computer and, in its parallel version, up to a billion atoms.

Program summary

Title of program: Atomsviewer
Catalogue identifier: ADUM
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADUM
Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
Computer for which the program is designed and others on which it has been tested: 2.4 GHz Pentium 4/Xeon processor, professional graphics card; Apple G4 (867 MHz)/G5, professional graphics card
Operating systems under which the program has been tested: Windows 2000/XP, Mac OS 10.2/10.3, SGI IRIX 6.5
Programming languages used: C++, C and OpenGL
Memory required to execute with typical data: 1 gigabyte of RAM
High speed storage required: 60 gigabytes
No. of lines in the distributed program including test data, etc.: 550 241
No. of bytes in the distributed program including test data, etc.: 6 258 245
Number of bits in a word: Arbitrary
Number of processors used: 1
Has the code been vectorized or parallelized: No
Distribution format: tar gzip file
Nature of physical problem: Scientific visualization of atomic systems
Method of solution: Rendering of atoms using computer graphics techniques, culling algorithms for data minimization, and levels of detail for minimal rendering
Restrictions on the complexity of the problem: None
Typical running time: The program is interactive in its execution
Unusual features of the program: None
References: The conceptual foundation and subsequent implementation of the algorithms are found in [A. Sharma, A. Nakano, R.K. Kalia, P. Vashishta, S. Kodiyalam, P. Miller, W. Zhao, X.L. Liu, T.J. Campbell, A. Haas, Presence: Teleoperators and Virtual Environments 12 (1) (2003)].
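Hierarchical culling with an octree can be sketched as follows: a node whose bounding box lies entirely outside any clipping plane is discarded with all its atoms, and only surviving leaves test atoms individually. The plane set used here is hypothetical (a real view frustum has six planes derived from the camera's projection), and occlusion culling and level-of-detail rendering are not shown.

```python
import numpy as np

def box_outside(lo, hi, planes):
    """True if the axis-aligned box [lo, hi] lies entirely outside one of the planes.
    Each plane is (normal, d), with the inside defined by normal @ x + d >= 0."""
    for normal, d in planes:
        corner = np.where(normal >= 0, hi, lo)   # box corner farthest along the normal
        if normal @ corner + d < 0:
            return True
    return False

def inside(p, planes):
    return all(normal @ p + d >= 0 for normal, d in planes)

def octree_cull(points, idx, lo, hi, planes, leaf_size=16, depth=0, max_depth=12):
    """Indices of points inside all planes, pruning octree nodes whose box is outside."""
    if len(idx) == 0 or box_outside(lo, hi, planes):
        return []
    if len(idx) <= leaf_size or depth == max_depth:
        return [i for i in idx if inside(points[i], planes)]
    mid = 0.5 * (lo + hi)
    upper = points[idx] >= mid                    # which half along x, y, z
    octant = upper[:, 0] + 2 * upper[:, 1] + 4 * upper[:, 2]
    visible = []
    for o in range(8):
        bits = np.array([(o >> k) & 1 for k in range(3)], dtype=bool)
        child_lo, child_hi = np.where(bits, mid, lo), np.where(bits, hi, mid)
        visible += octree_cull(points, idx[octant == o], child_lo, child_hi,
                               planes, leaf_size, depth + 1, max_depth)
    return visible

pts = np.random.default_rng(0).random((2000, 3))
planes = [(np.array([1.0, 0.0, 0.0]), -0.2),      # x >= 0.2
          (np.array([0.0, -1.0, 0.0]), 0.7),      # y <= 0.7
          (np.array([-1.0, -1.0, -1.0]), 2.0)]    # x + y + z <= 2
visible = sorted(int(i) for i in octree_cull(pts, np.arange(len(pts)),
                                             np.zeros(3), np.ones(3), planes))
```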


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP filing No. 09084417