1.

Scalable and portable implementation of the fast multipole method on parallel computers

Shuji Ogata Rajiv K Kalia Aiichiro Nakano Priya Vashishta Satyavani Vemparala 《Computer Physics Communications》2003,153(3):445-461

A scalable and portable Fortran code is developed to calculate Coulomb interaction potentials of charged particles on parallel computers, based on the fast multipole method. The code has a unique feature to calculate microscopic stress tensors due to the Coulomb interactions, which is useful in constant-pressure simulations and local stress analyses. The code is applicable to various boundary conditions, including periodic boundary conditions in two and three dimensions, corresponding to slab and bulk systems, respectively. Numerical accuracy of the code is tested through comparison of its results with those obtained by the Ewald summation method and by direct calculations. Scalability tests show a parallel efficiency of 0.98 for 512 million charged particles on 512 IBM SP3 processors. The timing results on IBM SP3 are also compared with those on IBM SP4.

2.

The parallel implementation of the one-dimensional Fourier transformed Vlasov-Poisson system

Bengt Eliasson 《Computer Physics Communications》2005,170(2):205-230

A parallel implementation of an algorithm for solving the one-dimensional, Fourier transformed Vlasov-Poisson system of equations is documented, together with the code structure, file formats and settings to run the code. The properties of the Fourier transformed Vlasov-Poisson system are discussed in connection with the numerical solution of the system. The Fourier method in velocity space is used to treat numerical problems arising due to the filamentation of the solution in velocity space. Outflow boundary conditions in the Fourier transformed velocity space remove the highest oscillations in velocity space. A fourth-order compact Padé scheme is used to calculate derivatives in the Fourier transformed velocity space, and spatial derivatives are calculated with a pseudo-spectral method. The parallel algorithms used are described in more detail, in particular the parallel solver of the tri-diagonal systems occurring in the Padé scheme.

Program summary

Title of program: vlasov
Catalogue identifier: ADVQ
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADVQ
Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
Operating system under which the program has been tested: Sun Solaris; HP-UX; Red Hat Linux
Programming language used: FORTRAN 90 with Message Passing Interface (MPI)
Computers: Sun Ultra Sparc; HP 9000/785; HP IPF (Itanium Processor Family) ia64 Cluster; PCs cluster
Number of lines in distributed program, including test data, etc.: 3737
Number of bytes in distributed program, including test data, etc.: 18 772
Distribution format: tar.gz
Nature of physical problem: Kinetic simulations of collisionless electron-ion plasmas.
Method of solution: A Fourier method in velocity space, a pseudo-spectral method in space and a fourth-order Runge-Kutta scheme in time.
Memory required to execute with typical data: Uses typically of the order 10⁵-10⁶ double precision numbers.
Restriction on the complexity of the problem: The program uses periodic boundary conditions in space.
Typical running time: Depends strongly on the problem size, typically a few hours if only electron dynamics is considered and longer if both ion and electron dynamics are important.
Unusual features of the program: No
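The abstract above mentions tri-diagonal systems arising from the compact Padé scheme. As a minimal sketch of what solving such a system involves, the following uses the serial Thomas algorithm. This is a swapped-in textbook method, not the parallel solver of the paper, and the function name `solve_tridiagonal` and its argument layout are assumptions made for illustration.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the serial Thomas algorithm.

    lower[i] multiplies x[i-1] in row i (lower[0] is unused),
    upper[i] multiplies x[i+1] in row i (upper[-1] is unused).
    Hypothetical helper; assumes no pivoting is needed.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

The forward sweep is inherently sequential, which is one reason a distributed code would need a dedicated parallel tri-diagonal solver rather than this loop.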

3.

A parallel implementation of the Wang-Landau algorithm

Lixin Zhan 《Computer Physics Communications》2008,179(5):339-344

The Wang-Landau algorithm is a flat-histogram Monte Carlo method that performs random walks in the configuration space of a system to obtain a close estimation of the density of states iteratively. It has been applied successfully to many research fields. In this paper, we propose a parallel implementation of the Wang-Landau algorithm on computers of shared memory architectures by utilizing the OpenMP API for distributed computing. This implementation is applied to Ising model systems with promising speedups. We also examine the effects on the running speed when using different strategies in accessing the shared memory space during the updating procedure. The allowance of data race is recommended in consideration of the simulation efficiency. Such treatment does not affect the accuracy of the final density of states obtained.
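To make the random walk described above concrete, here is a minimal serial sketch of the Wang-Landau iteration on a small one-dimensional Ising ring. It is not the OpenMP implementation of the paper. The function name, the ring geometry, the coupling J = 1, and all parameter defaults (flatness threshold, halving schedule, step counts) are assumptions chosen for illustration.

```python
import math
import random


def wang_landau_ising_ring(n_spins=8, ln_f_final=1e-3, steps_per_check=2000,
                           flatness=0.8, max_checks=500, seed=0):
    """Estimate ln g(E) (up to an additive constant) for a 1D Ising ring."""
    rng = random.Random(seed)
    spins = [1] * n_spins
    energy = -n_spins  # all spins aligned: E = -sum_i s_i * s_{i+1}, J = 1
    ln_g = {energy: 0.0}   # running estimate of the log density of states
    hist = {}              # visit histogram for the current stage
    ln_f = 1.0             # log of the modification factor
    checks = 0
    while ln_f > ln_f_final and checks < max_checks:
        for _ in range(steps_per_check):
            i = rng.randrange(n_spins)
            # Energy change from flipping spin i (spins[i - 1] wraps around)
            delta = 2 * spins[i] * (spins[i - 1] + spins[(i + 1) % n_spins])
            proposed = energy + delta
            # Accept with probability min(1, g(E) / g(E'))
            log_ratio = ln_g.get(energy, 0.0) - ln_g.get(proposed, 0.0)
            if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
                spins[i] = -spins[i]
                energy = proposed
            ln_g[energy] = ln_g.get(energy, 0.0) + ln_f
            hist[energy] = hist.get(energy, 0) + 1
        checks += 1
        counts = [hist.get(e, 0) for e in ln_g]
        # Histogram is "flat": refine the modification factor and restart it
        if min(counts) >= flatness * sum(counts) / len(counts):
            ln_f /= 2.0
            hist = {}
    return ln_g
```

For an 8-spin ring the exact degeneracies are 2, 56, 140, 56 and 2 for energies -8, -4, 0, 4 and 8, so differences between the returned `ln_g` values can be compared against the logarithms of those numbers. In a shared-memory version, the updates to `ln_g` and `hist` are the shared writes where the data-race question raised in the abstract would arise.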

4.

Massively parallel quantum computer simulator

K. De Raedt H. De Raedt B. Trieu 《Computer Physics Communications》2007,176(2):121-136

We describe portable software to simulate universal quantum computers on massively parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, an SGI Altix 3700 and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as a benchmark for testing high-end parallel computers.
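The 36-qubit and 1 TB figures quoted above are consistent with storing the full state vector. The sketch below works through that arithmetic and shows a single-qubit gate applied to a state vector. It assumes 16 bytes per complex amplitude (two double-precision numbers), and both function names are hypothetical, not part of the software described in the paper.

```python
import math


def state_vector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory for a full state vector of 2**n_qubits complex amplitudes."""
    return (2 ** n_qubits) * bytes_per_amplitude


def apply_hadamard(state, target):
    """Apply a Hadamard gate to qubit `target` of a state vector (a list)."""
    stride = 1 << target
    s = 1.0 / math.sqrt(2.0)
    out = list(state)
    for i in range(len(state)):
        if i & stride == 0:
            # Amplitudes whose indices differ only in the target bit
            a, b = state[i], state[i | stride]
            out[i] = s * (a + b)
            out[i | stride] = s * (a - b)
    return out
```

Under the 16-byte assumption, `state_vector_bytes(36)` equals 2**40 bytes, and each additional qubit doubles the requirement, which is why the qubit count is bounded by total memory rather than by processor count.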

5.

Embedded divide-and-conquer algorithm on hierarchical real-space grids: parallel molecular dynamics simulation based on linear-scaling density functional theory

Fuyuki Shimojo Rajiv K. Kalia Priya Vashishta 《Computer Physics Communications》2005,167(3):151-164

A linear-scaling algorithm has been developed to perform large-scale molecular-dynamics (MD) simulations, in which interatomic forces are computed quantum mechanically in the framework of the density functional theory. A divide-and-conquer algorithm is used to compute the electronic structure, where non-additive contribution to the kinetic energy is included with an embedded cluster scheme. Electronic wave functions are represented on a real-space grid, which is augmented with coarse multigrids to accelerate the convergence of iterative solutions and adaptive fine grids around atoms to accurately calculate ionic pseudopotentials. Spatial decomposition is employed to implement the hierarchical-grid algorithm on massively parallel computers. A converged solution to the electronic-structure problem is obtained for a 32,768-atom amorphous CdSe system on 512 IBM POWER4 processors. The total energy is well conserved during MD simulations of liquid Rb, showing the applicability of this algorithm to first principles MD simulations. The parallel efficiency is 0.985 on 128 Intel Xeon processors for a 65,536-atom CdSe system.

6.

Pathfinder: A parallel search algorithm for concerted atomistic events

Aiichiro Nakano 《Computer Physics Communications》2007,176(4):292-299

An algorithm has been designed to search for the escape paths with the lowest activation barriers when starting from a local minimum-energy configuration of a many-atom system. The pathfinder algorithm combines: (1) a steered eigenvector-following method that guides a constrained escape from the convex region and subsequently climbs to a transition state tangentially to the eigenvector corresponding to the lowest negative Hessian eigenvalue; (2) discrete abstraction of the atomic configuration to systematically enumerate concerted events as linear combinations of atomistic events; (3) evolutionary control of the population dynamics of low activation-barrier events; and (4) hybrid task + spatial decompositions to implement massive search for complex events on parallel computers. The program exhibits good scalability on parallel computers and has been used to study concerted bond-breaking events in the fracture of alumina.

7.

A scalable parallel algorithm for large-scale reactive force-field molecular dynamics simulations

Ken-ichi Nomura Priya Vashishta 《Computer Physics Communications》2008,178(2):73-87

A scalable parallel algorithm has been designed to perform multimillion-atom molecular dynamics (MD) simulations, in which first principles-based reactive force fields (ReaxFF) describe chemical reactions. Environment-dependent bond orders associated with atomic pairs and their derivatives are reused extensively with the aid of linked-list cells to minimize the computation associated with atomic n-tuple interactions (n ≤ 4 explicitly and ≤ 6 due to chain-rule differentiation). These n-tuple computations are made modular, so that they can be reconfigured effectively with a multiple time-step integrator to further reduce the computation time. Atomic charges are updated dynamically with an electronegativity equalization method, by iteratively minimizing the electrostatic energy with the charge-neutrality constraint. The ReaxFF-MD simulation algorithm has been implemented on parallel computers based on a spatial decomposition scheme combined with distributed n-tuple data structures. The measured parallel efficiency of the parallel ReaxFF-MD algorithm is 0.998 on 131,072 IBM BlueGene/L processors for a 1.01 billion-atom RDX system.

8.

A space-time-ensemble parallel nudged elastic band algorithm for molecular kinetics simulation

Aiichiro Nakano 《Computer Physics Communications》2008,178(4):280-289

A scalable parallel algorithm has been designed to study long-time dynamics of many-atom systems based on the nudged elastic band method, which performs mutually constrained molecular dynamics simulations for a sequence of atomic configurations (or states) to obtain a minimum energy path between initial and final local minimum-energy states. A directionally heated nudged elastic band method is introduced to search for thermally activated events without the knowledge of final states, which is then applied to an ensemble of bands in a path ensemble method for long-time simulation in the framework of the transition state theory. The resulting molecular kinetics (MK) simulation method is parallelized with a space-time-ensemble parallel nudged elastic band (STEP-NEB) algorithm, which employs spatial decomposition within each state, while temporal parallelism across the states within each band and band-ensemble parallelism are implemented using a hierarchy of communicator constructs in the Message Passing Interface library. The STEP-NEB algorithm exhibits good scalability with respect to spatial, temporal and ensemble decompositions on massively parallel computers. The MK simulation method is used to study low strain-rate deformation of amorphous silica.

9.

Efficient sensitivity computations in 3D air quality models

Ioannis Kioutsioukis Dimitrios Melas Ioannis Ziomas 《Computer Physics Communications》2005,167(1):23-33

The prediction of ground level ozone for air quality monitoring and assessment is simulated through an integrated system of gridded models (meteorological, photochemical), where the atmosphere is represented with a three-dimensional grid that may include thousands of grid cells. The continuity equation solved by the Photochemical Air Quality Model (PAQM) reproduces the atmospheric processes (dynamical, physical, chemical and radiative), such as moving and mixing air parcels from one grid cell to another, calculating chemical reactions, and injecting new emissions. The whole modeling procedure includes several sources of uncertainty, especially in the large data sets that describe the status of the domain (boundary conditions, emissions, chemical reaction rates and several others). The robustness of the photochemical simulation is addressed in this work through the deterministic approach of sensitivity analysis. The automatic differentiation tool ADIFOR is applied to the 3D PAQM CAMx and augments its Fortran 77 code by introducing new lines of code that additionally calculate, in only one run, the gradient of the solution vector with respect to its input parameters. The applicability of the approach is evaluated through a sensitivity study of the modeled concentrations to perturbations at the boundary conditions and the emissions, for three essentially dissimilar European Metropolises of the Auto-Oil II programme (Athens, Milan, and London).
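The idea of computing a derivative alongside the model output in a single run can be illustrated with forward-mode automatic differentiation. The sketch below uses operator overloading on dual numbers, which is a different mechanism from the source transformation that ADIFOR performs on Fortran code. The `Dual` class and the toy model `toy_concentration` are hypothetical and have no relation to CAMx.

```python
class Dual:
    """A value together with its derivative with respect to one chosen input."""

    def __init__(self, value, deriv=0.0):
        self.value = float(value)
        self.deriv = float(deriv)

    @staticmethod
    def lift(x):
        # Treat plain numbers as constants (derivative zero)
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def toy_concentration(emission, boundary):
    """Hypothetical stand-in for a model output; not a real chemistry model."""
    return emission * emission + 3.0 * emission * boundary


def sensitivity_to_emission(emission, boundary):
    """Return (model output, d output / d emission) from one evaluation."""
    out = toy_concentration(Dual(emission, 1.0), Dual(boundary, 0.0))
    return out.value, out.deriv
```

Seeding `emission` with derivative 1 and `boundary` with derivative 0 selects which input the sensitivity refers to. A full gradient with respect to several inputs would need one such seed per input or a vector-valued derivative field.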

10.

Efficient data processing and quantum phenomena: Single-particle systems

H. De Raedt K. De Raedt S. Miyashita 《Computer Physics Communications》2006,174(10):803-817

We study the relation between the acquisition and analysis of data and quantum theory using a probabilistic and deterministic model for photon polarizers. We introduce criteria for efficient processing of data and then use these criteria to demonstrate that efficient processing of the data contained in single events is equivalent to the observation that Malus' law holds. A strictly deterministic process that also yields Malus' law is analyzed in detail. We present a performance analysis of the probabilistic and deterministic model of the photon polarizer. The latter is an adaptive dynamical system that has primitive learning capabilities. This additional feature has recently been shown to be sufficient to perform event-by-event simulations of interference phenomena, without using concepts of wave mechanics. We illustrate this by presenting results for a system of two chained Mach-Zehnder interferometers, suggesting that systems that perform efficient data processing and have learning capability are able to exhibit behavior that is usually attributed to quantum systems only.

11.

The implementation of the Minimal Supersymmetric Standard Model in FeynArts and FormCalc

Thomas Hahn Christian Schappacher 《Computer Physics Communications》2002,143(1):54-68

We describe the implementation of the MSSM in the diagram generator FeynArts and the calculational tool FormCalc. This extension allows one to perform loop calculations of MSSM processes almost fully automatically. The actual implementation has two aspects: The MSSM Feynman rules are specified in a new model file for FeynArts. The computation of the parameters in the MSSM Lagrangian from the input parameters is realized as a Fortran subroutine in the framework of FormCalc. The model file does not depend on the latter, however, and can be used even if one does not want to continue the calculation with FormCalc. The Feynman rules have been entered in a very generic way to allow, e.g., scenarios with complex parameters, and have been tested extensively by reproducing known results for several non-trivial scattering processes.

12.

A fast level set framework for large three-dimensional topography simulations

Otmar Ertl Siegfried Selberherr 《Computer Physics Communications》2009,180(8):1242-1250

We present fast methods to describe the surface evolution of large three-dimensional structures. Based on the sparse field level set method and the hierarchical run-length encoding level set data structure, optimal figures for the computation time and for the memory consumption are achieved. Furthermore, we introduce a new multi-level-set technique, which is able to incorporate multiple material regions, and which can also handle material-specific surface speeds accurately. We also describe an optimal algorithm for the visibility check for unidirectional etching. The presented techniques are demonstrated on various examples.

13.

CADNA: a library for estimating round-off error propagation

Fabienne Jézéquel Jean-Marie Chesneaux 《Computer Physics Communications》2008,178(12):933-955

The CADNA library enables one to estimate round-off error propagation using a probabilistic approach. With CADNA the numerical quality of any simulation program can be controlled. Furthermore by detecting all the instabilities which may occur at run time, a numerical debugging of the user code can be performed. CADNA provides new numerical types on which round-off errors can be estimated. Slight modifications are required to control a code with CADNA, mainly changes in variable declarations, input and output. This paper describes the features of the CADNA library and shows how to interpret the information it provides concerning round-off error propagation in a code.

Program summary

Program title: CADNA
Catalogue identifier: AEAT_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEAT_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 53 420
No. of bytes in distributed program, including test data, etc.: 566 495
Distribution format: tar.gz
Programming language: Fortran
Computer: PC running LINUX with an i686 or an ia64 processor, UNIX workstations including SUN, IBM
Operating system: LINUX, UNIX
Classification: 4.14, 6.5, 20
Nature of problem: A simulation program which uses floating-point arithmetic generates round-off errors, due to the rounding performed at each assignment and at each arithmetic operation. Round-off error propagation may invalidate the result of a program. The CADNA library enables one to estimate round-off error propagation in any simulation program and to detect all numerical instabilities that may occur at run time.
Solution method: The CADNA library [1] implements Discrete Stochastic Arithmetic [2-4] which is based on a probabilistic model of round-off errors. The program is run several times with a random rounding mode generating different results each time. From this set of results, CADNA estimates the number of exact significant digits in the result that would have been computed with standard floating-point arithmetic.
Restrictions: CADNA requires a Fortran 90 (or newer) compiler. In the program to be linked with the CADNA library, round-off errors on complex variables cannot be estimated. Furthermore array functions such as product or sum must not be used. Only the arithmetic operators and the abs, min, max and sqrt functions can be used for arrays.
Running time: The version of a code which uses CADNA runs at least three times slower than its floating-point version. This cost depends on the computer architecture and can be higher if the detection of numerical instabilities is enabled. In this case, the cost may be related to the number of instabilities detected.
References:

[1]: The CADNA library, URL address: http://www.lip6.fr/cadna.
[2]: J.-M. Chesneaux, L'arithmétique Stochastique et le Logiciel CADNA, Habilitation à diriger des recherches, Université Pierre et Marie Curie, Paris, 1995.
[3]: J. Vignes, A stochastic arithmetic for reliable scientific computation, Math. Comput. Simulation 35 (1993) 233-261.
[4]: J. Vignes, Discrete stochastic arithmetic for validating results of numerical software, Numer. Algorithms 37 (2004) 377-390.
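The solution method summarized above (several runs under random rounding, then a statistical estimate of the exact significant digits) can be imitated very roughly in a few lines. This sketch does not use the CADNA API. It fakes random rounding by nudging each intermediate result one unit in the last place up or down, and it assumes a Student-t based digit estimate for three samples with the constant 4.303 (the two-sided 95% value for two degrees of freedom). All names, and the choice of that estimator, are assumptions.

```python
import math
import random
import statistics

STUDENT_T_95_N3 = 4.303  # assumed constant, valid only for 3 samples


def randomly_rounded(x, rng):
    """Crude stand-in for random rounding: move x one ulp up or down."""
    direction = math.inf if rng.random() < 0.5 else -math.inf
    return math.nextafter(x, direction)


def noisy_sum(terms, rng):
    """Accumulate terms, perturbing every partial sum."""
    total = 0.0
    for t in terms:
        total = randomly_rounded(total + t, rng)
    return total


def estimated_digits(samples, max_digits=15.0):
    """Estimate decimal digits shared by the samples (meant for 3 samples)."""
    n = len(samples)
    mean = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    if sigma == 0.0:
        return max_digits  # samples agree to working precision
    if mean == 0.0:
        return 0.0
    digits = math.log10(math.sqrt(n) * abs(mean) / (sigma * STUDENT_T_95_N3))
    return max(0.0, min(max_digits, digits))
```

A possible use is `rng = random.Random(0)` followed by `estimated_digits([noisy_sum([0.1] * 1000, rng) for _ in range(3)])`. The three runs here correspond to the repeated executions mentioned in the summary, and they also suggest why an instrumented code would run at least three times slower.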

14.

Efficient fitting algorithms applied to analysis of coincidence γ-ray spectra

M. Morháč J. Kliman M. Jandel V. Matoušek J.H. Hamilton 《Computer Physics Communications》2005,172(1):19-41

In the paper, efficient nonlinear fitting algorithms without matrix inversion are described. The algorithms were applied to the analysis of two- and three-fold coincidence γ-ray spectra. They were used to process coincidence matrices from fission data from the multidetector GAMMASPHERE spectrometer.

15.

Performance of the 3D FFT on the 6D network torus QCDOC parallel supercomputer

Bin Fang Glenn Martyna 《Computer Physics Communications》2007,176(8):531-538

QCDOC is a massively parallel supercomputer with tens of thousands of nodes distributed on a six-dimensional torus network. The 6D structure of the network provides the needed communication resources for many communication-intensive applications. In this paper, we present a parallel algorithm for the three-dimensional Fast Fourier Transform and its implementation for a 4096-node QCDOC prototype. Two techniques have been used to increase its parallel performance: simultaneous multi-dimensional communication and communication-and-computation overlapping. Benchmarking experiments suggest that 3D FFTs of size 128×128×128 can scale well on such platforms up to 4096 nodes. Our performance results suggest stronger scalability on QCDOC than on the IBM BlueGene/L supercomputer.
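A three-dimensional Fourier transform separates into one-dimensional transforms applied along each axis in turn, which is the property that distributed 3D FFT algorithms generally build on. The sketch below shows this separability with a naive O(n²) one-dimensional DFT in place of a fast transform. It is serial and is not the QCDOC algorithm, and the function names are assumptions.

```python
import cmath


def dft1d(x):
    """Naive discrete Fourier transform of a sequence (O(n^2), for clarity)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def dft3d(a):
    """3D DFT of a nested list a[i][j][k], done as 1D DFTs along z, y, then x."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    out = [[[complex(v) for v in row] for row in plane] for plane in a]
    # Transform along the z axis (innermost index)
    for i in range(nx):
        for j in range(ny):
            out[i][j] = dft1d(out[i][j])
    # Transform along the y axis
    for i in range(nx):
        for k in range(nz):
            line = dft1d([out[i][j][k] for j in range(ny)])
            for j in range(ny):
                out[i][j][k] = line[j]
    # Transform along the x axis
    for j in range(ny):
        for k in range(nz):
            line = dft1d([out[i][j][k] for i in range(nx)])
            for i in range(nx):
                out[i][j][k] = line[i]
    return out
```

When the grid is split across nodes, at least one of these axis passes needs data held by other nodes, so the transform becomes communication-intensive. That is the cost the two techniques named in the abstract aim to reduce.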

16.

Efficient DNA sticker algorithms for NP-complete graph problems

Karl-Heinz Zimmermann 《Computer Physics Communications》2002,144(3):297-309

Adleman's successful solution of a seven-vertex instance of the NP-complete Hamiltonian directed path problem by a DNA algorithm initiated the field of biomolecular computing. We provide DNA algorithms based on the sticker model to compute all k-cliques, independent k-sets, Hamiltonian paths, and Steiner trees with respect to a given edge or vertex set. The algorithms determine not merely the existence of a solution but yield all solutions (if any). For an undirected graph with n vertices and m edges, the running time of the algorithms is linear in n+m. For this, the sticker algorithms make use of small combinatorial input libraries instead of commonly used large libraries. The described algorithms are entirely theoretical in nature. They may become very useful in practice, when further advances in biotechnology lead to an efficient implementation of the sticker model.

17.

Performance of a Lattice Quantum Chromodynamics kernel on the Cell processor

J. Spray A. Trew 《Computer Physics Communications》2008,179(9):642-646

The implementation of a proof-of-concept Lattice Quantum Chromodynamics kernel on the Cell processor is described in detail, illustrating issues encountered in the porting process. The resulting code performs up to 45 GFlop/s per socket (without inter-node parallel communications), indicating that the Cell processor is likely to be a good platform for future Lattice QCD calculations.

18.

An efficient parallel implementation of the smooth particle mesh Ewald method for molecular dynamics simulations

Kwang Jin Oh Yuefan Deng 《Computer Physics Communications》2007,177(5):426-431

This paper focuses on the implementation and the performance analysis of a smooth particle mesh Ewald method on several parallel computers. We present the details of the algorithms and our implementation that are used to optimize parallel efficiency on such parallel computers.

19.

A finite-difference eigenvalue algorithm for calculating the band structure of a photonic crystal

Linfang Shen Sailing He Sanshui Xiao 《Computer Physics Communications》2002,143(3):213-221

A new method based on a finite difference of the governing differential equation for the eigenvalue problem is introduced to calculate the band structure of a two-dimensional photonic crystal. The effective medium technique is also used in the method. The problem is reduced to a standard matrix eigenvalue problem. Compared to the conventional plane wave expansion method, the present method improves the convergence of the solution and thus is a fast and accurate algorithm for calculating the band structure of a photonic crystal.

20.

Theory of dynamic entropy-operator systems and its applications

Yu. S. Popkov 《Automation and Remote Control》2006,67(6):900-926

The phenomenological and mathematical definitions of a class of dynamic systems with an entropy operator are formulated. Dynamic systems with an entropy operator are classified and the main theoretical results pertaining to the properties of entropy operators and these dynamic systems are studied within this classification. By way of examples, restoration of monochromatic images and modeling of the labor market are examined.