Similar Documents (20 results)
1.
We discuss the design and implementation of HYDRA_OMP, a parallel implementation of the Smoothed Particle Hydrodynamics-Adaptive P3M (SPH-AP3M) code HYDRA. The code is designed primarily for conducting cosmological hydrodynamic simulations and is written in Fortran77+OpenMP. A number of optimizations for RISC processors and SMP-NUMA architectures have been implemented, the most important being a hierarchical reordering of particles within chaining cells, which greatly improves data locality and thereby removes the cache misses typically associated with linked lists. Parallel scaling is good, with a minimum parallel efficiency of 73% achieved on 32 nodes across a variety of modern SMP architectures. We give performance data in terms of the number of particle updates per second, which is a more useful performance metric than raw MFlops. A basic version of the code will be made available to the community in the near future.
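To make the locality optimization concrete, the sketch below sorts particle arrays by chaining-cell index so that members of the same cell sit contiguously in memory. It is a minimal Python illustration of the general idea only, not the Fortran77+OpenMP implementation in HYDRA_OMP; the box size, cell count and array layout are arbitrary assumptions.

```python
import numpy as np

def reorder_by_chaining_cell(pos, vel, box_size, n_cells):
    """Sort particle arrays so that members of the same chaining cell are
    contiguous in memory, improving cache locality for cell-based neighbour
    loops (illustrative sketch only, not the HYDRA_OMP code)."""
    # Map each particle to a cell index on a regular n_cells^3 mesh.
    cell_xyz = np.floor(pos / box_size * n_cells).astype(int) % n_cells
    cell_id = (cell_xyz[:, 0] * n_cells + cell_xyz[:, 1]) * n_cells + cell_xyz[:, 2]
    order = np.argsort(cell_id, kind="stable")
    # Contiguous storage replaces pointer-chasing through linked lists.
    return pos[order], vel[order], cell_id[order]

# Example: 10^5 particles in a unit box, 16^3 chaining cells (assumed values).
rng = np.random.default_rng(0)
pos = rng.random((100_000, 3))
vel = rng.standard_normal((100_000, 3))
pos, vel, cells = reorder_by_chaining_cell(pos, vel, box_size=1.0, n_cells=16)
```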

2.
The search for turbulent-like patterns in nonlinear gravitational clustering has recently advanced thanks to N-body simulations based on the cold dark matter scenario. In this work we present a computational statistical analysis of the formation of galaxy halos by gravitational collapse in N-body simulations from the Virgo Consortium data. We find that rescaled data points of the gravitational energy at different redshifts collapse onto similar patterns, well approximated by a Generalized Extreme Value (GEV) distribution. Since similar statistical behavior has been found for chaotic advection, this result is discussed in the context of non-dissipative turbulent-like behavior. From our analysis, the unstable gravity field itself behaves as a chaotically advecting flow in which the particles (galaxies) can be interpreted as turbulent tracers.
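The statistical step described above (fitting a GEV distribution to rescaled energy values) can be sketched as follows; the data here are a synthetic stand-in, not the Virgo Consortium simulation output, and the use of scipy is our assumption.

```python
import numpy as np
from scipy.stats import genextreme

# Hypothetical stand-in data: "gravitational energy" values at one redshift.
rng = np.random.default_rng(1)
energy = rng.gumbel(loc=0.0, scale=1.0, size=5000)

# Rescale to zero mean and unit variance before comparing redshifts.
x = (energy - energy.mean()) / energy.std()

# Fit a Generalized Extreme Value distribution. Note scipy's sign convention:
# its shape parameter c corresponds to -xi in the usual GEV parameterization.
c, loc, scale = genextreme.fit(x)
print(f"shape c={c:.3f}, loc={loc:.3f}, scale={scale:.3f}")
```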

3.
We present a parallel code for the analysis of sequences of light curves of magnetically active close binaries with brightness inhomogeneities on the surfaces of their component stars. The procedure allows us to search for the best values of the photometric parameters of the binary system as well as to obtain maps of the brightness inhomogeneities regularized by means of the Maximum Entropy and Tikhonov methods. The large amount of computational work is managed by means of a parallel application based on MPI. The code has been made available through the web-based portal Astrocomp (http://www.astrocomp.it) that allows a registered remote user to run it on a set of high-performance computing resources in a completely transparent manner.
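The Tikhonov branch of the regularized inversion can be illustrated with a generic least-squares sketch; the operator F, the pixel count and the regularization weight are hypothetical, and the production code described above additionally offers Maximum Entropy regularization and runs under MPI.

```python
import numpy as np

def tikhonov_map(F, d, lam):
    """Solve min_m ||F m - d||^2 + lam ||m||^2 for a brightness map m,
    given a linear light-curve operator F and observed fluxes d.
    Generic sketch of Tikhonov regularization, not the code above."""
    n = F.shape[1]
    return np.linalg.solve(F.T @ F + lam * np.eye(n), F.T @ d)

# Hypothetical example: 200 phase points, 50 surface pixels.
rng = np.random.default_rng(2)
F = rng.random((200, 50))            # assumed visibility/weight matrix
m_true = rng.random(50)              # "true" spot map
d = F @ m_true + 0.01 * rng.standard_normal(200)
m_rec = tikhonov_map(F, d, lam=0.1)  # recovered, regularized map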

4.
We have developed a new method for the fast computation of wavelengths and oscillator strengths for medium-Z atoms and ions, up to iron, at neutron star magnetic field strengths. The method is a parallelized Hartree-Fock approach in the adiabatic approximation, based on finite-element and B-spline techniques. It turns out that typically 15-20 finite elements are sufficient to calculate energies to within a relative accuracy of 10⁻⁵ in 4 or 5 iteration steps using B-splines of 6th order, with parallelization speed-ups of 20 on a 26-processor machine. Results have been obtained for the energies of the ground states and excited levels and for the transition strengths of astrophysically relevant atoms and ions in the range Z=2…26 in different ionization stages.

Program summary

Program title: HFFEM
Catalogue identifier: AECC_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AECC_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 3845
No. of bytes in distributed program, including test data, etc.: 27 989
Distribution format: tar.gz
Programming language: MPI/Fortran 95 and Python
Computer: Cluster of 1-26 HP Compaq dc5750
Operating system: Fedora 7
Has the code been vectorised or parallelized?: Yes
RAM: 1 GByte
Classification: 2.1
External routines: MPI/GFortran, LAPACK, PyLab/Matplotlib
Nature of problem: Calculations of synthetic spectra [1] of strongly magnetized neutron stars are bedevilled by the lack of data for atoms in intense magnetic fields. While the behaviour of hydrogen and helium has been investigated in detail (see, e.g., [2]), complete and reliable data for heavier elements, in particular iron, are still missing. Since neutron stars are formed by the collapse of the iron cores of massive stars, it may be assumed that their atmospheres contain an iron plasma. Our objective is to fill this gap and to provide a program which allows users to calculate, as comprehensively as possible, energies, wavelengths, and oscillator strengths of medium-Z atoms and ions up to Z=26 at neutron star magnetic field strengths. Obviously, the method for achieving this goal must be highly efficient, since the calculation of synthetic spectra may require data for many thousands or even millions of atomic transitions.
Solution method: As in previous work on the problem (cf. [3,7]) we exploit the fact that a strong magnetic field results in an approximate decoupling of the dynamics of the electrons parallel and perpendicular to the field. In this adiabatic approximation the single-particle wave functions take the form ψ_i(ρ,φ,z) = Φ_nm(ρ,φ)·P_nmν(z), where the Φ_nm(ρ,φ) are Landau wave functions, describing the (fast) motion perpendicular to the field, and the P_nmν(z) are the longitudinal wave functions, describing the (slow) bound motion along the direction of the field. The spins of the electrons are all aligned antiparallel to the magnetic field and need not be accounted for explicitly. The total N-electron wave function is constructed as a Slater determinant of the single-particle wave functions, and the unknown longitudinal wave functions are determined from the Hartree-Fock equations, which follow from inserting the total N-electron wave function into Schrödinger's variational principle for the total energy. The novel feature of our approach [8] is the use of finite-element and B-spline techniques to solve the Hartree-Fock equations for atoms in strong magnetic fields. This is accomplished through the following steps: (1) decomposition of the z-axis into finite elements with quadratically widening element borders; (2) sixth-order B-spline expansion of the single-particle wave functions on the individual finite elements; (3) formulation of the variational principle equivalent to the Hartree-Fock equations in terms of the expansion coefficients. This leads to a simple system of linear equations for the expansion coefficients, which is solved numerically and, since the direct and exchange interaction potential terms depend on the wave functions, self-consistently.
The iteration procedure is initialized by distributing the electrons over magnetic sublevels according to the level scheme of the hydrogen atom in intense magnetic fields. To speed up the calculations, the code is parallelized. The parallelization strategy is: (a) each processor calculates one or several electrons, depending on the total number of processors; (b) the single-particle wave functions are broadcast from each processor to every other processor. As the coefficient vectors in the B-spline basis are small (dimension ≈ 20-25), there is only little communication between the nodes. Typical speed-ups by a factor of 20 are obtained on a 26-processor cluster of HP Compaq dc5750 machines.
Running time: The test runs provided only require a few seconds using 2 processors.
References:
[1] K. Werner, S. Dreizler, The classical stellar atmosphere problem, in: H. Riffert, K. Werner (Eds.), Computational Astrophysics, Computational and Applied Mathematics, Elsevier, 1998.
[2] H. Ruder, G. Wunner, H. Herold, F. Geyer, Atoms in Strong Magnetic Fields, Springer, Heidelberg, 1994.
[3] P.B. Jones, Mon. Not. R. Astron. Soc. 216 (1985) 503.
[4] D. Neuhauser, K. Langanke, S.E. Koonin, Phys. Rev. A 33 (1986) 2084.
[5] M.C. Miller, D. Neuhauser, Mon. Not. R. Astron. Soc. 253 (1991) 107.
[6] M. Rajagopal, R.W. Romani, M.C. Miller, Astrophys. J. 479 (1997) 347.
[7] K. Mori, C.J. Hailey, Astrophys. J. 564 (2002) 914.
[8] M. Klews, Discretization methods for the investigation of atoms in time-dependent electric fields, and in extremely strong magnetic fields (in German), Doctoral Thesis, University of Tübingen, 2003, http://www.theo1.physik.uni-stuttgart.de/forschung/sfb382a15/klews2003.ps.gz.
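The per-electron parallelization strategy (a)-(b) above can be sketched with mpi4py; the electron count, coefficient dimension and the placeholder update routine are illustrative assumptions, not the Fortran 95 implementation of HFFEM.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

N_ELECTRONS, N_COEFF = 26, 24                 # e.g. iron; ~20-25 B-spline coefficients
my_electrons = range(rank, N_ELECTRONS, size)  # (a) each rank handles some electrons

def update_coefficients(i, all_coeffs):
    """Placeholder for the Hartree-Fock update: in the real code this solves the
    linear system for the longitudinal B-spline coefficients of electron i."""
    return np.ones(N_COEFF) / np.sqrt(N_COEFF)

all_coeffs = [np.zeros(N_COEFF) for _ in range(N_ELECTRONS)]
for it in range(5):                            # self-consistency iterations
    local = {i: update_coefficients(i, all_coeffs) for i in my_electrons}
    # (b) exchange the small coefficient vectors between all ranks
    for chunk in comm.allgather(local):
        for i, c in chunk.items():
            all_coeffs[i] = c
```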

5.
Laminar flows through channels, pipes, and between two coaxial cylinders are of significant practical interest because they appear in a wide range of industrial, environmental, and biological processes. Discrete particle modeling has increasingly been used in recent years, and in this study we examine two such methods, dissipative particle dynamics (DPD) and smoothed particle hydrodynamics (SPH), applied to (a) time-dependent plane Poiseuille flow and (b) flow between two coaxial cylinders at low Reynolds numbers. The two examples presented in this paper give insight into different features of the two discrete particle methods. It was found that both methods give results with high accuracy, but the CPU time is much larger (by a factor of order 10²-10³ in the second example) for the DPD model than for the SPH model. This difference is due to the fact that the DPD model requires many more time steps than the SPH model, since thermal fluctuations are taken into account in DPD.

6.
A program for calculating semi-classical transport coefficients is described. It is based on a smoothed Fourier interpolation of the bands. From this analytical representation we calculate the derivatives necessary for the transport distributions. The method is compared to earlier calculations, which in principle should be exact within Boltzmann theory, and very convincing agreement is found.

Program summary

Title of program: BoltzTraP
Catalogue identifier: ADXU_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADXU_v1_0
Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
Licensing provisions: none
Programming language used: Fortran 90
Computer: The program should work on any system with a F90 compiler. The code has been tested with the Intel Fortran compiler.
Operating system: Unix/Linux
RAM: up to 2 GB for low-symmetry, small-unit-cell structures
No. of lines in distributed program, including test data, etc.: 1 534 213
No. of bytes in distributed program, including test data, etc.: 27 473 227
Distribution format: tar.gz
External routines: The LAPACK and BLAS libraries are needed
Nature of problem: Analytic expansion of energy bands. Calculation of semi-classical integrals.
Solution method: Smoothed Fourier expansion of bands.
Running time: Up to 3 hours for low-symmetry, small-unit-cell structures.
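The underlying idea, expanding a band in a short Fourier series and differentiating the analytic expansion to obtain group velocities, can be sketched in one dimension as follows. BoltzTraP itself uses a smoothed (roughness-penalized) Fourier fit in three dimensions; the plain least-squares fit and the model band below are simplifying assumptions.

```python
import numpy as np

# 1D illustration only: fit a band eps(k) with a few Fourier terms over lattice
# vectors R, then differentiate the analytic expansion to get group velocities.
k = np.linspace(-np.pi, np.pi, 33, endpoint=False)   # coarse "ab initio" k-mesh
eps = -2.0 * np.cos(k) + 0.3 * np.cos(2 * k)          # hypothetical band

R = np.arange(0, 5)                                   # star of lattice vectors
A = np.cos(np.outer(k, R))                            # basis functions cos(k R)
c, *_ = np.linalg.lstsq(A, eps, rcond=None)           # expansion coefficients

kf = np.linspace(-np.pi, np.pi, 1000)                 # dense interpolation mesh
eps_fit = np.cos(np.outer(kf, R)) @ c                 # interpolated band
vel = -np.sin(np.outer(kf, R)) @ (c * R)              # analytic d(eps)/dk
```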

7.
Smoothed particle hydrodynamics (SPH) has become increasingly important during recent decades. Its meshless nature, inherent representation of convective transport and ability to simulate free surface flows make SPH particularly promising for simulations of industrial mixing devices for highly viscous fluids, which often have complex rotating geometries and partially filled regions (e.g., twin-screw extruders). However, incorporating the required geometries remains a challenge in SPH, since the most obvious and most common ways to model solid walls are based on particles (i.e., boundary particles and ghost particles), which leads to complications with arbitrarily-curved wall surfaces. To overcome this problem, we developed a systematic method for determining an adequate interaction between SPH particles and a continuous wall surface based on the underlying SPH equations. We tested our new approach by using the open-source particle simulator “LIGGGHTS” and comparing the velocity profiles to analytical solutions and to SPH simulations with boundary particles. Finally, we followed the evolution of a tracer in a twin-cam mixer during rotation, a configuration studied experimentally and numerically by several other authors, and found good agreement with our results. This supports the validity of our newly-developed wall interaction method, which constitutes a step forward in SPH simulations of complex geometries.

8.
We describe a parallel lattice-Boltzmann code for efficient simulation of fluid flow in complex geometries. The lattice-Boltzmann model and the structure of the code are discussed. The fluid solver is highly optimized and the resulting computational core is very fast. Furthermore, communication is minimized and the novel topology-aware domain decomposition technique is shown to be very effective for large systems, allowing us to tune code execution in geographically distributed cross-site simulations. The benchmarks presented indicate that very high performance can be achieved.

9.
Smoothed particle hydrodynamics (SPH) is a fully Lagrangian, meshless computational method for solving the fluid dynamics equations. In recent years it has also been employed to solve the shallow water equations (SWEs), and promising results have been obtained. However, SPH models are computationally very demanding, and the SPH-SWE models considered in this work are no exception. In this paper, Graphics Processing Units (GPUs) are explored to accelerate an SPH-SWE model for wider applications. Unlike Central Processing Units (CPUs), GPUs are highly parallelized, which makes them suitable for accelerating scientific computing algorithms like SPH. The aim is to design a GPU-based SPH model for solving the two-dimensional SWEs with variable smoothing lengths. Furthermore, a quad-tree neighbour searching method is implemented to further optimize the model performance. An idealized benchmark test and two real-world dam-break cases have been simulated to demonstrate the superior performance of the current GPU-accelerated high-performance SPH-SWE model.
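Tree-based neighbour searching within a kernel support radius is the key optimization mentioned above. The sketch below uses scipy's k-d tree purely as a CPU stand-in to show the pattern; the paper's model implements a quad-tree on the GPU with variable smoothing lengths, and the particle count and smoothing length here are arbitrary.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hierarchical neighbour search beats the naive O(N^2) pairwise test.
rng = np.random.default_rng(3)
xy = rng.random((50_000, 2))     # SPH particle positions in a unit 2D SWE domain
h = 0.01                         # uniform smoothing length (simplifying assumption)

tree = cKDTree(xy)
# Neighbours of every particle within the kernel support radius 2h.
neighbours = tree.query_ball_point(xy, r=2.0 * h)
print("mean neighbour count:", np.mean([len(n) for n in neighbours]))
```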

10.
Particle-in-cell simulations often suffer from load-imbalance on parallel machines due to the competing requirements of the field-solve and particle-push computations. We propose a new algorithm that balances the two computations independently. The grid for the field-solve computation is statically partitioned. The particles within a processor's sub-domain(s) are dynamically balanced by migrating spatially-compact groups of particles from heavily loaded processors to lightly loaded ones as needed. The algorithm has been implemented in the quicksilver electromagnetic particle-in-cell code. We provide details of the implementation and present performance results for quicksilver running models with up to a billion grid cells and particles on thousands of processors of a large distributed-memory parallel machine.
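A greatly simplified, greedy version of the balancing idea (moving particles from processors above the mean load to those below it) is sketched below; it ignores the spatial compactness of migrated groups and all other details of the actual algorithm.

```python
def plan_particle_migration(load, tol=0.05):
    """Greedy sketch: plan moves of particles from processors above the mean
    load to processors below it. 'load' is the particle count per processor;
    the real algorithm additionally migrates spatially-compact groups."""
    mean = sum(load) / len(load)
    heavy = sorted((l - mean, p) for p, l in enumerate(load) if l > mean * (1 + tol))
    light = sorted((mean - l, p) for p, l in enumerate(load) if l < mean * (1 - tol))
    moves = []                               # (source proc, dest proc, n_particles)
    while heavy and light:
        excess, src = heavy.pop()            # most overloaded processor
        deficit, dst = light.pop()           # most underloaded processor
        n = int(min(excess, deficit))
        if n > 0:
            moves.append((src, dst, n))
        if excess - n > mean * tol:
            heavy.append((excess - n, src)); heavy.sort()
        if deficit - n > mean * tol:
            light.append((deficit - n, dst)); light.sort()
    return moves

print(plan_particle_migration([120, 80, 140, 60]))   # -> [(2, 3, 40), (0, 1, 20)]
```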

11.
A new modular code called BOUT++ is presented, which simulates 3D fluid equations in curvilinear coordinates. Although aimed at simulating Edge Localised Modes (ELMs) in tokamak x-point geometry, the code is able to simulate a wide range of fluid models (magnetised and unmagnetised) involving an arbitrary number of scalar and vector fields, in a wide range of geometries. Time evolution is fully implicit, and 3rd-order WENO schemes are implemented. Benchmarks are presented for linear and non-linear problems (the Orszag-Tang vortex), showing good agreement. Performance of the code is tested by scaling with problem size and processor number, showing efficient scaling to thousands of processors.
Linear initial-value simulations of ELMs using reduced ideal MHD are presented, and the results compared to the ELITE linear MHD eigenvalue code. The resulting mode structures and growth rates are found to be in good agreement (γ_BOUT++ = 0.245 ω_A, γ_ELITE = 0.239 ω_A, with Alfvénic timescale 1/ω_A = R/V_A). To our knowledge, this is the first time dissipationless initial-value simulations of ELMs have been successfully demonstrated.

12.
13.
A massively parallel simulation code, called dHybrid, has been developed to perform global scale studies of space plasma interactions. This code is based on an explicit hybrid model; the numerical stability and parallel scalability of the code are studied. A stabilization method for the explicit algorithm, for regions of near zero density, is proposed. Three-dimensional hybrid simulations of the interaction of the solar wind with unmagnetized artificial objects are presented, with a focus on the expansion of a plasma cloud into the solar wind, which creates a diamagnetic cavity and drives the Interplanetary Magnetic Field out of the expansion region. The dynamics of this system can provide insights into other similar scenarios, such as the interaction of the solar wind with unmagnetized planets.
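The paper proposes its own stabilization for near-vacuum regions; purely as an illustration of why one is needed, the sketch below applies a simple density floor, which is a common remedy (assumed here, not necessarily dHybrid's scheme) for the divisions by the charge density that enter the hybrid field equations.

```python
import numpy as np

def electron_fluid_velocity(current_density, charge_density, n_floor=1e-3):
    """In hybrid codes the electric field involves divisions by the ion charge
    density, which blow up in near-vacuum cells. Clamping the density from
    below is one common stabilization; it is shown only as an illustration,
    not as the method proposed for dHybrid."""
    n_safe = np.maximum(charge_density, n_floor)
    return current_density / n_safe

J = np.array([0.0, 0.2, 1.0])
n = np.array([1e-9, 0.5, 1.0])      # the first cell is effectively vacuum
print(electron_fluid_velocity(J, n))
```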

14.
FLY is a parallel treecode which makes heavy use of the one-sided communication paradigm to handle the management of the tree structure. In its public version the code implements the equations for cosmological evolution, and can be run for different cosmological models. This reference guide describes the actual implementation of the algorithms of the public version of FLY, and suggests how to modify them to implement other types of equations (for instance, the Newtonian ones).

Program summary

Title of program: FLY
Catalogue identifier: ADSC
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADSC
Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
Computer for which the program is designed and others on which it has been tested: Cray T3E, Sgi Origin 3000, IBM SP
Operating systems or monitors under which the program has been tested: Unicos 2.0.5.40, Irix 6.5.14, Aix 4.3.3
Programming language used: Fortran 90, C
Memory required to execute with typical data: about 100 Mwords with 2 million particles
Number of bits in a word: 32
Number of processors used: parallel program; the user can select the number of processors (≥ 1)
Has the code been vectorized or parallelized?: parallelized
Number of bytes in distributed program, including test data, etc.: 4 615 604
Distribution format: tar gzip file
Keywords: Parallel tree N-body code for cosmological simulations
Nature of physical problem: FLY is a parallel collisionless N-body code for the calculation of the gravitational force.
Method of solution: It is based on the hierarchical oct-tree domain decomposition introduced by Barnes and Hut (1986).
Restrictions on the complexity of the program: The program uses the leapfrog integrator scheme, but this could be changed by the user.
Typical running time: 50 seconds per time step, running a 2-million-particle simulation on an Sgi Origin 3800 system with 8 processors, each having 512 Mbytes of RAM.
Unusual features of the program: FLY uses one-sided communication libraries: the SHMEM library on the Cray T3E and Sgi Origin systems, and the LAPI library on the IBM SP system.
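The leapfrog scheme quoted in the program summary can be written compactly in kick-drift-kick form; this is a generic sketch with an arbitrary softened-point-mass test, not FLY's Fortran 90 implementation.

```python
import numpy as np

def leapfrog_step(pos, vel, acc_func, dt):
    """One kick-drift-kick leapfrog step (generic sketch of the integrator
    named in the FLY program summary, not FLY's own code)."""
    acc = acc_func(pos)
    vel_half = vel + 0.5 * dt * acc                       # kick
    pos_new = pos + dt * vel_half                         # drift
    vel_new = vel_half + 0.5 * dt * acc_func(pos_new)     # kick
    return pos_new, vel_new

# Example: orbit in a softened point-mass potential (illustrative units).
def acc(pos, eps=1e-3):
    r2 = np.sum(pos**2) + eps**2
    return -pos / r2**1.5

x, v = np.array([1.0, 0.0]), np.array([0.0, 1.0])
for _ in range(1000):
    x, v = leapfrog_step(x, v, acc, dt=0.01)
```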

15.
Many two-dimensional incompressible inviscid vortex flows can be simulated very efficiently by means of the contour dynamics method. Several applications require the use of a hierarchical-element method (HEM), which is a modified version of the classical contour dynamics scheme based on the fast multipole method. The HEM can be used, for example, to study the large-scale motion of coherent structures in idealized geophysical fluid dynamics where the flow can be modelled as the motion in a thin layer of fluid in the presence of a non-uniform background rotation. Nevertheless, such simulations require a substantial computational effort, even when the HEM is used. In this article it is shown that the acceleration of contour dynamics simulations can be increased further by parallelizing the HEM algorithm. Speed-up, load balance and scalability are parallel performance features which are studied for several representative cases. The HEM has been parallelized using the OpenMP interface and is tested with up to 16 processors on an Origin 3800 CC-NUMA computer.

16.
This paper focuses on the implementation and performance analysis of a smooth particle mesh Ewald method on several parallel computers. We present the details of the algorithms and of our implementation used to optimize parallel efficiency on these machines.

17.
We report on a package of routines for the computer algebra system Maple which supports the explicit determination of the geometric quantities, field equations, equations of motion, and conserved quantities of General Relativity in the post-Newtonian approximation. The package structure is modular and allows for an easy modification by the user. The set of routines can be used to verify hand calculations or to generate the input for further numerical investigations.

Program summary

Title of the program: Procrustes
Catalogue identifier: ADYH_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/ADYH_v1_0
Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
Computers: Platforms supported by the Maple computer algebra system (the program was written under Maple 8, but has also been tested with Maple 9, 9.5, 10)
Operating systems under which the program has been tested: Linux, Unix, Windows XP
Programming language used: Maple internal language
Memory required to execute typical problem: Dependent on the problem (small ∼ a couple of MBytes, large ∼ several GBytes)
Classification: 1.5 Relativity and Gravitation, 5 Computer Algebra
No. of bits in a word: Dependent on the Maple distribution (supports 32-bit and 64-bit platforms)
No. of processors used: 1
No. of lines in distributed program, including test data, etc.: 10 881
No. of bytes in distributed program, including test data, etc.: 47 743
Distribution format: tar.gz
Nature of the physical problem: The post-Newtonian approximation is an approximation scheme frequently used in General Relativity, in which the gravitational potential is expanded into a series in inverse powers of the speed of light. Depending on the desired approximation level, the field equations and equations of motion have to be determined up to given orders in the speed of light. This usually requires large algebraic computations due to the geometrical quantities entering the field equations and equations of motion.
Method of solution: Automated computation using computer algebra techniques. The program has a modular structure and only makes use of basic features of Maple to guarantee maximum compatibility and to allow for rapid extensions/modifications by the user.
Typical running time: Dependent on the problem (small ∼ a couple of minutes, large ∼ a couple of hours).
Restrictions on the complexity of the problem: A sufficient amount of memory is the limiting factor for these calculations. The structure of the program allows one to handle large-scale problems in an iterative manner to minimize the amount of memory required.
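For orientation, the expansion that the package automates has the schematic structure below; the grouping of orders, the signature convention and the identification of the leading term are one common choice and are not taken from the Procrustes documentation.

```latex
% Schematic post-Newtonian expansion of the metric in inverse powers of c
% (conventions differ between texts):
g_{\mu\nu} = \eta_{\mu\nu}
           + c^{-2}\, h^{(2)}_{\mu\nu}
           + c^{-3}\, h^{(3)}_{\mu\nu}
           + c^{-4}\, h^{(4)}_{\mu\nu}
           + \mathcal{O}\!\left(c^{-5}\right),
\qquad
\eta_{\mu\nu} = \operatorname{diag}(-1,+1,+1,+1),
```

where, at leading order, h^{(2)}_{00} = 2U with U the Newtonian potential, and the field equations and equations of motion are truncated at the order required by the chosen approximation level.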

18.
The development of a scalable preprocessing tool is a key step in accelerating the entire computational fluid dynamics (CFD) workflow toward the exascale computing era. In this work, a parallel preprocessing tool, called ParTransgrid, is developed to translate general grid formats such as CFD General Notation System into an efficient distributed mesh data format for large-scale parallel computing. Through ParTransgrid, a flexible face-based parallel unstructured mesh data structure designed in Hierarchical Data Format can be obtained to support various cell-centered unstructured CFD solvers. The parallel preprocessing operations include parallel grid I/O, parallel mesh partition, and parallel mesh migration, which are linked together to resolve the run-time and memory-consumption bottlenecks of increasingly large grids. An inverted-index search strategy combined with a multi-master-slave communication paradigm is proposed to improve pairwise face-matching efficiency and reduce the communication overhead when constructing the distributed sparse graph in the parallel mesh partition phase. We also present a simplified owner-update rule that speeds up the migration of raw partition boundaries and the building of the shared faces/nodes communication mapping list between new sub-meshes by an order of magnitude. Experimental results show that ParTransgrid scales readily to billion-cell CFD applications, and the preparation time for parallel computing with hundreds of thousands of cores is reduced to a few minutes.
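The inverted-index idea for face matching can be sketched in a few lines: every face is keyed by its sorted node IDs, so the two cells sharing an interior face collide in the index. The serial tetrahedral example below is our own illustration; ParTransgrid performs this in parallel with the multi-master-slave communication scheme described above.

```python
from collections import defaultdict

def match_faces(cells):
    """Face matching via an inverted index: each face is keyed by its sorted
    node IDs, so the two cells sharing an interior face hash to the same entry.
    Serial sketch for tetrahedra only, not the parallel ParTransgrid scheme."""
    index = defaultdict(list)
    for cell_id, nodes in enumerate(cells):            # 4 triangular faces per tet
        a, b, c, d = nodes
        for face in ((a, b, c), (a, b, d), (a, c, d), (b, c, d)):
            index[tuple(sorted(face))].append(cell_id)
    interior = {f: cs for f, cs in index.items() if len(cs) == 2}
    boundary = {f: cs for f, cs in index.items() if len(cs) == 1}
    return interior, boundary

# Two tetrahedra sharing the face (1, 2, 3):
interior, boundary = match_faces([(0, 1, 2, 3), (1, 2, 3, 4)])
print(len(interior), "interior face(s),", len(boundary), "boundary faces")
```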

19.
We present the Fortran code SuSpect version 2.3, which calculates the supersymmetric and Higgs particle spectrum in the Minimal Supersymmetric Standard Model (MSSM). The calculation can be performed in constrained models with universal boundary conditions at high scales, such as the gravity- (mSUGRA), anomaly- (AMSB) or gauge-mediated (GMSB) supersymmetry breaking models, but also in the non-universal MSSM case with R-parity and CP conservation. Care has been taken to treat important features such as the renormalization group evolution of parameters between low and high energy scales, the consistent implementation of radiative electroweak symmetry breaking, and the calculation of the physical masses of the Higgs bosons and supersymmetric particles taking into account the dominant radiative corrections. Checks of important theoretical and experimental features, such as the absence of undesired minima, excessive fine-tuning in the electroweak symmetry breaking condition, and agreement with precision measurements, can be performed. The program is simple to use, self-contained and can easily be linked to other codes; it is rather fast and flexible, thus allowing scans of the parameter space with several possible options and choices for model assumptions and approximations.

Program summary

Title of program: SuSpect
Catalogue identifier: ADYR_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADYR_v1_0
Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
Licensing provisions: none
Programming language used: FORTRAN 77
Computer: Unix machines, PC
No. of lines in distributed program, including test data, etc.: 21 821
No. of bytes in distributed program, including test data, etc.: 249 657
Distribution format: tar.gz
Operating system: Unix (or Linux)
RAM: approximately 2500 Kbytes
Number of processors used: 1 processor
Nature of problem: SuSpect calculates the supersymmetric and Higgs particle spectrum (masses and some other relevant parameters) in the unconstrained Minimal Supersymmetric Standard Model (MSSM), as well as in constrained models (cMSSMs) such as the minimal Supergravity (mSUGRA), gauge-mediated (GMSB) and anomaly-mediated (AMSB) supersymmetry breaking scenarios. The following features and ingredients are included: renormalization group evolution between low and high energy scales, consistent implementation of radiative electroweak symmetry breaking, and calculation of the physical particle masses with radiative corrections at the one- and two-loop level.
Solution method: The main methods used in the code are: (1) an adaptive fourth-order Runge-Kutta algorithm (following a standard algorithm described in "Numerical Recipes"), used to solve numerically the set of coupled differential equations resulting from the renormalization group equations at the two-loop level of the perturbative expansion; (2) diagonalization of mass matrices; (3) some mathematical functions (Spence functions, etc.) resulting from the evaluation of one- and two-loop integrals using Feynman graph techniques for the radiative corrections to the particle masses; (4) finally, some fixed-point iterative algorithms to solve non-linear equations for some of the relevant output parameters.
Restrictions: (1) The code is limited at the moment to real input parameters. (2) It also does not include flavor non-diagonal terms, which are possible in the most general soft supersymmetry breaking Lagrangian. (3) There are some (mild) limitations on the possible range of values of the input parameters, i.e. not every arbitrary value of the input parameters is allowed; these limitations are essentially based on physical rather than algorithmic issues, and warning flags and other protections are installed to avoid as much as possible execution failure if inappropriate input values are used.
Running time: between 1 and 3 seconds depending on options, with a 1 GHz processor.
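Solution method (1) can be illustrated with a classical fixed-step RK4 integrator applied to a toy one-loop running of the gauge couplings; the starting values, the fixed step size and the restriction to one loop are simplifying assumptions, whereas SuSpect uses an adaptive scheme on the full two-loop system.

```python
import numpy as np

def rk4_step(f, y, t, h):
    """Classical fourth-order Runge-Kutta step (fixed step for brevity; the
    code described above uses an adaptive variant following Numerical Recipes)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# Toy one-loop running of the three MSSM gauge couplings,
# dg_i/dt = b_i g_i^3 / (16 pi^2) with t = ln(mu), b = (33/5, 1, -3).
b = np.array([33.0 / 5.0, 1.0, -3.0])
beta = lambda t, g: b * g**3 / (16 * np.pi**2)

g = np.array([0.46, 0.63, 1.22])     # rough starting values near M_Z (assumption)
t, h = np.log(91.0), 0.1
for _ in range(300):                  # run the scale up by a factor exp(30)
    g = rk4_step(beta, g, t, h)
    t += h
```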

20.
A scalable parallel algorithm has been designed to perform multimillion-atom molecular dynamics (MD) simulations, in which first principles-based reactive force fields (ReaxFF) describe chemical reactions. Environment-dependent bond orders associated with atomic pairs and their derivatives are reused extensively with the aid of linked-list cells to minimize the computation associated with atomic n-tuple interactions (n ≤ 4 explicitly and ≤ 6 due to chain-rule differentiation). These n-tuple computations are made modular, so that they can be reconfigured effectively with a multiple time-step integrator to further reduce the computation time. Atomic charges are updated dynamically with an electronegativity equalization method, by iteratively minimizing the electrostatic energy with the charge-neutrality constraint. The ReaxFF-MD simulation algorithm has been implemented on parallel computers based on a spatial decomposition scheme combined with distributed n-tuple data structures. The measured parallel efficiency of the parallel ReaxFF-MD algorithm is 0.998 on 131,072 IBM BlueGene/L processors for a 1.01 billion-atom RDX system.
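The electronegativity-equalization step amounts to minimizing a quadratic charge energy under a neutrality constraint, which for a small system can be written as one bordered linear solve; the bare 1/r interactions, the parameter values and the dense direct solver below are illustrative assumptions (ReaxFF uses shielded interactions and an iterative solver).

```python
import numpy as np

def eem_charges(pos, chi, hardness, q_total=0.0):
    """Minimize sum_i (chi_i q_i + 0.5 J_i q_i^2) + sum_{i<j} q_i q_j / r_ij
    subject to sum_i q_i = q_total, via a Lagrange multiplier. Bare 1/r terms
    and a dense solve are used here only for illustration."""
    n = len(chi)
    r = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    # Diagonal: atomic hardness J_i; off-diagonal: Coulomb coupling 1/r_ij.
    H = np.where(np.eye(n, dtype=bool), hardness, 1.0 / np.where(r == 0, 1.0, r))
    A = np.block([[H, np.ones((n, 1))], [np.ones((1, n)), np.zeros((1, 1))]])
    rhs = np.concatenate([-chi, [q_total]])
    return np.linalg.solve(A, rhs)[:n]       # atomic partial charges

# Hypothetical 3-atom example (parameter values are illustrative only).
pos = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.2, 0.0]])
print(eem_charges(pos, chi=np.array([5.0, 3.0, 3.0]),
                  hardness=np.array([10.0, 8.0, 8.0])))
```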
