Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
To complete the 2DRMP package an asymptotic program, such as FARM, is needed. The original version of FARM is designed to construct the physical R-matrix, R, from surface amplitudes contained in the H-file. However, in 2DRMP, R has already been constructed for each scattering energy during propagation. Therefore, this modified version of FARM, known as FARM_2DRMP, has been developed solely for use with 2DRMP.

New version program summary

Program title: FARM_2DRMP
Catalogue identifier: ADAZ_v1_1
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADAZ_v1_1.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 13 806
No. of bytes in distributed program, including test data, etc.: 134 462
Distribution format: tar.gz
Programming language: Fortran 95 and MPI
Computer: Tested on CRAY XT4 [1]; IBM eServer 575 [2]; Itanium II cluster [3]
Operating system: Tested on UNICOS/lc [1]; IBM AIX [2]; Red Hat Linux Enterprise AS [3]
Has the code been vectorized or parallelized?: Yes. 16 cores were used for the small test run
Classification: 2.4
External routines: BLAS, LAPACK
Does the new version supersede the previous version?: No
Nature of problem: The program solves the scattering problem in the asymptotic region of R-matrix theory where exchange is negligible.
Solution method: A radius is determined at which the wave function, calculated as a Gailitis expansion [4] with accelerated summing [5] over terms, converges. The R-matrix is propagated from the boundary of the internal region to this radius and the K-matrix calculated. Collision strengths or cross sections may be calculated.
Reasons for new version: To complete the 2DRMP package [6] an asymptotic program, such as FARM [7], is needed. The original version of FARM is designed to construct the physical R-matrix, R, from surface amplitudes contained in the H-file. However, in 2DRMP, R has already been constructed for each scattering energy during propagation, and each R is stored in one of the RmatT files described in Fig. 8 of [6]. Therefore, this modified version of FARM, known as FARM_2DRMP, has been developed solely for use with 2DRMP. Instructions on its use and corresponding test data are provided with 2DRMP [6].
Summary of revisions: FARM_2DRMP contains two codes, farm.f and farm_par.f90. The former is a serial code while the latter is a parallel F95 code that employs an MPI harness to enable the nenergy energies to be computed simultaneously across ncore cores, with each core processing either ⌊nenergy/ncore⌋ or ⌈nenergy/ncore⌉ energies. The input files, input.d and H, and the output file farm.out are as described in [7]. Both codes read R directly from RmatT.
Restrictions: FARM_2DRMP is for use solely with 2DRMP and for a specified L, S and Π combination. The energy range specified in input.d must match that specified in energies.data.
Running time: The wall clock running time for the small test run using 16 cores and performed on [3] is 9 seconds.
References:
  • [1] 
    HECToR, CRAY XT4 running UNICOS/lc, http://www.hector.ac.uk/, visited 22 July, 2009.
  • [2] 
    HPCx, IBM eServer 575 running IBM AIX, http://www.hpcx.ac.uk/, visited 22 July, 2009.
  • [3] 
    HP Cluster, Itanium II cluster running Red Hat Linux Enterprise AS, Queen's University Belfast, http://www.qub.ac.uk/directorates/InformationServices/Research/HighPerformanceComputing/Services/Hardware/HPResearch/, visited 22 July, 2009.
  • [4] 
    M. Gailitis, J. Phys. B 9 (1976) 843.
  • [5] 
    C.J. Noble, R.K. Nesbet, Comput. Phys. Comm. 33 (1984) 399.
  • [6] 
    N.S. Scott, M.P. Scott, P.G. Burke, T. Stitt, V. Faro-Maza, C. Denis, A. Maniopoulou, Comput. Phys. Comm. 180 (12) (2009) 2424–2449, this issue.
  • [7] 
    V.M. Burke, C.J. Noble, Comput. Phys. Comm. 85 (1995) 471.
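The summary of revisions above states that each core handles either ⌊nenergy/ncore⌋ or ⌈nenergy/ncore⌉ energies. The following is a minimal sketch of one way such a split can be computed. The function name `split_energies` and the contiguous block layout are assumptions made for illustration; the summary does not say how farm_par.f90 actually assigns energies to ranks.

```python
def split_energies(nenergy, ncore):
    """Return a (start, count) pair of energy indices for each core rank.

    Hypothetical helper: the first (nenergy mod ncore) ranks receive
    ceil(nenergy / ncore) energies, the remaining ranks receive the floor.
    """
    base, extra = divmod(nenergy, ncore)
    slices = []
    start = 0
    for rank in range(ncore):
        count = base + 1 if rank < extra else base
        slices.append((start, count))
        start += count
    return slices


if __name__ == "__main__":
    # Example: 50 energies over 16 cores (values chosen for illustration only).
    for rank, (start, count) in enumerate(split_energies(50, 16)):
        print(f"rank {rank:2d}: energies {start}..{start + count - 1}")
```

With 50 energies on 16 cores, two ranks receive 4 energies and the other fourteen receive 3, so no core is more than one energy ahead of any other.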

2.
3.
We discuss a program suite for simulating Quantum Chromodynamics on a 4-dimensional space–time lattice. The basic Hybrid Monte Carlo algorithm is introduced and a number of algorithmic improvements are explained. We then discuss the implementations of these concepts as well as our parallelisation strategy in the actual simulation code. Finally, we provide a user guide to compile and run the program.

Program summary

Program title: tmLQCD
Catalogue identifier: AEEH_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEH_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GNU General Public Licence (GPL)
No. of lines in distributed program, including test data, etc.: 122 768
No. of bytes in distributed program, including test data, etc.: 931 042
Distribution format: tar.gz
Programming language: C and MPI
Computer: any
Operating system: any with a standard C compiler
Has the code been vectorised or parallelised?: Yes. One or optionally any even number of processors may be used. Tested with up to 32 768 processors
RAM: no typical values available
Classification: 11.5
External routines: LAPACK [1] and LIME [2] library
Nature of problem: Quantum Chromodynamics
Solution method: Markov Chain Monte Carlo using the Hybrid Monte Carlo algorithm with mass preconditioning and multiple time scales [3]. Iterative solver for large systems of linear equations.
Restrictions: Restricted to an even number of (not necessarily mass degenerate) quark flavours in the Wilson or Wilson twisted mass formulation of lattice QCD.
Running time: Depending on the problem size, the architecture and the input parameters from a few minutes to weeks.
References:
  • [1] 
    http://www.netlib.org/lapack/.
  • [2] 
    USQCD, http://usqcd.jlab.org/usqcd-docs/c-lime/.
  • [3] 
    C. Urbach, K. Jansen, A. Shindler, U. Wenger, Comput. Phys. Commun. 174 (2006) 87, hep-lat/0506011.
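To make the basic Hybrid Monte Carlo step concrete, here is a toy sketch for a single real variable with the Gaussian action S(φ) = φ²/2. It is not taken from tmLQCD: the action, the function names, the step size and the trajectory length are all assumptions chosen for illustration, and it omits mass preconditioning and multiple time scales entirely.

```python
import math
import random


def action(phi):
    """Toy action S(phi) = phi^2 / 2 (an assumption, not lattice QCD)."""
    return 0.5 * phi * phi


def force(phi):
    """Molecular-dynamics force -dS/dphi for the toy action."""
    return -phi


def leapfrog(phi, p, eps, n_steps):
    """Reversible leapfrog integration of Hamilton's equations."""
    p += 0.5 * eps * force(phi)          # initial half step in momentum
    for step in range(n_steps):
        phi += eps * p                   # full step in the field
        if step < n_steps - 1:
            p += eps * force(phi)        # full step in momentum
    p += 0.5 * eps * force(phi)          # final half step in momentum
    return phi, p


def hmc_chain(n_traj, eps=0.1, n_steps=10, seed=1):
    """Generate n_traj samples of phi distributed as exp(-S(phi))."""
    rng = random.Random(seed)
    phi = 0.0
    samples = []
    for _ in range(n_traj):
        p = rng.gauss(0.0, 1.0)                          # momentum refresh
        h_old = 0.5 * p * p + action(phi)
        phi_new, p_new = leapfrog(phi, p, eps, n_steps)
        h_new = 0.5 * p_new * p_new + action(phi_new)
        # Metropolis accept/reject removes the integrator's step-size error.
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            phi = phi_new
        samples.append(phi)
    return samples
```

Because the accept/reject step corrects for the finite step size, the sampled variance of φ should approach 1 for this action regardless of the (stable) value of `eps`.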

4.
Non-charge-conserving current collection algorithms for electromagnetic PIC plasma simulations may cause errors in Gauss's law. These errors arise from violations of the charge continuity equation, ∇ · J = −∂ρ/∂t, which in turn cause errors in the irrotational part of E. Two techniques for reducing these errors are examined and compared: a modified Marder correction, which corrects electric fields locally and primarily affects short wavelengths, and a Boris divergence correction, which solves Poisson's equation to correct the electric fields so that Gauss's law is enforced globally. The effect of each method on the spectrum of the error is examined. Computational efficiency and accuracy of the two techniques are compared: neither method is clearly superior. Cases examined include corrections in electromagnetic relativistic beam simulations, and a hot thermal plasma. In addition, the spectral comparison provides insight into the behavior of the schemes applied.
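The claim that a Marder-type correction acts locally and mainly on short wavelengths can be illustrated with a one-dimensional sketch. Everything in it is an assumption made for illustration rather than the paper's scheme: a periodic staggered grid, units in which Gauss's law reads ∇ · E = ρ, a plain (unmodified) diffusive pass E ← E + d ∇(∇ · E − ρ), and the function names.

```python
import math


def divergence_error(E, rho, dx):
    """Gauss-law residual F_i = (E_{i+1/2} - E_{i-1/2}) / dx - rho_i (periodic)."""
    n = len(E)
    return [(E[i] - E[i - 1]) / dx - rho[i] for i in range(n)]


def marder_pass(E, rho, dx, d):
    """One diffusive pass E <- E + d * grad(F); stable for d <= dx**2 / 2."""
    n = len(E)
    F = divergence_error(E, rho, dx)
    return [E[i] + d * (F[(i + 1) % n] - F[i]) / dx for i in range(n)]


def error_ratio_after_one_pass(mode, n=32, dx=1.0):
    """Max Gauss-law error after one pass divided by the error before it."""
    rho = [0.0] * n  # no charge, so any divergence of E is pure error
    E = [math.sin(2.0 * math.pi * mode * i / n) for i in range(n)]
    before = max(abs(f) for f in divergence_error(E, rho, dx))
    E = marder_pass(E, rho, dx, d=0.25 * dx * dx)
    after = max(abs(f) for f in divergence_error(E, rho, dx))
    return after / before


if __name__ == "__main__":
    print("short wavelength (mode 12):", error_ratio_after_one_pass(12))
    print("long wavelength  (mode 1): ", error_ratio_after_one_pass(1))
```

Each pass applies a discrete diffusion step to the residual, so a Fourier mode k is multiplied by 1 − (4d/dx²) sin²(πk/n). Short-wavelength errors are strongly damped in a single pass, while long-wavelength errors are almost untouched. This is the regime in which a global Poisson solve, as in the Boris correction, becomes attractive.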

5.
BSR is a general program to calculate atomic continuum processes using the B-spline R-matrix method, including electron-atom and electron-ion scattering, and radiative processes such as bound-bound transitions, photoionization and polarizabilities. The calculations can be performed in LS-coupling or in an intermediate-coupling scheme by including terms of the Breit-Pauli Hamiltonian.

New version program summary

Title of program: BSR
Catalogue identifier: ADWY
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADWY
Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
Computers on which the program has been tested: Microway Beowulf cluster; Compaq Beowulf cluster; DEC Alpha workstation; DELL PC
Operating systems under which the new version has been tested: UNIX, Windows XP
Programming language used: FORTRAN 95
Memory required to execute with typical data: Typically 256-512 Mwords. Since all the principal dimensions are allocatable, the available memory defines the maximum complexity of the problem
No. of bits in a word: 8
No. of processors used: 1
Has the code been vectorized or parallelized?: no
No. of lines in distributed program, including test data, etc.: 69 943
No. of bytes in distributed program, including test data, etc.: 746 450
Peripherals used: scratch disk store; permanent disk store
Distribution format: tar.gz
Nature of physical problem: This program uses the R-matrix method to calculate electron-atom and electron-ion collision processes, with options to calculate radiative data, photoionization, etc. The calculations can be performed in LS-coupling or in an intermediate-coupling scheme, with options to include Breit-Pauli terms in the Hamiltonian.
Method of solution: The R-matrix method is used [P.G. Burke, K.A. Berrington, Atomic and Molecular Processes: An R-Matrix Approach, IOP Publishing, Bristol, 1993; P.G. Burke, W.D. Robb, Adv. At. Mol. Phys. 11 (1975) 143; K.A. Berrington, W.B. Eissner, P.H. Norrington, Comput. Phys. Comm. 92 (1995) 290].

6.
7.
FERM3D is a three-dimensional finite element program for the elastic scattering of a low-energy electron from a general polyatomic molecule, which is converted to a potential scattering problem. The code is based on tricubic polynomials in spherical coordinates. The electron-molecule interaction is treated as a sum of three terms: electrostatic, exchange, and polarization. The electrostatic term can be extracted directly from ab initio codes (GAUSSIAN 98 in the work described here), while the exchange term is approximated using a local density functional. A local polarization potential based on density functional theory [C. Lee, W. Yang, R.G. Parr, Phys. Rev. B 37 (1988) 785] describes the long-range attraction to the molecular target induced by the scattering electron. Photoionization calculations are also possible and are illustrated in the present work. The generality and simplicity of the approach are important in extending electron-scattering calculations to more complex targets than is possible with other methods.

Program summary

Title of program: FERM3D
Catalogue identifier: ADYL_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADYL_v1_0
Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
Computer for which the program is designed and others on which it has been tested: Intel Xeon, AMD Opteron 64 bit, Compaq Alpha
Operating systems or monitors under which the program has been tested: HP Tru64 Unix v5.1, Red Hat Linux Enterprise 3
Programming language used: Fortran 90
Memory required to execute with typical data: 900 MB (neutral CO2), 2.3 GB (ionic CO2), 1.4 GB (benzene)
No. of bits in a word: 32
No. of processors used: 1
Has the code been vectorized?: No
No. of lines in distributed program, including test data, etc.: 58 383
No. of bytes in distributed program, including test data, etc.: 561 653
Distribution format: tar.gzip file
CPC Program library subprograms used: ADDA, ACDP
Nature of physical problem: Scattering of an electron from a complex polyatomic molecular target.
Method of solution: Solution of a partial differential equation using a finite element basis, and direct sparse linear solvers.
Restrictions on the complexity of the problem: Memory constraints.
Typical running time: 2 hours.
Unusual features of the program:
  • very extensive use of memory,
  • requires installation of Lapack, Blas, a direct sparse solver library (SuperLU, freely available, or Pardiso, which requires a license, but is free of charge for academic use), and optionally the Cernlib and Arpack libraries, freely available,
  • requires input from quantum chemistry programs (Gaussian, Molpro or PC Gamess).

8.
We describe a code which utilizes partial-wave amplitudes to calculate a variety of physical quantities studied in electron-atom scattering. For elastic scattering and excitation of atoms with arbitrary angular momenta in collisions with spin-polarized and unpolarized electrons, the program can calculate angle-integrated and angle-differential cross sections, the spin polarization of scattered electrons, the spin left-right (up-down) asymmetry, generalized STU parameters, and the statistical tensors of the final atomic state, which determine polarization and correlation parameters in radiative and nonradiative decays of these states. In addition, the program transforms partial-wave scattering amplitudes into a representation of projections of the angular momenta in the natural and collision coordinate frames, thereby providing the possibility for a user to conveniently calculate any observable not explicitly included in the code. The program can be used directly as a final module after running the Belfast R-matrix codes in the Breit-Pauli mode.

9.
We discuss in this work a new software tool, named E-SpiReS (Electron Spin Resonance Simulations), aimed at the interpretation of dynamical properties of molecules in fluids from electron spin resonance (ESR) measurements. The code implements an integrated computational approach (ICA) for the calculation of relevant molecular properties that are needed in order to obtain spectral lines. The protocol encompasses information from the atomistic level (quantum mechanical) to the coarse-grained level (hydrodynamical), and evaluates ESR spectra for rigid or flexible single or multi-labeled paramagnetic molecules in isotropic and ordered phases, based on a numerical solution of a stochastic Liouville equation. E-SpiReS automatically interfaces all the computational methodologies scheduled in the ICA in a way that is completely transparent to the user, who controls the whole calculation flow via a graphical interface. Parallelized algorithms are employed in order to allow running on calculation clusters, and a Java web applet has been developed with which it is possible to work from any operating system, avoiding the problems of recompilation. E-SpiReS has been used in the study of a number of different systems, and two relevant cases are reported to underline the promising applicability of the ICA to complex systems and the importance of similar software tools in handling a laborious protocol.

Program summary

Program title: E-SpiReS
Catalogue identifier: AEEM_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEM_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GPL v2.0
No. of lines in distributed program, including test data, etc.: 311 761
No. of bytes in distributed program, including test data, etc.: 10 039 531
Distribution format: tar.gz
Programming language: C (core programs) and Java (graphical interface)
Computer: PC and Macintosh
Operating system: Unix and Windows
Has the code been vectorized or parallelized?: Yes
RAM: 2 048 000 000
Classification: 7.2
External routines: Babel-1.1, CLAPACK, BLAS, CBLAS, SPARSEBLAS, CQUADPACK, LEVMAR
Nature of problem: Ab initio simulation of cw-ESR spectra of radicals in solution
Solution method: E-SpiReS uses a hydrodynamic approach to calculate the diffusion tensor of the molecule, DFT methodologies to evaluate magnetic tensors and linear algebra techniques to solve numerically the stochastic Liouville equation to obtain an ESR spectrum.
Running time: Variable depending on the task. It ranges from seconds for small molecules in the fast motional regime to hours for big molecules in viscous and/or ordered media.

10.
11.
12.
13.
PHON: A program to calculate phonons using the small displacement method (total citations: 1; self-citations: 0; citations by others: 1)
The program phon calculates force constant matrices and phonon frequencies in crystals. From the frequencies it also calculates various thermodynamic quantities, such as the Helmholtz free energy, the entropy, the specific heat and the internal energy of the harmonic crystal. The procedure is based on the small displacement method, and can be used in combination with any program capable of calculating forces on the atoms of the crystal. In order to examine the usability of the method, I present here two examples: metallic Al and insulating MgO. The phonons of these two materials are calculated using density functional theory. The small displacement method results are compared with those obtained using the linear response method. In the case of Al the method provides accurate phonon frequencies everywhere in the Brillouin Zone (BZ). In the case of MgO the longitudinal branch of the optical phonons near the centre of the BZ is incorrectly described as degenerate with the two transverse branches, because the non-analytical part of the dynamical matrix is ignored here; however, thermodynamic properties such as the Helmholtz free energy are essentially unaffected.

Program summary

Program title: PHON
Catalogue identifier: AEDP_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDP_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 19 580
No. of bytes in distributed program, including test data, etc.: 612 193
Distribution format: tar.gz
Programming language: Fortran 90
Computer: Any Unix, Linux
Operating system: Unix
RAM: Depends on super-cell size, but usually negligible
Classification: 7.8
External routines: Subprograms ZHEEV and DSYEV (Lapack); needs BLAS. A tutorial is provided with the distribution which requires the installation of the quantum-espresso package (http://www.quantum-espresso.org)
Nature of problem: Stable crystals at low temperature can be well described by expanding the potential energy around the atomic equilibrium positions. The movements of the atoms around their equilibrium positions can then be described using harmonic theory, and are characterised by global vibrations called phonons, which can be identified by vectors in the Brillouin zone of the crystal; there are 3 phonon branches for each atom in the primitive cell. The problem is to calculate the frequencies of these phonons for any arbitrary choice of q-vector in the Brillouin zone.
Solution method: The small displacement method: each atom in the primitive cell is displaced by a small amount, and the forces induced on all the other atoms in the crystal are calculated and used to construct the force constant matrix. Supercells of ∼100 atoms are usually large enough to describe the force constant matrix up to the range where its elements have fallen to negligibly small values. The force constant matrix is then used to compute the dynamical matrix at any chosen q-vector in the Brillouin zone, and the diagonalisation of the dynamical matrix provides the squares of the phonon frequencies. The PHON code needs external programs to calculate these forces, and it can be used with any program capable of calculating forces in crystals. The most useful applications are obtained with codes based on density functional theory, but there is no restriction on what can be used.
Running time: Negligible, typically a few seconds (or at most a few minutes) on a PC. It can take longer if very dense meshes of q-points are needed, for example, to compute very accurate phonon density of states.
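The small displacement workflow described above can be sketched on the simplest possible system, a periodic one-dimensional chain of identical atoms joined by nearest-neighbour springs. This is an illustrative toy, not PHON's code. The function `harmonic_chain_forces` is a hypothetical stand-in for the external force program, and the spring constant, mass, lattice spacing and supercell size are assumed values.

```python
import cmath
import math


def harmonic_chain_forces(displacements, k_spring):
    """Hypothetical 'external force code': periodic chain of identical springs."""
    n = len(displacements)
    u = displacements
    return [k_spring * (u[(i + 1) % n] - 2.0 * u[i] + u[i - 1]) for i in range(n)]


def force_constants(n_cells, k_spring, delta=1e-3):
    """Displace atom 0 by delta and read off Phi(R) = -F_R / delta."""
    u = [0.0] * n_cells
    u[0] = delta
    forces = harmonic_chain_forces(u, k_spring)
    return [-f / delta for f in forces]


def phonon_frequency(phi, q, mass, a=1.0):
    """Frequency from the 1x1 dynamical matrix D(q) = sum_R Phi(R) e^{iqRa} / m."""
    n = len(phi)
    d = 0.0 + 0.0j
    for r, phi_r in enumerate(phi):
        r_image = r if r <= n // 2 else r - n   # nearest periodic image
        d += phi_r * cmath.exp(1j * q * r_image * a)
    d /= mass
    return math.sqrt(max(d.real, 0.0))          # D(q) is omega squared


if __name__ == "__main__":
    phi = force_constants(n_cells=8, k_spring=1.0)
    for q in (0.0, math.pi / 2.0, math.pi):
        print(f"q = {q:.3f}  omega = {phonon_frequency(phi, q, mass=1.0):.6f}")
```

For this chain the result can be checked against the textbook dispersion ω(q) = 2 √(K/m) |sin(qa/2)|. In a real crystal the dynamical matrix is 3N × 3N for N atoms in the primitive cell and has to be diagonalised, which is where Lapack routines such as ZHEEV enter.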

14.
The Plato package allows both orthogonal and non-orthogonal tight-binding as well as density functional theory (DFT) calculations to be performed within a single framework. The package also provides extensive tools for analysing the results of simulations as well as a number of tools for creating input files. The code is based upon the ideas first discussed in Sankey and Niklewski (1989) [1] with extensions to allow high-quality DFT calculations to be performed. DFT calculations can utilise either the local density approximation or the generalised gradient approximation. Basis sets ranging from a minimal basis to ones containing multiple radial functions per angular momentum and polarisation functions can be used. Illustrations of how the package has been employed are given along with instructions for its utilisation.

Program summary

Program title: Plato
Catalogue identifier: AEFC_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEFC_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 219 974
No. of bytes in distributed program, including test data, etc.: 1 821 493
Distribution format: tar.gz
Programming language: C/MPI and PERL
Computer: Apple Macintosh, PC, Unix machines
Operating system: Unix, Linux and Mac OS X
Has the code been vectorised or parallelised?: Yes, up to 256 processors tested
RAM: Up to 2 Gbytes per processor
Classification: 7.3
External routines: LAPACK, BLAS and optionally ScaLAPACK, BLACS, PBLAS, FFTW
Nature of problem: Density functional theory study of electronic structure and total energies of molecules, crystals and surfaces.
Solution method: Localised orbital based density functional theory.
Restrictions: Tight-binding and density functional theory only, no exact exchange.
Unusual features: Both atom centred and uniform meshes available. Can deal with arbitrary angular momenta for orbitals, whilst still retaining Slater–Koster tables for accuracy.
Running time: Test cases will run in a few minutes, large calculations may run for several days.

15.
SUSY 2
This package deals with supersymmetric functions and with the algebra of supersymmetric operators in extended N = 2 as well as in nonextended N = 1 supersymmetry. It allows us to construct a realization of the SuSy algebra of differential operators, to compute the gradients of given SuSy Hamiltonians, and to obtain the SuSy version of soliton equations using the SuSy Lax approach. Many additional procedures encountered in the SuSy soliton approach are also included, for example the conjugation of a given SuSy operator, the computation of a general form of SuSy Hamiltonians (up to SuSy divergence equivalence), and the checking of the validity of the Jacobi identity for some SuSy Hamiltonian operators.

16.
The software described in this paper uses the Maple algebraic computing environment to calculate an analytic form for the matrix element of the plane-wave Born approximation of the electron-impact ionisation of an atomic orbital, with arbitrary orbital and angular momentum quantum numbers. The atomic orbitals are approximated by Hartree-Fock Slater functions, and the ejected electron is modelled by a hydrogenic Coulomb wave, made orthogonal to all occupied orbitals of the target atom. Clenshaw-Curtis integration techniques are then used to calculate the total ionisation cross-section. For improved performance, the numerical integrations are performed using FORTRAN by automatically converting the analytic matrix element for each orbital into a FORTRAN subroutine. The results compare favourably with experimental data for a wide range of elements, including the transition metals, with excellent convergence at high energies.

Program summary

Title of program: BIX
Catalogue identifier: ADRZ
Program summary URL: http://www.cpc.cs.qub.ac.uk/cpc/summaries/ADRZ
Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
Computers: Platform independent
Operating systems: Tested on DEC Alpha Unix, Windows NT 4.0 and Windows XP Professional Edition
Programming language used: Maple V Release 5.1 and FORTRAN 90
Memory required: 256 MB
No. of processors used: 1
No. of bytes in distributed program, including test data, etc.: 61754
Distribution format: tar gzip file
Keywords: Born approximation, electron-impact ionisation cross-section, Maple, Hartree-Fock
Nature of physical problem: Calculates the total electron impact ionisation cross-section for neutral and ionised atomic species using the first-Born approximation. The scattered electron is modelled by a plane wave, and the ejected electron is modelled by a hydrogenic Coulomb wave, which is made orthogonal to all occupied atomic orbitals, and the atomic orbitals are approximated by Hartree-Fock Slater functions.
Method of solution: An analytic form of the matrix element is evaluated using the Maple algebraic computing software. The total ionisation cross-section is then calculated using a three-dimensional Clenshaw-Curtis numerical integration algorithm.
Restrictions on the complexity of the problem: There is no theoretical limit on the quantum state of the target orbital that can be solved with this methodology, subject to the availability of Hartree-Fock coefficients. However, computing resource limitations will place a practical limit of, approximately, n ≤ 7 and l ≤ 4. The precision of results close to the ionisation threshold of larger atoms (< 1 eV for Z > 48) is limited to ≈5%.
Typical running time: 5 to 40 minutes for initial calculation for an atomic orbital, then 5 to 300 seconds for subsequent energies of the same orbital.
Unusual features of the program: To reduce calculation time, FORTRAN source code is generated and compiled automatically by the Maple procedures, based upon the analytic form of the matrix element. Numerical evaluation is then passed to the FORTRAN executable and the results are retrieved automatically.
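As a rough illustration of the Clenshaw-Curtis technique named in the method of solution, the sketch below builds the one-dimensional rule from its standard closed-form weights at the Chebyshev extrema. BIX itself uses a three-dimensional algorithm implemented in FORTRAN. The one-dimensional restriction, the function names and the test integrand here are assumptions made for illustration.

```python
import math


def clenshaw_curtis(n):
    """Nodes and weights of the (n + 1)-point rule on [-1, 1], n even."""
    if n < 2 or n % 2:
        raise ValueError("n must be an even integer >= 2")
    nodes, weights = [], []
    for j in range(n + 1):
        theta = j * math.pi / n
        s = 0.0
        for k in range(1, n // 2 + 1):
            b = 1.0 if k == n // 2 else 2.0
            s += b / (4 * k * k - 1) * math.cos(2 * k * theta)
        c = 1.0 if j in (0, n) else 2.0      # end points carry half weight
        nodes.append(math.cos(theta))        # Chebyshev extrema
        weights.append(c / n * (1.0 - s))
    return nodes, weights


def integrate(f, a, b, n=16):
    """Approximate the integral of f over [a, b] with the (n + 1)-point rule."""
    nodes, weights = clenshaw_curtis(n)
    half, mid = 0.5 * (b - a), 0.5 * (a + b)
    return half * sum(w * f(mid + half * x) for x, w in zip(nodes, weights))


if __name__ == "__main__":
    approx = integrate(math.exp, -1.0, 1.0)
    print(approx, math.e - 1.0 / math.e)
```

A multi-dimensional rule can be formed as a tensor product of such one-dimensional rules, which is one common way to extend it. Whether BIX does exactly this is not stated in the summary.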

17.
On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here can apply not only to conventional processors but also to other technologies such as Field Programmable Gate Arrays (FPGA), Graphical Processing Units (GPU), and the STI Cell BE processor. Results on modern processor architectures and the STI Cell BE are presented.

Program summary

Program title: ITER-REF
Catalogue identifier: AECO_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AECO_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 7211
No. of bytes in distributed program, including test data, etc.: 41 862
Distribution format: tar.gz
Programming language: FORTRAN 77
Computer: desktop, server
Operating system: Unix/Linux
RAM: 512 Mbytes
Classification: 4.8
External routines: BLAS (optional)
Nature of problem: On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution.
Solution method: Mixed precision algorithms stem from the observation that, in many cases, a single precision solution of a problem can be refined to the point where double precision accuracy is achieved. A common approach to the solution of linear systems, either dense or sparse, is to perform the LU factorization of the coefficient matrix using Gaussian elimination. First, the coefficient matrix A is factored into the product of a lower triangular matrix L and an upper triangular matrix U. Partial row pivoting is in general used to improve numerical stability resulting in a factorization PA=LU, where P is a permutation matrix. The solution for the system is achieved by first solving Ly=Pb (forward substitution) and then solving Ux=y (backward substitution). Due to round-off errors, the computed solution, x, carries a numerical error magnified by the condition number of the coefficient matrix A. In order to improve the computed solution, an iterative process can be applied, which produces a correction to the computed solution at each iteration, which then yields the method that is commonly known as the iterative refinement algorithm. Provided that the system is not too ill-conditioned, the algorithm produces a solution correct to the working precision.
Running time: seconds/minutes
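The solution method above can be sketched in a few dozen lines. This is an illustrative toy rather than ITER-REF's FORTRAN 77 code. Single precision is emulated by rounding every intermediate result to binary32 with the hypothetical helper `f32`, the matrix is a small dense list of lists, and the fixed iteration count stands in for a proper convergence test.

```python
import struct


def f32(x):
    """Round a double to the nearest single-precision (binary32) value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_f32(a):
    """PA = LU with partial pivoting, all arithmetic rounded to single precision."""
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - f32(lu[i][k] * lu[k][j]))
    return lu, perm


def lu_solve_f32(lu, perm, b):
    """Solve Ly = Pb, then Ux = y, in emulated single precision."""
    n = len(lu)
    y = [f32(b[perm[i]]) for i in range(n)]
    for i in range(n):                       # forward substitution (unit L)
        for j in range(i):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
    for i in reversed(range(n)):             # backward substitution
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
        y[i] = f32(y[i] / lu[i][i])
    return y


def solve_mixed(a, b, iterations=5):
    """Single-precision LU solve refined with double-precision residuals."""
    n = len(a)
    lu, perm = lu_factor_f32(a)              # expensive step, low precision
    x = lu_solve_f32(lu, perm, b)
    for _ in range(iterations):
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        dx = lu_solve_f32(lu, perm, r)       # cheap correction solve
        x = [x[i] + dx[i] for i in range(n)]
    return x
```

The factorization, which dominates the cost, is done once in low precision. Only the residual and the update of x are carried out in double precision, and for a well-conditioned matrix these are what restore double-precision accuracy.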

18.
This paper describes an algorithm and a computer program which solve numerically (virtually exactly) the equations of the restricted open-shell Hartree-Fock and Hartree-Fock-Slater models for diatomic molecules.

19.
We present HONEI, an open-source collection of libraries offering a hardware oriented approach to numerical calculations. HONEI abstracts the hardware, and applications written on top of HONEI can be executed on a wide range of computer architectures such as CPUs, GPUs and the Cell processor. We demonstrate the flexibility and performance of our approach with two test applications, a Finite Element multigrid solver for the Poisson problem and a robust and fast simulation of shallow water waves. By linking against HONEI's libraries, we achieve a two-fold speedup over straightforward C++ code using HONEI's SSE backend, and an additional 3–4 and 4–16 times faster execution on the Cell and a GPU, respectively. A second important aspect of our approach is that the full performance capabilities of the hardware under consideration can be exploited by adding optimised application-specific operations to the HONEI libraries. HONEI provides all necessary infrastructure for development and evaluation of such kernels, significantly simplifying their development.

Program summary

Program title: HONEI
Catalogue identifier: AEDW_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDW_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GPLv2
No. of lines in distributed program, including test data, etc.: 216 180
No. of bytes in distributed program, including test data, etc.: 1 270 140
Distribution format: tar.gz
Programming language: C++
Computer: x86, x86_64, NVIDIA CUDA GPUs, Cell blades and PlayStation 3
Operating system: Linux
RAM: at least 500 MB free
Classification: 4.8, 4.3, 6.1
External routines: SSE: none; [1] for GPU, [2] for Cell backend
Nature of problem: Computational science in general and numerical simulation in particular have reached a turning point. The revolution developers are facing is not primarily driven by a change in (problem-specific) methodology, but rather by the fundamental paradigm shift of the underlying hardware towards heterogeneity and parallelism. This is particularly relevant for data-intensive problems stemming from discretisations with local support, such as finite differences, volumes and elements.
Solution method: To address these issues, we present a hardware aware collection of libraries combining the advantages of modern software techniques and hardware oriented programming. Applications built on top of these libraries can be configured trivially to execute on CPUs, GPUs or the Cell processor. In order to evaluate the performance and accuracy of our approach, we provide two domain specific applications: a multigrid solver for the Poisson problem and a fully explicit solver for 2D shallow water equations.
Restrictions: HONEI is actively being developed, and its feature list is continuously expanded. Not all combinations of operations and architectures might be supported in earlier versions of the code. Obtaining snapshots from http://www.honei.org is recommended.
Unusual features: The considered applications as well as all library operations can be run on NVIDIA GPUs and the Cell BE.
Running time: Depends on the application and the input sizes. The Poisson solver executes in a few seconds, while the SWE solver requires up to 5 minutes for large spatial discretisations or small timesteps.
References:
  • [1] 
    http://www.nvidia.com/cuda.
  • [2] 
    http://www.ibm.com/developerworks/power/cell.

20.