Similar Documents
1.
Cosmological simulations of structure and galaxy formation have played a fundamental role in the study of the origin, formation, and evolution of the Universe. These studies improved enormously with the use of supercomputers and parallel systems and, more recently, grid-based systems and Linux clusters. We present the new version of the parallel tree N-body code FLY, which runs on a PC Linux cluster using the one-sided communication paradigm of MPI-2, and we report the performance obtained. FLY is included in the Computer Physics Communications Program Library. This new version was developed using the Linux cluster of CINECA, an IBM cluster with 1024 Intel Xeon Pentium IV 3.0 GHz processors. The results show that it is possible to run a 64 million particle simulation in less than 15 minutes per time step, and that the code scales well with the number of processors. This leads us to propose FLY as a code for very large N-body simulations with more than 10^9 particles at the high resolution of a pure tree code. The new version of FLY is available at the CPC Program Library, http://cpc.cs.qub.ac.uk/summaries/ADSC_v2_0.html [U. Becciani, M. Comparato, V. Antonuccio-Delogu, Comput. Phys. Comm. 174 (2006) 605].
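The tree-code idea behind FLY can be illustrated with a minimal 2D Barnes-Hut sketch in Python. This is not FLY's algorithm or its MPI-2 parallelization; G = 1, the opening angle, and all parameters are arbitrary choices for the example:

```python
import numpy as np

THETA = 0.3  # opening angle; smaller values are more accurate but slower

class Node:
    """Quadtree node holding the total mass and centre of mass of its particles."""
    def __init__(self, center, half, idx, pos, mass):
        self.center, self.half = center, half
        self.mass = mass[idx].sum()
        self.com = (pos[idx] * mass[idx, None]).sum(axis=0) / self.mass
        self.leaf = idx[0] if len(idx) == 1 else None
        self.children = []
        if len(idx) > 1:
            for sx in (False, True):
                for sy in (False, True):
                    sel = idx[((pos[idx, 0] >= center[0]) == sx)
                              & ((pos[idx, 1] >= center[1]) == sy)]
                    if len(sel):
                        c = center + (np.array([sx, sy]) - 0.5) * half
                        self.children.append(Node(c, half / 2, sel, pos, mass))

def accel(node, p, i):
    """Acceleration at particle i from this subtree (G = 1, no softening)."""
    if node.leaf == i:
        return np.zeros(2)            # skip the self-interaction
    d = node.com - p
    r = np.hypot(*d)
    if node.leaf is not None or 2 * node.half / r < THETA:
        return node.mass * d / r**3   # far enough: treat the cell as a point mass
    return sum(accel(c, p, i) for c in node.children)

rng = np.random.default_rng(0)
pos = rng.random((200, 2))
mass = rng.random(200) + 0.5
root = Node(np.array([0.5, 0.5]), 0.5, np.arange(200), pos, mass)
a_tree = np.array([accel(root, pos[i], i) for i in range(200)])
```

Replacing distant groups of particles by their centre of mass is what reduces the cost from O(N^2) to O(N log N), at an accuracy controlled by the opening angle.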

2.
Parallel Computing, 2007, 33(3): 159-173
We discuss the performance of direct summation codes used in the simulation of astrophysical stellar systems on highly distributed architectures. These codes compute the gravitational interactions among stars exactly and scale as O(N^2) with the number of particles. They can be applied to a variety of astrophysical problems, such as the evolution of star clusters, the dynamics of black holes, the formation of planetary systems, and cosmological simulations. The simulation of realistic star clusters with sufficiently high accuracy cannot be performed on a single workstation but may be possible on parallel computers or grids. We have implemented two parallel schemes for a direct N-body code and study their performance on general-purpose parallel computers and large computational grids. We present the results of timing analyses conducted on the different architectures and compare them with predictions from theoretical models. We conclude that the simulation of star clusters with up to a million particles will be possible on large distributed computers in the next decade. Simulating entire galaxies, however, will in addition require new hybrid methods to speed up the calculation.
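A minimal sketch of the direct-summation scheme described above, with a kick-drift-kick leapfrog integrator as is standard in N-body work (not the paper's code; G = 1 and the two-body test case are illustrative choices):

```python
import numpy as np

def accel(pos, mass):
    """Direct-summation gravitational acceleration: exact, O(N^2) pairwise (G = 1)."""
    a = np.zeros_like(pos)
    for i in range(len(pos)):
        d = pos - pos[i]
        r = np.linalg.norm(d, axis=1)
        r[i] = 1.0                     # dummy value to avoid 0/0 on the self term
        w = mass / r**3
        w[i] = 0.0                     # no self-interaction
        a[i] = (w[:, None] * d).sum(axis=0)
    return a

def leapfrog(pos, vel, mass, dt, steps):
    """Kick-drift-kick leapfrog: symplectic and second order."""
    a = accel(pos, mass)
    for _ in range(steps):
        vel += 0.5 * dt * a
        pos += dt * vel
        a = accel(pos, mass)
        vel += 0.5 * dt * a
    return pos, vel

def energy(pos, vel, mass):
    kin = 0.5 * (mass * (vel**2).sum(axis=1)).sum()
    pot = sum(-mass[i] * mass[j] / np.linalg.norm(pos[i] - pos[j])
              for i in range(len(pos)) for j in range(i + 1, len(pos)))
    return kin + pot

# two equal masses on a circular orbit: separation 1, speed 0.5 each, period 2*pi
pos = np.array([[0.5, 0.0], [-0.5, 0.0]])
vel = np.array([[0.0, 0.5], [0.0, -0.5]])
mass = np.array([0.5, 0.5])
e0 = energy(pos, vel, mass)
pos, vel = leapfrog(pos, vel, mass, dt=1e-3, steps=6283)
```

The O(N^2) inner loop is exactly the part that the parallel schemes in the paper distribute over processors.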

3.
We consider three algorithms for solving linear least squares problems based upon the modified Huang algorithm (MHA) in the ABS class for linear systems recently introduced by Abaffy, Broyden, and Spedicato. The first algorithm uses an explicit QR factorization of the coefficient matrix A, computed by applying MHA to the matrix A^T. The second and third algorithms are based upon two representations of the Moore-Penrose pseudoinverse constructed using MHA. The three algorithms are tested on a large set of problems and compared with the NAG code using QR factorization with Householder transformations. The comparison shows that the algorithms based on MHA are generally more accurate.
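The two solution routes compared in the paper, an explicit QR factorization and a Moore-Penrose pseudoinverse, can be sketched with standard numpy factorizations (the MHA-based variants build these objects differently, but they solve the same problem; the test matrix and noise level are arbitrary):

```python
import numpy as np

# Least squares: min ||Ax - b|| for an overdetermined system.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 4))
x_true = np.array([1.0, -2.0, 3.0, 0.5])
b = A @ x_true + 1e-3 * rng.standard_normal(50)

# Route 1: explicit QR factorization A = QR, then solve the triangular system.
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

# Route 2: Moore-Penrose pseudoinverse, x = A^+ b.
x_pinv = np.linalg.pinv(A) @ b
```

For a full-rank A both routes give the unique least-squares solution; they differ in cost and in numerical behaviour on ill-conditioned problems, which is what the paper's accuracy comparison probes.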

4.
We present a suite of Mathematica-based computer-algebra packages, termed "Kranc", which comprise a toolbox to convert certain (tensorial) systems of partial differential evolution equations to parallelized C or Fortran code for solving initial boundary value problems. Kranc can be used as a "rapid prototyping" system for physicists or mathematicians handling very complicated systems of partial differential equations, but through integration into the Cactus computational toolkit we can also produce efficient parallelized production codes. Our work is motivated by the field of numerical relativity, where Kranc is used as a research tool by the authors. In this paper we describe the design and implementation of both the Mathematica packages and the resulting code, discuss some example applications, and provide results on the performance of an example numerical code for the Einstein equations.

Program summary

Title of program: Kranc
Catalogue identifier: ADXS_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADXS_v1_0
Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
Distribution format: tar.gz
Computers for which the program is designed and others on which it has been tested: general computers that run Mathematica (for code generation) and Cactus (for numerical simulations); tested under Linux
Programming language used: Mathematica, C, Fortran 90
Memory required to execute with typical data: depends on the number of variables and grid size; the included ADM example requires 4308 KB
Has the code been vectorized or parallelized: the code is parallelized based on the Cactus framework
Number of bytes in distributed program, including test data, etc.: 1 578 142
Number of lines in distributed program, including test data, etc.: 11 711
Nature of physical problem: solution of partial differential equations in three space dimensions, formulated as an initial value problem. In particular, the program is geared towards handling very complex tensorial equations as they appear, e.g., in numerical relativity. The worked-out examples comprise the Klein-Gordon equations, the Maxwell equations, and the ADM formulation of the Einstein equations.
Method of solution: finite differencing with method-of-lines time integration; the numerical code is generated through a high-level Mathematica interface.
Restrictions on the complexity of the program: typical numerical relativity applications will contain up to several dozen evolution variables and thousands of source terms; Cactus applications have shown scaling up to several thousand processors and grid sizes exceeding 500^3.
Typical running time: depends on the number of variables and the grid size; the included ADM example takes approximately 100 seconds on a 1600 MHz Intel Pentium M processor.
Unusual features of the program: based on Mathematica and Cactus
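The discretization pattern that Kranc automates, finite differencing plus method-of-lines time integration, can be sketched by hand for the simplest of the worked examples, the massless Klein-Gordon (wave) equation in one dimension. This Python sketch is purely illustrative and is not generated by, or part of, Kranc; grid size and time step are arbitrary:

```python
import numpy as np

def rhs(state, dx):
    """Method-of-lines right-hand side for u_tt = u_xx (massless Klein-Gordon),
    written as the first-order system u_t = v, v_t = u_xx, periodic in x."""
    u, v = state
    u_xx = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2
    return np.array([v, u_xx])

def rk4_step(state, dt, dx):
    """Classical fourth-order Runge-Kutta time step."""
    k1 = rhs(state, dx)
    k2 = rhs(state + 0.5 * dt * k1, dx)
    k3 = rhs(state + 0.5 * dt * k2, dx)
    k4 = rhs(state + dt * k3, dx)
    return state + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

n = 128
x = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
dx = x[1] - x[0]
state = np.array([np.sin(x), np.zeros(n)])  # standing wave: u = cos(t) sin(x)
t, dt = 0.0, 0.01
for _ in range(100):
    state = rk4_step(state, dt, dx)
    t += dt
```

Kranc's contribution is generating this kind of right-hand side, in C or Fortran and for dozens of coupled tensorial variables, from the symbolic equations.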

5.
Modern graphics processing units (GPUs) have been widely utilized in magnetohydrodynamic (MHD) simulations in recent years. Due to the limited memory of a single GPU, distributed multi-GPU systems need to be explored for large-scale MHD simulations. However, data transfer between GPUs bottlenecks the efficiency of simulations on such systems. In this paper we propose a novel GPUDirect-MPI hybrid approach to address this problem and enhance overall performance. Our approach consists of two strategies: (1) we exploit GPUDirect 2.0 to speed up data transfers between multiple GPUs within a single node and reduce the total number of message passing interface (MPI) communications; (2) we design Compute Unified Device Architecture (CUDA) kernels, instead of using memory copies, to speed up the fragmented data exchange required by the three-dimensional (3D) decomposition. 3D decomposition is usually not preferred on distributed multi-GPU systems because of the low efficiency of this fragmented data exchange; our approach makes 3D decomposition practical on such systems, which in turn reduces the memory usage and computation time of each partition of the computational domain. Experimental results show twice the FLOPS compared with a common MPI-only 2D-decomposition implementation. The approach has been developed into an efficient implementation for MHD simulations on distributed multi-GPU systems, called the MGPU-MHD code. The code implements the GPU parallelization of a total variation diminishing (TVD) algorithm for solving the multidimensional ideal MHD equations, extending our work from single-GPU computation (Wong et al., 2011) to multiple GPUs. Numerical tests and performance measurements were conducted on the TSUBAME 2.0 supercomputer at the Tokyo Institute of Technology. Our code achieves 2 TFLOPS in double precision for a problem with 1200^3 grid points using 216 GPUs.
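The bookkeeping behind a 3D decomposition, block extents per device plus packing of strided boundary faces into contiguous buffers, can be sketched in numpy (no GPUs or MPI involved; the paper's contribution is doing the packing step in CUDA kernels rather than with memory copies):

```python
import numpy as np

def split_extents(n, parts):
    """Block-decompose n grid points into `parts` contiguous extents along one axis."""
    base, rem = divmod(n, parts)
    out, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < rem else 0)
        out.append((start, stop))
        start = stop
    return out

def pack_faces(u):
    """Copy the six boundary faces of a 3D sub-domain into contiguous buffers.
    The face slices are strided in memory, which is what makes this
    'fragmented' exchange expensive if done with plain memory copies."""
    return {
        "x-": np.ascontiguousarray(u[0]),     "x+": np.ascontiguousarray(u[-1]),
        "y-": np.ascontiguousarray(u[:, 0]),  "y+": np.ascontiguousarray(u[:, -1]),
        "z-": np.ascontiguousarray(u[:, :, 0]), "z+": np.ascontiguousarray(u[:, :, -1]),
    }

ext = split_extents(1200, 6)  # e.g. six partitions along one axis of the 1200^3 grid
u = np.arange(4 * 5 * 6, dtype=float).reshape(4, 5, 6)  # toy sub-domain
faces = pack_faces(u)
```

Each neighbour exchange then sends one contiguous buffer per face instead of many small strided pieces.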

6.
This paper considers the regional bilinear control problem for an important class of hyperbolic systems. The objective is to steer the state at time T close to a desired observation w_d only on a subregion ω of the spatial domain Ω. We prove the existence of a solution by a minimizing-sequence method. The adjoint system of the problem is introduced and used to characterize the optimal control. A numerical approach is developed and illustrated successfully by simulations.

7.
When reengineering software systems, maintainers should be able to assess and compare multiple change scenarios for a given goal, so as to choose the most pertinent one. Because they implicitly consider a single working copy, revision control systems do not scale up well to simultaneous analyses of multiple versions of a system. We designed Orion, an interactive prototyping tool for reengineering, to simulate changes and compare their impact on multiple versions of software source-code models. Our approach offers interactive simulation of changes, reuses existing assessment tools, and can hold multiple branching versions simultaneously in memory. Specifically, we devised an infrastructure that optimizes the memory usage of multiple versions of large models. This infrastructure uses an extension of the FAMIX source-code meta-model, but it is not limited to source-code analysis tools, since it can be applied to models in general. In this paper, we validate our approach by running benchmarks on the memory usage and computation time of model queries on large models. Our benchmarks show that the Orion approach scales up well in terms of memory usage, while the current implementation could be optimized to lower its computation time. We also report on two large case studies to which we applied Orion.

8.
An algorithm is proposed for generating parity-check matrices of regular low-density parity-check (LDPC) codes based on permutation matrices and Steiner triple systems S(v, 3, 2), v = 2^m − 1. Estimates of the rate, minimum distance, and girth of the obtained code constructions are presented. Simulation results are given for the obtained constructions under iterative belief-propagation (sum-product) decoding for transmission over a channel with additive white Gaussian noise and BPSK modulation.
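The block structure of such parity-check matrices can be sketched in a few lines: each block is a circulant permutation matrix, and a J × K table of blocks gives a regular matrix with column weight J and row weight K. The shift values below are arbitrary for illustration; the paper derives them from Steiner triple systems:

```python
import numpy as np

def circulant_perm(n, shift):
    """n x n circulant permutation matrix: the identity with columns cyclically shifted."""
    return np.roll(np.eye(n, dtype=int), shift, axis=1)

def block_parity_check(n, shifts):
    """Assemble a parity-check matrix from circulant permutation blocks.
    A J x K table of shifts yields a (J*n) x (K*n) regular matrix with
    column weight J and row weight K."""
    return np.block([[circulant_perm(n, s) for s in row] for row in shifts])

# illustrative shifts only (not a Steiner-triple-system construction)
H = block_parity_check(7, [[0, 1, 3], [0, 2, 6]])
```

Regularity (constant row and column weight) is what makes such codes convenient for hardware-friendly belief-propagation decoders; the choice of shifts controls the girth.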

9.
GRAFCET is an advantageous modelling language for the specification of controllers in discrete event systems. It allows a control program's specification to be structured hierarchically using enclosing steps, partial-Grafcets, and forcing orders. A method is already available for the automatic transformation of Grafcets into PLC code, but it cannot preserve the hierarchical structures due to limitations of the PLC language SFC. In this contribution, a systematic approach is described for automatically transforming Grafcets into PLC code while retaining the hierarchical structures.

10.
Simulating natural phenomena at greater accuracy results in an explosive growth of data. Large-scale simulations with particles currently involve ensembles of between 10^6 and 10^9 particles, covering 10^5-10^6 time steps. Thus, the data files produced in a single run can reach from tens of gigabytes to hundreds of terabytes. This data bank allows one to reconstruct the spatio-temporal evolution of both the particle system as a whole and each particle separately. Realistically, looking at a large data set at full resolution at all times is neither possible nor, in fact, necessary. We have developed an agglomerative clustering technique based on the concept of the mutual nearest neighbor (MNN). This procedure can easily be adapted for efficient visualization of extremely large data sets from particle simulations at various resolution levels. We present the parallel algorithm for MNN clustering and its timings on the IBM SP and SGI Origin 3800 multiprocessor systems for up to 16 million fluid particles. The high efficiency obtained is mainly due to the similarity in algorithmic structure between MNN clustering and particle methods. We show various examples drawn from MNN applications in the visualization and analysis of the order of a few hundred gigabytes of data from discrete particle simulations using dissipative particle dynamics and fluid particle models. Because data clustering is the first step in this concept-extraction procedure, the clustering procedure may also be employed in many other fields, such as data mining, earthquake events, and stellar populations in nebula clusters. Copyright © 2003 John Wiley & Sons, Ltd.
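The MNN idea can be sketched serially in a few lines: repeatedly find pairs of clusters that are each other's nearest neighbour and merge each such pair into its weighted centroid. This toy O(N^2) version is only illustrative; the paper's contribution is the parallel algorithm and its scaling to millions of particles:

```python
import numpy as np

def mutual_nn_pairs(pts):
    """Pairs (i, j), i < j, that are each other's nearest neighbour."""
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)
    return [(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i]

def mnn_agglomerate(pos, target):
    """Agglomerative clustering: merge mutual-nearest-neighbour pairs
    (each merge replaces a pair by its weighted centroid) until at most
    `target` clusters remain."""
    pts = list(pos.astype(float))
    members = [[i] for i in range(len(pos))]
    while len(pts) > target:
        merged, new_pts, new_members = set(), [], []
        for i, j in mutual_nn_pairs(np.array(pts)):
            if i in merged or j in merged:
                continue
            merged |= {i, j}
            wi, wj = len(members[i]), len(members[j])
            new_pts.append((wi * pts[i] + wj * pts[j]) / (wi + wj))
            new_members.append(members[i] + members[j])
        for k in range(len(pts)):
            if k not in merged:
                new_pts.append(pts[k])
                new_members.append(members[k])
        pts, members = new_pts, new_members
    return members

rng = np.random.default_rng(2)
blob_a = rng.normal(0.0, 0.1, (10, 2))
blob_b = rng.normal(10.0, 0.1, (10, 2))
clusters = mnn_agglomerate(np.vstack([blob_a, blob_b]), target=2)
```

Each sweep merges at least the globally closest pair, so the loop always terminates; stopping at different `target` values gives the multi-resolution views used for visualization.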

11.
The polynomial chaos (PC) method has been widely adopted as a computationally feasible approach for uncertainty quantification (UQ). Most studies to date have focused on non-stiff systems. When stiff systems are considered, implicit numerical integration requires the solution of a nonlinear system of equations at every time step. With the Galerkin approach, the size of the system state increases from n to S × n, where S is the number of PC basis functions. Solving such systems with full linear algebra causes the computational cost to increase from O(n^3) to O(S^3 n^3). The S^3-fold increase can make the computation prohibitive. This paper explores computationally efficient UQ techniques for stiff systems using the PC Galerkin, collocation, and collocation least-squares (LS) formulations. In the Galerkin approach, we propose a modification of the implicit time-stepping process that uses an approximation of the Jacobian matrix to reduce the computational cost. The numerical results show a reduced run time with no negative impact on accuracy. In the stochastic collocation formulation, we propose a least-squares approach based on collocation at a low-discrepancy point set. Numerical experiments illustrate that the collocation least-squares approach for UQ has accuracy similar to the Galerkin approach, is more efficient, and does not require any modification of the original code.
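The non-intrusive collocation least-squares idea can be sketched on a toy model: fit Hermite-chaos coefficients of y = f(ξ), ξ ~ N(0, 1), by least squares at collocation points, then read off mean and variance from the coefficients. The uniform point grid below is a stand-in for the paper's low-discrepancy set, and f is an arbitrary test function:

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermevander

def pc_ls_fit(f, xi, order):
    """Non-intrusive PC: least-squares fit of probabilists'-Hermite chaos
    coefficients from evaluations of f at collocation points xi."""
    V = hermevander(xi, order)              # V[i, k] = He_k(xi[i])
    coeffs, *_ = np.linalg.lstsq(V, f(xi), rcond=None)
    return coeffs

xi = np.linspace(-3.0, 3.0, 25)             # stand-in for a low-discrepancy set
c = pc_ls_fit(lambda z: z**2, xi, order=4)  # y = xi^2 with xi ~ N(0, 1)
mean = c[0]                                  # E[y] = c_0
var = sum(c[k]**2 * math.factorial(k) for k in range(1, len(c)))  # E[He_k^2] = k!
```

No modification of the forward model is needed, only evaluations at the collocation points, which is exactly the advantage over the intrusive Galerkin formulation noted in the abstract.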

12.
We present a general-purpose computing on graphics processing units (GPGPU) based computational program and framework for the electronic dynamics of atomic systems under intense laser fields. We present our results using the case of hydrogen; however, the code is trivially extensible to tackle problems within the single-active-electron (SAE) approximation. Building on our previous work, we introduce the first available GPGPU-based implementation of Taylor, Runge-Kutta, and Lanczos propagation methods created specifically with strong-field ab initio simulations in mind: CLTDSE. The code makes use of finite-difference methods and the OpenCL framework for GPU acceleration. The specific example system used is the classic test case, hydrogen. After introducing the standard theory and the specific quantities that are calculated, the code, including installation and usage, is discussed in depth. This is followed by some examples and a short benchmark between an 8-hardware-thread (i.e. 8 logical cores) Intel Xeon CPU and an AMD 6970 GPU, where the parallel algorithm runs 10 times faster on the GPU than on the CPU.
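The simplest of the propagation methods mentioned, a Taylor expansion of the propagator exp(-iH dt) with a finite-difference Hamiltonian, can be sketched in numpy for a free 1D wavepacket (this is an illustration of the method, not CLTDSE itself; grid, time step, and initial state are arbitrary, hbar = m = 1):

```python
import numpy as np

def apply_h(psi, dx, v):
    """H psi with a second-order finite-difference Laplacian (hbar = m = 1)."""
    lap = (np.roll(psi, -1) - 2 * psi + np.roll(psi, 1)) / dx**2
    return -0.5 * lap + v * psi

def taylor_step(psi, dt, dx, v, order=4):
    """One explicit Taylor-series step of the propagator exp(-i H dt):
    psi <- sum_k (-i H dt)^k / k! psi, truncated at `order`."""
    out, term = psi.copy(), psi.copy()
    for k in range(1, order + 1):
        term = (-1j * dt / k) * apply_h(term, dx, v)
        out = out + term
    return out

n, dx = 256, 0.1
x = dx * (np.arange(n) - n // 2)
v = np.zeros(n)                                    # free particle
psi = np.exp(-x**2 / 2 + 1j * x)                   # Gaussian with momentum k = 1
psi /= np.sqrt((np.abs(psi)**2).sum() * dx)        # normalize
for _ in range(100):
    psi = taylor_step(psi, dt=1e-3, dx=dx, v=v)
```

Each step is just repeated applications of H, which is why the scheme maps so naturally onto GPU kernels; the truncation requires dt small enough that the expansion stays stable.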

13.
We report an O(N) parallel tight-binding molecular dynamics (TBMD) simulation study of (10×10) structured carbon nanotubes (CNTs) at 300 K. We converted a sequential O(N^3) TBMD simulation program into an O(N) parallel code using the Parallel Virtual Machine (PVM) framework. The code was tested on a distributed-memory system consisting of a cluster of 8 PCs running Linux (Slackware, kernel 2.2.13). Results on speed-up, efficiency, and system size are given.

14.
15.
A code for the direct numerical simulation (DNS) of incompressible flows with one periodic direction has been developed. It provides fairly good performance on both Beowulf clusters and supercomputers. Since the code is fully explicit, from a parallel point of view the main bottleneck is the Poisson equation. To solve it, a Fourier diagonalization is applied in the periodic direction to decompose the original 3D system into a set of mutually independent 2D systems; different strategies can then be used to solve them. The previous version of the code, conceived for low-cost PC clusters with poor network performance, used a Direct Schur-complement Decomposition (DSD) algorithm for these systems. Such a method, while very efficient for PC clusters, cannot be used with an arbitrarily large number of processors and mesh sizes, mainly due to its RAM requirements. To overcome this, a new version of the solver is presented in this paper, in which the DSD algorithm is used as a preconditioner for a conjugate gradient method. Numerical experiments showing the scalability and flexibility of the method on both the MareNostrum supercomputer and a PC cluster with a conventional 100 Mbit/s network are presented and discussed. Finally, illustrative DNS results for an air-filled differentially heated cavity at Ra = 10^11 are also presented.
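The Fourier-diagonalization step can be sketched on a small grid: an FFT along the periodic direction turns the 3D Poisson problem into a set of mutually independent 2D systems, one per Fourier mode (here each 2D system is solved with a dense factorization for clarity; the paper uses DSD/preconditioned CG; boundary conditions and grid sizes are arbitrary choices):

```python
import numpy as np

def lap1d(n, h):
    """Second-order 1D Dirichlet Laplacian as a dense matrix."""
    a = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    return a / h**2

def solve_poisson(f, hx, hy, hz):
    """Poisson solve on a grid periodic in z (axis 2), Dirichlet in x and y.
    The FFT along z diagonalizes that direction, decoupling the 3D problem
    into nz independent 2D systems."""
    nx, ny, nz = f.shape
    fhat = np.fft.fft(f, axis=2)
    # eigenvalues of the periodic second-difference operator in z
    lam = (2.0 * np.cos(2.0 * np.pi * np.arange(nz) / nz) - 2.0) / hz**2
    a2 = np.kron(lap1d(nx, hx), np.eye(ny)) + np.kron(np.eye(nx), lap1d(ny, hy))
    phat = np.empty_like(fhat)
    for m in range(nz):  # each 2D system is independent (and parallelizable)
        mat = a2 + lam[m] * np.eye(nx * ny)
        phat[..., m] = np.linalg.solve(mat, fhat[..., m].ravel()).reshape(nx, ny)
    return np.fft.ifft(phat, axis=2).real
```

The independence of the per-mode systems is what lets each processor (or group of processors) work on its own set of modes.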

16.
Fireball is an ab initio technique for fast local-orbital simulations of nanotechnological, solid state, and biological systems. We have implemented a convenient interface for new users and software architects in the platform-independent Java language to access Fireball's unique and powerful capabilities. The graphical user interface can be run directly from a web server or from within a larger framework such as the Computational Science and Engineering Online (CSE-Online) environment or the Distributed Analysis of Neutron Scattering Experiments (DANSE) framework. We demonstrate its use for high-throughput electronic structure calculations and a multi-hundred-atom quantum molecular dynamics (MD) simulation.

Program summary

Program title: FireballUI
Catalogue identifier: AECF_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AECF_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 279 784
No. of bytes in distributed program, including test data, etc.: 12 836 145
Distribution format: tar.gz
Programming language: Java
Computer: PC and workstation
Operating system: The GUI will run under Windows, Mac and Linux. Executables for Mac and Linux are included in the package.
RAM: 512 MB
Word size: 32 or 64 bits
Classification: 4.14
Nature of problem: Setting up and running many simulations of the same type from the command line is a slow process, yet most research-quality codes, including the ab initio tight-binding code FIREBALL, are designed to be run from the command line. A method is needed for quickly and efficiently setting up and running a host of simulations.
Solution method: We have created a graphical user interface for use with the FIREBALL code. Once the user has created the files containing the atomic coordinates for each system to be simulated, the user can set up and start the computations of up to hundreds of simulations.
Running time: 3 to 5 minutes on a 2 GHz Pentium IV processor.

17.
A new algorithm for the symbolic computation of polynomial conserved densities for systems of nonlinear evolution equations is presented. The algorithm is implemented in Mathematica. The program condens.m automatically carries out the lengthy symbolic computations for the construction of conserved densities. The code is tested on several well-known partial differential equations from soliton theory. For systems with parameters, condens.m can be used to determine the conditions on these parameters so that a sequence of conserved densities might exist. The existence of a large number of conservation laws is a predictor of integrability of the system.

18.
Particle-in-cell (PIC) simulations with Monte Carlo collisions are used in plasma science to explore a variety of kinetic effects. One major problem is the long run time of such simulations: even on modern computer systems, PIC codes take a considerable amount of time to converge. Most of the computations can be massively parallelized, since particles behave independently of each other within one time step. Current graphics processing units (GPUs) offer an attractive means of executing the parallelized code. In this contribution we present a one-dimensional PIC code running on NVIDIA® GPUs using the CUDA environment. A distinctive feature of the code is that the size of the cells used to sort the particles with respect to their coordinates is comparable to the size of the grid cells used for discretization of the electric field; hence, we call the corresponding algorithm "fine-sorting". Implementation details and optimization of the code are discussed, and the speed-up compared to classical CPU approaches is computed.
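The sorting step at the heart of the fine-sorting idea can be sketched serially as a stable sort of particles by cell index, so that each cell's particles end up contiguous in memory (this is a CPU/numpy illustration, not the paper's CUDA implementation; parameters are arbitrary):

```python
import numpy as np

def fine_sort(x, dx, n_cells):
    """Sort particle coordinates by cell index so that all particles of one
    cell are contiguous; `starts[c]:starts[c+1]` slices out cell c. On a GPU
    this layout coalesces memory access in the deposition and gather steps."""
    cells = np.clip((x / dx).astype(np.int64), 0, n_cells - 1)
    order = np.argsort(cells, kind="stable")   # stable sort keeps in-cell order
    counts = np.bincount(cells, minlength=n_cells)
    starts = np.concatenate(([0], np.cumsum(counts)))
    return x[order], cells[order], starts

rng = np.random.default_rng(4)
x = rng.random(10_000)
xs, cs, starts = fine_sort(x, dx=1.0 / 16, n_cells=16)
```

Using sorting cells of roughly the same size as the field grid cells (the paper's distinctive choice) means the sorted order directly matches the order in which the field solver touches the data.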

19.
Existing techniques for developing large-scale complex engineering systems are predominantly software based and use the Unified Modeling Language (UML). This leads to difficulties in model transformation, analysis, validation, verification, and automatic code generation. Currently no general frameworks are available to bridge the concept-to-code gap rampant in the design and development of complex, software-intensive mechatronic systems called cyber-physical systems. To fill this gap and provide an alternative to the Object Management Group's UML/SysML/OCL combination, we propose the Bond Graph based Unified Meta-Modeling Framework (BG-UMF). BG-UMF is a practical and viable alternative that uses a novel hybrid approach based on model unification and integration. The focus is on the conceptual design and development of executable models for large systems. The viability of the framework is demonstrated through an application scenario: the conceptual design and development of a navigation and control system for a rotorcraft UAV.

20.
Bounds on the rate of disjunctive codes
A binary code is said to be a disjunctive (s, ℓ) cover-free code if it is an incidence matrix of a family of sets in which the intersection of any ℓ sets is not covered by the union of any other s sets of the family. A binary code is said to be a list-decoding disjunctive code of strength s with list size L if it is an incidence matrix of a family of sets in which the union of any s sets can cover no more than L − 1 other sets of the family. For L = ℓ = 1, both definitions coincide, and the corresponding binary code is called a disjunctive s-code. This paper is aimed at improving previously known bounds and obtaining new bounds on the rate of these codes. The most interesting of the new results is a lower bound on the rate of disjunctive (s, ℓ) cover-free codes obtained by random coding over the ensemble of binary constant-weight codes; its ratio to the best known upper bound converges, as s → ∞ with arbitrary fixed ℓ ≥ 1, to the limit 2e^−2 = 0.271... In the classical case ℓ = 1, this means that the upper bound on the rate of disjunctive s-codes constructed in 1982 by D'yachkov and Rykov is asymptotically attained up to a constant factor a, 2e^−2 ≤ a ≤ 1.
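The defining cover-free property (in the classical ℓ = 1 case) can be checked by brute force for small codes, which makes the definition concrete: no codeword may be covered by the union of any s other codewords. A minimal sketch, representing each codeword column as an integer bitmask over the rows:

```python
from itertools import combinations

def is_s_disjunct(columns, s):
    """Brute-force check of the s-disjunct (cover-free) property: the bitwise
    OR of any s codewords must not cover any other codeword. Each column is
    an integer whose set bits index the rows of the incidence matrix."""
    n = len(columns)
    for group in combinations(range(n), s):
        union = 0
        for g in group:
            union |= columns[g]
        for j in range(n):
            if j not in group and columns[j] & ~union == 0:
                return False  # codeword j is covered by the union of `group`
    return True
```

The check is exponential in s, which is precisely why the paper studies asymptotic rate bounds rather than exhaustive constructions.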
