We present a suite of algorithms for migrating Lagrangian data between processors in a parallel environment when the underlying mesh is Eulerian. The collection of algorithms applies to both uniform and adaptive meshes. The algorithms are implemented in, and distributed with, FLASH, a publicly available multiphysics simulation code. Migrating Lagrangian data on an Eulerian mesh is non-trivial because the Eulerian grid points are spatially fixed whereas Lagrangian entities move with the flow of a simulation. Thus, the movement of Lagrangian data cannot use the data migration methods associated with the Eulerian mesh. Additionally, when the mesh is adaptive, the grid resolution changes as the simulation progresses. The resulting regridding process can cause complex Lagrangian data migration. The algorithms presented in this paper describe Lagrangian data movement on a static uniform mesh and on an adaptive octree-based block-structured mesh. Some of the algorithms are general enough to be applicable to any block-structured mesh, while others exploit the meta-data and structure of PARAMESH, the adaptive mesh refinement (AMR) package used in FLASH. We also present an analysis of the algorithms' comparative performance in different parallel environments and under different flow characteristics.
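
The basic step on a static uniform mesh, deciding which processor should own a particle after it has moved, can be illustrated with a short sketch. This is not the FLASH implementation: it assumes a rectangular domain split into equal blocks, one block per rank, with ranks numbered in row-major order. The function names `owner_rank` and `sort_for_migration` are hypothetical.

```python
from collections import defaultdict


def owner_rank(pos, domain_lo, domain_hi, procs_per_dim):
    """Return the rank owning the block that contains `pos`.

    Assumption: one equal-sized block per rank, row-major rank numbering.
    """
    rank = 0
    for x, lo, hi, n in zip(pos, domain_lo, domain_hi, procs_per_dim):
        # Block index along this dimension, clamped to the domain.
        i = int((x - lo) / (hi - lo) * n)
        i = min(max(i, 0), n - 1)
        rank = rank * n + i
    return rank


def sort_for_migration(particles, my_rank, domain_lo, domain_hi, procs_per_dim):
    """Split particles into those that stay and per-destination send lists."""
    stay = []
    send = defaultdict(list)
    for p in particles:
        dest = owner_rank(p, domain_lo, domain_hi, procs_per_dim)
        if dest == my_rank:
            stay.append(p)
        else:
            send[dest].append(p)
    return stay, dict(send)
```

In a real code the send lists would then be exchanged between ranks, for example with MPI. On an adaptive mesh the owner lookup is no longer a simple index computation, which is part of what makes the adaptive case harder.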

We have performed benchmarks of two three-dimensional parallel Particle-In-Cell (PIC) codes that are similar but have quite different communication patterns on different computational Grids. The first is an electrostatic, electron-only code based on the three-dimensional skeleton PIC code; it employs an FFT Poisson solver that uses collective communication patterns. The other is the TRISTAN (TRI-dimensional STANford) code parallelized with MPI, an electromagnetic full particle code whose field solver requires only point-to-point neighbor communication patterns. We present the mpptest benchmarks on cluster-based computational Grids, where both the basic point-to-point communication patterns and the basic collective communication patterns used in these PIC codes are tested. The results of these benchmarks allow us to quantify and understand the scalability of both communication patterns on the Grids. The present results show that the parallelized TRISTAN code (without all-to-all collective communication) is more scalable than the parallelized skeleton PIC code (with all-to-all collective communication) in cluster-based computational Grid systems where communication performance is poor.

We have developed a fully parallel 2D electrostatic Particle-in-Cell/Monte-Carlo Coupled (PIC-MCC) code for capacitively coupled plasma (CCP) simulations. In this code, the grid is distributed between processors along the radial direction, and the Poisson equation is solved in parallel accordingly. We applied several numerical acceleration techniques: a parallel fast Poisson solver, particle-pushing code written in assembler, particle sorting, and so on. Theoretical analysis and numerical benchmarks show that this parallel framework has good efficiency and scalability. The framework of the code and the optimization techniques and algorithms are discussed; benchmarks and simulation results are also shown.

The present paper studies two particle management strategies for dynamically adaptive Cartesian grids by means of a particle-in-cell code. One holds the particles within the grid cells, the other within the grid vertices. The fundamental challenge for the algorithmic strategies results from the fact that particles may run through the grid without velocity constraints. To facilitate this, we rely on multiscale grid representations. They allow us to lift and drop particles between different spatial resolutions. We call this cell-based strategy particle in tree (PIT). Our second approach assigns particles to vertices describing a dual grid (PIDT) and augments the lifts and drops with multiscale linked cells. Our experiments validate the two schemes by means of an electrostatic particle-in-cell code by retrieving the dispersion relation of Langmuir waves in a thermal plasma. They reveal that different particle and grid characteristics favour different realisations. The possibility that particles can tunnel through an arbitrary number of grid cells implies that most data is exchanged between neighbouring ranks, while very little data is transferred non-locally. This constrains the scalability, as the code potentially has to realise global communication. We show that the merger of an analysed tree grammar with PIDT allows us to predict particle movements among several levels and to skip parts of this global communication a priori. It is capable of outperforming several established implementations based upon trees and/or space-filling curves.

In this work, we analyze the scalability of inexact two-level balancing domain decomposition by constraints (BDDC) preconditioners for Krylov subspace iterative solvers, when using a highly scalable asynchronous parallel implementation where fine and coarse correction computations are overlapped in time. This way, the coarse-grid problem can be fully overlapped by fine-grid computations (which are embarrassingly parallel) in a wide range of cases. Further, we consider inexact solvers to reduce the computational cost/complexity and memory consumption of coarse and local problems and boost the scalability of the solver. From our numerical experiments, we conclude that the BDDC preconditioner is quite insensitive to inexact solvers. In particular, one cycle of algebraic multigrid (AMG) is enough to attain algorithmic scalability. Further, the clear reduction of computing time and memory requirements of inexact solvers compared to sparse direct ones makes it possible to scale far beyond state-of-the-art BDDC implementations. Excellent weak scalability results have been obtained with the proposed inexact/overlapped implementation of the two-level BDDC preconditioner, up to 93,312 cores and 20 billion unknowns on JUQUEEN. Further, we have also applied the proposed setting to unstructured meshes and partitions for the pressure Poisson solver in the backward-facing step benchmark domain.

Adaptive grid refinement in Fortran (AGRIF) is a Fortran90 package for the integration of adaptive mesh refinement (AMR) features within existing finite difference codes. The package first provides model-independent Fortran90 procedures containing the different operations in an AMR process: time integration of grid hierarchy, clustering, interpolations, updates, etc. The package then creates the Fortran90 model-dependent part of the code based on an entry file written by the user. The basic idea of AGRIF is to make use of Fortran90 pointers to successively address the variables of the different grids of an AMR process. As pointers can be used exactly like other (static) variables in Fortran, most of the original code will remain unchanged.

A new method, namely, the parallel two-level hybrid (PTH) method, is developed to solve tridiagonal systems on parallel computers. PTH has two levels of parallelism. The first level is based on algorithms developed from the Sherman-Morrison modification formula, and the second level can choose different parallel tridiagonal solvers for different applications. By choosing different outer and inner solvers and by controlling its two-level partition, PTH can deliver better performance for different applications on different machine ensembles and problem sizes. In an extreme case, the two levels of parallelism can be merged into one, and PTH can be the best algorithm otherwise available. Theoretical analyses and numerical experiments indicate that PTH is significantly better than existing methods on massively parallel computers. For instance, using PTH in a fast Poisson solver results in a 2-fold speedup compared to a conventional parallel Poisson solver on a 512-node IBM machine. When only the tridiagonal solver is considered, PTH is over 10 times faster than the currently used implementation.
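
The Sherman-Morrison modification formula mentioned in this abstract can be illustrated on a small serial example. The sketch below is not the PTH algorithm. It only shows how a system (T + u vᵀ) x = b, with T tridiagonal, reduces to two tridiagonal solves with T via x = y − z (vᵀy) / (1 + vᵀz), where T y = b and T z = u. The function names and the use of the Thomas algorithm as the inner solver are assumptions made for illustration.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system T x = rhs by the Thomas algorithm.

    lower[0] and upper[-1] are not part of T and do not affect the result.
    No pivoting: assumes T is, e.g., diagonally dominant.
    """
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def sherman_morrison_solve(lower, diag, upper, u, v, rhs):
    """Solve (T + u v^T) x = rhs using two solves with the tridiagonal T."""
    y = thomas(lower, diag, upper, rhs)  # T y = rhs
    z = thomas(lower, diag, upper, u)    # T z = u
    vy = sum(vi * yi for vi, yi in zip(v, y))
    vz = sum(vi * zi for vi, zi in zip(v, z))
    factor = vy / (1.0 + vz)             # requires 1 + v^T z != 0
    return [yi - factor * zi for yi, zi in zip(y, z)]
```

The two inner solves are independent of each other, so they could in principle run concurrently.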

This work is devoted to the development of efficient parallel algorithms for the direct numerical simulation (DNS) of incompressible flows on modern supercomputers. In doing so, a Poisson equation needs to be solved at each time-step to project the velocity field onto a divergence-free space. Due to the non-local nature of its solution, this elliptic system is the part of the algorithm that is most difficult to parallelize. The Poisson solver presented here is restricted to problems with one uniform periodic direction. It is a combination of a block preconditioned Conjugate Gradient (PCG) and an FFT diagonalization. The latter decomposes the original system into a set of mutually independent 2D systems that are solved by means of the PCG algorithm. For the most ill-conditioned systems, which correspond to the lowest Fourier frequencies, the PCG is replaced by a direct Schur-complement based solver. The previous version of the Poisson solver was conceived for single-core (also dual-core) processors, and therefore the distributed memory model with message-passing interface (MPI) was used. The emergence of multi-core architectures motivated the use of a two-level hybrid MPI + OpenMP parallelization with the shared memory model on the second level. Advantages and implementation details for the additional OpenMP parallelization are presented and discussed in this paper. Numerical experiments show that, within its range of efficient scalability, the previous MPI-only parallelization is slightly outperformed by the MPI + OpenMP approach. More importantly, the hybrid parallelization has significantly extended the range of efficient scalability. Here, the solver has been successfully tested up to 12,800 CPU cores for meshes with up to 10^9 grid points. However, estimates based on the presented results show that this range can potentially be stretched up to approximately 200,000 cores. Finally, several examples of DNS simulations are briefly presented to illustrate some potential applications of the solver.

I describe a Poisson solver for the adaptive mesh magnetohydrodynamics (MHD) code NIRVANA using ADI techniques (ADI: Alternating Direction Implicit). The solver is fitted to the mesh refinement framework of the code and utilizes its special block-structured design. The key part of the method is an algorithm for the intelligent clustering of subgrids which permits the application of numerical methods based on dimensional operator splitting like ADI. Test problems show the convergence of this ansatz.

We propose a geometric multilevel solver for efficiently solving linear systems arising from particle‐based methods. To apply this method to particle systems, we construct the hierarchy, establish the correspondence between solutions at the particle and grid levels, and coarsen simulation elements taking boundary conditions into account. In addition, we propose a new solid boundary handling method to solve a pressure Poisson equation in a unified manner. We demonstrate that our method can handle general fluid simulation scenarios including two‐way fluid‐solid coupling, and the computational cost of this new solver scales nearly linearly with respect to the number of unknowns, unlike previous solvers for particle‐based methods.

The Laplace–Beltrami system of nonlinear, elliptic, partial differential equations has utility in the generation of computational grids on complex and highly curved geometry. Discretization of this system using the finite-element method accommodates unstructured grids, but generates a large, sparse, ill-conditioned system of nonlinear discrete equations. The use of the Laplace–Beltrami approach, particularly in large-scale applications, has been limited by the scalability and efficiency of solvers. This paper addresses this limitation by developing two nonlinear solvers based on the Jacobian-Free Newton–Krylov (JFNK) methodology. A key feature of these methods is that the Jacobian is not formed explicitly for use by the underlying linear solver. Iterative linear solvers such as the Generalized Minimal RESidual (GMRES) method do not technically require the stand-alone Jacobian; instead its action on a vector is approximated through two nonlinear function evaluations. The preconditioning required by GMRES is also discussed. Two different preconditioners are developed, both of which employ existing Algebraic Multigrid (AMG) methods. Further, the most efficient preconditioner, overall, for the problems considered is based on a Picard linearization. Numerical examples demonstrate that these solvers are significantly faster than a standard Newton–Krylov approach; a speedup factor of approximately 26 was obtained for the Picard preconditioner on the largest grids studied here. In addition, these JFNK solvers exhibit good algorithmic scaling with increasing grid size.
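
The statement that the Jacobian's action on a vector is approximated through two nonlinear function evaluations corresponds to the standard forward-difference formula J(u) v ≈ (F(u + εv) − F(u)) / ε. The sketch below shows that formula in isolation. It is not the solver from the paper, and both the function name and the choice of ε are assumptions (one common heuristic among several).

```python
import math
import sys


def jacobian_vector_product(F, u, v, eps=None):
    """Approximate J(u) @ v for a square system F without forming J.

    Uses two evaluations of F: one at u and one at u + eps * v.
    """
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_v == 0.0:
        return [0.0] * len(u)
    if eps is None:
        # Heuristic step size (an assumption, not taken from the paper).
        norm_u = math.sqrt(sum(x * x for x in u))
        eps = math.sqrt(sys.float_info.epsilon) * (1.0 + norm_u) / norm_v
    f_base = F(u)
    f_pert = F([ui + eps * vi for ui, vi in zip(u, v)])
    return [(fp - fb) / eps for fp, fb in zip(f_pert, f_base)]
```

A Krylov method such as GMRES only needs this product, so a routine of this kind can stand in for an explicit Jacobian matrix inside the linear solver.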

A new code, named MAP, has been written in FORTRAN for magnetohydrodynamics (MHD) simulations with adaptive mesh refinement (AMR) and Message Passing Interface (MPI) parallelization. There are several optional numerical schemes for computing the MHD part, namely, the modified MacCormack scheme (MMC), the Lax–Friedrichs scheme (LF), and the weighted essentially non-oscillatory (WENO) scheme. All of them are second-order, two-step, component-wise schemes for hyperbolic conservative equations. Total variation diminishing (TVD) limiters and approximate Riemann solvers are also provided. A high resolution can be achieved by the hierarchical block-structured AMR mesh. We use the extended generalized Lagrange multiplier (EGLM) MHD equations to reduce the non-divergence-free error produced by the scheme in the magnetic induction equation. Numerical algorithms for the non-ideal terms, e.g., the resistivity and the thermal conduction, are also included in the code. The details of the AMR and MPI algorithms are described in the paper.

Flow simulations in the inlet ducts of an electrostatic precipitator (ESP), including several turning vanes, are analysed to understand the flow pattern at the duct exit locations. The geometry of the inlet duct has been extracted from the Plant Design Manufacturing System (PDMS) and refined with turning vanes placed at several locations. The duct domain around the turning vanes is decomposed into several volumes and filled with hexahedral elements using the state-of-the-art mesh generator Hypermesh. The resulting computational grid has been used in the TASCflow solver to predict the flow pattern in the duct. Simulation for the specified conditions predicts uneven flow distribution in the ESP inlet duct. Due to large flow recirculation and turbulent losses in the duct, non-uniform averaged mass flow rates are observed at the duct exit locations. Simulation results suggest that the flow distribution in the duct could be improved through optimization by placing more turning/splitter vanes in the inlet duct. To ensure that the results obtained from TASCflow are meaningful and point in the right direction in the absence of measurement data, the simulation was benchmarked against other industry-standard commercial flow solvers. The observations made from these popular solvers confirm the findings obtained using the TASCflow solver. The analysis with multiple solvers indicates that Fluent provides quick results, while better visualization is possible with the CFX solver. The Star-CD solver, which captures the turbulent losses accurately but takes more time to converge, provides an alternative.

This article studies the performance and scalability of a geometric multigrid solver implemented within the hierarchical hybrid grids (HHG) software package on current high performance computing clusters up to nearly 300,000 cores. HHG is based on unstructured tetrahedral finite elements that are regularly refined to obtain a block‐structured computational grid. One challenge is the parallel mesh generation from an unstructured input grid that roughly approximates a human head within a 3D magnetic resonance imaging data set. This grid is then regularly refined to create the HHG grid hierarchy. As test platforms, a BlueGene/P cluster located at Jülich supercomputing center and an Intel Xeon 5650 cluster located at the local computing center in Erlangen are chosen. To estimate the quality of our implementation and to predict runtime for the multigrid solver, a detailed performance and communication model is developed and used to evaluate the measured single node performance, as well as weak and strong scaling experiments on both clusters. Thus, for a given problem size, one can predict the number of compute nodes that minimize the overall runtime of the multigrid solver. Overall, HHG scales up to the full machines, where the biggest linear system solved on Jugene had more than one trillion unknowns. Copyright © 2012 John Wiley & Sons, Ltd.

Newton-Krylov-FAC methods for problems discretized on locally refined grids (total citations: 1, self-citations: 0, citations by others: 1)
Many problems in computational science and engineering are nonlinear and time-dependent. The solutions to these problems may include spatially localized features, such as boundary layers or sharp fronts, that require very fine grids to resolve. In many cases, it is impractical or prohibitively expensive to resolve these features with a globally fine grid, especially in three dimensions. Adaptive mesh refinement (AMR) is a dynamic gridding approach that employs a fine grid only where necessary to resolve such features. Numerous AMR codes exist for solving hyperbolic problems with explicit time stepping and some classes of linear elliptic problems. Researchers have paid much less attention to the development of AMR algorithms for the implicit solution of systems of nonlinear equations. Recent efforts encompassing a variety of applications demonstrate that Newton-Krylov methods are effective when combined with multigrid preconditioners. This suggests that hierarchical methods, such as the Fast Adaptive Composite grid (FAC) method of McCormick and Thomas, can provide effective preconditioning for problems discretized on locally refined grids. In this paper, we address algorithm and implementation issues for the use of Newton-Krylov-FAC methods on structured AMR grids. In our software infrastructure, we combine nonlinear solvers from KINSOL and PETSc with the SAMRAI AMR library, and include capabilities for implicit time stepping. We have obtained convergence rates independent of the number of grid refinement levels for simple, nonlinear, Poisson-like problems. Additional efforts to employ this infrastructure in new applications are underway. Communicated by: G. Wittum

In this paper, we describe an array-based hierarchical mesh refinement capability through uniform refinement of unstructured meshes for efficient solution of PDEs using finite element methods and multigrid solvers. A multi-degree, multi-dimensional and multi-level framework is designed to generate the nested hierarchies from an initial coarse mesh that can be used for a variety of purposes, such as in multigrid solvers/preconditioners, to do solution convergence and verification studies, and to improve overall parallel efficiency by decreasing I/O bandwidth requirements (by loading smaller meshes and refining in memory). We also describe a high-order boundary reconstruction capability that can be used to project the new points after refinement using high-order approximations instead of linear projection, in order to minimize and provide more control over geometrical errors introduced by curved boundaries. The capability is developed under the parallel unstructured mesh framework “Mesh Oriented dAtaBase” (MOAB; Tautges et al., 2004). We describe the underlying data structures and algorithms to generate such hierarchies in parallel and present numerical results for computational efficiency and effect on mesh quality. We also present results to demonstrate the applicability of the developed capability to study convergence properties of different point projection schemes for various mesh hierarchies and to a multigrid finite-element solver for elliptic problems.

We present a fast high-order Poisson solver for implementation on parallel computers. The method uses deferred correction, such that high-order accuracy is obtained by solving a sequence of systems with a narrow stencil on the left-hand side. These systems are solved by a domain decomposition method. The method is direct in the sense that for any given order of accuracy, the number of arithmetic operations is fixed. Numerical experiments show that these high-order solvers easily outperform standard second-order ones. The very fast algorithm, in combination with the coarser grid allowed for by the high-order method, also makes it quite possible to compete with adaptive methods and irregular grids for problems with solutions containing widely different scales.

Norman Ramsey. Software, 1996, 26(4): 467-487
This paper presents a simple equation solver. The solver finds solutions for sets of linear equations extended with several nonlinear operators, including integer division and modulus, sign extension, and bit slicing. The solver uses a new technique called balancing, which can eliminate some nonlinear operators from a set of equations before applying Gaussian elimination. The solver's principal advantages are its simplicity and its ability to handle some nonlinear operators, including nonlinear functions of more than one variable. The solver is part of an application generator that provides encoding and decoding of machine instructions based on equational specifications. The solver is presented not as pseudocode but as a literate program, which guarantees that the code shown in the paper is the same code that is actually used. Using real code exposes more detail than using pseudocode, but literate-programming techniques help manage the detail. The detail should benefit readers who want to implement their own solvers based on the techniques presented here.

S. Peigin, B. Epstein, T. Rubin, S. Seror. The Journal of Supercomputing, 2004, 27(1): 49-68
We present a highly scalable parallelization of a high-accuracy 3D serial multiblock Navier-Stokes solver. The code solves the full Navier-Stokes equations and is capable of performing large-scale computations for practical configurations in an industrial environment. The parallelization strategy is based on the geometrical domain decomposition principle and on the concept of overlapping communication and computation. The important advantage of the strategy is that the suggested type of message-passing ensures a very high scalability of the algorithm from the network point of view because, on average, the communication work per processor does not increase as the number of processors increases. The parallel multiblock-structured Navier-Stokes solver, based on the parallel virtual machine (PVM) routines, was implemented on a 106-processor distributed-memory cluster managed by the MOSIX software package. Analysis of the results demonstrated a high level of parallel efficiency (speed up) of the computational algorithm. This allowed the execution time for large-scale computations employing 10 million grid points to be reduced from an estimated 46 days on the SGI ORIGIN 2000 computer (in the serial single-user mode) to 5–6 hours on the 106-processor cluster. Thus, the parallel multiblock full Navier-Stokes code can be successfully used for large-scale practical aerodynamic simulations of a complete aircraft on multi-million-point grids on a daily basis, as needed in industry.

Managing complex data and geometry in parallel structured AMR applications (total citations: 2, self-citations: 0, citations by others: 2)
Adaptive mesh refinement (AMR) is an increasingly important simulation methodology for many science and engineering problems. AMR has the potential to generate highly resolved simulations efficiently by dynamically refining the computational mesh near key numerical solution features. AMR requires more complex numerical algorithms and programming than uniform fixed mesh approaches. Software libraries that provide general AMR functionality can ease these burdens significantly. A major challenge for library developers is to achieve adequate flexibility to meet diverse and evolving application requirements. In this paper, we describe the design of software abstractions for general AMR data management and parallel communication operations in SAMRAI, an object-oriented C++ structured AMR (SAMR) library developed at Lawrence Livermore National Laboratory (LLNL). The SAMRAI infrastructure provides the foundation for a variety of diverse application codes at LLNL and elsewhere. We illustrate SAMRAI functionality by describing how its unique features are used in these codes which employ complex data structures and geometry. We highlight capabilities for moving and deforming meshes, coupling multiple SAMR mesh hierarchies, and immersed and embedded boundary methods for modeling complex geometrical features. We also describe how irregular data structures, such as particles and internal mesh boundaries, may be implemented using SAMRAI tools without excessive application programmer effort. This work was performed under the auspices of the US Department of Energy by University of California Lawrence Livermore National Laboratory under contract number W-7405-Eng-48 and is released under UCRL-JRNL-214559.
