Similar literature
 20 similar documents found; search took 31 ms
1.
A three-dimensional parallel unstructured non-nested multigrid solver for unsteady incompressible viscous flow is developed and validated. The finite-volume Navier–Stokes solver is based on the artificial compressibility approach, with a high-resolution method-of-characteristics-based scheme for handling convection terms. The unsteady flow is calculated with a matrix-free implicit dual time stepping scheme. The multigrid solver is parallelized by a multigrid domain decomposition approach (MG-DD), using the single program multiple data (SPMD) and multiple instruction multiple data (MIMD) programming paradigms. Two parallelization strategies are proposed in this work: the first is a one-level strategy using the geometric domain decomposition technique alone; the second is a two-level strategy that combines geometric domain decomposition with data decomposition. The Message-Passing Interface (MPI) and the OpenMP standard are used to communicate data between processors and to decompose loop iteration arrays, respectively. The parallel multigrid code is used to simulate both steady and unsteady incompressible viscous flows over a circular cylinder and in a lid-driven cavity. A maximum speedup of 22.5 was achieved on 32 processors, for instance for the lid-driven cavity flow at Re = 1000. The results agree well with numerical solutions obtained by other researchers as well as with experimental measurements. A detailed study of the time step size and the number of pseudo-sub-iterations per time step required for simulating unsteady flow is presented in this paper.
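For orientation, the speedup quoted in this abstract can be converted into a parallel efficiency. The following is a minimal sketch, assuming the usual definition (efficiency = speedup divided by processor count); the function name is illustrative and does not come from the paper.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Return speedup divided by processor count (the ideal value is 1.0)."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# Figures quoted in the abstract: a speedup of 22.5 on 32 processors.
efficiency = parallel_efficiency(22.5, 32)
print(f"{efficiency:.3f}")  # prints 0.703
```

Under that definition, the reported run uses roughly 70% of the ideal capacity of the 32 processors.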

2.
We describe an implementation to solve Poisson's equation for an isolated system on a unigrid mesh using FFTs. The method solves the equation globally on mesh blocks distributed across multiple processes on a distributed-memory parallel computer. Test results to demonstrate the convergence and scaling properties of the implementation are presented. The solver is offered to interested users as the library PSPFFT.

Program summary

Program title: PSPFFT
Catalogue identifier: AEJK_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEJK_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 110 243
No. of bytes in distributed program, including test data, etc.: 16 332 181
Distribution format: tar.gz
Programming language: Fortran 95
Computer: Any architecture with a Fortran 95 compiler, distributed memory clusters
Operating system: Linux, Unix
Has the code been vectorized or parallelized?: Yes, using MPI. An arbitrary number of processors may be used (subject to some constraints). The program has been tested on from 1 up to ∼13 000 processors.
RAM: Depends on the problem size, approximately 170 MBytes for 48^3 cells per process.
Classification: 4.3, 6.5
External routines: MPI (http://www.mcs.anl.gov/mpi/), FFTW (http://www.fftw.org), Silo (https://wci.llnl.gov/codes/silo/) (only necessary for running test problem).
Nature of problem: Solving Poisson's equation globally on unigrid mesh distributed across multiple processes on distributed memory system.
Solution method: Numerical solution using multidimensional discrete Fourier Transform in a parallel Fortran 95 code.
Unusual features: This code can be compiled as a library to be readily linked and used as a blackbox Poisson solver with other codes.
Running time: Depends on the size of the problem, but typically less than 1 second per solve.
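The idea of solving Poisson's equation with a Fourier transform can be illustrated in one dimension. The sketch below is not PSPFFT and makes simplifying assumptions: it is serial, uses periodic boundaries rather than the isolated-system boundaries the abstract describes, and uses a naive O(N^2) transform instead of FFTW. All function names are hypothetical.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform; the inverse is scaled by 1/N."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = 0j
        for j, v in enumerate(values):
            total += v * cmath.exp(sign * 2j * math.pi * j * k / n)
        out.append(total / n if inverse else total)
    return out


def solve_periodic_poisson_1d(f):
    """Solve u'' = f on [0, 2*pi) with periodic boundaries and zero-mean u."""
    n = len(f)
    f_hat = dft(f)
    u_hat = [0j] * n  # the k = 0 (mean) mode is fixed to zero
    for k in range(1, n):
        wavenumber = k if k <= n // 2 else k - n  # signed integer wavenumber
        u_hat[k] = -f_hat[k] / (wavenumber ** 2)  # u'' = f  =>  -k^2 u_hat = f_hat
    return [value.real for value in dft(u_hat, inverse=True)]


n = 32
x = [2 * math.pi * j / n for j in range(n)]
f = [-math.sin(xj) for xj in x]  # chosen so that the exact solution is u = sin(x)
u = solve_periodic_poisson_1d(f)
max_error = max(abs(uj - math.sin(xj)) for uj, xj in zip(u, x))
```

Dividing each Fourier coefficient by the squared wavenumber turns the differential equation into an algebraic one, which is why the transform dominates the cost of such a solver.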

3.
We propose a model for describing and predicting the parallel performance of a broad class of parallel numerical software on distributed memory architectures. The purpose of this model is to allow reliable predictions to be made for the performance of the software on large numbers of processors of a given parallel system, by only benchmarking the code on small numbers of processors. Having described the methods used, and emphasized the simplicity of their implementation, the approach is tested on a range of engineering software applications that are built upon the use of multigrid algorithms. Despite their simplicity, the models are demonstrated to provide both accurate and robust predictions across a range of different parallel architectures, partitioning strategies and multigrid codes. In particular, the effectiveness of the predictive methodology is shown for a practical engineering software implementation of an elastohydrodynamic lubrication solver.

4.
The development and validation of a parallel unstructured tetrahedral non-nested multigrid (MG) method for simulation of unsteady 3D incompressible viscous flow is presented. The Navier-Stokes solver is based on the artificial compressibility method (ACM) and a higher-order characteristics-based finite-volume scheme on unstructured MG. Unsteady flow is calculated with an implicit dual time stepping scheme. The parallelization of the solver is achieved by an MG domain decomposition approach (MG-DD), using the Single Program Multiple Data (SPMD) programming paradigm. The Message-Passing Interface (MPI) library is used for communication of data, and loop arrays are decomposed using the OpenMP standard. The parallel codes using single grid and MG are used to simulate steady and unsteady incompressible viscous flows for a 3D lid-driven cavity flow for validation and performance evaluation purposes. The speedups and efficiencies obtained by both the parallel single grid and MG solvers are reasonably good for all test cases, using up to 32 processors on the SGI Origin 3400. The parallel results obtained agree well with those of serial solvers and with numerical solutions obtained by other researchers, as well as experimental measurements.

5.
In this paper, we present a class of preconditioning methods for a parallel solution of the three-dimensional Richards equation. The preconditioning methods Jacobi scaling, block-Jacobi, incomplete lower–upper, incomplete Cholesky and algebraic multigrid were applied in combination with a parallel conjugate gradient solver and tested for robustness and convergence using two model scenarios. The first scenario was an infiltration into initially dry, sandy soil discretised in 500,000 nodes. The second scenario comprised spatially distributed soil properties using 275,706 numerical nodes and atmospheric boundary conditions. Computational results showed a high efficiency of the nonlinear parallel solution procedure for both scenarios using up to 64 processors. Using 32 processors for the first scenario reduced the wall clock time to slightly more than 1% of the single-processor run. For the second scenario, the use of 64 processors reduced the wall clock time to slightly more than 20% of the 8-processor wall clock time. The difference in the efficiency of the various preconditioning methods is moderate but not negligible. The use of the multigrid preconditioning algorithm is recommended, since on average it performed best for both scenarios.
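Of the preconditioners listed in this abstract, Jacobi scaling is the simplest to write down. The sketch below shows a serial conjugate gradient iteration with a diagonal (Jacobi) preconditioner on a small dense matrix. It is an illustration only: the paper's solver is parallel and sits inside a nonlinear Richards-equation procedure, and the function name, matrix and tolerances here are assumptions.

```python
def jacobi_pcg(a, b, tol=1e-10, max_iter=200):
    """Solve a x = b for a symmetric positive definite matrix given as a list of rows."""
    n = len(b)

    def matvec(v):
        return [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / a[i][i] for i in range(n)]  # Jacobi preconditioner M^-1
    x = [0.0] * n
    r = list(b)                                   # residual for the zero initial guess
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for iteration in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, iteration
        ap = matvec(p)
        alpha = rz / dot(p, ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


# Small symmetric positive definite test system (made up for illustration).
matrix = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
rhs = [1.0, 2.0, 3.0]
solution, iterations = jacobi_pcg(matrix, rhs)
residual = [rhs[i] - sum(matrix[i][j] * solution[j] for j in range(3)) for i in range(3)]
```

The other preconditioners named in the abstract would replace only the two lines that compute `z`; the rest of the iteration is unchanged.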

6.

This paper presents methods used to perform discrete adjoint gradient evaluations for linear stress and vibration analysis. The methods are implemented within the framework of a discrete adjoint structural solver being developed for multidisciplinary adjoint optimizations of turbomachinery components. The code is differentiated using the algorithmic differentiation tool CoDiPack in tandem with manual treatment of the iterative solvers. Stress analysis leads to a linear system of equations that is typically solved by an iterative solver (e.g. GMRES). To ensure accuracy, the adjoint problem is formulated as a new linear system of equations to be solved. Vibration analysis results in a generalized eigenvalue problem that is also typically solved by an iterative solver. The adjoint problem removes the generalized eigenvalue solve and replaces it with one outer product per eigenfrequency, leading to significantly cheaper eigenfrequency gradients for vibration analysis.
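The "one outer product per eigenfrequency" remark is consistent with the standard sensitivity result for a symmetric generalized eigenvalue problem. The block below states that textbook result, assuming symmetric stiffness and mass matrices K and M, a simple eigenvalue λ_i with eigenvector x_i, and a design parameter p. The notation is chosen here and is not taken from the paper.

```latex
K\,x_i = \lambda_i\, M\, x_i, \qquad
\frac{\partial \lambda_i}{\partial p}
  = \frac{x_i^{\mathsf T}\left(\dfrac{\partial K}{\partial p}
      - \lambda_i \dfrac{\partial M}{\partial p}\right) x_i}
         {x_i^{\mathsf T} M\, x_i},
\qquad
\frac{\partial \lambda_i}{\partial K}
  = \frac{x_i\, x_i^{\mathsf T}}{x_i^{\mathsf T} M\, x_i}.
```

The last expression is the outer product: once the eigenpair is known, the gradient with respect to the matrix entries needs no further eigenvalue solve.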


7.
In this paper we present an application for a parallel multigrid solver in 3D to solve the Coulomb problem for the charge self-interaction in a quantum-chemical program used to perform ab initio molecular dynamics. Techniques such as Mehrstellen discretization and τ-extrapolation are used to improve the discretization error. The results show that the expected convergence rates and parallel performance of the multigrid solver are achieved. Within the applied Carr–Parrinello Molecular Dynamics scheme the quality of the solution also determines the accuracy in energy conservation. All forms of discretization employed lead to energy conserving dynamics. In order to test the applicability of our code to larger systems in a massively parallel environment, we investigated a 256 atom periodic supercell of bulk gallium nitride.

8.
Parallel computation for two-dimensional convective flows in cavities with adiabatic horizontal boundaries and driven by differential heating of the two vertical end walls is investigated using the Intel Paragon, Intel Touchstone Delta, Cray T3D and IBM SP2. The numerical scheme, including a parallel multigrid solver, and domain decomposition techniques for parallel computing are discussed in detail. Performance comparisons are made for the different parallel systems, and numerical results using various numbers of processors are discussed. © 1997 John Wiley & Sons, Ltd.

9.
FLASH is a multiphysics multiscale adaptive mesh refinement (AMR) code originally designed for simulation of reactive flows often found in Astrophysics. With its wide user base and flexible applications configuration capability, FLASH has a dual task of maintaining scalability and portability in all its solvers. The scalability of fully explicit solvers in the code is tied very closely to that of the underlying mesh. Others such as the Poisson solver based on a multigrid method have more complex scaling behavior. Multigrid methods suffer from processor starvation and dominating communication costs at coarser grids with increase in the number of processors. In this paper, we propose a combination of uniform grid mesh with AMR mesh, and the merger of two different sets of solvers to overcome the scalability limitation of the Poisson solver in FLASH. The principal challenge in the proposed merger is the efficiency of the communication algorithm to map the mesh back and forth between uniform grid and AMR. We present two different parallel mapping algorithms and also discuss results from performance studies of the two implementations. Copyright © 2012 John Wiley & Sons, Ltd.

10.
The object of this paper is a parallel preconditioned conjugate gradient iterative solver for finite element problems with coarse-mesh/fine-mesh formulation. An efficient preconditioner is easily derived from the multigrid stiffness matrix. The method has been implemented, for the sake of comparison, both on an IBM-RISC590 and on a Quadrics-QH1, a massively parallel SIMD machine with 128 processors. Examples of solutions of simple linear elastic problems on rectangular grids are presented and convergence and parallel performance are discussed.

11.
Parallel computers are having a profound impact on computational science. Recently highly parallel machines have taken the lead as the fastest supercomputers, a trend that is likely to accelerate in the future. We describe some of these new computers, and issues involved in using them. We present elliptic PDE solutions currently running at 3.8 gigaflops, and an atmospheric dynamics model running at 1.7 gigaflops, on a 65 536-processor computer.

One intrinsic disadvantage of a parallel machine is the need to perform inter-processor communication. It is important to ensure that such communication time is maintained at a small fraction of computation time. We analyze standard multigrid algorithms in two and three dimensions from this point of view, indicating that performance efficiencies in excess of 95% are attainable under suitable conditions on moderately parallel machines. We also demonstrate that such performance is not attainable for multigrid on massively parallel computers, as indicated by an example of poor multigrid efficiency on 65 536 processors. The fundamental difficulty is the inability to keep 65 536 processors busy when operating on very coarse grids.

Most algorithms used for implementing applications on parallel machines have been derived directly from algorithms designed for serial machines. The previously mentioned multigrid example indicates that such ‘parallelized’ algorithms may not always be optimal. Parallel machines open the possibility of finding totally new approaches to solving standard tasks: intrinsically parallel algorithms. In particular, we present a class of superconvergent multiple scale methods that were motivated directly by massively parallel machines. These methods differ from standard multigrid methods in an intrinsic way, and allow all processors to be used at all times, even when processing on the coarsest grid levels. Their serial versions are not sensible algorithms. The idea that parallel hardware (the Connection Machine in this case) can lead to the discovery of new mathematical algorithms was surprising to us.
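The coarse-grid difficulty described in this abstract can be made concrete with a little arithmetic. The sketch below assumes a square two-dimensional grid that is coarsened by a factor of two per level and that a processor is busy only if it owns at least one grid point. The finest grid size of 1024 points per side is a made-up example; only the processor count of 65 536 comes from the abstract.

```python
def busy_fraction_per_level(finest_points_per_side, dimensions, processors):
    """List (points_per_side, total_points, busy_fraction) from finest to coarsest level."""
    levels = []
    side = finest_points_per_side
    while side >= 1:
        points = side ** dimensions
        # At most one processor per grid point can do useful work on this level.
        levels.append((side, points, min(1.0, points / processors)))
        side //= 2
    return levels


levels = busy_fraction_per_level(1024, 2, 65536)
for side, points, busy in levels:
    print(f"{side:5d} per side, {points:8d} points, {busy:.6f} of processors busy")
```

Under these assumptions every processor has work down to the 256 by 256 level, and the busy fraction then drops by a factor of four per level, which is the starvation effect the abstract attributes to very coarse grids.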


12.
The development of a two-dimensional time-accurate dual time step Navier-Stokes flow solver with time-derivative preconditioning and multigrid acceleration is described. The governing equations are integrated in time with both an explicit Runge-Kutta scheme and an implicit lower-upper symmetric-Gauss-Seidel scheme in a finite volume framework, yielding second-order accuracy in space and time. Issues concerning the implementation of multigrid for preconditioned, dual time step algorithms are discussed. Steady and unsteady computations were made of lid-driven cavity flow, thermally driven cavity flow and pulsatile channel flow for a variety of conditions to validate the schemes and evaluate the effectiveness of multigrid for time-accurate simulations. Significant speedups were observed for steady and unsteady simulations. The speedups for unsteady simulations were problem dependent, a function of how rapidly the flow varied in time and the size of the allowable time step.

13.
The parallel version of the sheet metal forming semi-implicit finite element code ITAS3D has been developed using the domain decomposition method and direct solution methods at both subdomain and interface levels. The IBM Message Passing Library is used for data communication between tasks of the parallel code. Solutions of some sheet metal forming problems on an IBM SP2 computer show that the adopted DDM algorithm with the direct solver provides acceptable parallel efficiency using a moderate number of processors. A speedup of 6.7 is achieved for a problem with 20000 degrees of freedom on the 8-processor configuration.
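One way to read a speedup figure such as the one in this abstract is through the Karp–Flatt metric, which estimates the serial (non-parallelizable) fraction implied by a measured speedup. The metric is a general diagnostic and is not used in the abstract itself; the function name below is illustrative.

```python
def karp_flatt(speedup: float, processors: int) -> float:
    """Experimentally determined serial fraction: (1/S - 1/p) / (1 - 1/p)."""
    if processors < 2:
        raise ValueError("the metric needs at least two processors")
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


# Figures quoted in the abstract: a speedup of 6.7 on 8 processors.
serial_fraction = karp_flatt(6.7, 8)
print(f"{serial_fraction:.3f}")
```

A small value indicates that serial work and overhead account for only a minor share of the run time at this processor count.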

14.
15.
Yang-Yao Niu   《Computers & Fluids》2011,45(1):268-275
In this study, a three-dimensional fluid–structured parallelized solver is extended from the previous work (Niu et al., 2009 [1]) for moving body simulations. Based on the unified Eulerian and Lagrangian coordinate transformations, the unsteady three-dimensional incompressible Navier–Stokes equations with artificial compressibility (Chorin, 1967 [2]) in a dual-time stepping approach are first derived. To implement unsteady flow calculations, the dual-time stepping strategy, including the LU decomposition method, is used in the pseudo-time iteration, and a second-order accurate backward difference is adopted to discretize the unsteady flow terms. Also, a third-order Roe-type flux-limited splitting is derived to evaluate the spatial difference of the convective fluxes. The original FORTRAN code is converted to an MPI code and tested on a 64-CPU IBM SP2. The parallel strategy is based on partitioning all do-loops in the original FORTRAN code and distributing the calculations inside each do-loop to different CPUs. The partition can be applied to the innermost loop only or to the last two inner loops, depending on whether the problem is two-dimensional or three-dimensional. This kind of parallel data partition of the loops is independent of whether the numerical algorithm is explicit or implicit. Therefore, the current parallel approach can take full advantage of MPI to transfer data efficiently among CPUs, even when solving the governing equations implicitly. The test results show a significant reduction in computing time and a near-linear speedup up to 32 CPUs on the IBM SP2. The speedup is as high as 31 when using 64 IBM SP2 processors. The tests demonstrate efficient parallel processing that provides prompt simulation of a 3D cavity, an unsteady dropping airfoil, and blood flow in an aortic tube with linear elastic modeling of wall motion.
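The do-loop partitioning described in this abstract amounts to giving each CPU its own range of loop indices. The sketch below shows one common way to do that, a contiguous block partition with the remainder spread over the first ranks. The abstract does not say how the iterations are divided, so the scheme and the function name are assumptions.

```python
def block_range(total_iterations, num_workers, rank):
    """Return the half-open [start, stop) range of loop indices owned by `rank`."""
    base, extra = divmod(total_iterations, num_workers)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


# Ten loop iterations shared among four workers.
ranges = [block_range(10, 4, rank) for rank in range(4)]
print(ranges)  # prints [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Each worker would then run the original loop body only over its own range and exchange boundary data with the others, which in the paper's setting is done through MPI.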

16.
Nowadays the state of the art Density Functional Theory (DFT) codes are based on local (LDA) or semilocal (GGA) energy functionals. Recently the theory of a truly nonlocal energy functional has been developed. It has been used mostly as a post-DFT calculation approach, i.e. by applying the functional to the charge density calculated using any standard DFT code, thus obtaining a new improved value for the total energy of the system. Nonlocal calculation is computationally quite expensive and scales as N^2, where N is the number of points in which the density is defined, and a massively parallel calculation is welcome for a wider applicability of the new approach. In this article we present a code which accomplishes this goal.

Program summary

Program title: JuNoLo
Catalogue identifier: AEFM_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEFM_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 176 980
No. of bytes in distributed program, including test data, etc.: 2 126 072
Distribution format: tar.gz
Programming language: Fortran 90
Computer: any architecture with a Fortran 90 compiler
Operating system: Linux, AIX
Has the code been vectorised or parallelized?: Yes, from 1 to 65536 processors may be used.
RAM: depends strongly on the problem's size.
Classification: 7.3
External routines:
• FFTW (http://www.fftw.org/)
• MPI (http://www.mcs.anl.gov/research/projects/mpich2/ or http://www.lam-mpi.org/)
Nature of problem: Obtaining the value of the nonlocal vdW-DF energy based on the charge density distribution obtained from some Density Functional Theory code.
Solution method: Numerical calculation of the double sum is implemented in a parallel F90 code. Calculation of this sum yields the required nonlocal vdW-DF energy.
Unusual features: Binds to virtually any DFT program.
Additional comments: Excellent parallelization features.
Running time: Depends strongly on the size of the problem and the number of CPUs used.
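The N^2 cost mentioned in this entry comes from a double sum over all pairs of density points, and such a sum splits naturally across workers by giving each one a subset of the outer index. The sketch below illustrates that split in serial Python. The kernel is a placeholder and not the vdW-DF kernel, any prefactor is omitted, and the round-robin assignment of rows is an assumption rather than the scheme used in JuNoLo.

```python
import math


def kernel(distance):
    """Placeholder pair kernel (NOT the vdW-DF kernel), used only for illustration."""
    return math.exp(-distance)


def partial_double_sum(density, positions, rows):
    """Sum density[i] * kernel(|r_i - r_j|) * density[j] over i in `rows` and all j."""
    total = 0.0
    for i in rows:
        for j in range(len(density)):
            total += density[i] * kernel(abs(positions[i] - positions[j])) * density[j]
    return total


n = 12
positions = [0.5 * i for i in range(n)]      # made-up 1D grid points
density = [1.0 / (1 + i) for i in range(n)]  # made-up density values

serial = partial_double_sum(density, positions, range(n))

workers = 4
chunks = [range(w, n, workers) for w in range(workers)]  # round-robin row ownership
parallel = sum(partial_double_sum(density, positions, rows) for rows in chunks)
```

Each worker's partial sum is independent of the others, so only a final reduction is needed to combine them, which is why this kind of calculation parallelizes well.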

17.
Averbuch  A.  Epstein  B.  Ioffe  L.  Yavneh  I. 《The Journal of supercomputing》2000,17(2):123-142
We present an efficient parallelization strategy for speeding up the computation of a high-accuracy 3-dimensional serial Navier-Stokes solver that treats turbulent transonic high-Reynolds flows. The code solves the full compressible Navier-Stokes equations and is applicable to realistic large size aerodynamic configurations and as such requires huge computational resources in terms of computer memory and execution time. The solver can resolve the flow properly on relatively coarse grids. Since the serial code contains a complex infrastructure typical for industrial code (which ensures its flexibility and applicability to complex configurations), the parallelization task is not straightforward. We obtain a scalable implementation on massively parallel machines by maintaining efficiency at a fixed value while simultaneously increasing the number of processors and the size of the problem. The 3-D Navier-Stokes solver was implemented on three MIMD message-passing multiprocessors (a 64-processor IBM SP2, a 20-processor MOSIX, and a 64-processor Origin 2000). The same code written with PVM and MPI software packages was executed on all the above distinct computational platforms. The examples in the paper demonstrate that we can achieve efficiency of about 60% for as many as 64 processors on Origin 2000 on a full-size 3-D aerodynamic problem which is solved on realistic computational grids.

18.
《Computers & Structures》2007,85(11-14):749-762
The newly developed immersed object method (IOM) [Tai CH, Zhao Y, Liew KM. Parallel computation of unsteady incompressible viscous flows around moving rigid bodies using an immersed object method with overlapping grids. J Comput Phys 2005; 207(1): 151–72] is extended for 3D unsteady flow simulation with fluid–structure interaction (FSI), which is made possible by combining it with a parallel unstructured multigrid Navier–Stokes solver using a matrix-free implicit dual time stepping and finite volume method [Tai CH, Zhao Y, Liew KM. Parallel computation of unsteady three-dimensional incompressible viscous flow using an unstructured multigrid method. In: The second M.I.T. conference on computational fluid and solid mechanics, June 17–20, MIT, Cambridge, MA 02139, USA, 2003; Tai CH, Zhao Y, Liew KM. Parallel computation of unsteady three-dimensional incompressible viscous flow using an unstructured multigrid method, Special issue on “Preconditioning methods: algorithms, applications and software environments”. Comput Struct 2004; 82(28): 2425–36]. This uniquely combined method is then employed to perform a detailed study of 3D unsteady flows with complex FSI. In the IOM, a body force term F is introduced into the momentum equations during the artificial compressibility (AC) sub-iterations so that a desired velocity distribution V0 can be obtained on and within the object boundary, which need not coincide with the grid, by adopting the direct forcing method. An object mesh is immersed into the flow domain to define the boundary of the object. The advantage of this is that bodies of almost arbitrary shapes can be added without grid restructuring, a procedure which is often time-consuming and computationally expensive. It has enabled us to perform complex and detailed simulations of 3D unsteady blood flow and blood–leaflets interaction in a mechanical heart valve (MHV) under physiological conditions.

19.
The adjoint method is a useful tool for finding gradients of design objectives with respect to system parameters for fluid dynamics simulations. But the utility of this method is hampered by the difficulty in writing an efficient implementation for the adjoint flow solver, especially one that scales to thousands of cores. This paper demonstrates a Python library, called adFVM, that can be used to construct an explicit unsteady flow solver and derive the corresponding discrete adjoint flow solver using automatic differentiation (AD). The library uses a two-level computational graph method for representing the structure of both solvers. The library translates this structure into a sequence of optimized kernels, significantly reducing its execution time and memory footprint. Kernels can be generated for heterogeneous architectures including distributed memory, shared memory and accelerator based systems. The library is used to write a finite volume based compressible flow solver. A wall clock time comparison between different flow solvers and adjoint flow solvers built using this library and state of the art graph based AD libraries is presented on a turbomachinery flow problem. Performance analysis of the flow solvers is carried out for CPUs and GPUs. Results of strong and weak scaling of the flow solver and its adjoint are demonstrated on subsonic flow in a periodic box.

20.

In this paper, we present several important details in the process of legacy code parallelization, mostly related to the problem of maintaining numerical output of a legacy code while obtaining a balanced workload for parallel processing. Since we maintained the non-uniform mesh imposed by the original finite element code, we have to develop a specially designed data distribution among processors so that data restrictions are met in the finite element method. In particular, we introduce a data distribution method that is initially used in shared memory parallel processing and obtain better performance than the previous parallel program version. Besides, this method can be extended to other parallel platforms such as distributed memory parallel computers. We present results including several problems related to performance profiling on different (development and production) parallel platforms. The use of new and old parallel computing architectures leads to different behavior of the same code, which in all cases provides better performance in multiprocessor hardware.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号