期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Parallel implementations of the trace minimization scheme TraceMIN for the sparse symmetric eigenvalue problem

A. Klinvex F. Saied A. Sameh 《Computers & Mathematics with Applications》2013,65(3):460-468

Eigenvalue problems arise in many computational science and engineering applications: in structural mechanics, nanoelectronics, and Google’s PageRank link analysis, for example. Often, the large size of these eigenvalue problems requires the development of eigensolvers that scale well on parallel computing platforms. In this paper, we compare the effectiveness and robustness of our eigensolver for the symmetric generalized eigenvalue problem, the trace minimization scheme TraceMIN–developed in the early 1980s–against today’s well-known sparse eigensolvers including: the LOBPCG and block Krylov–Schur implementations in Trilinos; ARPACK; and several methods in the PRIMME package such as the Jacobi–Davidson one. In addition, we demonstrate the parallel scalability of two variants of TraceMIN on multicore nodes as well as on large clusters of such nodes. Our results show that TraceMIN is more robust and has higher parallel scalability than the above-mentioned competing eigensolvers. 相似文献

2.

Sparse iterative algorithm software for large-scale MIMD machines: An initial discussion and implementation

John N. Shadid Ray S. Tuminaro 《Concurrency and Computation》1992,4(6):481-497

The parallelization of sophisticated applications has dramatically increased in recent years. As machine capabilities rise, greater emphasis on modeling complex phenomena can be expected. Many of these applications require the solution of large sparse matrix equations which approximate systems of partial differential equations (PDEs). Therefore we consider parallel iterative solvers for large sparse non-symmetric systems and issues related to parallel sparse matrix software. We describe a collection of parallel iterative solvers which use a distributed sparse matrix format that facilitates the interface between specific applications and a variety of Krylov subspace techniques and multigrid methods. These methods have been used to solve a number of linear and non-linear PDE problems on a 1024-processor NCUBE 2 hypercube. Over 1 Gflop sustained computation rates are achieved with many of these solvers, demonstrating that high performance can be attained even when using sparse matrix data structures. 相似文献

3.

The communication-hiding pipelined BiCGstab method for the parallel solution of large unsymmetric linear systems

《Parallel Computing》2017

A High Performance Computing alternative to traditional Krylov subspace methods, pipelined Krylov subspace solvers offer better scalability in the strong scaling limit compared to standard Krylov subspace methods for large and sparse linear systems. The typical synchronization bottleneck is mitigated by overlapping time-consuming global communication phases with local computations in the algorithm. This paper describes a general framework for deriving the pipelined variant of any Krylov subspace algorithm. The proposed framework was implicitly used to derive the pipelined Conjugate Gradient (p-CG) method in Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm by P. Ghysels and W. Vanroose, Parallel Computing, 40(7):224–238, 2014. The pipelining framework is subsequently illustrated by formulating a pipelined version of the BiCGStab method for the solution of large unsymmetric linear systems on parallel hardware. A residual replacement strategy is proposed to account for the possible loss of attainable accuracy and robustness by the pipelined BiCGStab method. It is shown that the pipelined algorithm improves scalability on distributed memory machines, leading to significant speedups compared to standard preconditioned BiCGStab. 相似文献

4.

Efficient nonlinear solvers for Laplace–Beltrami smoothing of three-dimensional unstructured grids

Markus Berndt J. David Moulton Glen Hansen 《Computers & Mathematics with Applications》2008,55(12):2791-2806

The Laplace–Beltrami system of nonlinear, elliptic, partial differential equations has utility in the generation of computational grids on complex and highly curved geometry. Discretization of this system using the finite-element method accommodates unstructured grids, but generates a large, sparse, ill-conditioned system of nonlinear discrete equations. The use of the Laplace–Beltrami approach, particularly in large-scale applications, has been limited by the scalability and efficiency of solvers. This paper addresses this limitation by developing two nonlinear solvers based on the Jacobian-Free Newton–Krylov (JFNK) methodology. A key feature of these methods is that the Jacobian is not formed explicitly for use by the underlying linear solver. Iterative linear solvers such as the Generalized Minimal RESidual (GMRES) method do not technically require the stand-alone Jacobian; instead its action on a vector is approximated through two nonlinear function evaluations. The preconditioning required by GMRES is also discussed. Two different preconditioners are developed, both of which employ existing Algebraic Multigrid (AMG) methods. Further, the most efficient preconditioner, overall, for the problems considered is based on a Picard linearization. Numerical examples demonstrate that these solvers are significantly faster than a standard Newton–Krylov approach; a speedup factor of approximately 26 was obtained for the Picard preconditioner on the largest grids studied here. In addition, these JFNK solvers exhibit good algorithmic scaling with increasing grid size. 相似文献

5.

Solution of the Dirac equation in lattice QCD using a domain decomposition method

Martin Lüscher 《Computer Physics Communications》2004,156(3):209-220

Efficient algorithms for the solution of partial differential equations on parallel computers are often based on domain decomposition methods. Schwarz preconditioners combined with standard Krylov space solvers are widely used in this context, and such a combination is shown here to perform very well in the case of the Wilson-Dirac equation in lattice QCD. In particular, with respect to even-odd preconditioned solvers, the communication overhead is significantly reduced, which allows the computational work to be distributed over a large number of processors with only small parallelization losses. 相似文献

6.

A Parallel Domain Decomposition Method for 3D Unsteady Incompressible Flows at High Reynolds Number

Rongliang Chen Yuqi Wu Zhengzheng Yan Yubo Zhao Xiao-Chuan Cai 《Journal of scientific computing》2014,58(2):275-289

Numerical simulation of three-dimensional incompressible flows at high Reynolds number using the unsteady Navier–Stokes equations is challenging. In order to obtain accurate simulations, very fine meshes are necessary, and such simulations are increasingly important for modern engineering practices, such as understanding the flow behavior around high speed trains, which is the target application of this research. To avoid the time step size constraint imposed by the CFL number and the fine spacial mesh size, we investigate some fully implicit methods, and focus on how to solve the large nonlinear system of equations at each time step on large scale parallel computers. In most of the existing implicit Navier–Stokes solvers, segregated velocity and pressure treatment is employed. In this paper, we focus on the Newton–Krylov–Schwarz method for solving the monolithic nonlinear system arising from the fully coupled finite element discretization of the Navier–Stokes equations on unstructured meshes. In the subdomain, LU or point-block ILU is used as the local solver. We test the algorithm for some three-dimensional complex unsteady flows, including flows passing a high speed train, on a supercomputer with thousands of processors. Numerical experiments show that the algorithm has superlinear scalability with over three thousand processors for problems with tens of millions of unknowns. 相似文献

7.

Matrix-free algorithm for the optimization of multidisciplinary systems

Alp Dener Jason E. Hicken 《Structural and Multidisciplinary Optimization》2017,56(6):1429-1446

Multidisciplinary engineering systems are usually modeled by coupling software components that were developed for each discipline independently. The use of disparate solvers complicates the optimization of multidisciplinary systems and has been a long-standing motivation for optimization architectures that support modularity. The individual discipline feasible (IDF) formulation is particularly attractive in this respect. IDF achieves modularity by introducing optimization variables and constraints that effectively decouple the disciplinary solvers during each optimization iteration. Unfortunately, the number of variables and constraints can be significant, and the IDF constraint Jacobian required by most conventional optimization algorithms is prohibitively expensive to compute. Furthermore, limited-memory quasi-Newton approximations, commonly used for large-scale problems, exhibit linear convergence rates that can struggle with the large number of design variables introduced by the IDF formulation. In this work, we show that these challenges can be overcome using a reduced-space inexact-Newton-Krylov algorithm. The proposed algorithm avoids the need for the explicit constraint Jacobian and Hessian by using a Krylov iterative method to solve the Newton steps. The Krylov method requires matrix-vector products, which can be evaluated in a matrix-free manner using second-order adjoints. The Krylov method also needs to be preconditioned, and a key contribution of this work is a novel and effective preconditioner that is based on approximating a monolithic solution of the (linearized) multidisciplinary system. We demonstrate the efficacy of the algorithm by comparing it with the popular multidisciplinary feasible formulation on two test problems. 相似文献

8.

Hierarchical Krylov and nested Krylov methods for extreme-scale computing

Lois Curfman McInnes Barry Smith Hong Zhang Richard Tran Mills 《Parallel Computing》2014

The solution of large, sparse linear systems is often a dominant phase of computation for simulations based on partial differential equations, which are ubiquitous in scientific and engineering applications. While preconditioned Krylov methods are widely used and offer many advantages for solving sparse linear systems that do not have highly convergent, geometric multigrid solvers or specialized fast solvers, Krylov methods encounter well-known scaling difficulties for over 10,000 processor cores because each iteration requires at least one vector inner product, which in turn requires a global synchronization that scales poorly because of internode latency. To help overcome these difficulties, we have developed hierarchical Krylov methods and nested Krylov methods in the PETSc library that reduce the number of global inner products required across the entire system (where they are expensive), though freely allow vector inner products across smaller subsets of the entire system (where they are inexpensive) or use inner iterations that do not invoke vector inner products at all. 相似文献

9.

GMRES implementations and residual smoothing techniques for solving ill-posed linear systems

M. MatinfarH. Zareamoghaddam M. EslamiM. Saeidy 《Computers & Mathematics with Applications》2012,63(1):1-13

There are verities of useful Krylov subspace methods to solve nonsymmetric linear system of equations. GMRES is one of the best Krylov solvers with several different variants to solve large sparse linear systems. Any GMRES implementation has some advantages. As the solution of ill-posed problems are important. In this paper, some GMRES variants are discussed and applied to solve these kinds of problems. Residual smoothing techniques are efficient ways to accelerate the convergence speed of some iterative methods like CG variants. At the end of this paper, some residual smoothing techniques are applied for different GMRES methods to test the influence of these techniques on GMRES implementations. 相似文献

10.

On the scalability of inexact balancing domain decomposition by constraints with overlapped coarse/fine corrections

《Parallel Computing》2015

In this work, we analyze the scalability of inexact two-level balancing domain decomposition by constraints (BDDC) preconditioners for Krylov subspace iterative solvers, when using a highly scalable asynchronous parallel implementation where fine and coarse correction computations are overlapped in time. This way, the coarse-grid problem can be fully overlapped by fine-grid computations (which are embarrassingly parallel) in a wide range of cases. Further, we consider inexact solvers to reduce the computational cost/complexity and memory consumption of coarse and local problems and boost the scalability of the solver. Out of our numerical experimentation, we conclude that the BDDC preconditioner is quite insensitive to inexact solvers. In particular, one cycle of algebraic multigrid (AMG) is enough to attain algorithmic scalability. Further, the clear reduction of computing time and memory requirements of inexact solvers compared to sparse direct ones makes possible to scale far beyond state-of-the-art BDDC implementations. Excellent weak scalability results have been obtained with the proposed inexact/overlapped implementation of the two-level BDDC preconditioner, up to 93,312 cores and 20 billion unknowns on JUQUEEN. Further, we have also applied the proposed setting to unstructured meshes and partitions for the pressure Poisson solver in the backward-facing step benchmark domain. 相似文献

11.

Non-stationary iterative solvers on a PC cluster

《Advances in Engineering Software》2005,36(6):393-400

In this study, we introduce cost effective strategies and algorithms for parallelizing the Krylov subspace based non-stationary iterative solvers such as Bi-CGM and Bi-CGSTAB for distributed computing on a cluster of PCs using ANULIB message passing libraries. We investigate the effectiveness of the parallel solvers on the linear systems resulting in numerical solution of some 2D and 3D nonlinear partial differential equations governing heat convection process by finite element, finite difference and wavelet based numerical schemes. Largely Bi-CGM is found to give better performance measured in terms of speedup factors. 相似文献

12.

Parallel methods for optimality criteria-based topology optimization

《Computer Methods in Applied Mechanics and Engineering》2005,194(34-35):3637-3667

Topology optimization problems require the repeated solution of finite element problems that are often extremely ill-conditioned due to highly heterogeneous material distributions. This makes the use of iterative linear solvers inefficient unless appropriate preconditioning is used. Even then, the solution time for topology optimization problems is typically very high. These problems are addressed by considering the use of non-overlapping domain decomposition-based parallel methods for the solution of topology optimization problems. The parallel algorithms presented here are based on the solid isotropic material with penalization (SIMP) formulation of the topology optimization problem and use the optimality criteria method for iterative optimization. We consider three parallel linear solvers to solve the equilibrium problem at each step of the iterative optimization procedure. These include two preconditioned conjugate gradient (PCG) methods: one using a diagonal preconditioner and one using an incomplete LU factorization preconditioner with a drop tolerance. A third substructuring solver that employs a hybrid of direct and iterative (PCG) techniques is also studied. This solver is found to be the most effective of the three solvers studied, both in terms of parallel efficiency and in terms of its ability to mitigate the effects of ill-conditioning. In addition to examining parallel linear solvers, we consider the parallelization of the iterative optimality criteria method. To tackle checkerboarding and mesh dependence, we propose a multi-pass filtering technique that limits the number of “ghost” elements that need to be exchanged across interprocessor boundaries. 相似文献

13.

Advances in Iterative Methods and Preconditioners for the Helmholtz Equation 总被引：2，自引：0，他引：2

Yogi A. Erlangga 《Archives of Computational Methods in Engineering》2008,15(1):37-66

In this paper we survey the development of fast iterative solvers aimed at solving 2D/3D Helmholtz problems. In the first half of the paper, a survey on some recently developed methods is given. The second half of the paper focuses on the development of the shifted Laplacian preconditioner used to accelerate the convergence of Krylov subspace methods applied to the Helmholtz equation. Numerical examples are given for some difficult problems, which had not been solved iteratively before. 相似文献

14.

A Jacobi–Davidson method for nonlinear and nonsymmetric eigenproblems

《Computers & Structures》2007,85(17-18):1284-1292

For the nonlinear eigenvalue problem T(λ)x = 0 we consider a Jacobi–Davidson type iterative projection method for computing a few eigenvalues close to a given parameter. We discuss the numerical solution of the projected eigenvalue problems in particular for nonsymmetric systems. We present methods how to prevent the algorithm from converging to the same eigenpair repeatedly. To verify the Jacobi–Davidson method it is applied to a rational eigenvalue problem governing damped vibrations of a structure and to a damped gyroscopic eigenvalue problem. 相似文献

15.

Preconditioned Krylov solvers on GPUs

《Parallel Computing》2017

In this paper, we study the effect of enhancing GPU-accelerated Krylov solvers with preconditioners. We consider the BiCGSTAB, CGS, QMR, and IDR(s) Krylov solvers. For a large set of test matrices, we assess the impact of Jacobi and incomplete factorization preconditioning on the solvers’ numerical stability and time-to-solution performance. We also analyze how the use of a preconditioner impacts the choice of the fastest solver. 相似文献

16.

Practical Implementation of Krylov Subspace Spectral Methods

James V. Lambers 《Journal of scientific computing》2007,32(3):449-476

Krylov subspace spectral methods have been shown to be high-order accurate in time and more stable than explicit time-stepping methods, but also more difficult to implement efficiently. This paper describes how these methods can be fashioned into practical solvers by exploiting the simple structure of differential operators Numerical results concerning accuracy and efficiency are presented for parabolic problems in one and two space dimensions. 相似文献

17.

Scalable large‐scale fluid–structure interaction solvers in the Uintah framework via hybrid task‐based parallelism algorithms

Qingyu Meng Martin Berzins 《Concurrency and Computation》2014,26(7):1388-1407

Uintah is a software framework that provides an environment for solving fluid–structure interaction problems on structured adaptive grids for large‐scale science and engineering problems involving the solution of partial differential equations. Uintah uses a combination of fluid flow solvers and particle‐based methods for solids, together with adaptive meshing and a novel asynchronous task‐based approach with fully automated load balancing. When applying Uintah to fluid–structure interaction problems, the combination of adaptive meshing and the movement of structures through space present a formidable challenge in terms of achieving scalability on large‐scale parallel computers. The Uintah approach to the growth of the number of core counts per socket together with the prospect of less memory per core is to adopt a model that uses MPI to communicate between nodes and a shared memory model on‐node so as to achieve scalability on large‐scale systems. For this approach to be successful, it is necessary to design data structures that large numbers of cores can simultaneously access without contention. This scalability challenge is addressed here for Uintah, by the development of new hybrid runtime and scheduling algorithms combined with novel lock‐free data structures, making it possible for Uintah to achieve excellent scalability for a challenging fluid–structure problem with mesh refinement on as many as 260K cores. Copyright © 2013 John Wiley & Sons, Ltd. 相似文献

18.

Accelerating iterative linear solvers using multiple graphical processing units

Zhangxin Chen Bo Yang 《国际计算机数学杂志》2015,92(7):1422-1438

In this paper, we develop, study and implement iterative linear solvers and preconditioners using multiple graphical processing units (GPUs). Techniques for accelerating sparse matrix–vector (SpMV) multiplication, linear solvers and preconditioners are presented. Four Krylov subspace solvers, a Neumann polynomial preconditioner and a domain decomposition preconditioner are implemented. Our numerical tests with NVIDIA C2050 GPUs show that the SpMV kernel can be sped over 40 times faster using four GPUs. Our linear solvers and preconditioners have similar speedup. 相似文献

19.

Preconditioned Krylov subspace methods for solving radiative transfer problems with scattering and reflection

M.A. Badri P. Jolivet B. Rousseau Y. Favennec 《Computers & Mathematics with Applications》2019,77(6):1453-1465

Two Krylov subspace methods, the GMRES and the BiCGSTAB, are analyzed for solving the linear systems arising from the mixed finite element discretization of the discrete ordinates radiative transfer equation. To increase their convergence rate and stability, the Jacobi and block Jacobi methods are used as preconditioners for both Krylov subspace methods. Numerical experiments, designed to test the effectiveness of the (preconditioned) GMRES and the BiCGSTAB, are performed on various radiative transfer problems: (i) transparent, (ii) absorption dominant, (iii) scattering dominant, and (iv) with specular reflection. It is observed that the BiCGSTAB is superior to the GMRES, with lower iteration counts, solving times, and memory consumption. In particular, the BiCGSTAB preconditioned by the block Jacobi method performed best amongst the set of other solvers. To better understand the discrete systems for radiative problems (i) to (iv), an eigenvalue spectrum analysis has also been performed. It revealed that the linear system conditioning deteriorates for scattering media problems in comparison to absorbing or transparent media problems. This conditioning further deteriorates when reflection is involved. 相似文献

20.

Fast Nonlinear Least Squares Optimization of Large-Scale Semi-Sparse Problems

M. Fratarcangeli D. Bradley A. Gruber G. Zoss T. Beeler 《Computer Graphics Forum》2020,39(2):247-259

Many problems in computer graphics and vision can be formulated as a nonlinear least squares optimization problem, for which numerous off-the-shelf solvers are readily available. Depending on the structure of the problem, however, existing solvers may be more or less suitable, and in some cases the solution comes at the cost of lengthy convergence times. One such case is semi-sparse optimization problems, emerging for example in localized facial performance reconstruction, where the nonlinear least squares problem can be composed of hundreds of thousands of cost functions, each one involving many of the optimization parameters. While such problems can be solved with existing solvers, the computation time can severely hinder the applicability of these methods. We introduce a novel iterative solver for nonlinear least squares optimization of large-scale semi-sparse problems. We use the nonlinear Levenberg-Marquardt method to locally linearize the problem in parallel, based on its first-order approximation. Then, we decompose the linear problem in small blocks, using the local Schur complement, leading to a more compact linear system without loss of information. The resulting system is dense but its size is small enough to be solved using a parallel direct method in a short amount of time. The main benefit we get by using such an approach is that the overall optimization process is entirely parallel and scalable, making it suitable to be mapped onto graphics hardware (GPU). By using our minimizer, results are obtained up to one order of magnitude faster than other existing solvers, without sacrificing the generality and the accuracy of the model. We provide a detailed analysis of our approach and validate our results with the application of performance-based facial capture using a recently-proposed anatomical local face deformation model. 相似文献