Similar documents
1.
The solution of large, sparse linear systems is often the dominant phase of computation in simulations based on partial differential equations, which are ubiquitous in scientific and engineering applications. Preconditioned Krylov methods are widely used and offer many advantages for sparse linear systems for which highly convergent geometric multigrid solvers or specialized fast solvers are not available. However, Krylov methods encounter well-known scaling difficulties beyond 10,000 processor cores, because each iteration requires at least one vector inner product, which in turn requires a global synchronization that scales poorly because of internode latency. To help overcome these difficulties, we have developed hierarchical Krylov methods and nested Krylov methods in the PETSc library that reduce the number of global inner products required across the entire system (where they are expensive), while freely allowing vector inner products across smaller subsets of the system (where they are inexpensive) or using inner iterations that do not invoke vector inner products at all.
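To make the synchronization argument concrete, here is a minimal NumPy sketch of unpreconditioned conjugate gradients that counts vector inner products. In a distributed-memory code each of these dot products would be a global reduction (for example an allreduce), so standard CG pays two of them per iteration. The function name `cg_count_reductions` is hypothetical, and this is plain CG, not the PETSc hierarchical or nested Krylov implementation.

```python
import numpy as np


def cg_count_reductions(A, b, tol=1e-10, maxiter=1000):
    """Unpreconditioned CG that also counts vector inner products.

    In a distributed-memory code every dot product below would require a
    global reduction, i.e. a synchronization point across all processes.
    """
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rr = r @ r
    reductions = 1                      # initial ||r||^2
    bnorm = np.sqrt(rr)
    iters = 0
    while np.sqrt(rr) > tol * bnorm and iters < maxiter:
        Ap = A @ p                      # for a sparse matrix: neighbor communication only
        pAp = p @ Ap                    # global reduction no. 1
        alpha = rr / pAp
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r                  # global reduction no. 2
        reductions += 2
        p = r + (rr_new / rr) * p
        rr = rr_new
        iters += 1
    return x, iters, reductions


n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Laplacian
b = np.ones(n)
x, iters, reductions = cg_count_reductions(A, b)
print(iters, reductions)   # reductions == 2 * iters + 1
```

Hierarchical and nested variants attack exactly the per-iteration reductions counted here, by confining most of them to subsets of processes.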

2.
Block Krylov subspace solvers are known to work efficiently in some cases when solving differential equations with multiple right-hand sides. In lattice QCD, calculating physical quantities on a given configuration requires solving the Dirac equation with multiple sources. We show that a new block Krylov subspace algorithm recently proposed by the authors significantly reduces the computational cost, without losing numerical accuracy, for the solution of the O(a)-improved Wilson-Dirac equation.
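As an illustration of what "block" means here, the sketch below is the classical block conjugate gradient method (O'Leary style) for a symmetric positive definite matrix: all right-hand sides share one Krylov subspace, and the scalar step lengths of CG become small s x s matrices. This is not the authors' new algorithm, and the Wilson-Dirac operator is not SPD; a random SPD test matrix is assumed purely for demonstration.

```python
import numpy as np


def block_cg(A, B, tol=1e-8, maxiter=500):
    """O'Leary-style block CG for SPD A and an n x s block of right-hand sides."""
    X = np.zeros_like(B)
    R = B.copy()                         # R = B - A X with X = 0
    P = R.copy()
    RtR = R.T @ R                        # s x s Gram matrix of the residuals
    bnorm = np.linalg.norm(B)
    for k in range(maxiter):
        if np.linalg.norm(R) <= tol * bnorm:
            return X, k
        Q = A @ P
        alpha = np.linalg.solve(P.T @ Q, RtR)     # s x s step matrix
        X = X + P @ alpha
        R = R - Q @ alpha
        RtR_new = R.T @ R
        beta = np.linalg.solve(RtR, RtR_new)
        P = R + P @ beta
        RtR = RtR_new
    return X, maxiter


rng = np.random.default_rng(0)
n, s = 60, 4
M = rng.standard_normal((n, n))
A = M @ M.T / n + np.eye(n)              # SPD test matrix (assumption)
B = rng.standard_normal((n, s))          # four "sources"
X, k = block_cg(A, B)
```

The matrix products `A @ P` act on all sources at once, which is where block methods typically gain efficiency over solving each source separately.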

3.
In this work, we attempt to answer the question posed in Amir O., Sigmund O.: On reducing computational effort in topology optimization: how far can we go? (Struct. Multidiscip. Optim. 44(1):25–29, 2011). Namely, we assess how inaccurately the governing equations can be solved during a topology optimization process while still obtaining accurate results. We approach this question from a “PDE-based” angle, using a posteriori residual estimates to gain insight into the behavior of the residuals over the course of the Krylov solver iterations. Our main observation is that the residual estimates are dominated by discretization error after only a few iterations of an iterative solver. This provides a quantitative criterion for the early termination of iterative solvers. We illustrate this approach using benchmark examples from linear elasticity and demonstrate that the number of Krylov solver iterations can be significantly reduced, even compared with previous heuristic recommendations, although each Krylov iteration becomes considerably more expensive.

4.
Modern graphics hardware is designed for highly parallel numerical tasks and promises significant cost and performance benefits for many scientific applications. One such application is lattice quantum chromodynamics (lattice QCD), where the main computational challenge is to efficiently solve the discretized Dirac equation in the presence of an SU(3) gauge field. Using NVIDIA's CUDA platform, we have implemented a Wilson-Dirac sparse matrix-vector product that runs at up to 40, 135, and 212 Gflops in double, single, and half precision, respectively, on NVIDIA's GeForce GTX 280 GPU. We have developed a new mixed-precision approach for Krylov solvers using reliable updates, which achieves full double-precision accuracy while using only single- or half-precision arithmetic for the bulk of the computation. The resulting BiCGstab and CG solvers run in excess of 100 Gflops and, in terms of iterations until convergence, perform better than the usual defect-correction approach for mixed precision.
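For reference, the "usual defect-correction approach" that the abstract compares against can be sketched in a few lines: residuals are recomputed and the solution is accumulated in float64, while each correction is obtained from a cheap float32 CG solve. This is a minimal sketch under the assumption of an SPD system and a NumPy CPU setting; it is not the reliable-update scheme of the paper, and the names `cg32` and `defect_correction` are hypothetical.

```python
import numpy as np


def cg32(A32, b32, rtol, maxiter=500):
    """Plain CG carried out entirely in float32 (the cheap inner solver)."""
    x = np.zeros_like(b32)
    r = b32.copy()
    p = r.copy()
    rr = r @ r
    target = rtol * rtol * rr          # stop when ||r|| <= rtol * ||b32||
    for _ in range(maxiter):
        if rr <= target:
            break
        Ap = A32 @ p
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x


def defect_correction(A, b, tol=1e-12, inner_rtol=1e-3, max_outer=50):
    """Mixed-precision defect correction: float64 residuals, float32 inner solves."""
    A32 = A.astype(np.float32)
    x = np.zeros_like(b)                       # float64 solution accumulator
    bnorm = np.linalg.norm(b)
    for k in range(max_outer):
        r = b - A @ x                          # true residual in float64
        if np.linalg.norm(r) <= tol * bnorm:
            return x, k
        d = cg32(A32, r.astype(np.float32), inner_rtol)
        x += d.astype(np.float64)              # correction accumulated in float64
    return x, max_outer


rng = np.random.default_rng(1)
n = 100
M = rng.standard_normal((n, n))
A = M @ M.T / n + np.eye(n)                    # SPD test matrix (assumption)
b = rng.standard_normal(n)
x, outer = defect_correction(A, b)
```

Each outer step restarts the inner Krylov solver from scratch, discarding its search directions; reliable updates instead keep a single Krylov iteration running and only periodically refresh the residual in high precision, which is one plausible reason for the iteration savings the abstract reports.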

5.
In the numerical solution of large‐scale eigenvalue problems, Davidson‐type methods are an increasingly popular alternative to Krylov eigensolvers. The main motivation is to avoid the expensive factorizations that are often needed by Krylov solvers when the problem is generalized or interior eigenvalues are desired. In Davidson‐type methods, the factorization is replaced by iterative linear solvers that can be accelerated by a smart preconditioner. Jacobi–Davidson is one of the most effective variants. However, parallel implementations of this method are not widely available, particularly for non‐symmetric problems. We present a parallel implementation that has been included in SLEPc, the Scalable Library for Eigenvalue Problem Computations, and test it in the context of a highly scalable plasma turbulence simulation code. We analyze its parallel efficiency and compare it with a Krylov–Schur eigensolver. Copyright © 2011 John Wiley & Sons, Ltd.
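The Davidson idea of expanding the search space with a preconditioned residual, instead of factorizing a matrix, can be illustrated with the simplest member of the family: the original Davidson method with a diagonal preconditioner, applied to a symmetric, diagonally dominant test matrix. This is a serial sketch under those assumptions; it is neither Jacobi–Davidson (which solves a projected correction equation) nor the SLEPc implementation, and `davidson_largest` is a hypothetical name.

```python
import numpy as np


def davidson_largest(A, tol=1e-8, maxiter=100):
    """Basic Davidson method for the largest eigenpair of a symmetric matrix."""
    n = A.shape[0]
    d = np.diag(A)
    v = np.zeros(n)
    v[np.argmax(d)] = 1.0                  # start from the largest diagonal entry
    V = v[:, None]
    for _ in range(maxiter):
        H = V.T @ A @ V                    # Rayleigh-Ritz projection
        evals, evecs = np.linalg.eigh(H)
        theta, s = evals[-1], evecs[:, -1]
        u = V @ s                          # Ritz vector (unit norm)
        r = A @ u - theta * u
        if np.linalg.norm(r) < tol:
            return theta, u
        denom = d - theta
        denom[np.abs(denom) < 1e-12] = 1e-12
        t = r / denom                      # diagonal (Davidson) correction, no factorization
        for _ in range(2):                 # two Gram-Schmidt passes against V
            t -= V @ (V.T @ t)
        nt = np.linalg.norm(t)
        if nt < 1e-14:
            break
        V = np.hstack([V, (t / nt)[:, None]])
    return theta, u


rng = np.random.default_rng(2)
n = 200
S = rng.standard_normal((n, n))
A = np.diag(np.arange(1.0, n + 1)) + 0.01 * (S + S.T) / 2   # test matrix (assumption)
theta, u = davidson_largest(A)
```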

6.
As a high-performance computing alternative to traditional Krylov subspace methods, pipelined Krylov subspace solvers offer better scalability in the strong scaling limit for large, sparse linear systems. The typical synchronization bottleneck is mitigated by overlapping time-consuming global communication phases with local computations in the algorithm. This paper describes a general framework for deriving the pipelined variant of any Krylov subspace algorithm. The proposed framework was used implicitly to derive the pipelined Conjugate Gradient (p-CG) method in “Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm” by P. Ghysels and W. Vanroose, Parallel Computing, 40(7):224–238, 2014. The pipelining framework is then illustrated by formulating a pipelined version of the BiCGStab method for the solution of large unsymmetric linear systems on parallel hardware. A residual replacement strategy is proposed to counter the possible loss of attainable accuracy and robustness of the pipelined BiCGStab method. It is shown that the pipelined algorithm improves scalability on distributed-memory machines, leading to significant speedups over standard preconditioned BiCGStab.
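The pipelining idea is easiest to see in unpreconditioned pipelined CG, which the abstract cites: extra recurrences for s = A p, w = A r, and z = A s let both dot products of an iteration be launched together and overlapped with the next matrix-vector product. The serial NumPy sketch below only reproduces the arithmetic (no actual overlap happens), and it shows CG rather than the paper's pipelined BiCGStab; a well-conditioned SPD test matrix is assumed.

```python
import numpy as np


def pipelined_cg(A, b, tol=1e-10, maxiter=1000):
    """Unpreconditioned pipelined CG with auxiliary vectors s = A p, w = A r, z = A s."""
    x = np.zeros_like(b)
    r = b - A @ x
    w = A @ r
    z = np.zeros_like(b)
    s = np.zeros_like(b)
    p = np.zeros_like(b)
    bnorm = np.linalg.norm(b)
    gamma_old = alpha_old = None
    for i in range(maxiter):
        # Both reductions can be started at once (a single non-blocking allreduce) ...
        gamma = r @ r
        delta = w @ r
        # ... and in a parallel code this SpMV would run while they are in flight.
        q = A @ w
        if np.sqrt(gamma) <= tol * bnorm:
            return x, i
        if i == 0:
            beta = 0.0
            alpha = gamma / delta
        else:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha_old)
        z = q + beta * z        # z = A s
        s = w + beta * s        # s = A p
        p = r + beta * p
        x = x + alpha * p
        r = r - alpha * s
        w = w - alpha * z       # w = A r, updated by recurrence
        gamma_old, alpha_old = gamma, alpha
    return x, maxiter


n = 100
A = 3 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # SPD, well conditioned (assumption)
b = np.ones(n)
x, its = pipelined_cg(A, b)
```

Because r and w are updated by recurrences instead of being recomputed, rounding errors can accumulate; this is the loss of attainable accuracy that motivates the residual replacement strategy mentioned above.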

7.
In this paper, we develop, study, and implement iterative linear solvers and preconditioners using multiple graphics processing units (GPUs). Techniques for accelerating sparse matrix–vector (SpMV) multiplication, linear solvers, and preconditioners are presented. Four Krylov subspace solvers, a Neumann polynomial preconditioner, and a domain decomposition preconditioner are implemented. Our numerical tests with NVIDIA C2050 GPUs show that the SpMV kernel can be sped up by more than a factor of 40 using four GPUs. Our linear solvers and preconditioners achieve similar speedups.
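A Neumann polynomial preconditioner suits GPUs because applying it needs only SpMVs and vector updates. One common form, assumed here, is M⁻¹ = Σ_{k=0}^{m} (I − D⁻¹A)^k D⁻¹ with D = diag(A), which approximates A⁻¹ when the spectral radius of I − D⁻¹A is below one. The sketch evaluates it with a Horner-like loop; the function name and the diagonally dominant test matrix are assumptions, and the paper's exact polynomial may differ.

```python
import numpy as np


def neumann_apply(A, r, m):
    """Apply M^{-1} r = sum_{k=0}^{m} (I - D^{-1} A)^k D^{-1} r using only matvecs."""
    dinv = 1.0 / np.diag(A)
    c = dinv * r                          # D^{-1} r
    y = c.copy()
    for _ in range(m):
        y = c + y - dinv * (A @ y)        # y <- D^{-1} r + (I - D^{-1} A) y
    return y


rng = np.random.default_rng(3)
n = 30
off = 0.1 * rng.uniform(-1.0, 1.0, (n, n))
np.fill_diagonal(off, 0.0)
A = 10.0 * np.eye(n) + off                # strongly diagonally dominant (assumption)
r = rng.standard_normal(n)
exact = np.linalg.solve(A, r)
err1 = np.linalg.norm(neumann_apply(A, r, 1) - exact)
err5 = np.linalg.norm(neumann_apply(A, r, 5) - exact)
err60 = np.linalg.norm(neumann_apply(A, r, 60) - exact)
```

The error after m terms is −(I − D⁻¹A)^{m+1} A⁻¹ r, so each extra term costs one SpMV and shrinks the error geometrically; inside a Krylov solver a small fixed m is typically used.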

8.
In this study, we introduce cost-effective strategies and algorithms for parallelizing Krylov subspace based non-stationary iterative solvers, such as Bi-CGM and Bi-CGSTAB, for distributed computing on a cluster of PCs using the ANULIB message-passing libraries. We investigate the effectiveness of the parallel solvers on linear systems resulting from the numerical solution of several 2D and 3D nonlinear partial differential equations governing heat convection, discretized by finite element, finite difference, and wavelet-based numerical schemes. Overall, Bi-CGM is found to give better performance, as measured by speedup factors.
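For readers unfamiliar with Bi-CGSTAB, the serial sketch below follows the standard van der Vorst recurrences for a nonsymmetric system. It is only the sequential kernel that such parallel versions distribute (the SpMVs and dot products are the parts that need communication); Bi-CGM is not shown, ANULIB is not used, and the random nonsymmetric test matrix is an assumption.

```python
import numpy as np


def bicgstab(A, b, tol=1e-10, maxiter=1000):
    """Unpreconditioned Bi-CGSTAB for a nonsymmetric system A x = b."""
    x = np.zeros_like(b)
    r = b - A @ x
    r_hat = r.copy()                      # fixed shadow residual
    rho_old = alpha = omega = 1.0
    v = np.zeros_like(b)
    p = np.zeros_like(b)
    bnorm = np.linalg.norm(b)
    for k in range(1, maxiter + 1):
        rho = r_hat @ r
        beta = (rho / rho_old) * (alpha / omega)
        p = r + beta * (p - omega * v)
        v = A @ p                         # SpMV no. 1
        alpha = rho / (r_hat @ v)
        s = r - alpha * v
        if np.linalg.norm(s) <= tol * bnorm:
            return x + alpha * p, k
        t = A @ s                         # SpMV no. 2
        omega = (t @ s) / (t @ t)         # local residual-minimizing step
        x = x + alpha * p + omega * s
        r = s - omega * t
        if np.linalg.norm(r) <= tol * bnorm:
            return x, k
        rho_old = rho
    return x, maxiter


rng = np.random.default_rng(4)
n = 80
A = 5.0 * np.eye(n) + rng.uniform(-1.0, 1.0, (n, n)) / np.sqrt(n)   # nonsymmetric (assumption)
b = rng.standard_normal(n)
x, k = bicgstab(A, b)
```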

9.
The parallelization of sophisticated applications has increased dramatically in recent years. As machine capabilities rise, greater emphasis on modeling complex phenomena can be expected. Many of these applications require the solution of large sparse matrix equations that approximate systems of partial differential equations (PDEs). We therefore consider parallel iterative solvers for large sparse non-symmetric systems, together with issues related to parallel sparse matrix software. We describe a collection of parallel iterative solvers that use a distributed sparse matrix format, which facilitates the interface between specific applications and a variety of Krylov subspace techniques and multigrid methods. These methods have been used to solve a number of linear and nonlinear PDE problems on a 1024-processor NCUBE 2 hypercube. Sustained computation rates of over 1 Gflops are achieved with many of these solvers, demonstrating that high performance can be attained even when using sparse matrix data structures.
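The abstract does not describe its distributed format, so the sketch below illustrates one common layout as an assumption: each (simulated) process owns a contiguous block of rows stored in compressed sparse row (CSR) form and computes its slice of y = A x. All function names are hypothetical, and everything runs in a single address space.

```python
import numpy as np


def dense_to_csr(A):
    """Build CSR arrays (indptr, indices, data) from a dense matrix."""
    indptr, indices, data = [0], [], []
    for row in A:
        nz = np.nonzero(row)[0]
        indices.extend(nz.tolist())
        data.extend(row[nz].tolist())
        indptr.append(len(indices))
    return np.array(indptr), np.array(indices, dtype=int), np.array(data, dtype=float)


def local_spmv(indptr, indices, data, x, row_start, row_end):
    """Rows row_start..row_end-1 of A @ x, as computed by the process owning them."""
    y = np.zeros(row_end - row_start)
    for i in range(row_start, row_end):
        lo, hi = indptr[i], indptr[i + 1]
        y[i - row_start] = data[lo:hi] @ x[indices[lo:hi]]
    return y


def partitioned_spmv(A, x, nprocs):
    """Simulate a row-block distributed SpMV in a single address space."""
    indptr, indices, data = dense_to_csr(A)
    bounds = np.linspace(0, A.shape[0], nprocs + 1).astype(int)
    # In a real distributed code, each process would first receive the
    # off-process entries of x that its rows reference ("ghost" values).
    parts = [local_spmv(indptr, indices, data, x, bounds[p], bounds[p + 1])
             for p in range(nprocs)]
    return np.concatenate(parts)


n = 10
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
x = np.arange(n, dtype=float)
y = partitioned_spmv(A, x, nprocs=3)
```

Keeping the Krylov solver written against such an SpMV interface is what lets one matrix format serve many different iterative methods.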

10.
Multidisciplinary engineering systems are usually modeled by coupling software components that were developed independently for each discipline. The use of disparate solvers complicates the optimization of multidisciplinary systems and has been a long-standing motivation for optimization architectures that support modularity. The individual discipline feasible (IDF) formulation is particularly attractive in this respect. IDF achieves modularity by introducing optimization variables and constraints that effectively decouple the disciplinary solvers during each optimization iteration. Unfortunately, the number of variables and constraints can be significant, and the IDF constraint Jacobian required by most conventional optimization algorithms is prohibitively expensive to compute. Furthermore, limited-memory quasi-Newton approximations, commonly used for large-scale problems, converge only linearly and can struggle with the large number of design variables introduced by the IDF formulation. In this work, we show that these challenges can be overcome using a reduced-space inexact-Newton-Krylov algorithm. The proposed algorithm avoids the need for the explicit constraint Jacobian and Hessian by using a Krylov iterative method to solve for the Newton steps. The Krylov method requires only matrix-vector products, which can be evaluated in a matrix-free manner using second-order adjoints. The Krylov method also needs to be preconditioned, and a key contribution of this work is a novel and effective preconditioner based on approximating a monolithic solution of the (linearized) multidisciplinary system. We demonstrate the efficacy of the algorithm by comparing it with the popular multidisciplinary feasible formulation on two test problems.
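The core mechanism, a Krylov method that solves each Newton step using only Hessian-vector products, can be shown on a much simpler problem. The sketch below is an unconstrained, unpreconditioned inexact Newton-CG iteration in which finite differences of the gradient stand in for the second-order adjoints of the paper. The constraints, the reduced-space formulation, and the paper's preconditioner are all omitted; the convex test function and every name are assumptions.

```python
import numpy as np


def newton_cg(grad, x0, tol=1e-8, max_newton=50, fd_eps=1e-7):
    """Matrix-free inexact Newton-CG for a smooth, strictly convex function."""
    x = x0.astype(float).copy()
    for k in range(max_newton):
        g = grad(x)
        gnorm = np.linalg.norm(g)
        if gnorm <= tol:
            return x, k

        def hessvec(v, x=x, g=g):
            # Forward-difference Hessian-vector product: a stand-in for the
            # second-order adjoint products described in the abstract.
            h = fd_eps / np.linalg.norm(v)
            return (grad(x + h * v) - g) / h

        eta = min(0.5, np.sqrt(gnorm))    # forcing term: loose solves far from the optimum
        p = np.zeros_like(x)
        r = -g
        d = r.copy()
        rr = r @ r
        for _ in range(x.size):           # truncated CG on H p = -g
            if np.sqrt(rr) <= eta * gnorm:
                break
            Hd = hessvec(d)
            a = rr / (d @ Hd)
            p += a * d
            r = r - a * Hd
            rr_new = r @ r
            d = r + (rr_new / rr) * d
            rr = rr_new
        x = x + p
    return x, max_newton


n = 20
Q = 3 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
c = 2 * np.ones(n)


def grad(x):
    # gradient of f(x) = 0.5 x^T Q x - c^T x + sum(exp(x)), a convex test function
    return Q @ x - c + np.exp(x)


x_opt, steps = newton_cg(grad, np.zeros(n))
```

The Hessian is never formed; only its action on CG search directions is needed, which is the property that makes the approach attractive when the full Hessian or constraint Jacobian is too expensive.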

11.
The Laplace–Beltrami system of nonlinear, elliptic partial differential equations is useful for generating computational grids on complex and highly curved geometry. Discretization of this system using the finite-element method accommodates unstructured grids but generates a large, sparse, ill-conditioned system of nonlinear discrete equations. The use of the Laplace–Beltrami approach, particularly in large-scale applications, has been limited by the scalability and efficiency of solvers. This paper addresses this limitation by developing two nonlinear solvers based on the Jacobian-Free Newton–Krylov (JFNK) methodology. A key feature of these methods is that the Jacobian is not formed explicitly for use by the underlying linear solver. Iterative linear solvers such as the Generalized Minimal RESidual (GMRES) method do not strictly require the stand-alone Jacobian; instead, its action on a vector is approximated through two nonlinear function evaluations. The preconditioning required by GMRES is also discussed. Two different preconditioners are developed, both of which employ existing Algebraic Multigrid (AMG) methods. Overall, the most efficient preconditioner for the problems considered is based on a Picard linearization. Numerical examples demonstrate that these solvers are significantly faster than a standard Newton–Krylov approach; a speedup factor of approximately 26 was obtained with the Picard preconditioner on the largest grids studied here. In addition, these JFNK solvers exhibit good algorithmic scaling with increasing grid size.
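The "two nonlinear function evaluations" can be made concrete with a small JFNK sketch: GMRES only ever asks for J(u) v, which is approximated by (F(u + εv) − F(u))/ε, and F(u) is already available from the Newton residual. The toy residual F(u) = A u + u³ − f, the perturbation rule for ε, and the omission of any AMG or Picard preconditioner are assumptions of this sketch, not the paper's setup.

```python
import numpy as np


def gmres(matvec, b, tol=1e-4, maxiter=50):
    """Unrestarted GMRES with x0 = 0 (Arnoldi + small least-squares solve)."""
    n = b.size
    beta = np.linalg.norm(b)
    if beta == 0.0:
        return np.zeros(n)
    m = min(maxiter, n)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = b / beta
    for j in range(m):
        w = matvec(V[:, j])
        for i in range(j + 1):                    # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        e1 = np.zeros(j + 2)
        e1[0] = beta
        y = np.linalg.lstsq(H[:j + 2, :j + 1], e1, rcond=None)[0]
        res = np.linalg.norm(H[:j + 2, :j + 1] @ y - e1)
        if res <= tol * beta or H[j + 1, j] < 1e-14:
            return V[:, :j + 1] @ y
        V[:, j + 1] = w / H[j + 1, j]
    return V[:, :m] @ y


def jfnk(F, u0, tol=1e-10, max_newton=30):
    """Jacobian-free Newton-Krylov: the Jacobian is never formed."""
    u = u0.astype(float).copy()
    for k in range(max_newton):
        Fu = F(u)
        if np.linalg.norm(Fu) <= tol:
            return u, k

        def jv(v, u=u, Fu=Fu):
            # J(u) v ~ (F(u + eps v) - F(u)) / eps; F(u) is already known,
            # so each product costs one extra residual evaluation.
            eps = 1e-7 * (1.0 + np.linalg.norm(u)) / np.linalg.norm(v)
            return (F(u + eps * v) - Fu) / eps

        u = u + gmres(jv, -Fu)
    return u, max_newton


n = 30
A = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
f = np.ones(n)


def F(u):
    return A @ u + u**3 - f                       # toy nonlinear residual (assumption)


u, newton_steps = jfnk(F, np.zeros(n))
```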

12.
In this paper we aim to establish fast numerical approaches for solving a class of initial-boundary value problems for time-space fractional convection–diffusion equations. We present a new unconditionally stable implicit difference method, which is derived from the weighted and shifted Grünwald formula and converges with second-order accuracy in both the time and space variables. We then show that the discretizations lead to Toeplitz-like systems of linear equations that can be efficiently solved by Krylov subspace solvers with suitable circulant preconditioners. At each time level, these methods reduce the memory requirement of the proposed implicit difference scheme from \(\mathcal{O}(N^2)\) to \(\mathcal{O}(N)\) and the computational complexity from \(\mathcal{O}(N^3)\) to \(\mathcal{O}(N\log N)\) per iterative step, where N is the number of grid nodes. Extensive numerical examples are reported to support our theoretical findings and to show the advantages of these methods over traditional direct solvers of the implicit difference method in terms of computational cost and memory requirements.
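The O(N) memory and O(N log N) per-iteration figures come from two FFT tricks, which the sketch below illustrates for a symmetric positive definite Toeplitz matrix: T v is computed by embedding T in a 2N x 2N circulant, and the circulant preconditioner is inverted in Fourier space. The paper's systems come from a shifted Grünwald discretization and need not be symmetric; the SPD test matrix, the use of PCG, and the Strang circulant are assumptions made for this illustration.

```python
import numpy as np


def pcg_toeplitz(t, b, tol=1e-10, maxiter=200):
    """PCG for a symmetric Toeplitz system T x = b with a Strang circulant preconditioner.

    t is the first column of T. Storage is O(n) and every product or
    preconditioner solve costs O(n log n) through the FFT.
    """
    n = t.size
    # T v via embedding T in a 2n x 2n circulant matrix.
    emb = np.fft.fft(np.concatenate([t, [0.0], t[:0:-1]]))

    def T_mv(v):
        return np.fft.ifft(emb * np.fft.fft(v, 2 * n))[:n].real

    # Strang preconditioner: copy the central diagonals of T into a circulant.
    k = np.arange(n)
    c = np.where(k <= n // 2, t[k], t[(n - k) % n])
    lam = np.fft.fft(c).real                  # circulant eigenvalues

    def C_solve(r):
        return np.fft.ifft(np.fft.fft(r) / lam).real

    x = np.zeros(n)
    r = b.copy()
    z = C_solve(r)
    p = z.copy()
    rz = r @ z
    bnorm = np.linalg.norm(b)
    for it in range(maxiter):
        if np.linalg.norm(r) <= tol * bnorm:
            return x, it
        Ap = T_mv(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        z = C_solve(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter


n = 128
t = 1.0 / (1.0 + np.arange(n)) ** 2
t[0] = 2.0                                    # diagonally dominant, hence SPD (assumption)
T = t[np.abs(np.subtract.outer(np.arange(n), np.arange(n)))]   # dense copy, only for checking
b = np.ones(n)
x, its = pcg_toeplitz(t, b)
```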

13.
In this work, we analyze the scalability of inexact two-level balancing domain decomposition by constraints (BDDC) preconditioners for Krylov subspace iterative solvers, using a highly scalable asynchronous parallel implementation in which fine and coarse correction computations are overlapped in time. In this way, the coarse-grid problem can be fully overlapped by fine-grid computations (which are embarrassingly parallel) in a wide range of cases. Further, we consider inexact solvers to reduce the computational cost/complexity and memory consumption of the coarse and local problems and to boost the scalability of the solver. From our numerical experiments, we conclude that the BDDC preconditioner is quite insensitive to inexact solvers. In particular, one cycle of algebraic multigrid (AMG) is enough to attain algorithmic scalability. Moreover, the clear reduction in computing time and memory requirements of inexact solvers compared with sparse direct ones makes it possible to scale far beyond state-of-the-art BDDC implementations. Excellent weak scalability results have been obtained with the proposed inexact/overlapped implementation of the two-level BDDC preconditioner, up to 93,312 cores and 20 billion unknowns on JUQUEEN. Finally, we have also applied the proposed setting to unstructured meshes and partitions for the pressure Poisson solver in the backward-facing step benchmark domain.

14.
Efficient algorithms for the solution of partial differential equations on parallel computers are often based on domain decomposition methods. Schwarz preconditioners combined with standard Krylov space solvers are widely used in this context, and such a combination is shown here to perform very well for the Wilson-Dirac equation in lattice QCD. In particular, compared with even-odd preconditioned solvers, the communication overhead is significantly reduced, which allows the computational work to be distributed over a large number of processors with only small parallelization losses.
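To show how a Schwarz preconditioner plugs into a Krylov solver, the sketch below builds a one-level additive Schwarz preconditioner with overlapping index blocks, M⁻¹ = Σᵢ Rᵢᵀ Aᵢ⁻¹ Rᵢ, and uses it inside preconditioned CG. The abstract does not specify the Schwarz variant; the additive form, the 1D Laplacian test problem, the dense local inverses, and the names are all assumptions. The subdomain solves are independent, so in parallel they need no global communication.

```python
import numpy as np


def additive_schwarz(A, nsub, overlap):
    """One-level additive Schwarz: M^{-1} r = sum_i R_i^T A_i^{-1} R_i r."""
    n = A.shape[0]
    bounds = np.linspace(0, n, nsub + 1).astype(int)
    blocks = []
    for i in range(nsub):
        idx = np.arange(max(0, bounds[i] - overlap), min(n, bounds[i + 1] + overlap))
        # Dense inverse of the subdomain matrix stands in for a local solver.
        blocks.append((idx, np.linalg.inv(A[np.ix_(idx, idx)])))

    def apply(r):
        z = np.zeros_like(r)
        for idx, Ainv in blocks:          # independent subdomain solves
            z[idx] += Ainv @ r[idx]       # restrict, solve, extend by zero, add
        return z

    return apply


def pcg(A, b, apply_Minv, tol=1e-10, maxiter=500):
    """Preconditioned CG; only the dot products need global reductions."""
    x = np.zeros_like(b)
    r = b.copy()
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    bnorm = np.linalg.norm(b)
    for it in range(maxiter):
        if np.linalg.norm(r) <= tol * bnorm:
            return x, it
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter


n = 64
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Laplacian (assumption)
b = np.ones(n)
M = additive_schwarz(A, nsub=4, overlap=2)
x, its = pcg(A, b, M)
```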

15.
In this paper we survey the development of fast iterative solvers for 2D/3D Helmholtz problems. The first half of the paper surveys some recently developed methods. The second half focuses on the development of the shifted Laplacian preconditioner, which is used to accelerate the convergence of Krylov subspace methods applied to the Helmholtz equation. Numerical examples are given for some difficult problems that had not been solved iteratively before.
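The shifted Laplacian idea can be sketched in 1D: the Helmholtz matrix A = L − k²I is preconditioned by M = L − (β₁ + iβ₂)k²I, whose complex shift makes M easier to invert, and GMRES is applied to M⁻¹A. The shift (1, −0.5), the Dirichlet boundary conditions, and the exact dense inverse of M (which in practice would be approximated, for example by a multigrid cycle) are assumptions of this sketch.

```python
import numpy as np


def gmres(matvec, b, tol=1e-8, maxiter=200):
    """Unrestarted complex GMRES with x0 = 0; returns (x, iterations)."""
    n = b.size
    beta = np.linalg.norm(b)
    m = min(maxiter, n)
    V = np.zeros((n, m + 1), dtype=complex)
    H = np.zeros((m + 1, m), dtype=complex)
    V[:, 0] = b / beta
    for j in range(m):
        w = matvec(V[:, j])
        for i in range(j + 1):                        # modified Gram-Schmidt
            H[i, j] = np.vdot(V[:, i], w)
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        e1 = np.zeros(j + 2, dtype=complex)
        e1[0] = beta
        y = np.linalg.lstsq(H[:j + 2, :j + 1], e1, rcond=None)[0]
        if np.linalg.norm(H[:j + 2, :j + 1] @ y - e1) <= tol * beta or abs(H[j + 1, j]) < 1e-14:
            return V[:, :j + 1] @ y, j + 1
        V[:, j + 1] = w / H[j + 1, j]
    return V[:, :m] @ y, m


def helmholtz_1d(n, k):
    """Return (L, A): L = -d2/dx2 (Dirichlet, FD on (0,1)) and A = L - k^2 I."""
    h = 1.0 / (n + 1)
    L = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    return L, L - k**2 * np.eye(n)


n, k = 200, 30.0
L, A = helmholtz_1d(n, k)
M = L - (1.0 - 0.5j) * k**2 * np.eye(n)             # shifted Laplacian preconditioner
Minv = np.linalg.inv(M)                              # exact inverse, for illustration only
b = np.ones(n)
x, its = gmres(lambda v: Minv @ (A @ v), Minv @ b)   # left-preconditioned system
```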

16.
Noisy optimization is the optimization of objective functions corrupted by noise. A portfolio of solvers is a set of solvers equipped with an algorithm selection tool for distributing the computational power among them. Portfolios are widely and successfully used in combinatorial optimization. In this work, we study portfolios of noisy optimization solvers. We obtain mathematically proven performance guarantees (in the sense that the portfolio performs nearly as well as the best of its solvers) with an ad hoc portfolio algorithm dedicated to noisy optimization. A somewhat surprising result is that it is better to compare solvers with some lag, i.e., to base the current recommendation of the best solver on the solvers' performance earlier in the run. An additional finding is a principled method for distributing the computational power among the solvers in the portfolio.

17.
The overlap operator in lattice QCD requires the computation of the sign function of a matrix, which is non-Hermitian in the presence of a quark chemical potential. In previous work we introduced an Arnoldi-based Krylov subspace approximation, which uses long recurrences. Even after the deflation of critical eigenvalues, the low efficiency of the method restricts its application to small lattices. Here we propose new short-recurrence methods that strongly enhance the efficiency of the computation. Using rational approximations to the sign function, we introduce two variants, based on the restarted Arnoldi process and on the two-sided Lanczos method, respectively, which become very efficient when combined with multishift solvers. Alternatively, in the variant based on the two-sided Lanczos method, the sign function can be evaluated directly. We present numerical results that compare the efficiency of a restarted Arnoldi-based method and of the direct two-sided Lanczos approximation for various lattice sizes. We also show that our new methods gain substantially when combined with deflation.
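The basic Arnoldi approximation that the abstract builds on can be written as sign(A) b ≈ ‖b‖ V_m sign(H_m) e₁, where V_m and H_m come from m Arnoldi steps started with b. The sketch below evaluates sign(H_m) by eigendecomposition, using sign(z) = sign(Re z) for eigenvalues off the imaginary axis. It assumes a diagonalizable test matrix and uses neither restarting, deflation, rational approximation, nor two-sided Lanczos; all names are hypothetical. With m = n it reproduces sign(A) b up to rounding, which is what the usage check exploits.

```python
import numpy as np


def arnoldi(A, b, m):
    """m steps of Arnoldi with full reorthogonalization (two Gram-Schmidt passes)."""
    n = b.size
    V = np.zeros((n, m + 1), dtype=complex)
    H = np.zeros((m + 1, m), dtype=complex)
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(m):
        w = A @ V[:, j]
        for _ in range(2):
            for i in range(j + 1):
                c = np.vdot(V[:, i], w)
                H[i, j] += c
                w = w - c * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if abs(H[j + 1, j]) < 1e-12:          # invariant subspace reached
            return V[:, :j + 1], H[:j + 1, :j + 1]
        V[:, j + 1] = w / H[j + 1, j]
    return V[:, :m], H[:m, :m]


def sign_of_matrix(M):
    """sign(M) via eigendecomposition; assumes M is diagonalizable with no eigenvalue on the imaginary axis."""
    lam, W = np.linalg.eig(M)
    return W @ np.diag(np.sign(lam.real)) @ np.linalg.inv(W)


def sign_times_vector(A, b, m):
    """Arnoldi approximation sign(A) b ~ ||b|| V_m sign(H_m) e_1."""
    V, Hm = arnoldi(A, b.astype(complex), m)
    e1 = np.zeros(Hm.shape[0], dtype=complex)
    e1[0] = 1.0
    return np.linalg.norm(b) * (V @ (sign_of_matrix(Hm) @ e1))


rng = np.random.default_rng(5)
n = 40
lam = np.concatenate([rng.uniform(1, 2, n // 2), -rng.uniform(1, 2, n // 2)])
lam = lam + 0.3j * rng.standard_normal(n)
S = np.eye(n) + 0.05 * rng.standard_normal((n, n))
A = S @ np.diag(lam) @ np.linalg.inv(S)       # non-Hermitian test matrix (assumption)
b = rng.standard_normal(n)
exact = S @ (np.sign(lam.real) * np.linalg.solve(S, b))
approx = sign_times_vector(A, b, n)
```

The storage of all m basis vectors and the growing orthogonalization cost are the "long recurrences" that limit this approach to small lattices, which is what the short-recurrence variants are designed to avoid.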

18.
19.
There is a variety of useful Krylov subspace methods for solving nonsymmetric linear systems of equations. GMRES is one of the best Krylov solvers, with several variants for solving large sparse linear systems, and each GMRES implementation has its own advantages. Since the solution of ill-posed problems is important, in this paper some GMRES variants are discussed and applied to such problems. Residual smoothing techniques are efficient ways to accelerate the convergence of some iterative methods, such as CG variants. At the end of this paper, several residual smoothing techniques are applied to different GMRES methods to test their influence on GMRES implementations.
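As an example of a residual smoothing technique, the sketch below implements minimal residual smoothing: given iterates xₖ with residuals rₖ = b − A xₖ, it forms smoothed iterates yₖ and residuals sₖ by the affine update that minimizes ‖sₖ₋₁ + η(rₖ − sₖ₋₁)‖, so the smoothed residual norms never increase. It is not necessarily one of the specific techniques studied in the paper, and the input iterates come from a plain Richardson iteration used only as sample data, where the abstract would feed in GMRES iterates.

```python
import numpy as np


def mr_smoothing(xs, rs):
    """Minimal residual smoothing of iterates xs with residuals rs = b - A xs."""
    y, s = xs[0].copy(), rs[0].copy()
    ys, ss = [y.copy()], [s.copy()]
    for x, r in zip(xs[1:], rs[1:]):
        d = r - s
        dd = d @ d
        # eta minimizes ||s + eta * d||, so ||s_new|| <= min(||s||, ||r||)
        eta = 0.0 if dd == 0.0 else -(s @ d) / dd
        y = y + eta * (x - y)
        s = s + eta * d               # stays equal to b - A y (affine combination)
        ys.append(y.copy())
        ss.append(s.copy())
    return ys, ss


# Sample iterates from a plain Richardson iteration (illustration only).
n = 20
A = np.diag(np.linspace(0.1, 3.5, n))
b = np.ones(n)
omega = 0.55
x = np.zeros(n)
xs, rs = [x.copy()], [b - A @ x]
for _ in range(30):
    x = x + omega * rs[-1]
    xs.append(x.copy())
    rs.append(b - A @ x)
ys, ss = mr_smoothing(xs, rs)
```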

20.
Krylov subspace spectral methods have been shown to be high-order accurate in time and more stable than explicit time-stepping methods, but they are also more difficult to implement efficiently. This paper describes how these methods can be fashioned into practical solvers by exploiting the simple structure of differential operators. Numerical results concerning accuracy and efficiency are presented for parabolic problems in one and two space dimensions.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号