首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Scalability of Krylov subspace methods suffers from costly global synchronization steps that arise in dot-products and norm calculations on parallel machines. In this work, a modified preconditioned Conjugate Gradient (CG) method is presented that removes the costly global synchronization steps from the standard CG algorithm by only performing a single non-blocking reduction per iteration. This global communication phase can be overlapped by the matrix–vector product, which typically only requires local communication. The resulting algorithm will be referred to as pipelined CG. An alternative pipelined method, mathematically equivalent to the Conjugate Residual (CR) method that makes different trade-offs with regard to scalability and serial runtime is also considered. These methods are compared to a recently proposed asynchronous CG algorithm by Gropp. Extensive numerical experiments demonstrate the numerical stability of the methods. Moreover, it is shown that hiding the global synchronization step improves scalability on distributed memory machines using the message passing paradigm and leads to significant speedups compared to standard preconditioned CG.  相似文献   

2.
The solution of large, sparse linear systems is often a dominant phase of computation for simulations based on partial differential equations, which are ubiquitous in scientific and engineering applications. While preconditioned Krylov methods are widely used and offer many advantages for solving sparse linear systems that do not have highly convergent, geometric multigrid solvers or specialized fast solvers, Krylov methods encounter well-known scaling difficulties for over 10,000 processor cores because each iteration requires at least one vector inner product, which in turn requires a global synchronization that scales poorly because of internode latency. To help overcome these difficulties, we have developed hierarchical Krylov methods and nested Krylov methods in the PETSc library that reduce the number of global inner products required across the entire system (where they are expensive), though freely allow vector inner products across smaller subsets of the entire system (where they are inexpensive) or use inner iterations that do not invoke vector inner products at all.  相似文献   

3.
Exascale computers are expected to have highly hierarchical architectures with nodes composed by multiple core processors (CPU; central processing unit) and accelerators (GPU; graphics processing unit). The different programming levels generate new difficult algorithm issues. In particular when solving extremely large linear systems, new programming paradigms of Krylov methods should be defined and evaluated with respect to modern state of the art of scientific methods. Iterative Krylov methods involve linear algebra operations such as dot product, norm, addition of vectors and sparse matrix–vector multiplication. These operations are computationally expensive for large size matrices. In this paper, we aim to focus on the best way to perform effectively these operations, in double precision, on GPU in order to make iterative Krylov methods more robust and therefore reduce the computing time. The performance of our algorithms is evaluated on several matrices arising from engineering problems. Numerical experiments illustrate the robustness and accuracy of our implementation compared to the existing libraries. We deal with different preconditioned Krylov methods: Conjugate Gradient for symmetric positive-definite matrices, and Generalized Conjugate Residual, Bi-Conjugate Gradient Conjugate Residual, transpose-free Quasi Minimal Residual, Stabilized BiConjugate Gradient and Stabilized BiConjugate Gradient (L) for the solution of sparse linear systems with non symmetric matrices. We consider and compare several sparse compressed formats, and propose a way to implement effectively Krylov methods on GPU and on multicore CPU. Finally, we give strategies to faster algorithms by auto-tuning the threading design, upon the problem characteristics and the hardware changes. As a conclusion, we propose and analyse hybrid sub-structuring methods that should pave the way to exascale hybrid methods.  相似文献   

4.
This paper presents theoretical foundations of global Krylov subspace methods for model order reductions. This method is an extension of the standard Krylov subspace method for multiple-inputs multiple-outputs (MIMO) systems. By employing the congruence transformation with global Krylov subspaces, both one-sided Arnoldi and two-sided Lanczos oblique projection methods are explored for both single expansion point and multiple expansion points. In order to further reduce the computational complexity for multiple expansion points, adaptive-order multiple points moment matching algorithms, or the so-called rational Krylov space method, are also studied. Two algorithms, including the adaptive-order rational global Arnoldi (AORGA) algorithm and the adaptive-order global Lanczos (AOGL) algorithm, are developed in detail. Simulations of practical dynamical systems will be conducted to illustrate the feasibility and the efficiency of proposed methods.  相似文献   

5.
通过改变CR算法的计算次序。提出了一种改进的共轭剩余(ICR)算法.对比CR算法。ICR算法的数值稳定性和CR算法相同,几乎没有增加计算量。但考虑了在MIMD并行机上实现时并行算法的性能,其同步开销减少为CR算法的一半,并且所有内积计算以及矩阵向量乘是独立的,没有数据相关性。可以进行计算与通信的重叠.从理论和实验两个角度来讨论ICR算法的性能,当处理机台数较多时ICR算法的计算速度快于CR算法.在64台处理机机群上进行的数值实验表明,并行ICR算法的计算速度大约比CR算法快30%.  相似文献   

6.
The main purpose of this paper is to develop stable versions of some Krylov subspace methods for solving the linear systems of equations Ax = b which arise in the difference solution of 2-D nonstationary Navier-Stokes equations using implicit scheme and to determine a good value of the time step. Our algorithms are based on the conjugate-gradient method with a suitable preconditioner for solving the symmetric positive definite system and preconditioned GMRES, Orthomin(K), QMR methods for solving the nonsymmetric and (in)definite system. The performance of these methods is compared. In addition, we show that by using the condition number of the first nonsymmetric coefficient matrix, it is possible to determine a good value of the time step.  相似文献   

7.
We experimentally study how reordering techniques affect the rate of convergence of preconditioned Krylov subspace methods for non-symmetric sparse linear systems, where the preconditioner is a sparse approximate inverse. In addition, we show how the reordering reduces the number of entries in the approximate inverse and thus, the amount of storage and computation required for a given accuracy. These properties are illustrated with several numerical experiments taken from the discretization of PDEs by a finite element method and from a standard matrix collection.  相似文献   

8.
For the structured systems of linear equations arising from the Galerkin finite element discretizations of elliptic PDE-constrained optimization problems, some preconditioners are proposed to accelerate the convergence rate of Krylov subspace methods such as GMRES for both cases of the Tikhonov parameter β not very small (equal or greater than 1e?6) and sufficiently small (less than 1e?6), respectively. We derive the explicit expressions for the eigenvalues and eigenvectors of the corresponding preconditioned matrices. Numerical results show that the corresponding preconditioned GMRES methods perform and match well with the theoretical results.  相似文献   

9.
A fast preconditioned penalty method is developed for a system of parabolic linear complementarity problems (LCPs) involving tempered fractional order partial derivatives governing the price of American options whose underlying asset follows a geometry Lévy process with multi-state regime switching. By means of the penalty method, the system of LCPs is approximated with a penalty term by a system of nonlinear tempered fractional partial differential equations (TFPDEs) coupled by a finite-state Markov chain. The system of nonlinear TFPDEs is discretized with the shifted Grünwald approximation by an upwind finite difference scheme which is shown to be unconditionally stable. Semi-smooth Newton’s method is utilized to solve the finite difference scheme as an outer iterative method in which the Jacobi matrix is found to possess Toeplitz-plus-diagonal structure. Consequently, the resulting linear system can be fast solved by the Krylov subspace method as an inner iterative method via fast Fourier transform (FFT). Furthermore, a novel preconditioner is proposed to speed up the convergence rate of the inner Krylov subspace iteration with theoretical analysis. With the above-mentioned preconditioning technique via FFT, under some mild conditions, the operation cost in each Newton’s step can be expected to be \(\mathcal{O}(N\mathrm{log}N)\), where N is the size of the coefficient matrix. Numerical examples are given to demonstrate the accuracy and efficiency of our proposed fast preconditioned penalty method.  相似文献   

10.
Two Krylov subspace methods, the GMRES and the BiCGSTAB, are analyzed for solving the linear systems arising from the mixed finite element discretization of the discrete ordinates radiative transfer equation. To increase their convergence rate and stability, the Jacobi and block Jacobi methods are used as preconditioners for both Krylov subspace methods. Numerical experiments, designed to test the effectiveness of the (preconditioned) GMRES and the BiCGSTAB, are performed on various radiative transfer problems: (i) transparent, (ii) absorption dominant, (iii) scattering dominant, and (iv) with specular reflection. It is observed that the BiCGSTAB is superior to the GMRES, with lower iteration counts, solving times, and memory consumption. In particular, the BiCGSTAB preconditioned by the block Jacobi method performed best amongst the set of other solvers. To better understand the discrete systems for radiative problems (i) to (iv), an eigenvalue spectrum analysis has also been performed. It revealed that the linear system conditioning deteriorates for scattering media problems in comparison to absorbing or transparent media problems. This conditioning further deteriorates when reflection is involved.  相似文献   

11.
Zhong-Zhi Bai 《Computing》2011,91(4):379-395
For the structured systems of linear equations arising from the Galerkin finite-element discretizations of the distributed control problems, we construct block-counter-diagonal and block-counter-tridiagonal preconditioning matrices to precondition the Krylov subspace methods such as GMRES. We derive explicit expressions for the eigenvalues and eigenvectors of the corresponding preconditioned matrices. Numerical implementations show that these structured preconditioners may lead to satisfactory experimental results of the preconditioned GMRES methods when the regularization parameter is suitably small.  相似文献   

12.
In this paper, we introduce and analyze a modification of the Hermitian and skew-Hermitian splitting iteration method for solving a broad class of complex symmetric linear systems. We show that the modified Hermitian and skew-Hermitian splitting (MHSS) iteration method is unconditionally convergent. Each iteration of this method requires the solution of two linear systems with real symmetric positive definite coefficient matrices. These two systems can be solved inexactly. We consider acceleration of the MHSS iteration by Krylov subspace methods. Numerical experiments on a few model problems are used to illustrate the performance of the new method.  相似文献   

13.
In this paper, a fast preconditioned Krylov subspace iterative algorithm is proposed for the electromagnetic scattering from a rectangular large open cavity embedded in an infinite ground plane. The scattering problem is described by the Helmholtz equation with a nonlocal artificial boundary condition on the aperture of the cavity and Dirichlet boundary conditions on the walls of the cavity. Compact fourth order finite difference schemes are employed to discretize the bounded domain problem. A much smaller interface discrete system is reduced by introducing the discrete Fourier transformation in the horizontal and a Gaussian elimination in the vertical direction, presented in Bao and Sun (SIAM J. Sci. Comput. 27:553, 2005). An effective preconditioner is developed for the Krylov subspace iterative solver to solve this interface system. Numerical results demonstrate the remarkable efficiency and accuracy of the proposed method.  相似文献   

14.
It is well known that the block Krylov subspace solvers work efficiently for some cases of the solution of differential equations with multiple right-hand sides. In lattice QCD calculation of physical quantities on a given configuration demands us to solve the Dirac equation with multiple sources. We show that a new block Krylov subspace algorithm recently proposed by the authors reduces the computational cost significantly without losing numerical accuracy for the solution of the O(a)-improved Wilson-Dirac equation.  相似文献   

15.
提出了一种预条件的平方Smith算法求解大型连续Sylvester矩阵方程,该算法利用交替方向隐式迭代(ADI)来构造预条件算子,将原方程转换为非对称Stein方程,并在Krylov子空间中应用平方Smith法迭代产生低秩逼近解。数值实验表明,与已知的Jacobi迭代法等算法相比,该算法有更好的迭代效率和收敛精度。  相似文献   

16.
针对孔隙介质中地下水流动问题提出了一种并行数值计算方法,并基于此设计了一套专用于求解大规模三维地下水流动方程的并行计算模块。计算模块基于区域分解的方法实现对模型区域的并行求解,采用了分布式内存和压缩矩阵技术解决大规模稀疏矩阵的存储及其计算,整合多种并行Krylov子空间方法和预条件子技术迭代求解大规模线性方程组。在Linux集群系统上进行了数值模拟实验,性能测试结果表明,程序具有良好的加速比和可扩展性。  相似文献   

17.
The parallelizable block ILU (incomplete LU) factorization preconditioners for a block-tridiagonal matrix have been recently proposed by the author. In this paper, we describe a parallelization of Krylov subspace methods with the block ILU factorization preconditioners on distributed-memory computers such as the Cray T3E, and then parallel performance results of a preconditioned Krylov subspace method are provided to evaluate the effectiveness and efficiency of the block ILU preconditioners on the Cray T3E.  相似文献   

18.
The main idea of this paper is in determination of the pattern of nonzero elements of the LU factors of a given matrix A. The idea is based on taking the powers of the Boolean matrix derived from A. This powers of a Boolean matrix strategy (PBS) is an efficient, effective, and inexpensive approach. Construction of an ILU preconditioner using PBS is described and used in solving large nonsymmetric sparse linear systems. Effectiveness of the proposed ILU preconditioner in solving large nonsymmetric sparse linear systems by the GMRES method is also shown. Numerical experiments are performed which show that it is possible to considerably reduce the number of GMRES iterations when the ILU preconditioner constructed here is used. In numerical examples, the influence of k, the dimension of the Krylov subspace, on the performance of the GMRES method using an ILU preconditioner is tested. For all the tests carried out, the best value for k is found to be 10.  相似文献   

19.
J. M. Martínez 《Computing》1987,39(4):307-325
We introduce a new method for solving Nonlinear Least Squares problems when the Jacobian matrix of the system is large and sparse. The main features of the new method are the following:
  1. The Gauss-Newton equation is “partially” solved at each iteration using a preconditioned Conjugate Gradient algorithm.
  2. The new point is obtained using a two-dimensional trust region scheme, similar to the one introduced by Bulteau and Vial.
We prove global and local convergence results and we present some numerical experiments.  相似文献   

20.
In this paper, we consider the solution of the saddle point linear systems arising from the finite element discretization of the time-harmonic Maxwell equations in mixed form. Two types of block triangular Schur complement-free preconditioners used with Krylov subspace methods are proposed, involving the choice of the parameter. Furthermore, we give the optimal parameter in practice. Theoretical analysis shows that all eigenvalues of the preconditioned matrices are strongly clustered. Finally, numerical experiments that validate the analysis are presented.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号