期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm

P. Ghysels W. Vanroose 《Parallel Computing》2014

Scalability of Krylov subspace methods suffers from costly global synchronization steps that arise in dot-products and norm calculations on parallel machines. In this work, a modified preconditioned Conjugate Gradient (CG) method is presented that removes the costly global synchronization steps from the standard CG algorithm by only performing a single non-blocking reduction per iteration. This global communication phase can be overlapped by the matrix–vector product, which typically only requires local communication. The resulting algorithm will be referred to as pipelined CG. An alternative pipelined method, mathematically equivalent to the Conjugate Residual (CR) method that makes different trade-offs with regard to scalability and serial runtime is also considered. These methods are compared to a recently proposed asynchronous CG algorithm by Gropp. Extensive numerical experiments demonstrate the numerical stability of the methods. Moreover, it is shown that hiding the global synchronization step improves scalability on distributed memory machines using the message passing paradigm and leads to significant speedups compared to standard preconditioned CG. 相似文献

2.

Hierarchical Krylov and nested Krylov methods for extreme-scale computing

Lois Curfman McInnes Barry Smith Hong Zhang Richard Tran Mills 《Parallel Computing》2014

The solution of large, sparse linear systems is often a dominant phase of computation for simulations based on partial differential equations, which are ubiquitous in scientific and engineering applications. While preconditioned Krylov methods are widely used and offer many advantages for solving sparse linear systems that do not have highly convergent, geometric multigrid solvers or specialized fast solvers, Krylov methods encounter well-known scaling difficulties for over 10,000 processor cores because each iteration requires at least one vector inner product, which in turn requires a global synchronization that scales poorly because of internode latency. To help overcome these difficulties, we have developed hierarchical Krylov methods and nested Krylov methods in the PETSc library that reduce the number of global inner products required across the entire system (where they are expensive), though freely allow vector inner products across smaller subsets of the entire system (where they are inexpensive) or use inner iterations that do not invoke vector inner products at all. 相似文献

3.

Auto-tuned Krylov methods on cluster of graphics processing unit

Frédéric Magoulès Abal-Kassim Cheik Ahamed Roman Putanowicz 《国际计算机数学杂志》2015,92(6):1222-1250

Exascale computers are expected to have highly hierarchical architectures with nodes composed by multiple core processors (CPU; central processing unit) and accelerators (GPU; graphics processing unit). The different programming levels generate new difficult algorithm issues. In particular when solving extremely large linear systems, new programming paradigms of Krylov methods should be defined and evaluated with respect to modern state of the art of scientific methods. Iterative Krylov methods involve linear algebra operations such as dot product, norm, addition of vectors and sparse matrix–vector multiplication. These operations are computationally expensive for large size matrices. In this paper, we aim to focus on the best way to perform effectively these operations, in double precision, on GPU in order to make iterative Krylov methods more robust and therefore reduce the computing time. The performance of our algorithms is evaluated on several matrices arising from engineering problems. Numerical experiments illustrate the robustness and accuracy of our implementation compared to the existing libraries. We deal with different preconditioned Krylov methods: Conjugate Gradient for symmetric positive-definite matrices, and Generalized Conjugate Residual, Bi-Conjugate Gradient Conjugate Residual, transpose-free Quasi Minimal Residual, Stabilized BiConjugate Gradient and Stabilized BiConjugate Gradient (L) for the solution of sparse linear systems with non symmetric matrices. We consider and compare several sparse compressed formats, and propose a way to implement effectively Krylov methods on GPU and on multicore CPU. Finally, we give strategies to faster algorithms by auto-tuning the threading design, upon the problem characteristics and the hardware changes. As a conclusion, we propose and analyse hybrid sub-structuring methods that should pave the way to exascale hybrid methods. 相似文献

4.

Model-order reductions for MIMO systems using global Krylov subspace methods

Chia-Chi Chu Ming-Hong Lai Wu-Shiung Feng 《Mathematics and computers in simulation》2008

This paper presents theoretical foundations of global Krylov subspace methods for model order reductions. This method is an extension of the standard Krylov subspace method for multiple-inputs multiple-outputs (MIMO) systems. By employing the congruence transformation with global Krylov subspaces, both one-sided Arnoldi and two-sided Lanczos oblique projection methods are explored for both single expansion point and multiple expansion points. In order to further reduce the computational complexity for multiple expansion points, adaptive-order multiple points moment matching algorithms, or the so-called rational Krylov space method, are also studied. Two algorithms, including the adaptive-order rational global Arnoldi (AORGA) algorithm and the adaptive-order global Lanczos (AOGL) algorithm, are developed in detail. Simulations of practical dynamical systems will be conducted to illustrate the feasibility and the efficiency of proposed methods. 相似文献

5.

一种改进的适合并行计算的共轭剩余算法

刘杰刘兴平迟利华胡庆丰《计算机学报》2006,29(3):495-499

通过改变CR算法的计算次序。提出了一种改进的共轭剩余（ICR）算法．对比CR算法。ICR算法的数值稳定性和CR算法相同,几乎没有增加计算量。但考虑了在MIMD并行机上实现时并行算法的性能,其同步开销减少为CR算法的一半,并且所有内积计算以及矩阵向量乘是独立的,没有数据相关性。可以进行计算与通信的重叠．从理论和实验两个角度来讨论ICR算法的性能,当处理机台数较多时ICR算法的计算速度快于CR算法．在64台处理机机群上进行的数值实验表明,并行ICR算法的计算速度大约比CR算法快30％．相似文献

6.

Determination of a good value of the time step and preconditioned Krylov subspace methods for the Navier-Stokes equations

《Computers & Mathematics with Applications》2005,49(7-8):1045-1057

The main purpose of this paper is to develop stable versions of some Krylov subspace methods for solving the linear systems of equations Ax = b which arise in the difference solution of 2-D nonstationary Navier-Stokes equations using implicit scheme and to determine a good value of the time step. Our algorithms are based on the conjugate-gradient method with a suitable preconditioner for solving the symmetric positive definite system and preconditioned GMRES, Orthomin(K), QMR methods for solving the nonsymmetric and (in)definite system. The performance of these methods is compared. In addition, we show that by using the condition number of the first nonsymmetric coefficient matrix, it is possible to determine a good value of the time step. 相似文献

7.

The effect of orderings on sparse approximate inverse preconditioners for non-symmetric problems

《Advances in Engineering Software》2002,33(7-10):611-619

We experimentally study how reordering techniques affect the rate of convergence of preconditioned Krylov subspace methods for non-symmetric sparse linear systems, where the preconditioner is a sparse approximate inverse. In addition, we show how the reordering reduces the number of entries in the approximate inverse and thus, the amount of storage and computation required for a given accuracy. These properties are illustrated with several numerical experiments taken from the discretization of PDEs by a finite element method and from a standard matrix collection. 相似文献

8.

Some preconditioners for elliptic PDE-constrained optimization problems

Yi-Fen Ke Chang-Feng Ma 《Computers & Mathematics with Applications》2018,75(8):2795-2813

For the structured systems of linear equations arising from the Galerkin finite element discretizations of elliptic PDE-constrained optimization problems, some preconditioners are proposed to accelerate the convergence rate of Krylov subspace methods such as GMRES for both cases of the Tikhonov parameter

β

not very small (equal or greater than 1e?6) and sufficiently small (less than 1e?6), respectively. We derive the explicit expressions for the eigenvalues and eigenvectors of the corresponding preconditioned matrices. Numerical results show that the corresponding preconditioned GMRES methods perform and match well with the theoretical results. 相似文献

9.

Supercloseness of Continuous Interior Penalty Methods on Shishkin Triangular Meshes and Hybrid Meshes for Singularly Perturbed Problems with Characteristic Layers

Jin Zhang Xiaowei Liu 《Journal of scientific computing》2018,75(3):1633-1655

A fast preconditioned penalty method is developed for a system of parabolic linear complementarity problems (LCPs) involving tempered fractional order partial derivatives governing the price of American options whose underlying asset follows a geometry Lévy process with multi-state regime switching. By means of the penalty method, the system of LCPs is approximated with a penalty term by a system of nonlinear tempered fractional partial differential equations (TFPDEs) coupled by a finite-state Markov chain. The system of nonlinear TFPDEs is discretized with the shifted Grünwald approximation by an upwind finite difference scheme which is shown to be unconditionally stable. Semi-smooth Newton’s method is utilized to solve the finite difference scheme as an outer iterative method in which the Jacobi matrix is found to possess Toeplitz-plus-diagonal structure. Consequently, the resulting linear system can be fast solved by the Krylov subspace method as an inner iterative method via fast Fourier transform (FFT). Furthermore, a novel preconditioner is proposed to speed up the convergence rate of the inner Krylov subspace iteration with theoretical analysis. With the above-mentioned preconditioning technique via FFT, under some mild conditions, the operation cost in each Newton’s step can be expected to be \(\mathcal{O}(N\mathrm{log}N)\), where N is the size of the coefficient matrix. Numerical examples are given to demonstrate the accuracy and efficiency of our proposed fast preconditioned penalty method. 相似文献

10.

Preconditioned Krylov subspace methods for solving radiative transfer problems with scattering and reflection

M.A. Badri P. Jolivet B. Rousseau Y. Favennec 《Computers & Mathematics with Applications》2019,77(6):1453-1465

Two Krylov subspace methods, the GMRES and the BiCGSTAB, are analyzed for solving the linear systems arising from the mixed finite element discretization of the discrete ordinates radiative transfer equation. To increase their convergence rate and stability, the Jacobi and block Jacobi methods are used as preconditioners for both Krylov subspace methods. Numerical experiments, designed to test the effectiveness of the (preconditioned) GMRES and the BiCGSTAB, are performed on various radiative transfer problems: (i) transparent, (ii) absorption dominant, (iii) scattering dominant, and (iv) with specular reflection. It is observed that the BiCGSTAB is superior to the GMRES, with lower iteration counts, solving times, and memory consumption. In particular, the BiCGSTAB preconditioned by the block Jacobi method performed best amongst the set of other solvers. To better understand the discrete systems for radiative problems (i) to (iv), an eigenvalue spectrum analysis has also been performed. It revealed that the linear system conditioning deteriorates for scattering media problems in comparison to absorbing or transparent media problems. This conditioning further deteriorates when reflection is involved. 相似文献

11.

Block preconditioners for elliptic PDE-constrained optimization problems

Zhong-Zhi Bai 《Computing》2011,91(4):379-395

For the structured systems of linear equations arising from the Galerkin finite-element discretizations of the distributed control problems, we construct block-counter-diagonal and block-counter-tridiagonal preconditioning matrices to precondition the Krylov subspace methods such as GMRES. We derive explicit expressions for the eigenvalues and eigenvectors of the corresponding preconditioned matrices. Numerical implementations show that these structured preconditioners may lead to satisfactory experimental results of the preconditioned GMRES methods when the regularization parameter is suitably small. 相似文献

12.

Modified HSS iteration methods for a class of complex symmetric linear systems

Zhong-Zhi Bai Michele Benzi Fang Chen 《Computing》2010,87(3-4):93-111

In this paper, we introduce and analyze a modification of the Hermitian and skew-Hermitian splitting iteration method for solving a broad class of complex symmetric linear systems. We show that the modified Hermitian and skew-Hermitian splitting (MHSS) iteration method is unconditionally convergent. Each iteration of this method requires the solution of two linear systems with real symmetric positive definite coefficient matrices. These two systems can be solved inexactly. We consider acceleration of the MHSS iteration by Krylov subspace methods. Numerical experiments on a few model problems are used to illustrate the performance of the new method. 相似文献

13.

A Fast Preconditioned Iterative Algorithm for the Electromagnetic Scattering from a Large Cavity

Chenliang Li Zhonghua Qiao 《Journal of scientific computing》2012,53(2):435-450

In this paper, a fast preconditioned Krylov subspace iterative algorithm is proposed for the electromagnetic scattering from a rectangular large open cavity embedded in an infinite ground plane. The scattering problem is described by the Helmholtz equation with a nonlocal artificial boundary condition on the aperture of the cavity and Dirichlet boundary conditions on the walls of the cavity. Compact fourth order finite difference schemes are employed to discretize the bounded domain problem. A much smaller interface discrete system is reduced by introducing the discrete Fourier transformation in the horizontal and a Gaussian elimination in the vertical direction, presented in Bao and Sun (SIAM J. Sci. Comput. 27:553, 2005). An effective preconditioner is developed for the Krylov subspace iterative solver to solve this interface system. Numerical results demonstrate the remarkable efficiency and accuracy of the proposed method. 相似文献

14.

Application of block Krylov subspace algorithms to the Wilson-Dirac equation with multiple right-hand sides in lattice QCD

T. Sakurai Y. Kuramashi 《Computer Physics Communications》2010,181(1):113-117

It is well known that the block Krylov subspace solvers work efficiently for some cases of the solution of differential equations with multiple right-hand sides. In lattice QCD calculation of physical quantities on a given configuration demands us to solve the Dirac equation with multiple sources. We show that a new block Krylov subspace algorithm recently proposed by the authors reduces the computational cost significantly without losing numerical accuracy for the solution of the O(a)-improved Wilson-Dirac equation. 相似文献

15.

预条件的平方Smith法求解大型Sylvester矩阵方程

蔡兆克鲍亮徐冬梅《计算机工程与科学》2017,39(8):1425-1430

提出了一种预条件的平方Smith算法求解大型连续Sylvester矩阵方程,该算法利用交替方向隐式迭代(ADI)来构造预条件算子,将原方程转换为非对称Stein方程,并在Krylov子空间中应用平方Smith法迭代产生低秩逼近解。数值实验表明,与已知的Jacobi迭代法等算法相比,该算法有更好的迭代效率和收敛精度。相似文献

16.

一种孔隙介质中地下水流并行计算方法

下载免费PDF全文

程汤培王群季晓慧黄林显徐腾《计算机工程与应用》2011,47(20):234-237

针对孔隙介质中地下水流动问题提出了一种并行数值计算方法,并基于此设计了一套专用于求解大规模三维地下水流动方程的并行计算模块。计算模块基于区域分解的方法实现对模型区域的并行求解,采用了分布式内存和压缩矩阵技术解决大规模稀疏矩阵的存储及其计算,整合多种并行Krylov子空间方法和预条件子技术迭代求解大规模线性方程组。在Linux集群系统上进行了数值模拟实验,性能测试结果表明,程序具有良好的加速比和可扩展性。相似文献

17.

Parallel Performance of Block ILU Preconditioners for a Block-tridiagonal Matrix

Heon Yun Jae 《The Journal of supercomputing》2003,24(1):69-89

The parallelizable block ILU (incomplete LU) factorization preconditioners for a block-tridiagonal matrix have been recently proposed by the author. In this paper, we describe a parallelization of Krylov subspace methods with the block ILU factorization preconditioners on distributed-memory computers such as the Cray T3E, and then parallel performance results of a preconditioned Krylov subspace method are provided to evaluate the effectiveness and efficiency of the block ILU preconditioners on the Cray T3E. 相似文献

18.

An efficient method for constructing an ILU preconditioner for solving large sparse nonsymmetric linear systems by the GMRES method

《Computers & Mathematics with Applications》2003,45(10-11):1757-1772

The main idea of this paper is in determination of the pattern of nonzero elements of the LU factors of a given matrix A. The idea is based on taking the powers of the Boolean matrix derived from A. This powers of a Boolean matrix strategy (PBS) is an efficient, effective, and inexpensive approach. Construction of an ILU preconditioner using PBS is described and used in solving large nonsymmetric sparse linear systems. Effectiveness of the proposed ILU preconditioner in solving large nonsymmetric sparse linear systems by the GMRES method is also shown. Numerical experiments are performed which show that it is possible to considerably reduce the number of GMRES iterations when the ILU preconditioner constructed here is used. In numerical examples, the influence of k, the dimension of the Krylov subspace, on the performance of the GMRES method using an ILU preconditioner is tested. For all the tests carried out, the best value for k is found to be 10. 相似文献

19.

An algorithm for solving sparse Nonlinear Least Squares problems

J. M. Martínez 《Computing》1987,39(4):307-325

We introduce a new method for solving Nonlinear Least Squares problems when the Jacobian matrix of the system is large and sparse. The main features of the new method are the following:

The Gauss-Newton equation is “partially” solved at each iteration using a preconditioned Conjugate Gradient algorithm.
The new point is obtained using a two-dimensional trust region scheme, similar to the one introduced by Bulteau and Vial.

We prove global and local convergence results and we present some numerical experiments. 相似文献

20.

Block triangular preconditioners for the discretized time-harmonic Maxwell equations in mixed form

Guang-Hui Cheng Shu-Qian Shen 《Computer Physics Communications》2009,180(2):192-196

In this paper, we consider the solution of the saddle point linear systems arising from the finite element discretization of the time-harmonic Maxwell equations in mixed form. Two types of block triangular Schur complement-free preconditioners used with Krylov subspace methods are proposed, involving the choice of the parameter. Furthermore, we give the optimal parameter in practice. Theoretical analysis shows that all eigenvalues of the preconditioned matrices are strongly clustered. Finally, numerical experiments that validate the analysis are presented. 相似文献