
1.

Auto-tuned Krylov methods on cluster of graphics processing unit

Frédéric Magoulès Abal-Kassim Cheik Ahamed Roman Putanowicz 《International Journal of Computer Mathematics》2015,92(6):1222-1250

Exascale computers are expected to have highly hierarchical architectures with nodes composed of multiple core processors (CPU; central processing unit) and accelerators (GPU; graphics processing unit). The different programming levels generate new and difficult algorithmic issues. In particular, when solving extremely large linear systems, new programming paradigms of Krylov methods should be defined and evaluated with respect to the modern state of the art of scientific methods. Iterative Krylov methods involve linear algebra operations such as dot product, norm, addition of vectors and sparse matrix–vector multiplication. These operations are computationally expensive for large matrices. In this paper, we focus on the best way to perform these operations effectively, in double precision, on GPU in order to make iterative Krylov methods more robust and therefore reduce the computing time. The performance of our algorithms is evaluated on several matrices arising from engineering problems. Numerical experiments illustrate the robustness and accuracy of our implementation compared to the existing libraries. We deal with different preconditioned Krylov methods: Conjugate Gradient for symmetric positive-definite matrices, and Generalized Conjugate Residual, Bi-Conjugate Gradient Conjugate Residual, transpose-free Quasi Minimal Residual, Stabilized BiConjugate Gradient and Stabilized BiConjugate Gradient (L) for the solution of sparse linear systems with nonsymmetric matrices. We consider and compare several sparse compressed formats, and propose a way to implement Krylov methods effectively on GPU and on multicore CPU. Finally, we give strategies for faster algorithms by auto-tuning the threading design according to the problem characteristics and the hardware changes. As a conclusion, we propose and analyse hybrid sub-structuring methods that should pave the way to exascale hybrid methods.
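The abstract above names the building blocks of a Krylov iteration: dot products, vector updates and sparse matrix–vector multiplication. As a rough illustration only, the following pure-Python sketch runs an unpreconditioned Conjugate Gradient iteration on a matrix held in CSR (compressed sparse row) form. The function names, the choice of CSR and the 2×2 test matrix are assumptions made for this example; it is not the authors' GPU implementation.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Sparse matrix-vector product y = A x for A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def dot(u, v):
    """Dot product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for a symmetric positive-definite CSR matrix."""
    x = [0.0] * len(b)
    r = list(b)            # residual r = b - A x, with x = 0 initially
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Hypothetical test system: A = [[4, 1], [1, 3]], b = [1, 2]
values, col_idx, row_ptr = [4.0, 1.0, 1.0, 3.0], [0, 1, 0, 1], [0, 2, 4]
x = conjugate_gradient(values, col_idx, row_ptr, [1.0, 2.0])
```

Each iteration costs one sparse matrix–vector product, two dot products and three vector updates, which are the kernels the abstract singles out for GPU tuning.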

2.

The communication-hiding pipelined BiCGstab method for the parallel solution of large unsymmetric linear systems

《Parallel Computing》2017

A High Performance Computing alternative to traditional Krylov subspace methods, pipelined Krylov subspace solvers offer better scalability in the strong scaling limit compared to standard Krylov subspace methods for large and sparse linear systems. The typical synchronization bottleneck is mitigated by overlapping time-consuming global communication phases with local computations in the algorithm. This paper describes a general framework for deriving the pipelined variant of any Krylov subspace algorithm. The proposed framework was implicitly used to derive the pipelined Conjugate Gradient (p-CG) method in "Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm" by P. Ghysels and W. Vanroose, Parallel Computing, 40(7):224–238, 2014. The pipelining framework is subsequently illustrated by formulating a pipelined version of the BiCGStab method for the solution of large unsymmetric linear systems on parallel hardware. A residual replacement strategy is proposed to account for the possible loss of attainable accuracy and robustness by the pipelined BiCGStab method. It is shown that the pipelined algorithm improves scalability on distributed memory machines, leading to significant speedups compared to standard preconditioned BiCGStab.

3.

A stochastic performance model for pipelined Krylov methods

Hannah Morgan Matthew G. Knepley Patrick Sanan L. Ridgway Scott 《Concurrency and Computation》2016,28(18):4532-4542

Pipelined Krylov methods seek to ameliorate the latency due to inner products necessary for projection by overlapping it with the computation associated with sparse matrix‐vector multiplication. We clarify a folk theorem that this can only result in a speedup of 2× over the naive implementation. Examining many repeated runs, we show that stochastic noise also contributes to the latency, and we model this using an analytical probability distribution. Our analysis shows that speedups greater than 2× are possible with these algorithms. Copyright © 2016 John Wiley & Sons, Ltd.
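The 2× bound mentioned above follows from simple arithmetic: if an iteration spends time t_reduce in the global reduction and t_spmv in the matrix–vector product, perfect overlap replaces the sum by the maximum, and (t_reduce + t_spmv) / max(t_reduce, t_spmv) is at most 2. The toy sketch below shows that bound, and then illustrates how per-process noise can make frequent synchronization more expensive than the deterministic model suggests. The exponential noise, the function names and the "relaxed" idealisation are assumptions for illustration; this is not the model used in the paper.

```python
import random


def overlap_speedup(t_reduce, t_spmv):
    """Deterministic model: naive cost is the sum, overlapped cost is the max."""
    return (t_reduce + t_spmv) / max(t_reduce, t_spmv)


def simulate(num_procs, num_iters, seed=0):
    """Compare per-iteration synchronization with an idealised relaxed schedule."""
    rng = random.Random(seed)
    # times[p][k]: noisy time of process p in iteration k (assumed: 1 + Exp(1))
    times = [[1.0 + rng.expovariate(1.0) for _ in range(num_iters)]
             for _ in range(num_procs)]
    # Synchronizing every iteration: each step waits for the slowest process.
    synced = sum(max(times[p][k] for p in range(num_procs))
                 for k in range(num_iters))
    # Idealised limit with no per-iteration waiting: the slowest total wins.
    relaxed = max(sum(times[p]) for p in range(num_procs))
    return synced, relaxed


best_case = overlap_speedup(1.0, 1.0)   # equal costs give the 2x upper bound
synced, relaxed = simulate(num_procs=64, num_iters=200)
```

Because a sum of per-iteration maxima is never smaller than the maximum of per-process sums, `synced >= relaxed` always holds in this toy; the gap grows with the number of processes, which is one intuition for why noise can push observed speedups past the deterministic bound.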

4.

A Matrix-free Two-grid Preconditioner for Solving Boundary Integral Equations in Electromagnetism

B. Carpentieri 《Computing》2006,77(3):275-296

In this paper, we describe a matrix-free iterative algorithm based on the GMRES method for solving electromagnetic scattering problems expressed in an integral formulation. Integral methods are an interesting alternative to differential equation solvers for this problem class since they do not require absorbing boundary conditions and they mesh only the surface of the radiating object giving rise to dense and smaller linear systems of equations. However, in realistic applications the discretized systems can be very large and for some integral formulations, like the popular Electric Field Integral Equation, they become ill-conditioned when the frequency increases. This means that iterative Krylov solvers have to be combined with fast methods for the matrix-vector products and robust preconditioning to be affordable in terms of CPU time. In this work we describe a matrix-free two-grid preconditioner for the GMRES solver combined with the Fast Multipole Method. The preconditioner is an algebraic two-grid cycle built on top of a sparse approximate inverse that is used as smoother, while the grid transfer operators are defined using spectral information of the preconditioned matrix. Experiments on a set of linear systems arising from real radar cross section calculation in industry illustrate the potential of the proposed approach for solving large-scale problems in electromagnetism.

5.

Preconditioned Krylov solvers on GPUs

《Parallel Computing》2017

In this paper, we study the effect of enhancing GPU-accelerated Krylov solvers with preconditioners. We consider the BiCGSTAB, CGS, QMR, and IDR(s) Krylov solvers. For a large set of test matrices, we assess the impact of Jacobi and incomplete factorization preconditioning on the solvers’ numerical stability and time-to-solution performance. We also analyze how the use of a preconditioner impacts the choice of the fastest solver.

6.

Efficient nonlinear solvers for Laplace–Beltrami smoothing of three-dimensional unstructured grids

Markus Berndt J. David Moulton Glen Hansen 《Computers & Mathematics with Applications》2008,55(12):2791-2806

The Laplace–Beltrami system of nonlinear, elliptic, partial differential equations has utility in the generation of computational grids on complex and highly curved geometry. Discretization of this system using the finite-element method accommodates unstructured grids, but generates a large, sparse, ill-conditioned system of nonlinear discrete equations. The use of the Laplace–Beltrami approach, particularly in large-scale applications, has been limited by the scalability and efficiency of solvers. This paper addresses this limitation by developing two nonlinear solvers based on the Jacobian-Free Newton–Krylov (JFNK) methodology. A key feature of these methods is that the Jacobian is not formed explicitly for use by the underlying linear solver. Iterative linear solvers such as the Generalized Minimal RESidual (GMRES) method do not technically require the stand-alone Jacobian; instead its action on a vector is approximated through two nonlinear function evaluations. The preconditioning required by GMRES is also discussed. Two different preconditioners are developed, both of which employ existing Algebraic Multigrid (AMG) methods. Further, the most efficient preconditioner, overall, for the problems considered is based on a Picard linearization. Numerical examples demonstrate that these solvers are significantly faster than a standard Newton–Krylov approach; a speedup factor of approximately 26 was obtained for the Picard preconditioner on the largest grids studied here. In addition, these JFNK solvers exhibit good algorithmic scaling with increasing grid size.
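The statement that the Jacobian's action on a vector is "approximated through two nonlinear function evaluations" refers to the standard forward-difference formula J(u)v ≈ (F(u + εv) − F(u)) / ε. A minimal sketch follows; the residual function `F`, the step size `eps` and the evaluation point are hypothetical choices for illustration and are unrelated to the Laplace–Beltrami system itself.

```python
def jacobian_vector_product(F, u, v, eps=1e-7):
    """Approximate J(u) v with one extra evaluation of F (forward difference)."""
    Fu = F(u)
    Fu_perturbed = F([ui + eps * vi for ui, vi in zip(u, v)])
    return [(a - b) / eps for a, b in zip(Fu_perturbed, Fu)]


def F(u):
    """Hypothetical nonlinear residual with Jacobian [[2x, 1], [1, 2y]]."""
    x, y = u
    return [x * x + y - 3.0, x + y * y - 5.0]


# At u = (1, 2) the exact Jacobian is [[2, 1], [1, 4]], so J v = [3, 5] for v = (1, 1).
jv = jacobian_vector_product(F, [1.0, 2.0], [1.0, 1.0])
```

A Krylov solver such as GMRES only ever needs products of this kind, so the Jacobian matrix never has to be assembled or stored. In practice the step size is usually scaled with the norms of u and v; the fixed `eps` here is a simplification.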

7.

Solvers for the verified solution of parametric linear systems

Michael Zimmer Walter Krämer Evgenija D. Popova 《Computing》2012,94(2-4):109-123

We present a newly developed version of our solvers for the verified solution of dense parametric linear systems, i.e. linear systems whose system matrix and right-hand side depend affine-linearly on parameters that vary inside prescribed intervals. The solvers use our C++ class library for reliable computing, C-XSC. The C-XSC library provides many features, especially easy to handle data types for dense and sparse matrices and vectors and the ability to compute dot products and dot product expressions in arbitrary precision. The new solvers can use either sparse or dense matrices as the coefficient matrices for the parameters. The use of sparse coefficient matrices can result in huge improvements in both performance and memory consumption. BLAS and LAPACK routines are used where applicable, and OpenMP is used for the parallelization on multi-core and multi-processor systems. The solvers also provide the ability to compute not only an outer but also a componentwise inner enclosure of the solution set of the system and to choose between two versions of the algorithm, one being very fast and one giving sharp results and extending the range of solvable systems. We give some examples for parametric linear systems (also from real world examples such as worst-case tolerance analysis of linear electric circuits), give performance measurements of our solvers and also demonstrate that they scale very well when using multiple cores or processors.

8.

Accelerating iterative linear solvers using multiple graphical processing units

Zhangxin Chen Bo Yang 《International Journal of Computer Mathematics》2015,92(7):1422-1438

In this paper, we develop, study and implement iterative linear solvers and preconditioners using multiple graphical processing units (GPUs). Techniques for accelerating sparse matrix–vector (SpMV) multiplication, linear solvers and preconditioners are presented. Four Krylov subspace solvers, a Neumann polynomial preconditioner and a domain decomposition preconditioner are implemented. Our numerical tests with NVIDIA C2050 GPUs show that the SpMV kernel can be sped up by a factor of more than 40 using four GPUs. Our linear solvers and preconditioners show similar speedups.
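A Neumann polynomial preconditioner approximates the inverse of A by a truncated series, so applying it needs only matrix–vector products, which is why it maps well onto GPUs. The sketch below uses one common variant, assumed here for illustration: with D the diagonal of A and N = I − D⁻¹A, it applies M⁻¹ = (I + N + … + N^m) D⁻¹ to a vector. The abstract does not say which variant the authors use, and the dense list-of-lists storage and function names are simplifications.

```python
def matvec(A, x):
    """Dense matrix-vector product, used here only to keep the sketch short."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def neumann_apply(A, r, degree):
    """Return z = (I + N + ... + N^degree) D^{-1} r with N = I - D^{-1} A."""
    d_inv = [1.0 / A[i][i] for i in range(len(A))]
    z = [di * ri for di, ri in zip(d_inv, r)]      # z = D^{-1} r
    s = list(z)                                    # partial sum, degree 0
    for _ in range(degree):
        As = matvec(A, s)
        # s <- z + N s = z + s - D^{-1} (A s)
        s = [zi + si - di * asi for zi, si, di, asi in zip(z, s, d_inv, As)]
    return s


A = [[4.0, 1.0], [1.0, 3.0]]          # hypothetical diagonally dominant matrix
z0 = neumann_apply(A, [1.0, 2.0], 0)  # degree 0 reduces to Jacobi scaling
z30 = neumann_apply(A, [1.0, 2.0], 30)
```

The series converges only when the spectral radius of N is below 1, as it is for this diagonally dominant example; a low degree is typically used so that the preconditioner stays cheap.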

9.

Sparse iterative algorithm software for large-scale MIMD machines: An initial discussion and implementation

John N. Shadid Ray S. Tuminaro 《Concurrency and Computation》1992,4(6):481-497

The parallelization of sophisticated applications has dramatically increased in recent years. As machine capabilities rise, greater emphasis on modeling complex phenomena can be expected. Many of these applications require the solution of large sparse matrix equations which approximate systems of partial differential equations (PDEs). Therefore we consider parallel iterative solvers for large sparse non-symmetric systems and issues related to parallel sparse matrix software. We describe a collection of parallel iterative solvers which use a distributed sparse matrix format that facilitates the interface between specific applications and a variety of Krylov subspace techniques and multigrid methods. These methods have been used to solve a number of linear and non-linear PDE problems on a 1024-processor NCUBE 2 hypercube. Over 1 Gflop sustained computation rates are achieved with many of these solvers, demonstrating that high performance can be attained even when using sparse matrix data structures.

10.

On the parallel iterative solution of linear systems arising in the FEAST algorithm for computing inner eigenvalues

《Parallel Computing》2015

Methods for the solution of sparse eigenvalue problems that are based on spectral projectors and contour integration have recently attracted more and more attention. Such methods require the solution of many shifted sparse linear systems of full size. In most of the literature concerning these eigenvalue solvers, only few words are said on the solution of the linear systems, but they turn out to be very hard to solve by iterative linear solvers in practice. In this work we identify a row projection method for the solution of the inner linear systems encountered in the FEAST algorithm and introduce a novel hybrid parallel and fully iterative implementation of the eigenvalue solver. Our approach ultimately aims at achieving extreme parallelism by exploiting the algorithm’s potential on several levels. We present numerical examples where graphene modeling is one of the target applications. In this application, several hundred or even thousands of eigenvalues from the interior of the spectrum are required, which is a big challenge for state-of-the-art numerical methods.

11.

Preconditioned Krylov subspace methods for solving radiative transfer problems with scattering and reflection

M.A. Badri P. Jolivet B. Rousseau Y. Favennec 《Computers & Mathematics with Applications》2019,77(6):1453-1465

Two Krylov subspace methods, the GMRES and the BiCGSTAB, are analyzed for solving the linear systems arising from the mixed finite element discretization of the discrete ordinates radiative transfer equation. To increase their convergence rate and stability, the Jacobi and block Jacobi methods are used as preconditioners for both Krylov subspace methods. Numerical experiments, designed to test the effectiveness of the (preconditioned) GMRES and the BiCGSTAB, are performed on various radiative transfer problems: (i) transparent, (ii) absorption dominant, (iii) scattering dominant, and (iv) with specular reflection. It is observed that the BiCGSTAB is superior to the GMRES, with lower iteration counts, solving times, and memory consumption. In particular, the BiCGSTAB preconditioned by the block Jacobi method performed best amongst the set of other solvers. To better understand the discrete systems for radiative problems (i) to (iv), an eigenvalue spectrum analysis has also been performed. It revealed that the linear system conditioning deteriorates for scattering media problems in comparison to absorbing or transparent media problems. This conditioning further deteriorates when reflection is involved.

12.

GMRES implementations and residual smoothing techniques for solving ill-posed linear systems

M. Matinfar H. Zareamoghaddam M. Eslami M. Saeidy 《Computers & Mathematics with Applications》2012,63(1):1-13

There are a variety of useful Krylov subspace methods for solving nonsymmetric linear systems of equations. GMRES is one of the best Krylov solvers, with several different variants for solving large sparse linear systems. Each GMRES implementation has its own advantages. Because the solution of ill-posed problems is important, some GMRES variants are discussed in this paper and applied to these kinds of problems. Residual smoothing techniques are efficient ways to accelerate the convergence of some iterative methods, such as CG variants. At the end of this paper, some residual smoothing techniques are applied to different GMRES methods to test the influence of these techniques on GMRES implementations.

13.

An optimal storage format for sparse matrices

《Information Processing Letters》2004,90(2):87-92

The irregular nature of sparse matrix-vector multiplication, Ax=y, has led to the development of a variety of compressed storage formats, which are widely used because they do not store any unnecessary elements. One of these methods, the Jagged Diagonal Storage format (JDS) is, in addition, considered appropriate for the implementation of iterative methods on parallel and vector processors. In this work we present the Transpose Jagged Diagonal Storage format (TJDS) which drew inspiration from the Jagged Diagonal Storage scheme but requires less storage space than JDS. We propose an alternative storage scheme which makes no assumptions about the sparsity pattern of the matrix and only needs three linear arrays instead of the four linear arrays required by JDS. Specifically, the data is aligned in such a way that the permutation array used in JDS, to permute the solution vector back to the original ordering, is unnecessary. This allows us to save the memory space required to store an integer vector of length n, where n stands for the number of columns in the sparse matrix A. This storage saving reaches, for the selection of matrices used in this work, from 14% up to 45% of the number of non-zero values of the sparse matrices. We present a case study of a 6×6 sparse matrix to show the data structures and the algorithm to compute Ax=y using the TJDS format.
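For context on the four arrays the abstract attributes to JDS, the sketch below builds a textbook-style Jagged Diagonal Storage representation (values, column indices, jagged-diagonal pointers, row permutation) and uses it to compute Ax=y. It shows plain JDS only, not the TJDS format proposed in the paper; the function names and the 3×3 test matrix are assumptions for this example.

```python
def build_jds(A):
    """Convert a dense matrix (list of rows) to JDS arrays."""
    n = len(A)
    # Compress each row to its nonzeros, keeping (column, value) pairs.
    rows = [[(j, a) for j, a in enumerate(row) if a != 0] for row in A]
    # Permutation: rows ordered by decreasing number of nonzeros.
    perm = sorted(range(n), key=lambda i: -len(rows[i]))
    max_len = len(rows[perm[0]]) if n else 0
    values, col_idx, jd_ptr = [], [], [0]
    for d in range(max_len):            # d-th jagged diagonal
        for i in perm:                  # rows long enough form a prefix of perm
            if len(rows[i]) > d:
                j, a = rows[i][d]
                values.append(a)
                col_idx.append(j)
        jd_ptr.append(len(values))
    return values, col_idx, jd_ptr, perm


def jds_matvec(values, col_idx, jd_ptr, perm, x):
    """Compute y = A x from JDS arrays, then undo the row permutation."""
    y_perm = [0.0] * len(perm)
    for d in range(len(jd_ptr) - 1):
        start = jd_ptr[d]
        for k in range(jd_ptr[d + 1] - start):
            y_perm[k] += values[start + k] * x[col_idx[start + k]]
    y = [0.0] * len(perm)
    for k, i in enumerate(perm):        # permute back to the original ordering
        y[i] = y_perm[k]
    return y


A = [[1.0, 0.0, 2.0], [0.0, 3.0, 0.0], [4.0, 5.0, 6.0]]
values, col_idx, jd_ptr, perm = build_jds(A)
y = jds_matvec(values, col_idx, jd_ptr, perm, [1.0, 2.0, 3.0])
```

The inner loop over each jagged diagonal is a long, stride-one update, which is what makes the format attractive on vector hardware. The final loop over `perm` is the step that, according to the abstract, TJDS makes unnecessary.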

14.

Model-order reductions for MIMO systems using global Krylov subspace methods

Chia-Chi Chu Ming-Hong Lai Wu-Shiung Feng 《Mathematics and computers in simulation》2008

This paper presents theoretical foundations of global Krylov subspace methods for model order reductions. This method is an extension of the standard Krylov subspace method for multiple-inputs multiple-outputs (MIMO) systems. By employing the congruence transformation with global Krylov subspaces, both one-sided Arnoldi and two-sided Lanczos oblique projection methods are explored for both single expansion point and multiple expansion points. In order to further reduce the computational complexity for multiple expansion points, adaptive-order multiple points moment matching algorithms, or the so-called rational Krylov space method, are also studied. Two algorithms, including the adaptive-order rational global Arnoldi (AORGA) algorithm and the adaptive-order global Lanczos (AOGL) algorithm, are developed in detail. Simulations of practical dynamical systems will be conducted to illustrate the feasibility and the efficiency of proposed methods.

15.

Linear model reduction of large scale industrial models in elastic multibody dynamics

Michael Fischer Peter Eberhard 《Multibody System Dynamics》2014,31(1):27-46


16.

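Comparison of several Krylov iterative methods in power flow calculation

ZHENG Jinhui LU Da 《Computer and Modernization》2011,(4):4-6

In power flow calculation, the vast majority of the time is spent solving the large-scale sparse linear system Ax=b. The iterative methods used across the literature are not uniform, and the focus is usually only on improving the preconditioning. This paper describes several popular Krylov iterative methods in detail, summarizes their characteristics, and uses experiments to analyze their total FLOPS and convergence efficiency. Finally, by evaluating the computational efficiency of the algorithms, a Krylov iterative method well suited to power flow calculation is identified.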

17.

Practical Implementation of Krylov Subspace Spectral Methods

James V. Lambers 《Journal of scientific computing》2007,32(3):449-476

Krylov subspace spectral methods have been shown to be high-order accurate in time and more stable than explicit time-stepping methods, but also more difficult to implement efficiently. This paper describes how these methods can be fashioned into practical solvers by exploiting the simple structure of differential operators. Numerical results concerning accuracy and efficiency are presented for parabolic problems in one and two space dimensions.

18.

Preconditioning the solution of the time-dependent neutron diffusion equation by recycling Krylov subspaces

S. González-Pintor G. Verdú 《International Journal of Computer Mathematics》2014,91(1):42-52

Spectral preconditioners are based on the fact that the convergence rate of the Krylov subspace methods is improved if the eigenvalues of the smallest magnitude of the system matrix are ‘removed’. In this paper, two preconditioning strategies are studied to solve a set of linear systems associated with the numerical integration of the time-dependent neutron diffusion equation. Both strategies can be implemented using the matrix–vector product as the main operation and succeed at reducing the total number of iterations needed to solve the set of systems.

19.

Application of block Krylov subspace algorithms to the Wilson-Dirac equation with multiple right-hand sides in lattice QCD

T. Sakurai Y. Kuramashi 《Computer Physics Communications》2010,181(1):113-117

It is well known that block Krylov subspace solvers work efficiently in some cases for the solution of differential equations with multiple right-hand sides. In lattice QCD, the calculation of physical quantities on a given configuration requires solving the Dirac equation with multiple sources. We show that a new block Krylov subspace algorithm recently proposed by the authors reduces the computational cost significantly without losing numerical accuracy for the solution of the O(a)-improved Wilson-Dirac equation.

20.

Updating preconditioner for iterative method in time domain simulation of power systems

WANG Ke XUE Wei LIN HaiXiang XU ShiMing & ZHENG WeiMin Laboratory of Parallel Software Computational Science 《Science China Information Sciences》2011,(4)

The numerical solution of the differential-algebraic equations (DAEs) involved in time domain simulation (TDS) of power systems requires the solution of a sequence of large scale and sparse linear systems. The use of iterative methods such as the Krylov subspace method is imperative for the solution of these large and sparse linear systems. The motivation of the present work is to develop a new algorithm to efficiently precondition the whole sequence of linear systems involved in TDS. As an improvement of dishon...