首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 750 毫秒
1.
Exascale computers are expected to have highly hierarchical architectures with nodes composed by multiple core processors (CPU; central processing unit) and accelerators (GPU; graphics processing unit). The different programming levels generate new difficult algorithm issues. In particular when solving extremely large linear systems, new programming paradigms of Krylov methods should be defined and evaluated with respect to modern state of the art of scientific methods. Iterative Krylov methods involve linear algebra operations such as dot product, norm, addition of vectors and sparse matrix–vector multiplication. These operations are computationally expensive for large size matrices. In this paper, we aim to focus on the best way to perform effectively these operations, in double precision, on GPU in order to make iterative Krylov methods more robust and therefore reduce the computing time. The performance of our algorithms is evaluated on several matrices arising from engineering problems. Numerical experiments illustrate the robustness and accuracy of our implementation compared to the existing libraries. We deal with different preconditioned Krylov methods: Conjugate Gradient for symmetric positive-definite matrices, and Generalized Conjugate Residual, Bi-Conjugate Gradient Conjugate Residual, transpose-free Quasi Minimal Residual, Stabilized BiConjugate Gradient and Stabilized BiConjugate Gradient (L) for the solution of sparse linear systems with non symmetric matrices. We consider and compare several sparse compressed formats, and propose a way to implement effectively Krylov methods on GPU and on multicore CPU. Finally, we give strategies to faster algorithms by auto-tuning the threading design, upon the problem characteristics and the hardware changes. As a conclusion, we propose and analyse hybrid sub-structuring methods that should pave the way to exascale hybrid methods.  相似文献   

2.
A High Performance Computing alternative to traditional Krylov subspace methods, pipelined Krylov subspace solvers offer better scalability in the strong scaling limit compared to standard Krylov subspace methods for large and sparse linear systems. The typical synchronization bottleneck is mitigated by overlapping time-consuming global communication phases with local computations in the algorithm. This paper describes a general framework for deriving the pipelined variant of any Krylov subspace algorithm. The proposed framework was implicitly used to derive the pipelined Conjugate Gradient (p-CG) method in Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm by P. Ghysels and W. Vanroose, Parallel Computing, 40(7):224–238, 2014. The pipelining framework is subsequently illustrated by formulating a pipelined version of the BiCGStab method for the solution of large unsymmetric linear systems on parallel hardware. A residual replacement strategy is proposed to account for the possible loss of attainable accuracy and robustness by the pipelined BiCGStab method. It is shown that the pipelined algorithm improves scalability on distributed memory machines, leading to significant speedups compared to standard preconditioned BiCGStab.  相似文献   

3.
Pipelined Krylov methods seek to ameliorate the latency due to inner products necessary for projection by overlapping it with the computation associated with sparse matrix‐vector multiplication. We clarify a folk theorem that this can only result in a speedup of 2× over the naive implementation. Examining many repeated runs, we show that stochastic noise also contributes to the latency, and we model this using an analytical probability distribution. Our analysis shows that speedups greater than 2× are possible with these algorithms. Copyright © 2016 John Wiley & Sons, Ltd.  相似文献   

4.
The Laplace–Beltrami system of nonlinear, elliptic, partial differential equations has utility in the generation of computational grids on complex and highly curved geometry. Discretization of this system using the finite-element method accommodates unstructured grids, but generates a large, sparse, ill-conditioned system of nonlinear discrete equations. The use of the Laplace–Beltrami approach, particularly in large-scale applications, has been limited by the scalability and efficiency of solvers. This paper addresses this limitation by developing two nonlinear solvers based on the Jacobian-Free Newton–Krylov (JFNK) methodology. A key feature of these methods is that the Jacobian is not formed explicitly for use by the underlying linear solver. Iterative linear solvers such as the Generalized Minimal RESidual (GMRES) method do not technically require the stand-alone Jacobian; instead its action on a vector is approximated through two nonlinear function evaluations. The preconditioning required by GMRES is also discussed. Two different preconditioners are developed, both of which employ existing Algebraic Multigrid (AMG) methods. Further, the most efficient preconditioner, overall, for the problems considered is based on a Picard linearization. Numerical examples demonstrate that these solvers are significantly faster than a standard Newton–Krylov approach; a speedup factor of approximately 26 was obtained for the Picard preconditioner on the largest grids studied here. In addition, these JFNK solvers exhibit good algorithmic scaling with increasing grid size.  相似文献   

5.
B. Carpentieri 《Computing》2006,77(3):275-296
In this paper, we describe a matrix-free iterative algorithm based on the GMRES method for solving electromagnetic scattering problems expressed in an integral formulation. Integral methods are an interesting alternative to differential equation solvers for this problem class since they do not require absorbing boundary conditions and they mesh only the surface of the radiating object giving rise to dense and smaller linear systems of equations. However, in realistic applications the discretized systems can be very large and for some integral formulations, like the popular Electric Field Integral Equation, they become ill-conditioned when the frequency increases. This means that iterative Krylov solvers have to be combined with fast methods for the matrix-vector products and robust preconditioning to be affordable in terms of CPU time. In this work we describe a matrix-free two-grid preconditioner for the GMRES solver combined with the Fast Multipole Method. The preconditioner is an algebraic two-grid cycle built on top of a sparse approximate inverse that is used as smoother, while the grid transfer operators are defined using spectral information of the preconditioned matrix. Experiments on a set of linear systems arising from real radar cross section calculation in industry illustrate the potential of the proposed approach for solving large-scale problems in electromagnetism.  相似文献   

6.
In this paper, we study the effect of enhancing GPU-accelerated Krylov solvers with preconditioners. We consider the BiCGSTAB, CGS, QMR, and IDR(s) Krylov solvers. For a large set of test matrices, we assess the impact of Jacobi and incomplete factorization preconditioning on the solvers’ numerical stability and time-to-solution performance. We also analyze how the use of a preconditioner impacts the choice of the fastest solver.  相似文献   

7.
We present a newly developed version of our solvers for the verified solution of dense parametric linear systems, i.e. linear systems whose system matrix and right-hand side depend affine-linearly on parameters that vary inside prescribed intervals. The solvers use our C++ class library for reliable computing, C-XSC. The C-XSC library provides many features, especially easy to handle data types for dense and sparse matrices and vectors and the ability to compute dot products and dot product expressions in arbitrary precision. The new solvers can use either sparse or dense matrices as the coefficient matrices for the parameters. The use of sparse coefficient matrices can result in huge improvements in both performance and memory consumption. BLAS and LAPACK routines are used where applicable, and OpenMP is used for the parallelization on multi-core and multi-processor systems. The solvers also provide the ability to compute not only an outer but also a componentwise inner enclosure of the solution set of the system and to choose between two versions of the algorithm, one being very fast and one giving sharp results and extending the range of solvable systems. We give some examples for parametric linear systems (also from real world examples such as worst-case tolerance analysis of linear electric circuits), give performance measurements of our solvers and also demonstrate that they scale very well when using multiple cores or processors.  相似文献   

8.
In this paper, we develop, study and implement iterative linear solvers and preconditioners using multiple graphical processing units (GPUs). Techniques for accelerating sparse matrix–vector (SpMV) multiplication, linear solvers and preconditioners are presented. Four Krylov subspace solvers, a Neumann polynomial preconditioner and a domain decomposition preconditioner are implemented. Our numerical tests with NVIDIA C2050 GPUs show that the SpMV kernel can be sped over 40 times faster using four GPUs. Our linear solvers and preconditioners have similar speedup.  相似文献   

9.
The parallelization of sophisticated applications has dramatically increased in recent years. As machine capabilities rise, greater emphasis on modeling complex phenomena can be expected. Many of these applications require the solution of large sparse matrix equations which approximate systems of partial differential equations (PDEs). Therefore we consider parallel iterative solvers for large sparse non-symmetric systems and issues related to parallel sparse matrix software. We describe a collection of parallel iterative solvers which use a distributed sparse matrix format that facilitates the interface between specific applications and a variety of Krylov subspace techniques and multigrid methods. These methods have been used to solve a number of linear and non-linear PDE problems on a 1024-processor NCUBE 2 hypercube. Over 1 Gflop sustained computation rates are achieved with many of these solvers, demonstrating that high performance can be attained even when using sparse matrix data structures.  相似文献   

10.
Methods for the solution of sparse eigenvalue problems that are based on spectral projectors and contour integration have recently attracted more and more attention. Such methods require the solution of many shifted sparse linear systems of full size. In most of the literature concerning these eigenvalue solvers, only few words are said on the solution of the linear systems, but they turn out to be very hard to solve by iterative linear solvers in practice. In this work we identify a row projection method for the solution of the inner linear systems encountered in the FEAST algorithm and introduce a novel hybrid parallel and fully iterative implementation of the eigenvalue solver. Our approach ultimately aims at achieving extreme parallelism by exploiting the algorithm’s potential on several levels. We present numerical examples where graphene modeling is one of the target applications. In this application, several hundred or even thousands of eigenvalues from the interior of the spectrum are required, which is a big challenge for state-of-the-art numerical methods.  相似文献   

11.
Two Krylov subspace methods, the GMRES and the BiCGSTAB, are analyzed for solving the linear systems arising from the mixed finite element discretization of the discrete ordinates radiative transfer equation. To increase their convergence rate and stability, the Jacobi and block Jacobi methods are used as preconditioners for both Krylov subspace methods. Numerical experiments, designed to test the effectiveness of the (preconditioned) GMRES and the BiCGSTAB, are performed on various radiative transfer problems: (i) transparent, (ii) absorption dominant, (iii) scattering dominant, and (iv) with specular reflection. It is observed that the BiCGSTAB is superior to the GMRES, with lower iteration counts, solving times, and memory consumption. In particular, the BiCGSTAB preconditioned by the block Jacobi method performed best amongst the set of other solvers. To better understand the discrete systems for radiative problems (i) to (iv), an eigenvalue spectrum analysis has also been performed. It revealed that the linear system conditioning deteriorates for scattering media problems in comparison to absorbing or transparent media problems. This conditioning further deteriorates when reflection is involved.  相似文献   

12.
There are verities of useful Krylov subspace methods to solve nonsymmetric linear system of equations. GMRES is one of the best Krylov solvers with several different variants to solve large sparse linear systems. Any GMRES implementation has some advantages. As the solution of ill-posed problems are important. In this paper, some GMRES variants are discussed and applied to solve these kinds of problems. Residual smoothing techniques are efficient ways to accelerate the convergence speed of some iterative methods like CG variants. At the end of this paper, some residual smoothing techniques are applied for different GMRES methods to test the influence of these techniques on GMRES implementations.  相似文献   

13.
The irregular nature of sparse matrix-vector multiplication, Ax=y, has led to the development of a variety of compressed storage formats, which are widely used because they do not store any unnecessary elements. One of these methods, the Jagged Diagonal Storage format (JDS) is, in addition, considered appropriate for the implementation of iterative methods on parallel and vector processors. In this work we present the Transpose Jagged Diagonal Storage format (TJDS) which drew inspiration from the Jagged Diagonal Storage scheme but requires less storage space than JDS. We propose an alternative storage scheme which makes no assumptions about the sparsity pattern of the matrix and only needs three linear arrays instead of the four linear arrays required by JDS. Specifically, the data is aligned in such a way that the permutation array used in JDS, to permute the solution vector back to the original ordering, is unnecessary. This allow us to save the memory space required to store an integer vector of length n, where n stands for the number of columns in the sparse matrix A. This storage saving reaches, for the selection of matrices used in this work, from 14% up to 45% of the number of non-zero values of the sparse matrices. We present a case study of a 6×6 sparse matrix to show the data structures and the algorithm to compute Ax=y using the TJDS format.  相似文献   

14.
This paper presents theoretical foundations of global Krylov subspace methods for model order reductions. This method is an extension of the standard Krylov subspace method for multiple-inputs multiple-outputs (MIMO) systems. By employing the congruence transformation with global Krylov subspaces, both one-sided Arnoldi and two-sided Lanczos oblique projection methods are explored for both single expansion point and multiple expansion points. In order to further reduce the computational complexity for multiple expansion points, adaptive-order multiple points moment matching algorithms, or the so-called rational Krylov space method, are also studied. Two algorithms, including the adaptive-order rational global Arnoldi (AORGA) algorithm and the adaptive-order global Lanczos (AOGL) algorithm, are developed in detail. Simulations of practical dynamical systems will be conducted to illustrate the feasibility and the efficiency of proposed methods.  相似文献   

15.
Spectral preconditioners are based on the fact that the convergence rate of the Krylov subspace methods is improved if the eigenvalues of the smallest magnitude of the system matrix are ‘removed’. In this paper, two preconditioning strategies are studied to solve a set of linear systems associated with the numerical integration of the time-dependent neutron diffusion equation. Both strategies can be implemented using the matrix–vector product as the main operation and succeed at reducing the total number of iterations needed to solve the set of systems.  相似文献   

16.
17.
Modelling the interaction of an acoustic field in a fluid and an elastic structure submerged in the fluid leads to a system of complex linear equations with a complicated sparsity structure and, for higher wavenumbers and adequate modelling, the systems are very large. Direct methods are not practical. Preconditioned iterative methods, which are suitable for single operator equations, are not immediately applicable to the coupled case. This article proposes a block diagonal preconditioner of the sparse approximate inverse (SPAI) type that can accelerate the convergence of Krylov iterative solvers for the coupled system. Moreover, the proposed preconditioner can properly and implicitly scale the coupled matrix. Some numerical results are presented to demonstrate the effectiveness of the new method.  相似文献   

18.
在潮流计算时,绝大部分时间都用在求解大规模稀疏线性方程组Ax=b上。众多文献中运用的迭代法并不统一,它们只注重预处理方法的改进。本文就针对几种流行的Krylov迭代法进行详细介绍,总结特性,并利用实验来分析它们总的FLOPS值和收敛效率。最后,通过评估算法的计算效率,得出一种比较适合潮流计算的Krylov迭代法。  相似文献   

19.
Krylov subspace spectral methods have been shown to be high-order accurate in time and more stable than explicit time-stepping methods, but also more difficult to implement efficiently. This paper describes how these methods can be fashioned into practical solvers by exploiting the simple structure of differential operators Numerical results concerning accuracy and efficiency are presented for parabolic problems in one and two space dimensions.  相似文献   

20.
《Parallel Computing》1988,6(2):165-184
The multiplication of a vector by a matrix and the solution of triangular linear systems are the most demanding operations in the majority of iterative techniques for the solution of linear systems. Data-driven VLSI networks which perform these two operations, efficiently, for certain sparse matrices are introduced. In order to avoid computations that involve zero operands, the non-zero elements in a sparse matrix are organized in the form of non-overlapping stripes, and only the elements within the stripe structure of the matrix are manipulated. Detailed analysis of the networks proves that both operations may be completed in n global cycles with minimal communication overhead, where n is the order of the linear system. The number of cells in each network as well as the communication overhead, are determined by the stripe structure of the matrix. Different stripe structures for the class of sparse matrices generated in Finite Element Analysis are examined in a separate paper.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号