首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 921 毫秒
1.
GPU-accelerated preconditioned iterative linear solvers   总被引:1,自引:1,他引:0  
This work is an overview of our preliminary experience in developing a high-performance iterative linear solver accelerated by GPU coprocessors. Our goal is to illustrate the advantages and difficulties encountered when deploying GPU technology to perform sparse linear algebra computations. Techniques for speeding up sparse matrix-vector product (SpMV) kernels and finding suitable preconditioning methods are discussed. Our experiments with an NVIDIA TESLA M2070 show that for unstructured matrices SpMV kernels can be up to 8 times faster on the GPU than the Intel MKL on the host Intel Xeon X5675 Processor. Overall performance of the GPU-accelerated Incomplete Cholesky (IC) factorization preconditioned CG method can outperform its CPU counterpart by a smaller factor, up to 3, and GPU-accelerated The incomplete LU (ILU) factorization preconditioned GMRES method can achieve a speed-up nearing 4. However, with better suited preconditioning techniques for GPUs, this performance can be further improved.  相似文献   

2.
A general method in the form of an accelerated preconditioned iterative refinement method (including some wellknown iterative methods and direct factorization methods) is presented for the solution of symmetric, sparse matrix problems. An analysis of one such approximate factorization, the SSOR method, is given, and some inherently advantageous properties of the conjugate gradient acceleration method are pointed out. A comparison is made of the computational complexity and storage in the SSOR preconditioned method with some direct methods applied to second order discretized boundary value problems. For plane problems of average size the direct methods are somewhat faster if enough right hand sides are present. For large enough problems (large number of nodes) the iterative method is faster. For three-dimensional problems no Cholesky factorization method can compete with the SSOR preconditioned method, not even for average sized problems.  相似文献   

3.
Scalability of Krylov subspace methods suffers from costly global synchronization steps that arise in dot-products and norm calculations on parallel machines. In this work, a modified preconditioned Conjugate Gradient (CG) method is presented that removes the costly global synchronization steps from the standard CG algorithm by only performing a single non-blocking reduction per iteration. This global communication phase can be overlapped by the matrix–vector product, which typically only requires local communication. The resulting algorithm will be referred to as pipelined CG. An alternative pipelined method, mathematically equivalent to the Conjugate Residual (CR) method that makes different trade-offs with regard to scalability and serial runtime is also considered. These methods are compared to a recently proposed asynchronous CG algorithm by Gropp. Extensive numerical experiments demonstrate the numerical stability of the methods. Moreover, it is shown that hiding the global synchronization step improves scalability on distributed memory machines using the message passing paradigm and leads to significant speedups compared to standard preconditioned CG.  相似文献   

4.
解大型稀疏线性方程组的一种有效并行ICCG法   总被引:6,自引:0,他引:6  
该文分析了不完全Cholesky分解预处理共轭梯度(ICCG)法各部分的计算量,给出了占ICCG法主要计算时间的解预处理方程的并行算法,它既有比目前迭代算法快的收敛速度,又有较好的并行度。  相似文献   

5.
The parallelizable block ILU (incomplete LU) factorization preconditioners for a block-tridiagonal matrix have been recently proposed by the author. In this paper, we describe a parallelization of Krylov subspace methods with the block ILU factorization preconditioners on distributed-memory computers such as the Cray T3E, and then parallel performance results of a preconditioned Krylov subspace method are provided to evaluate the effectiveness and efficiency of the block ILU preconditioners on the Cray T3E.  相似文献   

6.
针对基于GPU求解大规模稀疏线性方程组进行了研究,提出一种稀疏矩阵的分块存储格式HMEC(hybrid multiple ELL and CSR)。通过重排序优化系数矩阵的存储结构,将系数矩阵以一定的比例分块存储,采用ELL与CSR存储格式相结合的方式以适应不同的分块特征,分别使用适用于不对称矩阵的不完全LU分解预处理BICGStab法和对称正定矩阵的不完全Cholesky分解预处理共轭梯度法求解大规模稀疏线性系统。实验表明,应用HMEC格式存储稀疏矩阵并以调用GPU kernel的方式实现前述两种方法,与其他存储格式的实现方式作比较,最优可分别获得31.89%和17.50%的加速效果。  相似文献   

7.
In this paper a preconditioned iterative method suitable for the solution of the generalized eigenvalue problem Ax = λBx is presented. The proposed method is suitable for the determination of extreme eigenvalues and their corresponding eigenvectors of large sparse eigenproblems derived from the finite element discretization method. The solution is obtained through the minimization of the Rayleigh quotient by a preconditioned conjugate gradient (CG) method. The proposed triangular splitting preconditioner combines Evans' SSOR preconditioner with a drop-off tolerance criterion and is called partial preconditioner (PPR). The PPR is attractive in a large FE framework because it is simple and can be implemented at the element level, as opposed to incomplete Choleski preconditioners, which require a sparse assembly. Because of the renewed interest in CG techniques for FE work on microprocessors and parallel computers it is believed that this improved approach to the generalized eigenvalue problem, through the minimization of the Rayleigh quotient, is likely to be very promising.  相似文献   

8.
Normalized explicit approximate inverse matrix techniques for computing explicitly various families of normalized approximate inverses based on normalized approximate factorization procedures for solving sparse linear systems, which are derived from the finite difference and finite element discretization of partial differential equations are presented. Normalized explicit preconditioned conjugate gradient-type schemes in conjunction with normalized approximate inverse matrix techniques are presented for the efficient solution of linear and non-linear systems. Theoretical estimates on the rate of convergence and computational complexity of the normalized explicit preconditioned conjugate gradient method are also presented. Applications of the proposed methods on characteristic linear and non-linear problems are discussed and numerical results are given.  相似文献   

9.
基于四阶紧致格式对三维对流扩散方程进行离散,并给出所得到的离散线性方程组的块三角稀疏矩阵形式。以带双阈值的不完全因子化LU分解[(ILUT(τ,s))]作为预条件子,分别用FGMRES、BICGSTAB和TFQMR作为迭代加速器,对离散线性方程组进行求解验证了格式精度并比较了不同迭代法的CPU时间和迭代步。此外,通过比较传统迭代法和预条件迭代法的计算效率,表明预条件迭代法不仅能够保证格式的四阶精度,还能极大地提高收敛效率。  相似文献   

10.
热传导方程在地下水流动数值模拟、油藏数值模拟等工程计算中有着广泛应用,其并行实现是加速问题求解速度、提高问题求解规模的重要手段,因此热传导方程的并行求解具有重要意义。对Krylov子空间方法中的CG和GMRES算法进行并行分析,并对不同的预处理CG算法作了比较。在Linux集群系统上,以三维热传导模型为例进行了数值实验。实验结果表明,CG算法比GMRES算法更适合建立三维热传导模型的并行求解。此外,CG算法与BJACOBI预条件子的整合在求解该热传导模型时,其并行程序具有良好的加速比和效率。因此,采用BJACOBI预处理技术的CG算法是一种较好的求解三维热传导模型的并行方案。  相似文献   

11.
一种LU分解与迭代法的结合策略及算法实现   总被引:3,自引:1,他引:3  
在矩阵求解算法中,直接法或迭代法都不能有效地求解大规模稀疏或病态矩阵,因此提出一种LU分解与迭代法结合的策略。采用LU分解对矩阵进行预处理,以提高迭代法的收敛性,并采用一种判断策略使矩阵的LU分解结果可最大限度地重复利用。此结合策略应用于两种共轭梯度(CG)法,得到CLUCG和CLUTCG两种算法。它们已应用于模拟和混合信号电路模拟器ZeniVDE中。大量实验结果表明此结合策略是很有效的,得到的两种算法具有较快的速度和较好的收敛性。  相似文献   

12.
A new class of approximate inverse arrow-type matrix techniques based on the concept of sparse approximate LU-type factorization procedures is introduced for computing explicitly approximate inverses without inverting the decomposition factors. Isomorphic methods in conjunction with explicit preconditioned schemes based on approximate inverse matrix techniques are presented for the efficient solution of arrow-type linear systems. Applications of the proposed method on linear systems is discussed and numerical results are given  相似文献   

13.
细观数值模拟是混凝土性能研究的一种重要手段,但稀疏线性方程组求解在总体模拟时间中所占比重很大。由于属于三维问题,且规模很大,所以采用预条件Krylov子空间迭代是必由之路。Aztec是国际上专门设计用于求解稀疏线性方程组的软件包之一,由于目前混凝土细观数值模拟中的稀疏线性方程组对称正定,所以利用Aztec中提供的CG迭代法进行求解,并对多种能保持对称性的预条件选项进行了实验比较。结果表明,在基于区域分解的并行不完全Cholesky分解、无重叠对称化GS迭代、最小二乘等预条件技术中,第一种的效率最高,且在重叠度为0,填充层次为0时,效果最好;实验结果还表明,在本应用问题中,用RCM排序一般导致求解时间更长,从而没有必要采用。  相似文献   

14.
《国际计算机数学杂志》2012,89(1-4):341-364
An architecture of a parallel computing system for matrix computations based on a systolic array of processors is considered and a version of incomplete block-factorizations of sparse matrices that arise in finite difference solution of three dimensional elliptic differential equations of second order is studied. The parallelization, both of factorization and solution processes, is investigated and the implementation of a stationary preconditioned iterative method on the considered computer architecture is described. Representative numerical tests for the efficiency of the consecutive version of the method for solving three-dimensional finite difference approximations to the Poisson equation are presented.  相似文献   

15.
在超分辨率影像重建中,基于最大后验估计(MAP)框架的重建方法具有较大的优势,应用非常广泛。然而,常用的迭代求解方法如最速下降法、共轭梯度法等收敛速度慢、处理时间长,经常难以满足实际处理的需要。该文在MAP框架的基础上,提出了基于不完全乔莱斯基分解预优共轭梯度的模型求解方法,即在迭代求解过程中利用不完全乔莱斯基分解构造预优矩阵,降低系数矩阵的条件数,从而提高收敛速度,节省处理时间。实验结果证明,该方法是有效的、可行的。  相似文献   

16.
A parallel preconditioned conjugate gradient (PCG) algorithm is derived by using a two-level twisted factorization. The algorithm is optimal for a four-processor parallel computer (quadputer). Numerical results obtained on a quadputer with distributed memory are presented.  相似文献   

17.
The performance of conjugate gradient (CG) algorithms for the solution of the system of linear equations that results from the finite-differencing of the neutron diffusion equation was analyzed on SIMD, MIMD, and mixed-mode parallel machines. A block preconditioner based on the incomplete Cholesky factorization was used to accelerate the conjugate gradient search. The issues involved in mapping both the unpreconditioned and preconditioned conjugate gradient algorithms onto the mixed-mode PASM prototype, the SIMD MasPar MP-1, and the MIMD Intel Paragon XP/S are discussed. On PASM , the mixed-mode implementation outperformed either SIMD or MIMD alone. Theoretical performance predictions were analyzed and compared with the experimental results on the MasPar MP-1 and the Paragon XP/S. Other issues addressed include the impact on execution time of the number of processors used, the effect of the interprocessor communication network on performance, and the relationship of the number of processors to the quality of the preconditioning. Applications studies such as this are necessary in the development of software tools for mapping algorithms onto either a single parallel machine or a heterogeneous suite of parallel machines.  相似文献   

18.
Direct solvers are commonly used in implicit finite element codes for structural dynamics problems. This study explores an alternative approach to solving the resulting linear systems by using preconditioned iterative schemes such as the preconditioned conjugate gradient algorithm and its variants. Preconditioners used in this study include approximate Cholesky factorization, block Jacobi, and the symmetric Gauss-Seidel over-relaxation scheme. The effects of various preconditioners and ordering schemes on the solution time and required storage are investigated. Performance results of these iterative solver are compared against the MA57 direct solver routine of the commercial Harwell Software Library.  相似文献   

19.
Preconditioned iterative methods of conjugate gradient type for solving elliptic and parabolic problems discretized on grids wth local refinement are considered. The sparsity pattern of the residuals computed throughout the iterative process is investigated. It turns out that they are nonzero only near the interface nodes between the coarse-and fine-grids. This observation is used to formulate the preconditioned CG, and when the matrix is not symmetric as in the parabolic case—the generalized CG and GMRES methods, thus substantially saving storage and computation.  相似文献   

20.
In this paper, we made an attempt to establish the usefulness of Lanczos solver with preconditioning technique over the preconditioned Conjugate Gradient (CG) solvers. We have presented here a detail comparative study with respect to convergence, speed as well as CPU-time, by considering appropriate boundary value problems.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号