首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
The need for accuracy in the solution of linear systems derived from the discretization of partial differential equations leads to large sparse linear systems. The solution of sparse linear systems requires efficient scalable methods. Iterative solvers require efficient parallel preconditioning methods to solve effectively sparse linear systems. Herewith, a new parallel algorithm for the generic approximate sparse inverse matrix method for distributed memory systems is proposed. The computation of the distributed generic approximate sparse inverse matrix is based on a column-wise approach, which allows the separation to independent problems that can be handled in parallel without synchronization points or intermediate communications. This is achieved by reforming the generic approximate sparse inverse matrix algorithm and its process of computation with a new partial solution method for the computation of the nonzero elements of each column dictated by the approximate inverse sparsity pattern. Moreover, an algorithmic scheme is proposed for the efficient distribution of data amongst the available workstations, along with a load balancing scheme for problems with large standard deviation in the number of nonzero elements per column. Numerical results are presented for the proposed schemes for various model problems.  相似文献   

2.
Methods for the solution of sparse eigenvalue problems that are based on spectral projectors and contour integration have recently attracted more and more attention. Such methods require the solution of many shifted sparse linear systems of full size. In most of the literature concerning these eigenvalue solvers, only few words are said on the solution of the linear systems, but they turn out to be very hard to solve by iterative linear solvers in practice. In this work we identify a row projection method for the solution of the inner linear systems encountered in the FEAST algorithm and introduce a novel hybrid parallel and fully iterative implementation of the eigenvalue solver. Our approach ultimately aims at achieving extreme parallelism by exploiting the algorithm’s potential on several levels. We present numerical examples where graphene modeling is one of the target applications. In this application, several hundred or even thousands of eigenvalues from the interior of the spectrum are required, which is a big challenge for state-of-the-art numerical methods.  相似文献   

3.
Modelling the interaction of an acoustic field in a fluid and an elastic structure submerged in the fluid leads to a system of complex linear equations with a complicated sparsity structure and, for higher wavenumbers and adequate modelling, the systems are very large. Direct methods are not practical. Preconditioned iterative methods, which are suitable for single operator equations, are not immediately applicable to the coupled case. This article proposes a block diagonal preconditioner of the sparse approximate inverse (SPAI) type that can accelerate the convergence of Krylov iterative solvers for the coupled system. Moreover, the proposed preconditioner can properly and implicitly scale the coupled matrix. Some numerical results are presented to demonstrate the effectiveness of the new method.  相似文献   

4.
B. Carpentieri 《Computing》2006,77(3):275-296
In this paper, we describe a matrix-free iterative algorithm based on the GMRES method for solving electromagnetic scattering problems expressed in an integral formulation. Integral methods are an interesting alternative to differential equation solvers for this problem class since they do not require absorbing boundary conditions and they mesh only the surface of the radiating object giving rise to dense and smaller linear systems of equations. However, in realistic applications the discretized systems can be very large and for some integral formulations, like the popular Electric Field Integral Equation, they become ill-conditioned when the frequency increases. This means that iterative Krylov solvers have to be combined with fast methods for the matrix-vector products and robust preconditioning to be affordable in terms of CPU time. In this work we describe a matrix-free two-grid preconditioner for the GMRES solver combined with the Fast Multipole Method. The preconditioner is an algebraic two-grid cycle built on top of a sparse approximate inverse that is used as smoother, while the grid transfer operators are defined using spectral information of the preconditioned matrix. Experiments on a set of linear systems arising from real radar cross section calculation in industry illustrate the potential of the proposed approach for solving large-scale problems in electromagnetism.  相似文献   

5.
Lyapunov and Stein matrix equations arise in many important analysis and synthesis applications in control theory. The traditional approach to solving these equations relies on the QR algorithm which is notoriously difficult to parallelize. We investigate iterative solvers based on the matrix sign function and the squared Smith iteration which are highly efficient on parallel distributed computers. We also show that by coding using the Parallel Linear Algebra Package (PLAPACK) it is possible to exploit the structure in the matrices and reduce the cost of these solvers. While the performance improvements due to the optimizations are modest, so is the coding effort. One of the optimizations, the updating of a QR factorization, has important applications elsewhere, e.g., in applications requiring the solution of a linear least-squares problem when the linear system is periodically updated. The experimental results on a Cray T3E attest to the high efficiency of these parallel solvers.  相似文献   

6.
The hypre software library is being developed with the aim of providing scalable solvers for the solution of large, sparse linear systems on massively parallel computers. To this end, the notion of conceptual interfaces was introduced. These interfaces give applications users a more natural means for describing their linear systems, and provide access to methods such as geometric multigrid which require additional information beyond just the matrix. This paper discusses the design of the conceptual interfaces in hypre and illustrates their use with various examples. A brief overview of the solvers and preconditioners available through these interfaces is also given.  相似文献   

7.
Interval Newton/Generalized Bisection methods reliably find all numerical solutions within a given domain. Both computational complexity analysis and numerical experiments have shown that solving the corresponding interval linear system generated by interval Newton's methods can be computationally expensive (especially when the nonlinear system is large). In applications, many large-scale nonlinear systems of equations result in sparse interval jacobian matrices. In this paper, we first propose a general indexed storage scheme to store sparse interval matrices We then present an iterative interval linear solver that utilizes the proposed index storage scheme It is expected that the newly proposed general interval iterative sparse linear solver will improve the overall performance for interval Newton/Generalized bisection methods when the jacobian matrices are sparse. In section 1, we briefly review interval Newton's methods. In Section 2, we review some currently used storage schemes for sparse systems. In Section 3, we introduce a new index scheme to store general sparse matrices. In Section 4, we present both sequential and parallel algorithms to evaluate a general sparse Jacobian matrix. In Section 5, we present both sequential and parallel algorithms to solve the corresponding interval linear system by the all-row preconditioned scheme. Conclusions and future work are discussed in Section 6.  相似文献   

8.
基于曙光并行机的超大规模非线性方程组并行算法研究   总被引:8,自引:0,他引:8  
该文讨论了一类求解大规模非线性方程组算法的并行性能及其在曙光并行机上的实现过程,与传统的算法不同之处是用一个块对角矩阵作为迭代矩阵,且该矩阵可由一个仅包含向量内积和矩阵与向量乘积的递推关系简便计算得到,在对算法进行描述之后,分析了算法的并行加速比和存储需求,讨论了算法在基于消息传递的MPI并行环境下的实现流程,数值计算表明理论分析与数值结果相比,算法在分布式并行环境下具有有较好的并行主攻较低的存储要求,可适用于大规模科学与工程的高性能计算。  相似文献   

9.
线性系统求解中迭代算法的GPU加速方法   总被引:1,自引:0,他引:1  
在求解线性系统时,迭代法是一种基本的方法,特别是在系数矩阵为大规模稀疏矩阵的情况下,高效地使用迭代法求解变得十分重要。本文通过分析迭代法的一般特点,提出了使用具有强大计算能力和存储带宽的GPU加速迭代法的一般方法。利用这些方法,在两种主流GPU平台上实现了一个经典的迭代法PQMRCGSTAB,并且针对不同的GPU平台特点提出了具体的优化方法。与AMD Opteron 2.4GHz 4核处理器相比,双精度版本的PQMRCGSTAB算法经NVIDIA Tesla S1070加速后性能提高31倍,经AMD Radeon HD 4870 X2加速后性能提高9倍。  相似文献   

10.
11.
大规模有限元刚度矩阵存储及其并行求解算法   总被引:1,自引:0,他引:1  
本文提出一种将有限元单元刚度矩阵直接集成压缩格式的总体刚度矩阵的方法,并针对其线性系统设计了预处理的重启动GMRES(m)并行求解器.集成方法使用了一个“关联结点”的数据结构,它用来记录网格中节点的关联信息,作为集成过程的中间媒介.这种方法能减少大量的存储空间,简单且高效.求解器分别使用Jacobi和稀疏近似逆(SPAI)预条件子.二维和三维弹性力学问题的数值试验表明,在二维情形下,SPAI预条件子具有很好的加速收敛效果和并行效率;在三维情形下,Jacobi预条件子更能减少迭代收敛时间.  相似文献   

12.
Solving large, sparse, linear systems of equations is a fundamental problems in large scale scientific and engineering computation. A model of a general class of asynchronous, iterative solution methods for linear systems is developed. In the model, the system is solved by creating several cooperating tasks that each compute a portion of the solution vector. A data transfer model predicting both the probability that data must be transferred between two tasks and the amount of data to be transferred is presented. This model is used to derive an execution time model for predicting parallel execution time and an optimal number of tasks given the dimension and sparsity of the coefficient matrix and the costs of computation, synchronization, and communication.The suitability of different parallel architectures for solving randomly sparse linear systems is discussed. Based on the complexity of task scheduling, one parallel architecture, based on a broadcast bus, is presented and analyzed.  相似文献   

13.
We present the novel parallel linear least squares solvers ARPLS-IR and ARPLS-MPIR for dense overdetermined linear systems. All internode communication of our ARPLS solvers arises in the context of all-reduce operations across the parallel system and therefore they benefit directly from efficient implementations of such operations. Our approach is based on the semi-normal equations, which are in general not backward stable. However, the method is stabilised by using iterative refinement. We show that performing iterative refinement in mixed precision also increases the parallel performance of the algorithm. We consider different variants of the ARPLS algorithm depending on the conditioning of the problem and in this context also evaluate the method of normal equations with iterative refinement. For ill-conditioned systems, we demonstrate that the semi-normal equations with standard iterative refinement achieve the best accuracy compared to other parallel solvers.We discuss the conceptual advantages of ARPLS-IR and ARPLS-MPIR over alternative parallel approaches based on QR factorisation or the normal equations. Moreover, we analytically compare the communication cost to an approach based on communication-avoiding QR factorisation. Numerical experiments on a high performance cluster illustrate speed-ups up to 3820 on 2048 cores for ill-conditioned tall and skinny matrices over state-of-the-art solvers from DPLASMA or ScaLAPACK.  相似文献   

14.
The solution of large, sparse linear systems is often a dominant phase of computation for simulations based on partial differential equations, which are ubiquitous in scientific and engineering applications. While preconditioned Krylov methods are widely used and offer many advantages for solving sparse linear systems that do not have highly convergent, geometric multigrid solvers or specialized fast solvers, Krylov methods encounter well-known scaling difficulties for over 10,000 processor cores because each iteration requires at least one vector inner product, which in turn requires a global synchronization that scales poorly because of internode latency. To help overcome these difficulties, we have developed hierarchical Krylov methods and nested Krylov methods in the PETSc library that reduce the number of global inner products required across the entire system (where they are expensive), though freely allow vector inner products across smaller subsets of the entire system (where they are inexpensive) or use inner iterations that do not invoke vector inner products at all.  相似文献   

15.
《Parallel Computing》1999,25(13-14):1931-1970
In this review paper, we consider some important developments and trends in algorithm design for the solution of linear systems concentrating on aspects that involve the exploitation of parallelism. We briefly discuss the solution of dense linear systems, before studying the solution of sparse equations by direct and iterative methods. We consider preconditioning techniques for iterative solvers and discuss some of the present research issues in this field.  相似文献   

16.
There are verities of useful Krylov subspace methods to solve nonsymmetric linear system of equations. GMRES is one of the best Krylov solvers with several different variants to solve large sparse linear systems. Any GMRES implementation has some advantages. As the solution of ill-posed problems are important. In this paper, some GMRES variants are discussed and applied to solve these kinds of problems. Residual smoothing techniques are efficient ways to accelerate the convergence speed of some iterative methods like CG variants. At the end of this paper, some residual smoothing techniques are applied for different GMRES methods to test the influence of these techniques on GMRES implementations.  相似文献   

17.
A finite element method often leads to large sparse symmetric and positive definite systems of linear equations. We consider parallel solvers based on the Schur complement method on homogeneous parallel machines with distributed memory. A finite element mesh is partitioned by graph partitioning. Such partitioning results in submeshes with similar numbers of elements and, consequently, submatrices of similar sizes. The submatrices are partially factorised. The time spent on the partial factorisation can be different, i.e., disbalanced, because methods exploiting the sparsity of submatrices are used. This paper proposes a Quality Balancing heuristic that modifies classic mesh partitioning so that the partial factorisation times are balanced, which saves overall computation time, especially for time dependent mechanical and nonstationary transport problems.  相似文献   

18.
块三对角矩阵局部块分解及其在预条件中的应用   总被引:3,自引:1,他引:3  
该文利用块三对阵角阵分解因子的估值分析了其局部依赖性,并用其构了一类不完全分解型预条件子,给出了五点差分矩阵预条件后的条件数估计,并比较了条件数估计值与实际值,表明了估计值的准确性与预备件的有效性,在具体实现时,考虑了预条件的6个串行实现方案并提出了一个有效的并行化方法,该并行算法具有通信量少的特点,最后在由4中微机通过高速以太网连成的机群系统上作了大量数值实验,并将其与其它较效的预条件方法进行了。结果表明该预条件方法效果较好,尤其适用于并行计算。  相似文献   

19.
In this paper we describe general software utilities for performing unstructured sparse matrix–vector multiplications on distributed-memory message-passing computers. The matrix–vector multiply comprises an important kernel in the solution of large sparse linear systems by iterative methods. Our focus is to present the data structures and communication parameters required by these utilities for general sparse unstructured matrices with data locality. These types of matrices are commonly produced by finite difference and finite element approximations to systems of partial differential equations. In this discussion we also present representative examples and timings which demonstrate the utility and performance of the software. © 1998 John Wiley & Sons, Ltd.  相似文献   

20.
《Parallel Computing》1997,23(3):381-398
Conjugate gradient (CG) methods to solve sparse systems of linear equations play an important role in numerical methods for solving discretized partial differential equations. The large size and the condition of many technical or physical applications in this area result in the need for efficient parallelization and preconditioning techniques of the CG method, in particular on massively parallel machines. Here, the data distribution and the communication scheme for the sparse matrix operations of the preconditioned CG are based on the analysis of the indices of the non-zero elements. Polynomial preconditioning is shown to reduce global synchronizations considerably, and a fully local incomplete Cholesky preconditioner is presented. On a PARAGON XP/S 10 with 138 processors, the developed parallel methods outperform diagonally scaled CG markedly with respect to both scaling behavior and execution time for many matrices from real finite element applications.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号