首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
The hypre software library is being developed with the aim of providing scalable solvers for the solution of large, sparse linear systems on massively parallel computers. To this end, the notion of conceptual interfaces was introduced. These interfaces give applications users a more natural means for describing their linear systems, and provide access to methods such as geometric multigrid which require additional information beyond just the matrix. This paper discusses the design of the conceptual interfaces in hypre and illustrates their use with various examples. A brief overview of the solvers and preconditioners available through these interfaces is also given.  相似文献   

2.
Interval arithmetic can be a useful tool in soft computing. However, working with intervals requires specialized algorithms and appropriate data structures. In this paper, we give an overview of the C++ library C-XSC, which provides many useful data types and functions for (verified) scientific computing, with a special focus on interval arithmetic. We describe its basic features and focus especially on some recent new features that significantly broaden the range of uses of C-XSC, such as dot products in K-fold double precision, sparse data types and BLAS support. In a second section, we describe an application of C-XSC in soft computing in the form of an interface between C-XSC and MATLAB for the DSI-Toolbox, which combines Dempster–Shafer theory to model uncertain data with verified interval arithmetic. For some applications, utilizing C-XSC in MATLAB (or as a standalone) can be more effective than the widely used INTLAB extension for MATLAB, due to the lack of interpretation overhead. As an example, we use C-XSC for the normalization function in the DSI-Toolbox and compare the results and performance to those provided by INTLAB. Using C-XSC in MATLAB also extends the functionality of MATLAB/INTLAB. As an example, the interval error function of C-XSC (this function is not provided by INTLAB) is utilized via a MEX-interface for the verified sampling of the cumulative normal distribution.  相似文献   

3.
In optimization routines used for on-line Model Predictive Control (MPC), linear systems of equations are solved in each iteration. This is true both for Active Set (AS) solvers as well as for Interior Point (IP) solvers, and for linear MPC as well as for nonlinear MPC and hybrid MPC. The main computational effort is spent while solving these linear systems of equations, and hence, it is of great interest to solve them efficiently. In high performance solvers for MPC, this is performed using Riccati recursions or generic sparsity exploiting algorithms. To be able to get this performance gain, the problem has to be formulated in a sparse way which introduces more variables. The alternative is to use a smaller formulation where the objective function Hessian is dense. In this work, it is shown that it is possible to exploit the structure also when using the dense formulation. More specifically, it is shown that it is possible to efficiently compute a standard Cholesky factorization for the dense formulation. This results in a computational complexity that grows quadratically in the prediction horizon length instead of cubically as for the generic Cholesky factorization.  相似文献   

4.
A simple direct method and its variants, based on matrix multiplications, to invert square non-singular matrices (need not be positive definite) and to solve systems of linear equations are developed. Since the method and its variants involve only matrix multiplications they are straightforward to parallelise and hence have an advantage over some existing well known direct methods. Theoretical background, analysis of the proposed method and the complexity of the algorithm for sparse and dense matrices are given. Two significantly different parallel algorithms for dense and sparse matrices are developed. In the case of the dense matrix standard parallelisation of matrix multiplications was undertaken. However, for sparse matrices the standard parallel matrix multiplication is not effective which led us to develop a parallel algorithm different from that for the dense matrix. The performances of our algorithms are tested via numerical examples.  相似文献   

5.
《Parallel Computing》1999,25(13-14):1931-1970
In this review paper, we consider some important developments and trends in algorithm design for the solution of linear systems concentrating on aspects that involve the exploitation of parallelism. We briefly discuss the solution of dense linear systems, before studying the solution of sparse equations by direct and iterative methods. We consider preconditioning techniques for iterative solvers and discuss some of the present research issues in this field.  相似文献   

6.
Methods for the solution of sparse eigenvalue problems that are based on spectral projectors and contour integration have recently attracted more and more attention. Such methods require the solution of many shifted sparse linear systems of full size. In most of the literature concerning these eigenvalue solvers, only few words are said on the solution of the linear systems, but they turn out to be very hard to solve by iterative linear solvers in practice. In this work we identify a row projection method for the solution of the inner linear systems encountered in the FEAST algorithm and introduce a novel hybrid parallel and fully iterative implementation of the eigenvalue solver. Our approach ultimately aims at achieving extreme parallelism by exploiting the algorithm’s potential on several levels. We present numerical examples where graphene modeling is one of the target applications. In this application, several hundred or even thousands of eigenvalues from the interior of the spectrum are required, which is a big challenge for state-of-the-art numerical methods.  相似文献   

7.
Exascale computers are expected to have highly hierarchical architectures with nodes composed by multiple core processors (CPU; central processing unit) and accelerators (GPU; graphics processing unit). The different programming levels generate new difficult algorithm issues. In particular when solving extremely large linear systems, new programming paradigms of Krylov methods should be defined and evaluated with respect to modern state of the art of scientific methods. Iterative Krylov methods involve linear algebra operations such as dot product, norm, addition of vectors and sparse matrix–vector multiplication. These operations are computationally expensive for large size matrices. In this paper, we aim to focus on the best way to perform effectively these operations, in double precision, on GPU in order to make iterative Krylov methods more robust and therefore reduce the computing time. The performance of our algorithms is evaluated on several matrices arising from engineering problems. Numerical experiments illustrate the robustness and accuracy of our implementation compared to the existing libraries. We deal with different preconditioned Krylov methods: Conjugate Gradient for symmetric positive-definite matrices, and Generalized Conjugate Residual, Bi-Conjugate Gradient Conjugate Residual, transpose-free Quasi Minimal Residual, Stabilized BiConjugate Gradient and Stabilized BiConjugate Gradient (L) for the solution of sparse linear systems with non symmetric matrices. We consider and compare several sparse compressed formats, and propose a way to implement effectively Krylov methods on GPU and on multicore CPU. Finally, we give strategies to faster algorithms by auto-tuning the threading design, upon the problem characteristics and the hardware changes. As a conclusion, we propose and analyse hybrid sub-structuring methods that should pave the way to exascale hybrid methods.  相似文献   

8.
The solution of large, sparse linear systems is often a dominant phase of computation for simulations based on partial differential equations, which are ubiquitous in scientific and engineering applications. While preconditioned Krylov methods are widely used and offer many advantages for solving sparse linear systems that do not have highly convergent, geometric multigrid solvers or specialized fast solvers, Krylov methods encounter well-known scaling difficulties for over 10,000 processor cores because each iteration requires at least one vector inner product, which in turn requires a global synchronization that scales poorly because of internode latency. To help overcome these difficulties, we have developed hierarchical Krylov methods and nested Krylov methods in the PETSc library that reduce the number of global inner products required across the entire system (where they are expensive), though freely allow vector inner products across smaller subsets of the entire system (where they are inexpensive) or use inner iterations that do not invoke vector inner products at all.  相似文献   

9.
A.  U.  M.  K. 《Future Generation Computer Systems》2005,21(8):1275-1284
For the solution of sparse linear systems from circuit simulation whose coefficient matrices include a few dense rows and columns, a parallel iterative algorithm with distributed Schur complement preconditioning is presented. The parallel efficiency of the solver is increased by transforming the equation system into a problem without dense rows and columns as well as by exploitation of parallel graph partitioning methods. The costs of local, incomplete LU decompositions are decreased by fill-in reducing reordering methods of the matrix and a threshold strategy for the factorization. The efficiency of the parallel solver is demonstrated with real circuit simulation problems on PC clusters.  相似文献   

10.
The parallelization of sophisticated applications has dramatically increased in recent years. As machine capabilities rise, greater emphasis on modeling complex phenomena can be expected. Many of these applications require the solution of large sparse matrix equations which approximate systems of partial differential equations (PDEs). Therefore we consider parallel iterative solvers for large sparse non-symmetric systems and issues related to parallel sparse matrix software. We describe a collection of parallel iterative solvers which use a distributed sparse matrix format that facilitates the interface between specific applications and a variety of Krylov subspace techniques and multigrid methods. These methods have been used to solve a number of linear and non-linear PDE problems on a 1024-processor NCUBE 2 hypercube. Over 1 Gflop sustained computation rates are achieved with many of these solvers, demonstrating that high performance can be attained even when using sparse matrix data structures.  相似文献   

11.
In this paper we describe general software utilities for performing unstructured sparse matrix–vector multiplications on distributed-memory message-passing computers. The matrix–vector multiply comprises an important kernel in the solution of large sparse linear systems by iterative methods. Our focus is to present the data structures and communication parameters required by these utilities for general sparse unstructured matrices with data locality. These types of matrices are commonly produced by finite difference and finite element approximations to systems of partial differential equations. In this discussion we also present representative examples and timings which demonstrate the utility and performance of the software. © 1998 John Wiley & Sons, Ltd.  相似文献   

12.
This paper presents a collection of linear algebra subroutines for transputer networks. The developed pilot library is intended to form a basis of a complete parallel linear algebra library for validating computations, whose routines will deliver (as accurately as necessary) either the best possible result, or a corresponding inclusion based on controlled rounding and an optimal scalar product. So far, as a first step, we have produced code for interval arithmetic, scalar products and simple vector-matrix operations with maximum accuracy. For the solution of triangular systems and the LU decomposition of dense matrices, new versions of classical methods are implemented which allow the application of optimal scalar products as a single, invidisible operation and optimize the overlap of communication and computation. The library also contains routines for computing inclusions of unstructured, dense linear systems of equations. Network topology dependency is avoided in all numerical routines by the use of general communication routines. This way the user is able to work with different topologies like ring structures, binary trees and hypercubes.  相似文献   

13.
In this paper, we describe scalable parallel algorithms for symmetric sparse matrix factorization, analyze their performance and scalability, and present experimental results for up to 1,024 processors on a Gray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithms substantially improve the state of the art in parallel direct solution of sparse linear systems-both in terms of scalability and overall performance. It is a well known fact that dense matrix factorization scales well and can be implemented efficiently on parallel computers. In this paper, we present the first algorithms to factor a wide class of sparse matrices (including those arising from two- and three-dimensional finite element problems) that are asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures. Our algorithms incur less communication overhead and are more scalable than any previously known parallel formulation of sparse matrix factorization. Although, in this paper, we discuss Cholesky factorization of symmetric positive definite matrices, the algorithms can be adapted for solving sparse linear least squares problems and for Gaussian elimination of diagonally dominant matrices that are almost symmetric in structure. An implementation of one of our sparse Cholesky factorization algorithms delivers up to 20 GFlops on a Gray T3D for medium-size structural engineering and linear programming problems. To the best of our knowledge, this is the highest performance ever obtained for sparse Cholesky factorization on any supercomputer  相似文献   

14.
In this paper, we develop an efficient Petrov-Galerkin method for the generalized airfoil equation. In general, the Petrov-Galerkin method for this equation leads to a linear system with a dense coefficient matrix. When the order of the coefficient matrix is large, the complexity for solving the corresponding linear system is huge. For this purpose, we propose a matrix truncation strategy to compress the dense coefficient matrix into a sparse matrix. Subsequently, we use a numerical integration method to generate the fully discrete truncated linear system. At last we solve the corresponding linear system by the multilevel augmentation method. An optimal order of the approximate solution is preserved. The computational complexity for generating the fully discrete truncated linear system and solving it is estimated to be linear up to a logarithm. The spectral condition number of the truncated matrix is proved to be bounded. Numerical examples complete the paper. Supported by the Doctoral Foundation of Shandong Finance Institute.  相似文献   

15.
We present the novel parallel linear least squares solvers ARPLS-IR and ARPLS-MPIR for dense overdetermined linear systems. All internode communication of our ARPLS solvers arises in the context of all-reduce operations across the parallel system and therefore they benefit directly from efficient implementations of such operations. Our approach is based on the semi-normal equations, which are in general not backward stable. However, the method is stabilised by using iterative refinement. We show that performing iterative refinement in mixed precision also increases the parallel performance of the algorithm. We consider different variants of the ARPLS algorithm depending on the conditioning of the problem and in this context also evaluate the method of normal equations with iterative refinement. For ill-conditioned systems, we demonstrate that the semi-normal equations with standard iterative refinement achieve the best accuracy compared to other parallel solvers.We discuss the conceptual advantages of ARPLS-IR and ARPLS-MPIR over alternative parallel approaches based on QR factorisation or the normal equations. Moreover, we analytically compare the communication cost to an approach based on communication-avoiding QR factorisation. Numerical experiments on a high performance cluster illustrate speed-ups up to 3820 on 2048 cores for ill-conditioned tall and skinny matrices over state-of-the-art solvers from DPLASMA or ScaLAPACK.  相似文献   

16.
In this paper, we study the effect of the choice of mesh quality metric, preconditioner, and sparse linear solver on the numerical solution of elliptic partial differential equations (PDEs). We smooth meshes on several geometric domains using various quality metrics and solve the associated elliptic PDEs using the finite element method. The resulting linear systems are solved using various combinations of preconditioners and sparse linear solvers. We use the inverse mean ratio and radius ratio metrics in addition to conditioning-based scale-invariant and interpolation-based size-and-shape metrics. We employ the Jacobi, SSOR, incomplete LU, and algebraic multigrid preconditioners and the conjugate gradient, minimum residual, generalized minimum residual, and bi-conjugate gradient stabilized solvers. We focus on determining the most efficient quality metric, preconditioner, and linear solver combination for the numerical solution of various elliptic PDEs with isotropic coefficients. We also investigate the effect of vertex perturbation and the effect of increasing the problem size on the number of iterations required to converge and on the solver time. In this paper, we consider Poisson’s equation, general second-order elliptic PDEs, and linear elasticity problems.  相似文献   

17.
We present a linear algebra framework for structured matrices and general optimization problems. The matrices and matrix operations are defined recursively to efficiently capture complex structures and enable advanced compiler optimization. In addition to common dense and sparse matrix types, we define mixed matrices, which allow every element to be of a different type. Using mixed matrices, the low‐ and high‐level structure of complex optimization problems can be encoded in a single type. This type is then analyzed at compile time by a recursive linear solver that picks the optimal algorithm for the given problem. For common computer vision problems, our system yields a speedup of 3–5 compared to other optimization frameworks. The BLAS performance is benchmarked against the MKL library. We achieve a significant speedup in block‐SPMV and block‐SPMM. This work is implemented and released open‐source as a header‐only extension to the C+ + math library Eigen.  相似文献   

18.
We develop a novel preconditioning strategy, based on a non-standard, Discrete Wavelet Transform (DWT), for the dense, non-symmetric, linear systems that must be solved when Newton's method is used in the solution of Elastohydrodynamic Lubrication (EHL) problems. Simple band preconditioners and sparse preconditioners based on standard DWT have been found to be of limited value for EHL problems, since they may be singular, give poor convergence or be expensive to apply. We present algorithms for preconditioner design based on detecting non-smooth diagonal bands within an otherwise smooth matrix and applying a non-standard DWT to compress the part of the matrix away from the band. We illustrate, by numerical examples, the improvements that can be made when our methods are used.  相似文献   

19.
线性系统求解中迭代算法的GPU加速方法   总被引:1,自引:0,他引:1  
在求解线性系统时,迭代法是一种基本的方法,特别是在系数矩阵为大规模稀疏矩阵的情况下,高效地使用迭代法求解变得十分重要。本文通过分析迭代法的一般特点,提出了使用具有强大计算能力和存储带宽的GPU加速迭代法的一般方法。利用这些方法,在两种主流GPU平台上实现了一个经典的迭代法PQMRCGSTAB,并且针对不同的GPU平台特点提出了具体的优化方法。与AMD Opteron 2.4GHz 4核处理器相比,双精度版本的PQMRCGSTAB算法经NVIDIA Tesla S1070加速后性能提高31倍,经AMD Radeon HD 4870 X2加速后性能提高9倍。  相似文献   

20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号