Similar Articles
20 similar articles found (search time: 15 ms)
1.
Tian Min, Wang Junjie, Zhang Zanjun, Du Wei, Pan Jingshan, Liu Tao. The Journal of Supercomputing, 2022, 78(9): 11441-11463
Sparse LU factorization is essential for scientific and engineering simulations. In this work, we present swSuperLU, a highly scalable sparse direct solver on Sunway...

2.
3.
In this paper, an efficient method for the solution of matrix equations resulting from the flow of water in a pipe network is presented. The method is found to be rapidly convergent and computationally efficient, and the results obtained compare favourably with those found in the literature.

4.
In this work, efficient algorithms for sparse computations (reordering algorithms, storage schemes, symbolic factorization, master degree-of-freedom, L1D1U numerical factorization, forward and backward solutions) are developed and integrated into the proposed procedures. In order to exploit fast saxpy operations offered by many vector computers, take advantage of available cache in many workstations, and minimize data movements into fast memory, special storage schemes are designed to store the coefficient (unsymmetrical) matrix. Thus, the upper triangular portion of the coefficient matrix is stored in a compressed sparse row format, while the lower triangular portion of the same matrix is stored in a compressed sparse column format. A reordering algorithm is applied to one portion of the matrix to minimize fill-ins. Unsymmetrical matrix–vector multiplication has also been sparsely computed for “error-norm check” purposes. The entire sparse procedures have been coded in standard Fortran.
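A minimal Python/SciPy sketch of the split storage described above, with the upper triangle kept in compressed sparse row (CSR) form and the strictly lower triangle in compressed sparse column (CSC) form; the matrix, sizes, and names are illustrative assumptions, not taken from the paper.

```python
import numpy as np
import scipy.sparse as sp

# Illustrative unsymmetric sparse matrix (not taken from the paper).
A = sp.random(6, 6, density=0.4, format="csr", random_state=0) + sp.eye(6)

# Upper triangle (with the diagonal) in CSR, strictly lower triangle in CSC,
# mirroring the split storage scheme described in the abstract.
upper_csr = sp.triu(A, k=0, format="csr")
lower_csc = sp.tril(A, k=-1, format="csc")

def matvec(x):
    """Unsymmetric matrix-vector product assembled from the two stored parts."""
    return upper_csr @ x + lower_csc @ x

x = np.ones(6)
print(np.allclose(matvec(x), A @ x))  # True: the split storage reproduces A @ x
```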

5.
Solving large-scale triangular linear systems is an important computational kernel in scientific and engineering applications, but, constrained by processor cache capacity and architectural design, it achieves low computational efficiency on CPUs, GPUs, and similar platforms. In the blocked solution of large-scale triangular linear systems, matrix multiplication is the dominant operation, and its efficiency is critical to the efficiency of the overall triangular solve. Taking a matrix-multiplication coprocessor, on which matrix multiplication runs at high efficiency, as the computing platform, this paper proposes an implementation method and a performance model for the blocked solution of large-scale triangular linear systems tailored to the coprocessor's architectural characteristics. Experimental results show that the computational efficiency of solving large-scale triangular linear systems on the matrix-multiplication coprocessor reaches up to 85.9%, with actual performance and resource utilization 2.42 times and 10.72 times those of a GPU built with the same process technology, respectively.
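A minimal NumPy sketch of a blocked forward substitution of the kind described above, in which the off-diagonal block updates are the matrix products that dominate the work (and become true GEMMs once several right-hand sides are solved at once); the block size and data are illustrative assumptions, and the coprocessor-specific mapping is not modeled.

```python
import numpy as np
from scipy.linalg import solve_triangular

def blocked_forward_solve(L, b, block=64):
    """Solve L x = b (L lower triangular) block by block.

    The update x[i:ie] -= L[i:ie, :i] @ x[:i] is the matrix product that
    dominates the work and would be mapped to the coprocessor.
    """
    n = L.shape[0]
    x = b.astype(float).copy()
    for i in range(0, n, block):
        ie = min(i + block, n)
        if i > 0:
            x[i:ie] -= L[i:ie, :i] @ x[:i]          # GEMM-like update
        x[i:ie] = solve_triangular(L[i:ie, i:ie], x[i:ie], lower=True)
    return x

rng = np.random.default_rng(0)
L = np.tril(rng.standard_normal((300, 300))) + 300.0 * np.eye(300)
b = rng.standard_normal(300)
print(np.allclose(blocked_forward_solve(L, b), np.linalg.solve(L, b)))  # True
```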

6.
We present in this work stable methods for solving sparse systems of linear equations based on the LU factorization and its updating when columns and rows of the system matrix are added, deleted or modified.
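The abstract does not give the updating formulas. As a related, well-known illustration (not the authors' LU-updating method), the sketch below reuses an existing sparse LU factorization to solve a system whose matrix has received a rank-one modification, via the Sherman–Morrison identity; the data are illustrative.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# Original sparse system and its LU factorization (illustrative data).
A = (sp.random(50, 50, density=0.1, random_state=1) + 10.0 * sp.eye(50)).tocsc()
b = np.ones(50)
lu = splu(A)

# Rank-one modification A_new = A + u v^T (e.g. one changed entry or column).
u = np.zeros(50); u[3] = 2.0
v = np.zeros(50); v[7] = 1.0

# Sherman-Morrison: reuse the existing factors instead of refactorizing.
y = lu.solve(b)   # A^{-1} b
z = lu.solve(u)   # A^{-1} u
x_new = y - z * (v @ y) / (1.0 + v @ z)

A_new = A + sp.csc_matrix(np.outer(u, v))
print(np.allclose(A_new @ x_new, b))   # True
```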

7.
A simple direct method and its variants, based on matrix multiplications, to invert square non-singular matrices (which need not be positive definite) and to solve systems of linear equations are developed. Since the method and its variants involve only matrix multiplications, they are straightforward to parallelise and hence have an advantage over some existing well-known direct methods. Theoretical background, analysis of the proposed method and the complexity of the algorithm for sparse and dense matrices are given. Two significantly different parallel algorithms for dense and sparse matrices are developed. In the case of the dense matrix, standard parallelisation of matrix multiplications was undertaken. However, for sparse matrices the standard parallel matrix multiplication is not effective, which led us to develop a parallel algorithm different from that for the dense matrix. The performance of our algorithms is tested via numerical examples.
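As a concrete example of inverting a matrix with nothing but matrix multiplications, the sketch below uses the classical Newton–Schulz iteration; this is a well-known scheme of that type, not necessarily the method proposed in the paper.

```python
import numpy as np

def newton_schulz_inverse(A, iters=30):
    """Approximate A^{-1} using only matrix multiplications:
    X_{k+1} = X_k (2 I - A X_k), started so that the iteration converges."""
    n = A.shape[0]
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))  # safe start
    two_I = 2.0 * np.eye(n)
    for _ in range(iters):
        X = X @ (two_I - A @ X)
    return X

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8)) + 8.0 * np.eye(8)   # illustrative nonsingular matrix
print(np.allclose(newton_schulz_inverse(A) @ A, np.eye(8)))  # True
```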

8.
Interval Newton/generalized bisection methods reliably find all numerical solutions within a given domain. Both computational complexity analysis and numerical experiments have shown that solving the corresponding interval linear system generated by interval Newton's methods can be computationally expensive (especially when the nonlinear system is large). In applications, many large-scale nonlinear systems of equations result in sparse interval Jacobian matrices. In this paper, we first propose a general indexed storage scheme to store sparse interval matrices. We then present an iterative interval linear solver that utilizes the proposed indexed storage scheme. It is expected that the newly proposed general interval iterative sparse linear solver will improve the overall performance of interval Newton/generalized bisection methods when the Jacobian matrices are sparse. In Section 1, we briefly review interval Newton's methods. In Section 2, we review some currently used storage schemes for sparse systems. In Section 3, we introduce the new indexed scheme for storing general sparse matrices. In Section 4, we present both sequential and parallel algorithms to evaluate a general sparse Jacobian matrix. In Section 5, we present both sequential and parallel algorithms to solve the corresponding interval linear system by the all-row preconditioned scheme. Conclusions and future work are discussed in Section 6.
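The paper's indexed scheme is not reproduced in the abstract; the sketch below shows one plausible flat indexed layout for a sparse interval matrix (parallel row/column/lower-bound/upper-bound arrays) together with an interval matrix–vector product. The layout and names are assumptions made for illustration only.

```python
import numpy as np

# Indexed (COO-style) storage of a sparse interval matrix: for each stored
# entry we keep its row, column, and the interval [lo, hi] of possible values.
rows = np.array([0, 0, 1, 2, 2])
cols = np.array([0, 2, 1, 0, 2])
lo   = np.array([ 3.9, -1.1,  2.0,  0.9,  4.8])
hi   = np.array([ 4.1, -0.9,  2.2,  1.1,  5.2])

def interval_matvec(x, n=3):
    """Enclosure [y_lo, y_hi] of A x over all matrices A whose entries lie
    in the stored intervals (x is a point vector)."""
    y_lo = np.zeros(n)
    y_hi = np.zeros(n)
    for r, c, a, b in zip(rows, cols, lo, hi):
        p, q = a * x[c], b * x[c]
        y_lo[r] += min(p, q)
        y_hi[r] += max(p, q)
    return y_lo, y_hi

print(interval_matvec(np.array([1.0, -2.0, 0.5])))
```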

9.
In this paper, we aim at exploiting the computing power of a graphics processing unit (GPU) cluster for solving large sparse linear systems. We implement the parallel algorithm of the generalized minimal residual (GMRES) iterative method using the Compute Unified Device Architecture programming language and the MPI parallel environment. The experiments show that a GPU cluster is more efficient than a CPU cluster. In order to optimize the performance, we use a compressed storage format for the sparse vectors and hypergraph partitioning. These solutions improve the spatial and temporal locality of the data shared between the computing nodes of the GPU cluster.
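A single-node CPU sketch of the GMRES kernel on a compressed sparse row matrix using SciPy; the paper's implementation targets a GPU cluster with CUDA and MPI, which is not reproduced here, and the test matrix is an illustrative assumption.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import gmres

# Illustrative sparse system stored in compressed sparse row (CSR) format.
n = 1000
A = sp.diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x, info = gmres(A, b, atol=1e-10)          # info == 0 means convergence
print(info, np.linalg.norm(A @ x - b))
```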

10.
A coarse-grain parallel solver for systems of linear algebraic equations with general sparse matrices by Gaussian elimination is discussed. Before the factorization two other steps are performed. A reordering algorithm is used during the first step in order to obtain a permuted matrix with as many zero elements under the main diagonal as possible. During the second step the reordered matrix is partitioned into blocks for asynchronous parallel processing (normally the number of blocks is equal to the number of processors). It is possible to obtain blocks with nearly the same number of rows, because there is no requirement to produce square diagonal blocks. The first step is much more important than the second one and has a significant influence on the performance of the solver. A straightforward implementation of the reordering algorithm will result in O(n^2) operations. By using binary trees this cost can be reduced to O(NZ log n), where NZ is the number of non-zero elements in the matrix and n is its order (normally NZ is much smaller than n^2). Some experiments on parallel computers with shared memory have been performed. The results show that a solver based on the proposed reordering performs better than another solver based on a cheaper (but at the same time rather crude) reordering whose cost is only O(NZ) operations.
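The sketch below only illustrates the data-structure idea behind the O(NZ log n) bound: a binary heap lets the row with the fewest stored nonzeros be selected repeatedly at O(log n) per step instead of by an O(n) scan. The actual reordering in the paper also takes into account where the nonzeros fall relative to the diagonal, which this simplified stand-in does not.

```python
import heapq
import scipy.sparse as sp

def rows_by_nonzero_count(A):
    """Greedy row order: repeatedly pick the remaining row with the fewest
    stored nonzeros, using a binary heap (simplified stand-in for the paper's
    reordering, shown only to illustrate the heap-based selection)."""
    counts = A.getnnz(axis=1)                  # O(NZ) to obtain all row counts
    heap = [(c, i) for i, c in enumerate(counts)]
    heapq.heapify(heap)                        # O(n)
    order = []
    while heap:
        _, i = heapq.heappop(heap)             # O(log n) per selection
        order.append(i)
    return order

A = sp.random(8, 8, density=0.3, format="csr", random_state=3)
print(rows_by_nonzero_count(A))
```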

11.
12.
We propose two different algorithms which depend on the modified digraph approach for solving a sparse system of linear equations. The main feature of the algorithms is that the solution of a sparse system of linear equations can be expressed exactly if all the non-zero entries, including the right-hand side, are integers and if none of the products exceeds the size of the largest integer that can be represented in the arithmetic of the computer used. The implementation of the algorithms is tested on five problems. The results are compared with those obtained using an algorithm proposed earlier. It is shown that the efficiency with which a sparse system of linear equations can be analysed by a digital computer using the proposed modified digraph approach as a tool depends mainly on the efficiency with which semifactors and k-semifactors are generated. Finally, in our implementation of the proposed algorithms, the input sparse matrix is stored using a row-ordered list of a modified uncompressed storage scheme.
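As an assumed illustration of the exactness property mentioned above (not of the digraph/semifactor algorithms themselves), the sketch below solves a small integer system exactly by Gaussian elimination carried out in rational arithmetic, so no rounding occurs as long as the intermediate values stay representable.

```python
from fractions import Fraction

def solve_exact(A, b):
    """Exact solution of A x = b with integer data, via Gaussian elimination
    carried out in rational arithmetic (no rounding error)."""
    n = len(A)
    M = [[Fraction(A[i][j]) for j in range(n)] + [Fraction(b[i])] for i in range(n)]
    for k in range(n):
        piv = next(i for i in range(k, n) if M[i][k] != 0)   # nonzero pivot row
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

A = [[2, 0, 1], [0, 3, 0], [1, 0, 4]]
b = [5, 6, 9]
print(solve_exact(A, b))   # exact rational components: 11/7, 2, 13/7
```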

13.
Exploiting the large-scale sparsity and symmetry of the finite element global stiffness matrix, a fully sparse storage structure together with a minimum-fill-in ordering algorithm is adopted to minimize memory consumption. To save computation time, a symbolic LU decomposition is performed on the global stiffness matrix, which greatly reduces the data lookups required during the numerical solution phase. Combining the fully sparse storage structure with the symbolic LU decomposition substantially improves the efficiency of solving large sparse linear systems. Numerical examples show that the algorithm is advantageous in both time and storage, is reliable and efficient, and can be applied to solving finite element linear systems.
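A loose SciPy analogue of the factor-once, solve-many pattern behind the symbolic LU idea in the translated abstract above: the sparse matrix is factorized a single time and the factors are reused for several right-hand sides. SciPy does not expose the symbolic phase separately, and the stiffness-like matrix here is an illustrative assumption.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# Illustrative sparse matrix standing in for a global stiffness matrix.
n = 500
K = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")

lu = splu(K)                # fill-reducing ordering + factorization done once
for load_case in range(3):  # several right-hand sides reuse the same factors
    f = np.zeros(n)
    f[load_case] = 1.0
    u = lu.solve(f)
    print(load_case, np.linalg.norm(K @ u - f))   # residuals near machine precision
```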

14.
In this paper, we describe scalable parallel algorithms for symmetric sparse matrix factorization, analyze their performance and scalability, and present experimental results for up to 1,024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithms substantially improve the state of the art in parallel direct solution of sparse linear systems, both in terms of scalability and overall performance. It is a well-known fact that dense matrix factorization scales well and can be implemented efficiently on parallel computers. In this paper, we present the first algorithms to factor a wide class of sparse matrices (including those arising from two- and three-dimensional finite element problems) that are asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures. Our algorithms incur less communication overhead and are more scalable than any previously known parallel formulation of sparse matrix factorization. Although, in this paper, we discuss Cholesky factorization of symmetric positive definite matrices, the algorithms can be adapted for solving sparse linear least squares problems and for Gaussian elimination of diagonally dominant matrices that are almost symmetric in structure. An implementation of one of our sparse Cholesky factorization algorithms delivers up to 20 GFlops on a Cray T3D for medium-size structural engineering and linear programming problems. To the best of our knowledge, this is the highest performance ever obtained for sparse Cholesky factorization on any supercomputer.

15.
Coarse grain parallel codes for solving sparse systems of linear algebraic equations can be developed in several different ways. The following procedure is suitable for some parallel computers. A preliminary reordering of the matrix is first applied to move as many zero elements as possible to the lower left corner. After that the matrix is partitioned into large blocks and the blocks in the lower left corner contain only zero elements. An attempt to obtain a good load-balance is carried out by allowing the diagonal blocks to be rectangular.

While the algorithm based on the above ideas has good parallel properties, some stability problems may arise during the factorization because the pivotal search is restricted to the diagonal blocks. A simple a priori procedure has been used in a previous version in an attempt to stabilize the algorithm. In this paper it is shown that three enhanced stability devices can successfully be incorporated in the algorithm so that it is further stabilized and, moreover, the parallel properties of the original algorithm are preserved.

The first device is based on a dynamic check of the stability. In the second device a slightly modified reordering is used in an attempt to get more nonzero elements in the diagonal blocks (the number of candidates for pivots tends to increase in this situation and, therefore, there is a better chance to select more stable pivots). The third device applies a P5-like ordering as a secondary criterion in the basic reordering procedure. This tends to improve the reordering and the performance of the solver. Moreover, the device is stable, while the original P5 ordering is often unstable.

Numerical results obtained by using the three new devices are presented. The well-known sparse matrices from the Harwell-Boeing set are used in the experiments.
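The stability devices are described only qualitatively above. As an assumed illustration of the first device (a dynamic stability check), the sketch below monitors element growth during an elimination without pivoting, standing in for pivoting restricted to the diagonal blocks, and raises a flag when the growth exceeds a threshold.

```python
import numpy as np

def elimination_with_growth_monitor(A, threshold=1e8):
    """Gaussian elimination without pivoting plus a dynamic stability check:
    the growth of the largest element relative to the original matrix is
    monitored and flagged if it exceeds a threshold."""
    U = A.astype(float).copy()
    n = U.shape[0]
    a0 = np.abs(A).max()
    stable = True
    for k in range(n - 1):
        factors = U[k + 1:, k] / U[k, k]
        U[k + 1:, k:] -= np.outer(factors, U[k, k:])
        if np.abs(U).max() > threshold * a0:   # dynamic growth check
            stable = False
    return U, stable

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50)) + 50.0 * np.eye(50)   # well-conditioned example
U, ok = elimination_with_growth_monitor(A)
print("growth acceptable:", ok)
```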


16.
In this paper a numerical algorithm for the solution of the multi-dimensional steady Euler equations in conservative and non-conservative form is presented. Most existing standard and multi-dimensional schemes use flux balances with an assumed constant distribution of variables along each cell edge interfacing two grid cells. This assumption is believed to be one of the main reasons for the limited advantage gained from multi-dimensional high-order discretisations compared to standard one-dimensional ones. The present algorithm is based on the optimisation of polynomials describing the distribution of flow variables in grid cells, where only polynomials that satisfy the Euler equations in the entire grid cell can be selected. The global solution is achieved if all polynomials, and thereby the flow variables, are continuous along edges interfacing neighbouring grid cells. A discrete approximation of a given spatial order is converged if the deviation between polynomial distributions of adjacent grid cells along the interfacing edge of the cells is minimal. Results from the present scheme between first- and fifth-order spatial accuracy are compared to standard first- and second-order Roe computations for simple test cases, demonstrating the gain in accuracy for a number of sub- and supersonic flow problems.

17.
The ODEs describing a chemical kinetics system can be very stiff and are the most computationally costly part of most reactive flow simulations. Research areas ranging from combustion to climate modeling are often limited by their ability to solve these chemical ODE systems both accurately and efficiently. These problems are commonly treated with an implicit numerical method due to the stiffness that is usually present. The implicit solution technique introduces a large amount of computational overhead necessary to solve the nonlinear algebraic system derived from the implicit time-stepping method. In this paper, a code is presented that avoids much of the usual overhead by preconditioning the implicit method with an iterative technique. This results in a class of time-stepping methods that are explicit and very stable for chemical kinetics problems.

18.
International Journal of Computer Mathematics, 2012, 89(1-4): 181-191
A method is described for solving a system of linear algebraic equations which is almost tridiagonal. Numerical results are presented for a number of test problems and some comparisons are made with the results obtained from algorithms proposed by other authors. A possible extension of the technique is briefly outlined.
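The abstract does not specify the technique. One standard way to handle an almost tridiagonal system (shown below as an assumed illustration, not necessarily the paper's method) is to split it into a tridiagonal part plus a low-rank correction and apply the Woodbury identity, so that only banded solves are needed.

```python
import numpy as np
from scipy.linalg import solve_banded, solve

def solve_almost_tridiagonal(T_banded, U, V, b):
    """Solve (T + U V^T) x = b, where T is tridiagonal (in banded form for
    scipy.linalg.solve_banded) and U V^T carries the few extra entries,
    using the Woodbury identity so only tridiagonal solves are needed."""
    y = solve_banded((1, 1), T_banded, b)      # T^{-1} b
    Z = solve_banded((1, 1), T_banded, U)      # T^{-1} U
    k = U.shape[1]
    small = np.eye(k) + V.T @ Z                # k x k capacitance matrix
    return y - Z @ solve(small, V.T @ y)

# Example: a 6x6 tridiagonal matrix plus one extra entry A[0, 5] = 2.
n = 6
main, off = 4.0 * np.ones(n), -1.0 * np.ones(n - 1)
T_banded = np.vstack([np.append(0.0, off), main, np.append(off, 0.0)])
U = np.zeros((n, 1)); U[0, 0] = 2.0
V = np.zeros((n, 1)); V[5, 0] = 1.0
b = np.ones(n)

x = solve_almost_tridiagonal(T_banded, U, V, b)
A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1) + U @ V.T
print(np.allclose(A @ x, b))   # True
```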

19.
The notion of strong or adjoint stability for linear ordinary differential equations is generalized to the theory of Volterra integral equations. It is found that this generalization is not unique in that equivalent definitions for differential equations lead to different stabilities for integral equations in general. Three types of stabilities arising naturally are introduced: strong stability, adjoint stability, and uniform adjoint stability. Necessary and sufficient conditions relative to the fundamental matrix for these stabilities are proved. Some lemmas dealing with non-oscillation of solutions and a semi-group property of the fundamental matrix are also given.

20.
Efficient and reliable numerical techniques of high-order accuracy are presented for solving problems for nonlinear perturbed biharmonic equations. The method is widely applicable, e.g. to problems of elasticity and in fluid mechanics. Here it is used to obtain accurate solutions for the driven cavity, applying a fourth-order approximation to all convective terms. Solutions are calculated up to Reynolds number 30000.

