Found 20 similar documents (search time: 15 ms)
1.
The Journal of Supercomputing - Sparse LU factorization is essential for scientific and engineering simulations. In this work, we present swSuperLU, a highly scalable sparse direct solver on Sunway...
3.
In this paper, an efficient method is presented for solving the matrix equations that arise from the flow of water in a pipe network. The method is found to be rapidly convergent and computationally efficient, and the results obtained compare favourably with those found in the literature.
4.
Interval Newton/Generalized Bisection methods reliably find all numerical solutions within a given domain. Both computational complexity analysis and numerical experiments have shown that solving the interval linear system generated by interval Newton's methods can be computationally expensive (especially when the nonlinear system is large). In applications, many large-scale nonlinear systems of equations result in sparse interval Jacobian matrices. In this paper, we first propose a general indexed storage scheme to store sparse interval matrices. We then present an iterative interval linear solver that utilizes the proposed indexed storage scheme. It is expected that the newly proposed general interval iterative sparse linear solver will improve the overall performance of interval Newton/Generalized Bisection methods when the Jacobian matrices are sparse. In Section 1, we briefly review interval Newton's methods. In Section 2, we review some currently used storage schemes for sparse systems. In Section 3, we introduce a new index scheme to store general sparse matrices. In Section 4, we present both sequential and parallel algorithms to evaluate a general sparse Jacobian matrix. In Section 5, we present both sequential and parallel algorithms to solve the corresponding interval linear system by the all-row preconditioned scheme. Conclusions and future work are discussed in Section 6.
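The abstract above does not spell out its indexed storage scheme, so the following is only a minimal sketch of the general idea, assuming the widely used compressed sparse row (CSR) layout as a stand-in. Every name here (`Interval`, `iv_add`, `iv_mul`, `to_csr`, `csr_matvec`) is hypothetical, and the interval arithmetic omits the outward rounding that a rigorous interval solver would need.

```python
from typing import List, Tuple

# An interval is a (lower, upper) pair. Assumption: plain floating point,
# without the directed (outward) rounding a rigorous implementation needs.
Interval = Tuple[float, float]
ZERO: Interval = (0.0, 0.0)


def iv_add(a: Interval, b: Interval) -> Interval:
    """Interval sum: add lower bounds and upper bounds separately."""
    return (a[0] + b[0], a[1] + b[1])


def iv_mul(a: Interval, b: Interval) -> Interval:
    """Interval product: min and max over the four endpoint products."""
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))


def to_csr(dense: List[List[Interval]]):
    """Store only the nonzero interval entries, row by row (CSR layout)."""
    values: List[Interval] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in dense:
        for j, entry in enumerate(row):
            if entry != ZERO:
                values.append(entry)
                col_idx.append(j)
        row_ptr.append(len(values))  # marks where the next row starts
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x: List[Interval]) -> List[Interval]:
    """Interval matrix-vector product that touches stored entries only."""
    y: List[Interval] = []
    for i in range(len(row_ptr) - 1):
        acc = ZERO
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc = iv_add(acc, iv_mul(values[k], x[col_idx[k]]))
        y.append(acc)
    return y
```

The matrix-vector product is the kernel an iterative interval solver would call repeatedly, and its cost here scales with the number of stored entries rather than with the full matrix size.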
5.
In this paper, we aim to exploit the computing power of a graphics processing unit (GPU) cluster for solving large sparse linear systems. We implement the parallel algorithm of the generalized minimal residual iterative method using the Compute Unified Device Architecture programming language and the MPI parallel environment. The experiments show that a GPU cluster is more efficient than a CPU cluster. To optimize performance, we use a compressed storage format for the sparse vectors together with hypergraph partitioning. These solutions improve the spatial and temporal locality of the data shared between the computing nodes of the GPU cluster.
6.
A coarse-grain parallel solver for systems of linear algebraic equations with general sparse matrices by Gaussian elimination is discussed. Before the factorization, two other steps are performed. A reordering algorithm is used during the first step in order to obtain a permuted matrix with as many zero elements under the main diagonal as possible. During the second step, the reordered matrix is partitioned into blocks for asynchronous parallel processing (normally the number of blocks is equal to the number of processors). It is possible to obtain blocks with nearly the same number of rows, because there is no requirement to produce square diagonal blocks. The first step is much more important than the second one and has a significant influence on the performance of the solver. A straightforward implementation of the reordering algorithm will result in O(n^2) operations. By using binary trees this cost can be reduced to O(NZ log n), where NZ is the number of non-zero elements in the matrix and n is its order (normally NZ is much smaller than n^2). Some experiments on parallel computers with shared memory have been performed. The results show that a solver based on the proposed reordering performs better than another solver based on a cheaper (but at the same time rather crude) reordering whose cost is only O(NZ) operations.
8.
We propose two different algorithms which depend on the modified digraph approach for solving a sparse system of linear equations. The main feature of the algorithms is that the solution of a sparse system of linear equations can be expressed exactly if all the non-zero entries, including the right-hand side, are integers and if none of the products exceeds the size of the largest integer that can be represented in the arithmetic of the computer used. The implementation of the algorithms is tested on five problems. The results are compared with those obtained using an algorithm proposed earlier. It is shown that the efficiency with which a sparse system of linear equations can be analysed by a digital computer using the proposed modified digraph approach as a tool depends mainly on the efficiency with which semifactors and k-semifactors are generated. Finally, in our implementation of the proposed algorithms, the input sparse matrix is stored using a row-ordered list of a modified uncompressed storage scheme.
9.
In this paper, we describe scalable parallel algorithms for symmetric sparse matrix factorization, analyze their performance and scalability, and present experimental results for up to 1,024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithms substantially improve the state of the art in parallel direct solution of sparse linear systems, both in terms of scalability and overall performance. It is a well-known fact that dense matrix factorization scales well and can be implemented efficiently on parallel computers. In this paper, we present the first algorithms to factor a wide class of sparse matrices (including those arising from two- and three-dimensional finite element problems) that are asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures. Our algorithms incur less communication overhead and are more scalable than any previously known parallel formulation of sparse matrix factorization. Although, in this paper, we discuss Cholesky factorization of symmetric positive definite matrices, the algorithms can be adapted for solving sparse linear least squares problems and for Gaussian elimination of diagonally dominant matrices that are almost symmetric in structure. An implementation of one of our sparse Cholesky factorization algorithms delivers up to 20 GFlops on a Cray T3D for medium-size structural engineering and linear programming problems. To the best of our knowledge, this is the highest performance ever obtained for sparse Cholesky factorization on any supercomputer.
10.
Coarse-grain parallel codes for solving sparse systems of linear algebraic equations can be developed in several different ways. The following procedure is suitable for some parallel computers. A preliminary reordering of the matrix is first applied to move as many zero elements as possible to the lower left corner. After that, the matrix is partitioned into large blocks, and the blocks in the lower left corner contain only zero elements. An attempt to obtain a good load balance is carried out by allowing the diagonal blocks to be rectangular. While the algorithm based on the above ideas has good parallel properties, some stability problems may arise during the factorization because the pivotal search is restricted to the diagonal blocks. A simple a priori procedure has been used in a previous version in an attempt to stabilize the algorithm. In this paper it is shown that three enhanced stability devices can successfully be incorporated in the algorithm so that it is further stabilized and, moreover, the parallel properties of the original algorithm are preserved. The first device is based on a dynamic check of the stability. In the second device a slightly modified reordering is used in an attempt to get more nonzero elements in the diagonal blocks (the number of candidates for pivots tends to increase in this situation and, therefore, there is a better chance to select more stable pivots). The third device applies a P5-like ordering as a secondary criterion in the basic reordering procedure. This tends to improve the reordering and the performance of the solver. Moreover, the device is stable, while the original P5 ordering is often unstable. Numerical results obtained by using the three new devices are presented. The well-known sparse matrices from the Harwell-Boeing set are used in the experiments.
11.
In this paper a numerical algorithm for the solution of the multi-dimensional steady Euler equations in conservative and non-conservative form is presented. Most existing standard and multi-dimensional schemes use flux balances with an assumed constant distribution of variables along each cell edge, which interfaces two grid cells. This assumption is believed to be one of the main reasons for the limited advantage gained from multi-dimensional high order discretisations compared to standard one-dimensional ones. The present algorithm is based on the optimisation of polynomials describing the distribution of flow variables in grid cells, where only polynomials that satisfy the Euler equations in the entire grid cell can be selected. The global solution is achieved if all polynomials, and thereby the flow variables, are continuous along edges interfacing neighbouring grid cells. A discrete approximation of a given spatial order is converged if the deviation between polynomial distributions of adjacent grid cells along the interfacing edge of the cells is minimal. Results from the present scheme between first and fifth order spatial accuracy are compared to standard first and second order Roe computations for simple test cases, demonstrating the gain in accuracy for a number of sub- and supersonic flow problems.
12.
The ODEs describing a chemical kinetics system can be very stiff and are the most computationally costly part of most reactive flow simulations. Research areas ranging from combustion to climate modeling are often limited by their ability to solve these chemical ODE systems both accurately and efficiently. These problems are commonly treated with an implicit numerical method due to the stiffness that is usually present. The implicit solution technique introduces a large amount of computational overhead necessary to solve the nonlinear algebraic system derived from the implicit time-stepping method. In this paper, a code is presented that avoids much of the usual overhead by preconditioning the implicit method with an iterative technique. This results in a class of time-stepping method that is explicit and very stable for chemical kinetics problems.
13.
Efficient and reliable numerical techniques of high-order accuracy are presented for solving problems for nonlinear perturbed biharmonic equations. The method is widely applicable, e.g. to problems in elasticity and fluid mechanics. Here it is used to obtain accurate solutions for the driven cavity, applying fourth-order approximation to all convective terms. Solutions are calculated up to Reynolds number 30000.
14.
Summary: The notion of strong or adjoint stability for linear ordinary differential equations is generalized to the theory of Volterra integral equations. It is found that this generalization is not unique in that equivalent definitions for differential equations lead to different stabilities for integral equations in general. Three types of stabilities arising naturally are introduced: strong stability, adjoint stability, and uniform adjoint stability. Necessary and sufficient conditions relative to the fundamental matrix for these stabilities are proved. Some lemmas dealing with non-oscillation of solutions and a semi-group property of the fundamental matrix are also given.
15.
The shallow water equations (SWE), which describe the flow of a thin layer of fluid in two dimensions, have been used by the atmospheric modelling community as a vehicle for testing promising numerical methods for solving atmospheric and oceanic problems. The SWE are important for the study of the dynamics of large-scale flows, as well as for the development of new numerical schemes that are applied to more complex models. In this paper we present a p-adaptive method based on high order finite differences that is applied using an error indicator for solving the SWE on the sphere. A standard test set is used to evaluate the accuracy of the new method. The results obtained are compared with the pseudo-spectral method.
16.
A three-stage Runge-Kutta (RK) scheme with multigrid and an implicit preconditioner has been shown to be an effective solver for the fluid dynamic equations. Using the algebraic turbulence model of Baldwin and Lomax, this scheme has been used to solve the compressible Reynolds-averaged Navier–Stokes (RANS) equations for transonic and low-speed flows. In this paper we focus on the convergence of the RK/Implicit scheme when the effects of turbulence are represented by the one-equation model of Spalart and Allmaras. With the present scheme the RANS equations and the partial differential equation of the turbulence model are solved in a loosely coupled manner. This approach allows the convergence behavior of each system to be examined. Point symmetric Gauss-Seidel supplemented with local line relaxation is used to approximate the inverse of the implicit operator of the RANS solver. To solve the turbulence equation we consider three alternative methods: diagonally dominant alternating direction implicit (DDADI), symmetric line Gauss-Seidel (SLGS), and a two-stage RK scheme with implicit preconditioning. Computational results are presented for airfoil flows, and comparisons are made with experimental data. We demonstrate that the two-dimensional RANS equations and a transport-type equation for turbulence modeling can be efficiently solved with an indirectly coupled algorithm that uses RK/Implicit schemes.
17.
A method is described for solving a system of linear algebraic equations which is almost tridiagonal. Numerical results are presented for a number of test problems and some comparisons are made with the results obtained from algorithms proposed by other authors. A possible extension of the technique is briefly outlined.
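The abstract above does not say which technique it uses. As an illustration of what "almost tridiagonal" can mean in practice, here is a minimal sketch of one standard textbook approach: a tridiagonal matrix with two extra corner entries (the periodic case), solved with the Thomas algorithm plus a Sherman-Morrison rank-one correction. This is a swapped-in technique, not necessarily the authors' method, and the names `thomas` and `solve_cyclic` are hypothetical. The sketch assumes at least three unknowns, `b[0] != 0`, and that no pivoting is needed (for example, a diagonally dominant matrix).

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system without pivoting.

    a: sub-diagonal (a[0] unused), b: diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side.
    """
    n = len(b)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):  # forward elimination
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def solve_cyclic(a, b, c, d, top_right, bottom_left):
    """Solve A x = d where A is tridiagonal plus the two corner entries
    A[0][n-1] = top_right and A[n-1][0] = bottom_left.

    Writes A = T + u v^T with T purely tridiagonal, then applies the
    Sherman-Morrison formula using two tridiagonal solves.
    """
    n = len(b)
    gamma = -b[0]  # free parameter; this choice avoids cancelling b[0]
    bb = list(b)
    bb[0] = b[0] - gamma
    bb[-1] = b[-1] - bottom_left * top_right / gamma
    u = [0.0] * n
    u[0] = gamma
    u[-1] = bottom_left
    y = thomas(a, bb, c, d)  # T y = d
    z = thomas(a, bb, c, u)  # T z = u
    # v = (1, 0, ..., 0, top_right / gamma)
    vy = y[0] + top_right * y[-1] / gamma
    vz = z[0] + top_right * z[-1] / gamma
    factor = vy / (1.0 + vz)
    return [yi - factor * zi for yi, zi in zip(y, z)]
```

Because the correction costs only one additional tridiagonal solve, the total work stays linear in the number of unknowns.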
18.
A code for the direct numerical simulation (DNS) of incompressible flows with one periodic direction has been developed. It provides fairly good performance on both Beowulf clusters and supercomputers. Since the code is fully explicit, the main bottleneck from a parallel point of view is the Poisson equation. To solve it, a Fourier diagonalization is applied in the periodic direction to decompose the original 3D system into a set of mutually independent 2D systems. Then, different strategies can be used to solve them. In the previous version of the code, which was conceived for low-cost PC clusters with poor network performance, a Direct Schur-complement Decomposition (DSD) algorithm was used to solve them. Such a method, which is very efficient for PC clusters, cannot be used with an arbitrarily large number of processors and mesh sizes, mainly due to its RAM requirements. To overcome this limitation, a new version of the solver is presented in this paper. It is based on the DSD algorithm, which is used as a preconditioner for a Conjugate Gradient method. Numerical experiments showing the scalability and the flexibility of the method on both the MareNostrum supercomputer and a PC cluster with a conventional 100 Mbits/s network are presented and discussed. Finally, illustrative DNS results of an air-filled differentially heated cavity at Ra = 10^11 are also presented.
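The Fourier diagonalization step in the abstract above can be illustrated with a small sketch one dimension down: a 2D five-point Poisson problem with one periodic direction decouples, after a discrete Fourier transform along that direction, into independent 1D tridiagonal systems, one per wavenumber. This is only an analogue of the idea, assuming unit grid spacing and homogeneous Dirichlet boundaries in the non-periodic direction. It is not the paper's 3D code, and it uses a direct tridiagonal solve where the paper uses DSD and Conjugate Gradient. All function names are hypothetical.

```python
import cmath
import math


def dft(v, sign=-1):
    """Naive O(n^2) discrete Fourier transform (sign=+1: unscaled inverse)."""
    n = len(v)
    return [sum(v[j] * cmath.exp(sign * 2j * math.pi * j * k / n)
                for j in range(n)) for k in range(n)]


def thomas(a, b, c, d):
    """Tridiagonal solve without pivoting; also works for complex data."""
    n = len(b)
    cp, dp = [0j] * n, [0j] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0j] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def solve_poisson_periodic(f):
    """Solve L u = f, with L the five-point Laplacian on an m-by-n grid.

    Index i is the non-periodic direction, index j the periodic one.
    """
    m, n = len(f), len(f[0])
    f_hat = [dft(row) for row in f]  # transform along the periodic index
    u_hat = [[0j] * n for _ in range(m)]
    for k in range(n):  # each wavenumber is an independent system
        # The periodic second difference becomes multiplication by
        # 2*cos(2*pi*k/n) - 2, which is added to the remaining -2 diagonal.
        diag = 2.0 * math.cos(2.0 * math.pi * k / n) - 4.0
        rhs = [f_hat[i][k] for i in range(m)]
        col = thomas([1.0] * m, [diag] * m, [1.0] * m, rhs)
        for i in range(m):
            u_hat[i][k] = col[i]
    # Inverse transform; the imaginary parts vanish up to rounding error.
    return [[val.real / n for val in dft(row, sign=+1)] for row in u_hat]


def apply_laplacian(u):
    """Apply the same five-point operator directly, to check a solution."""
    m, n = len(u), len(u[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            up = u[i - 1][j] if i > 0 else 0.0
            down = u[i + 1][j] if i < m - 1 else 0.0
            out[i][j] = (up + down + u[i][(j - 1) % n]
                         + u[i][(j + 1) % n] - 4.0 * u[i][j])
    return out
```

Since the per-wavenumber systems share no data, each one could be handed to a different processor, which is the property that makes the decomposition attractive for parallel solvers.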
19.
Computing and Visualization in Science - This article deals with the nonlinear eigenvalue problem originating from the finite element discretization of mechanical structures involving linear...