共查询到20条相似文献,搜索用时 15 毫秒
1.
For the solution of sparse linear systems from circuit simulation whose coefficient matrices include a few dense rows and columns, a parallel iterative algorithm with distributed Schur complement preconditioning is presented. The parallel efficiency of the solver is increased by transforming the equation system into a problem without dense rows and columns as well as by exploitation of parallel graph partitioning methods. The costs of local, incomplete LU decompositions are decreased by fill-in reducing reordering methods of the matrix and a threshold strategy for the factorization. The efficiency of the parallel solver is demonstrated with real circuit simulation problems on PC clusters. 相似文献
2.
Arguello F. Bruguera J.D. Doallo R. Zapata E.L. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(10):1091-1099
We present an unified parallel architecture for four of the most important fast orthogonal transforms with trigonometric kernel: Complex Valued Fourier (CFFT), Real Valued Fourier (RFFT), Hartley (FHT), and Cosine (FCT). Out of these, only the CFFT has a data flow coinciding with the one generated by the successive doubling method, which can be transformed on a constant geometry flow using perfect unshuffle or shuffle permutations. The other three require some type of hardware modification to guarantee the constant geometry of the successive doubling method. We have defined a generalized processing section (PS), based on a circular CORDIC rotator, for the four transforms. This PS section permits the evaluation of the CFFT and FCT transforms in n data recirculations and the RFFT and FHT transforms in n-1 data recirculations, with n being the number of stages of a transform of length N=rn. Also, the efficiency of the partitioned parallel architecture is optimum because there is no cycle loss in the systolic computation of all the butterflies for each of the four transforms 相似文献
3.
V. Chugunov D. Svyatski E. Tyrtyshnikov Yu. Vassilevski 《Concurrency and Computation》2006,18(5):501-518
A combination of several contemporary techniques is used for the efficient parallel solution of the mixed finite element systems on locally refined Grids. Implementation experience and numerical results are reported. Copyright © 2005 John Wiley & Sons, Ltd. 相似文献
4.
We present the novel parallel linear least squares solvers ARPLS-IR and ARPLS-MPIR for dense overdetermined linear systems. All internode communication of our ARPLS solvers arises in the context of all-reduce operations across the parallel system and therefore they benefit directly from efficient implementations of such operations. Our approach is based on the semi-normal equations, which are in general not backward stable. However, the method is stabilised by using iterative refinement. We show that performing iterative refinement in mixed precision also increases the parallel performance of the algorithm. We consider different variants of the ARPLS algorithm depending on the conditioning of the problem and in this context also evaluate the method of normal equations with iterative refinement. For ill-conditioned systems, we demonstrate that the semi-normal equations with standard iterative refinement achieve the best accuracy compared to other parallel solvers.We discuss the conceptual advantages of ARPLS-IR and ARPLS-MPIR over alternative parallel approaches based on QR factorisation or the normal equations. Moreover, we analytically compare the communication cost to an approach based on communication-avoiding QR factorisation. Numerical experiments on a high performance cluster illustrate speed-ups up to 3820 on 2048 cores for ill-conditioned tall and skinny matrices over state-of-the-art solvers from DPLASMA or ScaLAPACK. 相似文献
5.
Solving large, sparse, linear systems of equations is a fundamental problems in large scale scientific and engineering computation. A model of a general class of asynchronous, iterative solution methods for linear systems is developed. In the model, the system is solved by creating several cooperating tasks that each compute a portion of the solution vector. A data transfer model predicting both the probability that data must be transferred between two tasks and the amount of data to be transferred is presented. This model is used to derive an execution time model for predicting parallel execution time and an optimal number of tasks given the dimension and sparsity of the coefficient matrix and the costs of computation, synchronization, and communication.The suitability of different parallel architectures for solving randomly sparse linear systems is discussed. Based on the complexity of task scheduling, one parallel architecture, based on a broadcast bus, is presented and analyzed. 相似文献
6.
In this paper, we are concerned with a generalized Gauss-Seidel approach to sparse linear least-squares problems. Two algorithms, related to those given by Schechter (1959), for the solution of linear systems are presented and their parallel implementation is discussed. In these procedures, which can be viewed as an alternative ordering of the variables in the SOR methods, the variables are divided into nondisjoint groups. Numerical results, obtained on CRAY X-MP/48, are presented and discussed. 相似文献
7.
The Journal of Supercomputing - Numerical simulation of multi-physical processes requires a lot of processor time, especially when solving ill-conditional linear systems arising in fluid dynamics... 相似文献
8.
Sami A. Kilic 《Computers & Structures》2004,82(28):2363-2375
Direct solvers are commonly used in implicit finite element codes for structural dynamics problems. This study explores an alternative approach to solving the resulting linear systems by using preconditioned iterative schemes such as the preconditioned conjugate gradient algorithm and its variants. Preconditioners used in this study include approximate Cholesky factorization, block Jacobi, and the symmetric Gauss-Seidel over-relaxation scheme. The effects of various preconditioners and ordering schemes on the solution time and required storage are investigated. Performance results of these iterative solver are compared against the MA57 direct solver routine of the commercial Harwell Software Library. 相似文献
9.
10.
Parallel iterative solvers for finite-element methods using an OpenMP/MPI hybrid programming model on the Earth Simulator 总被引:1,自引:0,他引:1
Kengo Nakajima 《Parallel Computing》2005,31(10-12):1048
An efficient parallel iterative method for finite-element method has been developed for symmetric multiprocessor (SMP) cluster architectures with vector processors such as the Earth Simulator. The method is based on a three-level hybrid parallel programming model, including message passing for inter-SMP node communication, loop directives by OpenMP for intra-SMP node parallelization and vectorization for each processing element (PE). Simple 3D linear elastic problems with more than 2.2 × 109 DOF have been solved using 3 × 3 block ICCG(0) method with additive Schwarz domain decomposition and PDJDS/CM-RCM reordering on 176 nodes of the Earth Simulator, achieving performance of 3.80 TFLOPS. Furthermore, effect of color number in reordering has been evaluated on various types of computers. 相似文献
11.
Recent research on using the preconditioned conjugate gradient method as an iterative method for solving Toeplitz systems
has brought much attention. One of the main important results of this methodology is that the complexity of solving a large
class of Toeplitz systems can be reduced toO (n logn) operations as compared to theO(n log2
n) operations required by fast direct Toeplitz solvers, provided that a suitable preconditioner is chosen under certain conditions
on the Toeplitz operator. In this paper, we survery some applications of iterative Toeplitz solvers to Toeplitz-related problems
arising from scientific applications. These applications include partial differential equations, queueing networks, signal
and image processing, integral equations, and time series analysis.
Research supported by the Cooperative Research Centre for Advanced Computational Systems.
Research supported in part by HKRGC grants no. CUHK 316/94E. 相似文献
12.
《国际计算机数学杂志》2012,89(4):497-513
We develop a novel preconditioning strategy, based on a non-standard, Discrete Wavelet Transform (DWT), for the dense, non-symmetric, linear systems that must be solved when Newton's method is used in the solution of Elastohydrodynamic Lubrication (EHL) problems. Simple band preconditioners and sparse preconditioners based on standard DWT have been found to be of limited value for EHL problems, since they may be singular, give poor convergence or be expensive to apply. We present algorithms for preconditioner design based on detecting non-smooth diagonal bands within an otherwise smooth matrix and applying a non-standard DWT to compress the part of the matrix away from the band. We illustrate, by numerical examples, the improvements that can be made when our methods are used. 相似文献
13.
A method is described to solve the systems of tridiagonal linear equations that result from discrete approximations of the Poisson or Helmholtz equation with either periodic, Dirichlet, Neumann, or shear-periodic boundary conditions. The problem is partitioned into a set of smaller Dirichlet problems which can be solved simultaneously on parallel or vector computers leaving a smaller tridiagonal system to be solved on one of the processors. 相似文献
14.
Rafael Mayo Enrique S. Quintana-Ortí Gregorio Quintana-Ortí Vicente Hernández 《Concurrency and Computation》2001,13(2):153-162
We investigate the numerical solution of discrete-time algebraic Riccati equations on a parallel distributed architecture. Our solvers obtain an initial solution of the Riccati equation via the disc function method, and then refine this solution using Newton's method. The Smith iteration is employed to solve the Stein equation that arises at each step of Newton's method. The numerical experiments on an Intel Pentium-II cluster, connected via a Myrinet switch, report the performance and scalability of the new algorithms. Copyright © 2001 John Wiley & Sons, Ltd. 相似文献
15.
Robert Ian Mackie 《Computers & Structures》2008,86(6):511-519
An object-oriented approach is used to develop classes and frameworks for the implementation of distributed iterative equation solution. The software is implemented using the .NET framework, and builds upon previous work by the author. Development of the framework for iterative solution makes good use of interfaces to isolate sources of complexity. The framework is used for three different solution scenarios (i) conjugate gradient iteration on a single matrix; (ii) conjugate gradient iteration when domain decomposition is used; and (iii) using the Schur complement approach. Moreover, the framework is used for both local and remote objects. The .NET framework makes it very straightforward to program distributed applications, and the object-oriented approach greatly facilitates the software development. The framework was used in a finite element program and the speed-up results are shown. 相似文献
16.
Rami Melhem 《Parallel Computing》1987,4(3):339-343
Any factorization/back substitution scheme for the solution of linear systems consists of two phases which are different in nature, and hence may be inefficient for parallel implementation on a single computational network. The Gauss-Jordan elimination scheme unifies the nature of the two phases of the solution process and thus seems to be more suitable for parallel architectures, especially if reconfiguration of the communication pattern is not permitted. In this communication, a computational network for the Gauss-Jordan algorithm is presented. This network compares favorably with optimal implementations of the Gauss elimination/back substitution algorithm. 相似文献
17.
DASPK solves large-scale systems of differential-algebraic equations. It is based on the integration method in DASSL, but instead of a direct method for the associated linear systems which arise at each time step, the preconditioned GMRES iteration is applied in combination with an inexact Newton method. Two parallel versions of DASPK have been developed: DASPKF90, a Fortran 90 data parallel implementation, and DASPKMP, a message-passing implementation written in Fortran 77 with extended BLAS. The parallel versions have been implemented for the Thinking Machines Corporation (TMC) CM-5, a massively parallel multiprocessor, keeping the user interface relatively simple while allowing for portability to other massively parallel architectures. The codes have been demonstrated on several large-scale test problems, including three-dimensional formulations of the heat equation, the Cahn-Hilliard equation and a multi-species reaction-diffusion problem. The formulations are described, including detail on preconditioning the Krylov iteration, timing results and performance analysis. 相似文献
18.
Sparse bordered matrix systems arise in certain classes of problems. An example is the Jacobian system for the finite difference or finite element approximation and continuation solution of nonlinear PDE's. We develop a partitioning scheme that can be exploited in residual-based iterative methods and permits easy implementation in existing software. The scheme has been examined for several iterative methods, and particularly the Lanczos algorithms which proves very effective for the representative nonlinear test problems examined. The scheme vectorizes well and performance studies on vectorized supercomputers are given. 相似文献
19.
A new method, namely, the parallel two-level hybrid (PTH) method, is developed to solve tridiagonal systems on parallel computers. PTH has two levels of parallelism. The first level is based on algorithms developed from the Sherman-Morrison modification formula, and the second level can choose different parallel tridiagonal solvers for different applications. By choosing different outer and inner solvers and by controlling its two-level partition, PTH can deliver better performance for different applications on different machine ensembles and problem sizes. In an extreme case, the two levels of parallelism can be merged into one, and PTH can be the best algorithm otherwise available. Theoretical analyses and numerical experiments indicate that PTH is significantly better than existing methods on massively parallel computers. For instance, using PTH in a fast Poisson solver results in a 2-folds speedup compared to a conventional parallel Poisson solver on a 512 nodes IBM machine. When only the tridiagonal solver is considered, PTH is over 10 times faster than the currently used implementation. 相似文献
20.
《国际计算机数学杂志》2012,89(3-4):355-369
For the solution of the linear system Ax=b many iterative methods based on a splitting of A exist. Among them the Jacobi, the Gauss-Seidel and the Successive Overrelaxation (SOR) methods as well as their extrapolated counterparts are the most popular. This paper presents a new general method such that the aforementioned methods become special cases of it. Besides its four degrees of freedom, which make it a very flexible method, another of its main characteristics is that it is well-defined even when some elements on the diagonal of A are zero. The first results concerning the new method show that a proper exploitation of its basic properties will make it a very powerful technique. 相似文献