
1.

Specialized Parallel Algorithms for Solving Lyapunov and Stein Equations

《Journal of Parallel and Distributed Computing》2001,61(10):1489-1504

Lyapunov and Stein matrix equations arise in many important analysis and synthesis applications in control theory. The traditional approach to solving these equations relies on the QR algorithm, which is notoriously difficult to parallelize. We investigate iterative solvers based on the matrix sign function and the squared Smith iteration, which are highly efficient on parallel distributed computers. We also show that by coding using the Parallel Linear Algebra Package (PLAPACK) it is possible to exploit the structure in the matrices and reduce the cost of these solvers. While the performance improvements due to the optimizations are modest, so is the coding effort. One of the optimizations, the updating of a QR factorization, has important applications elsewhere, e.g., in applications requiring the solution of a linear least-squares problem when the linear system is periodically updated. The experimental results on a Cray T3E attest to the high efficiency of these parallel solvers.
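The squared Smith iteration named in this abstract can be illustrated with a minimal serial sketch. It assumes the Stein equation is written as X = A X A^T + Q and that the spectral radius of A is below 1; this sign convention, the dense list-of-rows storage, and the function names `squared_smith` and `stein_residual` are assumptions made for illustration and are not taken from the paper or from PLAPACK.

```python
def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Return the transpose of a dense matrix."""
    return [list(col) for col in zip(*a)]


def squared_smith(a, q, tol=1e-12, max_iter=50):
    """Approximate the solution of X = A X A^T + Q (assumed convention).

    Each pass doubles the number of summed terms of the series
    Q + A Q A^T + A^2 Q (A^2)^T + ..., which converges only when the
    spectral radius of A is below 1.
    """
    n = len(q)
    x = [row[:] for row in q]      # X_0 = Q
    ak = [row[:] for row in a]     # A_0 = A
    for _ in range(max_iter):
        update = matmul(matmul(ak, x), transpose(ak))   # A_k X_k A_k^T
        x = [[x[i][j] + update[i][j] for j in range(n)] for i in range(n)]
        if max(abs(u) for row in update for u in row) < tol:
            break
        ak = matmul(ak, ak)        # A_{k+1} = A_k^2
    return x


def stein_residual(a, q, x):
    """Largest entry of |X - A X A^T - Q|, used to check a computed X."""
    n = len(q)
    axat = matmul(matmul(a, x), transpose(a))
    return max(abs(x[i][j] - axat[i][j] - q[i][j])
               for i in range(n) for j in range(n))
```

Calling `squared_smith(A, Q)` and then `stein_residual(A, Q, X)` gives a quick check that the returned matrix satisfies the assumed equation. The repeated squaring is what makes the scheme attractive on parallel machines, since each step is dominated by matrix products.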

2.

On the iterative refinement of the solution of ill-conditioned linear system of equations

Fatemeh Panjeh Ali Beik, Salman Ahmadi-Asl, Arezo Ameri 《International Journal of Computer Mathematics》2018,95(2):427-443

Recently, Salkuyeh and Fahim [A new iterative refinement of the solution of ill-conditioned linear system of equations, Int. J. Comput. Math. 88(5) (2011), pp. 950–956] have proposed a two-step iterative refinement of the solution of an ill-conditioned linear system of equations. In this paper, we first present a generalized two-step iterative refinement procedure to solve ill-conditioned linear systems of equations and study its convergence properties. Afterward, it is shown that the idea of an orthogonal projection technique together with a basic stationary iterative method can be utilized to construct a new efficient and neat hybrid algorithm for solving the mentioned problem. The convergence of the offered hybrid approach is also established. Numerical examples are examined to demonstrate the feasibility of the proposed algorithms and their superiority to some existing approaches for solving ill-conditioned linear systems of equations.

3.

Parallel multilevel solution of nonlinear shell structures

《Computer Methods in Applied Mechanics and Engineering》2005,194(21-24):2513-2533

The analysis of large-scale nonlinear shell problems calls for parallel simulation approaches. One crucial part of efficient and scalable parallel FE-simulations is the solver for the system of equations. Due to their inherent suitability for parallelization, one is very much directed towards preconditioned iterative solvers. However, thin-walled structures discretized by finite elements lead to ill-conditioned system matrices, and therefore the performance of iterative solvers is generally poor. This situation further deteriorates when the thickness change of the shell is taken into account. A preconditioner for this challenging class of problems is presented combining two approaches in a parallel framework. The first approach is a mechanically motivated improvement called ‘scaled director conditioning’ (SDC) and is able to remove the additional ill-conditioning that appears with three-dimensional shell formulations as compared to formulations that neglect thickness change of the shell. It is introduced at the element level and harmonizes well with the second approach utilizing a multilevel algorithm. Here a hierarchy of coarse grids is generated in a semi-algebraic sense using an aggregation concept. Thereby the complicated and expensive explicit generation of coarse triangulations can be avoided. The formulation of this combined preconditioning approach is given and the effects on the performance of iterative solvers are demonstrated via numerical examples.

4.

A Matrix-free Two-grid Preconditioner for Solving Boundary Integral Equations in Electromagnetism

B. Carpentieri 《Computing》2006,77(3):275-296

In this paper, we describe a matrix-free iterative algorithm based on the GMRES method for solving electromagnetic scattering problems expressed in an integral formulation. Integral methods are an interesting alternative to differential equation solvers for this problem class since they do not require absorbing boundary conditions and they mesh only the surface of the radiating object giving rise to dense and smaller linear systems of equations. However, in realistic applications the discretized systems can be very large and for some integral formulations, like the popular Electric Field Integral Equation, they become ill-conditioned when the frequency increases. This means that iterative Krylov solvers have to be combined with fast methods for the matrix-vector products and robust preconditioning to be affordable in terms of CPU time. In this work we describe a matrix-free two-grid preconditioner for the GMRES solver combined with the Fast Multipole Method. The preconditioner is an algebraic two-grid cycle built on top of a sparse approximate inverse that is used as smoother, while the grid transfer operators are defined using spectral information of the preconditioned matrix. Experiments on a set of linear systems arising from real radar cross section calculation in industry illustrate the potential of the proposed approach for solving large-scale problems in electromagnetism.

5.

Non-stationary iterative solvers on a PC cluster

《Advances in Engineering Software》2005,36(6):393-400

In this study, we introduce cost-effective strategies and algorithms for parallelizing the Krylov subspace based non-stationary iterative solvers such as Bi-CGM and Bi-CGSTAB for distributed computing on a cluster of PCs using ANULIB message passing libraries. We investigate the effectiveness of the parallel solvers on the linear systems arising in the numerical solution of some 2D and 3D nonlinear partial differential equations governing heat convection processes by finite element, finite difference and wavelet based numerical schemes. Bi-CGM is largely found to give better performance measured in terms of speedup factors.

6.

Static load balancing applied to Schur complement method

Ondřej Medek, Jaroslav Kruis 《Computers & Structures》2007,85(9):489-498

A finite element method often leads to large sparse symmetric and positive definite systems of linear equations. We consider parallel solvers based on the Schur complement method on homogeneous parallel machines with distributed memory. A finite element mesh is partitioned by graph partitioning. Such partitioning results in submeshes with similar numbers of elements and, consequently, submatrices of similar sizes. The submatrices are partially factorised. The time spent on the partial factorisation can be different, i.e., unbalanced, because methods exploiting the sparsity of submatrices are used. This paper proposes a Quality Balancing heuristic that modifies classic mesh partitioning so that the partial factorisation times are balanced, which saves overall computation time, especially for time dependent mechanical and nonstationary transport problems.

7.

Sparse iterative algorithm software for large-scale MIMD machines: An initial discussion and implementation

John N. Shadid, Ray S. Tuminaro 《Concurrency and Computation》1992,4(6):481-497

The parallelization of sophisticated applications has dramatically increased in recent years. As machine capabilities rise, greater emphasis on modeling complex phenomena can be expected. Many of these applications require the solution of large sparse matrix equations which approximate systems of partial differential equations (PDEs). Therefore we consider parallel iterative solvers for large sparse non-symmetric systems and issues related to parallel sparse matrix software. We describe a collection of parallel iterative solvers which use a distributed sparse matrix format that facilitates the interface between specific applications and a variety of Krylov subspace techniques and multigrid methods. These methods have been used to solve a number of linear and non-linear PDE problems on a 1024-processor NCUBE 2 hypercube. Over 1 Gflop sustained computation rates are achieved with many of these solvers, demonstrating that high performance can be attained even when using sparse matrix data structures.

8.

On the parallel iterative solution of linear systems arising in the FEAST algorithm for computing inner eigenvalues

《Parallel Computing》2015

Methods for the solution of sparse eigenvalue problems that are based on spectral projectors and contour integration have recently attracted more and more attention. Such methods require the solution of many shifted sparse linear systems of full size. In most of the literature concerning these eigenvalue solvers, little is said about the solution of the linear systems, but they turn out to be very hard to solve by iterative linear solvers in practice. In this work we identify a row projection method for the solution of the inner linear systems encountered in the FEAST algorithm and introduce a novel hybrid parallel and fully iterative implementation of the eigenvalue solver. Our approach ultimately aims at achieving extreme parallelism by exploiting the algorithm’s potential on several levels. We present numerical examples where graphene modeling is one of the target applications. In this application, several hundred or even thousands of eigenvalues from the interior of the spectrum are required, which is a big challenge for state-of-the-art numerical methods.
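As a rough illustration of what a row projection method does, the sketch below implements the plain Kaczmarz sweep for a small dense real system. The abstract does not say which row projection variant the authors use, and the shifted systems in FEAST are complex-valued, so the real arithmetic, the fixed sweep count, and the name `kaczmarz` are all simplifying assumptions.

```python
def kaczmarz(a, b, sweeps=200, relax=1.0):
    """Plain Kaczmarz row projection for A x = b (dense, real, illustrative).

    Each step projects the current iterate onto the hyperplane defined by
    one row of the system; one sweep visits every row once.
    """
    x = [0.0] * len(a[0])
    for _ in range(sweeps):
        for row, rhs in zip(a, b):
            norm_sq = sum(v * v for v in row)
            if norm_sq == 0.0:
                continue  # skip empty rows, which define no hyperplane
            residual = rhs - sum(r * xi for r, xi in zip(row, x))
            step = relax * residual / norm_sq
            x = [xi + step * r for xi, r in zip(x, row)]
    return x
```

Because each projection touches only one row, the method needs no factorization and no assumption of symmetry or definiteness, which is one reason such schemes are considered for hard interior-eigenvalue systems.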

9.

Parallel methods for optimality criteria-based topology optimization

《Computer Methods in Applied Mechanics and Engineering》2005,194(34-35):3637-3667

Topology optimization problems require the repeated solution of finite element problems that are often extremely ill-conditioned due to highly heterogeneous material distributions. This makes the use of iterative linear solvers inefficient unless appropriate preconditioning is used. Even then, the solution time for topology optimization problems is typically very high. These problems are addressed by considering the use of non-overlapping domain decomposition-based parallel methods for the solution of topology optimization problems. The parallel algorithms presented here are based on the solid isotropic material with penalization (SIMP) formulation of the topology optimization problem and use the optimality criteria method for iterative optimization. We consider three parallel linear solvers to solve the equilibrium problem at each step of the iterative optimization procedure. These include two preconditioned conjugate gradient (PCG) methods: one using a diagonal preconditioner and one using an incomplete LU factorization preconditioner with a drop tolerance. A third substructuring solver that employs a hybrid of direct and iterative (PCG) techniques is also studied. This solver is found to be the most effective of the three solvers studied, both in terms of parallel efficiency and in terms of its ability to mitigate the effects of ill-conditioning. In addition to examining parallel linear solvers, we consider the parallelization of the iterative optimality criteria method. To tackle checkerboarding and mesh dependence, we propose a multi-pass filtering technique that limits the number of “ghost” elements that need to be exchanged across interprocessor boundaries.
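The first of the three solvers, PCG with a diagonal preconditioner, can be sketched serially as follows. This is the textbook algorithm for a symmetric positive definite matrix stored densely; the paper's parallel, domain-decomposed version is not reproduced, and the name `pcg_diagonal` and the relative-residual stopping rule are assumptions.

```python
def matvec(a, x):
    """Dense matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def pcg_diagonal(a, b, tol=1e-10, max_iter=1000):
    """Conjugate gradients preconditioned with the inverse diagonal of A.

    Assumes A is symmetric positive definite, so its diagonal is positive.
    """
    n = len(b)
    inv_diag = [1.0 / a[i][i] for i in range(n)]
    x = [0.0] * n
    r = b[:]                                   # residual b - A x for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]  # preconditioned residual
    p = z[:]
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0           # avoid dividing by a zero norm
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            break
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

Diagonal scaling is cheap and needs no communication, which makes it easy to parallelize, but it only compensates for variation along the diagonal. That is consistent with the abstract's finding that a hybrid substructuring solver copes better with severe ill-conditioning.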

10.

A Decoupled Preconditioning Technique for a Mixed Stokes–Darcy Model

Antonio Márquez, Salim Meddahi, Francisco-Javier Sayas 《Journal of scientific computing》2013,57(1):174-192

We propose an efficient iterative method to solve the mixed Stokes–Darcy model for coupling fluid and porous media flow. The weak formulation of this problem leads to a coupled, indefinite, ill-conditioned and symmetric linear system of equations. We apply a decoupled preconditioning technique requiring only good solvers for the local mixed-Darcy and Stokes subproblems. We prove that the method is asymptotically optimal and confirm, with numerical experiments, that the performance of the preconditioners does not deteriorate on arbitrarily fine meshes.

11.

An Efficient Parallel Algorithm to Solve Block–Toeplitz Systems

P. Alonso, J. M. Badía, A. M. Vidal 《The Journal of supercomputing》2005,32(3):251-278

In this paper, we present an efficient parallel algorithm to solve Toeplitz–block and block–Toeplitz systems in distributed memory multicomputers. This algorithm parallelizes the Generalized Schur Algorithm to obtain the semi-normal equations. Our parallel implementation reduces the communication cost and optimizes the memory access. The experimental analysis on a cluster of personal computers shows the scalability of the implementation. The algorithm is portable because it is based on standard tools and libraries, such as ScaLAPACK and MPI.

12.

Reducing latency cost in 2D sparse matrix partitioning models

《Parallel Computing》2016

Sparse matrix partitioning is a common technique used for improving performance of parallel linear iterative solvers. Compared to solvers used for symmetric linear systems, solvers for nonsymmetric systems offer more potential for addressing multiple communication metrics due to the flexibility of adopting different partitions on the input and output vectors of sparse matrix-vector multiplication operations. In this regard, there exist works based on one-dimensional (1D) and two-dimensional (2D) fine-grain partitioning models that effectively address both bandwidth and latency costs in nonsymmetric solvers. In this work, we propose two new models based on 2D checkerboard and jagged partitioning. These models aim at minimizing total message count while maintaining a balance on communication volume loads of processors; hence, they address both bandwidth and latency costs. We evaluate all partitioning models on two nonsymmetric system solvers implemented using the widely adopted PETSc toolkit and conduct extensive experiments using these solvers on a modern system (a BlueGene/Q machine), successfully scaling them up to 8K processors. Along with the proposed models, we put practical aspects of eight evaluated models (two 1D- and six 2D-based) under thorough analysis. To the best of our knowledge, this is the first work that analyzes practical performance of 2D models on this scale. Among evaluated models, the models that rely on 2D jagged partitioning obtain the most promising results by striking a balance between minimizing bandwidth and latency costs.

13.

Developments and trends in the parallel solution of linear systems

《Parallel Computing》1999,25(13-14):1931-1970

In this review paper, we consider some important developments and trends in algorithm design for the solution of linear systems, concentrating on aspects that involve the exploitation of parallelism. We briefly discuss the solution of dense linear systems, before studying the solution of sparse equations by direct and iterative methods. We consider preconditioning techniques for iterative solvers and discuss some of the present research issues in this field.

14.

Feasibility refinement in sequential quadratic programming using parametric sensitivity analysis

Sören Geffken, Christof Büskens 《Optimization methods & software》2017,32(4):754-769

In this paper we present results that extend the sequential quadratic programming (SQP) algorithm with an additional feasibility refinement based on parametric sensitivity derivatives. The refinement is applicable without restriction on the problem dimensions in sparse SQP solvers. Parametric sensitivity analysis is a tool for post-optimality analysis of the solution of a nonlinear optimization problem. For the refinement approach we apply this technique on the quadratic subproblems in order to improve the overall algorithm. The sensitivity derivatives required for this approach can be computed without noticeable computational effort as the system of linear equations to be solved coincides with the system already solved for the search direction computation. For similar algorithms in the context of post-optimality analysis a linear rate of convergence has been proven and therefore an extrapolation method is applied to speed up the process. The presented algorithm has been integrated into the nonlinear program (NLP) solver WORHP and we perform a numerical study to evaluate different termination criteria for the proposed algorithm. Furthermore, numerical results on the CUTEst test set are shown.

15.

A survey of recent developments in parallel implementations of Gaussian elimination

Simplice Donfack, Jack Dongarra, Mathieu Faverge, Mark Gates, Jakub Kurzak, Piotr Luszczek, Ichitaro Yamazaki 《Concurrency and Computation》2015,27(5):1292-1309

Gaussian elimination is a canonical linear algebra procedure for solving linear systems of equations. In the last few years, the algorithm has received a lot of attention in an attempt to improve its parallel performance. This article surveys recent developments in parallel implementations of Gaussian elimination for shared memory architecture. Five different flavors are investigated. Three of them are based on different strategies for pivoting: partial pivoting, incremental pivoting, and tournament pivoting. The fourth one replaces pivoting with the Partial Random Butterfly Transformation, and finally, an implementation without pivoting is used as a performance baseline. The technique of iterative refinement is applied to recover numerical accuracy when necessary. All parallel implementations are produced using dynamic, superscalar, runtime scheduling and tile matrix layout. Results on two multisocket multicore systems are presented. Performance and numerical accuracy are analyzed. Copyright © 2014 John Wiley & Sons, Ltd.
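Two ingredients named in this abstract, partial pivoting and iterative refinement, can be shown together in a short serial sketch. It is a plain textbook LU factorization on a dense list-of-rows matrix and makes no attempt to model the tiled, runtime-scheduled implementations surveyed; the function names and the fixed number of refinement steps are assumptions.

```python
def lu_partial_pivoting(a):
    """Factor a row-permuted copy of A as L U; return (lu, perm).

    L (unit lower triangular) and U share the single array `lu`;
    perm[k] is the original index of the row now stored at position k.
    """
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        pivot = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[pivot] = lu[pivot], lu[k]
        perm[k], perm[pivot] = perm[pivot], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b using the factors from lu_partial_pivoting."""
    n = len(b)
    y = [b[p] for p in perm]
    for i in range(n):                 # forward substitution with unit L
        y[i] -= sum(lu[i][j] * y[j] for j in range(i))
    for i in reversed(range(n)):       # back substitution with U
        y[i] = (y[i] - sum(lu[i][j] * y[j] for j in range(i + 1, n))) / lu[i][i]
    return y


def solve_with_refinement(a, b, steps=2):
    """Solve once, then correct x with residual solves that reuse the factors."""
    lu, perm = lu_partial_pivoting(a)
    x = lu_solve(lu, perm, b)
    for _ in range(steps):
        r = [bi - sum(aij * xj for aij, xj in zip(row, x))
             for row, bi in zip(a, b)]
        d = lu_solve(lu, perm, r)
        x = [xi + di for xi, di in zip(x, d)]
    return x
```

Each refinement step costs only a residual evaluation and two triangular solves, because the factorization is reused. In this sketch everything runs in the same double precision, so the correction is small; refinement matters more when the factorization is less accurate, for example when pivoting is weakened or omitted for speed.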

16.

Towards fully mesh adaptive FE-simulations in 3D using multi-grid solver

《Computers & Structures》2003,81(8-11):735-746

The four cornerstones of efficient simulation in engineering practice are efficient and robust finite elements, solution of systems of linear equations, accuracy assessment (AA) and mesh generation and adaptation (MGA). This paper describes our current work in these fields as implemented in the in-house program FEM90. Special attention is given to indefinite systems due to constraint equations used to connect regions with non-conforming meshes, since this situation is commonly encountered in engineering structures. This type of system is especially problematic for iterative solvers, but the paper presents a new pre-conditioner that carries the fast convergence of the pre-conditioned conjugate gradient method over to indefinite systems as well. Indeed, comparison tests of the direct and iterative solvers on realistic examples with up to 1,000,000 equations show that the iterative multi-grid solver has optimal performance and is superior both in terms of memory and time on large 3D problems. To solve even larger problems with required accuracy, the paper also implements and tests AA and MGA in 3D. The combination of the FEM90 and Ideas programs provides automatic refinement of both brick and tetrahedron meshes.

17.

Preconditioning for a Class of Spectral Differentiation Matrices

Weiming Cao, Ronald D. Haynes, Manfred R. Trummer 《Journal of scientific computing》2005,24(3):343-371

We propose an efficient preconditioning technique for the numerical solution of first-order partial differential equations (PDEs). This study has been motivated by the computation of an invariant torus of a system of ordinary differential equations. We find the torus by discretizing a nonlinear first-order PDE with a full two-dimensional Fourier spectral method and by applying Newton’s method. This leads to large nonsymmetric linear algebraic systems. The sparsity pattern of these systems makes the use of direct solvers prohibitively expensive. Commonly used iterative methods, e.g., GMRes, BiCGStab and CGNR (Conjugate Gradient applied to the normal equations), are quite slow to converge. Our preconditioner is derived from the solution of a PDE with constant coefficients; it has a fast implementation based on the Fast Fourier Transform (FFT). It effectively increases the clustering of the spectrum, and speeds up convergence significantly. We demonstrate the performance of the preconditioner in a number of linear PDEs and the nonlinear PDE arising from the Van der Pol oscillator.

18.

Threaded Runtime Support for Execution of Fine Grain Parallel Code on Coarse Grain Multiprocessors

Richard Neves, Robert B. Schnabel 《Journal of Parallel and Distributed Computing》1997,42(2):128

The goal of this research is to provide systems support that allows fine grain, data parallel code to execute efficiently on much coarser grain multiprocessors. The task of writing parallel applications is simplified by allowing the programmer to assume a number of processors convenient to the algorithm being implemented. This paper describes and evaluates a runtime approach that efficiently manages thousands of virtual processors per actual processor. The limits in using user-level threads as fine grain virtual processors are identified. Key techniques used are tight integration and specialization of scheduling, communication, optimized context switching, and fine-tuned stack management. A prototype of this runtime approach is evaluated by comparing implementations of three problems, a smoothing kernel of a thin-layer Navier–Stokes code, a five point stencil problem, and a block bordered system of linear equations on an Intel Paragon multiprocessor and on a network of DEC Alpha workstations. The additional cost relative to an efficient manually contracted code can be as low as 15% for granularities of 50 floating point operations per virtual processor and is typically 5–20% for granularities of about 100 floating point operations per virtual processor. The overhead is analyzed in detail to show the costs of scheduling, communication, context switching, reduced memory performance, and ensuring data consistency. The implementation and analysis indicate that fine grain code can be efficiently executed on a coarse grain multiprocessor using very lightweight, specialized threads.

19.

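A PVM-Based Networked Parallel Iterative Algorithm for Systems of Linear Equations (total citations: 1; self-citations: 0; citations by others: 1)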

Shang Yueqiang, Yang Yidu 《Computer Applications and Software》2006,23(11):50-51

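In a network parallel computing environment built from desktop PCs connected through PVM, the processors compute quickly while communication between them is comparatively slow. To suit this setting, the paper proposes a grouped Gauss-Seidel parallel iterative algorithm for solving systems of linear equations. The algorithm stores the augmented matrix of the linear system across the processors in blocks of rows, and each processor runs Gauss-Seidel iterations on its own block, so little interprocessor communication is needed and the method is easy to implement. Numerical experiments were programmed on a local area network of 1 to 24 desktop PCs, using PVM 3.4 on Windows 2000 with VC 6.0 as the parallel computing platform. The results show that the algorithm outperforms both the traditional parallel Jacobi iteration and the traditional parallel Gauss-Seidel iteration.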
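One way to read the grouped scheme described here is sketched below as a serial simulation. Each entry of `blocks` stands for the rows owned by one processor; inside a block the newest values are used immediately (Gauss-Seidel), while values owned by other blocks are taken from the last exchanged iterate. That exchange rule, the stopping test, and the name `block_gauss_seidel` are assumptions, since the abstract does not give these details, and no PVM calls are modeled.

```python
def block_gauss_seidel(a, b, blocks, tol=1e-10, max_iter=500):
    """Grouped Gauss-Seidel sweep, simulated serially.

    blocks: list of lists of row indices, one list per simulated processor.
    Convergence is assumed here, e.g. for a strictly diagonally dominant A.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        x_old = x[:]
        x_new = x_old[:]
        for rows in blocks:
            local = x_old[:]           # a processor sees only exchanged values
            for i in rows:
                s = sum(a[i][j] * local[j] for j in range(n) if j != i)
                local[i] = (b[i] - s) / a[i][i]   # own rows update in place
            for i in rows:
                x_new[i] = local[i]    # "send" this block's entries
        x = x_new
        if max(abs(x[i] - x_old[i]) for i in range(n)) < tol:
            break
    return x
```

With a single block holding every row this reduces to ordinary Gauss-Seidel, and with one row per block it reduces to Jacobi, so the grouping trades convergence speed against the amount of data exchanged per sweep.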

20.

A new iterative refinement of the solution of ill-conditioned linear system of equations

《International Journal of Computer Mathematics》2012,89(5):950-956

In this paper, a new iterative refinement of the solution of an ill-conditioned linear system of equations is given. The convergence properties of the method are studied. Some numerical experiments of the method are given and compared with those of two of the available methods.