Similar literature
 20 similar documents found (search time: 31 ms)
1.
A preconditioner for iterative solution of the interface problem in Schur Complement Domain Decomposition Methods is presented. This preconditioner is based on solving a global problem in a narrow strip around the interface. It requires much less memory and computing time than the classical Neumann–Neumann preconditioner and its variants, and correctly handles the flux splitting among subdomains that share the interface. The aim of this work is to present a theoretical basis (regarding the behavior of Schur complement matrix spectra) and some simple numerical experiments conducted in a sequential environment as a motivation for adopting the proposed preconditioner. Efficiency, scalability, and implementation details on a production parallel finite element code [Sonzogni V, Yommi A, Nigro N, Storti M. A parallel finite element program on a Beowulf cluster. Adv Eng Software 2002;33(7–10):427–43; Storti M, Nigro N, Paz R, Dalcín L. PETSc-FEM: a general purpose, parallel, multi-physics FEM program, 1999–2006] can be found in [Paz R, Storti M. An interface strip preconditioner for domain decomposition methods: application to hydrology. Int J Numer Methods Eng 2005;62(13):1873–94; Paz R, Nigro N, Storti M. On the efficiency and quality of numerical solutions in cfd problems using the interface strip preconditioner for domain decomposition methods. Int J Numer Methods Fluids, in press].

2.
Chi Shen  Jun Zhang   《Parallel Computing》2003,29(11-12):1685
We present a fully parallel algorithm for constructing a block independent set for general sparse matrices in a distributed environment. The block independent set is used in the construction of parallel multilevel preconditioners for solving large sparse linear systems on distributed memory parallel computers. We compare a few implementations of the parallel multilevel ILU preconditioners with different block independent set construction strategies. Numerical experiments indicate that the parallel block independent set algorithm is effective in reducing both the parallel multilevel preconditioner construction time and the size of the last level reduced system.

3.
We present a polynomial preconditioner that can be used with the conjugate gradient method to solve symmetric and positive definite systems of linear equations. Each step of the preconditioning is achieved by simultaneously taking an iteration of the SOR method and an iteration of the reverse SOR method (equations taken in reverse order) and averaging the results. This yields a symmetric preconditioner that can be implemented on parallel computers by performing the forward and reverse SOR iterations simultaneously. We give necessary and sufficient conditions for additive preconditioners to be positive definite.

We find an optimal parameter, ω, for the SOR-Additive linear stationary iterative method applied to 2-cyclic matrices. We show this method is asymptotically twice as fast as SSOR when the optimal ω is used.

We compare our preconditioner to the SSOR polynomial preconditioner for a model problem. With the optimal ω, our preconditioner was found to be as effective as the SSOR polynomial preconditioner in reducing the number of conjugate gradient iterations. Parallel implementations of both methods are discussed for vector and multiple processors. Results show that if the same number of processors is used for both preconditioners, the SSOR preconditioner is more effective. If twice as many processors are used for the SOR-Additive preconditioner, it becomes more efficient than the SSOR preconditioner when the number of equations assigned to a processor is small. These results are confirmed by the Blue Chip emulator at the University of Washington.
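The averaging step described in this abstract can be illustrated with a minimal sketch. It assumes a small dense symmetric positive definite matrix stored as nested lists, applies a single averaged forward/reverse SOR sweep (not the full polynomial preconditioner), and uses an arbitrary ω rather than the optimal one from the paper. All function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def forward_sor(A, r, omega):
    """One SOR sweep on A z = r starting from z = 0, equations in natural order."""
    n = len(r)
    z = [0.0] * n
    for i in range(n):
        s = sum(A[i][j] * z[j] for j in range(i))
        z[i] = omega * (r[i] - s) / A[i][i]
    return z

def reverse_sor(A, r, omega):
    """One SOR sweep on A z = r starting from z = 0, equations in reverse order."""
    n = len(r)
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * z[j] for j in range(i + 1, n))
        z[i] = omega * (r[i] - s) / A[i][i]
    return z

def sor_additive(A, r, omega=1.0):
    """Average the two sweeps; they are independent, so they could run in parallel."""
    zf = forward_sor(A, r, omega)
    zb = reverse_sor(A, r, omega)
    return [0.5 * (a + b) for a, b in zip(zf, zb)]

def pcg(A, b, apply_prec, tol=1e-10, max_iter=200):
    """Textbook preconditioned conjugate gradient; returns (solution, iterations)."""
    x = [0.0] * len(b)
    r = list(b)
    z = apply_prec(r)
    p = list(z)
    rz = dot(r, z)
    for k in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        z = apply_prec(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

def laplacian_1d(n):
    """Model problem: tridiag(-1, 2, -1), symmetric positive definite."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]

A = laplacian_1d(8)
b = [1.0] * 8
x, iterations = pcg(A, b, lambda r: sor_additive(A, r, omega=1.2))
```

Because the forward sweep applies ω(D + ωL)⁻¹ and the reverse sweep applies its transpose, their average is a symmetric operator, which is what the conjugate gradient method requires.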


4.
We conduct simulations for the 3D unsteady-state anisotropic diffusion process with DT-MRI data in the human brain by discretizing the governing diffusion equation on a Cartesian grid and adopting a high performance differential-algebraic equation (DAE) solver, the parallel version of the implicit differential-algebraic (IDA) solver, to tackle the resulting large scale system of DAEs. Parallel preconditioning techniques including sparse approximate inverse and banded-block-diagonal preconditioners are used with the GMRES method to accelerate the convergence rate of the iterative solution. We then investigate and compare the efficiency and effectiveness of the two parallel preconditioners. The experimental results of the diffusion simulations on a parallel supercomputer show that the sparse approximate inverse preconditioning strategy, which is robust and efficient with good scalability, gives a much better overall performance than the banded-block-diagonal preconditioner.

5.
An implicit time-linearized finite difference discretization of partial differential equations on regular/structured meshes results in an n-diagonal block system of algebraic equations, which is usually solved by means of the Preconditioned Conjugate Gradient (PCG) method. In this paper, an analysis of the parallel implementation of this method on several computer architectures and for several programming paradigms is presented. For three-dimensional regular/structured meshes, a new implementation of the PCG method with Jacobi preconditioner is proposed. For the computer architectures and number of processors employed in this study, it has been found that this implementation is more efficient than the standard one, and can be applied to narrow-band matrices and other preconditioners, such as polynomial ones.

6.
We combine the adaptive and multilevel approaches to the BDDC and formulate a method which allows an adaptive selection of constraints on each decomposition level. We also present a strategy for the solution of local eigenvalue problems in the adaptive algorithm using the LOBPCG method with a preconditioner based on standard components of the BDDC. The effectiveness of the method is illustrated on several engineering problems. It appears that the Adaptive-Multilevel BDDC algorithm is able to effectively detect troublesome parts on each decomposition level and improve convergence of the method. The developed open-source parallel implementation shows good scalability as well as applicability to very large problems and core counts.

7.
We present a class of parallel preconditioning strategies utilizing multilevel block incomplete LU (ILU) factorization techniques to solve large sparse linear systems. The preconditioners are constructed by exploiting the concept of block independent sets (BISs). Two algorithms for constructing BISs of a sparse matrix in a distributed environment are proposed. We compare a few implementations of the parallel multilevel ILU preconditioners with different BIS construction strategies and different Schur complement preconditioning strategies. We also use some diagonal thresholding and perturbation strategies for the BIS construction and for the last level Schur complement ILU factorization. Numerical experiments indicate that our domain-based parallel multilevel block ILU preconditioners are robust and efficient.

8.
J. Schöberl 《Computing》1998,60(4):323-344
The finite element discretization of the Signorini problem leads to a large scale constrained minimization problem. To improve the convergence rate of the projection method, a preconditioner must be developed. To be effective, the relative condition number of the system matrix with respect to the preconditioning matrix has to be small, and both the application of the preconditioner and the projection onto the set of feasible elements must be fast to compute. In this paper, we show how to construct and analyze such preconditioners on the basis of domain decomposition techniques. The numerical results obtained for the Signorini problem as well as for contact problems in plane elasticity confirm the theoretical analysis quite well.

9.
In the first part of this article series, we derived Domain Decomposition (DD) preconditioners containing three block matrices which must be specified for specific applications. In the present paper, we consider finite element equations arising from the DD discretization of plane, symmetric, 2nd-order, elliptic boundary value problems and specify the matrices involved in the preconditioner via multigrid and hierarchical techniques. The resulting DD-PCCG methods are asymptotically almost optimal with respect to the operation count and well suited for parallel computations on MIMD computers with local memory and message passing. The numerical experiments performed on a transputer hypercube confirm the efficiency of the DD preconditioners proposed.

10.
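Local block factorization of block tridiagonal matrices and its application to preconditioning   Total citations: 3, self-citations: 1, citations by others: 3
This paper uses estimates of the factors in the factorization of block tridiagonal matrices to analyze their local dependence, and uses this property to construct a class of incomplete-factorization preconditioners. A condition number estimate is given for the preconditioned five-point difference matrix, and a comparison of the estimated and actual condition numbers shows that the estimate is accurate and the preconditioner is effective. For the implementation, six serial variants of the preconditioner are considered and an efficient parallelization is proposed; the parallel algorithm requires little communication. Finally, extensive numerical experiments were carried out on a cluster of four microcomputers connected by high-speed Ethernet, and the method was compared with other effective preconditioning methods. The results show that the preconditioner performs well and is particularly suited to parallel computing.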

11.
In this work, we analyze the scalability of inexact two-level balancing domain decomposition by constraints (BDDC) preconditioners for Krylov subspace iterative solvers, when using a highly scalable asynchronous parallel implementation where fine and coarse correction computations are overlapped in time. This way, the coarse-grid problem can be fully overlapped by fine-grid computations (which are embarrassingly parallel) in a wide range of cases. Further, we consider inexact solvers to reduce the computational cost/complexity and memory consumption of coarse and local problems and boost the scalability of the solver. From our numerical experiments, we conclude that the BDDC preconditioner is quite insensitive to inexact solvers. In particular, one cycle of algebraic multigrid (AMG) is enough to attain algorithmic scalability. Further, the clear reduction of computing time and memory requirements of inexact solvers compared to sparse direct ones makes it possible to scale far beyond state-of-the-art BDDC implementations. Excellent weak scalability results have been obtained with the proposed inexact/overlapped implementation of the two-level BDDC preconditioner, up to 93,312 cores and 20 billion unknowns on JUQUEEN. Further, we have also applied the proposed setting to unstructured meshes and partitions for the pressure Poisson solver in the backward-facing step benchmark domain.

12.
This paper presents parallel computational strategies to implement explicit nonlinear finite element analysis code on distributed memory parallel computers for solving large-scale problems in structural dynamics. Implementation on both homogeneous and heterogeneous parallel processing environments is considered in detail. Implementation of an explicit nonlinear finite element dynamic analysis code on homogeneous systems is discussed first, and this is later moved onto heterogeneous systems. Domain decomposition with explicit message passing is preferred for parallel implementation. The message passing implementation in the parallel algorithm is based on MPI (Message Passing Interface) libraries. Implementation aspects of overlapped and non-overlapped domain decomposition techniques, Dynamic Task Allocation (DTA), and clustering techniques for DTA, together with their relative merits, are presented. The interprocessor communications are optimised by overlapping with computations to improve the performance of the domain decomposition based explicit dynamic analysis finite element code. The issues related to implementation of finite element code for nonlinear dynamic analysis on a heterogeneous parallel computing environment are presented next. A new dynamic load-balancing algorithm is developed for this purpose and it is integrated with the domain decomposition based parallel explicit finite element code to test our algorithms on a coarse grain heterogeneous cluster of workstations. Numerical experiments have been carried out on PARAM-10000, an Indian parallel computer, and also on a cluster of Unix workstations.

13.
In this paper, we develop, study and implement iterative linear solvers and preconditioners using multiple graphical processing units (GPUs). Techniques for accelerating sparse matrix–vector (SpMV) multiplication, linear solvers and preconditioners are presented. Four Krylov subspace solvers, a Neumann polynomial preconditioner and a domain decomposition preconditioner are implemented. Our numerical tests with NVIDIA C2050 GPUs show that the SpMV kernel can be sped up by a factor of more than 40 using four GPUs. Our linear solvers and preconditioners achieve similar speedups.

14.
The numerical investigation of the interaction of large, solid particles with fluids is an important area of research for many manufacturing processes. Such studies frequently lead to models that are very large and require the use of parallel solution techniques. This paper presents the results of a parallel implementation of a serial code for the direct numerical simulation of solid-liquid flows. The base code is a serial, arbitrary Lagrangian-Eulerian (ALE) formulation of the equations of motion, which views the particles as solid bodies embedded in the flow domain. This particular model poses some interesting difficulties for domain decomposition type approaches for parallel solutions. In particular, it is not fully understood how the partitioning of the particles among the subdomains influences the performance of parallel solvers. We present several strategies for the partitioning of the solid particles, focusing on the effectiveness of these techniques in terms of parallel speedup and efficiency.

15.
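1. Introduction. Consider solving the linear system $Ax = b$, $x, b \in \mathbb{R}^n$ (1), where $A = (a_{ij})_{n \times n}$ is a large sparse nonsymmetric matrix. Equation (1) is usually solved with an iterative method, such as the Krylov subspace methods GMRES, BiCGSTAB, CGS, TFQMR, and CGS2. Applied directly, an iterative method sometimes converges very slowly or does not converge at all, so preconditioning is needed to accelerate convergence. A left or right preconditioner $M$ is typically used to transform equation (1) into a form that is easier to solve, $AMy = b$, $x = My$, or $MAx = Mb$ (2), and equation (2) is then solved iteratively. $M$ should be chosen so that $AM$ (or $MA$) is approximately equal to the identity matrix. There are many ways to construct preconditioners, such as incomplete factorization methods, SSOR methods, and polynomial methods; incomplete factorization methods and SSOR…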

16.
The present paper presents recent developments in theoretical and implementation aspects, including parallel computations via a single analysis code, of a unified family of generalized integration operators [GInO] in time, with particular emphasis on non-linear structural dynamics. The focus of this research is on the implementation aspects, including the development of coarse-grained parallel computational models for such generalized time integration operators that can be readily ported to a wide range of parallel architectures via a message-passing paradigm (using MPI) and domain decomposition techniques. The implementation aspects are first described, followed by an evaluation for a range of problems which exhibit large deformation, elastic, and elastic–plastic dynamic behavior. For geometric non-linearity a total Lagrangian formulation is employed, and for material non-linearity elasto-plastic formulations are employed. Serial and parallel performance issues on the SGI Origin 2000 system are discussed and analyzed for selected schemes. For illustration, particular forms of [GInO] are investigated, and a complete development via a single analysis code is currently underway. Nevertheless, this is the first time that such a capability is plausible, and the developments further enhance computational structural dynamics areas.

17.
A block preconditioner is considered in a parallel computing environment. This preconditioner has good parallel properties; however, convergence deteriorates as the number of blocks increases. Two different techniques are studied to accelerate the convergence: overlapping at the interfaces and using a coarse grid correction. It appears that the latter technique is indeed scalable, so the wall clock time is constant when the number of blocks increases. Furthermore, the method is easily added to an existing solution code.
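A coarse grid correction of the kind mentioned in this abstract can be sketched as follows. The abstract does not say which block preconditioner or coarse space is used, so this sketch assumes a non-overlapping block Jacobi preconditioner and a coarse space with one piecewise-constant basis function per block, combined additively. All names are hypothetical and the matrix is a small dense model problem.

```python
def solve_dense(M, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for i in range(c + 1, n):
            f = aug[i][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[i][j] -= f * aug[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def block_jacobi_coarse(A, r, blocks, use_coarse=True):
    """Apply z = sum_i R_i^T A_i^{-1} R_i r (+ R_0^T A_0^{-1} R_0 r if use_coarse)."""
    z = [0.0] * len(r)
    # Local solves: each block is independent, so these could run in parallel.
    for idx in blocks:
        A_loc = [[A[i][j] for j in idx] for i in idx]
        z_loc = solve_dense(A_loc, [r[i] for i in idx])
        for i, zi in zip(idx, z_loc):
            z[i] += zi
    if use_coarse:
        # Coarse operator A_0 = R_0 A R_0^T, where row k of R_0 is the
        # indicator vector of block k (an assumed piecewise-constant space).
        A0 = [[sum(A[i][j] for i in bi for j in bj) for bj in blocks]
              for bi in blocks]
        r0 = [sum(r[i] for i in b) for b in blocks]
        z0 = solve_dense(A0, r0)
        # Prolongate: every unknown in block k receives the coarse value k.
        for b, c in zip(blocks, z0):
            for i in b:
                z[i] += c
    return z

n = 12
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
      for j in range(n)] for i in range(n)]
blocks = [list(range(k, k + 3)) for k in range(0, n, 3)]
r = [1.0] * n
z = block_jacobi_coarse(A, r, blocks)
```

The coarse solve couples all blocks through a system whose size equals the number of blocks, which is the mechanism that lets information travel across the whole domain in one application.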

18.
In this paper, based on the preconditioners presented by Rees and Greif [T. Rees, C. Greif, A preconditioner for linear systems arising from interior point optimization methods, SIAM J. Sci. Comput. 29 (2007) 1992-2007], we present a new block triangular preconditioner for linear systems arising from finite element discretization of the mixed formulation of the time-harmonic Maxwell equations (k=0) in electromagnetic problems, since the linear systems arising in both settings have the same matrix block structure. Following the spectral analysis of the preconditioners presented by Rees and Greif, this paper analyzes the spectral distribution of the new preconditioners. From both theoretical and practical points of view, the presented preconditioners are as efficient to apply as those of Rees and Greif. Moreover, numerical experiments are reported to illustrate the efficiency of the presented preconditioners.

19.
《International Journal of Computer Mathematics》2012,89(1-4):223-242
Once a decomposition of the finite element space V into two or more subspaces is given, e.g., via domain decomposition, a specific Multiplicative Schwarz Method (MSM) and a specific Additive Schwarz Method (ASM) are defined. In this paper, we analyse the MSM for the decomposition induced by the approximate discrete harmonic finite element basis which was introduced in a joint paper of the authors with A. Meyer (1990). The main theorem of the present paper states that a special symmetric version of the MSM with approximate orthoprojections is equivalent to some ASM with specially chosen basic transformation and block preconditioners. From this observation we can benefit twice. Indeed, the MSM-DD-preconditioner can be analysed in the MSM framework and implemented as a specific ASM-DD-preconditioner in the parallel PCG method studied previously. We emphasize that we regard the ASM and MSM as techniques for defining and analysing parallel DD preconditioners, which are then used in a parallelized version of the PCG method that is well suited for computations on MIMD computers with local memory and message passing.

20.
Regularizing preconditioners for accelerating the convergence of iterative regularization methods without spoiling the quality of the approximated solution have been extensively investigated in the last twenty years. Several strategies have been proposed for defining proper preconditioners. Usually, in methods for image restoration, the structure of the preconditioner is chosen to be Block Circulant with Circulant Blocks (BCCB) because it can be efficiently exploited by the Fast Fourier Transform (FFT). Nevertheless, for ill-conditioned problems, it is well known that BCCB preconditioners cannot provide a strong clustering of the eigenvalues. Moreover, in order to get an effective preconditioner, it is crucial to preserve the structure of the coefficient matrix. The structure of such a matrix, in the case of image deblurring, depends on the boundary conditions imposed on the imaging model. Therefore, we propose a technique to construct a preconditioner which has the same structure as the blurring matrix related to the restoration problem at hand. The construction of our preconditioner requires two FFTs, like the BCCB preconditioner. The presented preconditioning strategy represents a generalization and an improvement with respect to both circulant and structured preconditioning available in the literature. The technique is further extended to provide a non-stationary preconditioning in the same spirit as a recent proposal for BCCB matrices. Some numerical results show the importance of preserving the matrix structure from the point of view of both restoration quality and robustness of the regularization parameter.
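The reason BCCB preconditioners are cheap to apply is that a BCCB matrix is diagonalized by the 2D discrete Fourier transform. The sketch below illustrates only that standard fact, not the structure-preserving preconditioner proposed in the paper. It assumes a BCCB matrix given by its kernel (its first column reshaped to the image size), applies a plain inverse with no regularizing filter, and uses a naive DFT in place of an FFT to stay dependency-free. All names are hypothetical.

```python
import cmath

def dft2(a, inverse=False):
    """Naive 2D DFT, O((mn)^2); a real code would call an FFT routine instead."""
    m, n = len(a), len(a[0])
    sign = 1 if inverse else -1
    out = [[0j] * n for _ in range(m)]
    for u in range(m):
        for v in range(n):
            s = 0j
            for j in range(m):
                for k in range(n):
                    s += a[j][k] * cmath.exp(sign * 2j * cmath.pi * (u * j / m + v * k / n))
            out[u][v] = s / (m * n) if inverse else s
    return out

def circ_conv2(c, x):
    """Action of the BCCB matrix with kernel c on image x (2D circular convolution)."""
    m, n = len(x), len(x[0])
    return [[sum(c[(j - p) % m][(k - q) % n] * x[p][q]
                 for p in range(m) for q in range(n))
             for k in range(n)] for j in range(m)]

def bccb_solve(c, x):
    """Solve C y = x: the eigenvalues of C are the entries of dft2(c)."""
    eig = dft2(c)
    X = dft2(x)
    m, n = len(x), len(x[0])
    # A regularizing preconditioner would filter small eigenvalues here
    # instead of dividing by them directly.
    Y = [[X[u][v] / eig[u][v] for v in range(n)] for u in range(m)]
    y = dft2(Y, inverse=True)
    return [[val.real for val in row] for row in y]

# Kernel of a shifted periodic 5-point stencil; eigenvalues lie in [1, 9].
m = n = 4
c = [[0.0] * n for _ in range(m)]
c[0][0] = 5.0
c[0][1] = c[0][n - 1] = c[1][0] = c[m - 1][0] = -1.0
x = [[float(i * n + j) for j in range(n)] for i in range(m)]
y = bccb_solve(c, x)
```

With an FFT in place of `dft2`, one application costs two transforms plus an elementwise division, which matches the two-FFT cost the abstract quotes for the BCCB preconditioner.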
