
1.

Parallel computation of three-dimensional nonlinear magnetostatic problems

David Levine William Gropp Kimmo Forsman Lauri Kettunen 《Concurrency and Computation》1999,11(2):109-120

We describe a general-purpose parallel code for computing accurate solutions to large, computationally demanding, 3D, nonlinear magnetostatic problems. The code, CORAL, is based on a volume integral equation formulation. Using an IBM SP parallel computer and iterative solution methods, we successfully solved the dense linear systems inherent in such formulations. A key component of our work was the use of the PETSc library, which provides parallel portability and access to the latest linear algebra solution technology. Copyright © 1999 John Wiley & Sons, Ltd.

2.

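MPI+OpenMP hybrid programming implementation of multicolor SSOR-PCG

LIN Shaozhong XU Hewei XIE Zhiqiang 《Computer Aided Engineering》2013,22(6):79-83

Parallelizing the Symmetric Successive Over-Relaxation Preconditioned Conjugate Gradient (SSOR-PCG) method is difficult because every iteration requires the parallel solution of two triangular systems. To address this, a multicolor ordering technique is used to increase the degree of parallelism, and a parallel program suited to distributed shared-memory computers is developed on the MPI+OpenMP hybrid programming model. Effective MPI communication functions are selected through testing, and three measures for avoiding races on shared data are given, to be chosen according to problem size and the memory capacity of the computer.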

3.

Parallel scheduling of the PCG method for banded matrices rising from FDM/FEM

E. M. Ortigosa L. F. Romero J. I. Ramos 《Journal of Parallel and Distributed Computing》2003,63(12):1243-1256

An implicit time-linearized finite difference discretization of partial differential equations on regular/structured meshes results in an n-diagonal block system of algebraic equations, which is usually solved by means of the Preconditioned Conjugate Gradient (PCG) method. In this paper, an analysis of the parallel implementation of this method on several computer architectures and for several programming paradigms is presented. For three-dimensional regular/structured meshes, a new implementation of the PCG method with Jacobi preconditioner is proposed. For the computer architectures and number of processors employed in this study, it has been found that this implementation is more efficient than the standard one, and can be applied to narrow-band matrices and other preconditioners, such as, for example, polynomial ones.
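As a rough illustration of the method named in this abstract, the following is a minimal serial sketch of Jacobi-preconditioned CG for a banded matrix. It is not the authors' parallel implementation: it assumes a symmetric positive definite tridiagonal matrix stored as three diagonals, and every function name and the test system are hypothetical.

```python
# Sketch only: serial Jacobi-preconditioned CG for an SPD tridiagonal matrix.
# All names and the test problem are illustrative assumptions.

def band_matvec(lower, diag, upper, x):
    """Return y = A x for a tridiagonal A given by its three diagonals."""
    n = len(diag)
    y = [diag[i] * x[i] for i in range(n)]
    for i in range(n - 1):
        y[i] += upper[i] * x[i + 1]
        y[i + 1] += lower[i] * x[i]
    return y

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def jacobi_pcg(lower, diag, upper, b, tol=1e-10, max_iter=1000):
    """Solve A x = b; returns (x, number of iterations performed)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                                  # residual for x = 0
    z = [r[i] / diag[i] for i in range(n)]       # Jacobi step: z = D^{-1} r
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, it
        Ap = band_matvec(lower, diag, upper, p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [r[i] / diag[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter

# Small diagonally dominant test system with a known solution.
n = 20
lower = [-1.0] * (n - 1)
upper = [-1.0] * (n - 1)
diag = [2.0 + 0.1 * i for i in range(n)]
x_true = [float(i + 1) for i in range(n)]
b = band_matvec(lower, diag, upper, x_true)
x, iters = jacobi_pcg(lower, diag, upper, b)
```

In a parallel setting the two dot products per iteration are the global reductions, while the banded matrix-vector product and the diagonal scaling need only local or nearest-neighbour data.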

4.

Fast parallel Preconditioned Conjugate Gradient algorithms for robot manipulator dynamics simulation

Amir Fijany Robert E. Scheid 《Journal of Intelligent and Robotic Systems》1994,9(1-2):73-99

In this paper, fast parallel Preconditioned Conjugate Gradient (PCG) algorithms for the robot manipulator forward dynamics, or dynamic simulation, problem are presented. By exploiting the inherent structure of the forward dynamics problem, suitable preconditioners are devised to accelerate the iterations. Also, based on the choice of preconditioners, a modified dynamic formulation is used to speed up both serial and parallel computation of each iteration. The implementation of the parallel algorithms on two interconnected processor arrays is discussed and their computation and communication complexities are analyzed. The simulation results for a Puma Arm are presented to illustrate the effectiveness of the proposed preconditioners. With a faster convergence due to preconditioning and a faster computation of iterations due to parallelization, the developed parallel PCG algorithms represent the fastest alternative for parallel computation of the problem with O(n) processors.

5.

Numerical Treatment of Elliptic Problems Nonlinearly Coupled Through the Interface

Francesco Calabrò 《Journal of scientific computing》2013,57(2):300-312

This work is devoted to the study of the numerical treatment of linear elliptic problems in adjoined domains nonlinearly coupled at the interface. The problem arises in semi-discretization of mass diffusion problems, typically when an osmotic effect is taken into account. The convergence of both the Conjugate Gradient and the Fixed Point methods is considered and compared.

6.

A comparison of conjugate gradient preconditionings for three-dimensional problems on a CRAY-1

《Computer Physics Communications》1985,37(1-3):205-214

This paper presents the results of a study of the application of the Preconditioned Conjugate Gradient algorithm to some equations arising in three-dimensional turbulent flow prediction. The merits of several of the standard preconditionings are discussed, and some of the difficulties which are encountered when trying to solve such problems efficiently on a vector computer are described. The results show that the coefficient matrices which arise have important differences from those which derive from many standard test problems. Efficient solution of the problems is still possible, but the correct choice of preconditioning is important. In particular, the application of truncated power series expansions to the preconditioning phase is shown to produce substantial gains.

7.

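Massive multiple-input multiple-output precoding based on an improved conjugate gradient method

BAI He LIU Ziyan ZHANG Jie WAN Peipei MA Shanshan 《Journal of Computer Applications》2019,39(10):3007-3012

To address the high implementation complexity of downlink precoding in massive multiple-input multiple-output (Massive MIMO) systems and the difficulty of inverting the linear precoding matrix, a low-complexity precoding algorithm based on the Symmetric Successive Over-Relaxation Preconditioned Conjugate Gradient (SSOR-PCG) method is proposed. Building on the conjugate gradient (PCG) algorithm, it applies Symmetric Successive Over-Relaxation (SSOR) splitting to precondition the matrix and reduce its condition number, which speeds up the convergence of the precoding algorithm and lowers its complexity. Simulation results show that, compared with the PCG algorithm, the proposed SSOR-PCG precoding algorithm shortens the running time by about 88.93% and has already converged at a signal-to-noise ratio of 26 dB. Compared with the zero-forcing precoding algorithm, the proposed algorithm reaches a system capacity close to that of zero-forcing precoding after only 2 iterations, with complexity reduced by about one order of magnitude and bit error rate reduced by about 49.94%.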
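The SSOR preconditioning step mentioned in this abstract can be sketched as follows. This is a generic textbook form, not the authors' precoding algorithm: it assumes a real symmetric positive definite matrix stored densely (a MIMO Gram matrix would be complex Hermitian), a relaxation factor `omega = 1.2` chosen arbitrarily, and hypothetical function names throughout.

```python
# Sketch only: SSOR-preconditioned CG on a small dense real SPD matrix.
# The preconditioner is M = (omega/(2-omega)) (D/omega + L) D^{-1} (D/omega + L^T),
# where D and L are the diagonal and strictly lower parts of A.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def ssor_apply(A, r, omega=1.2):
    """Return z such that M z = r, using two triangular solves."""
    n = len(r)
    y = [0.0] * n
    for i in range(n):                       # forward solve (D/omega + L) y = r
        s = r[i] - sum(A[i][j] * y[j] for j in range(i))
        y[i] = s * omega / A[i][i]
    w = [(2.0 - omega) / omega * A[i][i] * y[i] for i in range(n)]
    z = [0.0] * n
    for i in reversed(range(n)):             # backward solve (D/omega + L^T) z = w
        s = w[i] - sum(A[i][j] * z[j] for j in range(i + 1, n))
        z[i] = s * omega / A[i][i]
    return z

def pcg(A, b, precond, tol=1e-10, max_iter=500):
    """Preconditioned CG; returns (x, number of iterations performed)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)
    z = precond(A, r)
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, it
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = precond(A, r)
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter

# Test matrix: dense storage of a 1D Laplacian (SPD), with a known solution.
n = 25
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
x_true = [float(i % 5) - 2.0 for i in range(n)]
b = matvec(A, x_true)
x, iters = pcg(A, b, ssor_apply)
```

The two triangular solves inside `ssor_apply` are the sequential bottleneck that multicolor orderings are designed to break up when the method is parallelized.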

8.

A Large scale finite element analysis using domain decomposition method on a parallel computer

G. Yagawa N. Soneda S. Yoshimura 《Computers & Structures》1991,38(5-6):615-625

A parallel finite element analysis based on a domain decomposition technique (DDT) is considered. In the present DDT, an analysis domain is divided into a number of smaller subdomains without overlap. Finite element analyses of the subdomains are performed under the constraint of both displacement continuity and force equivalence among them. The constraint is satisfied through iterative calculations based on either the Uzawa algorithm or the Conjugate Gradient (CG) method. Owing to the iterative algorithm, a large scale finite element analysis can be divided into a number of smaller ones which can be carried out in parallel.

The DDT is implemented on a parallel computer network composed of a number of 32-bit microprocessors, transputers. The developed parallel calculation system named the ‘FEM server type system’ involves peculiar features such as network independence and dynamic workload balance.

The characteristics of the domain decomposition method such as computational speed and memory requirement are first examined in detail through the finite element calculations of homogeneous or inhomogeneous cracked plate subjected to a tensile load on a single CPU computer.

The ‘speedup’ and ‘performance’ features of the FEM server type system are discussed on a parallel computer system composed of up to 16 transputers, with changing network types and domain decompositions. It is clearly demonstrated that the present parallel computing system requires a much smaller amount of computational memory than the conventional finite element method and also that, due to the feature of dynamic workload balancing, high performance (over 90%) is achieved even in a large scale finite element calculation with irregular domain decomposition.

9.

On the control of the Navier-Stokes equation with the extended conjugate gradient method

《International Journal of Computer Mathematics》2012,89(1):75-91

The paper formulates and solves an optimal control problem of the Navier-Stokes equation for an incompressible fluid, employing an Extended Conjugate Gradient algorithm, with computer simulations for the optimal velocity fields.

10.

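Design and implementation of parallel finite element programs (cited by: 1 in total, 0 self-citations, 1 by others)

YU Tiantang JIANG Hongdao 《Journal on Numerical Methods and Computer Applications》2000,21(2):155-160

1. Introduction. One of the main approaches to parallel finite element computation is the substructure method [1]: static condensation is carried out on each substructure in parallel, the interface equations are then solved in parallel, and finally the interior-node displacements are recovered by parallel back-substitution and the strains and stresses are computed. The design and efficient implementation of a parallel program depend strongly on the computational model of the parallel hardware. Because network parallel computing offers enormous computing potential, a good price/performance ratio, scalability, and a flexible architecture, and because a number of message-passing parallel programming platforms such as PVM, MPI, and EXPRESSP [2,3] have appeared, scalable distributed network-parallel finite element computation has become an important direction in parallel finite element computing. This paper describes in detail parallel finite element … in a PVM-based distributed network parallel environment…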

11.

Successive underrelaxation (SUR) and generalised conjugate gradient (GCG) methods for hyperbolic difference equations on a parallel computer

D.J Evans C Li 《Parallel Computing》1990,16(2-3):207-220

When we consider the numerical solution of the 2-dimensional linear hyperbolic problem by implicit difference equations, we need to solve a set of linear systems Ax = b with many right-hand sides b, where A is large, sparse and nonsymmetric. The SUR (Successive Underrelaxation) and GCG (Generalised Conjugate Gradient) methods are used for solving the linear systems. We compare the two methods on sequential and parallel computations. Numerical results indicate that the SUR method is nearly twice as fast as the GCG method and the SUR method has an almost linear speedup.

12.

On a model of three-dimensional bursting and its parallel implementation

S. Tabik E.M. Garzón 《Computer Physics Communications》2008,178(7):471-485

A mathematical model for the simulation of three-dimensional bursting phenomena and its parallel implementation are presented. The model consists of four nonlinearly coupled partial differential equations that include fast and slow variables, and exhibits bursting in the absence of diffusion. The differential equations have been discretized by means of a linearly-implicit finite difference method, second-order accurate in both space and time, on equally-spaced grids. The resulting system of linear algebraic equations at each time level has been solved by means of the Preconditioned Conjugate Gradient (PCG) method. Three different parallel implementations of the proposed mathematical model have been developed; two of these implementations, i.e., the MPI and the PETSc codes, are based on a message passing paradigm, while the third one, i.e., the OpenMP code, is based on a shared address space paradigm. These three implementations are evaluated on two current high performance parallel architectures, i.e., a dual-processor cluster and a Shared Distributed Memory (SDM) system. A novel representation of the results that emphasizes the most relevant factors that affect the performance of the parallel implementations is proposed. The comparative analysis of the computational results shows that the MPI and the OpenMP implementations are about twice as efficient as the PETSc code on the SDM system. It is also shown that, for the conditions reported here, the nonlinear dynamics of the three-dimensional bursting phenomena exhibits three stages characterized by asynchronous, synchronous and then asynchronous oscillations, before a quiescent state is reached. It is also shown that the fast system reaches steady state in much less time than the slow variables.

13.

Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm

P. Ghysels W. Vanroose 《Parallel Computing》2014

Scalability of Krylov subspace methods suffers from costly global synchronization steps that arise in dot-products and norm calculations on parallel machines. In this work, a modified preconditioned Conjugate Gradient (CG) method is presented that removes the costly global synchronization steps from the standard CG algorithm by only performing a single non-blocking reduction per iteration. This global communication phase can be overlapped by the matrix–vector product, which typically only requires local communication. The resulting algorithm will be referred to as pipelined CG. An alternative pipelined method, mathematically equivalent to the Conjugate Residual (CR) method that makes different trade-offs with regard to scalability and serial runtime is also considered. These methods are compared to a recently proposed asynchronous CG algorithm by Gropp. Extensive numerical experiments demonstrate the numerical stability of the methods. Moreover, it is shown that hiding the global synchronization step improves scalability on distributed memory machines using the message passing paradigm and leads to significant speedups compared to standard preconditioned CG.

14.

Auto-tuned Krylov methods on cluster of graphics processing unit

Frédéric Magoulès Abal-Kassim Cheik Ahamed Roman Putanowicz 《International Journal of Computer Mathematics》2015,92(6):1222-1250

Exascale computers are expected to have highly hierarchical architectures with nodes composed of multiple core processors (CPU; central processing unit) and accelerators (GPU; graphics processing unit). The different programming levels generate new and difficult algorithmic issues. In particular, when solving extremely large linear systems, new programming paradigms of Krylov methods should be defined and evaluated with respect to modern state of the art of scientific methods. Iterative Krylov methods involve linear algebra operations such as dot product, norm, addition of vectors and sparse matrix–vector multiplication. These operations are computationally expensive for large size matrices. In this paper, we aim to focus on the best way to perform these operations effectively, in double precision, on GPU in order to make iterative Krylov methods more robust and therefore reduce the computing time. The performance of our algorithms is evaluated on several matrices arising from engineering problems. Numerical experiments illustrate the robustness and accuracy of our implementation compared to the existing libraries. We deal with different preconditioned Krylov methods: Conjugate Gradient for symmetric positive-definite matrices, and Generalized Conjugate Residual, Bi-Conjugate Gradient Conjugate Residual, transpose-free Quasi Minimal Residual, Stabilized BiConjugate Gradient and Stabilized BiConjugate Gradient (L) for the solution of sparse linear systems with nonsymmetric matrices. We consider and compare several sparse compressed formats, and propose a way to implement Krylov methods effectively on GPU and on multicore CPU. Finally, we give strategies for faster algorithms by auto-tuning the threading design, based on the problem characteristics and the hardware changes. As a conclusion, we propose and analyse hybrid sub-structuring methods that should pave the way to exascale hybrid methods.

15.

Comparison of Krylov subspace methods with preconditioning techniques for solving boundary value problems

S. Sundar B. K. Bhagavan 《Computers & Mathematics with Applications》1999,38(11-12)

In this paper, we attempt to establish the usefulness of the Lanczos solver with a preconditioning technique over the preconditioned Conjugate Gradient (CG) solvers. We present a detailed comparative study with respect to convergence, speed, and CPU time, considering appropriate boundary value problems.

16.

Efficient solution of fluid flow using the generalised conjugate gradient algorithm on a transputer-based machine

《Computing Systems in Engineering》1995,6(4-5):319-324

The discretisation of the equations governing fluid flow gives rise to coupled, quasi-linear and non-symmetric systems. The solution is usually obtained by iteration using a guess-and-correct procedure where each iteration aims to improve the solution of the previous step. Each step or outer iteration of the process involves the solution of nominally linear algebraic systems. These systems are normally solved using methods based on the Gauss-Seidel iteration, such as the TDMA. However, these methods generally converge very slowly and can be very time consuming for realistic applications. In this paper, these equations are solved using the Generalised Conjugate Gradient (GCG) algorithm with a simple-to-implement Gauss-Seidel-based preconditioner on a distributed memory message-passing machine. We take advantage of the fact that only tentative improvements to the flow-field are sought during each iteration and study the convergence behaviour of the parallel implementation on a multi-processor environment.

17.

Parallel custom instruction identification for extensible processors

《Journal of Systems Architecture》2017

With their ability to be customized for an application domain, extensible processors have been used more and more in embedded systems in recent years. Extensible processors are customized for an application domain by executing parts of the application code in hardware instead of software. Determining which parts of the application code become custom instructions generally requires subgraph enumeration and subgraph selection. Both the subgraph enumeration problem and the subgraph selection problem are computationally difficult. Most previous works focus on sequential algorithms for these two problems. In this paper, we present a parallel implementation of a recent subgraph enumeration algorithm based on a computer cluster. A standard ant colony optimization (ACO) algorithm, a modified version of ACO with local optimum search, and a parallel ACO algorithm are also proposed to solve the subgraph selection problem in this work. Experimental results show that the parallel algorithms outperform the sequential algorithms in terms of runtime and/or quality of results. In addition, we have formally proved the upper bound on the number of feasible solutions in the subgraph selection problem with or without the overlapping constraint.

18.

An M-Stage System with Indeterminate Processing Time. I. Schedule Optimization

V. I. Levin 《Automation and Remote Control》2002,63(1):103-110

The well-known m-machine problem in which job times are defined by intervals of possible values is formulated in new terms. This problem is reduced to two usual m-machine problems and a solution algorithm is designed. Part I is devoted to the formulation of the problem and ideas underlying its solution and the necessary mathematical apparatus, whereas Part II is concerned with the solution.

19.

Approximate Inverse Preconditioners for the Conjugate Gradient Method

《International Journal of Computer Mathematics》2012,89(4):495-521

The method of Conjugate Gradients is known to converge for symmetric positive definite systems of equations. This paper applies it to non-symmetric and ill-conditioned matrices. In order to facilitate convergence, an approximate inverse is used to precondition the Conjugate Gradient method. This is achieved by applying Newton's method. Three versions of Newton's method are introduced to compute the approximate inverse. Convergence of each version is compared. Numerical experimentation is done for some known "ill-conditioned" problems.
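For orientation, one common way to compute an approximate inverse with Newton's method is the Newton-Schulz iteration X_{k+1} = X_k (2I - A X_k). The abstract does not say which three versions the paper uses, so the sketch below is only the classical form under stated assumptions: the starting guess X_0 = A^T / (||A||_1 ||A||_inf) is a standard choice that guarantees convergence for nonsingular A, and the function names and test matrix are hypothetical.

```python
# Sketch only: classical Newton-Schulz iteration for an approximate inverse.
# Not necessarily one of the three Newton variants studied in the paper.

def matmul(A, B):
    n, m, k = len(A), len(B[0]), len(B)
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

def newton_schulz_inverse(A, iters=30):
    """Iterate X <- X (2I - A X), starting from a scaled transpose of A."""
    n = len(A)
    norm1 = max(sum(abs(A[i][j]) for i in range(n)) for j in range(n))
    norm_inf = max(sum(abs(v) for v in row) for row in A)
    scale = norm1 * norm_inf
    X = [[A[j][i] / scale for j in range(n)] for i in range(n)]   # X0 = A^T / scale
    for _ in range(iters):
        AX = matmul(A, X)
        R = [[(2.0 if i == j else 0.0) - AX[i][j] for j in range(n)]
             for i in range(n)]
        X = matmul(X, R)
    return X

# Small nonsymmetric, nonsingular test matrix (illustrative).
A = [[4.0, 1.0, 0.0],
     [2.0, 5.0, 1.0],
     [0.0, 1.0, 3.0]]
X = newton_schulz_inverse(A)
AX = matmul(A, X)
```

Stopping after only a few iterations yields a rough inverse that can serve as a preconditioner, since each step costs just two matrix-matrix products.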

20.

A program generator for the incomplete cholesky conjugate gradient (ICCG) method with a symmetrizing preprocessor

G. Kuo-Petravic M. Petravic 《Computer Physics Communications》1981,22(1):33-48

This paper is an extension of our previous paper “A program generator for the Incomplete LU-decomposition-Conjugate Gradient (ILUCG) method” which appeared in Computer Physics Communications. In that paper we presented a generator program which produced a code package to solve the system of equations Ax = b, where A is an arbitrary nonsingular matrix, by the ILUCG method. In the present paper we offer an alternative generator program which produces a code package applicable to the case where A is symmetric and positive definite. The numerical algorithm used is the Incomplete Cholesky Conjugate Gradient (ICCG) method of Meijerink and Van der Vorst, which executes approximately twice as fast per iteration as the ILUCG method. In addition, we provide an optional preprocessor to treat the case of a nonsymmetric, nonsingular matrix A that is not diagonally dominant by solving the equation A^TAx = A^Tb.
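The symmetrizing idea in the last sentence, replacing Ax = b by A^TAx = A^Tb so that a CG-type method applies, can be sketched as below. This is an assumption-laden illustration rather than the generated code package: it uses plain CG without the incomplete Cholesky preconditioner, never forms A^TA explicitly, and all names and the test matrix are hypothetical.

```python
# Sketch only: CG applied to the normal equations A^T A x = A^T b.
# No incomplete Cholesky preconditioning is included here.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def rmatvec(A, y):
    """Return A^T y without building the transpose."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def cg_normal_equations(A, b, tol=1e-12, max_iter=200):
    n = len(A[0])
    x = [0.0] * n
    r = rmatvec(A, b)            # residual of the normal equations for x = 0
    p = list(r)
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rr / dot(Ap, Ap)             # p^T (A^T A) p = ||A p||^2
        x = [x[i] + alpha * p[i] for i in range(n)]
        AtAp = rmatvec(A, Ap)
        r = [r[i] - alpha * AtAp[i] for i in range(n)]
        rr_new = dot(r, r)
        p = [r[i] + (rr_new / rr) * p[i] for i in range(n)]
        rr = rr_new
    return x

# Nonsymmetric, nonsingular matrix that is not diagonally dominant (illustrative).
A = [[1.0, 2.0, 0.0],
     [3.0, 1.0, 1.0],
     [0.0, 1.0, 4.0]]
x_true = [1.0, -2.0, 3.0]
b = matvec(A, x_true)
x = cg_normal_equations(A, b)
```

The trade-off is that the condition number of A^TA is the square of that of A, which is one reason such a step is usually paired with a preconditioner.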