1.

Polynomial collocation using a domain decomposition solution to parabolic PDE's via the penalty method and explicit/implicit time marching

Kelly Black 《Journal of scientific computing》1992,7(4):313-338

A domain decomposition method is examined to solve a time-dependent parabolic equation. The method employs an orthogonal polynomial collocation technique on multiple subdomains. The subdomain interfaces are approximated with the aid of a penalty method. The time discretization is implemented in an explicit/implicit finite difference method. The subdomain interface is approximated using an explicit Dufort-Frankel method, while the interior of each subdomain is approximated using an implicit backward Euler method. The principal advantage of the method is the direct implementation on a distributed computing system with a minimum of interprocessor communication. Theoretical results are given for Legendre polynomials, while computational results are given for Chebyshev polynomials. Results are given for both a single processor computer and a distributed computing system.
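The explicit-interface / implicit-interior idea in this abstract can be illustrated with a minimal sketch. It is not the paper's scheme: it assumes second-order finite differences instead of polynomial collocation, omits the penalty method, and uses a 1-D heat equation u_t = u_xx on [0, 1] with zero Dirichlet data and a single interface at the midpoint. All function names and parameters are hypothetical.

```python
import math

def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def implicit_subdomain(u_old, left_bc, right_bc, r):
    """Backward Euler on one subdomain; the boundary values are already known."""
    n = len(u_old)
    rhs = list(u_old)
    rhs[0] += r * left_bc
    rhs[-1] += r * right_bc
    return thomas([-r] * n, [1.0 + 2.0 * r] * n, [-r] * n, rhs)

def step(u, u_m_prev, m, r):
    """Advance one time level; u holds level n, u_m_prev the interface value at n-1."""
    # Explicit Dufort-Frankel update of the interface node m.
    u_m_new = ((1.0 - 2.0 * r) * u_m_prev
               + 2.0 * r * (u[m - 1] + u[m + 1])) / (1.0 + 2.0 * r)
    # The two implicit solves are independent, so they could run concurrently.
    left = implicit_subdomain(u[1:m], u[0], u_m_new, r)
    right = implicit_subdomain(u[m + 1:-1], u_m_new, u[-1], r)
    return [u[0]] + left + [u_m_new] + right + [u[-1]]

def solve(n_cells=16, r=0.4, steps=64):
    """Return grid, numerical solution, and exact solution at t = steps * dt."""
    dx = 1.0 / n_cells
    dt = r * dx * dx  # r = dt / dx^2 (assumed small enough for this sketch)
    m = n_cells // 2
    x = [i * dx for i in range(n_cells + 1)]
    u = [math.sin(math.pi * xi) for xi in x]
    u[0] = u[-1] = 0.0
    u_m_prev = u[m]  # first-order start-up for the three-level interface scheme
    for _ in range(steps):
        u_new = step(u, u_m_prev, m, r)
        u_m_prev = u[m]
        u = u_new
    t = steps * dt
    exact = [math.exp(-math.pi ** 2 * t) * math.sin(math.pi * xi) for xi in x]
    return x, u, exact
```

Once the interface value is predicted explicitly, each subdomain solve needs only that one value from its neighbour, which is the source of the low communication cost the abstract highlights.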

2.

A parallel mixed time integration algorithm for nonlinear dynamic analysis

《Advances in Engineering Software》2002,33(5):261-271

This paper presents a parallel mixed time integration algorithm formulated by synthesising the implicit and explicit time integration techniques. The proposed algorithm is an extension of the mixed time integration algorithms [Comput. Meth. Appl. Mech. Engng 17/18 (1979) 259; Int. J. Numer. Meth. Engng 12 (1978) 1575] that have been successfully employed for solving media-structure interaction problems. The parallel algorithm for the nonlinear dynamic response of structures, employing the mixed time integration technique, has been devised within the broad framework of domain decomposition. Concurrency is introduced into this algorithm by integrating the interface nodes with an explicit time integration technique and then solving the local submeshes with an implicit algorithm. A flexible parallel data structure has been devised to implement the parallel mixed time integration algorithm. A parallel finite element code has been developed using the portable Message Passing Interface software development environment. Numerical studies have been conducted on PARAM-10000 (an Indian parallel supercomputer) to test the accuracy and the performance of the proposed algorithm. They indicate that the proposed algorithm is highly adaptive for parallel processing.

3.

Parallel computation for shallow water flow: A domain decomposition approach

《Parallel Computing》1997,23(9):1261-1277

This paper describes a strategy for the parallelization of a finite element code for the numerical simulation of shallow water flow. The numerical scheme adopted for the discretization of the equations in the scalar algorithm is briefly described, with emphasis on the aspects concerning its porting to a parallel architecture. The parallelization strategy is of the domain decomposition type: the implicit computational kernel of the scheme, a Poisson problem, is solved by an additive Schwarz preconditioning technique within conjugate gradient iterations. Both the theoretical and the implementation aspects of the domain decomposition method are described as applied in the present context. Finally, some computational examples are shown and discussed.
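As a rough illustration of additive Schwarz preconditioning inside conjugate gradient iterations, here is a minimal sketch. It assumes a 1-D finite-difference Poisson problem -u'' = f with zero Dirichlet data and two overlapping index blocks, not the paper's finite element shallow water kernel. All names and the block layout are hypothetical.

```python
import math

def thomas(off, diag, rhs):
    """Solve a constant-coefficient tridiagonal system (off, diag, off)."""
    n = len(rhs)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag
    d[0] = rhs[0] / diag
    for i in range(1, n):
        denom = diag - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def apply_a(v, h):
    """Apply the 1-D Laplacian stencil (-1, 2, -1) / h^2 with zero boundaries."""
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * v[i] - left - right) / (h * h))
    return out

def additive_schwarz(r, blocks, h):
    """z = sum_i R_i^T A_i^{-1} R_i r over overlapping index blocks [lo, hi)."""
    z = [0.0] * len(r)
    for lo, hi in blocks:  # each local solve is independent of the others
        local = thomas(-1.0 / (h * h), 2.0 / (h * h), r[lo:hi])
        for k, val in enumerate(local):
            z[lo + k] += val
    return z

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pcg(f, h, blocks, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradients; returns (solution, iterations)."""
    x = [0.0] * len(f)
    r = list(f)
    z = additive_schwarz(r, blocks, h)
    p = list(z)
    rz = dot(r, z)
    f_norm = math.sqrt(dot(f, f))
    for it in range(1, max_iter + 1):
        ap = apply_a(p, h)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if math.sqrt(dot(r, r)) < tol * f_norm:
            return x, it
        z = additive_schwarz(r, blocks, h)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

def demo_poisson(n=63):
    """Solve -u'' = pi^2 sin(pi x); return (max error vs sin(pi x), iterations)."""
    h = 1.0 / (n + 1)
    xs = [(i + 1) * h for i in range(n)]
    f = [math.pi ** 2 * math.sin(math.pi * xi) for xi in xs]
    blocks = [(0, 36), (28, n)]  # two subdomains overlapping on indices 28..35
    u, iters = pcg(f, h, blocks)
    err = max(abs(ui - math.sin(math.pi * xi)) for ui, xi in zip(u, xs))
    return err, iters
```

The local solves inside `additive_schwarz` touch disjoint work per block and are summed afterwards, which is why this preconditioner maps naturally onto one subdomain per processor.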

4.

Parallel-multigrid computation of unsteady incompressible viscous flows using a matrix-free implicit method and high-resolution characteristics-based scheme

《Computer Methods in Applied Mechanics and Engineering》2005,194(36-38):3949-3983

A three-dimensional parallel unstructured non-nested multigrid solver for unsteady incompressible viscous flow is developed and validated. The finite-volume Navier–Stokes solver is based on the artificial compressibility approach with a high-resolution method of characteristics-based scheme for handling convection terms. The unsteady flow is calculated with a matrix-free implicit dual time stepping scheme. The parallelization of the multigrid solver is achieved by a multigrid domain decomposition approach (MG-DD), using the single program multiple data (SPMD) and multiple instruction multiple data (MIMD) programming paradigms. Two parallelization strategies are proposed in this work: the first is a one-level strategy using the geometric domain decomposition technique alone; the second is a two-level strategy that combines geometric domain decomposition with data decomposition. The Message Passing Interface (MPI) and the OpenMP standard are used to communicate data between processors and to decompose loop iterations over arrays, respectively. The parallel-multigrid code is used to simulate both steady and unsteady incompressible viscous flows over a circular cylinder and a lid-driven cavity flow. A maximum speedup of 22.5 was achieved on 32 processors, for instance for the lid-driven cavity flow at Re = 1000. The results obtained agree well with numerical solutions obtained by other researchers as well as experimental measurements. A detailed study of the time step size and the number of pseudo-sub-iterations per time step required for simulating unsteady flow is presented in this paper.
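For orientation, the quoted speedup can be converted into a parallel efficiency. The abstract does not state an efficiency; the line below assumes the usual definition $E_p = S_p / p$, where $S_p$ is the speedup on $p$ processors.

```latex
E_{32} = \frac{S_{32}}{32} = \frac{22.5}{32} \approx 0.70
```

Under that assumption, roughly 70% of the ideal linear speedup is retained on 32 processors.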

5.

Porting an industrial sheet metal forming code to a distributed memory parallel computer

G.P. Nikishkov M. Kawka A. Makinouchi G. Yagawa S. Yoshimura 《Computers & Structures》1998,67(6):439-449

The parallel version of the sheet metal forming semi-implicit finite element code ITAS3D has been developed using the domain decomposition method and direct solution methods at both subdomain and interface levels. The IBM Message Passing Library is used for data communication between tasks of the parallel code. Solutions of some sheet metal forming problems on an IBM SP2 computer show that the adopted DDM algorithm with the direct solver provides acceptable parallel efficiency using a moderate number of processors. A speedup of 6.7 is achieved for a problem with 20000 degrees of freedom on the 8-processor configuration.

6.

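Application of an MPI+TBB hybrid parallel programming model to molecular dynamics

Bai Mingze, Zhao Wenhui, Dou Yusheng, Sun Shixin, Wen Di 《计算机应用研究》 (Application Research of Computers) 2012,29(5):1772-1774

To speed up molecular dynamics simulations on symmetric multiprocessing (SMP) clusters, a hybrid MPI+TBB parallel programming model is introduced into parallel molecular dynamics. Based on this model, a hybrid parallel algorithm is designed and implemented in the molecular dynamics package LAMMPS: process-level parallelism across nodes uses MPI with spatial decomposition, and thread-level parallelism within a node uses TBB with critical sections. Tests on an SMP cluster show that, for large systems and large node counts, the method markedly reduces communication time and raises the speedup by 45% over the pure MPI model. The results indicate that the MPI+TBB hybrid model benefits parallel molecular dynamics simulation and clearly improves efficiency.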

7.

Parallelization algorithm for implicit method computation of hypersonic nonequilibrium gas flow past a body, based on Navier-Stokes equations

A. B. Gorshkov 《Mathematical Models and Computer Simulations》2010,2(2):252-260

The earlier proposed algorithm of parallelization of the computer-code developed for solving the two-dimensional stationary Navier-Stokes equations using the implicit iterative scheme is extended to the nonequilibrium gaseous mixture flow. The parallelization algorithm is based on decomposition of the computation region into several parts corresponding to the number of processors, with the maintenance of the implicit type of a difference scheme in each subregion. The parallelization efficiency is analyzed by the example of the computation of the flow past a re-entry vehicle moving in the Earth’s atmosphere at a hypersonic velocity. The algorithm has demonstrated good scalability for a number of processors N ≤ 15.

8.

Parallel implementation of a high-order implicit collocation method for the heat equation

《Mathematics and computers in simulation》2001,54(6):509-519

We combine a high-order compact finite difference approximation and collocation techniques to numerically solve the two-dimensional heat equation. The resulting method is implicit and can be parallelized with a strategy that allows parallelization across both time and space. We compare the parallel implementation of the new method with a classical implicit method, namely the Crank–Nicolson method, where the parallelization is done across space only. We find the set of conditions for which each method is more advantageous than the other. Numerical experiments are carried out on the SGI Origin 2000.

9.

Parallel implementation of finite-element/newton method for solution of steady-state and transient nonlinear partial differential equations

M. Reza Mehrabi Robert A. Brown 《Journal of scientific computing》1995,10(1):93-137

Domain decomposition by nested dissection for concurrent factorization and storage (CFS) of asymmetric matrices is coupled with finite element and spectral element discretizations and with Newton's method to yield an algorithm for parallel solution of nonlinear initial- and boundary-value problems. The efficiency of the CFS algorithm implemented on a MIMD computer is demonstrated by analysis of the solution of the two-dimensional Poisson equation discretized using both finite and spectral elements. Computation rates and speedups for the LU-decomposition algorithm, which is the most time consuming portion of the solution algorithm, scale with the number of processors. The spectral element discretization with high-order interpolating polynomials yields especially high speedups because the ratio of communication to computation is lower than for low-order finite element discretizations. The robustness of the parallel implementation of the finite-element/Newton algorithm is demonstrated by solution of steady and transient natural convection in a two-dimensional cavity, a standard test problem for low Prandtl number convection. Time integration is performed using a fully implicit algorithm with a modified Newton's method for solution of nonlinear equations at each time step. The efficiency of the CFS version of the finite-element/Newton algorithm compares well with a spectral element algorithm implemented on a MIMD computer using iterative matrix methods. Submitted to J. Scientific Computing, August 25, 1994.

10.

Structural dynamics methods for concurrent processing computers (cited 3 times: 0 self-citations, 3 by others)

K. N. Chiang R. E. Fulton 《Computers & Structures》1990,36(6):1031-1037

In the area of crash impact, research is urgently required on the development and evaluation of parallel methods for crash dynamics analysis of complex nonlinear finite element and/or finite difference structural problems. An investigation of selected nonlinear dynamics algorithms appropriate for parallel computers is reported. Implicit methods such as those of the Newmark type which build on the Cholesky decomposition strategy and explicit methods such as the central difference time integration method are included. Both implicit and explicit dynamics algorithms are investigated on two significantly different parallel computers, the FLEX/32 shared memory multicomputer and the INTEL iPSC Hypercube local memory computer.

11.

A parallel FE–FV scheme to solve fluid flow in complex geologic media

Dim Coumou Stephan Matthäi Sebastian Geiger Thomas Driesner 《Computers & Geosciences》2008,34(12):1697-1707

12.

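A parallel pipelined Sn sweeping algorithm for solving neutron transport equations on unstructured grids (cited 11 times: 4 self-citations, 7 by others)

Mo Zeyao, Fu Lianxiang, Yang Shulin 《计算机学报》 (Chinese Journal of Computers) 2004,27(5):587-595

The discontinuous finite element discrete ordinates (Sn) method is widely used to solve high-dimensional, time-dependent neutron transport equations. It discretizes the geometric mesh space, the velocity phase space, and the neutron energy groups, so its computational cost is very high. For unstructured grids, this paper proposes a parallel pipelined Sn sweeping algorithm based on domain decomposition. By designing domain decomposition methods and queue insertion algorithms with different degrees of intrinsic parallelism and different communication surface-to-volume ratios, speedups of more than 72 and 78 are obtained for two different physical models on 92 and 256 CPUs of two parallel machines, respectively. A scalability analysis shows that the algorithm's performance depends strongly on the point-to-point communication latency of the parallel machine.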

13.

Numerical results for a parallel linearly-implicit Runge-Kutta method

Jürgen Bruder 《Computing》1997,59(2):139-151

For the parallelization of implicit Runge-Kutta methods for stiff ODE’s a parallel computation of the stages is obvious. In this paper we consider the parallelization of the stages of linearly-implicit Runge-Kutta methods. The construction and implementation of a parallel linearly-implicit Runge-Kutta method is described. The numerical results are compared with the code PSODE of van der Houwen/Sommeijer [6] and a straightforward parallelization of RADAU5 [5]. All methods are based on the 3-stage implicit Radau-IIA method.

14.

Parallel genetic simulated annealing: a massively parallel SIMD algorithm

Chen H. Flann N.S. Watson D.W. 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(2):126-136

Many significant engineering and scientific problems involve optimization of some criteria over a combinatorial configuration space. The two methods most often used to solve these problems effectively, simulated annealing (SA) and genetic algorithms (GA), do not easily lend themselves to massive parallel implementations. Simulated annealing is a naturally serial algorithm, while GA involves a selection process that requires global coordination. This paper introduces a new hybrid algorithm that inherits those aspects of GA that lend themselves to parallelization, and avoids serial bottlenecks of GA approaches by incorporating elements of SA to provide a completely parallel, easily scalable hybrid GA/SA method. This new method, called Genetic Simulated Annealing, does not require parallelization of any problem-specific portions of a serial implementation; existing serial implementations can be incorporated as is. Results of a study on two difficult combinatorial optimization problems, a 100 city traveling salesperson problem and a 24 word, 12 bit error correcting code design problem, performed on a 16 K PE MasPar MP-1, indicate advantages over previous parallel GA and SA approaches. One of the key results is that the performance of the algorithm scales up linearly with the increase of processing elements, a feature not demonstrated by any previous parallel GA or SA approaches, which enables the new algorithm to utilize massive parallel architecture with maximum effectiveness. Additionally, the algorithm does not require careful choice of control parameters, a significant advantage over SA and GA.

15.

Recursion based parallelization of exact dense linear algebra routines for Gaussian elimination

《Parallel Computing》2016

We present block algorithms and their implementation for the parallelization of sub-cubic Gaussian elimination on shared memory architectures. Contrary to the classical cubic algorithms in parallel numerical linear algebra, we focus here on recursive algorithms and coarse grain parallelization. Indeed, sub-cubic matrix arithmetic can only be achieved through recursive algorithms, making coarse grain block algorithms perform more efficiently than fine grain ones. This work is motivated by the design and implementation of dense linear algebra over a finite field, where fast matrix multiplication is used extensively and where costly modular reductions also advocate for coarse grain block decomposition. We incrementally build efficient kernels, for matrix multiplication first, then triangular system solving, on top of which a recursive PLUQ decomposition algorithm is built. We study the parallelization of these kernels using several algorithmic variants: either iterative or recursive and using different splitting strategies. Experiments show that recursive adaptive methods for matrix multiplication, hybrid recursive–iterative methods for triangular system solve and tile recursive versions of the PLUQ decomposition, together with various data mapping policies, provide the best performance on a 32-core NUMA architecture. Overall, we show that the overhead of modular reductions is more than compensated by the fast linear algebra algorithms and that exact dense linear algebra matches the performance of full rank reference numerical software even in the presence of rank deficiencies.

16.

A finite volume method parallelization for the simulation of free surface shallow water flows

A.I. Delis E.N. Mathioudakis 《Mathematics and computers in simulation》2009

We construct a parallel algorithm, suitable for distributed memory architectures, of an explicit shock-capturing finite volume method for solving the two-dimensional shallow water equations. The finite volume method is based on the very popular approximate Riemann solver of Roe and is extended to second order spatial accuracy by an appropriate TVD technique. The parallel code is applied to distributed memory architectures using domain decomposition techniques and we investigate its performance on a grid computer and on a Distributed Shared Memory supercomputer. The effectiveness of the parallel algorithm is considered for specific benchmark test cases. The performance of the realization measured in terms of execution time and speedup factors reveals the efficiency of the implementation.

17.

A hybrid message passing/shared memory parallelization of the adaptive integral method for multi-core clusters

Fangzhou Wei Ali E. Yilmaz 《Parallel Computing》2011,37(6-7):279-301

A hybrid message passing and shared memory parallelization technique is presented for improving the scalability of the adaptive integral method (AIM), an FFT based algorithm, on clusters of identical multi-core processors. The proposed hybrid MPI/OpenMP parallelization scheme is based on a nested one-dimensional (1-D) slab decomposition of the 3-D auxiliary regular grid and the associated AIM calculations: If there are M processors and T cores per processor, the scheme (i) divides the regular grid into M slabs and MT sub-slabs, (ii) assigns each slab/sub-slab and the associated operations to one of the processors/cores, and (iii) uses MPI for inter-processor data communication and OpenMP for intra-processor data exchange. The MPI/OpenMP parallel AIM is used to accelerate the solution of the combined-field integral equation pertinent to the analysis of time-harmonic electromagnetic scattering from perfectly conducting surfaces. The scalability of the scheme is investigated theoretically and verified on a state-of-the-art multi-core cluster for benchmark scattering problems. Timing and speedup results on up to 1024 quad-core processors show that the hybrid MPI/OpenMP parallelization of AIM exhibits better strong scalability (fixed problem size speedup) than pure MPI parallelization of it when multiple cores are used on each processor.
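The nested slab bookkeeping in step (i) can be sketched as plain index arithmetic. This is only an illustration under assumptions: the grid planes along the decomposed axis are numbered 0..nz-1, and any remainder goes to the lowest-ranked slabs. The function names and this balancing rule are hypothetical, not taken from the paper.

```python
def split_range(n, parts):
    """Split range(n) into `parts` contiguous half-open intervals of near-equal size."""
    base, extra = divmod(n, parts)
    ranges = []
    start = 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # first `extra` parts get one more
        ranges.append((start, start + size))
        start += size
    return ranges

def nested_slabs(nz, num_procs, cores_per_proc):
    """Return [(slab, [sub_slab, ...]), ...]: M slabs, each cut into T sub-slabs."""
    layout = []
    for lo, hi in split_range(nz, num_procs):
        subs = [(lo + a, lo + b) for a, b in split_range(hi - lo, cores_per_proc)]
        layout.append(((lo, hi), subs))
    return layout
```

For example, `nested_slabs(16, 2, 4)` gives two slabs of eight planes, each split into four sub-slabs of two planes, for M·T = 8 sub-slabs in total.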

18.

Pathfinder: A parallel search algorithm for concerted atomistic events

Aiichiro Nakano 《Computer Physics Communications》2007,176(4):292-299

An algorithm has been designed to search for the escape paths with the lowest activation barriers when starting from a local minimum-energy configuration of a many-atom system. The pathfinder algorithm combines: (1) a steered eigenvector-following method that guides a constrained escape from the convex region and subsequently climbs to a transition state tangentially to the eigenvector corresponding to the lowest negative Hessian eigenvalue; (2) discrete abstraction of the atomic configuration to systematically enumerate concerted events as linear combinations of atomistic events; (3) evolutionary control of the population dynamics of low activation-barrier events; and (4) hybrid task + spatial decompositions to implement massive search for complex events on parallel computers. The program exhibits good scalability on parallel computers and has been used to study concerted bond-breaking events in the fracture of alumina.

19.

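An overlapping domain decomposition algorithm for elliptic boundary value problems on infinite domains with concave angles

Yang Min, Du Qikui 《数值计算与计算机应用》 (Journal on Numerical Methods and Computer Applications) 2004,25(2):90-99

§1. Introduction. Many problems in scientific and engineering computing reduce to boundary value problems for partial differential equations on unbounded domains; the numerical solution of unbounded…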

20.

Efficient parallelization of a parabolized flow solver

A. Povitsky 《Computers & Fluids》1998,27(8):985-1000

This article describes application of our theory of parallelization of implicit ADI schemes to parabolized flows. A parallel multi-domain version of a turbulent developing flow in a straight duct (case A) and viscous flow in a curved duct (case B) are presented. Semi-implicit and explicit methods for the determination of boundary values for the auxiliary ADI functions on the interfaces between the sub-domains are utilized. Numerical runs show that the proposed algorithm is valid in the regions with rapidly varying fields of governing variables (near-entrance region for the case A, region 30°<θ<60° for the case B) as well as in the regions with slow axial modification of the flowfield. The algorithm is suitable for small transverse velocity (case A) and for transverse velocity of order of streamwise velocity (case B). A simplified version of our theoretical model of parallel efficiency is developed and utilized for optimal multidomain partitioning. Computer runs of the multi-domain code are done on a Meiko CS and on a DEC Alpha farm with PVM communication software. The predictions of parallel efficiency obtained by the model compare well with those of actual computer runs. The parallelization parameters obtained are quite different for two considered MIMD machines. This fact confirms the importance of a priori estimation of parallelization efficiency of an algorithm and correct choice of a parallel computer.