Similar literature
 20 similar articles found
1.
This paper presents parallel computational strategies for implementing an explicit nonlinear finite element analysis code on distributed memory parallel computers to solve large-scale problems in structural dynamics. Implementation on both homogeneous and heterogeneous parallel processing environments is considered in detail. Implementation of an explicit nonlinear finite element dynamic analysis code on homogeneous systems is discussed first, and the code is later moved onto heterogeneous systems. Domain decomposition with explicit message passing is preferred for parallel implementation. The message passing implementation in the parallel algorithm is based on MPI (Message Passing Interface) libraries. Implementation aspects of overlapped and non-overlapped domain decomposition techniques, Dynamic Task Allocation (DTA), and clustering techniques for DTA, together with their relative merits, are presented. The interprocessor communications are optimised by overlapping them with computations to improve the performance of the domain decomposition based explicit dynamic analysis finite element code. The issues related to implementing a finite element code for nonlinear dynamic analysis on a heterogeneous parallel computing environment are then presented. A new dynamic load-balancing algorithm is developed for this purpose and is integrated with the domain decomposition based parallel explicit finite element code to test the algorithms on a coarse-grain heterogeneous cluster of workstations. Numerical experiments have been carried out on PARAM-10000, an Indian parallel computer, and also on a cluster of Unix workstations.
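
The abstract does not describe its load-balancing algorithm, so the following is only a minimal Python sketch of one common ingredient of load balancing on a heterogeneous cluster: redistributing elements in proportion to each worker's measured speed. The function name `rebalance` and the speed values are hypothetical and are not taken from the paper.

```python
def rebalance(num_elements, speeds):
    """Split num_elements among workers in proportion to their measured speeds.

    Largest-remainder rounding keeps the shares summing exactly to num_elements.
    This is an illustrative assumption, not the algorithm of the paper.
    """
    total = sum(speeds)
    ideal = [num_elements * s / total for s in speeds]
    shares = [int(x) for x in ideal]          # round every share down first
    leftover = num_elements - sum(shares)
    # Give the remaining elements to the workers with the largest fractional parts.
    order = sorted(range(len(speeds)), key=lambda i: ideal[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


# Hypothetical cluster: the middle workstation is twice as fast as the others.
shares = rebalance(1000, [1.0, 2.0, 1.0])
even = rebalance(10, [1.0, 1.0, 1.0])
```

In a time-stepping code, such a routine would be called again whenever the measured per-step times drift apart, which is what makes the balancing dynamic rather than static.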

2.
Concepts and implementation of parallel finite element analysis   Total citations: 1, self-citations: 0, citations by others: 1
The design of complex engineering systems such as advanced aircraft structures and offshore platforms requires continually increasing levels of detail in supporting analysis. The finite element method is widely used as a computational method with which to model physical systems in various engineering problems. For detailed analyses of complex designs, structural models composed of several thousands of degrees of freedom are no longer uncommon. Such design activities require large-order finite element and/or finite difference models and place heavy demands on both calculation speed and information management. The computer simulation of the nonlinear dynamic response of structures and the implementation of parallel FEM systems on a high speed multiprocessor have received considerable attention in recent years. The driving forces of these activities included the reliable simulation of automotive and aircraft crash phenomena, and the increased performance of computers. Most existing major structural analysis software systems were designed 10–20 years ago and have been optimized for current sequential computers. Such systems often are not well structured to take maximum advantage of the recent and continuing revolution in parallel vector computing capabilities. These parallel vector computer architectures not only occur in the form of large supercomputers, but are now also appearing in minicomputers and even engineering workstations. To benefit from advances in parallel computers, software must be developed which takes maximum advantage of parallel processing features.

3.
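Research on parallel algorithms for very-large-scale systems of nonlinear equations on the Dawning (曙光) parallel computer   Total citations: 8, self-citations: 0, citations by others: 8
This paper discusses the parallel performance of a class of algorithms for solving large-scale systems of nonlinear equations and their implementation on the Dawning parallel computer. Unlike traditional algorithms, the method uses a block-diagonal matrix as the iteration matrix, and this matrix can be computed conveniently from a recurrence that involves only vector inner products and matrix-vector products. After describing the algorithm, the paper analyzes its parallel speedup and storage requirements and discusses its implementation in a message-passing MPI environment. Numerical computations agree with the theoretical analysis and indicate that, in a distributed parallel environment, the algorithm has good parallel performance and low storage requirements, making it suitable for large-scale high-performance computing in science and engineering.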

4.
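Parallel computation for convection-dominated diffusion problems   Total citations: 1, self-citations: 0, citations by others: 1
1. Introduction. Describing certain physical phenomena of fluid motion, and studying problems such as heat conduction and particle diffusion, leads to solving the convection-diffusion equation. When this equation is solved by finite differences, explicit methods give simple schemes but are only conditionally stable, so the time step must be very small; implicit methods are unconditionally stable but require solving a system of algebraic equations, which is harder. D. J. Evans and A. R. Ahmad proposed in [2] an alternating-direction explicit method for steady elliptic equations and carried out numerical experiments on the Laplace equation. This paper extends that method to time-dependent problems and makes it applicable to convection-dominated diffusion problems. Based on the second-order upwind scheme [1], this…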
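
The conditional stability of explicit schemes mentioned above can be illustrated with a minimal Python sketch. It uses a first-order upwind difference for convection and a central difference for diffusion on a periodic grid, which is a simpler scheme than the second-order upwind and alternating-direction methods named in the abstract. The function name and all parameter values are assumptions chosen for illustration.

```python
def explicit_upwind_step(u, a, nu, dx, dt):
    """One explicit step of u_t + a*u_x = nu*u_xx (a >= 0) on a periodic grid.

    First-order upwind convection plus central diffusion. The update is a
    convex combination of neighbouring values when c + 2*d <= 1, which is
    the time-step restriction enforced below.
    """
    c = a * dt / dx          # Courant number
    d = nu * dt / dx ** 2    # diffusion number
    if c + 2.0 * d > 1.0:
        raise ValueError("time step too large: need a*dt/dx + 2*nu*dt/dx**2 <= 1")
    n = len(u)
    # u[i - 1] wraps around for i = 0, which gives the periodic boundary.
    return [
        u[i] - c * (u[i] - u[i - 1]) + d * (u[(i + 1) % n] - 2.0 * u[i] + u[i - 1])
        for i in range(n)
    ]


# Convection-dominated test: a square pulse carried to the right.
u0 = [1.0 if 20 <= i < 40 else 0.0 for i in range(100)]
u = u0
for _ in range(100):
    u = explicit_upwind_step(u, a=1.0, nu=0.001, dx=0.01, dt=0.005)
```

With these values c = 0.5 and d = 0.05, so the restriction holds and the solution stays between 0 and 1; quadrupling `dt` violates it and the function refuses the step.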

5.
A new class of algorithms for transient finite element analysis which is amenable to an efficient implementation on parallel computers is proposed. The suitability of the method for parallel computation stems from the fact that, given an arbitrary partition of the finite element mesh, each subdomain in the partition can be processed over a time step independently and simultaneously with the rest. Both element-by-element and coarse partitions of the mesh are discussed. For the former, the proposed algorithms are shown to have the structure of an explicit scheme. In particular, no global equation solving effort is involved in the update procedure. However, in contrast to explicit schemes, the proposed algorithms are shown to be unconditionally stable over a certain range of the algorithmic parameters. In structural dynamics problems, good accuracy is obtained with a constant time step integration. For heat conduction problems, accuracy limitations suggest the use of a step-changing technique. When this is done, numerical tests indicate the good behavior of the method. The case in which the mesh is partitioned into a small number of subdomains, typically as many as there are processors in the computer, is also explored in detail. Good accuracy is obtained over a wide range of time steps. Finally, extensions to second- and higher-order accuracy methods are discussed.

6.
The multidimensional binary search tree (abbreviated k-d tree) is a popular data structure for the organization and manipulation of spatial data. The data structure is useful in several applications including graph partitioning, hierarchical applications such as molecular dynamics and n-body simulations, and databases. In this paper, we study efficient parallel construction of k-d trees on coarse-grained distributed memory parallel computers. We consider several algorithms for parallel k-d tree construction and analyze them theoretically and experimentally, with a view towards identifying the algorithms that are practically efficient. We have carried out detailed implementations of all the algorithms discussed on the CM-5 and report on experimental results.
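
For readers unfamiliar with the data structure, the following is a minimal serial Python sketch of k-d tree construction by median splitting. It is not one of the parallel algorithms studied in the paper; the function name `build_kdtree` and the dictionary layout are assumptions made for illustration.

```python
def build_kdtree(points, depth=0):
    """Recursively build a k-d tree as nested dicts by median splitting.

    The splitting coordinate cycles with the depth: x, y, ..., then x again.
    Re-sorting at every level keeps the sketch short but is not the most
    efficient construction.
    """
    if not points:
        return None
    axis = depth % len(points[0])            # coordinate used at this level
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                      # index of the median point
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kdtree(pts[:mid], depth + 1),
        "right": build_kdtree(pts[mid + 1:], depth + 1),
    }


tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
```

The two recursive calls work on disjoint point sets, which is the independence that a parallel construction can exploit once the tree has as many subtrees as processors.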

7.
A comprehensive survey of direct time-integration methods and computational solution procedures for easier computer implementation is given in four parts for dynamic analysis of linear and nonlinear structures.

Part I is exclusively devoted to explicit methods. Popular second order central difference methods (formulation, step-by-step solution procedures, recent developments, computational and stability aspects) are described in detail. Other explicit methods, viz. Runge-Kutta methods, stiffly stable methods, Predictor-Corrector methods and Taylor series schemes are also presented. Techniques for stabilizing numerical computations are given.

In Part II, conventional implicit methods, viz. the Newmark, Wilson-θ and Houbolt methods and their step-by-step solution procedures are given with reference to solution of linear and nonlinear structural dynamics problems. Also presented are Trujillo's modified Newmark-beta method and implicit formulae via weighted residual approach. Computational and stability aspects, desirable characteristics of an ideal solution procedure and salient features of conventional implicit algorithms are discussed.

Part III reviews further developments in implicit methods. In Part IV, mixed implicit-explicit finite element methods and operator-splitting methods are described.

The numerical solution methods surveyed here should be of much use to practicing computational, finite element, and structural engineers working in the area of dynamics of structures.
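
As a small illustration of the explicit central difference method covered in Part I, here is a Python sketch for an undamped single-degree-of-freedom oscillator. The function name and test values are assumptions; the sketch only demonstrates the standard stability limit, dt <= 2/omega, for this method.

```python
import math


def central_difference(omega, u0, v0, dt, steps):
    """Integrate u'' + omega**2 * u = 0 with the explicit central difference method."""
    a0 = -omega ** 2 * u0
    u_prev = u0 - dt * v0 + 0.5 * dt ** 2 * a0   # fictitious displacement at t = -dt
    u = u0
    history = [u0]
    for _ in range(steps):
        accel = -omega ** 2 * u                  # acceleration from the equation of motion
        u_next = 2.0 * u - u_prev + dt ** 2 * accel
        u_prev, u = u, u_next
        history.append(u)
    return history


# Small step: the result tracks the exact solution cos(omega * t).
stable = central_difference(omega=1.0, u0=1.0, v0=0.0, dt=0.01, steps=628)
exact = math.cos(628 * 0.01)

# Step above the limit dt = 2 / omega: the amplitude grows without bound.
unstable = central_difference(omega=1.0, u0=1.0, v0=0.0, dt=2.1, steps=50)
```

For a multi-degree-of-freedom model the same limit applies with omega taken as the highest natural frequency of the mesh, which is why explicit codes need small time steps.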


8.
Numerical solution of nonlinear equilibrium problems of structures by means of Newton-Raphson type iterations is reviewed. Each step of the iteration is shown to correspond to the solution of a linear problem; therefore, the feasibility of the finite element method for nonlinear analysis is established. Organization and flow of data for various types of digital computers, such as single-processor/single-level memory, single-processor/two-level-memory, vector-processor/two-level-memory, and parallel-processors, with and without sub-structuring (i.e. partitioning) are given. The effect of the relative costs of computation, memory and data transfer on substructuring is shown. The idea of assigning comparable size substructures to parallel processors is exploited. Under Cholesky type factorization schemes, the efficiency of parallel processing is shown to decrease due to the occasional shared data, just as that due to the shared facilities.
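
The statement that each Newton-Raphson step amounts to a linear problem can be seen in a one-degree-of-freedom Python sketch. The hardening spring law k*u + b*u**3 and all numerical values are assumptions chosen for illustration, not taken from the paper.

```python
def newton_raphson(f_ext, k=1.0, b=0.5, tol=1e-10, max_iter=50):
    """Solve the equilibrium equation k*u + b*u**3 = f_ext for u.

    Returns the displacement and the number of iterations used.
    """
    u = 0.0
    for iteration in range(max_iter):
        residual = f_ext - (k * u + b * u ** 3)   # out-of-balance force
        if abs(residual) < tol:
            return u, iteration
        k_tangent = k + 3.0 * b * u ** 2          # tangent stiffness at the current state
        u += residual / k_tangent                 # the linear solve of this iteration
    raise RuntimeError("Newton-Raphson did not converge")


# With k = 1 and b = 0.5, the load 1.5 is balanced at u = 1.
u_eq, iterations = newton_raphson(1.5)
```

In a finite element code the scalar division becomes the solution of a linear system with the tangent stiffness matrix, which is the step that substructuring and parallel factorization aim to speed up.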

9.
Parallel computers are having a profound impact on computational science. Recently highly parallel machines have taken the lead as the fastest supercomputers, a trend that is likely to accelerate in the future. We describe some of these new computers, and issues involved in using them. We present elliptic PDE solutions currently running at 3.8 gigaflops, and an atmospheric dynamics model running at 1.7 gigaflops, on a 65 536-processor computer.

One intrinsic disadvantage of a parallel machine is the need to perform inter-processor communication. It is important to ensure that such communication time is maintained at a small fraction of computation time. We analyze standard multigrid algorithms in two and three dimensions from this point of view, indicating that performance efficiencies in excess of 95% are attainable under suitable conditions on moderately parallel machines. We also demonstrate that such performance is not attainable for multigrid on massively parallel computers, as indicated by an example of poor multigrid efficiency on 65 536 processors. The fundamental difficulty is the inability to keep 65 536 processors busy when operating on very coarse grids.

Most algorithms used for implementing applications on parallel machines have been derived directly from algorithms designed for serial machines. The previously mentioned multigrid example indicates that such ‘parallelized’ algorithms may not always be optimal. Parallel machines open the possibility of finding totally new approaches to solving standard tasks: intrinsically parallel algorithms. In particular, we present a class of superconvergent multiple scale methods that were motivated directly by massively parallel machines. These methods differ from standard multigrid methods in an intrinsic way, and allow all processors to be used at all times, even when processing on the coarsest grid levels. Their serial versions are not sensible algorithms. The idea that parallel hardware (the Connection Machine in this case) can lead to the discovery of new mathematical algorithms was surprising to us.


10.
A collection of results is presented regarding the consistency, stability and accuracy of operator split methods and product formula algorithms for general nonlinear equations of evolution. These results are then applied to the structural dynamics problem. The basic idea is to exploit an element-by-element additive decomposition of a particular form of the discrete dynamic equations resulting from a finite element discretization. It is shown that such a particular form of the discrete dynamic equations is obtained when velocity and stress are taken as unknowns. By applying the general product formula technique to the element-by-element decomposition, unconditionally stable algorithms are obtained that involve only element coefficient matrices. The storage requirements and operation counts are comparable to those of explicit methods. The method places no restriction on the topology of the finite element mesh.

11.
Computational Fluid Dynamics (CFD) methods for solving traffic flow continuum models have been studied and efficiently implemented in traffic simulation codes in the past. This is the first time that such methods are studied from the point of view of parallel computing. We studied and implemented an implicit numerical method for solving the high-order flow conservation traffic model on parallel computers. Implicit methods allow a much larger time step than explicit methods for the same accuracy. However, at each time step a nonlinear system must be solved. We used the Newton method coupled with a linear iterative method (Orthomin). We accelerated the convergence of Orthomin with parallel incomplete LU factorization preconditionings. We ran simulation tests with real traffic data from a 12-mile freeway section (in Minnesota) on the nCUBE2 parallel computer. These tests gave the same accuracy as past tests, which were performed on one-processor computers, and the overall execution time was significantly reduced.

12.
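The N-body problem is one of the fundamental problems of mechanics; it studies the motion of N mutually interacting point masses. Taking two widely used parallel N-body codes, the molecular dynamics simulation package LAMMPS and the astrophysical N-body simulation code Gadget-2, this paper analyzes their basic algorithms and implementations and discusses the basic approach to porting these two representative parallel codes to accelerators such as GPUs.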

13.
This paper develops algorithms for filtering and smoothing for parallel computers. Numerical results are presented and implementation details are discussed. An example illustrates that parallel methods have better convergence properties than nonparallel methods for nonlinear problems.

14.
In this paper we benchmark the performance of the Cray T3D, IBM 9076 SP/1 and Intel Paragon XP/S parallel computers, using implementations of parallel algorithms for the computation of the vector outer-product operation A = uv^T. The vector outer-product operation, although very simple in nature, requires the computation of a large number of floating-point operations, and its parallelization induces a great level of communication between the processors. It is thus suited to measure the relative speed of the processor, memory subsystem and network capabilities of a parallel computer. It should not be considered a ‘toy problem’, since it arises in numerical methods in the context of the solution of systems of non-linear equations – still a difficult problem to solve. We present algorithms for both the explicit shared-memory and message-passing programming models together with theoretical computation models for those algorithms. Actual experiments were run on those computers, using Fortran 77 implementations of the algorithms. The results obtained with these experiments show that, due to the high degree of communication between the processors, one needs a parallel computer with fast communications and carefully implemented data exchange routines. The theoretical computation model allows prediction of the speed-up to be obtained for some problem size on a given number of processors. © 1997 John Wiley & Sons, Ltd.
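
The data dependence behind the communication cost can be sketched in a few lines of Python. The row-block distribution below is an assumed layout for illustration, not necessarily the one benchmarked in the paper, and the worker loop is executed serially here; `row_blocks` and `local_outer` are hypothetical helper names.

```python
def row_blocks(n, p):
    """Split range(n) into p contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n, p)
    blocks, start = [], 0
    for rank in range(p):
        size = base + (1 if rank < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def local_outer(u, v, rows):
    """Rows of A = u v^T owned by one worker: needs u[i] for its rows and all of v."""
    return [[u[i] * x for x in v] for i in rows]


u = [1.0, 2.0, 3.0, 4.0, 5.0]
v = [10.0, 20.0, 30.0]
# Each list entry stands for the work of one of two workers.
pieces = [local_outer(u, v, rows) for rows in row_blocks(len(u), 2)]
A = [row for piece in pieces for row in piece]
```

Every worker needs the whole of `v` but only its own slice of `u`, so in a message-passing version `v` must be broadcast (or already replicated) before any local work can start.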

15.
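Because finite element analysis of large-scale nonlinear structural dynamics problems is very time-consuming, several parallel explicit finite element algorithms based on parallel solution strategies are proposed for a cluster environment using the Message Passing Interface (MPI). Building on domain decomposition with explicit message passing, the work adopts overlapped and non-overlapped domain decomposition techniques and dynamic task allocation, and optimizes interprocessor communication by overlapping it with computation. It studies a domain decomposition parallel algorithm with non-overlapped communication, a domain decomposition parallel algorithm with overlapped communication, a clustered dynamic task allocation algorithm, a dynamic task allocation algorithm, and a dynamic load-balancing algorithm. To carry out nonlinear dynamic finite element analysis in a cluster environment, parallel explicit finite element algorithms based on efficient parallel solution strategies were developed. A parallel finite element program was written in the message-passing programming model, numerical examples were run on a cluster of workstations, the performance of the algorithms was analyzed, and the results were compared with the conventional Newmark algorithm. The examples show that the clustered dynamic task allocation algorithm performs better than the dynamic task allocation algorithm but worse than the domain decomposition algorithms, and that the dynamic load-balancing algorithm performs best. For problems of the same size, the proposed algorithms are faster than the Newmark algorithm. The proposed parallel algorithms are feasible and effective for finite element analysis of nonlinear structural dynamics problems.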

16.
Spectral elements combine the accuracy and exponential convergence of conventional spectral methods with the geometric flexibility of finite elements. Additionally, there are several apparent computational advantages to using spectral element methods on microprocessors. In particular, the computations are naturally cache-blocked and derivatives may be computed using nearest neighbor communications. Thus, an explicit spectral element atmospheric model has demonstrated close to linear scaling on a variety of distributed memory computers including the IBM SP and Linux Clusters. Explicit formulations of PDEs arising in geophysical fluid dynamics, such as the primitive equations on the sphere, are time-step limited by the phase speed of gravity waves. Semi-implicit time integration schemes remove the stability restriction but require the solution of an elliptic BVP. By employing a weak formulation of the governing equations, it is possible to obtain a symmetric Helmholtz operator that permits the solution of the implicit problem using conjugate gradients. We find that a block-Jacobi preconditioned conjugate gradient solver accelerates the simulation rate of the semi-implicit relative to the explicit formulation for practical climate resolutions by about a factor of three.
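
To show the structure of a preconditioned conjugate gradient solver, here is a minimal Python sketch. It uses a point-Jacobi (diagonal) preconditioner in place of the block-Jacobi preconditioner of the abstract, and a small dense symmetric positive definite matrix as a stand-in for the Helmholtz operator; both substitutions are assumptions made to keep the example short.

```python
def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(x, y):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(x, y))


def jacobi_pcg(A, b, tol=1e-10, max_iter=200):
    """Solve A x = b (A symmetric positive definite) by diagonally preconditioned CG."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                    # residual for the start vector x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]     # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                 # stop on a small residual norm
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [6.0, 10.0, 8.0]          # equals A applied to [1, 2, 3]
x = jacobi_pcg(A, b)
```

Applying the preconditioner is a purely local scaling, so in a distributed code the only global operations per iteration are the two inner products and the matrix-vector product.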

17.
《Computers & Structures》2002,80(16-17):1337-1350
The paper discusses the parallelisation of complex three-dimensional software for nonlinear analysis of R/C building structures. It presents a comparative study for handling the nonlinear response in different parallel architectures. The nonlinear finite element model adopts a fiber decomposition approach for the cross-section of beam elements to capture the nonlinear behavior of concrete. The parallelisation strategy is designed with regard to three items: the numerical stability of the nonlinear procedure, the parallel sparse equation solver, and the application on heterogeneous hardware: dedicated shared memory machines or clusters of networked personal computers.

18.
The use of accelerators such as graphics processing units (GPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high-performance computers, machines with more than one type of floating-point processor, are now becoming more prevalent due to these advantages. In this work, we discuss several important issues in porting a large molecular dynamics code for use on parallel hybrid machines – (1) choosing a hybrid parallel decomposition that works on central processing units (CPUs) with distributed memory and accelerator cores with shared memory, (2) minimizing the amount of code that must be ported for efficient acceleration, (3) utilizing the available processing power from both multi-core CPUs and accelerators, and (4) choosing a programming model for acceleration. We present our solution to each of these issues for short-range force calculation in the molecular dynamics package LAMMPS; however, the methods can be applied in many molecular dynamics codes. Specifically, we describe algorithms for efficient short-range force calculation on hybrid high-performance machines. We describe an approach for dynamic load balancing of work between CPU and accelerator cores. We describe the Geryon library that allows a single code to compile with both CUDA and OpenCL for use on a variety of accelerators. Finally, we present results on a parallel test cluster containing 32 Fermi GPUs and 180 CPU cores.

19.
This paper describes the parallel implementation of a numerical model for the simulation of problems from fluid dynamics on distributed memory multiprocessors. The basic procedure is to apply a fully explicit upwind finite difference approximation on a staggered grid. A theoretical time complexity analysis shows that a perfect speedup is achieved asymptotically. Experimental results on the Intel Touchstone Delta System confirm the analytical performance model. © 1997 John Wiley & Sons, Ltd.
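
The claim of asymptotically perfect speedup can be made concrete with a simple cost model, sketched here in Python. The model is a hypothetical one (an n x n grid cut into p strips, with each strip exchanging two boundary rows per step) and the cost constants are made up; it is not the performance model of the paper.

```python
def strip_speedup(n, p, t_calc=1.0, t_word=10.0):
    """Modelled speedup for an n x n explicit stencil update split into p strips.

    t_calc is the assumed cost of updating one grid point and t_word the
    assumed cost of communicating one boundary value.
    """
    serial = n * n * t_calc
    # Each strip updates n*n/p points and exchanges two boundary rows of n values.
    parallel = (n * n / p) * t_calc + 2 * n * t_word
    return serial / parallel


small = strip_speedup(100, 16)        # communication dominates
medium = strip_speedup(10_000, 16)
large = strip_speedup(1_000_000, 16)  # close to the ideal value of 16
```

Computation grows like n squared while the boundary exchange grows like n, so for a fixed number of processors the modelled speedup approaches p as the grid is refined.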

20.
This paper addresses the theoretical development and numerical implementation of energy consistent algorithms for dynamic elastoplasticity, emphasizing finite strain constitutive formulations so that unconditional stability of the algorithms is assured even in the fully nonlinear regime. The key concept behind energy consistency is the requirement that the discretized system obey an a priori stability estimate, which may be derived in general using the first and second laws of thermodynamics. This approach to computational dynamic plasticity differs from typical application of traditional algorithms (such as Newmark or Hilber–Hughes–Taylor-α methods), where local time integration schemes for plasticity laws are developed somewhat independently from the global time integration scheme for the equations of motion, without explicit consideration of thermodynamical restrictions. Two algorithms, based on additive and multiplicative finite deformation plasticity models respectively, are formulated within the energy consistent framework. Both algorithms possess the desirable feature of nonlinear stability of previous energy–momentum algorithms for elastodynamics.

