Similar articles
 Found 20 similar articles (search time: 328 ms)
1.
Much recent attention has been paid to high-performance computing and to the development of parallel computational strategies and numerical algorithms for large-scale problems. In the present study, a finite element procedure for the dynamic analysis of anisotropic viscoelastic composite shell structures using degenerated 3-D elements is studied on vector, coarse-grained, and massively parallel machines. CRAY hardware performance monitors such as the Flowtrace and Perftrace tools are used to obtain performance data for subroutine program modules and specified code segments. The performance of the conjugate gradient method, the Cray sparse matrix solver, and the Feable solver is evaluated. SIMD and MIMD parallel implementations of the finite element algorithm for dynamic simulation of viscoelastic composite structures on the CM-5 are also presented. The performance studies were conducted to evaluate the efficiency of the numerical algorithm on this architecture versus vector-processing CRAY systems. Parametric studies on the CM-5 and the CRAY system, along with benchmarks for various problem sizes, are shown. The second study evaluates how effectively the finite element procedures for viscoelastic composite structures can be solved in the Single Instruction Multiple Data (SIMD) parallel environment. CM-FORTRAN is used, and a conjugate gradient method is employed to solve the systems of equations. In the third study, we propose implementing the finite element algorithm in a scalable distributed parallel environment using a generic message-passing library such as PVM. The code is portable to a range of current and future parallel machines. We also introduce a domain decomposition scheme to reduce the communication time. The parallel scalability of the dynamic viscoelastic finite element algorithm in data-parallel and scalable distributed parallel environments is also discussed. © 1997 by John Wiley & Sons, Ltd.

2.
This paper presents a scalable parallel variational time integration algorithm for nonlinear elastodynamics with the distinguishing feature of allowing each element in the mesh to have a possibly different time step. Furthermore, the algorithm is obtained from a discrete variational principle, and hence it is termed parallel asynchronous variational integrator (PAVI). The underlying variational structure grants it outstanding conservation properties. Based on a domain decomposition strategy, PAVI combines a careful scheduling of computations with fully asynchronous communications to provide a very efficient methodology for finite element models with even mild distributions of time step sizes. Numerical tests are shown to illustrate PAVI's performance on both slow and fast networks, showing scalability properties similar to the best parallel explicit synchronous algorithms, with lower execution time. Finally, a numerical example in which PAVI needs ≈100 times less computing than an explicit synchronous algorithm is shown. Copyright © 2006 John Wiley & Sons, Ltd.
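The abstract above does not say how element updates are ordered when every element has its own time step. A minimal serial sketch, assuming a priority queue keyed on each element's next update time (a common choice for asynchronous integrators, not confirmed as PAVI's scheduler), could look like this. The function name `avi_schedule` and its arguments are hypothetical.

```python
import heapq

def avi_schedule(time_steps, t_end):
    """Return (time, element) updates in chronological order.

    time_steps[e] is the (assumed constant) time step of element e.
    The priority queue is an illustrative assumption, not PAVI's
    documented scheduler.
    """
    heap = [(dt, e) for e, dt in enumerate(time_steps)]
    heapq.heapify(heap)
    order = []
    while heap and heap[0][0] <= t_end:
        t, e = heapq.heappop(heap)
        # An element update (force evaluation, nodal kick) would go here.
        order.append((t, e))
        heapq.heappush(heap, (t + time_steps[e], e))
    return order
```

Elements with small time steps are visited often and the rest only when due, which is where the savings over a single global time step would come from.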

3.
To efficiently execute a finite element program on a hypercube, we need to map the nodes of the corresponding finite element graph to the processors of the hypercube such that each processor has approximately the same computational load and the communication among processors is minimized. If the number of nodes of a finite element graph does not increase during the execution of a program, the mapping only needs to be performed once. However, if a finite element graph is solution‐adaptive, that is, the number of nodes increases discretely due to the refinement of some finite elements during the execution of a program, a run‐time load balancing algorithm has to be performed many times in order to balance the computational load of the processors while keeping the communication cost as low as possible. In this paper, we propose a parallel iterative load balancing algorithm (ILB) to deal with the load imbalance problem of a solution‐adaptive finite element program. The proposed algorithm has three properties. First, the algorithm is simple and easy to implement. Second, the execution of the algorithm is fast. Third, it guarantees that the computational load will be balanced after the execution of the algorithm. We have implemented the proposed algorithm along with two parallel mapping algorithms, parallel orthogonal recursive bisection (ORB) [19] and parallel recursive mincut bipartitioning (MC) [8], on a 16‐node NCUBE‐2. Three criteria are used for performance evaluation: the execution time of the load balancing algorithms, the computation time of an application program under different load balancing algorithms, and the total execution time of an application program (under several refinement phases). Experimental results show that (1) the execution time of ILB is very short compared to those of MC and ORB; (2) the mappings produced by ILB are better than those of ORB and MC; and (3) the speedups produced by ILB are better than those of ORB and MC.
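Of the two baseline mappings named above, orthogonal recursive bisection is the easier to illustrate. The following is a minimal serial sketch of the general technique, not the parallel implementation evaluated in the paper and not the proposed ILB algorithm. The function name `orb` and its arguments are hypothetical, and it assumes at least `2**levels` nodes.

```python
def orb(points, ids, levels):
    """Split node ids into 2**levels groups of near-equal size.

    points[i] is the coordinate tuple of node i. Each level cuts the
    current group at the median of the axis with the largest extent.
    """
    if levels == 0:
        return [ids]
    dims = range(len(points[ids[0]]))
    axis = max(
        dims,
        key=lambda d: max(points[i][d] for i in ids) - min(points[i][d] for i in ids),
    )
    ordered = sorted(ids, key=lambda i: points[i][axis])
    half = len(ordered) // 2
    return orb(points, ordered[:half], levels - 1) + orb(points, ordered[half:], levels - 1)
```

For a 16-processor machine such as the one mentioned above, `levels=4` would produce one group per processor. Node counts are balanced, but communication cost is only controlled indirectly through geometric compactness.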

4.
The FETI algorithms are a family of numerically scalable substructuring methods with Lagrange multipliers that have been designed for the iterative solution of large-scale systems of equations arising from the finite element discretization of structural engineering, solid mechanics, and structural dynamics problems. In this paper, we present a unified framework that simplifies the interpretation of several of the previously presented FETI concepts. This framework has enabled the improvement of the robustness and performance of the transient FETI method, and the design of a new family of coarse operators for iterative substructuring algorithms with Lagrange multipliers. We report on both of these new developments, discuss their impact on the iterative solution of large-scale finite element systems of equations by the FETI method, and illustrate them with a few static and dynamic structural analyses on an IBM SP2 parallel processor. © 1998 John Wiley & Sons, Ltd.

5.
A parallel implementation of the contact algorithm discussed in Part I of this paper has been developed for a non-linear dynamic explicit finite element program to analyse the response of three-dimensional shell structures. The parallel contact algorithm takes advantage of the fact that in general only some parts of the structure will actually be in contact at any given time. Special interprocessor communication routines and a method which enables individual processors to dynamically build local contact domains during execution have been developed. The performance of the parallel contact algorithm has been studied by executing the program on 128 processors of a distributed-memory multiprocessor computer.

6.
A framework for the construction of node-centred schemes to solve the compressible Euler and Navier–Stokes equations is presented. The metric quantities are derived by exploiting some properties of C⁰ finite element shape functions. The resulting algorithm allows the implementation of both artificial diffusion and one-dimensional upwind-type discretizations. The proposed methodology adopts a uniform data structure for diverse grid topologies (structured, unstructured and hybrid) and different element shapes, thus easing code development and maintenance. The final schemes are well suited to run on vector/parallel computer architectures. In the case of linear elements, the equivalence of the proposed method with a particular finite volume formulation is demonstrated.

7.
The storage requirements and performance consequences of a few different data-parallel implementations of the finite element method for domains discretized by three-dimensional brick elements are reviewed. Letting a processor represent a nodal point per unassembled finite element yields a concurrency that may be one to two orders of magnitude higher for common elements than if a processor represents an unassembled finite element. The former representation also allows for higher-order elements with a limited amount of storage per processor. A totally parallel stiffness matrix generation algorithm is presented. The equilibrium equations are solved by a conjugate gradient method with diagonal scaling. Results are reported from several simulations designed to show how the number of iterations to convergence depends on the Poisson ratio, the finite element discretization, and the element order. The domain was discretized by three-dimensional Lagrange elements in all cases. The number of iterations to convergence increases with the Poisson ratio. Increasing the number of elements in one spatial dimension increases the number of iterations to convergence linearly. Increasing the element order p in one spatial dimension increases the number of iterations to convergence as p^α, where α is 1.4–1.5 for the model problems.
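As a point of reference for the solver mentioned above, here is a minimal serial sketch of the conjugate gradient method with diagonal (Jacobi) scaling. It uses a small dense matrix stored as a list of lists; the data-parallel layout discussed in the abstract is not reproduced. The function name `pcg_diag` and its parameters are hypothetical.

```python
def pcg_diag(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.

    Conjugate gradients preconditioned by the inverse of diag(A).
    Returns the solution and the number of iterations used.
    """
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                  # residual b - A x for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]   # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0
    for k in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

The returned iteration count is the quantity whose growth with Poisson ratio, mesh size, and element order the abstract reports.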

8.
We present a new efficient and scalable domain decomposition method for the implicit solution of linear and non-linear time-dependent problems in computational mechanics. The method is derived by adding a coarse problem to the recently proposed transient FETI substructuring algorithm in order to propagate the error globally and accelerate convergence. It is proved that in the limit for large time steps, the new method converges toward the FETI algorithm for time-independent problems. Computational results confirm that the optimal convergence properties of the time-independent FETI method are preserved in the time-dependent case. We employ an iterative scheme for efficiently solving the coarse problem on massively parallel processors, and demonstrate the effective scalability of the new transient FETI method with the large-scale finite element dynamic analysis on the Paragon XP/S and IBM SP2 systems of several diffraction grating finite element structural models. We also show that this new domain decomposition method outperforms the popular direct skyline solver. The coarse problem presented herein is applicable and beneficial to a large class of Lagrange multiplier based substructuring algorithms for time-dependent problems, including the fictitious domain decomposition method.

9.
Recently, graphics processing units (GPUs) have been increasingly leveraged in a variety of scientific computing applications. However, architectural differences between CPUs and GPUs necessitate the development of algorithms that take advantage of GPU hardware. As sparse matrix vector (SPMV) multiplication operations are commonly used in finite element analysis, a new SPMV algorithm and several variations are developed for unstructured finite element meshes on GPUs. The effective bandwidth of current GPU algorithms and the newly proposed algorithms is measured and analyzed for 15 sparse matrices of varying sizes and varying sparsity structures. The effects of optimization and the differences between the new GPU algorithm and its variants are then studied. Lastly, both new and current SPMV GPU algorithms are utilized in the GPU CG solver in GPU finite element simulations of the heart. These results are then compared against parallel PETSc finite element implementation results. The effective bandwidth tests indicate that the new algorithms compare very favorably with current algorithms for a wide variety of sparse matrices and can yield very notable benefits. GPU finite element simulation results demonstrate the benefit of using GPUs for finite element analysis and also show that the proposed algorithms can yield speedup factors up to 12‐fold for real finite element applications. Copyright © 2015 John Wiley & Sons, Ltd.
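For readers unfamiliar with the operation being optimized, the sketch below shows a plain serial sparse matrix-vector product in compressed sparse row (CSR) storage. CSR is a standard format chosen here for illustration; the abstract does not describe the storage scheme of the proposed GPU algorithm, and the helper names are hypothetical.

```python
def csr_from_dense(rows):
    """Build CSR arrays (values, col_idx, row_ptr) from a dense matrix."""
    values, col_idx, row_ptr = [], [], [0]
    for row in rows:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A x with A given in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        # Nonzeros of row i occupy values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y
```

The indirect access `x[col_idx[k]]` is what makes the kernel bandwidth-bound, which is why the abstract reports effective bandwidth rather than floating-point throughput.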

10.
The dual‐primal finite element tearing and interconnecting (FETI‐DP) domain decomposition method (DDM) is extended to address the iterative solution of a class of indefinite problems of the form (K − σ²M)u = f, and a class of complex problems of the form (K − σ²M + iσD)u = f, where K, M, and D are three real symmetric matrices arising from the finite element discretization of solid and shell dynamic problems, i is the imaginary unit, and σ is a real positive number. A key component of this extension is a new coarse problem based on the free‐space solutions of Navier's equations of motion. These solutions are waves, and therefore the resulting DDM is reminiscent of the FETI‐H method. For this reason, it is named here the FETI‐DPH method. For a practically large σ range, FETI‐DPH is shown numerically to be scalable with respect to all of the problem size, substructure size, and number of substructures. The CPU performance of this iterative solver is illustrated on a 40‐processor computing system with the parallel solution, for various σ ranges, of several large‐scale, indefinite, or complex‐valued systems of equations associated with shifted eigenvalue and forced frequency response structural dynamics problems. Copyright © 2005 John Wiley & Sons, Ltd.

11.
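Research on automatic mesh domain partitioning for parallel finite element computation   Total citations: 6, self-citations: 0, citations by others: 6
For large-scale parallel finite element analysis on cluster systems, this paper designs a simple and practical automatic mesh partitioning algorithm that reduces the loss of efficiency caused by load imbalance during parallel computation, and applies it to partitioning the vehicle frame model in an "automobile crash simulation" project. The greedy algorithm and the ANP algorithm are discussed in detail and improved, and their respective characteristics and applicability are compared. Numerical examples show that satisfactory results are obtained for different types of finite element meshes, so the algorithm is widely applicable. With minor modifications, it can also meet the dynamic load balancing requirements of various dynamic parallel computing problems.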
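The greedy approach mentioned above is usually understood as growing each subdomain from a seed element through its neighbours until a target size is reached. The sketch below follows that general idea under stated assumptions: the seed rule (fewest unassigned neighbours) and the breadth-first growth are illustrative choices, not the improved algorithm of the paper. The function name `greedy_partition` is hypothetical, elements are assumed to be integer-labelled, and there must be at least as many elements as parts.

```python
from collections import deque

def greedy_partition(adjacency, n_parts):
    """Assign each element to one of n_parts subdomains of near-equal size.

    adjacency maps an element id to the ids of its neighbouring elements.
    """
    part = {}
    unassigned = set(adjacency)
    for p in range(n_parts):
        # Share the remaining elements evenly among the remaining parts.
        target = len(unassigned) // (n_parts - p)
        # Assumed seed rule: element with the fewest unassigned neighbours.
        seed = min(
            unassigned,
            key=lambda e: (sum(nb in unassigned for nb in adjacency[e]), e),
        )
        queue = deque([seed])
        size = 0
        while size < target:
            if not queue:
                # The growing front died out; restart from another element.
                queue.append(min(unassigned))
            e = queue.popleft()
            if e not in unassigned:
                continue
            unassigned.remove(e)
            part[e] = p
            size += 1
            queue.extend(nb for nb in adjacency[e] if nb in unassigned)
    return part
```

Element counts are balanced by construction. Interface size is only kept small heuristically, which is the trade-off such partitioners make in exchange for speed.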

12.
This paper discusses the implementation aspects of, and our experiences with, a data-parallel explicit self-starting finite element transient methodology, with emphasis on the Connection Machine (CM-5), for linear and non-linear computational structural dynamic applications involving structured and unstructured grids. The parallel implementation criteria that influence the efficiency of an algorithm include the amount of communication, communication routing, and load balancing. To provide simplicity and a high level of accuracy, and to retain the generality of the finite element implementation for both linear and non-linear transient explicit problems on a data-parallel computer that permits an optimum amount of communication, we implemented the present self-starting dynamic formulations (in comparison to the traditional approaches) based on nodal displacements, nodal velocities, and elemental stresses on the CM-5. The data-parallel language CMFortran is employed with virtual processor constructs and with :SERIAL and :PARALLEL layout directives for arrays. The communications in the present approach involve only one gather operation (extraction of element nodal displacements or velocities from the global displacement vector) and one scatter operation (dispersion of element forces onto the global force vector) for each time step. These gather and scatter operations are implemented using the Connection Machine Scientific Software Library communication primitives for both structured and unstructured finite element meshes. The implementation aspects of the present self-starting formulations for linear and elastoplastic applications on serial and data-parallel machines are discussed. Numerical test models for linear and non-linear one-dimensional applications and a two-dimensional unstructured finite element mesh are then illustrated, and their performance studies are discussed.
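The gather and scatter pattern described above can be shown in a few lines of serial code. This is only a sketch of the data movement, not the CMFortran implementation; `internal_forces` and `bar_force` are hypothetical names, and the two-node spring element is an assumption made to keep the example one-dimensional.

```python
def internal_forces(connectivity, u, element_force):
    """Assemble the global force vector with one gather and one scatter per element.

    connectivity lists the global node numbers of each element,
    u is the global displacement vector.
    """
    f = [0.0] * len(u)
    for nodes in connectivity:
        u_e = [u[n] for n in nodes]        # gather element displacements
        f_e = element_force(u_e)           # purely local element computation
        for n, fn in zip(nodes, f_e):      # scatter-add element forces
            f[n] += fn
    return f


def bar_force(u_e, k=1.0):
    """Assumed 2-node linear spring: f_e = k * [[1, -1], [-1, 1]] @ u_e."""
    d = u_e[1] - u_e[0]
    return [-k * d, k * d]
```

On a data-parallel machine the loop over elements disappears and the two indexed accesses become the only communication steps, which matches the one gather and one scatter per time step stated in the abstract.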

13.
A Finite Element Graph (FEG) is defined here as a nodal graph (G), a dual graph (G*), or a communication graph (G˙) associated with a generic finite element mesh. The Laplacian matrix (L(G), L(G*) or L(G˙)), used for the study of spectral properties of an FEG, is constructed from the usual vertex and edge connectivities of a graph. An automatic algorithm, based on spectral properties of an FEG (G, G* or G˙), is proposed to reorder the nodes and/or elements of the associated finite element mesh. The new algorithm is called Spectral FEG Resequencing (SFR). This algorithm uses global information in the graph, it does not depend on a pseudoperipheral vertex in the resequencing process, and it does not use any kind of level structure of the graph. Moreover, the SFR algorithm is of special advantage in computing environments with vector and parallel processing capabilities. Nodes or elements in the mesh can be reordered depending on the use of an adequate graph representation associated with the mesh. If G is used, then the nodes in the mesh are properly reordered for achieving profile and wavefront reduction of the finite element stiffness matrix. If either G* or G˙ is used, then the elements in the mesh are suitably reordered for a finite element frontal solver. A unified approach involving FEGs and finite element concepts is presented. Conclusions are inferred and possible extensions of this research are pointed out. In Part II of this work,1 the computational implementation of the SFR algorithm is described and several numerical examples are presented. The examples emphasize important theoretical, numerical and practical aspects of the new resequencing method.
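A common way to turn spectral information of a graph Laplacian into an ordering is to sort the vertices by their component in the eigenvector of the second-smallest eigenvalue (the Fiedler vector). The sketch below does this with a shifted power iteration in pure Python. It assumes that SFR is based on this eigenvector, which the abstract does not state explicitly, and the function name `fiedler_order` is hypothetical. The graph must be connected and have at least two vertices.

```python
def fiedler_order(n, edges, iters=2000):
    """Order vertices 0..n-1 by their Fiedler-vector component.

    Power iteration on (shift*I - L), with the constant eigenvector
    removed at every step, converges to the Fiedler vector of the
    Laplacian L of a connected graph.
    """
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    # Gershgorin bound: every eigenvalue of L is at most 2 * max degree.
    shift = 2.0 * max(len(nb) for nb in adj)
    v = [float(i) for i in range(n)]
    for _ in range(iters):
        mean = sum(v) / n
        v = [vi - mean for vi in v]        # deflate the constant vector
        # (shift*I - L) v, using (L v)_i = deg_i * v_i - sum of neighbour values
        w = [(shift - len(adj[i])) * v[i] + sum(v[j] for j in adj[i]) for i in range(n)]
        norm = sum(wi * wi for wi in w) ** 0.5
        v = [wi / norm for wi in w]
    return sorted(range(n), key=lambda i: v[i])
```

No starting vertex or level structure is needed, consistent with the properties listed in the abstract. A production code would use a Lanczos-type eigensolver instead of power iteration.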

14.
A unified framework of dual‐primal finite element tearing and interconnecting (FETI‐DP) algorithms is proposed for solving the system of linear equations arising from the mixed finite element approximation of incompressible Stokes equations. A distinctive feature of this framework is that it allows using both continuous and discontinuous pressures in the algorithm, whereas previous FETI‐DP methods only apply to discontinuous pressures. A preconditioned conjugate gradient method is used in the algorithm with either a lumped or a Dirichlet preconditioner, and scalable convergence rates are proved. This framework is also used to describe several previously developed FETI‐DP algorithms and greatly simplifies their analysis. Numerical experiments of solving a two‐dimensional incompressible Stokes problem demonstrate the performances of the discussed FETI‐DP algorithms represented under the same framework. Copyright © 2012 John Wiley & Sons, Ltd.

15.
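Zhang Heng, Zhang Wu. Chinese Journal of Engineering Mathematics (《工程数学学报》), 2007, 24(6): 1080-1090
Based on the divide-and-conquer idea of parallel computing, an efficient and scalable parallel algorithm with overlapping block partitioning and no communication (the PBOPUC algorithm) is proposed for solving block tridiagonal linear systems. When the system is strictly block diagonally dominant, the algorithm yields an approximate solution that is equivalent to the exact solution to within machine precision. An accuracy analysis gives the relationship between the order of the subsystems and the accuracy, which is used to control both accuracy and parallel efficiency. The algorithm has been implemented on the "Ziqiang 3000" high-performance parallel computer at Shanghai University; the results show that the parallel efficiency is close to 100% and the speedup is almost linear.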
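One way to read the overlapping, communication-free idea above is as follows. Each processor solves its own piece of the system extended by a few rows on either side, ignores the coupling beyond the extension, and keeps only the values for its own rows. For a strictly diagonally dominant system the error introduced at the cut decays rapidly with the overlap width. The sketch below illustrates this for a scalar tridiagonal system; it is a simplified assumption-based illustration, not the PBOPUC algorithm itself. The names `thomas` and `overlapped_solve` are hypothetical, and `n` is assumed divisible by `n_parts`.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system (a: sub-, b: main, c: super-diagonal).

    a[0] and c[-1] do not influence the result.
    """
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def overlapped_solve(a, b, c, d, n_parts, overlap):
    """Approximate the solution by independent overlapping sub-solves."""
    n = len(d)
    size = n // n_parts
    x = [0.0] * n
    for p in range(n_parts):               # iterations are independent of each other
        lo, hi = p * size, (p + 1) * size
        elo, ehi = max(0, lo - overlap), min(n, hi + overlap)
        xs = thomas(a[elo:ehi], b[elo:ehi], c[elo:ehi], d[elo:ehi])
        x[lo:hi] = xs[lo - elo:hi - elo]   # keep only the rows owned by part p
    return x
```

A wider overlap gives higher accuracy at the cost of redundant work per processor, which corresponds to the accuracy versus parallel efficiency trade-off mentioned in the abstract.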

16.
An integrated framework and computational technology are described that address the issues involved in achieving absolute scalability (A‐scalability) over the entire transient duration of simulations of implicit non‐linear structural dynamics for large-scale practical applications on a large number of parallel processors. Whereas the theoretical developments and parallel formulations were presented in Part 1, the implementation, validation and parallel performance assessments and results are presented here in Part 2 of the paper. Relatively simple numerical examples involving large deformation and elastic and elastoplastic non‐linear dynamic behaviour are first presented via the proposed framework to demonstrate the accuracy of the methods in comparison with available experimental results and/or results available in the literature. For practical geometrically complex meshes, the A‐scalability of non‐linear implicit dynamic computations is then illustrated by employing scalable optimal dissipative zero‐order displacement and velocity overshoot behaviour time operators, which are a subset of the generalized framework, in conjunction with numerically scalable spatial domain decomposition methods and scalable graph partitioning techniques. Constant run times for the entire simulation under ‘fixed‐memory‐use‐per‐processor’ scaling of complex finite element mesh geometries are demonstrated for large-scale problems and large processor counts on at least 1024 processors. Copyright © 2003 John Wiley & Sons, Ltd.

17.
In finite element simulations, the overall computing time is dominated by the time needed to solve large sparse linear systems of equations. We report on the design and development of a parallel frontal code that can significantly reduce the wallclock time needed for the solution of these systems. The algorithm used is based on dividing the finite element domain into subdomains and applying the frontal method to each subdomain in parallel. The so‐called multiple front approach is shown to reduce the amount of work and memory required compared with the frontal method and, when run on a small number of processes, achieves good speedups. The code, HSL_MP42, has been developed for the Harwell Subroutine Library (http://www.numerical.rl.ac.uk/hsl). It is written in Fortran 90 and, by using MPI for message passing, achieves portability across a wide range of modern computer architectures. Copyright © 2001 John Wiley & Sons, Ltd.

18.
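Hu Ning. Engineering Mechanics (《工程力学》), 1992, 9(1): 65-71
This paper proposes a new parallel direct integration method for solving the dynamic equations of large structures. Building on the one-step direct integration method proposed by L. Brusa and L. Nigro, the method introduces parallel computation steps. These steps are realized by substructuring the dynamic integration equations and carrying out assembly and condensation simultaneously. The method was implemented on the ELXSI-6400 parallel computer at Xi'an Jiaotong University, and the computed results show that it solves the finite element dynamic equations of large structures effectively.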

19.
We consider the iterative solution by a class of substructuring methods of the large-scale systems of equations arising from the finite element discretization of structural models with an arbitrary set of linear multipoint constraints. We present a methodology for generalizing to such problems numerically scalable substructure based iterative solvers, without interfering with their formulations and their well-established local and global preconditioners. We apply this methodology to the FETI method, and show that the resulting algorithm is numerically scalable with respect to both the substructure and problem sizes. © 1998 John Wiley & Sons, Ltd.

20.
A two‐level domain decomposition method is introduced for general shape optimization problems constrained by the incompressible Navier–Stokes equations. The optimization problem is first discretized with a finite element method on an unstructured moving mesh that is implicitly defined without assuming that the computational domain is known and then solved by some one‐shot Lagrange–Newton–Krylov–Schwarz algorithms. In this approach, the shape of the domain, its corresponding finite element mesh, the flow fields and their corresponding Lagrange multipliers are all obtained computationally in a single solve of a nonlinear system of equations. Highly scalable parallel algorithms are absolutely necessary to solve such an expensive system. The one‐level domain decomposition method works reasonably well when the number of processors is not large. Aiming for machines with a large number of processors and robust nonlinear convergence, we introduce a two‐level inexact Newton method with a hybrid two‐level overlapping Schwarz preconditioner. As applications, we consider the shape optimization of a cannula problem and an artery bypass problem in 2D. Numerical experiments show that our algorithm performs well on a supercomputer with over 1000 processors for problems with millions of unknowns. Copyright © 2014 John Wiley & Sons, Ltd.
