Similar literature
 20 similar articles found (search time: 31 ms)
1.
Recently, graphics processing units (GPUs) have been increasingly leveraged in a variety of scientific computing applications. However, architectural differences between CPUs and GPUs necessitate the development of algorithms that take advantage of GPU hardware. As sparse matrix-vector (SPMV) multiplication operations are commonly used in finite element analysis, a new SPMV algorithm and several variations are developed for unstructured finite element meshes on GPUs. The effective bandwidth of current GPU algorithms and of the newly proposed algorithms is measured and analyzed for 15 sparse matrices of varying sizes and sparsity structures. The effects of optimization and the differences between the new GPU algorithm and its variants are then studied. Lastly, both new and current SPMV GPU algorithms are utilized in the GPU CG solver in GPU finite element simulations of the heart. These results are then compared against parallel PETSc finite element implementation results. The effective bandwidth tests indicate that the new algorithms compare very favorably with current algorithms for a wide variety of sparse matrices and can yield very notable benefits. GPU finite element simulation results demonstrate the benefit of using GPUs for finite element analysis and also show that the proposed algorithms can yield speedup factors up to 12‐fold for real finite element applications. Copyright © 2015 John Wiley & Sons, Ltd.
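For readers unfamiliar with the SPMV operation named in this abstract, the following is a minimal serial sketch of the standard compressed sparse row (CSR) product y = A x. It is not the new GPU algorithm the abstract proposes, which is not described here; the function name csr_spmv and the small 3x3 matrix are illustrative assumptions.

```python
def csr_spmv(indptr, indices, data, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    indptr  : row pointers, length n_rows + 1
    indices : column index of each stored nonzero
    data    : value of each stored nonzero
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # The nonzeros of this row occupy positions indptr[row] .. indptr[row + 1] - 1.
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


# Hypothetical 3x3 example matrix: [[4, 0, 1], [0, 3, 0], [2, 0, 5]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [4.0, 1.0, 3.0, 2.0, 5.0]
y = csr_spmv(indptr, indices, data, [1.0, 2.0, 3.0])
```

Each row is an independent dot product, which is what makes the operation a natural candidate for parallel hardware; how rows and nonzeros are mapped onto GPU threads is where algorithms of this kind typically differ.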

2.
Most of the recently proposed computational methods for solving partial differential equations on multiprocessor architectures stem from the 'divide and conquer' paradigm and involve some form of domain decomposition. For those methods which also require grids of points or patches of elements, it is often necessary to explicitly partition the underlying mesh, especially when working with local memory parallel processors. In this paper, a family of cost-effective algorithms for the automatic partitioning of arbitrary two- and three-dimensional finite element and finite difference meshes is presented and discussed in view of a domain decomposed solution procedure and parallel processing. The influence of the algorithmic aspects of a solution method (implicit/explicit computations) and of the architectural specifics of a multiprocessor (SIMD/MIMD, startup/transmission time) on the design of a mesh partitioning algorithm is discussed. The impact of the partitioning strategy on load balancing, operation count, operator conditioning, rate of convergence and processor mapping is also addressed. Finally, the proposed mesh decomposition algorithms are demonstrated with realistic examples of finite element, finite volume, and finite difference meshes associated with the parallel solution of solid and fluid mechanics problems on the iPSC/2 and iPSC/860 multiprocessors.

3.
We present, discuss, and report on the performance of two combined parallel/vector frontal algorithms which have been incorporated in a production finite element code. Two parallelization strategies are described. The first approach is algebra driven and is recommended for the solution of problems with a large bandwidth on a coarse grain configuration. The second strategy is targeted for finer grain systems; it blends the first one with a substructuring technique that is based on a careful partitioning of the finite element mesh into a series of subdomains. Using only 4 IBM 3090/VF processors, the proposed algorithms are shown to deliver speed-ups as high as 17 with respect to a serial non-vectorized frontal solver.

4.
Multigrid is a popular solution method for the set of linear algebraic equations that arise from PDEs discretized with the finite element method. The application of multigrid to unstructured grid problems, however, is not well developed. We discuss a method that uses many of the same techniques as the finite element method itself to apply standard multigrid algorithms to unstructured finite element problems. We use maximal independent sets (MISs) as a mechanism to automatically coarsen unstructured grids; the inherent flexibility in the selection of an MIS allows for the use of heuristics to improve their effectiveness for a multigrid solver. We present parallel algorithms, based on geometric heuristics, to optimize the quality of MISs and the meshes constructed from them, for use in multigrid solvers for 3D unstructured problems. We discuss parallel issues of our algorithms, multigrid solvers in general, and the parallel finite element application that we have developed to test our solver on challenging problems. We show that our solver and parallel finite element architecture do indeed scale well, with test problems in 3D large deformation elasticity and plasticity, including a 40 million degree-of-freedom problem on 240 IBM four‐way SMP PowerPC nodes. Copyright © 2000 John Wiley & Sons, Ltd.
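The coarsening idea in this abstract rests on picking a maximal independent set of mesh vertices. A minimal serial sketch of a greedy MIS selection is given below; it is not the parallel, geometry-aware algorithm of the paper. The function name greedy_mis and the optional order argument (the hook where a heuristic vertex ranking could be supplied) are assumptions made for illustration.

```python
def greedy_mis(adjacency, order=None):
    """Return a maximal independent set of an undirected graph.

    adjacency : dict mapping each vertex to the set of its neighbours
    order     : optional visiting order; a heuristic ranking can be passed here
    """
    if order is None:
        order = sorted(adjacency)
    selected = set()
    excluded = set()
    for v in order:
        if v in excluded:
            continue
        # v has no selected neighbour, so it can join the set.
        selected.add(v)
        excluded.add(v)
        # Its neighbours can no longer be selected.
        excluded.update(adjacency[v])
    return selected


# Hypothetical example: a path graph 0-1-2-3-4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
coarse_vertices = greedy_mis(path)
```

Changing the visiting order changes which MIS is returned, which is the flexibility the abstract refers to: the selected vertices become the coarse-grid points, so the ordering heuristic influences coarse mesh quality.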

5.
Recently much attention has been paid to high-performance computing and the development of parallel computational strategies and numerical algorithms for large-scale problems. In the present study, a finite element procedure for the dynamic analysis of anisotropic viscoelastic composite shell structures using degenerated 3-D elements has been studied on vector, coarse-grained, and massively parallel machines. CRAY hardware performance monitors such as the Flowtrace and Perftrace tools are used to obtain performance data for subroutine program modules and specified code segments. The performance of the conjugate gradient method, the Cray sparse matrix solver and the Feable solver is evaluated. SIMD and MIMD parallel implementations of the finite element algorithm for dynamic simulation of viscoelastic composite structures on the CM-5 are also presented. The performance studies have been conducted in order to evaluate the efficiency of the numerical algorithm on this architecture versus vector processing CRAY systems. Parametric studies on the CM-5 as well as the CRAY system and benchmarks for various problem sizes are shown. The second study evaluates how effectively the finite element procedures for viscoelastic composite structures can be solved in the Single Instruction Multiple Data (SIMD) parallel environment. CM-FORTRAN is used. A conjugate gradient method is employed for the solution of systems. In the third study, we propose to implement the finite element algorithm in a scalable distributed parallel environment using a generic message passing library such as PVM. The code is portable to a range of current and future parallel machines. We also introduce a domain decomposition scheme to reduce the communication time. The parallel scalability of the dynamic viscoelastic finite element algorithm in data parallel and scalable distributed parallel environments is also discussed. © 1997 by John Wiley & Sons, Ltd.

6.
The numerical solution of Maxwell's curl equations in the time domain is achieved by combining an unstructured mesh finite element algorithm with a Cartesian finite difference method. The practical problem area selected to illustrate the application of the approach is the simulation of three‐dimensional electromagnetic wave scattering. The scattering obstacle and the free space region immediately adjacent to it are discretized using an unstructured mesh of linear tetrahedral elements. The remainder of the computational domain is filled with a regular Cartesian mesh. These two meshes are overlapped to create a hybrid mesh for the numerical solution. On the Cartesian mesh, an explicit finite difference method is adopted and an implicit/explicit finite element formulation is employed on the unstructured mesh. This approach ensures that computational efficiency is maintained if, for any reason, the generated unstructured mesh contains elements of a size much smaller than that required for accurate wave propagation. A perfectly matched layer is added at the artificial far field boundary, created by the truncation of the physical domain prior to the numerical solution. The complete solution approach is parallelized, to enable large‐scale simulations to be effectively performed. Examples are included to demonstrate the numerical performance that can be achieved. Copyright © 2009 John Wiley & Sons, Ltd.

7.
This paper presents a parallel implementation of the finite element method designed for coarse-grain distributed memory architectures. The MPI standard is used for message passing and tests are run on a PC cluster and on an SGI Altix 350. Compressed data structures are employed to store the coefficient matrix and obtain iterative solutions, based on Krylov methods, in a subdomain-by-subdomain approach. Two mesh partitioning schemes are compared: non-overlapping and overlapping. The pros and cons of these partitioning methods are discussed. Numerical examples of symmetric and non-symmetric problems in two and three dimensions are presented.

8.
This paper describes a neural network graph partitioning algorithm which partitions unstructured finite element/volume meshes as a precursor to a parallel domain decomposition solution method. The algorithm works by first constructing a coarse graph approximation using an automatic graph coarsening method. The coarse graph is partitioned and the results are interpolated onto the original graph to initialize an optimization of the graph partition problem. In practice, a hierarchy of (usually more than two) graphs is used to help obtain the final graph partition. A mean field theorem neural network is used to perform all partition optimization. The partitioning method is applied to graphs derived from unstructured finite element meshes and in this context it can be viewed as a multi‐grid partitioning method. Copyright © 1999 John Wiley & Sons, Ltd.

9.
One of the major problems in fluid–structure interaction using the arbitrary Lagrangian Eulerian approach lies in the area of dynamic mesh generation. For accurate fluid-dynamic computations, meshes must be generated at each time step. The fluid mesh must be regenerated in the deformed fluid domain in order to account for the displacements of the elastic body computed by the structural dynamics solver. In the elasticity-based computational dynamic mesh procedure, the fluid mesh is modeled as a pseudo-elastic solid whose deformation is driven by the displacement boundary conditions resulting from the solution of the computational structural dynamics problem. This approach has a distinct advantage over other mesh-movement algorithms in that it is a very general, physically based approach that can be applied to both structured and unstructured meshes. The major drawback of the linear elastostatic solver is that it does not guarantee the absence of severe element distortion. This paper describes a novel mesh-movement procedure for mesh quality control of 2-D and 3-D dynamic meshes based on solving a pseudo-nonlinear elastostatic problem. An inexpensive distortion measure for different types of elements is introduced and used for controlling the element shape quality. The mesh-movement procedure is illustrated with several examples (large-displacement and free-boundary problems) that highlight its advantages in terms of performance, mesh quality, and robustness. It is believed that the resulting scheme will yield a more economical simulation of the motion of complex-geometry 3-D elastic bodies immersed in temporally and spatially evolving flows. Received 20 April 2000

10.
Finite element discretizations of flow problems involving multiaquifer systems deliver large, sparse, unstructured matrices, whose partial eigenanalysis is important for both solving the flow problem and analysing its main characteristics. We studied and implemented an effective preconditioning of the Jacobi–Davidson algorithm by FSAI‐type preconditioners. We developed efficient parallelization strategies in order to solve very large problems, which could not fit into the storage available to a single processor. We report our results on the solution of multiaquifer flow problems on an SP4 machine and a Linux cluster. We analyse the sequential and parallel efficiency of our algorithm, also in comparison with standard packages. Questions regarding the parallel solution of finite element eigenproblems are addressed and discussed. Copyright © 2005 John Wiley & Sons, Ltd.

11.
This work presents a novel iterative approach for mesh partitioning optimization to promote the efficiency of parallel nonlinear dynamic finite element analysis with the direct substructure method, which involves static condensation of substructures' internal degrees of freedom. The proposed approach includes four major phases: initial partitioning, substructure workload prediction, element weights tuning, and partitioning results adjustment. The final three phases are performed iteratively until the workloads among the substructures are balanced reasonably. A substructure workload predictor that considers the sparsity and ordering of the substructure matrix is used in the proposed approach. Several numerical experiments conducted herein reveal that the proposed iterative mesh partitioning optimization often results in a superior workload balance among substructures and reduces the total elapsed time of the corresponding parallel nonlinear dynamic finite element analysis. Received 22 August 2001 / Accepted 20 January 2002

12.
In this work we present a parallel solver for the Poisson equation for 3D abrupt heterojunction bipolar transistors (HBT). Three‐dimensional simulation is essential for studying devices of small geometry, as in the case we have studied. We have used an unstructured tetrahedral mesh and we have applied the finite element method (FEM), making a specific formulation for the nodes located on the interface of the regions with different characteristics. For HBT devices, it is necessary to take into account that materials with different properties exist on both sides of the interface between the different regions. Our formulation implies situating pairs of nodes in the same physical positions of the interface, associating each node with a region of the HBT. This way, the effects due to thermionic emission and the tunnel effect may be simulated when the Poisson and the electron and hole equations are solved in an abrupt HBT. We have applied domain decomposition methods to solve the associated linear systems. This code has been implemented for distributed memory multicomputers, making use of a message passing standard library, MPI. Copyright © 2000 John Wiley & Sons, Ltd.

13.
In this paper we propose an unstructured hybrid tessellation of a scattered point set that minimally covers the proximal space around each point. The mesh is automatically obtained in a bounded period of time by transforming an initial Delaunay tessellation. Novel types of polygonal interpolants are used for interpolation applications and the geometric qualities of the elements make them also useful for discretization schemes. The approach proves to be superior to the classical Delaunay one in a finite element context.

14.
This paper discusses the implementation aspects and our experiences towards a data parallel explicit self-starting finite element transient methodology with emphasis on the Connection Machine (CM-5) for linear and non-linear computational structural dynamic applications involving structured and unstructured grids. The parallel implementation criteria that influence the efficiency of an algorithm include the amount of communication, communication routing, and load balancing. To provide simplicity and a high level of accuracy, and to retain the generality of the finite element implementation for both linear and non-linear transient explicit problems on a data parallel computer in a way that permits an optimum amount of communication, we implemented the present self-starting dynamic formulations (in comparison to the traditional approaches) based on nodal displacements, nodal velocities, and elemental stresses on the CM-5. The data parallel language CMFortran is employed with virtual processor constructs and with :SERIAL and :PARALLEL layout directives for arrays. The communications via the present approach involve only one gather operation (extraction of element nodal displacements or velocities from the global displacement vector) and one scatter operation (dispersion of element forces onto the global force vector) for each time step. These gather and scatter operations are implemented using the Connection Machine Scientific Software Library communication primitives for both structured and unstructured finite element meshes. The implementation aspects of the present self-starting formulations for linear and elastoplastic applications on serial and data parallel machines are discussed. Numerical test models for linear and non-linear one-dimensional applications and a two-dimensional unstructured finite element mesh are then illustrated and their performance studies are discussed.
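The gather and scatter operations described in this abstract can be illustrated with a small serial sketch for a chain of two-node linear spring elements. This is only an illustration of the communication pattern, not the CMFortran implementation or the self-starting formulation of the paper; the function name internal_forces, the spring element, and the sample data are assumptions.

```python
def internal_forces(connectivity, stiffness, u):
    """Assemble the global internal force vector for 2-node spring elements.

    connectivity : list of (i, j) global node numbers, one pair per element
    stiffness    : spring constant of each element
    u            : global nodal displacement vector
    """
    f = [0.0] * len(u)
    for (i, j), k in zip(connectivity, stiffness):
        # Gather: extract this element's nodal displacements from the global vector.
        ui, uj = u[i], u[j]
        # Element-level computation: axial force from the relative displacement.
        fe = k * (uj - ui)
        # Scatter: add the element's nodal forces into the global force vector.
        f[i] -= fe
        f[j] += fe
    return f


# Hypothetical 4-node, 3-element bar.
connectivity = [(0, 1), (1, 2), (2, 3)]
stiffness = [2.0, 2.0, 4.0]
u = [0.0, 0.5, 1.5, 1.5]
f = internal_forces(connectivity, stiffness, u)
```

On a data parallel machine the element-level computation runs for all elements at once, so the gather and the scatter are the only steps that move data between nodes and elements, which is why the abstract counts them as the communication cost per time step.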

15.
We address the problem of automatic partitioning of unstructured finite element meshes in the context of parallel numerical algorithms based on domain decomposition. A two-step approach is proposed, which combines a direct partitioning scheme with a non-deterministic procedure of combinatorial optimization. In contrast with previously published experiments with non-deterministic heuristics, the optimization step is shown to produce high-quality decompositions at a reasonable compute cost. We also show that the optimization approach can accommodate complex topological constraints and minimization objectives. This is illustrated by considering the particular case of topologically one-dimensional partitions, as well as load balancing of frontal subdomain solvers. Finally, the optimization procedure produces, in most cases, decompositions endowed with geometrically smooth interfaces. This contrasts with available partitioning schemes, and is crucial to some modern numerical techniques based on domain decomposition and a Lagrange multiplier treatment of the interface conditions.

16.
This paper presents two immersed finite element (IFE) methods for solving the elliptic interface problem arising from electric field simulation in composite materials. The meshes used in these IFE methods can be independent of the interface geometry and position; therefore, if desired, a structured mesh such as a Cartesian mesh can be used in an IFE method to simulate a 3‐D electric field in a domain with non‐trivial interfaces separating different materials. Numerical examples are provided to demonstrate that the accuracies of these IFE methods are comparable to that of the standard linear finite element method with an unstructured body‐fit mesh. Copyright © 2005 John Wiley & Sons, Ltd.

17.
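Adaptive analysis with finite element mesh refinement and its application   Total citations: 1, self-citations: 0, citations by others: 1
Based on an analysis of the continuity conditions of finite element variables, this paper uses the stress error norm to estimate the error of the computed results, so that an unstructured mesh generation system is closely integrated with the finite element computation. Adaptive analysis with mesh element refinement is applied to the study of two-dimensional stress concentration problems, thereby achieving an optimal finite element discretization and improving the reliability and accuracy of the finite element numerical solution.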

18.
A two‐level domain decomposition method is introduced for general shape optimization problems constrained by the incompressible Navier–Stokes equations. The optimization problem is first discretized with a finite element method on an unstructured moving mesh that is implicitly defined without assuming that the computational domain is known and then solved by some one‐shot Lagrange–Newton–Krylov–Schwarz algorithms. In this approach, the shape of the domain, its corresponding finite element mesh, the flow fields and their corresponding Lagrange multipliers are all obtained computationally in a single solve of a nonlinear system of equations. Highly scalable parallel algorithms are absolutely necessary to solve such an expensive system. The one‐level domain decomposition method works reasonably well when the number of processors is not large. Aiming for machines with a large number of processors and robust nonlinear convergence, we introduce a two‐level inexact Newton method with a hybrid two‐level overlapping Schwarz preconditioner. As applications, we consider the shape optimization of a cannula problem and an artery bypass problem in 2D. Numerical experiments show that our algorithm performs well on a supercomputer with over 1000 processors for problems with millions of unknowns. Copyright © 2014 John Wiley & Sons, Ltd.

19.
A meshless modeling procedure of three-dimensional targets for penetration analysis on parallel computing systems is described. Buried structures are modeled by arbitrary layers of concrete and geologic materials, and the projectile is modeled by standard finite elements. Penetration resistance of the buried structure is provided by functions derived from principles of dynamic cavity expansion. The resistance functions are influenced by the target material properties and projectile kinematics. Additional capabilities accommodate the varying structural and geometrical characteristics of the target. Coupling between the finite elements and the meshless target model is made by applying resistance loads to elements on the outer surface of the projectile mesh. Penetration experiments verify the approach. In this manner, the target is effectively modeled and the strategy is well suited for parallel processing. The procedure is incorporated into an explicit transient dynamics code, using mesh partitioning for a coarse grain parallel processing paradigm. Message Passing Interface (MPI) is used for all interprocessor communication. Large detailed finite element analyses of projectiles are performed on up to several hundred processors with excellent scalability. The efficiency of the strategy is demonstrated by analyses executed on several types of scalable computing platforms.

20.
Many resequencing algorithms for reducing the bandwidth, profile and wavefront of sparse symmetric matrices have been published. In finite element applications, the sparsity of a matrix is related to the nodal ordering of the finite element mesh. Some of the most successful algorithms, which are based on graph theory, require a pair of starting pseudoperipheral nodes. These nodes, located at nearly maximal distance apart, are determined using heuristic schemes. This paper presents an alternative pseudoperipheral node finder, which is based on the algorithm developed by Gibbs, Poole and Stockmeyer. This modified scheme is suitable for nodal reordering of finite element meshes and provides more consistency in the effective selection of the starting nodes in problems where the selection becomes arbitrary due to the number of candidates for these starting nodes. This case arises, in particular, for square meshes. The modified scheme was implemented in the Gibbs-Poole-Stockmeyer, Gibbs-King and Sloan algorithms. Test problems for these modified algorithms include: (1) Everstine's 30 benchmark problems; (2) sets of square, rectangular and annular (cylindrical) finite element meshes with quadrilateral and triangular elements; and (3) additional examples originating from mesh refinement schemes. The results demonstrate that the modifications to the original algorithms improve the reliability of all the resequencing algorithms tested herein for the nodal reordering of finite element meshes.
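As background for this abstract, the sketch below shows a simplified version of the classic heuristic for finding a pair of pseudoperipheral nodes using rooted breadth-first level structures, in the spirit of Gibbs, Poole and Stockmeyer. It is not the modified scheme the paper proposes, whose details are not given here. The function names, the tie-breaking by node label, and the assumption of a connected graph are illustrative choices.

```python
def bfs_levels(adjacency, root):
    """Return the level structure rooted at `root` as a list of node lists."""
    seen = {root}
    levels = [[root]]
    while True:
        next_level = []
        for v in levels[-1]:
            for w in sorted(adjacency[v]):
                if w not in seen:
                    seen.add(w)
                    next_level.append(w)
        if not next_level:
            return levels
        levels.append(next_level)


def pseudoperipheral_pair(adjacency):
    """Find two nodes that are nearly maximally far apart (connected graph assumed)."""
    # Start from a node of minimum degree; ties are broken by node label.
    start = min(sorted(adjacency), key=lambda v: len(adjacency[v]))
    levels = bfs_levels(adjacency, start)
    while True:
        depth = len(levels)
        # Examine the nodes of the last level in order of increasing degree.
        candidates = sorted(levels[-1], key=lambda v: (len(adjacency[v]), v))
        best_end, best_width, restarted = None, None, False
        for v in candidates:
            lv = bfs_levels(adjacency, v)
            if len(lv) > depth:
                # A deeper level structure was found: restart from this node.
                start, levels, restarted = v, lv, True
                break
            width = max(len(level) for level in lv)
            if best_width is None or width < best_width:
                best_end, best_width = v, width
        if not restarted:
            return start, best_end


# Hypothetical example: a path graph 0-1-2-3-4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
pair = pseudoperipheral_pair(path)
```

In this sketch ties are resolved arbitrarily by node label. On a square mesh many nodes share the same degree and eccentricity, which is the situation the abstract describes as making the selection of starting nodes arbitrary.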
