Similar Documents
1.
We present a multigrid approach for simulating elastic deformable objects in real time on recent NVIDIA GPU architectures. To accurately simulate large deformations we consider the co-rotated strain formulation. Our method is based on a finite element discretization of the deformable object using hexahedra. It draws upon recent work on multigrid schemes for the efficient numerical solution of partial differential equations on such discretizations. Due to the regular shape of the numerical stencil induced by the hexahedral regime, and since we use matrix-free formulations of all multigrid steps, computations and data layout can be restructured to avoid execution divergence among threads running in parallel and to enable coalescing of memory accesses into single memory transactions. This makes it possible to effectively exploit the GPU's parallel processing units and high memory bandwidth via the CUDA parallel programming API. We demonstrate performance gains of up to factors of 27 and 4 compared to a highly optimized CPU implementation on a single CPU core and on 8 CPU cores, respectively. For hexahedral models consisting of as many as 269,000 elements, our approach achieves physics-based simulation at 11 time steps per second.
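
The matrix-free multigrid idea can be illustrated at a much smaller scale. The sketch below is a minimal stand-in under strong assumptions: it solves the 1D Poisson problem -u'' = f on a uniform grid with zero Dirichlet boundaries (not the co-rotated hexahedral elasticity operator used in the paper), applies the 3-point stencil without ever assembling a matrix, and combines weighted Jacobi smoothing, full-weighting restriction and linear interpolation in a recursive V-cycle. All function names are hypothetical.

```python
def apply_A(u, h):
    """Matrix-free application of the 3-point stencil for -u'' (u = 0 outside the grid)."""
    n = len(u)
    out = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out[i] = (2.0 * u[i] - left - right) / (h * h)
    return out

def residual(u, f, h):
    return [fi - ai for fi, ai in zip(f, apply_A(u, h))]

def jacobi(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi smoothing; the stencil's diagonal entry is 2/h^2."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        u = [ui + omega * (h * h / 2.0) * ri for ui, ri in zip(u, r)]
    return u

def restrict(r):
    """Full weighting from n = 2m + 1 fine points to m coarse points."""
    m = (len(r) - 1) // 2
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2] for j in range(m)]

def prolong(e, n):
    """Linear interpolation from m coarse points back to n = 2m + 1 fine points."""
    fine = [0.0] * n
    for j, ej in enumerate(e):
        fine[2 * j + 1] += ej
        fine[2 * j] += 0.5 * ej
        fine[2 * j + 2] += 0.5 * ej
    return fine

def v_cycle(u, f, h, pre=2, post=2):
    n = len(u)
    if n <= 1:
        # Coarsest level: a single unknown, solved exactly.
        return [f[0] * h * h / 2.0] if n == 1 else u
    u = jacobi(u, f, h, pre)                      # pre-smoothing
    rc = restrict(residual(u, f, h))              # coarse-grid residual
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h, pre, post)
    u = [ui + ci for ui, ci in zip(u, prolong(ec, n))]  # coarse-grid correction
    return jacobi(u, f, h, post)                  # post-smoothing
```

Calling `v_cycle` repeatedly on `u = [0.0] * 63` with `h = 1/64` and `f = [1.0] * 63` should drive the residual down to round-off level within a few dozen cycles; the number of interior points must be of the form 2^k - 1 so that every level coarsens exactly. Because each point reads only its two neighbours, every loop body is independent, which is the property that lets such regular stencils map onto GPU threads without divergence.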

2.
We present a GPU-based streaming algorithm to perform high-resolution and accurate cloth simulation. We map all the components of the cloth simulation pipeline, including time integration, collision detection, collision response, and velocity updating, to GPU-based kernels and data structures. Our algorithm handles intra-object and inter-object collisions, contacts and friction, and is able to accurately simulate folds and wrinkles. We describe the streaming pipeline and address many issues related to obtaining high throughput on many-core GPUs. In practice, our algorithm can perform high-fidelity simulation on a cloth mesh with 2M triangles using 3 GB of GPU memory. We highlight the parallel performance of our algorithm on three different generations of GPUs. On a high-end NVIDIA Tesla K20c, we observe up to two orders of magnitude performance improvement compared to a single-threaded CPU-based algorithm, and about one order of magnitude improvement over a 16-core CPU-based parallel implementation.

3.
In this paper, we present a novel parallel implementation of extrinsic initially rigid cohesive elements in an explicit finite element solver designed for the simulation of dynamic fracture events. The implementation is based on activating instead of inserting the cohesive elements and uses ParFUM, a parallel framework specifically developed for simulations involving unstructured meshes. Aspects of the parallel implementation are described, along with an analysis of its performance on 1 to 512 processors. Important cache effects and communication costs are included in this analysis. The implementation is validated by simulating the trapping of a crack along an inclined material interface.

4.
The discontinuous finite element discrete ordinates method, which inverts an operator by iteratively sweeping across a mesh from multiple directions, is commonly used to solve the time-dependent particle transport equation. Graphics processing units (GPUs) provide great capability for scientific applications, but particle transport on unstructured grids poses several challenges when implemented on a GPU. This paper presents an efficient implementation of particle transport on an unstructured grid in a 2D cylindrical Lagrangian coordinate system on a GPU platform with fine-grained data-level parallelism, addressing three aspects. The first is determining the sweep order of elements for different angular directions. The second is mapping the sweep calculation onto the GPU thread execution model. The last is using on-chip memory efficiently to improve performance. To the authors' knowledge, this is the first implementation of a general-purpose particle transport simulation on an unstructured grid on a GPU. Experimental results show that the speedup of an NVIDIA M2050 GPU running double-precision floating-point operations ranges from 11.03 to 17.96 compared with the serial implementation on an Intel Xeon X5355 and a Core Q6600.
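
Determining the sweep order for one angular direction amounts to a topological sort of the upwind dependency graph: a cell can be processed once every cell that feeds particles into it has been processed. A minimal sketch, assuming a hypothetical mesh description as a list of shared faces with unit normals (not the paper's 2D cylindrical Lagrangian data structures), using Kahn's algorithm and grouping cells into wavefronts whose members could be processed concurrently:

```python
from collections import defaultdict

def sweep_wavefronts(num_cells, faces, omega):
    """faces: list of (a, b, n) where n is the unit normal of the face shared by
    cells a and b, oriented from a towards b. omega: 2D direction of travel.
    Returns a list of wavefronts; cells within one wavefront do not depend on each other."""
    downstream = defaultdict(list)
    indegree = [0] * num_cells
    for a, b, n in faces:
        dot = omega[0] * n[0] + omega[1] * n[1]
        if dot > 0:          # particles flow from a into b
            up, down = a, b
        elif dot < 0:        # particles flow from b into a
            up, down = b, a
        else:                # face parallel to the direction: no coupling
            continue
        downstream[up].append(down)
        indegree[down] += 1
    front = [c for c in range(num_cells) if indegree[c] == 0]
    waves = []
    while front:
        waves.append(front)
        nxt = []
        for c in front:
            for d in downstream[c]:
                indegree[d] -= 1
                if indegree[d] == 0:
                    nxt.append(d)
        front = nxt
    if sum(len(w) for w in waves) != num_cells:
        raise ValueError("cyclic dependency: this direction needs cycle breaking")
    return waves
```

On a 2x2 block of cells swept towards the upper right, this yields the bottom-left cell first, then its two neighbours together, then the top-right cell. Unstructured meshes can contain dependency cycles for some directions; how the paper handles those is not stated in the abstract, so the sketch simply reports them.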

5.
The application of biomechanical modeling techniques in medical image analysis and surgical simulation implies two conflicting requirements: accurate results and high solution speeds. Accurate results can be obtained only by using appropriate models and solution algorithms. In our previous papers we presented algorithms and solution methods for performing accurate nonlinear finite element analysis of brain shift (including mixed meshes, different nonlinear material models, finite deformations and brain-skull contacts) in less than a minute on a personal computer for models with up to 50,000 degrees of freedom. In this paper we present an implementation of our algorithms on a graphics processing unit (GPU) using the new NVIDIA Compute Unified Device Architecture (CUDA), which increases the computation speed by a factor of more than 20. This makes it possible to use meshes with more elements, which better represent the geometry, are easier to generate, and provide more accurate results.

6.
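Spatial interpolation is a computationally complex and time-consuming operation in geographic information system (GIS) spatial analysis, and therefore cannot meet real-time requirements. With the large increase in the floating-point performance of graphics processing units (GPUs), general-purpose GPU computing has become a research hotspot for handling complex computations in the GIS field, offering a good opportunity to make some traditionally inefficient algorithms run in real time. Exploiting the GPU's strengths in parallel computing, the inverse distance weighting (IDW) interpolation algorithm is mapped onto the Compute Unified Device Architecture (CUDA) parallel programming architecture. First, a two-level index is built on the GPU to divide the computation into appropriate levels; then a multi-threaded tiling strategy is used to perform the interpolation in parallel. Finally, experiments show that the interpolation error of this method relative to the CPU method can be kept on the order of 10^-6, and that when the interpolation radius is large and there are many interpolation data points, the algorithm achieves speedups of more than 40 times. This demonstrates the correctness and efficiency of the method.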
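
Inverse distance weighting estimates a value at a query point as a weighted mean of nearby samples, z(x) = sum(w_i z_i) / sum(w_i) with w_i = 1 / d_i^p. The sketch below is a serial reference under stated assumptions (power p = 2, a fixed search radius, a brute-force neighbour scan); it does not reproduce the paper's two-level index or tiling strategy, and the function names are hypothetical. Each output cell is computed independently, which is what makes one-thread-per-cell GPU mappings natural.

```python
import math

def idw(samples, query, radius, power=2.0):
    """Inverse distance weighting at one query point.
    samples: list of (x, y, z); only samples within `radius` contribute."""
    qx, qy = query
    num = 0.0
    den = 0.0
    for x, y, z in samples:
        d = math.hypot(x - qx, y - qy)
        if d == 0.0:
            return z            # query coincides with a sample: return its value
        if d <= radius:
            w = 1.0 / d ** power
            num += w * z
            den += w
    if den == 0.0:
        return None             # no sample inside the search radius
    return num / den

def idw_grid(samples, xs, ys, radius, power=2.0):
    """Interpolate onto a regular grid; rows follow ys, columns follow xs.
    Every cell is independent of the others."""
    return [[idw(samples, (x, y), radius, power) for x in xs] for y in ys]
```

For example, with samples of value 1 at (0, 0) and 3 at (2, 0), the point (1, 0) receives equal weights and interpolates to 2, while (0.5, 0) is pulled towards the closer sample and interpolates to 1.2.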

7.
The one-step leapfrog alternating-direction-implicit finite-difference time-domain (ADI-FDTD) method, which is free from the Courant-Friedrichs-Lewy (CFL) stability condition and from sub-step computations, is efficient for fine-grid problems. However, solving the numerous tridiagonal systems still imposes a heavy computational burden and makes the method hard to execute in parallel. In this paper, we propose an efficient graphics processing unit (GPU)-based parallel implementation of the one-step leapfrog ADI-FDTD for far-field electromagnetic (EM) scattering simulation of objects. We present and analyze the division of the computation area and the allocation of threads, and we propose a data layout transformation of the z components to achieve a better memory access pattern, which is a key factor affecting GPU execution efficiency. Simulation experiments are carried out to verify the accuracy and efficiency of the GPU-based implementation. The results show good agreement between the proposed one-step leapfrog ADI-FDTD method and Yee's FDTD in solving the far-field scattering problem, and substantial performance gains when the method is accelerated with GPU technology.
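
The tridiagonal systems that dominate the ADI-FDTD cost are classically solved with the Thomas algorithm (forward elimination followed by back substitution). The sketch below shows that serial kernel only; one straightforward GPU mapping, which may differ from the paper's thread allocation, assigns each grid line's independent system to its own thread. It assumes a diagonally dominant system, so no pivoting is performed.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system with sub-diagonal a, diagonal b, super-diagonal c
    and right-hand side d (all of length n; a[0] and c[n-1] are not used).
    Assumes diagonal dominance, so no pivoting is needed."""
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward elimination: remove the sub-diagonal.
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

The algorithm runs in O(n) per system but is inherently sequential along a line, which is why the parallelism in such solvers usually comes from the large number of independent lines rather than from within one system.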

8.
A software framework that takes advantage of the parallel processing capabilities of CPUs and GPUs is designed for real-time interactive cutting simulation of deformable objects. Deformable objects are modelled as voxels connected by links. The voxels are embedded in an octree mesh used for deformation. Cutting is performed by disconnecting links swept by the cutting tool and then adaptively refining octree elements near the cutting tool trajectory. A surface mesh used for visual display is reconstructed from disconnected links using the dual contour method. Spatial hashing of the octree mesh and topology-aware interpolation of a distance field are used for collision. Our framework uses a novel GPU implementation for inter-object collision and object self-collision, while tool-object collision, cutting and deformation are assigned to the CPU, using multiple threads whenever possible. A novel method that splits cutting operations into four independent tasks running in parallel is designed. Our framework also performs data transfers between CPU and GPU simultaneously with other tasks to reduce their impact on performance. Simulation tests show that, compared to three-threaded CPU implementations, our GPU-accelerated collision is 53–160% faster and the overall simulation frame rate is 47–98% faster.
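
Spatial hashing bins primitives into uniform cells so that collision candidates are only sought in neighbouring cells instead of across the whole scene. A minimal broad-phase sketch, assuming points in 3D and a Python dictionary keyed by integer cell coordinates as the hash table (the paper hashes octree-mesh elements and refines contacts with a distance field, neither of which is reproduced here); function names are hypothetical:

```python
import math
from collections import defaultdict

def build_hash(points, cell):
    """Map integer cell coordinates to the indices of the points inside that cell."""
    table = defaultdict(list)
    for i, (x, y, z) in enumerate(points):
        key = (math.floor(x / cell), math.floor(y / cell), math.floor(z / cell))
        table[key].append(i)
    return table

def candidate_pairs(points, cell, radius):
    """Return index pairs (i, j), i < j, that are closer than `radius`.
    radius <= cell guarantees that only the 27 surrounding cells need checking."""
    assert radius <= cell
    table = build_hash(points, cell)
    pairs = set()
    for (cx, cy, cz), members in table.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for j in table.get((cx + dx, cy + dy, cz + dz), ()):
                        for i in members:
                            if i < j and math.dist(points[i], points[j]) < radius:
                                pairs.add((i, j))
    return sorted(pairs)
```

Choosing the cell size close to the interaction radius keeps each neighbourhood small; the hash table is rebuilt every frame as objects deform.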

9.
Stress near a crack tip in plasticity was analyzed using three different finite element models, all considering viscoplasticity: a constant strain triangle, an eight-noded quadrilateral, and a crack tip singularity element. The specimen under consideration was a center-cracked plate made from IN-100, a nickel-base superalloy, with a half-crack length of 0.1367 in. (3.472 mm). An elastic solution was formulated in conjunction with two different loadings to generate plasticity. Fine-mesh and coarse-mesh solutions for the higher-order elements were generated and compared, considering an equal number of degrees of freedom in two specific regions referred to as the near-field and far-field regions.

The authors determined that the elements whose elastic solutions conformed best to linear elastic fracture mechanics predictions were the constant strain triangle and the eight-noded quadrilateral in a fine mesh. The crack tip element did not perform as well as expected. For the plastic analysis, the constant strain triangle exhibited the largest plastic region. The eight-noded isoparametric element came within 15% of the stress levels generated by the constant strain triangle. The stress singularity that is characteristic of the crack tip element forced that element to behave unnaturally stiffly immediately adjacent to the crack tip.

Because it is not as stiff as either the crack tip element or the eight-noded element, the constant strain triangle offered the most promising solutions, as verified through experimentation. It was therefore determined that the constant strain triangle offered the best solution to elastic-plastic finite element problems for the center-cracked plate.
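
The constant strain triangle owes its name to a strain-displacement matrix B that is constant over the element: with linear shape functions N_i = (a_i + b_i x + c_i y) / (2A), where b_i = y_j - y_k and c_i = x_k - x_j for cyclic (i, j, k), every derivative is a constant. The sketch below builds this textbook B matrix for plane problems; it illustrates the element type only, not the viscoplastic analysis of the study, and the function names are hypothetical.

```python
def cst_B(coords):
    """Strain-displacement matrix of a 3-node constant strain triangle.
    coords: [(x1, y1), (x2, y2), (x3, y3)] in counter-clockwise order.
    Returns B (3 x 6) with [eps_x, eps_y, gamma_xy] = B @ [u1, v1, u2, v2, u3, v3]."""
    (x1, y1), (x2, y2), (x3, y3) = coords
    two_area = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    if two_area <= 0:
        raise ValueError("nodes must be counter-clockwise and non-degenerate")
    b = (y2 - y3, y3 - y1, y1 - y2)   # dN_i/dx * 2A
    c = (x3 - x2, x1 - x3, x2 - x1)   # dN_i/dy * 2A
    B = [[0.0] * 6 for _ in range(3)]
    for i in range(3):
        B[0][2 * i] = b[i] / two_area
        B[1][2 * i + 1] = c[i] / two_area
        B[2][2 * i] = c[i] / two_area
        B[2][2 * i + 1] = b[i] / two_area
    return B

def strain(B, u):
    """Element strain vector for nodal displacements u (length 6)."""
    return [sum(B[r][k] * u[k] for k in range(6)) for r in range(3)]
```

Two quick sanity checks follow from the construction: a rigid translation produces zero strain, and the linear field u = 0.01 x on the unit right triangle produces eps_x = 0.01 exactly. Because strain cannot vary inside the element, steep gradients such as those near a crack tip have to be captured by mesh refinement.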


10.
The natural element method (NEM) is one of the members of the large family of meshless methods, with clear advantages over the finite element method (FEM) in problems involving large mesh distortions or complex geometries where designing the mesh is costly. Such problems arise in many applications, for instance the simulation of biological structures involving soft tissues, such as articular joints. An additional advantage of NEM is that it can easily be coupled with finite elements and implemented in any FE framework, including well-known commercial packages. Like most other spatial approximation approaches, NEM can be applied to evolution problems with two types of time (or pseudo-time) integration schemes, namely implicit and explicit. However, the explicit version of NEM has been neither implemented nor sufficiently analyzed, so a comparative study of these two types of NEM time integration schemes is still missing. The main aim of this paper is to discuss issues related to the accuracy and stability of NEM in its explicit version, and problems related to its implementation in an explicit commercial FE code. Finally, a comparative study addressing the main properties, advantages and disadvantages of both types of NE schemes, implicit and explicit, is presented. Several application examples are discussed, including cases where NEM is competitive with FEM, such as the modeling of human articular joints like the knee. Explicit NEM achieves accurate results for high distortions and complex contact conditions, although time-step constraints remain a major drawback: the limits needed to keep stability and accuracy are comparable to those known for finite elements, despite NEM's lower sensitivity to mesh distortion.

11.
The simulation of a manufacturing process chain with the finite element method requires the selection of an appropriate finite element solver, element type and mesh density for each process of the chain. When the simulation results of one step are needed in a subsequent one, they have to be interpolated and transferred to another model. This paper presents an in-core grid index that can be created on a mesh represented by a list of nodes/elements. Finite element data can thus be transferred across different models in a process chain by mapping nodes or elements in indexed meshes. For each nodal or integration point of the target mesh, the index on the source mesh is searched for a specific node or element satisfying certain conditions, based on the mapping method. The underlying space of an indexed mesh is decomposed into a grid of variable-sized cells. The index allows local searches to be performed in a small subset of the cells, instead of linear searches in the entire mesh which are computationally expensive. This work focuses on the implementation and computational efficiency of indexing, searching and mapping. An experimental evaluation on medium-sized meshes suggests that the combination of index creation and mapping using the index is much faster than mapping through sequential searches.
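
The benefit of a grid index is that each target-point query inspects only a few cells near the point instead of every source node. A minimal sketch of nearest-node mapping, assuming a 2D mesh, uniform cells (the paper uses variable-sized cells) and an outward ring search that stops once no unvisited cell can hold a closer node; class and function names are hypothetical.

```python
import math
from collections import defaultdict

class GridIndex:
    """Uniform 2D grid over source-mesh nodes, used for nearest-node queries."""

    def __init__(self, nodes, cell):
        self.nodes = nodes
        self.cell = cell
        self.cells = defaultdict(list)
        for n, (x, y) in enumerate(nodes):
            self.cells[self._key(x, y)].append(n)
        xs = [k[0] for k in self.cells]
        ys = [k[1] for k in self.cells]
        self.kx = (min(xs), max(xs))
        self.ky = (min(ys), max(ys))

    def _key(self, x, y):
        return (math.floor(x / self.cell), math.floor(y / self.cell))

    def nearest(self, x, y):
        """Index of the source node closest to (x, y), searching rings of cells outward."""
        qx, qy = self._key(x, y)
        last = max(abs(qx - self.kx[0]), abs(qx - self.kx[1]),
                   abs(qy - self.ky[0]), abs(qy - self.ky[1]))
        best, best_d = None, math.inf
        for r in range(last + 1):
            for i in range(qx - r, qx + r + 1):
                for j in range(qy - r, qy + r + 1):
                    if max(abs(i - qx), abs(j - qy)) != r:
                        continue                  # inner cells were visited earlier
                    for n in self.cells.get((i, j), ()):
                        d = math.dist((x, y), self.nodes[n])
                        if d < best_d:
                            best, best_d = n, d
            if best_d <= r * self.cell:
                break                             # unvisited cells are farther than r * cell
        return best

def map_nodal_values(index, values, targets):
    """Nearest-node transfer: each target point takes the value of its closest source node."""
    return [values[index.nearest(x, y)] for x, y in targets]
```

Nearest-node transfer is only one of the possible mapping methods mentioned in the abstract; element-based mapping would search the same index for the element containing each integration point and interpolate with that element's shape functions.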

12.
Large-scale simulation of separation phenomena in solids such as fracture, branching, and fragmentation requires a scalable data structure representation of the evolving model. Modeling of such phenomena can be successfully accomplished by means of cohesive models of fracture, which are versatile and effective tools for computational analysis. A common approach to insert cohesive elements in finite element meshes consists of adding discrete special interfaces (cohesive elements) between bulk elements. The insertion of cohesive elements along bulk element interfaces for fragmentation simulation imposes changes in the topology of the mesh. This paper presents a unified topology-based framework for supporting adaptive fragmentation simulations, being able to handle two- and three-dimensional models, with finite elements of any order. We represent the finite element model using a compact and "complete" topological data structure, which is capable of retrieving all adjacency relationships needed for the simulation. Moreover, we introduce a new topology-based algorithm that systematically classifies fractured facets (i.e., facets along which fracture has occurred). The algorithm follows a set of procedures that consistently perform all the topological changes needed to update the model. The proposed topology-based framework is general and ensures that the model representation always remains valid during fragmentation, even when very complex crack patterns are involved. The correctness and efficiency of the framework are illustrated by arbitrary insertion of cohesive elements in various finite element meshes of self-similar geometries, including both two- and three-dimensional models. These computational tests clearly show linear scaling in time, which is a key feature of the present data-structure representation. The effectiveness of the proposed approach is also demonstrated by dynamic fracture analysis through finite element simulations of actual engineering problems.
Glaucio H. Paulino

13.
Efficient SIMP-based and level-set-based topology optimization schemes are proposed within the computational framework of the multiscale finite element method (MsFEM). In the proposed schemes, the equilibrium equations are solved on a coarse-scale mesh and the design variables are updated on a fine-scale mesh. To describe more complex deformation, a multi-node coarse element is also presented for the MsFEM computation. In the MsFEM, a multiscale shape function is constructed numerically and used to obtain the equivalent stiffness matrix and load vector of the multi-node coarse element. In the optimization schemes with the MsFEM, the coarse elements are divided into two categories: homogeneous and heterogeneous. For the homogeneous coarse elements, the multiscale shape functions are constructed only once, before the iterations. Since the material distribution changes only locally in most iterations, the shape functions need to be reconstructed only for the small subset of coarse elements whose material distribution has changed relative to the previous iteration step. This saves considerable computational cost. In addition, because each coarse element is independent, the multiscale shape functions can easily be constructed in parallel. In this work, the computational accuracy and efficiency of the method are investigated in detail, as are the speedup ratio and parallel efficiency when multiple processors are used to construct the multiscale shape functions simultaneously. Furthermore, several 2D and 3D examples show the effectiveness and efficiency of the proposed optimization schemes based on the MsFEM analysis framework.
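
The saving described above comes from detecting which coarse elements actually changed between iterations and rebuilding only those. A minimal sketch, assuming a hypothetical layout in which fine-scale densities are stored row-major on an nx-by-ny grid and each coarse element covers a block-by-block patch of fine cells; the expensive multiscale construction itself is a caller-supplied placeholder `build`, not the paper's method.

```python
def changed_coarse_elements(rho_old, rho_new, nx, ny, block, tol=1e-9):
    """Return the sorted (I, J) indices of coarse elements whose fine-cell
    densities changed by more than tol (nx and ny must be divisible by block)."""
    changed = set()
    for j in range(ny):
        for i in range(nx):
            k = j * nx + i
            if abs(rho_new[k] - rho_old[k]) > tol:
                changed.add((i // block, j // block))
    return sorted(changed)

def update_shape_functions(cache, rho_old, rho_new, nx, ny, block, build):
    """Rebuild cached shape functions only where the material changed.
    `build(I, J, rho)` stands in for the multiscale construction; the calls are
    independent of each other, so this loop could be distributed across processors."""
    targets = changed_coarse_elements(rho_old, rho_new, nx, ny, block)
    for I, J in targets:
        cache[(I, J)] = build(I, J, rho_new)
    return targets
```

In a typical late-stage iteration where only a few fine cells near the structural boundary change, `targets` contains a handful of coarse elements and every other cached shape function is reused unchanged.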

14.
Mapped finite element methods are based on reference elements, on which the components of a local finite element are defined. The local finite element on an arbitrary mesh cell is given by a map from the reference mesh cell. This paper describes some concepts for implementing mapped finite element methods. By the definition of mapped finite elements, only local degrees of freedom are available. These local degrees of freedom have to be assigned to the global degrees of freedom that define the finite element space. We present an algorithm that computes this assignment. The second part of the paper shows examples of algorithms that are implemented with the help of mapped finite elements. In particular, we explain how the evaluation of integrals and the transfer between arbitrary finite element spaces can be implemented easily and computed efficiently. Communicated by: M.S. Espedal, A. Quarteroni, A. Sequeira

15.
Mapped finite element methods are based on reference elements, on which the components of a local finite element are defined. The local finite element on an arbitrary mesh cell is given by a map from the reference mesh cell. This paper describes some concepts for implementing mapped finite element methods. By the definition of mapped finite elements, only local degrees of freedom are available. These local degrees of freedom have to be assigned to the global degrees of freedom that define the finite element space. We present an algorithm that computes this assignment. The second part of the paper shows examples of algorithms that are implemented with the help of mapped finite elements. In particular, we explain how the evaluation of integrals and the transfer between arbitrary finite element spaces can be implemented easily and computed efficiently.
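
The local-to-global assignment can be sketched by keying every local degree of freedom on the mesh entity it belongs to, so that neighbouring cells sharing a vertex or edge receive the same global number. This is a common approach, not necessarily the algorithm of the paper; the sketch assumes triangles with P2 Lagrange elements (one DOF per vertex and one per edge midpoint), and the function name is hypothetical.

```python
def assign_global_dofs(cells):
    """cells: list of triangles given as vertex-index triples.
    Assumed local DOF order per cell (P2 Lagrange): 3 vertex DOFs, then the
    midpoints of edges (v0, v1), (v1, v2), (v2, v0).
    Returns (local_to_global, number_of_global_dofs)."""
    numbering = {}
    local_to_global = []
    for v0, v1, v2 in cells:
        entities = [("v", v0), ("v", v1), ("v", v2)]
        for a, b in ((v0, v1), (v1, v2), (v2, v0)):
            # Sorted vertex pair: both cells sharing an edge build the same key.
            entities.append(("e", min(a, b), max(a, b)))
        dofs = []
        for key in entities:
            if key not in numbering:
                numbering[key] = len(numbering)
            dofs.append(numbering[key])
        local_to_global.append(dofs)
    return local_to_global, len(numbering)
```

For two triangles (0, 1, 2) and (1, 3, 2) sharing the edge between vertices 1 and 2, the result has 4 vertex DOFs plus 5 edge DOFs, and the shared edge's DOF appears in both cells' lists. Higher-order elements with several DOFs per edge would additionally need to account for the edge's orientation in each cell.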

16.
International Journal of Computer Mathematics, 2012, 89(11): 2308–2325
The goal of this article is to study the boundary layers of reaction–diffusion equations in a circle and to provide some numerical applications that use so-called boundary layer elements. Via boundary layer analysis, we obtain valid asymptotic expansions at any order and devise boundary layer elements that can conveniently be used in finite element schemes. Using boundary layer elements incorporated into the finite element space, we obtain accurate numerical solutions on a quasi-uniform mesh with second-order convergence.

17.
A modern graphics processing unit (GPU) is able to perform massively parallel scientific computations at low cost. We extend our implementation of the checkerboard algorithm for the two-dimensional Ising model [T. Preis et al., Journal of Computational Physics 228 (2009) 4468-4477] in order to overcome the memory limitations of a single GPU, which enables us to simulate significantly larger systems. Using multi-spin coding techniques, we are able to accelerate simulations on a single GPU by factors of up to 35 compared to an optimized single central processing unit (CPU) core implementation that also employs multi-spin coding. By combining the Compute Unified Device Architecture (CUDA) with the Message Passing Interface (MPI) on the CPU level, a single Ising lattice can be updated by a cluster of GPUs in parallel. For large systems, the computation time scales nearly linearly with the number of GPUs used. As a proof of concept, we reproduce the critical temperature of the 2D Ising model using finite-size scaling techniques.
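
The checkerboard algorithm colours the lattice like a chessboard: every neighbour of a "black" site is "white", so all sites of one colour can be updated simultaneously without conflicts. A minimal serial sketch of one Metropolis sweep, assuming coupling J = 1, no external field, k_B = 1, periodic boundaries and an even lattice size L; it uses plain integers for spins and omits the multi-spin coding (packing many spins into the bits of one word) and the multi-GPU decomposition described above. Function names are hypothetical.

```python
import math
import random

def checkerboard_sweep(spins, L, beta, rng):
    """One Metropolis sweep over an L x L periodic lattice (L even), J = 1.
    Sites with (i + j) even are updated first, then the odd ones. Within one
    colour no two sites are neighbours, so each update in the inner loops is
    independent and could be handled by its own GPU thread."""
    accept = {dE: math.exp(-beta * dE) for dE in (4, 8)}   # only positive dE need a lookup
    for colour in (0, 1):
        for i in range(L):
            for j in range(L):
                if (i + j) % 2 != colour:
                    continue
                s = spins[i][j]
                nb = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
                      + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
                dE = 2 * s * nb                             # energy change if s is flipped
                if dE <= 0 or rng.random() < accept[dE]:
                    spins[i][j] = -s

def magnetisation(spins):
    L = len(spins)
    return sum(map(sum, spins)) / (L * L)
```

Starting from all spins up, the magnetisation should stay close to 1 well below the critical temperature T_c = 2 / ln(1 + sqrt(2)) ≈ 2.269 and decay towards 0 well above it, which is the behaviour that finite-size scaling analyses exploit to locate T_c.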

18.
This work parallelized a widely used structural analysis platform, OpenSees, using graphics processing units (GPUs). This paper presents task decomposition diagrams with data flow, as well as sequential and parallel flowcharts for element matrix/vector calculations. It introduces a Bulk Model to ease the parallelization of element matrix/vector calculations. An implementation of this model for shell elements is presented. Three versions of the Bulk Model (sequential, OpenMP multi-threaded, and CUDA GPU-parallelized) were implemented in this work. Nonlinear dynamic analyses of two building models subjected to a tri-axial earthquake were tested. The results demonstrate speedups higher than four on a 4-core system, while GPU parallelism achieves speedups higher than 7.6 on a single GPU device, compared with the original sequential implementation.

19.
The finite element method often leads to large, sparse, symmetric positive definite systems of linear equations. We consider parallel solvers based on the Schur complement method on homogeneous parallel machines with distributed memory. A finite element mesh is partitioned by graph partitioning. Such partitioning results in submeshes with similar numbers of elements and, consequently, submatrices of similar sizes. The submatrices are partially factorised. Because methods that exploit the sparsity of the submatrices are used, the time spent on the partial factorisation can differ between submatrices, i.e., the load can be unbalanced. This paper proposes a Quality Balancing heuristic that modifies classic mesh partitioning so that the partial factorisation times are balanced, which saves overall computation time, especially for time-dependent mechanical and nonstationary transport problems.

20.
Graphics processing units (GPUs) offer parallel computing power that would usually require a cluster of networked computers or a supercomputer to achieve. While writing kernel code is fairly straightforward, achieving efficiency and performance requires very careful optimisation decisions and changes to the original serial algorithm. We introduce a parallel canonical ensemble Monte Carlo (MC) simulation that runs entirely on the GPU. In this paper, we describe two MC simulation codes for Lennard-Jones particles in the canonical ensemble: a single-CPU-core implementation and a parallel GPU implementation. Using the Compute Unified Device Architecture, the parallel implementation enables the simulation of systems containing over 200,000 particles in a reasonable amount of time, which allows researchers to obtain more accurate simulation results. A remapping algorithm is introduced to balance the load across device resources, and experimental results demonstrate that the efficiency of this algorithm is bounded by the available GPU resources. Our parallel implementation achieves a speedup of up to 15 times on a commodity GPU over our efficient single-core implementation for a system of 256k particles, with the speedup increasing with problem size. Furthermore, we describe our methods and strategies for optimising our implementation in detail.
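
In canonical-ensemble (NVT) Monte Carlo, a particle is displaced at random and the move is accepted with the Metropolis probability min(1, exp(-beta * dE)), where dE comes from the particle's pair interactions with all others. A minimal serial sketch, assuming reduced Lennard-Jones units (epsilon = sigma = 1), a cubic periodic box with the minimum-image convention, a truncation radius rc = 2.5 that must not exceed half the box length, and no tail correction; the GPU remapping algorithm is not reproduced, and function names are hypothetical. The per-particle energy loop is the part a GPU implementation would parallelise.

```python
import math
import random

def lj(r2, rc2):
    """Truncated Lennard-Jones pair energy in reduced units, given squared distance r2."""
    if r2 >= rc2:
        return 0.0
    inv6 = 1.0 / (r2 * r2 * r2)
    return 4.0 * (inv6 * inv6 - inv6)

def particle_energy(pos, i, p, box, rc2):
    """Interaction energy of particle i, placed at p, with all other particles."""
    e = 0.0
    for j, q in enumerate(pos):
        if j == i:
            continue
        r2 = 0.0
        for k in range(3):
            d = p[k] - q[k]
            d -= box * round(d / box)     # minimum-image convention
            r2 += d * d
        e += lj(r2, rc2)
    return e

def mc_move(pos, box, beta, delta, rng, rc=2.5):
    """Attempt one Metropolis displacement of a random particle; True if accepted."""
    rc2 = rc * rc
    i = rng.randrange(len(pos))
    old = pos[i]
    new = tuple((old[k] + rng.uniform(-delta, delta)) % box for k in range(3))
    dE = particle_energy(pos, i, new, box, rc2) - particle_energy(pos, i, old, box, rc2)
    if dE <= 0.0 or rng.random() < math.exp(-beta * dE):
        pos[i] = new
        return True
    return False
```

The pair potential has its minimum of -1 at r = 2^(1/6); the maximum displacement `delta` is usually tuned so that a moderate fraction of moves is accepted.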


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号