Similar Documents
20 similar documents found (search time: 31 ms)
1.
This paper presents techniques for generating very large finite‐element matrices on a multicore workstation equipped with several graphics processing units (GPUs). To overcome the GPUs' limited memory, and at the same time to accelerate the generation process, we propose to generate the large sparse linear systems arising in finite‐element analysis in an iterative manner on several GPUs and to use the graphics accelerators concurrently with CPUs, which collect and add the matrix fragments using a fast multithreaded procedure. The threads are scheduled so that the CPU operations do not affect the performance of the process, and the GPUs are idle only while data are being transferred from GPU to CPU. This approach is verified on two workstations: the first consists of two 6‐core Intel Xeon X5690 processors with two Fermi GPUs, each a GeForce GTX 590 with two graphics processors and 1.5 GB of fast RAM; the second is equipped with two Tesla C2075 boards carrying 6 GB of RAM each and two 12‐core Opteron 6174s. For the latter setup, we demonstrate the fast generation of sparse finite‐element matrices as large as 10 million unknowns, with over 1 billion nonzero entries. Compared with the single‐threaded and multithreaded CPU implementations, the GPU‐based version of the algorithm presented in this paper reduces the finite‐element matrix‐generation time in double precision by factors of 100 and 30, respectively. Copyright © 2012 John Wiley & Sons, Ltd.
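The collection-and-addition step described above can be illustrated with a small sketch (the helper name and data layout are illustrative, not the paper's actual multithreaded procedure): each worker emits (row, column, value) triplets for its batch of elements, and overlapping contributions are summed on the host.

```python
from collections import defaultdict

def assemble_fragments(fragments):
    """Accumulate (row, col, value) triplets produced by independent
    workers (e.g. per-GPU element batches) into one sparse matrix,
    stored here as a {(row, col): value} dictionary."""
    matrix = defaultdict(float)
    for triplets in fragments:
        for row, col, value in triplets:
            matrix[(row, col)] += value  # duplicate entries are summed
    return dict(matrix)

# Two workers contribute element matrices that share node 1.
frag_a = [(0, 0, 2.0), (0, 1, -1.0), (1, 1, 2.0)]
frag_b = [(1, 1, 2.0), (1, 2, -1.0), (2, 2, 2.0)]
K = assemble_fragments([frag_a, frag_b])
# K[(1, 1)] == 4.0 because both fragments touch that entry
```

In a real implementation the triplet lists would arrive asynchronously from the accelerators while a CPU thread pool performs the summation, which is the overlap the paper exploits.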

2.
Recently, the application of graphics processing units (GPUs) to scientific computations has been attracting a great deal of attention, because GPUs are becoming faster and more programmable. In particular, NVIDIA's compute unified device architecture (CUDA) enables highly multithreaded parallel computing for non‐graphics applications. This paper proposes a novel way to accelerate the boundary element method (BEM) for the three‐dimensional Helmholtz equation using CUDA. Adopting techniques for data caching and double–single precision floating‐point arithmetic, we implemented a GPU‐accelerated BEM program for GeForce 8‐series GPUs. The program performed 6–23 times faster than a conventional BEM program optimized for an Intel quad‐core CPU on a series of boundary value problems with 8000–128000 unknowns, and it sustained a performance of 167 Gflop/s on the largest problem (1 058 000 unknowns). The accuracy of our BEM program was almost the same as that of the regular BEM program using double precision floating‐point arithmetic. In addition, our BEM was applicable to realistic problems. In conclusion, the present GPU‐accelerated BEM solves large‐scale boundary value problems for the Helmholtz equation rapidly and precisely. Copyright © 2009 John Wiley & Sons, Ltd.
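The double–single idea, representing one higher-precision value as an unevaluated sum of two lower-precision numbers, can be sketched in pure Python (which has only doubles, so the pair here emulates extra precision beyond double; the function names are illustrative, not the paper's code):

```python
def two_sum(a, b):
    """Error-free transformation (Knuth's TwoSum): returns (s, e)
    with s = fl(a + b) and s + e == a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

def pair_add(hi, lo, b):
    """Add the scalar b to the value represented by the pair (hi, lo)."""
    s, e = two_sum(hi, b)
    e += lo
    return two_sum(s, e)  # renormalize the pair

# Accumulate ten tiny terms that a plain double sum loses entirely.
naive = 1.0
hi, lo = 1.0, 0.0
for _ in range(10):
    naive += 1e-16
    hi, lo = pair_add(hi, lo, 1e-16)
# naive is still exactly 1.0; the (hi, lo) pair retains the 1e-15 contribution
```

On GeForce 8-series hardware, which lacked native double precision, the same pair trick over two singles recovered near-double accuracy at single-precision speed.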

3.
The existing global–local multiscale computational methods, using finite element discretization at both the macro‐scale and micro‐scale, are intensive both in terms of computational time and memory requirements, and their parallelization using domain decomposition methods incurs substantial communication overhead, limiting their application. We are interested in a class of explicit global–local multiscale methods whose architecture significantly reduces this communication overhead on massively parallel machines. However, a naïve task decomposition that distributes individual macro‐scale integration points to a single group of processors is not optimal and leads to communication overhead and idling of processors. To overcome this problem, we have developed a novel coarse‐grained parallel algorithm in which groups of macro‐scale integration points are distributed to a layer of processors. Each processor in this layer communicates locally with a group of processors that are responsible for the micro‐scale computations. The overlapping groups of processors are shown to achieve optimal concurrency at significantly reduced communication overhead. Several example problems are presented to demonstrate the efficiency of the proposed algorithm. Copyright © 2009 John Wiley & Sons, Ltd.

4.
Recently, graphics processing units (GPUs) have been increasingly leveraged in a variety of scientific computing applications. However, architectural differences between CPUs and GPUs necessitate the development of algorithms that take advantage of GPU hardware. As sparse matrix–vector (SpMV) multiplication operations are commonly used in finite element analysis, a new SpMV algorithm and several variations are developed for unstructured finite element meshes on GPUs. The effective bandwidth of current GPU algorithms and the newly proposed algorithms is measured and analyzed for 15 sparse matrices of varying sizes and sparsity structures. The effects of optimization and the differences between the new GPU algorithm and its variants are then studied. Lastly, both the new and current SpMV GPU algorithms are used in a GPU‐based conjugate gradient solver within GPU finite element simulations of the heart, and the results are compared against those of a parallel PETSc finite element implementation. The effective bandwidth tests indicate that the new algorithms compare very favorably with current algorithms for a wide variety of sparse matrices and can yield very notable benefits. The GPU finite element simulation results demonstrate the benefit of using GPUs for finite element analysis and show that the proposed algorithms can yield speedups of up to 12‐fold in real finite element applications. Copyright © 2015 John Wiley & Sons, Ltd.
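As a concrete reference point, an SpMV over the standard CSR (compressed sparse row) layout, the operation such GPU kernels accelerate, looks like this in a minimal pure-Python sketch (in the simplest GPU mapping, each row's dot product goes to one thread; this is not the paper's algorithm, only the baseline operation):

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """y = A @ x for a CSR-stored sparse matrix: row_ptr delimits each
    row's slice of (col_idx, values); each row is an independent dot
    product, which is what makes the operation embarrassingly parallel."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y

# 3x3 tridiagonal matrix [[2,-1,0],[-1,2,-1],[0,-1,2]]
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values  = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
y = spmv_csr(row_ptr, col_idx, values, [1.0, 1.0, 1.0])
# → [1.0, 0.0, 1.0]
```

The GPU-specific work in papers like this one lies in reorganizing exactly this loop nest (row splitting, warp-per-row, padding) so that memory accesses coalesce.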

5.
We consider the numerical approximation of singularly perturbed problems, and in particular reaction–diffusion problems, by the h version of the finite element method. We present guidelines on how to design non‐uniform meshes both in one and two dimensions that are asymptotically optimal as the meshwidth tends to zero. We also present the results of numerical computations showing that robust, optimal rates can be achieved even in the pre‐asymptotic range. Copyright © 2001 John Wiley & Sons, Ltd.
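One canonical example of such a layer-adapted, non-uniform design is the piecewise-uniform (Shishkin-type) mesh, sketched below for a 1-D reaction–diffusion problem with boundary layers at both endpoints (illustrative parameters; not necessarily the meshes proposed in the paper):

```python
import math

def shishkin_mesh(n, eps, sigma=2.0):
    """Piecewise-uniform mesh on [0, 1] for -eps^2 u'' + u = f, which has
    boundary layers of width O(eps) at both ends. Half the mesh points
    resolve the two layers; n must be divisible by 4."""
    tau = min(0.25, sigma * eps * math.log(n))  # layer transition point
    quarter = n // 4
    left  = [i * tau / quarter for i in range(quarter + 1)]
    inner = [tau + i * (1 - 2 * tau) / (2 * quarter)
             for i in range(1, 2 * quarter + 1)]
    right = [1 - tau + i * tau / quarter for i in range(1, quarter + 1)]
    return left + inner + right

mesh = shishkin_mesh(16, 1e-4)  # 17 nodes, strongly clustered at 0 and 1
```

With eps = 1e-4 the transition point tau is about 5.5e-4, so a quarter of the nodes sit inside each tiny layer, which is what restores robust convergence rates.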

6.
In this paper, a comprehensive account of using mesh‐free methods to simulate strain localization in inelastic solids is presented. Using an explicit displacement‐based formulation in mesh‐free computations, high‐resolution shear‐band formations are obtained in both two‐dimensional (2‐D) and three‐dimensional (3‐D) simulations without recourse to any mixed formulation, discontinuous/incompatible element, or special mesh design. The numerical solutions obtained here are insensitive to the orientation of the particle distributions if the local particle distribution is quasi‐uniform, which, to a large extent, relieves the mesh‐alignment sensitivity from which finite element methods suffer. Moreover, a simple h‐adaptivity procedure is implemented in the explicit calculation, and, by utilizing a mesh‐free hierarchical partition of unity, a spectral (wavelet) adaptivity procedure is developed to seek high‐resolution shear‐band formations. Finally, the phenomena of multiple shear bands and mode switching are observed in numerical computations with a relatively coarse particle distribution, in contrast to the costly fine‐scale finite element simulations. Copyright © 2000 John Wiley & Sons, Ltd.

7.
Parallelization of the finite-element method (FEM) has been pursued by the scientific and high-performance computing community for over a decade. Most of the computations in the FEM are related to linear algebra, including matrix and vector operations. These operations follow the single-instruction multiple-data (SIMD) computation pattern, which is beneficial for shared-memory parallel architectures. General-purpose graphics processing units (GPGPUs) have been effectively utilized for the parallelization of FEM computations since 2007. The solver step of the FEM is often carried out using conjugate gradient (CG)-type iterative methods because of their better convergence rates and greater opportunities for parallelization. Although the SIMD computation patterns in the FEM are well suited to GPU computing, there are pitfalls, such as underutilization of threads, uncoalesced memory access, low arithmetic intensity, the limited fast on-chip memory of GPUs, and synchronization costs. Nevertheless, FEM applications have been successfully deployed on GPUs over the last 10 years to achieve significant performance improvements. This paper presents a comprehensive review of the parallel optimization strategies applied in each step of the FEM, and the pitfalls and trade-offs linked to each step are also discussed. Furthermore, some extraordinary methods that exploit the tremendous computing power of a GPU are discussed. The review is not limited to a single field of engineering; rather, it is applicable to all fields of engineering and science in which FEM-based simulations are necessary.
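The CG iteration the review refers to is dominated by one sparse matrix–vector product per step, which is why accelerating SpMV accelerates the whole solver; a minimal sketch for symmetric positive-definite systems (stdlib-only, illustrative rather than any reviewed implementation):

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=200):
    """Plain CG: matvec is the only access to the matrix, so its cost
    (one SpMV per iteration) dominates the run time."""
    x = [0.0] * len(b)
    r = b[:]          # residual b - A @ x for x = 0
    p = r[:]          # initial search direction
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / sum(pi * Api for pi, Api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * Api for ri, Api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol ** 2:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# Solve the SPD system [[4, 1], [1, 3]] x = [1, 2]; exact answer [1/11, 7/11].
A = [[4.0, 1.0], [1.0, 3.0]]
x = conjugate_gradient(
    lambda v: [sum(a * vi for a, vi in zip(row, v)) for row in A], [1.0, 2.0])
```

The vector updates and dot products in this loop are the other GPU kernels the review discusses; reductions (the dot products) are where the synchronization pitfalls arise.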

8.
The aim of the paper is to study the capability of the extended finite element method (XFEM) to achieve accurate computations in non‐smooth situations such as crack problems. Although the XFEM yields a smaller error than classical finite element methods, the rate of convergence is not improved as the mesh parameter h goes to zero, because of the presence of a singularity. The difficulty can be overcome by modifying the enrichment of the finite element basis with the asymptotic crack‐tip displacement solutions as well as with the Heaviside function. Numerical simulations show that the modified XFEM achieves an optimal rate of convergence (i.e. as in a standard finite element method for a smooth problem). Copyright © 2005 John Wiley & Sons, Ltd.
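The enrichment referred to above combines a Heaviside step across the crack faces with the four classical asymptotic crack-tip branch functions; the latter can be sketched as follows (these are the standard near-tip functions from the XFEM literature, written here as an illustration rather than the paper's code):

```python
import math

def tip_enrichment(x, y):
    """The four classical near-tip branch functions
    sqrt(r) * {sin(t/2), cos(t/2), sin(t/2)sin(t), cos(t/2)sin(t)}
    for a crack lying along the negative x-axis with its tip at the
    origin; (x, y) are coordinates in the crack-tip frame."""
    r = math.hypot(x, y)
    t = math.atan2(y, x)
    sr = math.sqrt(r)
    return [sr * math.sin(t / 2), sr * math.cos(t / 2),
            sr * math.sin(t / 2) * math.sin(t),
            sr * math.cos(t / 2) * math.sin(t)]

# Only the sin(t/2) function jumps across the crack faces (t = +pi vs -pi).
above = tip_enrichment(-1.0, 1e-9)   # just above the crack face
below = tip_enrichment(-1.0, -1e-9)  # just below
```

The sqrt(r) factor is what captures the singular near-tip displacement field that plain polynomial elements cannot represent, which is the source of the degraded convergence rate the paper addresses.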

9.
An integrated framework and computational technology is described that addresses the issues involved in fostering absolute scalability (A‐scalability) over the entire transient duration of simulations of implicit non‐linear structural dynamics of large‐scale practical applications on a large number of parallel processors. Whereas the theoretical developments and parallel formulations were presented in Part 1, the implementation, validation, and parallel performance assessments and results are presented here in Part 2 of the paper. Relatively simple numerical examples involving large deformation and elastic and elastoplastic non‐linear dynamic behaviour are first presented via the proposed framework to demonstrate the comparative accuracy of the methods against available experimental results and/or results available in the literature. For practical, geometrically complex meshes, the A‐scalability of non‐linear implicit dynamic computations is then illustrated by employing scalable optimal dissipative zero‐order displacement and velocity overshoot behaviour time operators, a subset of the generalized framework, in conjunction with numerically scalable spatial domain decomposition methods and scalable graph partitioning techniques. Constant run times for the entire simulation under ‘fixed‐memory‐use‐per‐processor’ scaling of complex finite element mesh geometries are demonstrated for large‐scale problems and large processor counts of at least 1024 processors. Copyright © 2003 John Wiley & Sons, Ltd.

10.
A class of parallel multiple‐front solution algorithms is developed for solving linear systems arising from discretization of boundary value problems and evolution problems. The basic substructuring approach and frontal algorithm on each subdomain are first modified to ensure stable factorization in situations where ill‐conditioning may occur due to differing material properties or the use of high degree finite elements (p methods). Next, the method is implemented on distributed‐memory multiprocessor systems with the final reduced (small) Schur complement problem solved on a single processor. A novel algorithm that implements a recursive partitioning approach on the subdomain interfaces is then developed. Both algorithms are implemented and compared in a least‐squares finite‐element scheme for viscous incompressible flow computation using h‐ and p‐finite element schemes. Copyright © 2003 John Wiley & Sons, Ltd.

11.
We present a computational framework for the simulation of J2‐elastic/plastic materials in complex geometries based on simple piecewise linear finite elements on tetrahedral grids. We avoid spurious numerical instabilities by means of a specific stabilization method of the variational multiscale kind. Specifically, we introduce the concept of subgrid‐scale displacements, velocities, and pressures, approximated as functions of the governing equation residuals. The subgrid‐scale displacements/velocities are scaled using an effective (tangent) elastoplastic shear modulus, and we demonstrate the beneficial effects of introducing a subgrid‐scale pressure in the plastic regime. We provide proofs of stability and convergence of the proposed algorithms. These methods are initially presented in the context of static computations and then extended to dynamics, where we demonstrate that, in general, naïve extensions of stabilized methods developed for static computations prove ineffective. We conclude by proposing a dynamic version of the stabilizing mechanisms, which obviates this problematic issue. In its final form, the proposed approach is simple and efficient, as it requires only minimal additional computational and storage cost with respect to a standard finite element method relying on a piecewise linear approximation of the displacement field.

12.
Several performance improvements for finite‐element edge‐based sparse matrix–vector multiplication algorithms on unstructured grids are presented and tested. Edge data structures for tetrahedral meshes and triangular interface elements are treated, focusing on node and edge renumbering strategies for improving processor and memory hierarchy use. Benchmark computations on Intel Itanium 2 and Pentium IV processors are performed. The results show CPU‐time speedups ranging from 2× to 3×. Copyright © 2005 John Wiley & Sons, Ltd.

13.
This paper presents a number of algorithms to run the fast multipole method (FMM) on NVIDIA CUDA‐capable graphics processing units (GPUs) (Nvidia Corporation, Santa Clara, CA, USA). The FMM is a class of methods to compute pairwise interactions between N particles for a given error tolerance, with computational cost of O(N). The methods described in the paper are applicable to any FMM in which the multipole‐to‐local (M2L) operator is a dense, precomputed matrix. This is the case, for example, in the black‐box fast multipole method (bbFMM), a variant of the FMM that can handle a large class of kernels; this variant is used in our benchmarks. In the FMM, two operators represent most of the computational cost, and an optimal implementation typically tries to balance them: the nearby interaction calculation (the direct sum calculation, line 29 in Listing 1) and the M2L operation. We focus on the M2L. By combining multiple M2L operations and reordering the primitive loops of the M2L so that CUDA threads can reuse or share common data, these approaches reduce the movement of data in the GPU. Because memory bandwidth is the primary bottleneck of these methods, significant performance improvements are realized. Four M2L schemes are detailed and analyzed in the case of a uniform tree. The four schemes are tested and compared with an optimized, OpenMP‐parallelized, multi‐core CPU code. We consider high‐ and low‐precision calculations by varying the number of Chebyshev nodes used in the bbFMM. The accuracy of the GPU codes is found to be satisfactory, and they achieved a performance of over 200 Gflop/s on one NVIDIA Tesla C1060 GPU. This was compared against two quad‐core Intel Xeon E5345 processors (Intel Corporation, Santa Clara, CA, USA) running at 2.33 GHz, with a combined peak performance of 149 Gflop/s in single precision. For the low FMM accuracy case, the observed performance of the CPU code was 37 Gflop/s, whereas for the high FMM accuracy case the performance was about 8.5 Gflop/s, most likely because of a higher frequency of cache misses. We also present benchmarks on an NVIDIA C2050 GPU (a Fermi processor) in single and double precision. Copyright © 2011 John Wiley & Sons, Ltd.

14.
This paper presents a study of the performance of non‐linear co‐ordinate transformations in the numerical integration of weakly singular boundary integrals. A comparison of the smoothing property, numerical convergence, and accuracy of the available non‐linear polynomial transformations is presented for two‐dimensional problems. The effectiveness of generalized transformations, valid for any type and location of singularity, has been investigated. It is found that weakly singular integrals are handled more efficiently with transformations valid for end‐point singularities by partitioning the element at the singular point. Further, transformations that are excellent for Cauchy principal value (CPV) integrals are not as accurate for weakly singular integrals. The connection between the maximum permissible order of polynomial transformation and the precision of computations has also been investigated; the cubic transformation is seen to be the optimum choice for single precision, and the quartic or quintic one for double precision computations. A new approach that combines the method of singularity subtraction with non‐linear transformation has been proposed. This composite approach is found to be more accurate, efficient, and robust than either the singularity subtraction method or the non‐linear transformation methods alone. Copyright © 2001 John Wiley & Sons, Ltd.
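The composite idea (subtract the singular part analytically, then push the smooth remainder through a non-linear transformation) can be illustrated on the model integral ∫₀¹ ln(x) g(x) dx. The sketch below uses a cubic substitution and a midpoint rule purely for illustration; it is not the paper's boundary-element quadrature:

```python
import math

def integrate_log_singular(g, n=32):
    """Compute int_0^1 ln(x) g(x) dx by singularity subtraction.
    The term g(0) * int_0^1 ln(x) dx = -g(0) is done analytically;
    the bounded remainder ln(x) * (g(x) - g(0)) is integrated after
    the cubic substitution x = t**3 (dx = 3 t**2 dt), which clusters
    quadrature points toward the singular endpoint x = 0."""
    h = 1.0 / n
    acc = 0.0
    for i in range(n):
        t = (i + 0.5) * h          # midpoint rule in t on [0, 1]
        x = t ** 3
        acc += math.log(x) * (g(x) - g(0.0)) * 3.0 * t ** 2 * h
    return -g(0.0) + acc

# g = 1 is exact (-1); g(x) = x recovers int x ln x dx = -1/4 approximately.
r_const = integrate_log_singular(lambda x: 1.0)
r_linear = integrate_log_singular(lambda x: x)
```

The subtraction removes the log singularity from the numerically integrated part, and the transformation handles the remaining mild non-smoothness of the remainder near x = 0, mirroring the division of labour in the paper's composite approach.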

15.
We present a new method that combines the fringe projection and digital image correlation (DIC) techniques on a single hardware platform to simultaneously measure both the shape and deformation fields of three‐dimensional (3‐D) surfaces with complex geometries. The method in its basic form requires only a single camera and a single projector, but it can easily be extended to a multi‐camera, multi‐projector system to obtain complete 360° measurements. Multiple views of the surface profile and displacement field are automatically co‐registered in a unified global coordinate system, thereby avoiding the significant errors that can arise through the use of statistical point cloud stitching techniques. Experimental results from a two‐camera, two‐projector sensor are presented and compared with results from both a standard stereo‐DIC approach and a finite element model.

16.
In this paper, we model crack discontinuities in two‐dimensional linear elastic continua using the extended finite element method without the need to partition an enriched element into a collection of triangles or quadrilaterals. For crack modeling in the extended finite element method, the standard finite element approximation is enriched with a discontinuous function and the near‐tip crack functions. Each element that is fully cut by the crack is decomposed into two simple (convex or nonconvex) polygons, whereas the element that contains the crack tip is treated as a nonconvex polygon. Using Euler's homogeneous function theorem and Stokes's theorem to numerically integrate homogeneous functions on convex and nonconvex polygons, the exact contributions to the stiffness matrix from discontinuous enriched basis functions are computed. For the contributions to the stiffness matrix from weakly singular integrals (because of enrichment with asymptotic crack‐tip functions), we only require a one‐dimensional quadrature rule along the edges of a polygon. Hence, neither element partitioning on either side of the crack discontinuity nor any cubature rule within an enriched element is needed. Structured finite element meshes consisting of rectangular elements, as well as unstructured triangular meshes, are used. We demonstrate the flexibility of the approach and its excellent accuracy in stress intensity factor computations for two‐dimensional crack problems. Copyright © 2016 John Wiley & Sons, Ltd.
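The reduction used above rests on a standard identity: for f homogeneous of degree q in 2-D, Euler's theorem plus the divergence theorem give ∫_P f dA = 1/(q+2) ∮_∂P f (x·n) ds, so only edge quadrature is needed. A minimal sketch for polygons with counter-clockwise vertices (illustrative, not the paper's implementation):

```python
def integrate_homogeneous(poly, f, degree):
    """Integrate f, homogeneous of the given degree, over a 2D polygon
    via int_P f = 1/(degree+2) * sum_edges int f * (x . n) ds,
    with 2-point Gauss quadrature on each edge (exact here up to
    cubic edge integrands). Vertices must be counter-clockwise."""
    gauss = [(-1 / 3 ** 0.5, 1.0), (1 / 3 ** 0.5, 1.0)]  # nodes/weights on [-1, 1]
    total = 0.0
    m = len(poly)
    for i in range(m):
        (x0, y0), (x1, y1) = poly[i], poly[(i + 1) % m]
        ex, ey = x1 - x0, y1 - y0   # edge vector
        nx, ny = ey, -ex            # outward normal scaled by edge length
        for t, w in gauss:
            # map the Gauss point from [-1, 1] onto the edge
            x = 0.5 * (x0 + x1) + 0.5 * t * ex
            y = 0.5 * (y0 + y1) + 0.5 * t * ey
            total += 0.5 * w * f(x, y) * (x * nx + y * ny)
    return total / (degree + 2)

# Unit square: f = 1 (degree 0) gives the area; f = x^2 + y^2 (degree 2) gives 2/3.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
area = integrate_homogeneous(square, lambda x, y: 1.0, 0)
```

Because enrichment functions on cut elements can be written in (piecewise) homogeneous form about suitable points, the same boundary-only reduction is what lets the paper avoid sub-triangulation and interior cubature.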

17.
A methodology is presented for generating enrichment functions in generalized finite element methods (GFEM) using experimental and/or simulated data. The approach is based on the proper orthogonal decomposition (POD) technique, which is used to generate low‐order representations of data that contain general information about the solution of partial differential equations. One of the main challenges in such enriched finite element methods is knowing how to choose, a priori, enrichment functions that capture the nature of the solution of the governing equations. POD produces low‐order subspaces that are optimal in some norm for approximating a given data set. For most problems, since the solution error in Galerkin methods is bounded by the error in the best approximation, it is expected that the optimal approximation properties of POD can be exploited to construct efficient enrichment functions. We demonstrate the potential of this approach through three numerical examples. Best‐approximation studies are conducted that reveal the advantages of using POD modes as enrichment functions in GFEM over a conventional POD basis. Copyright © 2009 John Wiley & Sons, Ltd.

18.
A model of pseudoelasticity in shape memory alloys is developed within the incremental energy minimization framework. Three constitutive functions are involved: the Helmholtz free energy and the rate‐independent dissipation function, which enter the incrementally minimized energy function, and the constraint function that defines the limit transformation strains. The proposed implementation is based on a unified augmented Lagrangian treatment of both the constitutive constraints and the nonsmooth dissipation function. A methodology for easy reformulation of the model from the small‐strain to the finite‐deformation regime is presented. Finite element computations demonstrate the robustness of the finite‐strain version of the model and illustrate the effects of tension–compression asymmetry and transversal isotropy of the surface of limit transformation strains. Copyright © 2012 John Wiley & Sons, Ltd.

19.
Particle-mesh interpolations are fundamental operations for particle-in-cell codes, as implemented in vortex methods, plasma dynamics and electrostatics simulations. In these simulations, the mesh is used to solve the field equations and the gradients of the fields are used in order to advance the particles. The time integration of particle trajectories is performed through an extensive resampling of the flow field at the particle locations. The computational performance of this resampling turns out to be limited by the memory bandwidth of the underlying computer architecture. We investigate how mesh-particle interpolation can be efficiently performed on graphics processing units (GPUs) and multicore central processing units (CPUs), and we present two implementation techniques. The single-precision results for the multicore CPU implementation show an acceleration of 45-70×, depending on system size, and an acceleration of 85-155× for the GPU implementation over an efficient single-threaded C++ implementation. In double precision, we observe a performance improvement of 30-40× for the multicore CPU implementation and 20-45× for the GPU implementation. With respect to the 16-threaded standard C++ implementation, the present CPU technique leads to a performance increase of roughly 2.8-3.7× in single precision and 1.7-2.4× in double precision, whereas the GPU technique leads to an improvement of 9× in single precision and 2.2-2.8× in double precision.
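The gather (mesh-to-particle) half of these interpolations reduces, in the linear "cloud-in-cell" case, to weighting the two bracketing mesh nodes; a 1-D sketch of the operation (illustrative only, not the paper's vectorized kernels):

```python
def gather_cic(field, h, positions):
    """Cloud-in-cell (linear) gather: sample a 1D mesh field, stored at
    nodes x_i = i * h, at each particle position. Assumes every
    position satisfies 0 <= xp < (len(field) - 1) * h."""
    out = []
    for xp in positions:
        i = int(xp / h)      # index of the mesh node to the left
        w = xp / h - i       # fractional offset within the cell
        out.append((1.0 - w) * field[i] + w * field[i + 1])
    return out

# A linear field f(x) = 2x is reproduced exactly by linear interpolation.
h = 0.25
field = [2.0 * i * h for i in range(5)]   # nodes at 0, 0.25, ..., 1.0
vals = gather_cic(field, h, [0.1, 0.6])   # ≈ [0.2, 1.2]
```

The memory-bandwidth limit the paper reports is visible even in this sketch: per particle there are two field reads and almost no arithmetic, so the access pattern (and its coalescing on GPUs) dominates performance.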

20.
Shape‐memory polymers (SMPs) belong to a class of smart materials that have shown promise for a wide range of applications. They are characterized by their ability to maintain a temporary deformed shape and return to an original parent permanent shape. In this paper, we consider the coupled photomechanical behavior of light‐activated shape‐memory polymers (LASMPs), focusing on the numerical aspects of finite element simulations at the engineering scale. The photomechanical continuum framework is summarized, and some specific constitutive equations for LASMPs are described. Numerical implementation of the multiphysics governing partial differential equations takes the form of a user‐defined element subroutine within the commercial software package ABAQUS. We verify our two‐dimensional and three‐dimensional finite element procedure for multiple analytically tractable cases. To show the robustness of the numerical implementation, simulations are performed under various geometries and complex photomechanical loading. Copyright © 2016 John Wiley & Sons, Ltd.
