Similar Documents
20 similar documents found (search time: 15 ms)
1.
This paper presents techniques for generating very large finite-element matrices on a multicore workstation equipped with several graphics processing units (GPUs). To overcome the limited memory of the GPUs, and at the same time to accelerate the generation process, we propose to generate the large sparse linear systems arising in finite-element analysis iteratively on several GPUs and to use the graphics accelerators concurrently with CPUs, which perform the collection and addition of the matrix fragments using a fast multithreaded procedure. The threads are scheduled so that the CPU operations do not affect the performance of the process, and the GPUs are idle only while data are being transferred from GPU to CPU. This approach is verified on two workstations: the first consists of two 6-core Intel Xeon X5690 processors with two Fermi-class GeForce GTX 590 cards, each with two graphics processors and 1.5 GB of fast RAM; the second is equipped with two Tesla C2075 boards carrying 6 GB of RAM each and two 12-core Opteron 6174s. For the latter setup, we demonstrate the fast generation of sparse finite-element matrices as large as 10 million unknowns, with over 1 billion nonzero entries. Compared with the single-threaded and multithreaded CPU implementations, the GPU-based version of the algorithm reduces the finite-element matrix-generation time in double precision by factors of 100 and 30, respectively. Copyright © 2012 John Wiley & Sons, Ltd.
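A minimal sketch of the "collection and addition" step described above, not the authors' multithreaded CPU/GPU pipeline: element-matrix fragments are gathered in coordinate (COO) form and duplicate entries are summed when the global sparse matrix is built. The mesh, element matrices, and function names are illustrative assumptions.

# Illustrative sketch (not the authors' implementation): summing element-matrix
# fragments into one global sparse matrix. scipy's COO-to-CSR conversion adds
# duplicate (row, col) entries, which corresponds to the "collection and addition"
# step that the paper overlaps with GPU-side generation.
import numpy as np
import scipy.sparse as sp

def assemble_global(element_dofs, element_matrices, n_dofs):
    """element_dofs[e]    : global DOF indices of element e (length m)
       element_matrices[e]: dense m x m local matrix of element e"""
    rows, cols, vals = [], [], []
    for dofs, ke in zip(element_dofs, element_matrices):
        r, c = np.meshgrid(dofs, dofs, indexing="ij")
        rows.append(r.ravel()); cols.append(c.ravel()); vals.append(ke.ravel())
    coo = sp.coo_matrix((np.concatenate(vals),
                         (np.concatenate(rows), np.concatenate(cols))),
                        shape=(n_dofs, n_dofs))
    return coo.tocsr()          # duplicate entries are summed during conversion

# tiny example: two 1D linear "bar" elements sharing DOF 1
ke = np.array([[1.0, -1.0], [-1.0, 1.0]])
K = assemble_global([[0, 1], [1, 2]], [ke, ke], n_dofs=3)
print(K.toarray())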

2.
Recently, graphics processing units (GPUs) have had great success in accelerating many numerical computations. We present their application to computations on unstructured meshes, such as those arising in finite element methods. Multiple approaches to assembling and solving sparse linear systems with NVIDIA GPUs and the Compute Unified Device Architecture (CUDA) are developed and analyzed. Strategies for efficient use of global, shared, and local memory, methods to achieve memory coalescing, and optimal choices of parameters are introduced. We find that, with appropriate preprocessing and arrangement of support data, the GPU coprocessor using single-precision arithmetic achieves speedups of 30 or more compared with a well-optimized double-precision single-core implementation. We also find that the optimal assembly strategy depends on the order of the polynomials used in the finite element discretization. Copyright © 2010 John Wiley & Sons, Ltd.

3.
The finite element method (FEM) is a well-developed method for solving real-world problems that can be modeled with differential equations. As the available computational power increases, complex, large-scale problems can be solved with FEM, which typically involves multiple degrees of freedom (DOF) per node, high-order elements, and an iterative solver requiring repeated sparse matrix-vector multiplications. In this work, a new storage scheme is proposed for sparse matrices arising from FEM simulations with multiple DOF per node. A sparse matrix-vector multiplication kernel and its variants using the proposed scheme are also given for CUDA-enabled GPUs. The proposed scheme and kernels rely on the mesh connectivity data from the FEM discretization and the number of DOF per node. The performance of the proposed kernels was evaluated on seven test matrices in double-precision floating-point arithmetic. The analysis showed that the proposed GPU kernel outperforms the ELLPACK (ELL) and CUSPARSE hybrid (HYB) format GPU kernels by an average of 42% and 32%, respectively, on a Tesla K20c card. Copyright © 2016 John Wiley & Sons, Ltd.
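The paper's storage scheme itself is not reproduced here; as a point of reference, the following is a minimal sketch of the ELLPACK (ELL) format and its matrix-vector product, the baseline the proposed kernel is compared against. The matrix, names, and padding convention are illustrative assumptions.

# Minimal ELLPACK (ELL) sparse matrix-vector product in NumPy -- the baseline
# format the paper compares its FEM-specific scheme against (the proposed
# scheme itself is not reproduced here).
import numpy as np

def ell_from_dense(A):
    """Pack a dense matrix into ELL arrays: padded per-row values and column indices."""
    n = A.shape[0]
    width = int((A != 0).sum(axis=1).max())
    vals = np.zeros((n, width))
    cols = np.zeros((n, width), dtype=int)   # padded entries point at column 0 with value 0
    for i in range(n):
        js = np.nonzero(A[i])[0]
        vals[i, :len(js)] = A[i, js]
        cols[i, :len(js)] = js
    return vals, cols

def ell_spmv(vals, cols, x):
    # y_i = sum_k vals[i, k] * x[cols[i, k]]; padded zeros contribute nothing
    return (vals * x[cols]).sum(axis=1)

A = np.array([[4.0, 0.0, 1.0], [0.0, 3.0, 0.0], [2.0, 0.0, 5.0]])
vals, cols = ell_from_dense(A)
x = np.array([1.0, 2.0, 3.0])
print(ell_spmv(vals, cols, x), A @ x)   # both give [ 7.  6. 17.]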

4.
This paper discusses a three-dimensional fast multipole boundary integral equation method for crack problems governed by Laplace's equation. The proposed implementation uses collocation and piecewise constant shape functions to discretise the hypersingular boundary integral equation for crack problems. The resulting numerical equation is solved with GMRES (generalised minimum residual method) combined with the fast multipole method (FMM). It is found that the resulting code is faster than a conventional one when the number of unknowns is greater than about 1300.
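A small sketch of how such an FMM-accelerated matrix-vector product typically plugs into GMRES: the Krylov solver only needs the action of the matrix, so the fast evaluation can be wrapped as a matrix-free operator. The dense 1/(4πr) product below merely stands in for the fast multipole evaluation, and the point set and regularized diagonal are illustrative assumptions, not the paper's crack formulation.

# Sketch of the solver side only: GMRES needs just a matrix-vector product, so an
# FMM evaluation can be hidden behind a matrix-free operator (hypothetical setup).
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

rng = np.random.default_rng(0)
pts = rng.random((500, 3))                                  # collocation points (illustrative)
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
with np.errstate(divide="ignore"):
    A = np.where(d > 0, 1.0 / (4.0 * np.pi * d), 0.0)       # 3D Laplace kernel, zero diagonal
np.fill_diagonal(A, 1.0 + A.sum(axis=1))                    # artificial diagonal so the sketch converges

def matvec(x):
    # In an FMM-accelerated code this dense product would be replaced by the
    # fast multipole evaluation; GMRES only ever needs this black-box action.
    return A @ x

op = LinearOperator(A.shape, matvec=matvec)
b = np.ones(len(pts))
x, info = gmres(op, b)
print(info, np.linalg.norm(A @ x - b) / np.linalg.norm(b))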

5.
A new mathematical model, a hybrid of the Galerkin-type boundary element method (BEM) and the conventional nodal analysis method, is developed in this paper for accurately computing the currents flowing along a high-voltage AC substation's grounding system and along nearby floating metallic conductors buried in a multilayer earth. Only the propagation effect of electromagnetic waves within the substation's limited area is neglected in this model. The quasi-static complex image method and the closed form of Green's function are introduced into the model to accelerate the calculation of the mutual impedance and induction coefficients. The model is implemented in a computer program that can be used to calculate the current distribution of any grounding-system configuration, with or without floating metallic conductors. Copyright © 2007 John Wiley & Sons, Ltd.

6.
The fast multipole method (FMM) has been developed as a technique to reduce the computational cost and memory requirements of solving large-scale problems. This paper discusses an application of the FMM to a three-dimensional boundary integral equation method for elastostatic crack problems. The boundary integral equation for many-crack problems is discretized with the FMM and Galerkin's method. The resulting algebraic equation is solved with the generalized minimum residual method (GMRES). The numerical results show that the FMM is more efficient than conventional methods when the number of unknowns exceeds about 1200 and can therefore be useful in large-scale analyses of fracture mechanics. Copyright © 2001 John Wiley & Sons, Ltd.

7.
A lattice Boltzmann method (LBM) for solving the shallow water equations (SWEs) and the advection-dispersion equation is developed and implemented on graphics processing unit (GPU)-based architectures. A generalized lattice Boltzmann equation (GLBE) with a multiple-relaxation-time (MRT) collision operator is used to simulate shallow water flow. A two-relaxation-time (TRT) method with two speed-of-sound techniques is used to solve the advection-dispersion equation. The proposed LBM is implemented on an NVIDIA® computing processor in a single-GPU workstation, and GPU computing is performed using the Jacket GPU engine for MATLAB® and CUDA. In the numerical examples, the MRT-LBM and TRT-LBM models are verified and show excellent agreement with exact solutions. The MRT collision operator outperforms the single-relaxation-time (SRT) operator in stability and accuracy when the SRT parameter is close to its stability limit of 0.5. Mass transport with velocity-dependent dispersion in shallow water flow is simulated by combining the MRT-LBM and TRT-LBM models. The GPU performance with the CUDA code is an order of magnitude higher than with the MATLAB-Jacket code, and the parallel performance increases with the grid size. The results indicate the promise of the GPU-accelerated LBM for modeling mass transport phenomena in shallow water flows. Copyright © 2010 John Wiley & Sons, Ltd.
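For orientation only, here is a deliberately simplified lattice Boltzmann sketch: a single-relaxation-time D2Q5 scheme for the advection-diffusion equation on a periodic grid, showing the collide-and-stream structure. It is not the paper's MRT shallow-water or TRT advection-dispersion model; the grid size, relaxation time, and velocity are illustrative assumptions.

# Minimal SRT (BGK) D2Q5 lattice Boltzmann scheme for advection-diffusion,
# shown only to illustrate the collide-and-stream structure of an LBM code.
import numpy as np

nx, ny, steps = 64, 64, 200
ex = np.array([0, 1, 0, -1, 0]); ey = np.array([0, 0, 1, 0, -1])
w = np.array([1/3, 1/6, 1/6, 1/6, 1/6])          # D2Q5 weights, cs^2 = 1/3
tau = 0.8                                        # diffusivity D = cs^2 * (tau - 0.5)
ux, uy = 0.05, 0.0                               # constant advection velocity

C = np.zeros((nx, ny)); C[nx//2, ny//2] = 1.0    # initial concentration pulse
f = w[:, None, None] * C[None, :, :]             # start from equilibrium (velocity term omitted)

for _ in range(steps):
    e_dot_u = ex[:, None, None] * ux + ey[:, None, None] * uy
    feq = w[:, None, None] * C[None, :, :] * (1.0 + 3.0 * e_dot_u)
    f += (feq - f) / tau                         # SRT collision step
    for i in range(5):                           # streaming with periodic wrap-around
        f[i] = np.roll(np.roll(f[i], ex[i], axis=0), ey[i], axis=1)
    C = f.sum(axis=0)                            # zeroth moment = concentration

print(C.sum(), C.max())                          # mass is conserved; the pulse spreads and drifts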

8.
Applied to solid mechanics problems with geometric nonlinearity, current finite element and boundary element methods face difficulties if the domain is highly distorted. Furthermore, current boundary element method (BEM) formulations for geometrically nonlinear problems are implicit: the source term depends on the unknowns within the arguments of domain integrals. In the current study, a new BEM formulation is developed that is explicit and whose stiffness matrices require no domain function evaluations. It exploits a rigorous incremental equilibrium equation. The method is also based on a Domain Integral Reduction Algorithm (DIRA), which exploits the Helmholtz decomposition to obviate domain function evaluations. The current version of DIRA introduces a major improvement over the initial version.

9.
In finite element formulations for poroelastic continua, a representation of Biot's theory with the solid displacement and pore pressure as unknowns is preferred. Such a formulation is possible either for quasi-static problems or for dynamic problems if the inertia effects of the fluid are neglected. In contrast to these formulations, a boundary element method (BEM) for the general case of Biot's theory in the time domain has been published (Wave Propagation in Viscoelastic and Poroelastic Continua: A Boundary Element Approach. Lecture Notes in Applied Mechanics. Springer: Berlin, Heidelberg, New York, 2001). If the advantages of both methods are required, it is common practice to couple them; for such a coupled FE/BE procedure, however, a BEM for the simplified dynamic Biot theory as used in the FEM must be developed. Therefore, the fundamental solutions as well as a BE time-stepping procedure are presented here for the simplified dynamic theory in which the inertia effects of the fluid are neglected. Further, a semi-analytical one-dimensional solution is presented to check the proposed BE formulation. Finally, wave propagation problems are studied using both the complete Biot theory and the simplified theory. These examples show that no significant differences occur for the selected material. Copyright © 2005 John Wiley & Sons, Ltd.

10.
The Boundary Contour Method (BCM) is a recent variant of the Boundary Element Method (BEM) resting on the use of boundary approximations that a priori satisfy the field equations. For two-dimensional problems, the evaluation of all the line integrals involved in the collocation BCM reduces to function evaluations at the end-points of each element, thus completely avoiding numerical integration. With reference to 2-D linear elasticity, this paper develops a variational version of the BCM by transferring to the BCM context the ingredients that characterize the Galerkin symmetric BEM (GSBEM). The proposed method requires no numerical integration: all the needed double line integrals over pairs of boundary elements can be evaluated by generating appropriate “potential functions” in closed form and computing their values at the element end-points. This holds for straight as well as curved elements; however, the coefficient matrix of the equation system in the boundary unknowns turns out to be fully symmetric only when all the elements are straight. The numerical results obtained for some benchmark problems, for which analytical solutions are available, validate the proposed formulation and the corresponding solution procedure.

11.
The boundary integral equation method is used for the solution of three-dimensional elastostatic problems in transversely isotropic solids using closed-form fundamental solutions. The previously published point-force solutions for such solids have been modified and are presented in a convenient form especially suitable for use in the boundary integral equation method. The new representations are used as a basis for accurate numerical computation of all the Green's functions needed in the BEM process, without loss of accuracy or redundant computations. The validity of the new representation is shown through three numerical examples. Copyright © 2000 John Wiley & Sons, Ltd.

12.
The fast multipole method (FMM) is a very effective way to accelerate the numerical solution of methods based on Green's functions or fundamental solutions. Combined with the FMM, the boundary element method (BEM) can now solve large-scale problems with several million unknowns on a desktop computer. The method of fundamental solutions (MFS), also called the superposition or source method, which is based on fundamental solutions but involves no integrals, has been studied for several decades alongside the BEM. The MFS is a boundary meshless method in nature and offers more flexibility in modeling a problem. It also avoids the singularity of the kernel by placing the sources at auxiliary points outside the problem domain. However, like the traditional BEM, the conventional MFS requires O(N²) operations to form the system of equations and another O(N³) operations to solve it with direct solvers, N being the number of unknowns. Combining the FMM and the MFS can potentially reduce the operations in forming and solving the MFS system, as well as the memory requirement, all to O(N). This paper is an attempt in this direction. The FMM formulation for the MFS is presented for 2D potential problems, and issues in implementing the FMM for the MFS are discussed. Numerical examples with up to 200,000 DOFs are solved successfully on a Pentium IV PC using the developed FMM MFS code. These results clearly demonstrate the efficiency, accuracy, and potential of the fast multipole accelerated MFS.
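A minimal MFS sketch for a 2D Laplace Dirichlet problem, without the FMM acceleration discussed in the paper: sources are placed on a fictitious circle outside the unit disk so the ln|x − y| kernel is never singular, and the source strengths are fitted to the boundary data by least squares. The geometry, source radius, and test solution are illustrative assumptions.

# Minimal method-of-fundamental-solutions sketch for 2D Laplace on the unit disk
# (no FMM acceleration). Sources sit on a circle of radius R > 1, so the kernel
# ln|x - y| is evaluated only at nonzero distances.
import numpy as np

n_col, n_src, R = 80, 80, 2.0
tc = 2 * np.pi * np.arange(n_col) / n_col        # collocation points on the boundary r = 1
ts = 2 * np.pi * np.arange(n_src) / n_src        # source points on the fictitious circle r = R
xb = np.c_[np.cos(tc), np.sin(tc)]
xs = R * np.c_[np.cos(ts), np.sin(ts)]

def u_exact(p):                                  # harmonic test solution u = x^2 - y^2
    return p[:, 0]**2 - p[:, 1]**2

G = np.log(np.linalg.norm(xb[:, None, :] - xs[None, :, :], axis=-1))  # ln|x - y| kernel
coef, *_ = np.linalg.lstsq(G, u_exact(xb), rcond=None)                # fit boundary data

xin = np.array([[0.3, 0.1], [0.0, 0.5], [-0.4, -0.2]])                # interior check points
u_mfs = np.log(np.linalg.norm(xin[:, None, :] - xs[None, :, :], axis=-1)) @ coef
print(np.max(np.abs(u_mfs - u_exact(xin))))      # small error for this smooth test case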

13.
The use of Green's functions has been considered a powerful technique in the solution of fracture mechanics problems by the boundary element method (BEM). Closed-form expressions for the Green's function components, however, have been available only for a few simple 2-D crack geometries and require complex variable theory. The present authors have recently introduced an alternative numerical procedure to compute the Green's function components, which produced BEM results for 2-D multiple-crack problems of general geometry, including static and dynamic applications. This technique is not restricted to 2-D problems, and the computational aspects of the 3-D implementation of the numerical Green's function approach are now discussed, including examples. Copyright © 2000 John Wiley & Sons, Ltd.

14.
We present a new solution to accelerate the boundary integral equation method (BIEM). The calculation time of the BIEM is dominated by the evaluation of the layer potential in the boundary integral equation. We performed this task using MDGRAPE-2, a special-purpose computer designed for molecular dynamics simulations, which calculates pairwise interactions among particles (e.g. atoms and ions) using hardwired-pipeline processors. We combined this hardware with an iterative solver: during the iteration process, MDGRAPE-2 evaluates the layer potential, while the rest of the calculation is performed on a conventional PC connected to MDGRAPE-2. We applied this solution to the Laplace and Helmholtz equations in three dimensions. Numerical tests showed that the BIEM is accelerated by a factor of 10–100. Our rather naive solution has a calculation cost of O(N² × N_iter), where N is the number of unknowns and N_iter is the number of iterations. Copyright © 2005 John Wiley & Sons, Ltd.
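The step offloaded to MDGRAPE-2 is, in essence, a pairwise kernel sum. The sketch below evaluates a 3D Laplace-type sum φ_i = Σ_j q_j / (4π|x_i − y_j|) in target chunks so the O(N²) work never requires the full N × N matrix in memory; the kernel choice, point sets, and chunk size are illustrative assumptions rather than the hardware's actual pipeline.

# Chunked direct evaluation of a pairwise Laplace-type potential: the O(N^2)
# kernel sum that special-purpose hardware (or an FMM) would accelerate.
import numpy as np

def pairwise_potential(targets, sources, charges, chunk=256):
    phi = np.empty(len(targets))
    for start in range(0, len(targets), chunk):
        block = targets[start:start + chunk]                      # (c, 3) target slice
        r = np.linalg.norm(block[:, None, :] - sources[None, :, :], axis=-1)
        phi[start:start + chunk] = (charges / (4.0 * np.pi * r)).sum(axis=1)
    return phi

rng = np.random.default_rng(3)
src = rng.random((2000, 3)); q = rng.standard_normal(2000)
tgt = rng.random((1500, 3)) + 2.0              # targets shifted so distances never vanish
print(pairwise_potential(tgt, src, q)[:3])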

15.
The proposed spectral element method implementation is based on sparse matrix storage of local shape function derivatives calculated at Gauss–Lobatto–Legendre points. The algorithm relies on two basic operations: multiplication of a sparse matrix by a vector and element-by-element vector multiplication. Compute-intensive operations are performed for a part of the equation of motion derived at the degree-of-freedom level of 3D isoparametric spectral elements. The assembly is performed on the force vector in such a way that atomic operations are minimized; this is achieved by a new mesh coloring technique. The proposed parallel implementation of the spectral element method on a GPU is applied for the first time to Lamb wave simulations. Computation on a multicore GPU is found to be up to 14 times faster than on a single CPU. Copyright © 2015 John Wiley & Sons, Ltd.
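A minimal sketch of the mesh-coloring idea mentioned above (not the paper's specific algorithm): elements sharing a node are given different colors by a greedy first-fit pass, so the force-vector assembly can process one color at a time without atomic additions. The example mesh and function name are illustrative assumptions.

# Greedy mesh coloring: elements that share a node never share a color, so the
# assembly can sum the contributions of one color at a time without atomics.
from collections import defaultdict

def color_elements(elements):
    """elements: list of node-index tuples, one per element."""
    node_to_elems = defaultdict(set)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems[n].add(e)
    colors = {}
    for e, nodes in enumerate(elements):            # greedy first-fit coloring
        used = {colors[o] for n in nodes for o in node_to_elems[n] if o in colors}
        c = 0
        while c in used:
            c += 1
        colors[e] = c
    return colors

# four quad elements in a 2x2 patch, all sharing the center node 4
mesh = [(0, 1, 4, 3), (1, 2, 5, 4), (3, 4, 7, 6), (4, 5, 8, 7)]
print(color_elements(mesh))   # mutually adjacent elements receive distinct colors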

16.
The structural step problem for elastic-plastic internal-variable materials is addressed in the presence of frictionless unilateral contact conditions. Based on the boundary integral equation method (BIEM) and making use of deformation-theory plasticity (through the backward-difference method of computational plasticity), two variational principles are shown to characterize the solution to the step problem: one is a stationarity principle whose unknowns are all the problem variables; the other is a saddle-point principle whose unknowns are the increments of the boundary tractions and displacements, along with the plastic strain increments in the domain. Discretization by boundary and interior elements transforms these principles into well-posed mathematical programming formulations belonging to the symmetric Galerkin BEM family (with features such as a symmetric sign-definite coefficient matrix, double integrations, and hypersingular integrals).

17.
Three formulations of the boundary element method (BEM) and one of the Galerkin finite element method (FEM) are compared in terms of accuracy and efficiency for the spatial discretization of two-dimensional, moving-boundary problems based on Laplace's equation. The same Euler-predictor, trapezoid-corrector scheme for time integration is used for all four methods. The model problems are posed on either a bounded or a semi-infinite strip and are formulated so that closed-form solutions are known. Infinite elements are used with both the BEM and FEM techniques for the unbounded domain. For problems on the bounded region, the BEM using the free-space Green's function and piecewise quadratic interpolating functions (QBEM) is more accurate and efficient than the BEM with linear interpolation. However, the FEM with biquadratic basis functions is more efficient for a given accuracy requirement than the QBEM, except when very high accuracy is demanded. For the unbounded domain, the preferred method is the BEM based on a Green's function that satisfies the lateral symmetry conditions and that leads to discretization of the potential only along the moving surface. This last formulation is the only one that reliably satisfies the far-field boundary condition.
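For reference, a sketch of the Euler-predictor, trapezoid-corrector step (Heun's method) applied to a scalar model ODE, assuming the scheme named above is the standard one; the moving-boundary spatial discretizations themselves are not reproduced.

# Euler predictor followed by a trapezoid corrector (Heun's method), shown on
# the model problem y' = -y rather than on the moving-boundary equations.
import numpy as np

def heun_step(f, t, y, dt):
    y_pred = y + dt * f(t, y)                                # explicit Euler predictor
    return y + 0.5 * dt * (f(t, y) + f(t + dt, y_pred))      # trapezoid corrector

f = lambda t, y: -y
t, y, dt = 0.0, 1.0, 0.1
for _ in range(10):
    y = heun_step(f, t, y, dt); t += dt
print(y, np.exp(-1.0))        # second-order accurate approximation of e^{-1}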

18.
The paper presents non-singular forms of Green's formula and its normal derivative for exterior problems governed by the three-dimensional Laplace equation. The main advantage of these modified formulations is that they can be solved directly using quadrature formulas; the conventional boundary element approximation, which locally regularizes the singularities in each element, is not required. The weak singularities are treated by both the Gauss flux theorem and the property of the associated equipotential body. The hypersingularities are treated by further using the boundary formula for the associated interior problems. The efficacy of the modified formulations is examined for a sphere in an infinite domain subject to Neumann and Dirichlet conditions, respectively. The modified integral formulations are further applied to a practical problem, namely surface-wave-body interactions, for which the conventional boundary integral equation formulation is known to break down at certain discrete frequencies. The ‘irregular’ frequencies are removed by linearly combining the standard integral equation with its normal derivative. Computations are presented of the added-mass and damping coefficients and the wave exciting forces on a floating hemisphere. Comparison of the numerical results with those obtained by other approaches demonstrates the effectiveness of the method. Copyright © 2000 John Wiley & Sons, Ltd.

19.
This paper presents a number of algorithms to run the fast multipole method (FMM) on NVIDIA CUDA-capable graphics processing units (GPUs) (Nvidia Corporation, Sta. Clara, CA, USA). The FMM is a class of methods that compute pairwise interactions between N particles for a given error tolerance with a computational cost of O(N). The methods described in the paper are applicable to any FMM in which the multipole-to-local (M2L) operator is a dense, precomputed matrix. This is the case, for example, in the black-box fast multipole method (bbFMM), a variant of the FMM that can handle a large class of kernels, which is used in our benchmarks. In the FMM, two operators represent most of the computational cost, and an optimal implementation typically tries to balance them: one is the nearby interaction calculation (the direct sum, line 29 in Listing 1) and the other is the M2L operation. We focus on the M2L. By combining multiple M2L operations and reordering the primitive loops of the M2L so that CUDA threads can reuse or share common data, these approaches reduce data movement on the GPU. Because memory bandwidth is the primary bottleneck of these methods, significant performance improvements are realized. Four M2L schemes are detailed and analyzed for a uniform tree, and are tested and compared with an optimized, OpenMP-parallelized, multi-core CPU code. We consider high- and low-precision calculations by varying the number of Chebyshev nodes used in the bbFMM. The accuracy of the GPU codes is found to be satisfactory, and the achieved performance exceeds 200 Gflop/s on one NVIDIA Tesla C1060 GPU. This was compared against two quad-core Intel Xeon E5345 processors (Intel Corporation, Sta. Clara, CA, USA) running at 2.33 GHz, with a combined peak performance of 149 Gflop/s in single precision. For the low FMM accuracy case, the observed performance of the CPU code was 37 Gflop/s, whereas for the high FMM accuracy case it was about 8.5 Gflop/s, most likely because of a higher frequency of cache misses. We also present benchmarks on an NVIDIA C2050 GPU (a Fermi processor) in single and double precision. Copyright © 2011 John Wiley & Sons, Ltd.
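A small sketch of the data-reuse idea behind the M2L schemes: when the M2L operator for one relative cell offset is a precomputed dense matrix, applying it to a whole batch of multipole-expansion vectors in a single product loads the matrix once and reuses it, which is what the loop reordering on the GPU aims to achieve. Sizes and random data are illustrative assumptions; shared-memory and thread-level details are not modeled.

# Batched application of one precomputed dense M2L matrix to many multipole
# vectors: the single GEMM reuses the matrix across the whole batch.
import numpy as np

rng = np.random.default_rng(1)
p = 64                                   # expansion size (e.g. number of Chebyshev-node coefficients)
n_cells = 4096                           # cells sharing this interaction-list offset

M2L = rng.standard_normal((p, p))        # precomputed operator for one relative offset
multipoles = rng.standard_normal((n_cells, p))

# one-by-one application (the operator is conceptually reloaded for every cell pair)
locals_loop = np.stack([M2L @ m for m in multipoles])

# batched application: a single matrix product over the whole batch reuses M2L
locals_batched = multipoles @ M2L.T

print(np.allclose(locals_loop, locals_batched))   # True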

20.
An accelerated boundary cloud method (BCM) for boundary-only analysis of exterior electrostatic problems is presented in this paper. The BCM uses scattered points instead of classical boundary elements to discretize the surface of the conductors. The dense linear system of equations generated by the BCM is solved by a GMRES iterative solver combined with a singular value decomposition (SVD) based rapid matrix–vector multiplication technique. The acceleration technique takes advantage of the fact that the integral equation kernel (the 2D Green's function in our case) is locally smooth and can therefore be dramatically compressed using an SVD. This greatly speeds up the solution of the linear system by accelerating the computation of the dense matrix–vector product and reducing the storage required by the BCM. Copyright © 2002 John Wiley & Sons, Ltd.
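A minimal sketch of the SVD-based compression idea: an off-diagonal block of a smooth kernel between two well-separated point clusters is replaced by a truncated low-rank factorization, so the matrix-vector product costs O(k(m + n)) instead of O(mn). The geometry, kernel normalization, and truncation tolerance are illustrative assumptions, not the paper's BCM setup.

# SVD compression of a smooth, well-separated kernel block and the resulting
# low-rank matrix-vector product.
import numpy as np

rng = np.random.default_rng(2)
src = rng.random((300, 2))                              # source points in the unit square
tgt = rng.random((250, 2)) + np.array([5.0, 0.0])       # well-separated target cluster

K = np.log(np.linalg.norm(tgt[:, None, :] - src[None, :, :], axis=-1))  # 2D log kernel block

U, s, Vt = np.linalg.svd(K, full_matrices=False)
k = int(np.sum(s > 1e-10 * s[0]))                       # keep singular values above a relative tolerance
Uk, sk, Vk = U[:, :k], s[:k], Vt[:k, :]

x = rng.standard_normal(src.shape[0])
y_full = K @ x
y_lowrank = Uk @ (sk * (Vk @ x))                        # two thin products instead of one dense one
print(k, np.linalg.norm(y_full - y_lowrank) / np.linalg.norm(y_full))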
