首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 343 毫秒
1.
The fast multipole method (FMM) is a fast summation algorithm capable of accelerating pairwise interaction calculations, known as N‐body problems, from an algorithmic complexity of ??(N2) to ??(N) for N particles. The algorithm has brought a dramatic increase in the capability of particle simulations in many application areas, such as electrostatics, particle formulations of fluid mechanics, and others. Although the literature on the subject provides theoretical error bounds for the FMM approximation, there are not many reports on the measured errors in a suite of computational experiments that characterize the accuracy of the method in relation with the different parameters available to the user. We have performed such an experimental investigation, and summarized the results of about 1500 calculations using the FMM algorithm, applied to the 2D vortex particle method. In addition to the more standard diagnostic of the maximum error, we supply illustrations of the spatial distribution of the errors, offering visual evidence of all the contributing factors to the overall approximation accuracy: multipole expansion, local expansion, hierarchical spatial decomposition (interaction lists, local domain, far domain). This presentation is a contribution to any researcher wishing to incorporate the FMM acceleration to their application code, as it aids in understanding where accuracy is gained or compromised. Copyright © 2009 John Wiley & Sons, Ltd.  相似文献   

2.
This article presents a wideband fast multipole method (FMM) to accelerate the boundary integral equation method for two‐dimensional elastodynamics in frequency domain. The present wideband FMM is established by coupling the low‐frequency FMM and the high‐frequency FMM that are formulated on the ingenious decomposition of the elastodynamic fundamental solution developed by Nishimura's group. For each of the two FMMs, we estimated the approximation parameters, that is, the expansion order for the low‐frequency FMM and the quadrature order for the high‐frequency FMM according to the requested accuracy, considering the coexistence of the derivatives of the Helmholtz kernels for the longitudinal and transcendental waves in the Burton–Muller type boundary integral equation of interest. In the numerical tests, the error resulting from the fast multipole approximation was monotonically decreased as the requested accuracy level was raised. Also, the computational complexity of the present fast boundary integral equation method agreed with the theory, that is, Nlog N, where N is the number of boundary elements in a series of scattering problems. The present fast boundary integral equation method is promising for simulations of the elastic systems with subwavelength structures. As an example, the wave propagation along a waveguide fabricated in a finite‐size phononic crystal was demonstrated. Copyright © 2012 John Wiley & Sons, Ltd.  相似文献   

3.
We present a new solution to accelerate the boundary integral equation method (BIEM). The calculation time of the BIEM is dominated by the evaluation of the layer potential in the boundary integral equation. We performed this task using MDGRAPE‐2, a special‐purpose computer designed for molecular dynamics simulations. MDGRAPE‐2 calculates pairwise interactions among particles (e.g. atoms and ions) using hardwired‐pipeline processors. We combined this hardware with an iterative solver. During the iteration process, MDGRAPE‐2 evaluates the layer potential. The rest of the calculation is performed on a conventional PC connected to MDGRAPE‐2. We applied this solution to the Laplace and Helmholtz equations in three dimensions. Numerical tests showed that BIEM is accelerated by a factor of 10–100. Our rather naive solution has a calculation cost of O(N2 × Niter), where N is the number of unknowns and Niter is the number of iterations. Copyright © 2005 John Wiley & Sons, Ltd.  相似文献   

4.
This paper presents a number of algorithms to run the fast multipole method (FMM) on NVIDIA CUDA‐capable graphical processing units (GPUs) (Nvidia Corporation, Sta. Clara, CA, USA). The FMM is a class of methods to compute pairwise interactions between N particles for a given error tolerance and with computational cost of . The methods described in the paper are applicable to any FMMs in which the multipole‐to‐local (M2L) operator is a dense matrix and the matrix is precomputed. This is the case for example in the black‐box fast multipole method (bbFMM), which is a variant of the FMM that can handle large class of kernels. This example will be used in our benchmarks. In the FMM, two operators represent most of the computational cost, and an optimal implementation typically tries to balance those two operators. One is the nearby interaction calculation (direct sum calculation, line 29 in Listing 1), and the other is the M2L operation. We focus on the M2L. By combining multiple M2L operations and reordering the primitive loops of the M2L so that CUDA threads can reuse or share common data, these approaches reduce the movement of data in the GPU. Because memory bandwidth is the primary bottleneck of these methods, significant performance improvements are realized. Four M2L schemes are detailed and analyzed in the case of a uniform tree. The four schemes are tested and compared with an optimized, OpenMP parallelized, multi‐core CPU code. We consider high and low precision calculations by varying the number of Chebyshev nodes used in the bbFMM. The accuracy of the GPU codes is found to be satisfactory and achieved performance over 200 Gflop/s on one NVIDIA Tesla C1060 GPU (Nvidia Corporation, Sta. Clara, CA, USA). This was compared against two quad‐core Intel Xeon E5345 processors (Intel Corporation, Sta. Clara, CA, USA) running at 2.33 GHz, for a combined peak performance of 149 Gflop/s for single precision. For the low FMM accuracy case, the observed performance of the CPU code was 37 Gflop/s, whereas for the high FMM accuracy case, the performance was about 8.5 Gflop/s, most likely because of a higher frequency of cache misses. We also present benchmarks on an NVIDIA C2050 GPU (a Fermi processor)(Nvidia Corporation, Sta. Clara, CA, USA) in single and double precision. Copyright © 2011 John Wiley & Sons, Ltd.  相似文献   

5.
The parallel implementation of the element free Galerkin (EFG) method for heat transfer and fluid flow problems on MIMD type parallel computer is treated. A new parallel algorithm has been proposed in which parallelization is performed by row-wise data distribution among the processors. The codes have been developed in FORTRAN language using MPI message passing library. Two model (one each in heat transfer and fluid flow) problems have been solved to validate the proposed algorithm. The total time, communication time, user time, speedup and efficiency have been estimated for heat transfer and fluid flow problems. For eight processors, the speedup and efficiency are obtained to be 7.11 and 88.87% respectively in heat transfer problems for a data size of N=2,116 whereas 7.20 and 90.04% respectively in fluid flow problems for a data size of N=2,378.  相似文献   

6.
We present efficient parallel algorithms for the maximum empty rectangle problem in this paper. On crew pram, we solve the area version of this problem inO(log 2 n) time usingO(nlogn) processors. The perimeter version of this problem is solved inO(logn) time usingO(nlog 2 n) processors. On erew pram, we solve both the problems inO(logn) time usingO(n 2/logn) processors. We also present anO(logn) time algorithm on a mesh-of-trees architecture.  相似文献   

7.
In this paper, a framework to construct higher‐order‐accurate time‐step‐integration algorithms based on the post‐integration techniques is presented. The prescribed initial conditions are naturally incorporated in the formulations and can be strongly or weakly enforced. The algorithmic parameters are chosen such that unconditionally A‐stable higher‐order‐accurate time‐step‐integration algorithms with controllable numerical dissipation can be constructed for linear problems. Besides, it is shown that the order of accuracy for non‐linear problems is maintained through the relationship between the present formulation and the Runge–Kutta method. The second‐order differential equations are also considered. Numerical examples are given to illustrate the validity of the present formulation. Copyright © 2001 John Wiley & Sons, Ltd.  相似文献   

8.
For digital images and patterns under the nonlinear geometric transformation, T: (ξ, η) → (x, y), this study develops the splitting algorithms (i.e., the pixel‐division algorithms) that divide a 2D pixel into N × N subpixels, where N is a positive integer chosen as N = 2 k(k ≥ 0) in practical computations. When the true intensity values of pixels are known, this method makes it easy to compute the true intensity errors. As true intensity values are often unknown, the proposed approaches can compute the sequential intensity errors based on the differences between the two approximate intensity values at N and N/2. This article proposes the new splitting–shooting method, new splitting integrating method, and their combination. These methods approximate results show that the true errors of pixel intensity are O(H), where H is the pixel size. Note that the algorithms in this article do not produce any sequential errors as NN0, where N0 (≥2) is an integer independent of N and H. This is a distinctive feature compared to our previous papers on this subject. The other distinct feature of this article is that the true error bound O(H) is well suited to images with all kinds of discontinuous intensity, including scattered pixels. © 2011 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 21, 323–335, 2011  相似文献   

9.
A parallel multigrid (MG) method is developed to reduce the large computational costs involved by the finite element simulation of highly viscous fluid flows, especially those resulting from metal forming applications, which are characterized by using a mixed velocity/pressure implicit formulation, unstructured meshes of tetrahedra, and frequent remeshings. The developed MG method follows a hybrid approach where the different levels of nonnested meshes are geometrically constructed by mesh coarsening, while the linear systems of the intermediate levels result from the Galerkin algebraic approach. A linear O(N) convergence rate is expected (with N being the number of unknowns), while keeping software parallel efficiency. These objectives lead to selecting unusual MG smoothers (iterative solvers) for the upper grid levels and to developing parallel mesh coarsening algorithms along with parallel transfer operators between the different levels of partitioned meshes. Within the utilized PETSc library, the developed MG method is employed as a preconditioner for the usual conjugate residual algorithm because of the symmetric undefinite matrix of the system to solve. It shows a convergence rate close to optimal, an excellent parallel efficiency, and the ability to handle the complex forming problems encountered in 3‐dimensional hot forging, which involve large material deformations and frequent remeshings.  相似文献   

10.
Second‐order, two‐point boundary‐value problems are encountered in many engineering applications including the study of beam deflections, heat flow, and various dynamic systems. Two classical numerical techniques are widely used in the engineering community for the solution of such problems; the shooting method and finite difference method. These methods are suited for linear problems. However, when solving the non‐linear problems, these methods require some major modifications that include the use of some root‐finding technique. Furthermore, they require the use of other basic numerical techniques in order to obtain the solution. In this paper, the author introduces a novel method based on continuous genetic algorithms for numerically approximating a solution to this problem. The new method has the following characteristics; first, it does not require any modification while switching from the linear to the non‐linear case; as a result, it is of versatile nature. Second, this approach does not resort to more advanced mathematical tools and is thus easily accepted in the engineering application field. Third, the proposed methodology has an implicit parallel nature which points to its implementation on parallel machines. However, being a variant of the finite difference scheme with truncation error of the order O(h2), the method provides solutions with moderate accuracy. Numerical examples presented in the paper illustrate the applicability and generality of the proposed method. Copyright © 2004 John Wiley & Sons, Ltd.  相似文献   

11.
Parallelized FVM algorithm for three-dimensional viscoelastic flows   总被引:1,自引:0,他引:1  
 A parallel implementation for the finite volume method (FVM) for three-dimensional (3D) viscoelastic flows is developed on a distributed computing environment through Parallel Virtual Machine (PVM). The numerical procedure is based on the SIMPLEST algorithm using a staggered FVM discretization in Cartesian coordinates. The final discretized algebraic equations are solved with the TDMA method. The parallelisation of the program is implemented by a domain decomposition strategy, with a master/slave style programming paradigm, and a message passing through PVM. A load balancing strategy is proposed to reduce the communications between processors. The three-dimensional viscoelastic flow in a rectangular duct is computed with this program. The modified Phan-Thien–Tanner (MPTT) constitutive model is employed for the equation system closure. Computing results are validated on the secondary flow problem due to non-zero second normal stress difference N 2. Three sets of meshes are used, and the effect of domain decomposition strategies on the performance is discussed. It is found that parallel efficiency is strongly dependent on the grid size and the number of processors for a given block number. The convergence rate as well as the total efficiency of domain decomposition depends upon the flow problem and the boundary conditions. The parallel efficiency increases with increasing problem size for given block number. Comparing to two-dimensional flow problems, 3D parallelized algorithm has a lower efficiency owing to largely overlapped block interfaces, but the parallel algorithm is indeed a powerful means for large scale flow simulations. Received: 2 July 2002 / Accepted: 15 November 2002 This research is supported by an ASTAR Grant EMT/00/011.  相似文献   

12.
In the present paper one‐step implicit integration algorithms for the N‐body problem are developed. The time‐stepping schemes are based on a Petrov–Galerkin finite element method applied to the Hamiltonian formulation of the N‐body problem. The approach furnishes algorithmic energy conservation in a natural way. The proposed time finite element method facilitates a systematic implementation of a family of time‐stepping schemes. A particular algorithm is specified by the associated quadrature rule employed for the evaluation of time integrals. The influence of various standard as well as non‐standard quadrature formulas on algorithmic energy conservation and conservation of angular momentum is examined in detail for linear and quadratic time elements. Copyright © 2000 John Wiley & Sons, Ltd.  相似文献   

13.
In this paper, unconditionally stable higher-order accurate time-step integration algorithms with controllable numerical dissipation are presented. The algorithms are based on the Newmark method with complex time steps. The ultimate spectral radius (μ), the sub-step locations (βj) and the weighting factors (αj) are the algorithmic parameters. For an algorithm that is (2n−1)th order accurate, the sub-step locations which may be complex, are shown to be the roots of an nth degree polynomial. The polynomial is given explicitly in terms of n and μ. The weighting factors are then obtained by solving a system of n simultaneous equations. It is further shown that the order of accuracy is increased by one for the non-dissipative algorithms with μ=1. The stability properties of the present algorithms are studied. It is shown that if the ultimate spectral radius is set between −1 and 1, the eigenvalues of the numerical amplification matrix are complex with magnitude less than or equal to unity. The algorithms are therefore unconditionally C-stable. When the ultimate spectral radius is set to 0 or 1, the algorithms are found to be equivalent to the first sub-diagonal and diagonal Padé approximations, respectively. The present algorithms are more general as the numerical dissipation is controllable and are very suitable for parallel computers. The accuracy of the excitation responses is found to be enhanced by the present complex-time-step procedure. To maintain high-order accuracy, the excitation may need some modifications. © 1998 John Wiley & Sons, Ltd.  相似文献   

14.
Abstract

A processor array with a reconfigurable bus system (abbreviated to PARBS) is a computation model which consists of a processor array and a reconfigurable bus system. It is a very powerful computation model in that it possesses the ability to solve many problems efficiently. However, most existing efficient algorithms on PARBS's use a large number of processors to solve problems. For example, to determine the maximum (minimum) of n data items in O(l) time, O(n 2) processors are required [12]. To solve the all‐pairs shortest paths and the minimum spanning tree problems in O(log n) time, O(n 4) processors are required [20]. These networks will therefore become very expensive for large n. In this paper, we introduce the concept of iterative‐PARBS, which is similar to the FOR‐loop construct in sequential programming languages. The iterative‐PARBS is a building block through which the processing data can be routed several times. We can think of it as a “hardware subroutine.’’ Based on this scheme, it is possible to explore more cost‐effective, time‐efficient parallel algorithms for use in a PARBS. The following new results are derived in this study: 1. The minimum (maximum) of n data items can be determined in O(l) time on a PARBS with O(n 1+? ) processors for any fixed 8 > 0.

2. The all‐pairs shortest paths and the minimum spanning tree problems can be solved in O (log n) time on a PARBS with O(n 3+? ) processors for any fixed 8 > 0.

  相似文献   

15.
The hybrid‐mixed assumed natural strain four‐node quadrilateral element using the sampling surfaces (SaS) technique is developed. The SaS formulation is based on choosing inside the plate body N not equally spaced SaS parallel to the middle surface in order to introduce the displacements of these surfaces as basic plate variables. Such choice of unknowns with the consequent use of Lagrange polynomials of degree N–1 in the thickness direction permits the presentation of the plate formulation in a very compact form. The SaS are located at Chebyshev polynomial nodes that allow one to minimize uniformly the error due to the Lagrange interpolation. To avoid shear locking and have no spurious zero energy modes, the assumed natural strain concept is employed. The developed hybrid‐mixed four‐node quadrilateral plate element passes patch tests and exhibits a superior performance in the case of coarse distorted mesh configurations. It can be useful for the 3D stress analysis of thin and thick plates because the SaS formulation gives the possibility to obtain solutions with a prescribed accuracy, which asymptotically approach the 3D exact solutions of elasticity as the number of SaS tends to infinity. Copyright © 2016 John Wiley & Sons, Ltd.  相似文献   

16.
Fast multipole DBEM analysis of fatigue crack growth   总被引:3,自引:1,他引:2  
A fast multipole method (FMM) based on complex Taylor series expansions is applied to the dual boundary element method (DBEM) for large-scale crack analysis in linear elastic fracture mechanics. Combining multipole expansions with local expansions, both the computational complexity and memory requirement are reduced to O(N), where N is the number of DOF. An incremental crack-extension analysis based on the maximum principal stress criterion and the Paris law is used to simulate the fatigue growth of numerous cracks in a 2D solid. Some examples are presented to validate the numerical scheme.  相似文献   

17.
A new algorithm is developed to evaluate the time convolution integrals that are associated with boundary element methods (BEM) for transient diffusion. This approach, which is based upon the multi‐level multi‐integration concepts of Brandt and Lubrecht, provides a fast, accurate and memory efficient time domain method for this entire class of problems. Conventional BEM approaches result in operation counts of order O(N2) for the discrete time convolution over N time steps. Here we focus on the formulation for linear problems of transient heat diffusion and demonstrate reduced computational complexity to order O(N3/2) for three two‐dimensional model problems using the multi‐level convolution BEM. Memory requirements are also significantly reduced, while maintaining the same level of accuracy as the conventional time domain BEM approach. Copyright © 2005 John Wiley & Sons, Ltd.  相似文献   

18.
The fast multipole method (FMM) is very efficient in solving integral equations. This paper applies the method to solve large solid-solid boundary integral equations for elastic waves in two dimensions. The scattering problem is first formulated with the boundary element method. FMM is then introduced to expedite the solution process. By using the FMM technique, the number of floating-point operations of the matrix-vector multiplication in a standard conjugate gradient algorithm is reduced from O(N 2) to O(N 1.5), where N is the number of unknowns. The matrix-filling time and the memory requirement are also of the order N 1.5. The computational complexity of the algorithm is further reduced to O(N 4/3) by using a ray propagation technique. Numerical results are given to show the accuracy and efficiency of FMM compared to the boundary element method with dense matrix.  相似文献   

19.
This paper presents a parallel generalized finite element method (GFEM) that uses customized enrichment functions for applications where limited a priori knowledge about the solution is available. The procedure involves the parallel solution of local boundary value problems using boundary conditions from a coarse global problem. The local solutions are in turn used to enrich the global solution space using the partition of unity methodology. The parallel computation of local solutions can be implemented using a single pair of scatter–gather communications. Several numerical experiments demonstrate the high parallel efficiency of these computations. For problems requiring non-uniform mesh refinement and enrichment, load unbalance is addressed by defining a larger number of small local problems than the number of parallel processors and by sorting and solving the local problems based on estimates of their workload. A simple and effective estimate of the largest number of processors where load balance among processors is maintained is also proposed. Several three-dimensional fracture mechanics problems aiming at investigating the accuracy and parallel performance of the proposed GFEM are analyzed.  相似文献   

20.
The existing global–local multiscale computational methods, using finite element discretization at both the macro‐scale and micro‐scale, are intensive both in terms of computational time and memory requirements and their parallelization using domain decomposition methods incur substantial communication overhead, limiting their application. We are interested in a class of explicit global–local multiscale methods whose architecture significantly reduces this communication overhead on massively parallel machines. However, a naïve task decomposition based on distributing individual macro‐scale integration points to a single group of processors is not optimal and leads to communication overheads and idling of processors. To overcome this problem, we have developed a novel coarse‐grained parallel algorithm in which groups of macro‐scale integration points are distributed to a layer of processors. Each processor in this layer communicates locally with a group of processors that are responsible for the micro‐scale computations. The overlapping groups of processors are shown to achieve optimal concurrency at significantly reduced communication overhead. Several example problems are presented to demonstrate the efficiency of the proposed algorithm. Copyright © 2009 John Wiley & Sons, Ltd.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号