首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
The paper presents parallel algorithms for solving Poisson equation at N2 mesh points. The methods based on marching techniques are structured for efficient parallel realization. Using orthogonal decomposition properties of arising matrices, the algorithms can be formulated in terms of transformed vectors. On a MIMD computer with not more than N processors, the computations can be performed in horizontal slices with minimal synchronization requirements. Considering an SIMD machine with N2 processors, the complexity bound O(log N) has been achieved, whereby the single marching requires 10 log N steps only.  相似文献   

2.
The methods of cyclic reduction, Fourier analysis and the resulting hybrid algorithm FACR(l) are formulated for a shared memory multiprocessor and applied to the two-dimensional Poisson equation. The complexity of the algorithms is analyzed, performance curves are given and optimal values of the parameter l are determined for various problem sizes and degrees of parallelism. The algorithms are implemented on a shared memory multiprocessor and performance results are discussed.  相似文献   

3.
In this paper we propose and evaluate a set of new strategies for the solution of three dimensional separable elliptic problems on CPU–GPU platforms. The numerical solution of the system of linear equations arising when discretizing those operators often represents the most time consuming part of larger simulation codes tackling a variety of physical situations. Incompressible fluid flows, electromagnetic problems, heat transfer and solid mechanic simulations are just a few examples of application areas that require efficient solution strategies for this class of problems. GPU computing has emerged as an attractive alternative to conventional CPUs for many scientific applications. High speedups over CPU implementations have been reported and this trend is expected to continue in the future with improved programming support and tighter CPU–GPU integration. These speedups by no means imply that CPU performance is no longer critical. The conventional CPU-control–GPU-compute pattern used in many applications wastes much of CPU’s computational power. Our proposed parallel implementation of a classical cyclic reduction algorithm to tackle the large linear systems arising from the discretized form of the elliptic problem at hand, schedules computing on both the GPU and the CPUs in a cooperative way. The experimental result demonstrates the effectiveness of this approach.  相似文献   

4.
We investigate the numerical solution of discrete-time algebraic Riccati equations on a parallel distributed architecture. Our solvers obtain an initial solution of the Riccati equation via the disc function method, and then refine this solution using Newton's method. The Smith iteration is employed to solve the Stein equation that arises at each step of Newton's method. The numerical experiments on an Intel Pentium-II cluster, connected via a Myrinet switch, report the performance and scalability of the new algorithms. Copyright © 2001 John Wiley & Sons, Ltd.  相似文献   

5.
Simchony, Chellappa, and Shao (1990) proposed a semi-direct method for computing area based optical flow. Their method is based on the iterative application of a direct Poisson solver. This method is restricted to Dirichlet boundary conditions, i.e., it is applicable only when velocity vectors at the boundary of the domain are known a priori. The authors show, both experimentally and through analysis, that the semi-direct method converges only for very large smoothness. At such levels of smoothness, the solution is obtained merely by filling in the known boundary values; the data from the image is almost totally ignored. Next, the authors consider the Concus and Golub method (1973), another semi-direct method, for computing optical flow. This method always converges, but the convergence is too slow to be of any practical value. The authors conclude that semi-direct methods are not suited for the computation of area based optical flow  相似文献   

6.
This paper is a survey of direct parallel algorithms for solving systems of linear equations. We present an up-to-date collection of algorithms for solving triangular, dense, and tridiagonal systems of equations. We state their resulting speedup over the corresponding sequential algorithms, and evaluate their numerical stability, whenever possible.  相似文献   

7.
This paper presents a general strategy, based on a high level of abstraction, designed in order to reduce the development time and cost of parallel explicit finite element codes. Such a level of abstraction provides a general, rigorous, and efficient framework to tackle the parallelisation of various solvers. The intention of this work is not to do an exhaustive study and make comparisons of different platforms, but rather to bring some quantitative input of the efficiency of the presented parallel implementation strategy. The results, obtained so far, show that a good scalability is achieved on parallel architectures such as the Cray T3D. They demonstrate that solutions that took days on a sequential machine can be reduced to hours or minutes on a parallel machine, such as the Cray T3D.  相似文献   

8.
A.  U.  M.  K. 《Future Generation Computer Systems》2005,21(8):1275-1284
For the solution of sparse linear systems from circuit simulation whose coefficient matrices include a few dense rows and columns, a parallel iterative algorithm with distributed Schur complement preconditioning is presented. The parallel efficiency of the solver is increased by transforming the equation system into a problem without dense rows and columns as well as by exploitation of parallel graph partitioning methods. The costs of local, incomplete LU decompositions are decreased by fill-in reducing reordering methods of the matrix and a threshold strategy for the factorization. The efficiency of the parallel solver is demonstrated with real circuit simulation problems on PC clusters.  相似文献   

9.
The Journal of Supercomputing - Modern GPUs can achieve high computing power at low cost, but still requires much time and effort. Tridiagonal system and scan solvers are one example of widely used...  相似文献   

10.
A new method, namely, the parallel two-level hybrid (PTH) method, is developed to solve tridiagonal systems on parallel computers. PTH has two levels of parallelism. The first level is based on algorithms developed from the Sherman-Morrison modification formula, and the second level can choose different parallel tridiagonal solvers for different applications. By choosing different outer and inner solvers and by controlling its two-level partition, PTH can deliver better performance for different applications on different machine ensembles and problem sizes. In an extreme case, the two levels of parallelism can be merged into one, and PTH can be the best algorithm otherwise available. Theoretical analyses and numerical experiments indicate that PTH is significantly better than existing methods on massively parallel computers. For instance, using PTH in a fast Poisson solver results in a 2-folds speedup compared to a conventional parallel Poisson solver on a 512 nodes IBM machine. When only the tridiagonal solver is considered, PTH is over 10 times faster than the currently used implementation.  相似文献   

11.
The Journal of Supercomputing - Numerical simulation of multi-physical processes requires a lot of processor time, especially when solving ill-conditional linear systems arising in fluid dynamics...  相似文献   

12.
We present the novel parallel linear least squares solvers ARPLS-IR and ARPLS-MPIR for dense overdetermined linear systems. All internode communication of our ARPLS solvers arises in the context of all-reduce operations across the parallel system and therefore they benefit directly from efficient implementations of such operations. Our approach is based on the semi-normal equations, which are in general not backward stable. However, the method is stabilised by using iterative refinement. We show that performing iterative refinement in mixed precision also increases the parallel performance of the algorithm. We consider different variants of the ARPLS algorithm depending on the conditioning of the problem and in this context also evaluate the method of normal equations with iterative refinement. For ill-conditioned systems, we demonstrate that the semi-normal equations with standard iterative refinement achieve the best accuracy compared to other parallel solvers.We discuss the conceptual advantages of ARPLS-IR and ARPLS-MPIR over alternative parallel approaches based on QR factorisation or the normal equations. Moreover, we analytically compare the communication cost to an approach based on communication-avoiding QR factorisation. Numerical experiments on a high performance cluster illustrate speed-ups up to 3820 on 2048 cores for ill-conditioned tall and skinny matrices over state-of-the-art solvers from DPLASMA or ScaLAPACK.  相似文献   

13.
We describe an implementation to solve Poisson?s equation for an isolated system on a unigrid mesh using FFTs. The method solves the equation globally on mesh blocks distributed across multiple processes on a distributed-memory parallel computer. Test results to demonstrate the convergence and scaling properties of the implementation are presented. The solver is offered to interested users as the library PSPFFT.

Program summary

Program title: PSPFFTCatalogue identifier: AEJK_v1_0Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEJK_v1_0.htmlProgram obtainable from: CPC Program Library, Queen?s University, Belfast, N. IrelandLicensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.htmlNo. of lines in distributed program, including test data, etc.: 110 243No. of bytes in distributed program, including test data, etc.: 16 332 181Distribution format: tar.gzProgramming language: Fortran 95Computer: Any architecture with a Fortran 95 compiler, distributed memory clustersOperating system: Linux, UnixHas the code been vectorized or parallelized?: Yes, using MPI. An arbitrary number of processors may be used (subject to some constraints). The program has been tested on from 1 up to ∼ 13 000 processors. RAM: Depends on the problem size, approximately 170 MBytes for 483 cells per process.Classification: 4.3, 6.5External routines: MPI (http://www.mcs.anl.gov/mpi/), FFTW (http://www.fftw.org), Silo (https://wci.llnl.gov/codes/silo/) (only necessary for running test problem).Nature of problem: Solving Poisson?s equation globally on unigrid mesh distributed across multiple processes on distributed memory system.Solution method: Numerical solution using multidimensional discrete Fourier Transform in a parallel Fortran 95 code.Unusual features: This code can be compiled as a library to be readily linked and used as a blackbox Poisson solver with other codes.Running time: Depends on the size of the problem, but typically less than 1 second per solve.  相似文献   

14.
该论文研究了利用并行共轭梯度算法求解二维泊松方程的方法,在由24台微机组成的机群上进行了实验。实验数据表明并行共轭梯度算法适用于求解二维泊松方程,它具有收敛快,可扩展性强的特点。在实验的基础上提出并验证了适用于并行共轭梯度算法的合理计算节点数的选择函数。  相似文献   

15.
A method is described to solve the systems of tridiagonal linear equations that result from discrete approximations of the Poisson or Helmholtz equation with either periodic, Dirichlet, Neumann, or shear-periodic boundary conditions. The problem is partitioned into a set of smaller Dirichlet problems which can be solved simultaneously on parallel or vector computers leaving a smaller tridiagonal system to be solved on one of the processors.  相似文献   

16.
This paper describes the parallelization of a strategy to speed up the convergence of iterative methods applied to boundary element method (BEM) systems arising from problems with non-smooth boundaries and mixed boundary conditions. The aim of the work is the application of fast wavelet transforms as a black box transformation in existing boundary element codes. A new strategy was proposed, applying wavelet transforms on the interval, so it could be used in case of non-smooth coefficient matrices. Here, we describe the parallel iterative scheme and we present some of the results we have obtained.  相似文献   

17.
Barycentric coordinates are an established mathematical tool in computer graphics and geometry processing, providing a convenient way of interpolating scalar or vector data from the boundary of a planar domain to its interior. Many different recipes for barycentric coordinates exist, some offering the convenience of a closed‐form expression, some providing other desirable properties at the expense of longer computation times. For example, harmonic coordinates, which are solutions to the Laplace equation, provide a long list of desirable properties (making them suitable for a wide range of applications), but lack a closed‐form expression. We derive a new type of barycentric coordinates based on solutions to the biharmonic equation. These coordinates can be considered a natural generalization of harmonic coordinates, with the additional ability to interpolate boundary derivative data. We provide an efficient and accurate way to numerically compute the biharmonic coordinates and demonstrate their advantages over existing schemes. We show that biharmonic coordinates are especially appealing for (but not limited to) 2D shape and image deformation and have clear advantages over existing deformation methods.  相似文献   

18.
随着获取设备的发展,大尺度、高分辫率数字图像已逐步进入人们的生活,大尺度图像的梯度域编辑显得更为重要,求解大规模未知数的泊松方程是大尺度图像梯度域编辑的关键。传统多重网格算法的迭代、约束和插值操作单独进行,内存和外存间通讯量大,算法效率低,为此提出了一种面向大尺度图像梯度域编辑的并行多重网格求解泊松方程的算法。该算法利用多重网格的迭代、约束和插值过程的内存数据访问局部性和更新相关性,构造滑动工作窗口,使迭代、约束和插值操作并行运行,提高了多重网格算法求解泊松方程的计算效率。全景图拼接实验表明,所提算法的运行效率高于超松弛迭代、高斯塞德尔迭代和传统多重网格算法。  相似文献   

19.
针对简单遗传算法采用固定的交叉概率和变异概率不能总是满足当前种群的需要,影响算法的性能及效率,采用自适应的交叉概率和变异概率,且将并行技术与遗传算法相结合,提出自适应并行遗传算法,用于泊松曲线沉降预测模型的优化。实验结果表明,该算法为泊松曲线沉降预测模型的参数估计提供了一种有效的方法。  相似文献   

20.
This paper introduces a general principle for constructing multiscale kernels on surface meshes, and presents a construction of the multiscale pre‐biharmonic and multiscale biharmonic kernels. Our construction is based on an optimization problem that seeks to minimize a smoothness criterion, the Laplacian energy, subject to a sparsity inducing constraint. Namely, we use the lasso constraint, which sets an upper bound on the l1 ‐norm of the solution, to obtain a family of solutions parametrized by this upper‐bound parameter. The interplay between sparsity and smoothness results in smooth kernels that vanish away from the diagonal. We prove that the resulting kernels have gradually changing supports, consistent behavior over partial and complete meshes, and interesting limiting behaviors (e.g. in the limit of large scales, the multiscale biharmonic kernel converges to the Green's function of the biharmonic equation); in addition, these kernels are based on intrinsic quantities and so are insensitive to isometric deformations. We show empirically that our kernels are shape‐aware, are robust to noise, tessellation, and partial object, and are fast to compute. Finally, we demonstrate that the new kernels can be useful for function interpolation and shape correspondence.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号