Found 6 similar articles (search time: 15 ms)
1.
Sudip K. Seal, Kalyan S. Perumalla, Steven P. Hirshman 《Journal of Parallel and Distributed Computing》2013
Direct solvers based on prefix computation and cyclic reduction algorithms exploit the special structure of tridiagonal systems of equations to deliver better parallel performance than solvers designed for more general systems of equations. This performance advantage is even more pronounced for block tridiagonal systems. In this paper, we re-examine the performance of these two algorithms, taking the effects of block size into account. Depending on the block size, the parameter space spanned by the number of block rows, the size of the blocks and the processor count is shown to favor one or the other of the two algorithms. A critical block size that separates these two regions is shown to emerge, and its dependence both on problem-dependent parameters and on machine-specific constants is established. Empirical verification of these analytical findings is carried out on up to 2048 cores of a Cray XT4 system.
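The cyclic reduction idea mentioned in the abstract above can be illustrated with a minimal sketch for the scalar (non-block) case. This is not the authors' implementation: the function name `cyclic_reduction` and the array layout are assumptions made for illustration, no pivoting is done, and the matrix is assumed to be diagonally dominant so that no division by zero occurs. Each pass eliminates the odd-numbered unknowns, leaving a tridiagonal system of about half the size; the eliminations within one pass are independent of each other, which is what a parallel version would exploit.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for i = 0..n-1.

    a[0] and c[n-1] are never used. Assumes a diagonally dominant matrix
    (illustrative sketch, no pivoting).
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    # Reduction: build a half-size tridiagonal system on the even unknowns.
    ra, rb, rc, rd = [], [], [], []
    for i in range(0, n, 2):  # iterations are mutually independent
        alpha = a[i] / b[i - 1] if i > 0 else 0.0
        gamma = c[i] / b[i + 1] if i + 1 < n else 0.0
        diag, rhs = b[i], d[i]
        lower = upper = 0.0
        if i > 0:  # eliminate x[i-1] using equation i-1
            lower = -alpha * a[i - 1]
            diag -= alpha * c[i - 1]
            rhs -= alpha * d[i - 1]
        if i + 1 < n:  # eliminate x[i+1] using equation i+1
            upper = -gamma * c[i + 1]
            diag -= gamma * a[i + 1]
            rhs -= gamma * d[i + 1]
        ra.append(lower)
        rb.append(diag)
        rc.append(upper)
        rd.append(rhs)

    x = [0.0] * n
    x[0::2] = cyclic_reduction(ra, rb, rc, rd)

    # Back-substitution: recover the odd unknowns from their even neighbours.
    for i in range(1, n, 2):
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - a[i] * x[i - 1] - right) / b[i]
    return x


# Usage: the (-1, 2, -1) system with this right-hand side has the exact
# solution x = [1, 1, 1, 1, 1]; the computed values should be close to 1.0.
x = cyclic_reduction([0.0, -1.0, -1.0, -1.0, -1.0],
                     [2.0, 2.0, 2.0, 2.0, 2.0],
                     [-1.0, -1.0, -1.0, -1.0, 0.0],
                     [1.0, 0.0, 0.0, 0.0, 1.0])
```

In the block tridiagonal setting discussed in the abstract, the scalar divisions would become solves with the diagonal blocks, which is where the block size starts to influence the cost.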
2.
《International Journal of Parallel, Emergent and Distributed Systems》2012,27(3-4):227-237
This paper is concerned with the solution of block tridiagonal linear systems by the preconditioned conjugate gradient (PCG) method. If we consider a block AGE splitting of the coefficient matrix, it is possible to derive an additive polynomial preconditioner and to give conditions for such a preconditioner to be symmetric positive definite. Numerical experiments on a diffusion problem are carried out on a Cray Y-MP in order to evaluate the effectiveness of the parallel polynomial preconditioner.
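The PCG iteration that the abstract above builds on can be sketched as follows. This is only an illustration of the general method: a simple Jacobi (diagonal) preconditioner is swapped in for the AGE-based polynomial preconditioner, which the abstract does not specify in enough detail to reproduce. The helper names `pcg` and `make_test_problem` and the small test matrix are hypothetical.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def pcg(matvec, b, apply_prec, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradient for a symmetric positive definite system.

    matvec(v) returns A*v; apply_prec(r) returns an approximation of A^{-1}*r
    and must itself be symmetric positive definite.
    """
    x = [0.0] * len(b)
    r = list(b)  # residual for the initial guess x = 0
    z = apply_prec(r)
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol:
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [x_i + alpha * p_i for x_i, p_i in zip(x, p)]
        r = [r_i - alpha * q_i for r_i, q_i in zip(r, Ap)]
        z = apply_prec(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [z_i + beta * p_i for z_i, p_i in zip(z, p)]
        rz = rz_new
    return x


def make_test_problem(n):
    """Hypothetical SPD tridiagonal matrix: diagonal 2+i, off-diagonals -1."""
    diag = [2.0 + i for i in range(n)]

    def matvec(v):
        return [diag[i] * v[i]
                - (v[i - 1] if i > 0 else 0.0)
                - (v[i + 1] if i + 1 < n else 0.0)
                for i in range(n)]

    def jacobi(r):  # stand-in preconditioner: divide by the diagonal
        return [r_i / d_i for r_i, d_i in zip(r, diag)]

    return matvec, jacobi


matvec, jacobi = make_test_problem(10)
x = pcg(matvec, [1.0] * 10, jacobi)
```

The requirement in the abstract that the preconditioner be symmetric positive definite corresponds to `apply_prec` here: the iteration relies on `dot(r, z)` staying positive.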
3.
We construct a parallel algorithm, suitable for distributed memory architectures, of an explicit shock-capturing finite volume method for solving the two-dimensional shallow water equations. The finite volume method is based on the widely used approximate Riemann solver of Roe and is extended to second-order spatial accuracy by an appropriate TVD technique. The parallel code is applied to distributed memory architectures using domain decomposition techniques, and we investigate its performance on a grid computer and on a Distributed Shared Memory supercomputer. The effectiveness of the parallel algorithm is considered for specific benchmark test cases. The performance of the implementation, measured in terms of execution time and speedup factors, demonstrates its efficiency.
4.
A. Gorobets 《Computers & Fluids》2010,39(3):525-801
A code for the direct numerical simulation (DNS) of incompressible flows with one periodic direction has been developed. It provides fairly good performance on both Beowulf clusters and supercomputers. Since the code is fully explicit, from a parallel point of view, the main bottleneck is the Poisson equation. To solve it, a Fourier diagonalization is applied in the periodic direction to decompose the original 3D system into a set of mutually independent 2D systems. Then, different strategies can be used to solve them. In the previous version of the code, which was conceived for low-cost PC clusters with poor network performance, a Direct Schur-complement Decomposition (DSD) algorithm was used to solve them. Such a method, which is very efficient for PC clusters, cannot be used with an arbitrarily large number of processors and mesh sizes, mainly due to the RAM requirements. To overcome this limitation, a new version of the solver is presented in this paper. It is based on the DSD algorithm, which is used as a preconditioner for a Conjugate Gradient method. Numerical experiments showing the scalability and the flexibility of the method on both the MareNostrum supercomputer and a PC cluster with a conventional 100 Mbit/s network are presented and discussed. Finally, illustrative DNS results of an air-filled differentially heated cavity at Ra = 10^11 are also presented.
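The Fourier diagonalization step described in the abstract above can be sketched on a much smaller toy problem. The assumptions here are loud ones: the example is 2D rather than 3D (periodic in the first index, homogeneous Dirichlet in the second, unit grid spacing), so the decoupled problems are 1D tridiagonal systems solved with the Thomas algorithm rather than 2D systems solved with DSD-preconditioned CG. All function names are hypothetical. A discrete Fourier transform along the periodic direction replaces the periodic second difference by its eigenvalue 2cos(2πk/N) − 2, so each wavenumber k can be solved independently.

```python
import cmath
import math


def dft(v, sign):
    """Unnormalized discrete Fourier transform (sign=-1 forward, +1 inverse)."""
    n = len(v)
    return [sum(v[i] * cmath.exp(sign * 2j * cmath.pi * k * i / n)
                for i in range(n)) for k in range(n)]


def thomas(diag, off, rhs):
    """Solve a tridiagonal system with constant diagonal and off-diagonals."""
    n = len(rhs)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = off / diag, rhs[0] / diag
    for j in range(1, n):
        denom = diag - off * cp[j - 1]
        cp[j] = off / denom
        dp[j] = (rhs[j] - off * dp[j - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for j in range(n - 2, -1, -1):
        x[j] = dp[j] - cp[j] * x[j + 1]
    return x


def poisson_one_periodic(f):
    """Solve the 5-point discrete Poisson equation for u given f[i][j].

    Index i is periodic, index j has zero Dirichlet values outside 0..M-1.
    """
    N, M = len(f), len(f[0])
    # Forward transform along the periodic direction, one column at a time.
    f_hat = [[0j] * M for _ in range(N)]
    for j in range(M):
        col = dft([f[i][j] for i in range(N)], -1)
        for k in range(N):
            f_hat[k][j] = col[k]
    # One independent tridiagonal solve per wavenumber k.
    u_hat = []
    for k in range(N):
        lam = 2.0 * math.cos(2.0 * math.pi * k / N) - 2.0
        u_hat.append(thomas(lam - 2.0, 1.0, f_hat[k]))
    # Inverse transform back to physical space.
    u = [[0.0] * M for _ in range(N)]
    for j in range(M):
        col = dft([u_hat[k][j] for k in range(N)], +1)
        for i in range(N):
            u[i][j] = (col[i] / N).real
    return u
```

Because the solves for different k share no data, they can be distributed across processes; a production code would also use an FFT instead of the O(N²) transform written out here.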
5.
Daniel Maurer, Christian Wieners 《Parallel Computing》2011,37(12):742-758
In this work we present a new parallel direct linear solver for matrices resulting from finite element problems. The algorithm follows the nested dissection approach, where the resulting Schur complements are also distributed in parallel. The sparsity structure of the finite element matrices is used to pre-compute an efficient block structure for the LU factors. We demonstrate the performance and the parallel scaling behavior by several test examples.
6.
The one-dimensional Vlasov–Poisson system is considered and a particle method is developed to approximate solutions without compact support which tend to a fixed background of charge as |x| → ∞. Such a system of equations can be used to model kinetic phenomena occurring in plasma physics. A localized particle method is constructed and implemented using the fact that solutions to the Vlasov–Poisson system propagate at finite speeds. Finally, the numerical method is utilized to ascertain information regarding the time asymptotics of the generated electrostatic field.