Similar literature
 Found 20 similar articles; search took 15 ms
1.
A parallel implementation of the finite volume method for three-dimensional, time-dependent, thermal convective flows is presented. The algebraic equations resulting from the finite volume discretization, including a pressure equation which consumes most of the computation time, are solved by a parallel multigrid method. A flexible parallel code has been implemented on the Intel Paragon, the Cray T3D, and the IBM SP2 by using domain decomposition techniques and the MPI communication software. The code can use 1D, 2D, or 3D partitions as required by different geometries, and is easily ported to other parallel systems. Numerical solutions for air (Prandtl number Pr = 0.733) with various Rayleigh numbers up to 10^7 are discussed.

2.
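Parallel computation of impact problems based on a block data structure   Total citations: 1, self-citations: 0, citations by others: 1
For three-dimensional impact problems, OpenMP parallel computation is implemented on shared-memory parallel machines using a block data structure. The block data structure not only makes effective use of the computer's multi-level memory hierarchy but also increases the parallel granularity of OpenMP. Numerical experiments show that, with the block data structure, the computation speed of the serial program can be increased threefold. The speedup and efficiency of the parallel program are discussed through a numerical simulation of a cylinder impacting a plate, which shows that the parallel program effectively reduces the total computation time.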

3.
In a two- or three-dimensional image array, the computation of the Euclidean distance transform (EDT) is an important task. With the increasing application of 3D voxel images, it is useful to consider the distance transform of a 3D digital image array. Because the EDT computation is a global operation, it is prohibitively time consuming when performed for image processing. To make the transform computation efficient, parallelism is employed. We first derive several important geometric relations and properties among parallel planes. We then develop a parallel algorithm for the three-dimensional Euclidean distance transform (3D-EDT) on the EREW PRAM computation model. The time complexity of our parallel algorithm is \(O(\log^2 N)\) for an \(N \times N \times N\) image array, and this is currently the best known result. A generalized parallel algorithm for the 3D-EDT is also proposed. We implement the proposed algorithms sequentially, and their performance exceeds that of the existing algorithms (proposed by Yamada, 1984). Finally, we develop the corresponding parallel programs on both the emulated EREW PRAM model computer and the IBM SP2 to verify the speed-up properties of the proposed algorithms.

4.
The development and validation of a parallel unstructured tetrahedral non-nested multigrid (MG) method for simulation of unsteady 3D incompressible viscous flow is presented. The Navier-Stokes solver is based on the artificial compressibility method (ACM) and a higher-order characteristics-based finite-volume scheme on unstructured MG. Unsteady flow is calculated with an implicit dual time stepping scheme. The parallelization of the solver is achieved by a MG domain decomposition approach (MG-DD), using the Single Program Multiple Data (SPMD) programming paradigm. The Message-Passing Interface (MPI) Library is used for communication of data and loop arrays are decomposed using the OpenMP standard. The parallel codes using single grid and MG are used to simulate steady and unsteady incompressible viscous flows for a 3D lid-driven cavity flow for validation and performance evaluation purposes. The speedups and efficiencies obtained by both the parallel single grid and MG solvers are reasonably good for all test cases, using up to 32 processors on the SGI Origin 3400. The parallel results obtained agree well with those of serial solvers and with numerical solutions obtained by other researchers, as well as experimental measurements.

5.
Numerical experiments on multischeme computation to solve ordinary differential equation initial-value problems have been performed on a multiprocessor computer. A computation network of the schemes schedules the multischeme computation in parallel.

6.
Distance transforms are an important computational tool for the processing of binary images. For an \(n \times n\) image, distance transforms can be computed in time \(\mathcal{O}(n)\) on a mesh-connected computer and in polylogarithmic time on hypercube related structures. We investigate the possibilities of computing distance transforms in polylogarithmic time on the pyramid computer and the mesh of trees. For the pyramid, we obtain a polynomial lower bound using a result by Miller and Stout, so we turn our attention to the mesh of trees. We give a very simple \(\mathcal{O}(\log n)\) algorithm for the distance transform with respect to the \(L_1\)-metric, an \(\mathcal{O}(\log^2 n)\) algorithm for the transform with respect to the \(L_\infty\)-metric, and find that the Euclidean metric is much more difficult. Based on evidence from number theory, we conjecture the impossibility of computing the Euclidean distance transform in polylogarithmic time on a mesh of trees. Instead, we approximate the distance transform up to a given error. This works for any \(L_k\)-metric and takes time \(\mathcal{O}(\log^3 n)\).
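To make the notion of an \(L_1\) distance transform concrete, here is a minimal sequential sketch. It is not the mesh-of-trees algorithm from the abstract: it is the classic two-pass raster scan, shown only to illustrate what is being computed. The function name `l1_distance_transform` and the convention that distances are measured to the nearest 1-pixel are assumptions made for this example.

```python
def l1_distance_transform(image):
    """Return, for each pixel, the L1 (city-block) distance to the nearest 1-pixel.

    Assumption for this sketch: `image` is a non-empty list of equal-length
    rows of 0/1 values, and 1 marks a feature pixel.
    """
    rows, cols = len(image), len(image[0])
    inf = rows + cols  # strictly larger than any L1 distance inside the grid
    dist = [[0 if image[r][c] else inf for c in range(cols)] for r in range(rows)]

    # Forward pass: propagate distances from the top and left neighbours.
    for r in range(rows):
        for c in range(cols):
            if r > 0:
                dist[r][c] = min(dist[r][c], dist[r - 1][c] + 1)
            if c > 0:
                dist[r][c] = min(dist[r][c], dist[r][c - 1] + 1)

    # Backward pass: propagate distances from the bottom and right neighbours.
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            if r < rows - 1:
                dist[r][c] = min(dist[r][c], dist[r + 1][c] + 1)
            if c < cols - 1:
                dist[r][c] = min(dist[r][c], dist[r][c + 1] + 1)
    return dist


if __name__ == "__main__":
    image = [[0, 0, 0],
             [0, 1, 0],
             [0, 0, 0]]
    for row in l1_distance_transform(image):
        print(row)
```

The two passes take time proportional to the number of pixels on a single processor, which is the baseline that the parallel bounds in the abstract improve on.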

7.
Some contour properties can be derived in parallel by a string or cycle of automata in linear time, faster than can be done with a single processor. In particular, the intersection points of two contours, the straightness of a line, the union or intersection of two contours, and polygonal approximations of a contour are computed in linear time.

8.
Distance transforms are an important computational tool for the processing of binary images. For an \(n \times n\) image, distance transforms can be computed in time \(\mathcal{O}(n)\) on a mesh-connected computer and in polylogarithmic time on hypercube related structures. We investigate the possibilities of computing distance transforms in polylogarithmic time on the pyramid computer and the mesh of trees. For the pyramid, we obtain a polynomial lower bound using a result by Miller and Stout, so we turn our attention to the mesh of trees. We give a very simple \(\mathcal{O}(\log n)\) algorithm for the distance transform with respect to the \(L_1\)-metric, an \(\mathcal{O}(\log^2 n)\) algorithm for the transform with respect to the \(L_\infty\)-metric, and find that the Euclidean metric is much more difficult. Based on evidence from number theory, we conjecture the impossibility of computing the Euclidean distance transform in polylogarithmic time on a mesh of trees. Instead, we approximate the distance transform up to a given error. This works for any \(L_k\)-metric and takes time \(\mathcal{O}(\log^3 n)\). This research was supported by the Deutsche Forschungsgemeinschaft under Grant Al 253/1-1, Schwerpunktprogramm Datenstrukturen und effiziente Algorithmen.

9.
Dr. R. L. Voller, Computing, 1989, 42(2-3): 245-258
An algorithm is presented to compute approximations as well as continuous bounds for solutions of weakly nonlinear elliptic boundary value problems. The given problem is majorized in some sense and the resulting new problem is solved by a finite element method. The finite element solution is computed by a monotone iteration process and finally transformed to a continuous (lower) bound for a solution. Convergence is proved and mesh refinement effects are discussed. Two numerical examples are given.

10.
A parallel multischeme computation in the solutions of differential equation initial-value problems has been studied. The mathematical switch of computation history is used successfully in the identification of the best approximation among all available ones at a computing step. A solution correction factor is also developed to achieve an extra four digits in solution accuracy. Based on our results, if the truncation error of computation history as we defined it can be properly utilized, then a computation engaging high-order schemes or using fine grids may be unnecessary.

11.
12.
Parallel computation for two-dimensional convective flows in cavities with adiabatic horizontal boundaries and driven by differential heating of the two vertical end walls is investigated using the Intel Paragon, Intel Touchstone Delta, Cray T3D and IBM SP2. The numerical scheme, including a parallel multigrid solver, and domain decomposition techniques for parallel computing are discussed in detail. Performance comparisons are made for the different parallel systems, and numerical results using various numbers of processors are discussed. © 1997 John Wiley & Sons, Ltd.

13.
We present an overview of the ACE system, a sound and complete parallel implementation of Prolog that exploits parallelism transparently (i.e., without any user intervention) from AI programs and symbolic applications coded in Prolog. ACE simultaneously exploits all the major forms of parallelism – Or-parallelism, Independent And-parallelism, and Dependent And-parallelism – found in Prolog programs. These three varieties of parallelism are discussed in detail, along with the problems encountered in their practical exploitation. Our solutions to these problems, incorporated in the ACE system, are presented. The ACE system has been implemented on Sequent Symmetry and Sun Sparc Multiprocessors; performance results from this implementation for several AI programs are presented, which confirm the effectiveness of the choices made. This revised version was published online in June 2006 with corrections to the Cover Date.

14.
We present a high-order method employing Jacobi polynomial-based shape functions, as an alternative to the typical Legendre polynomial-based shape functions in solid mechanics, for solving dynamic three-dimensional geometrically nonlinear elasticity problems. We demonstrate that the method has an exponential convergence rate spatially and a second-order accuracy temporally for the four classes of problems of linear/geometrically nonlinear elastostatics/elastodynamics. The method is parallelized through domain decomposition and message passing interface (MPI), and is scaled to over 2000 processors with high parallel performance.

15.
A strategy is presented for the solution of the fully nonlinear transient structural dynamics problem in a coarse-grained parallel processing environment. Emphasis is placed on the analysis of three-dimensional framed structures subjected to arbitrary dynamic loading and, in particular, steel building frames subject to earthquake loading. Concerns include long-duration dynamic loading, geometric and material nonlinearity, and the wide distribution of vibrational frequencies found in frame models. The implicit domain decomposition method described employs substructuring techniques and then a preconditioned conjugate gradient algorithm for the iterative solution of the reduced set of unknowns along the substructure interfaces. Substructuring is shown to provide a natural preconditioner for effective parallel iterative solution.
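As a reminder of what a preconditioned conjugate gradient iteration looks like, the following is a minimal dense-matrix sketch. It does not reproduce the substructuring preconditioner or the interface reduction from the abstract: a simple Jacobi (diagonal) preconditioner is swapped in purely for illustration, and the function name `pcg` and its parameters are assumptions made for this example.

```python
def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite matrix A.

    Assumption for this sketch: A is a dense list of lists, and the
    preconditioner is Jacobi, i.e. M^{-1} = diag(A)^{-1}.
    """
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    def precondition(r):
        return [r[i] / A[i][i] for i in range(n)]

    x = [0.0] * n
    r = list(b)              # residual b - A x for the initial guess x = 0
    z = precondition(r)
    p = list(z)              # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break            # residual norm is small enough
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = precondition(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(pcg(A, b))  # close to [1/11, 7/11]
```

In a domain decomposition setting the matrix-vector product and the preconditioner application are the steps that would be distributed across substructures; here both are plain loops.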

16.
In this article, parallel computation of manipulator inverse dynamics is investigated. A hierarchical graph-based mapping approach is devised to analyze the inherent parallelism in the Newton-Euler formulation at several computational levels, and to derive the features of an abstract architecture for exploitation of parallelism. At each level, a parallel algorithm represents the application of a parallel model of computation that transforms the computation into a graph whose structure defines the features of an abstract architecture, i.e., number of processors, communication structure, etc. Data flow analysis is employed to derive the time lower bound in the computation as well as the sequencing of the abstract architecture. The features of the target architecture are defined by optimization of the abstract architecture to exploit maximum parallelism while minimizing various overheads and architectural complexity. An algorithmically specialized, highly parallel, MIMD-SIMD architecture is designed and implemented that is capable of efficient exploitation of parallelism at several computational levels. The computation time of the Newton-Euler formulation for a 6-degree-of-freedom (dof) general manipulator is measured as 187 μs. The increase in computation time for each additional dof is 23 μs, which leads to a computation time of less than 500 μs, even for a 12-dof redundant arm.

17.
A new approach to calculate three-dimensional parabolic flows is presented. The flow field is computed by calculating velocity along a set of streamlines. The dependent variables commonly used in the computation of three-dimensional flows are the three velocity components. In contrast, the dependent variables in the present approach are the streamwise velocity and the two coordinates, in the cross-stream plane, of the chosen streamlines. The streamwise velocity is calculated from the finite difference equations obtained by applying Euler's momentum theorem to streamtubes constructed around the chosen streamlines; the streamline coordinates are calculated from the conservation of mass. Results of the calculations, based on the present approach, are compared with the experimental data for flow through rectangular ducts; the agreement is satisfactory.

18.
The nonlinear magnetostatic fields in a saturable reactor are calculated by the method of magnetic circuits.

19.
With the advent of multicore processors, it has become imperative to write parallel programs if one wishes to exploit the next generation of processors. This paper deals with skyline computation as a case study of parallelizing database operations on multicore architectures. First we parallelize three sequential skyline algorithms, BBS, SFS, and SSkyline, to see if the design principles of sequential skyline computation also extend to parallel skyline computation. Then we develop a new parallel skyline algorithm PSkyline based on the divide-and-conquer strategy. Experimental results show that all the algorithms successfully utilize multiple cores to achieve a reasonable speedup. In particular, PSkyline achieves a speedup approximately proportional to the number of cores when it needs a parallel computation the most.
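For readers unfamiliar with skyline queries, the sketch below shows the dominance test and a simple divide-and-conquer layout: local skylines per block, followed by a merge. It is not the PSkyline algorithm from the abstract, whose details are not given here. The function names, the `num_blocks` parameter, and the convention that smaller values are better in every dimension are all assumptions made for this example, and the blocks are processed sequentially.

```python
def dominates(p, q):
    """True if p is no worse than q in every dimension and strictly better in one.

    Assumption for this sketch: smaller values are better.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Naive quadratic skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def divide_and_conquer_skyline(points, num_blocks=4):
    """Compute a local skyline per block, then the skyline of their union."""
    if not points:
        return []
    size = -(-len(points) // num_blocks)  # ceiling division
    blocks = [points[i:i + size] for i in range(0, len(points), size)]
    # Each local skyline depends only on its own block, so this step is the
    # natural candidate for running on separate cores.
    local = [skyline(block) for block in blocks]
    merged = [p for block in local for p in block]
    return skyline(merged)


if __name__ == "__main__":
    points = [(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)]
    print(divide_and_conquer_skyline(points, num_blocks=2))
```

The merge is correct because dominance is transitive: every point of the global skyline survives in its own block, and every other point is dominated by some global skyline point that also reaches the merge step.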

20.
Cluster architectures are increasingly used to solve high‐performance computing applications. To build more computational power, sets of clusters, interconnected by high‐speed networks, can be combined to form a cluster grid. In this type of architecture, it is difficult to exploit all the internal resources of a cluster, because each one can be shielded by a firewall and is usually configured with machines where there is only one visible IP front‐end node that hides all its internal nodes from the external world. The exploitation of resources is even more complicated if we consider the general case where each internal node of a cluster can be a front‐end node of another cluster. This type of architecture has been defined as a multilayer cluster grid. In this paper, a Parallel Virtual Machine (PVM) extension is presented which, through a middleware solution based on the H2O distributed metacomputing framework, permits the building of a parallel virtual machine in a multilayer cluster grid environment. In addition, the existing code written for PVM can be executed in this environment without modifications. Copyright © 2007 John Wiley & Sons, Ltd.
