Similar literature
 Found 20 similar documents (search time: 734 ms)
1.
The development of a parallel three-dimensional direct simulation Monte Carlo (DSMC) method using unstructured cells is reported. The variable hard sphere molecular model and the no-time-counter method are used for the molecular collision kinetics, while a cell-by-cell ray-tracing technique is implemented for particle movement. The serial code has been verified by comparing the results for a supersonic corner flow with those of Bird’s three-dimensional structured DSMC code. In addition, a benchmark test is performed for an orifice expanding flow to verify the parallel implementation of the DSMC method against available experimental data. Static physical domain decomposition is used to distribute the workload among multiple processors, taking the estimated particle weighting distribution into account. A two-step multi-level graph partitioning technique performs the required domain decomposition. The completed code is then applied to compute a hypersonic flow over a sphere (external flow) and the flow field of a spiral drag pump (internal flow). Results of the former are in good agreement with experimental data and with previous numerical results obtained using an axisymmetric DSMC method. Results of the latter also agree well with previous numerical results.

2.
We describe a parallel lattice-Boltzmann code for efficient simulation of fluid flow in complex geometries. The lattice-Boltzmann model and the structure of the code are discussed. The fluid solver is highly optimized and the resulting computational core is very fast. Furthermore, communication is minimized and the novel topology-aware domain decomposition technique is shown to be very effective for large systems, allowing us to tune code execution in geographically distributed cross-site simulations. The benchmarks presented indicate that very high performance can be achieved.  相似文献   

3.
While parallel computers offer significant computational performance, it is generally necessary to evaluate several programming strategies. Two programming strategies for a fairly common problem, a periodic tridiagonal solver, are developed and evaluated. Simple model calculations as well as timing results are presented to evaluate these strategies. The particular tridiagonal solver evaluated is used in many computational fluid dynamics simulation codes. The feature that makes this algorithm unique is that these simulation codes usually require simultaneous solution for multiple right-hand sides (RHS) of the system of equations. Each RHS solution is independent and thus can be computed in parallel. Consequently, a Gaussian-elimination-type algorithm can be used in a parallel computation, and more complicated approaches such as cyclic reduction are not required. The two strategies are a transpose strategy and a distributed solver strategy. For the transpose strategy, the data are moved so that a subset of all the RHS problems is solved on each of the several processors. This usually requires significant data movement between processor memories across a network. The second strategy attempts to have the algorithm follow the data across processor boundaries in a chained manner. This usually requires significantly less data movement. An approach to accomplish this second strategy in a near-perfect load-balanced manner is developed. In addition, an algorithm is shown that directly transforms a sequential Gaussian-elimination-type algorithm into the parallel, chained, load-balanced algorithm.
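
As a rough illustration of why multiple right-hand sides parallelize naturally, the sketch below applies a Gaussian-elimination-type (Thomas) solver to several RHS vectors. It is a simplification and not the paper's method: the system is assumed non-periodic, everything runs serially, and the function name `thomas_multi_rhs` and its argument layout are hypothetical. The matrix elimination is done once; the loop over RHS vectors has no cross-iteration dependence, so each iteration could in principle be assigned to a different processor.

```python
def thomas_multi_rhs(lower, diag, upper, rhs_columns):
    """Solve a non-periodic tridiagonal system for several right-hand sides.

    lower[i] multiplies x[i-1], diag[i] multiplies x[i], upper[i] multiplies
    x[i+1]. lower[0] and upper[-1] do not affect the result.
    Assumes no pivoting is needed (e.g. a diagonally dominant matrix).
    """
    n = len(diag)

    # Forward elimination of the matrix coefficients: done once, shared by all RHS.
    denom = [0.0] * n
    c_prime = [0.0] * n
    denom[0] = diag[0]
    c_prime[0] = upper[0] / denom[0]
    for i in range(1, n):
        denom[i] = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom[i] if i < n - 1 else 0.0

    solutions = []
    for rhs in rhs_columns:  # independent work items, one per RHS
        # Forward sweep on this RHS.
        d = [0.0] * n
        d[0] = rhs[0] / denom[0]
        for i in range(1, n):
            d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom[i]
        # Back substitution.
        x = [0.0] * n
        x[-1] = d[-1]
        for i in range(n - 2, -1, -1):
            x[i] = d[i] - c_prime[i] * x[i + 1]
        solutions.append(x)
    return solutions


if __name__ == "__main__":
    lower = [0.0, -1.0, -1.0, -1.0]
    diag = [2.0, 2.0, 2.0, 2.0]
    upper = [-1.0, -1.0, -1.0, 0.0]
    print(thomas_multi_rhs(lower, diag, upper, [[0.0, 0.0, 0.0, 5.0], [1.0, 0.0, 0.0, 1.0]]))
```

In the abstract's terms, a transpose strategy would redistribute data so that each processor owns complete RHS problems like the loop iterations here, whereas the distributed strategy would split each sweep along `i` across processors.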

4.
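In parallel computing for numerical reservoir simulation, improving computation speed and resource utilization is an important research direction. This paper presents a holistic optimization method for parallel simulation of multilayer reservoirs in a distributed parallel environment. The method solves the problem in parallel with an efficient domain decomposition method and dynamically selects between two different computational granularities, which effectively overcomes the performance loss caused by load imbalance. Computations on a real model show that this strategy reduces the overall simulation time and achieves a high speedup. The algorithm is applicable to a class of multilayer reservoir model problems.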

5.
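This paper presents parallel computing software for the numerical simulation of hypersonic flows, developed in-house using the MPI message-passing model. The software takes the three-dimensional Navier-Stokes equations as the governing equations for laminar flow problems, discretizes the computational domain with a finite volume method on structured grids, computes the convective fluxes with the AUSMPW+ scheme, obtains higher-order accuracy through MUSCL interpolation, and uses the LU-SGS method for time iteration to accelerate convergence for steady flows. Large-scale parallel computations of several hypersonic flows on high-performance computers show that the developed parallel CFD software has high parallel efficiency and provides an efficient tool for accurately predicting the aerodynamic forces and heating of hypersonic vehicles.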

6.
《Parallel Computing》1997,23(9):1249-1260
A parallel algorithm for direct simulation Monte Carlo calculation of diatomic molecular rarefied gas flows is presented. Reliable simulation of such flows requires an efficient molecular collision model. The collision of N2 molecules is simulated using the molecular dynamics method, with the parameter decomposition method applied for parallel computing. From these results, a statistical collision model for diatomic molecules is constructed. For validation, this model is applied within the direct simulation Monte Carlo method to simulate the energy distribution at equilibrium and the structure of a normal shock wave. Domain decomposition is applied for this DSMC calculation. It is shown that the collision process of diatomic molecules can be calculated precisely and that the parallel algorithm can be implemented efficiently on a parallel computer.

7.
A pseudospectral matrix element (PSME) method, which extends the global pseudospectral method to a multi-element scheme, has been applied to the solution of the incompressible, primitive-variable Navier-Stokes equations for complex geometries with rectilinear or curvilinear boundaries. For a relatively simple complex geometry, a direct solution of the pressure Poisson equation is feasible, while in a much more complex geometry the pressure solution is accomplished by a new implementation of the domain decomposition approach. According to this approach, the computational domain can be divided into a number of overlapping subdomains, where the grid points inside the overlapping area may or may not be located at the same place. Each subdomain can be mapped by an algebraic (or isoparametric) mapping onto a square domain of simpler geometry with patched elements, in which the pressure solution is more easily obtained by an eigenfunction expansion technique for Cartesian-type geometries or by a direct solver for non-Cartesian-type geometries with rectilinear (or curvilinear) boundaries. The complete solution is found with an iterative Schwarz alternating procedure (SAP) between subdomains. The novel features of this approach are that (i) the continuity equation is satisfied everywhere, in the interior (including the inter-element points) and on the boundary; (ii) the global storage size is reduced to local (subdomain) storage, for which parallel computation is easily implemented; (iii) the desired grid points are easily produced without solving any grid-generating equations; and (iv) consistent mass conservation holds at geometrically singular points despite their discontinuous slope (i.e., singular vorticity). Numerical examples of flow over a triangular and a parabolic bump, as well as flow in a bifurcation with a daughter branch entering the main channel at angles of 45° and 90°, are presented in this paper.

8.
The objective of this paper is to describe a grid-efficient parallel implementation of the Aitken–Schwarz waveform relaxation method for the heat equation problem. This new parallel domain decomposition algorithm, introduced by Garbey [M. Garbey, A direct solver for the heat equation with domain decomposition in space and time, in: Springer Ulrich Langer et al. (Ed.), Domain Decomposition in Science and Engineering XVII, vol. 60, 2007, pp. 501–508], generalizes the Aitken-like acceleration method of the additive Schwarz algorithm for elliptic problems. Although the standard Schwarz waveform relaxation algorithm has a linear rate of convergence and low numerical efficiency, it can be easily optimized with respect to cache memory access and it scales well on a parallel system as the number of subdomains increases. The Aitken-like acceleration method transforms the Schwarz algorithm into a direct solver for the parabolic problem when one knows a priori the eigenvectors of the trace transfer operator. A standard example is the linear three dimensional heat equation problem discretized with a seven point scheme on a regular Cartesian grid. The core idea of the method is to postprocess the sequence of interfaces generated by the additive Schwarz wave relaxation solver. The parallel implementation of the domain decomposition algorithm presented here is capable of achieving robustness and scalability in heterogeneous distributed computing environments and it is also naturally fault tolerant. All these features make such a numerical solver ideal for computational grid environments. This paper presents experimental results with a few loosely coupled parallel systems, remotely connected through the internet, located in Europe, Russia and the USA.  相似文献   
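
The idea behind Aitken-like acceleration can be seen in a scalar analogue. The sketch below is an illustration under stated assumptions, not the Aitken–Schwarz solver itself: it assumes a sequence whose error shrinks by exactly the same factor each iteration (purely linear convergence), in which case Aitken's delta-squared formula recovers the limit from three consecutive iterates. The names `aitken_limit` and `linear_iteration`, and the chosen rate and limit, are hypothetical.

```python
def aitken_limit(x0, x1, x2):
    """Aitken delta-squared extrapolation from three consecutive iterates.

    For a sequence with x_{k+1} - x* = r * (x_k - x*), the returned value
    equals the limit x* up to rounding error.
    """
    d1 = x1 - x0
    d2 = x2 - x1
    denom = d2 - d1
    if denom == 0.0:
        # The iterates have stopped changing (or change by a constant step);
        # no extrapolation is possible, so return the latest iterate.
        return x2
    return x0 - d1 * d1 / denom


def linear_iteration(x, rate=0.9, limit=2.0):
    """Toy stand-in for one sweep of a slowly, linearly converging solver."""
    return limit + rate * (x - limit)


if __name__ == "__main__":
    x0 = 10.0
    x1 = linear_iteration(x0)
    x2 = linear_iteration(x1)
    print("plain iterate after two sweeps:", x2)
    print("Aitken extrapolation:", aitken_limit(x0, x1, x2))
```

In the method described above, the analogous step is applied to the sequence of interface values, which is why knowing the eigenvectors of the trace transfer operator in advance matters: in that basis each component behaves like the scalar sequence here.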

9.
A two phase flow CFD model has been developed for 2D spilling breaking wave simulations. A mass conservative level set method similar to Olsson and Kreiss [Olsson E, Kreiss G. A conservative level set method for two phase flow. J Comput Phys 2005;210(1):225–46] is implemented for capturing the air–water interface. The solver is discretised using a finite volume method based on a curvilinear coordinate system. A fully implicit fractional step method is used to advance simulations in time. The solver has been tested and validated by repeating benchmark results of dam breaking simulation and travelling solitary wave simulation. Finally, we employ this solver to simulate spilling breaking waves in the surf zone. Our results show that surface elevations, the location of the breaking point and undertow profiles can generally be well captured. We have also found that temporal and spatial schemes may have significant impacts on computational results.  相似文献   

10.
A general-purpose Parallel Direct Simulation Monte Carlo Code, named PDSC, is used to simulate near-continuum subsonic flow past a 2D vertical plate for studying the vortex-shedding phenomena. An unsteady time-averaging sampling method and a post-processing procedure called DREAM (DSMC Rapid Ensemble Averaging Method) have also been implemented, reducing the overall computational expense and improving the sampling quality of time-dependent flow problems in the rarefied flow regime. Parametric studies, including the temporal variable time step (TVTS) factor, the number of particles per cell, the domain size, and the Reynolds number, have been conducted, obtaining the Strouhal number and various aerodynamic coefficients of the flow. Results are compared to experimental data in the continuum regime available in the literature, demonstrating the capacity of PDSC and DREAM to simulate near-continuum vortex-shedding problems within acceptable computational time.  相似文献   

11.
A hybrid dynamic grid generation technique for two-dimensional (2D) morphing bodies and a block lower-upper symmetric Gauss-Seidel (BLU-SGS) implicit dual-time-stepping method for unsteady incompressible flows are presented for external bio-fluid simulations. To discretize the complicated computational domain around 2D morphing configurations such as fishes and insect/bird wings, the initial grids are generated by a hybrid grid strategy. Body-fitted quadrilateral (quad) grids are generated first near solid bodies. An adaptive Cartesian mesh is then generated to cover the entire computational domain. Cartesian cells that overlap the quad grids are removed from the computational domain, which produces a gap between the quad grids and the adaptive Cartesian grid. Finally, triangular grids are used to fill this gap. During the unsteady movement of morphing bodies, the dynamic grids are generated by coupling an interpolation method based on the ‘Delaunay graph’ with a local remeshing technique. As the bodies move or morph, the grids are first deformed according to the motion of the body boundaries using the ‘Delaunay graph’ interpolation strategy proposed by Liu and Qin. The quality of the deformed grids is then checked. If the grids become too skewed, or even intersect each other, they are regenerated locally. After the local remeshing, the flow solution is interpolated from the old grid to the new one. Based on the hybrid dynamic grid technique, an efficient implicit finite volume solver is also set up to solve the unsteady incompressible flows of external bio-fluid dynamics. The fully implicit equation is solved using a dual-time-stepping approach coupled with the artificial compressibility method (ACM) for incompressible flows. To accelerate convergence in each sub-iteration, a block lower-upper symmetric Gauss-Seidel implicit method is introduced into the solver. The hybrid dynamic grid generator is tested on a group of morphing-body cases, while the implicit unsteady solver is validated on a typical unsteady incompressible flow case; the results demonstrate the accuracy and efficiency of the present solver. Finally, some applications to fish swimming and insect wing flapping are carried out to demonstrate the capability for 2D external bio-fluid simulations.

12.
The neutronic simulation of a nuclear reactor core is performed using the neutron transport equation, and leads to an eigenvalue problem in the steady-state case. Among the deterministic resolution methods, simplified transport (SP$_N$) or diffusion approximations are often used. The MINOS solver developed at CEA Saclay uses a mixed dual finite element method for the resolution of these problems, and has proven efficient. To take the heterogeneities of the geometry into account, a very fine mesh is generally required, which leads to expensive calculations for industrial applications. To take advantage of parallel computers, and to reduce the computing time and the local memory requirement, we propose two domain decomposition methods based on the MINOS solver. The first approach is a component mode synthesis method on overlapping subdomains: several eigenmode solutions of a local problem on each subdomain are taken as basis functions for the resolution of the global problem on the whole domain. The second approach is an iterative method based on a non-overlapping domain decomposition with Robin interface conditions. At each iteration, we solve the problem on each subdomain with the interface conditions given by the solutions on the adjacent subdomains estimated at the previous iteration. Numerical results on parallel computers are presented for the diffusion model on realistic 2D and 3D cores.

13.
《Parallel Computing》2007,33(7-8):541-560
A new parallel code for the simulation of the transient, 3D dispersal of volcanic particles in the atmosphere is presented. The model equations, describing the multiphase flow dynamics of gas and solid pyroclasts ejected from the volcanic vent during explosive eruptions, are solved by a finite-volume discretization scheme and a pressure-based iterative non-linear solver suited to compressible multiphase flows. The solution of the multiphase equation set is computationally so demanding that the simulation of the transient 3D dynamics of eruptive columns would not be cost-effective on a single workstation. The new code has been parallelized by adopting an ad hoc domain partitioning scheme that enforces the load balancing in the presence of a large number of topographic blocking-cells. An optimized communication layer has been built over the Message-Passing Interface. It is shown that the present code has a remarkable efficiency on several high-performance platforms and makes it possible, for the first time, to simulate fully 3D eruptive scenarios on realistic volcano topography.  相似文献   

14.

In this paper, an adjoint solver for the multigrid-in-time software library XBraid is presented. XBraid provides a non-intrusive approach for simulating unsteady dynamics on multiple processors while parallelizing not only in space but also in the time domain (XBraid: Parallel multigrid in time, http://llnl.gov/casc/xbraid). It applies an iterative multigrid reduction in time algorithm to existing spatially parallel classical time propagators and computes the unsteady solution parallel in time. Techniques from Automatic Differentiation are used to develop a consistent discrete adjoint solver which provides sensitivity information of output quantities with respect to design parameter changes. The adjoint code runs backwards through the primal XBraid actions and accumulates gradient information parallel in time. It is highly non-intrusive as existing adjoint time propagators can easily be integrated through the adjoint interface. The adjoint code is validated on advection-dominated flow with periodic upstream boundary condition. It provides similar strong scaling results as the primal XBraid solver and offers great potential for speeding up the overall computational costs for sensitivity analysis using multiple processors.


15.
This article proposes a method to parallelize the generation of fuzzy if-then rules for pattern classification problems in order to reduce the computational time. The proposed method uses general-purpose computation on graphics processing units (GPGPU), implemented in parallel with the compute unified device architecture (CUDA) development environment. CUDA contains a library to perform matrix operations in parallel. In the proposed method, published source codes for matrix multiplication are modified so that the membership values of given training patterns with respect to antecedent fuzzy sets are calculated. A series of computational experiments shows that the computational time is reduced for problems that require high computational effort.

16.
《Parallel Computing》1997,23(13):2041-2065
A parallel diagonally scaled dynamic alternating-direction-implicit (DSDADI) method is shown to be an effective algorithm for solving the 2D and 3D steady-state diffusion equation on large uniform Cartesian grids. Empirical evidence from the parallel solution of large gridsize problems suggests that the computational work done by DSDADI to converge over an $N^d$ grid with continuous diffusivity is of lower order than $O(N^{d+\alpha})$ for any fixed $\alpha > 0$. This is in contrast to the method of diagonally scaled conjugate gradients (DSCG), for which the computational work necessary for convergence is $O(N^{d+1})$. Furthermore, the combination of diagonal scaling, spatial domain decomposition (SDD), and distributed tridiagonal system solution gives the DSDADI algorithm reasonable scalability on distributed-memory multiprocessors such as the CRAY T3D. Finally, an approximate parallel tridiagonal system solver with diminished interprocessor communication exhibits additional utility for DSDADI.

17.
Peigin  S.  Epstein  B.  Rubin  T.  Seror  S. 《The Journal of supercomputing》2004,27(1):49-68
We present a highly scalable parallelization of a high-accuracy 3D serial multiblock Navier-Stokes solver. The code solves the full Navier-Stokes equations and is capable of performing large-scale computations for practical configurations in an industrial environment. The parallelization strategy is based on the geometrical domain decomposition principle and on the concept of overlapping communication with computation. The important advantage of the strategy is that the suggested type of message passing ensures very high scalability of the algorithm from the network point of view because, on average, the communication work per processor does not increase as the number of processors increases. The parallel multiblock-structured Navier-Stokes solver, based on parallel virtual machine (PVM) routines, was implemented on a 106-processor distributed-memory cluster managed by the MOSIX software package. Analysis of the results demonstrated a high level of parallel efficiency (speed up) of the computational algorithm. This reduced the execution time for large-scale computations employing 10 million grid points from an estimated 46 days on the SGI ORIGIN 2000 computer (in serial single-user mode) to 5–6 hours on the 106-processor cluster. Thus, the parallel multiblock full Navier-Stokes code can be used successfully for large-scale practical aerodynamic simulations of a complete aircraft on grids with millions of points on a daily basis, as needed in industry.

18.
A parallel finite element analysis based on a domain decomposition technique (DDT) is considered. In the present DDT, an analysis domain is divided into a number of smaller subdomains without overlap. Finite element analyses of the subdomains are performed under the constraint of both displacement continuity and force equivalence among them. The constraint is satisfied through iterative calculations based on either the Uzawa algorithm or the Conjugate Gradient (CG) method. Owing to the iterative algorithm, a large scale finite element analysis can be divided into a number of smaller ones which can be carried out in parallel.

The DDT is implemented on a parallel computer network composed of a number of 32-bit microprocessors, transputers. The developed parallel calculation system named the ‘FEM server type system’ involves peculiar features such as network independence and dynamic workload balance.

The characteristics of the domain decomposition method such as computational speed and memory requirement are first examined in detail through the finite element calculations of homogeneous or inhomogeneous cracked plate subjected to a tensile load on a single CPU computer.

The ‘speedup’ and ‘performance’ features of the FEM server type system are discussed on a parallel computer system composed of up to 16 transputers, with changing network types and domain decompositions. It is clearly demonstrated that the present parallel computing system requires a much smaller amount of computational memory than the conventional finite element method and also that, due to the feature of dynamic workload balancing, high performance (over 90%) is achieved even in a large scale finite element calculation with irregular domain decomposition.  相似文献   


19.
Parallel implementation of a three-dimensional direct simulation Monte Carlo (DSMC) code employing complex data structures and dynamic memory allocation is detailed for shared memory systems using Open Multi-Processing (OpenMP). Several techniques to optimize the serial implementation of the DSMC method are first discussed. Specifically for a 3-level Cartesian grid, a Cartesian-based movement technique including particle indexing is demonstrated to result in a modest decrease in overall simulation expense of 34% compared with a ray-tracing technique combined with stored cell-connectivity. Two strategies for data localization leading to optimal usage of cache memory are demonstrated to speed up certain cell-based functions (such as collision computations) by a factor of 3.38–4.36. The shared-memory parallel implementation using OpenMP is then described in detail. Synchronization points and related critical sections are identified as major factors that impact the OpenMP parallel performance. Techniques to remove all such synchronization points in the OpenMP implementation of the DSMC method are outlined. For dual-core and quad-core systems, speedups of 1.99 and 3.74, respectively, are obtained for a (free-stream flow) test simulation with low granularity. Finally, the parallel performance of identical source code employing OpenMP is shown to be strongly correlated to the underlying computer architecture. Both Symmetric Multiprocessor (SMP) and non-uniform memory access (NUMA) systems are studied in order to achieve a better understanding of their impacts on parallel scalability when using OpenMP.  相似文献   
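
The reported speedups can be put in perspective with two standard derived quantities. The sketch below is a worked calculation rather than anything from the paper's code: parallel efficiency is speedup divided by core count, and the Karp-Flatt metric estimates the experimentally determined serial fraction from a measured speedup. The function names are hypothetical; the input values 1.99 (dual-core) and 3.74 (quad-core) are the ones quoted above.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / cores


def karp_flatt_serial_fraction(speedup, cores):
    """Experimentally determined serial fraction (Karp-Flatt metric).

    e = (1/S - 1/p) / (1 - 1/p), for measured speedup S on p > 1 cores.
    """
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


if __name__ == "__main__":
    for cores, speedup in [(2, 1.99), (4, 3.74)]:
        eff = parallel_efficiency(speedup, cores)
        frac = karp_flatt_serial_fraction(speedup, cores)
        print(f"{cores} cores: efficiency {eff:.3f}, serial fraction {frac:.4f}")
```

The efficiencies work out to 0.995 on two cores and 0.935 on four, consistent with the abstract's point that removing synchronization points keeps the OpenMP overhead small.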

20.
A novel gray-level image encryption/decryption scheme is proposed, based on the quantum Fourier transform and the double random-phase encoding technique. The main contribution of this work is that the double random-phase encoding technique is generalized to quantum scenarios for the first time. Two phase coding operations, which serve as the encryption keys, are applied in the quantum image spatial domain and in the Fourier transform domain, respectively. The original image can be retrieved successfully only by applying the correct keys. Because all operations in quantum computation must be invertible, decryption is the inverse of the encryption process. A detailed theoretical analysis is given to clarify the scheme's robustness, computational complexity and advantages over its classical counterparts. The work paves the way for introducing more optical information processing techniques into quantum scenarios.
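
For readers unfamiliar with double random-phase encoding, the following is a minimal sketch of the classical technique on a 1D signal, not the quantum scheme of the paper. It assumes a naive discrete Fourier transform in place of an optical or quantum Fourier transform, and all function names are hypothetical. One random phase mask is applied in the spatial domain and a second in the Fourier domain; decryption applies the inverse operations in reverse order.

```python
import cmath
import random


def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse includes the 1/n factor)."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(
            signal[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n)
        )
        out.append(total / n if inverse else total)
    return out


def random_phase_mask(n, rng):
    """Unit-modulus complex numbers with uniformly random phase; acts as a key."""
    return [cmath.exp(2j * cmath.pi * rng.random()) for _ in range(n)]


def encrypt(pixels, mask_spatial, mask_fourier):
    """Phase-mask in the spatial domain, then in the Fourier domain."""
    masked = [p * m for p, m in zip(pixels, mask_spatial)]
    spectrum = dft(masked)
    return dft([s * m for s, m in zip(spectrum, mask_fourier)], inverse=True)


def decrypt(cipher, mask_spatial, mask_fourier):
    """Undo both masks with their complex conjugates, in reverse order."""
    spectrum = dft(cipher)
    unmasked = dft(
        [s * m.conjugate() for s, m in zip(spectrum, mask_fourier)], inverse=True
    )
    return [(u * m.conjugate()).real for u, m in zip(unmasked, mask_spatial)]


if __name__ == "__main__":
    rng = random.Random(0)
    pixels = [0.0, 0.25, 0.5, 1.0, 0.75, 0.1, 0.9, 0.3]  # gray levels in [0, 1]
    key1 = random_phase_mask(len(pixels), rng)
    key2 = random_phase_mask(len(pixels), rng)
    cipher = encrypt(pixels, key1, key2)
    print([round(v, 6) for v in decrypt(cipher, key1, key2)])
```

Because each mask has unit modulus, multiplying by its conjugate is an exact inverse, which mirrors the abstract's remark that decryption is simply the encryption process run backwards.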
