1.

Optimization of multigrid based elliptic solver for large scale simulations in the FLASH code

Christopher Daley Marcos Vanella Anshu Dubey Klaus Weide Elias Balaras 《Concurrency and Computation》2012,24(18):2346-2361

FLASH is a multiphysics, multiscale adaptive mesh refinement (AMR) code originally designed for simulating the reactive flows often found in astrophysics. With its wide user base and flexible application configuration capability, FLASH has the dual task of maintaining scalability and portability in all its solvers. The scalability of the fully explicit solvers in the code is tied very closely to that of the underlying mesh. Others, such as the Poisson solver based on a multigrid method, have more complex scaling behavior. Multigrid methods suffer from processor starvation and dominating communication costs at coarser grids as the number of processors increases. In this paper, we propose a combination of a uniform grid mesh with an AMR mesh, and the merger of two different sets of solvers, to overcome the scalability limitation of the Poisson solver in FLASH. The principal challenge in the proposed merger is the efficiency of the communication algorithm that maps the mesh back and forth between the uniform grid and AMR. We present two different parallel mapping algorithms and discuss results from performance studies of the two implementations. Copyright © 2012 John Wiley & Sons, Ltd.

2.

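Application of SPH-GPU Parallel Computing to Wind-Blown Sand Flow

Liang Lanbo, Jin Afang, Wen Tengteng 《Computer Engineering and Applications》2022,58(1):248-254

To achieve realistic simulation of small-scale wind-blown sand movement, the smoothed particle hydrodynamics (SPH) method, a mesh-free method based on Lagrangian mechanics, is adopted. It avoids the problems that Eulerian grid methods encounter with large mesh deformation or deforming boundaries, and it overcomes the difficulty that a fixed Eulerian grid cannot track the trajectory of an arbitrary individual particle, which gives the method a distinct advantage in the study of wind-blown sand movement. However, as the number of SPH particles in the wind-sand flow increases, the method's low computational efficiency and large computational scale become especially evident. To improve efficiency, a two-dimensional gas-sand two-phase coupled model with SPH-GPU parallel acceleration is built on the CUDA software and hardware platform. First, the hotspots of the serial program are analyzed to find the most time-consuming ones that are suited to parallelization. Second, the GPU parallel computing model is validated: at the macroscopic level, the spatiotemporal evolution of the sand-particle cloud is obtained, and at the microscopic level, typical saltation trajectories of sand grains and anomalous sharp-cornered trajectories are obtained. Finally, the computational efficiency of the CPU and the GPU is compared for three different particle counts. The simulation results show that the SPH-GPU parallel computing method can be further applied to numerical simulation studies of wind-blown sand flow.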

3.

Numerical simulation of three-dimensional thermal convection on the array processor DAP 510

W. Erhard M. Schäfer 《Concurrency and Computation》1992,4(1):19-35

In this paper we deal with the numerical simulation of time dependent three-dimensional thermal convection on the array processor DAP 510. Applying finite differences in combination with a pressure correction method to the underlying non-linear system of partial differential equations, we reduce the numerical solution of the problem to the solution of a sequence of sparse linear systems. Using polynomial preconditioned conjugate gradient methods for the solution of these systems results in a highly parallel algorithm for the simulation of the considered flows on the DAP 510. Using this parallel algorithm, data can be mapped in different ways onto the processor array. Depending on the number of grid points, several methods are shown. Numerical experiments illustrate the capabilities of the proposed algorithm.

4.

Lightweight particle-based real-time fluid simulation for mobile environment

《Simulation Modelling Practice and Theory》2017

This paper presents a real-time lightweight fluid simulation based on a particle fluid technique developed for mobile environments. The Bullet physics engine and a smoothed particle hydrodynamics (SPH) fluid algorithm are used for the lightweight fluid simulation. First, we describe an advanced collision detection mechanism that requires fewer computational resources. Second, we present a simplified SPH algorithm in which nearby particles are grouped together to minimize the number of calculations; decreasing the number of particles is expected to improve computational performance. Finally, an ARM NEON based parallel computing technique is enabled to reduce execution time by lowering the number of arithmetic instructions. Experimental results indicate that the first technique led to a 50% improvement in performance, the second provided a 17% overall improvement, and the third delivered an improvement in the range of 26%–40%. Overall, the proposed techniques provided a cumulative performance improvement of approximately 120% when all were applied.
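
The reported percentages can be cross-checked with a little arithmetic. The sketch below assumes, purely for illustration, that the three improvements act as independent speedup factors that compound multiplicatively; the abstract does not say how the cumulative figure was obtained, and `combined_improvement` is a hypothetical helper.

```python
def combined_improvement(percent_gains):
    """Compound independent percentage gains into one overall percentage gain.

    Assumption (not from the abstract): each gain is a multiplicative
    speedup factor, so 50% means a factor of 1.5.
    """
    factor = 1.0
    for gain in percent_gains:
        factor *= 1.0 + gain / 100.0
    return (factor - 1.0) * 100.0


# Lower and upper ends of the reported 26%-40% range for the third technique.
low = combined_improvement([50, 17, 26])
high = combined_improvement([50, 17, 40])
print(f"combined improvement: {low:.1f}% to {high:.1f}%")
```

Under this assumption the lower end of the range compounds to roughly 121%, which is close to the approximately 120% quoted in the abstract.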

5.

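Efficiency Analysis of Laser Chemical Reaction Simulation on an SMP Cluster

Li Hongjian, Tang Hong, Dou Yusheng, Sun Shixin 《Application Research of Computers》2011,28(4):1232-1234

Based on a semiclassical molecular dynamics model, a two-level parallel simulation system for laser chemical reactions is implemented on an SMP cluster. A coarse-grained atom decomposition algorithm is combined with fine-grained parallel matrix multiplication to parallelize the force computation in the simulation, and the effect of granularity partitioning on the parallel efficiency of semiclassical molecular dynamics simulation is analyzed. Tests on the SMP cluster show that, when 128 processors are used to simulate a molecular system of 500 C atoms, the parallel efficiency reaches 70%. With the number of CPUs fixed, fine-grained parallelism within SMP nodes has a considerable effect on parallel efficiency. The system can simulate laser chemical reactions of large molecular systems, keeps computing resources efficiently used while improving speedup, and meets the needs of laser chemical reaction simulation.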
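
The efficiency figure above can be turned into a speedup using the standard definition of parallel efficiency, efficiency = speedup / processors. The helper name below is hypothetical; the numbers (70% on 128 processors) come from the abstract.

```python
def speedup_from_efficiency(efficiency, processors):
    """Speedup implied by a parallel efficiency on a given processor count.

    Uses the standard definition: efficiency = speedup / processors.
    """
    return efficiency * processors


# 70% efficiency on 128 processors, as reported in the abstract.
print(f"{speedup_from_efficiency(0.70, 128):.1f}x")
```

With these values the implied speedup is about 89.6 over a single processor.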

6.

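Monte Carlo Localization for Mobile Robots Based on Particle Merging Resampling

Li Tiancheng, Sun Shudong, Si Shubin, Wang Junqiang 《Robot》2010,32(5):674-680

A merge Monte Carlo localization (Merge-MCL) method for mobile robots, based on particle merging resampling, is proposed. First, the robot's workspace is partitioned into discrete grid cells to build a grid set. A particle merging technique based on the spatial proximity of particles is then proposed, which adaptively adjusts the size of the particle set while keeping the spatial distribution of the particles reasonable. The proposed particle merging resampling method alleviates the particle weight degeneracy problem and avoids the loss of diversity caused by traditional resampling methods. Simulation results show that particle merging resampling effectively controls the size of the particle set and that the Merge-MCL method is robust and effective.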
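
The grid-based merging idea can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it assumes 2D particles of the form `(x, y, weight)` and merges all particles that fall into the same grid cell into one particle at their weighted mean position, carrying the summed weight. The function name and the merging rule are assumptions.

```python
from collections import defaultdict


def merge_particles(particles, cell_size):
    """Merge particles that share a grid cell into one weighted-mean particle.

    particles: iterable of (x, y, weight) tuples.
    Returns a list of (x, y, weight), one entry per occupied cell.
    """
    # Per-cell accumulators: [sum(w * x), sum(w * y), sum(w)]
    cells = defaultdict(lambda: [0.0, 0.0, 0.0])
    for x, y, w in particles:
        key = (int(x // cell_size), int(y // cell_size))
        acc = cells[key]
        acc[0] += w * x
        acc[1] += w * y
        acc[2] += w

    merged = []
    for sum_wx, sum_wy, sum_w in cells.values():
        if sum_w > 0.0:  # cells holding only zero-weight particles are dropped
            merged.append((sum_wx / sum_w, sum_wy / sum_w, sum_w))
    return merged


particles = [(0.1, 0.1, 0.2), (0.3, 0.1, 0.2), (1.5, 0.5, 0.6)]
for p in merge_particles(particles, cell_size=1.0):
    print(p)
```

Because the particle count after merging is bounded by the number of occupied cells, the set size adapts to how spread out the particles are, and the total weight is preserved.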

7.

Computation and visualization of discrete particle systems on gLite-based grid

Arnas Kačeniauskas Rimantas Kačianauskas 《Advances in Engineering Software》2011,42(5):237-246

Three-dimensional simulation of discrete particle systems is performed by the discrete element method (DEM) software on the gLite-based BalticGrid infrastructure. The performance of a parallel algorithm for particles exchanging processors is investigated by using a number of benchmarks. Polydispersed particle systems are visualized by a novel grid e-service VizLitG designed for convenient access and interactive visualization of remote data files located on the grid. Partial dataset transfer from the storage element is implemented in the visualization e-service. The efficiency tests of VizLitG are performed on the datasets of different sizes. Two granular problems associated with triaxial compaction and hopper discharge are solved.

8.

Real-time Simulation of Gas Based on Smoothed Particle Hydrodynamics

ZHU Xiao-lin FAN Cheng-kai LIU Yang-yang 《Computer Aided Drafting, Design and Manufacturing》2015,(1):68-73

This paper extends the SPH method to gas simulation. SPH (Smoothed Particle Hydrodynamics) is the most popular method of flow simulation and is widely used in large-scale liquid simulation. However, it has not been applied to gas simulation, because SPH-based methods cannot run in real time given the enormous number of particles and the heavy computation involved. This paper proposes a method for gas simulation based on SPH with a small number of particles. First, the method computes the position and density of each particle at each time step and outlines the shape of the simulated gas from those particles. Second, the method uses a grid technique to refine the shape by diffusing particle density under the control of the grid, which yields a more lifelike result. Each grid cell is assigned a density according to the particles in it, and the density determines the final appearance of the cell. To ensure a natural color transition between adjacent cells, density is diffused between them and appropriate values are assigned to the cell vertices. The experimental results show that the proposed method gives better gas simulation and meets real-time requirements.

9.

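Application of Distributed Parallel Computing to Traffic Network Simulation

Gao Linjie, Juan Zhicai, Zhang Weihua 《Application Research of Computers》2007,24(8):251-254

Based on the parallel characteristics of traffic network simulation, a framework for a parallel traffic simulation system is designed using domain decomposition. The traffic network is divided into several subnetworks, and each node of the cluster is responsible for one of them. A network partitioning algorithm based on vehicle-count load is proposed to balance the load across subnetworks, and the communication mechanism between subnetworks is analyzed. The designed parallel simulation system is implemented on an MPI-based parallel computing platform. A case study shows that the proposed parallel algorithm greatly improves the speed and efficiency of traffic network simulation.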

10.

A new parallel block aggregated algorithm for solving Markov chains

Abderezak Touzene 《The Journal of Supercomputing》2012,62(1):573-587

In this paper, we propose a new scalable parallel block aggregated iterative method (PBA) for computing the stationary distribution of a Markov chain. The PBA technique is based on aggregation of groups (blocks) of Markov chain states. Scalability of the PBA algorithm depends on varying the number of blocks, and their size, assigned to each processor. PBA solves the aggregated blocks very efficiently using a modified LU factorization technique. Several Markov chains have been tested to compare the performance of the PBA algorithm with other block techniques such as parallel block Jacobi and block Gauss–Seidel. In all the tested models PBA outperforms the other parallel block methods.

11.

Massively parallel SIMD simulation of markovian DEDS: Event and time synchronous methods

Stephen G. Strickland Robert G. Phelan 《Discrete Event Dynamic Systems》1995,5(2-3):141-166

We examine two schemes for parametric parallel simulation on SIMD supercomputers. In SIMD machines, the parallel processors execute a common instruction stream using local data under the control of a front-end processor. In contrast to most parallel simulation approaches, which simulate a single system using multiple processors, we simulate distinct parametric variants at each processor. We extract some of the common computation embedded in these simulations and perform it on the front-end, leaving the rest to the parallel processors. The first simulation approach, which we call time synchronous, is essentially Vakili's standard clock. This approach generates a uniformized event process on the front-end processor, which is thinned at each back-end processor based on local state information. The second scheme, which we call event synchronous, generates a standard Poisson process on the front-end, which is time-scaled and marked on the back-end processors. We develop a framework for comparing these methods based on their simulated event rate (number of simulated events per real time unit). We show that the time synchronous method can be tuned to optimize the event rate for a given family of systems, and we solve this optimal standard clock problem for several test cases. Finally, we describe implementation issues peculiar to the SIMD architecture. Our focus is primarily on the M/M/1/K queue, but the methods extend to more general Jackson networks.
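
The time synchronous (standard clock) idea can be illustrated sequentially. The sketch below is an assumption-laden toy, not the paper's SIMD implementation: one shared uniformized event stream plays the role of the front-end, and a plain loop over M/M/1/K variants plays the role of the back-end processors, each of which thins the stream using its own rates and queue state. The function name, parameters, and the choice of clock rate are all hypothetical.

```python
import random


def standard_clock(variants, capacity, num_events, seed=0):
    """Simulate several M/M/1/K variants from one shared event stream.

    variants: list of (arrival_rate, service_rate) pairs, one per variant.
    Returns (elapsed_time, queue_lengths, real_event_counts).
    """
    rng = random.Random(seed)
    # Uniformization rate: must dominate every variant's total event rate.
    clock_rate = max(lam + mu for lam, mu in variants)
    queue_lengths = [0] * len(variants)
    real_events = [0] * len(variants)
    now = 0.0

    for _ in range(num_events):
        # "Front-end": one exponential clock tick and one shared uniform mark.
        now += rng.expovariate(clock_rate)
        u = rng.random() * clock_rate

        # "Back-end": each variant interprets the same mark with local data.
        for i, (lam, mu) in enumerate(variants):
            if u < lam:  # arrival, lost if the queue is full
                if queue_lengths[i] < capacity:
                    queue_lengths[i] += 1
                    real_events[i] += 1
            elif u < lam + mu:  # departure, fictitious if the queue is empty
                if queue_lengths[i] > 0:
                    queue_lengths[i] -= 1
                    real_events[i] += 1
            # otherwise: a null event, thinned away for this variant

    return now, queue_lengths, real_events


elapsed, lengths, counts = standard_clock(
    [(0.5, 1.0), (0.9, 1.0), (1.5, 1.0)], capacity=5, num_events=10_000
)
print(elapsed, lengths, counts)
```

Variants whose total rate is well below the clock rate discard many ticks as null events, which is why the choice of clock rate affects the simulated event rate the abstract refers to.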

12.

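Selecting the Optimal Processor Grid for Load-Balance-Independent Parallel Programs

Zhang Yunquan, Shi Weisong 《Journal of Software》2000,11(12):1674-1680

When writing parallel programs, users usually treat the physical processors as a logical processor (process) grid to ease the implementation of algorithms. As the number of processors available to users keeps growing, so does the number of possible grid shapes. How to select, for a message-passing parallel program, a suitable processor grid shape that brings out the potential performance of the parallel machine is a problem in urgent need of a solution. After proposing the minimum-degree communication point set method, which is based on the concept of communication points, the paper analyzes the communication patterns of parallel programs in an attempt to solve the optimal processor grid selection problem for parallel programs that are independent of load balance. By analyzing the communication point set degree of a parallel test program in the ScaLAPACK package, parallel Cholesky (factorization of symmetric positive definite matrices), the method successfully selects the optimal processor grid shape, in agreement with experimental results.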

13.

A parallel monotone iterative method for the numerical solution of multi-dimensional semiconductor Poisson equation

Yiming Li 《Computer Physics Communications》2003,153(3):359-372

Various self-consistent semiconductor device simulation approaches require the solution of the Poisson equation that describes the potential distribution for a specified doping profile (or charge density). In this paper, we solve the multi-dimensional semiconductor nonlinear Poisson equation numerically with the finite volume method and the monotone iterative method on a Linux cluster. Based on the nonlinear property of the Poisson equation, the proposed method converges monotonically for arbitrary initial guesses. Compared with Newton's iterative method, it is easy to implement, relatively robust, and fast, with much less computation time, and its algorithm is inherently parallel in large-scale computing. The presented method has been successfully implemented; the developed parallel nonlinear Poisson solver, tested on a variety of devices, shows good efficiency and robustness. Benchmarks are also included to demonstrate the excellent parallel performance of the method.

14.

Parallelization support for coupled grid applications with small meshes

Lorie M. Liebrock Ken Kennedy 《Concurrency and Computation》1996,8(8):581-615

Composite grid problems arise in important application areas, e.g. reactor simulation. Related physical phenomena are inherently parallel and their simulations are computationally intensive. Unfortunately, parallel languages, such as High Performance Fortran, provide little support for these problems. We illustrate topological connections via a coupling statement, develop a programming style and transformation system to support composite grid code development, and develop an algorithm that automatically determines distributions for composite grid problems with small meshes. A mesh is classified as small if the amount of computational work associated with the mesh is less than the amount of work to be assigned to a single processor. Precompiler transformations, such as cloning for alignment specification, are described. Excerpts from a High Performance Fortran program before and after transformation illustrate user programming style and transformation issues. Our distribution algorithm's alignment and distribution specifications are input to the transformed High Performance Fortran program, which applies the mapping for execution of the simulation code. Some advantages of this approach are: transformations are applied before compilation and allow communication optimization; data distribution may be determined for any number of problems without recompilation; user determined distribution for parallelization is unnecessary; portability is improved. We validate the topology-based data distribution algorithm using a number of reactor configurations. Two random distribution algorithms provide a basis of comparison with measures of load balance and communication cost. Experiments show that the topology-based distribution algorithm almost always obtains load balance at least as good as, and often significantly better than, random algorithms while reducing the total communication per iteration from 50% to as much as a factor of ten.

15.

PLUM: Parallel Load Balancing for Adaptive Unstructured Meshes

Leonid Oliker Rupak Biswas 《Journal of Parallel and Distributed Computing》1998,52(2):75

Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. We present a novel method called PLUM to dynamically balance the processor workloads with a global view. This paper describes the implementation and integration of all major components within our dynamic load balancing strategy for adaptive grid calculations. Mesh adaption, repartitioning, processor assignment, and remapping are critical components of the framework that must be accomplished rapidly and efficiently so as not to cause a significant overhead to the numerical simulation. A data redistribution model is also presented that predicts the remapping cost on the SP2. This model is required to determine whether the gain from a balanced workload distribution offsets the cost of data movement. Results presented in this paper demonstrate that PLUM is an effective dynamic load balancing strategy which remains viable on a large number of processors.

16.

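A Multi-Dimensional Parallelism Identification Method for Nested Loops on Many-Core Processors

Li Yingying, Pang Jianmin, Li Yanbing, Zhai Shengwei 《Application Research of Computers》2018,35(11)

Existing parallelism identification methods have shortcomings when applied to many-core processors: when the loop dimension chosen for parallelization has few iterations, severe load imbalance may result. To address this problem, a multi-dimensional parallelism identification method for many-core processors is proposed. When existing methods cannot achieve good load balance, multiple dimensions of the nested loop are selected for parallelization, and the iteration spaces of these parallel dimensions are merged before tasks are partitioned, which reduces the effect of load imbalance on parallel efficiency. The method has been implemented in the automatic parallelization system developed by the research group, and in practice it improves the parallel execution efficiency of some applications on many-core processors.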
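
The benefit of merging iteration spaces can be seen with a back-of-the-envelope model. The sketch below assumes every iteration costs the same and that iterations are split into even blocks across workers; the loop bounds and core count are hypothetical values chosen for illustration, not figures from the paper.

```python
import math


def max_block_load(iterations, workers):
    """Largest number of iterations any worker gets under an even block split."""
    return math.ceil(iterations / workers)


def ideal_efficiency(iterations, workers):
    """Parallel efficiency when all iterations have equal cost."""
    return iterations / (workers * max_block_load(iterations, workers))


# Hypothetical nest: a 6-iteration outer loop, a 100-iteration inner loop, 64 cores.
outer, inner, workers = 6, 100, 64

one_dim = ideal_efficiency(outer, workers)            # only 6 of 64 cores get work
collapsed = ideal_efficiency(outer * inner, workers)  # 600 merged iterations to share

print(f"outer loop only: {one_dim:.3f}, merged dimensions: {collapsed:.3f}")
```

In this toy setting, parallelizing only the short outer dimension leaves most cores idle, while partitioning the merged 600-iteration space keeps nearly all of them busy.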

17.

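CUDA-Based Modeling and Simulation of Weakly Compressible SPH Fluids

Duan Xingfeng, Ren Hongxiang, Shen Helong 《Computer Engineering & Science》2018,40(8):1375-1382

To achieve real-time, realistic simulation of small-scale fluid scenes, the weakly compressible SPH method is used to model water, and a hybrid CPU-GPU computing architecture for fluid computation is proposed. Because the neighbor particle search algorithm affects the efficiency of fluid computation, the whole simulation domain is divided by a uniform three-dimensional spatial grid, and neighbor search is implemented with a parallel prefix sum and a parallel counting sort. Finally, the fluid surface is extracted with a CUDA-accelerated Marching Cubes algorithm, and environment mapping is used to render the reflection and refraction of the fluid, achieving fluid surface shading. Experimental results show that the proposed fluid modeling and simulation algorithm achieves real-time computation and rendering of small-scale fluids and renders the dynamic effects of water rippling and rolling and of a wooden block bobbing in the water. When the number of particles reaches 1,048,576, the GPU parallel method achieves a speedup of 60.7 over the CPU method.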
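
The grid-based neighbor search can be sketched sequentially to show the data layout. This is a plain-Python illustration of the counting sort and prefix sum steps, not the paper's CUDA code: on a GPU each loop would run in parallel. All function names are hypothetical, and the search assumes the query radius does not exceed the cell size so that the 27 surrounding cells are sufficient.

```python
from itertools import accumulate


def cell_coords(p, cell_size, grid_dim):
    """Integer cell coordinates of point p, clamped to the grid bounds."""
    return tuple(min(max(int(v // cell_size), 0), n - 1) for v, n in zip(p, grid_dim))


def flat_index(c, grid_dim):
    """Row-major linear index of cell c = (cx, cy, cz)."""
    nx, ny, _ = grid_dim
    return (c[2] * ny + c[1]) * nx + c[0]


def build_grid(positions, cell_size, grid_dim):
    """Counting-sort particle indices by cell; returns (starts, order)."""
    cell_ids = [flat_index(cell_coords(p, cell_size, grid_dim), grid_dim) for p in positions]
    counts = [0] * (grid_dim[0] * grid_dim[1] * grid_dim[2])
    for c in cell_ids:
        counts[c] += 1
    # Prefix sum: starts[c] is the offset of cell c's first particle in `order`.
    starts = [0] + list(accumulate(counts))
    cursor = starts[:-1]  # slicing copies, so `starts` itself is not modified
    order = [0] * len(positions)
    for i, c in enumerate(cell_ids):
        order[cursor[c]] = i
        cursor[c] += 1
    return starts, order


def neighbors(i, positions, cell_size, grid_dim, starts, order, radius):
    """Indices of particles within `radius` of particle i (radius <= cell_size)."""
    cx, cy, cz = cell_coords(positions[i], cell_size, grid_dim)
    found = []
    for dz in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                c = (cx + dx, cy + dy, cz + dz)
                if not all(0 <= v < n for v, n in zip(c, grid_dim)):
                    continue  # neighbor cell lies outside the grid
                k = flat_index(c, grid_dim)
                for j in order[starts[k]:starts[k + 1]]:
                    d2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
                    if j != i and d2 <= radius * radius:
                        found.append(j)
    return sorted(found)


positions = [(0.1, 0.1, 0.1), (0.3, 0.1, 0.1), (0.9, 0.9, 0.9), (0.7, 0.1, 0.1)]
starts, order = build_grid(positions, cell_size=0.5, grid_dim=(4, 4, 4))
print(neighbors(1, positions, 0.5, (4, 4, 4), starts, order, radius=0.5))
```

Sorting particles by cell means each cell's particles occupy one contiguous slice of `order`, so a neighbor query touches at most 27 slices instead of scanning every particle.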

18.

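Parallel Acceleration of the HOG Feature Extraction Algorithm on the Sunway Many-Core Processor

Zhao Meiting, Liu Yi, Liu Rui, Song Kaida, Qian Depei 《Computer Engineering & Science》2017,39(4):611-618

HOG is a simple and efficient feature descriptor commonly used for object detection and widely applied in areas such as pedestrian detection, but it faces severe performance challenges when processing massive numbers of images. One solution is to use the processor nodes of the Sunway TaihuLight supercomputer to accelerate pedestrian detection over massive image sets. Two parallel schemes are adopted: in one, a processor processes 4 images simultaneously; in the other, it processes 256 images simultaneously. Extensive serial and parallel experiments show that the first scheme suits parallel processing of multiple high-resolution images, with a speedup of up to 83, while the second suits low-resolution images, with a speedup of up to 95. Both parallel designs scale well across multiple processor nodes of Sunway TaihuLight.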

19.

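MPI+OpenMP Hybrid Parallel Method for Polyhedral Mesh Generation in OpenFoam

Liu Jiang, Liu Wenbo, Zhang Ju 《Computer Science》2022,49(3):3-10

Mesh generation is a very important part of computational fluid dynamics, and the higher mesh accuracy required by large-scale numerical simulation increases the time spent on mesh generation. Based on the mesh generation algorithm in the open-source OpenFoam software, this paper studies the parallel generation of polyhedral meshes and proposes a hybrid OpenMP and MPI parallel method for polyhedral mesh generation. Theoretical analysis shows that, when generating meshes of the same quality, the time consumed by the hybrid parallel method decreases as the number of threads and the number of mesh cells increase. Numerical simulation experiments with three different solvers show that the hybrid parallel method not only preserves the quality of the generated mesh (numerical simulation runs normally and the results are almost identical to those of the original method), but also reduces the time needed to generate a mesh of the same quality and size to within one quarter of the time taken without OpenMP parallelism.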

20.

Vortex methods for incompressible flow simulations on the GPU

Diego Rossinelli Petros Koumoutsakos 《The Visual Computer》2008,24(7-9):699-708

We present a remeshed vortex particle method for incompressible flow simulations on GPUs. The particles are convected in a Lagrangian frame and are periodically reinitialized on a regular grid. The grid is used in addition to solve for the velocity–vorticity Poisson equation and for the computation of the diffusion operators. In the present GPU implementation of particle methods, the remeshing and the solution of the Poisson equation rely on fast and efficient mesh-particle interpolations. We demonstrate that particle remeshing introduces minimal artificial dissipation, enables a faster computation of differential operators on particles over grid-free techniques and can be efficiently implemented on GPUs. The results demonstrate that, contrary to common practice in particle simulations, it is necessary to remesh the (vortex) particle locations in order to solve accurately the equations they discretize, without compromising the speed of the method. The present method leads to simulations of incompressible vortical flows on GPUs with unprecedented accuracy and efficiency.