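Driven by petascale computing capability, numerical software has reached a historic turning point defined by massive parallelism, and scalability and fault tolerance have become the two key technologies for large-scale numerical simulation. petaPar is a next-generation, general-purpose simulation code developed for petascale computing. Its starting point is meshless methods, which complement traditional numerical techniques. Within a unified framework, petaPar implements the two most mature and effective meshless/particle algorithms, smoothed particle hydrodynamics (SPH) and the material point method (MPM), and supports a variety of strength models, failure models, and equations of state. Its MPM implementation includes an improved contact algorithm that can handle the discontinuous deformation and interaction of millions of discrete bodies. The system has the following features: 1) High scalability: computation and communication are fully overlapped even in the extreme case of one patch per core, and dynamic load balancing is supported; 2) Fault tolerance: unattended restart with a different number of processes is supported, so a run need not be aborted when localized hardware faults occur during execution; 3) Adaptability to the trend toward heterogeneous hardware: both flat MPI and MPI+Pthreads parallel models are supported. Full-system scalability tests on the Titan petascale supercomputer show that the code scales linearly to 260,000 CPU cores, with parallel efficiencies of 100% for SPH and 96% for MPM.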
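The abstract does not describe how petaPar's dynamic load balancer assigns patches. As a minimal sketch, assuming a per-patch cost estimate (for example, its particle count) is available, the classic greedy longest-processing-time heuristic below gives each patch to the currently lightest rank. The function name `assign_patches` and the cost model are hypothetical.

```python
import heapq

def assign_patches(patch_costs, n_ranks):
    """Greedy longest-processing-time (LPT) assignment of patches to ranks.

    patch_costs: estimated cost per patch (hypothetical cost model, e.g. particle count).
    Returns (owner, loads): owner[i] is the rank of patch i, loads[r] the total cost on rank r.
    """
    # Min-heap of (current load, rank): the next patch always goes to the lightest rank.
    heap = [(0.0, r) for r in range(n_ranks)]
    heapq.heapify(heap)
    owner = [None] * len(patch_costs)
    loads = [0.0] * n_ranks
    # Placing the most expensive patches first tightens the final balance.
    for i in sorted(range(len(patch_costs)), key=lambda k: patch_costs[k], reverse=True):
        load, r = heapq.heappop(heap)
        owner[i] = r
        loads[r] = load + patch_costs[i]
        heapq.heappush(heap, (loads[r], r))
    return owner, loads

# Example: 7 patches of uneven cost spread over 3 ranks.
owner, loads = assign_patches([9, 7, 6, 5, 4, 3, 2], 3)
```

Re-running the assignment whenever measured costs drift is one simple form of dynamic balancing. Migrating patch data between ranks, the expensive part in practice, is omitted.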

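Application of Hybrid Parallel Techniques to Laser Chemical Reaction Simulation   Total citations: 2, self-citations: 0, citations by others: 2
To make laser chemical reaction simulations more efficient, hybrid parallel techniques and a two-level parallelization scheme are introduced into semiclassical molecular dynamics simulation. A two-level parallel algorithm for laser chemical reactions is designed and implemented on the MPI+OpenMP hybrid model. The upper level uses MPI for atom-decomposition parallelism across nodes, and the lower level uses OpenMP for multithreaded parallel matrix multiplication within each node. Tests on an SMP cluster show parallel efficiencies above 60% when simulating laser chemical reactions of large molecular systems. Hybrid parallel techniques can therefore effectively improve the efficiency of laser chemical reaction simulation.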

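UNIPIC-3D, an in-house three-dimensional parallel fully electromagnetic PIC simulation code, can simulate high-power microwave devices. The code implements parallel 3D FDTD, particle pushing, and boundary-condition handling. It reads an input file that selects one of two domain-partitioning modes, regular or irregular. Both the electromagnetic fields and the particles are parallelized with MPI, with particle and field computation synchronized with communication. The parallel efficiency of the code was measured on a high-performance parallel computer, and comparison with results from the 2.5D UNIPIC code verified the correctness of UNIPIC-3D's parallel module.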
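UNIPIC-3D's 3D field solver is not shown in the abstract. To illustrate the FDTD update it parallelizes, here is a minimal 1D Yee-scheme sketch in normalized units, assuming perfect-electric-conductor walls at both ends and a soft Gaussian source. The grid size, Courant number, and source parameters are illustrative assumptions.

```python
import numpy as np

def fdtd_1d(n_cells=200, n_steps=100, courant=0.5, src=50):
    """1D Yee FDTD in normalized units; perfect electric conductor (E = 0) at both ends.

    ez lives on integer grid points and hy on the staggered half-points between them.
    """
    ez = np.zeros(n_cells)
    hy = np.zeros(n_cells - 1)
    for t in range(n_steps):
        # Magnetic field update from the spatial difference of E.
        hy += courant * (ez[1:] - ez[:-1])
        # Electric field update on interior points; the end points stay 0 (PEC walls).
        ez[1:-1] += courant * (hy[1:] - hy[:-1])
        # Additive ("soft") Gaussian source at one grid point.
        ez[src] += np.exp(-((t - 30) / 8.0) ** 2)
    return ez, hy

ez, hy = fdtd_1d()
```

Each step touches only nearest neighbours, so information spreads at most one cell per step. In a domain-decomposed code, that is why one layer of ghost values per step is enough.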

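Building on the structured-grid part of MuSiC-CCASSIM, a 2D/axisymmetric high-accuracy computational fluid dynamics method for compressible multiphase flow, we designed a domain-decomposition parallelization. For exchanging boundary data between processors, we designed parallel algorithms using both blocking and non-blocking communication, and to reduce communication overhead, we designed an MPI/OpenMP hybrid parallel optimization. Tests were run on the Tianhe-2 supercomputer with a fixed grid size of 625×250 per core, using up to 8,192 cores. The average parallel efficiencies of the MPI/OpenMP hybrid algorithm, the pure-MPI non-blocking algorithm, and the pure-MPI blocking algorithm were 86%, 83%, and 77%, respectively, and all three algorithms scale well.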
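The boundary-data exchange described above can be emulated serially to show what each process needs from its neighbours. The sketch below assumes a 1D block decomposition with one ghost cell per side and a 3-point stencil. It does not reproduce MuSiC-CCASSIM's data layout, and all names are hypothetical. It checks that the decomposed stencil matches the undivided one.

```python
import numpy as np

def split_with_ghosts(u, n_parts):
    """Block-decompose a 1D array; each subdomain gets one ghost cell on each side."""
    return [np.concatenate(([0.0], c, [0.0])) for c in np.array_split(u, n_parts)]

def exchange_ghosts(parts):
    """Fill ghost cells from neighbouring subdomains.

    With MPI this is one send/receive pair per neighbour: blocking (MPI_Sendrecv)
    or non-blocking (MPI_Isend/MPI_Irecv + MPI_Waitall), the latter letting interior
    work overlap the transfer. Outer ghosts hold a zero Dirichlet boundary value.
    """
    last = len(parts) - 1
    for k, p in enumerate(parts):
        p[0] = parts[k - 1][-2] if k > 0 else 0.0
        p[-1] = parts[k + 1][1] if k < last else 0.0

def second_difference(p):
    """3-point stencil on the owned cells of one subdomain (ghosts supply the neighbours)."""
    return p[:-2] - 2.0 * p[1:-1] + p[2:]

u = np.random.default_rng(0).random(20)
parts = split_with_ghosts(u, 4)
exchange_ghosts(parts)
distributed = np.concatenate([second_difference(p) for p in parts])

# Reference result on the undivided array with the same zero boundary values.
padded = np.concatenate(([0.0], u, [0.0]))
reference = padded[:-2] - 2.0 * padded[1:-1] + padded[2:]
```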

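Smoothed Particle Hydrodynamics (SPH) is a meshless Lagrangian particle method. It is widely used in fluid mechanics and for problems involving large deformations and impact loads, and many researchers have worked to improve its speed and accuracy. To address the loss of particle-approximation accuracy near boundaries in existing SPH methods, this paper proposes an improved kernel approximation based on the CSPH and MSPH methods. When approximating the field function and its first and second derivatives, the equations containing second-derivative terms are reformulated. This reduces the number of second-derivative approximations that must be solved and lowers the computational cost relative to MSPH. In addition, a 2D numerical wave tank with a piston-type wavemaker is built on the improved SPH algorithm. Comparing the simulated wave parameters with theoretical values verifies that the improved method simulates wave generation and propagation well. This provides an algorithmic basis for future studies of internal waves, freak waves, and nonlinear wave interactions.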
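The paper's improved formulation is not given in enough detail to reproduce. The sketch below shows only the boundary-deficiency problem and the zeroth-order CSPH normalization it builds on. It assumes a 1D row of equally spaced particles and the standard cubic-spline kernel. A standard SPH sum of a constant field drops near the edge because the kernel support is truncated; dividing by the kernel sum restores it.

```python
import numpy as np

def cubic_spline_1d(r, h):
    """Standard 1D cubic B-spline SPH kernel with support 2h (normalization 2/(3h))."""
    q = np.abs(r) / h
    w = np.where(q < 1.0, 1.0 - 1.5 * q**2 + 0.75 * q**3,
                 np.where(q < 2.0, 0.25 * (2.0 - q) ** 3, 0.0))
    return (2.0 / (3.0 * h)) * w

def sph_and_csph(x, f, vol, h):
    """Standard SPH estimate  sum_j f_j W_ij V_j  and its CSPH (normalized) counterpart."""
    W = cubic_spline_1d(x[:, None] - x[None, :], h)   # W[i, j] = W(x_i - x_j, h)
    sph = W @ (f * vol)
    csph = sph / (W @ vol)   # dividing by the kernel sum restores consistency for constants
    return sph, csph

x = np.linspace(0.0, 1.0, 101)          # particle spacing dx = 0.01
vol = np.full_like(x, 0.01)
sph, csph = sph_and_csph(x, np.ones_like(x), vol, h=1.2 * 0.01)
```

With these settings the plain SPH value at the end particle is roughly 0.78 instead of 1. MSPH-type schemes go further by also correcting derivative approximations, at the cost of solving a small linear system per particle.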

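For the parallel assembly of multi-block structured overset grids, we designed a multi-block structured overset grid framework that supports subdivision of the initial grid system. On this framework, we propose a parallel hole-cutting algorithm based on local hole mapping and a parallel donor-search algorithm for cell-centered grids that can search across blocks. Together they suit the distributed computing environment of large-scale parallel simulation. The algorithms are integrated as modules into CCFD-MGMB, an in-house large-scale multi-block structured grid solver, where they support large-scale parallel unsteady simulations of multi-body separation. Parallel tests show that the algorithms organize local data structures well and scale well with data size. Numerical applications demonstrate the effectiveness and correctness of the algorithms, and the parallel efficiency of unsteady computation on about a thousand cores (relative to 64 cores) reaches 58%.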

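Research and Application of a Multi-Level Parallel Volume Rendering Algorithm   Total citations: 1, self-citations: 0, citations by others: 1
Volume rendering of 3D data fields is an important research direction in scientific visualization. Building on a review of the development and key techniques of volume rendering, this paper focuses on the ray-casting algorithm. For clusters of multi-core processors, it proposes and implements a parallel ray-casting volume rendering algorithm based on a multi-level parallel programming model. The algorithm was applied to a 3D model of shallow urban geology with good visualization results. The performance of ray casting was compared at several problem sizes under pure MPI and under the multi-level MPI+OpenMP model. Experiments and analysis show that the multi-level parallel ray-casting algorithm speeds up volume rendering and that the MPI+OpenMP multi-level model outperforms pure MPI.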
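Ray casting parallelizes well because each pixel's ray is independent: image rows or tiles can be split among MPI processes and again among OpenMP threads. As a minimal sketch of the per-ray kernel such schemes distribute, here is front-to-back alpha compositing with early ray termination. It assumes scalar (grayscale) colors and per-sample opacities that are already classified. The function name and threshold are hypothetical.

```python
def composite_ray(samples, early_exit=0.99):
    """Front-to-back alpha compositing of (color, alpha) samples along one ray.

    samples: iterable of (color, alpha) ordered from the eye outward.
    Returns (color, alpha) accumulated for the pixel.
    """
    acc_c, acc_a = 0.0, 0.0
    for c, a in samples:
        # Each sample contributes only through the still-transparent fraction (1 - acc_a).
        acc_c += (1.0 - acc_a) * a * c
        acc_a += (1.0 - acc_a) * a
        if acc_a >= early_exit:   # early ray termination: later samples are hidden
            break
    return acc_c, acc_a
```

For example, `composite_ray([(1.0, 0.5), (0.0, 0.5)])` returns a color of 0.5 and an opacity of 0.75. An opaque first sample stops the ray immediately.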

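Three-Dimensional Material Point Method Simulation of Impact and Explosion Problems   Total citations: 2, self-citations: 0, citations by others: 2
Motivated by the characteristics of hypervelocity impact and explosion problems simulated with the Material Point Method (MPM), this paper surveys extensions of MPM and its applications. These include the extension of MPM to hypervelocity impact, the Material Point Finite Element Method (MPFEM), hybrid MPFEM, particle-adaptive MPM, a contact algorithm based on local multiple background grids, and parallel MPM. On this basis, MPM3D, a 3D explicit parallel MPM simulation code for impact and explosion problems, was developed. MPM3D is written in C++, has a graphical user interface, PeneBlast, built on Qt and VTK, and runs on Windows, Linux, Mac OS, and other platforms. Numerous examples of hypervelocity impact, penetration, explosion, slope failure, and metal cutting demonstrate the reliability and accuracy of MPM3D. It can serve as an effective design tool for spacecraft debris shielding and for conventional weapon development and protection.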
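The core MPM step shared by codes like MPM3D is the transfer of particle quantities to the background grid. The abstract does not state MPM3D's shape functions. This minimal 1D sketch assumes linear (tent) shape functions on a uniform grid. Because those functions form a partition of unity, grid mass and momentum equal the particle totals.

```python
import numpy as np

def p2g_linear(xp, mp, vp, dx, n_nodes):
    """Particle-to-grid transfer of mass and momentum with 1D linear (tent) shape functions."""
    mass = np.zeros(n_nodes)
    mom = np.zeros(n_nodes)
    for x, m, v in zip(xp, mp, vp):
        i = int(np.floor(x / dx))   # left node of the cell containing the particle
        fx = x / dx - i             # local coordinate in [0, 1)
        for node, w in ((i, 1.0 - fx), (i + 1, fx)):
            mass[node] += w * m
            mom[node] += w * m * v
    return mass, mom

xp = np.array([0.1, 0.3, 0.55, 0.8])
mp = np.array([1.0, 2.0, 3.0, 4.0])
vp = np.array([1.0, -1.0, 0.5, 2.0])
mass, mom = p2g_linear(xp, mp, vp, dx=0.25, n_nodes=5)
# Grid velocity, guarding nodes that received no mass.
v_grid = np.divide(mom, mass, out=np.zeros_like(mom), where=mass > 0)
```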

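To speed up molecular dynamics simulation on symmetric multiprocessing (SMP) clusters, a hybrid MPI+TBB parallel programming model is introduced into parallel molecular dynamics. Based on this model, a hybrid parallel algorithm is designed and implemented in the molecular dynamics package LAMMPS. Across nodes, process-level parallelism uses MPI with spatial decomposition; within nodes, thread-level parallelism uses TBB with critical sections. Tests on an SMP cluster show that for large systems and many nodes the method markedly reduces communication time, raising the speedup by 45% over the pure MPI model. These results show that the MPI+TBB hybrid model noticeably improves the efficiency of parallel molecular dynamics simulation.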

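A Real-Time Fluid Scene Simulation Algorithm on the GPU   Total citations: 2, self-citations: 0, citations by others: 2
To simulate realistic large-scale fluid scenes in real time, an algorithm based on smoothed particle hydrodynamics (SPH) is proposed. First, a new level-of-detail function is introduced as the basis for non-uniform sampling, reducing the number of particles needed and speeding up the simulation. Next, a 3D spatial grid partitioning algorithm and an improved parallel radix sort are introduced to accelerate the search for neighbor particles and boundaries and the computation of their interactions. Finally, the latest NVIDIA CUDA architecture is used to assign all SPH computations to the GPU stream processors. This exploits the GPU's high parallelism and programmability so that SPH fluid computation runs in real time. Experiments show that the algorithm simulates fluid scenes in real time with fairly realistic results. Its speedup over existing CPU-based SPH fluid simulation exceeds two orders of magnitude, and it reproduces richer detail than existing GPU SPH methods.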
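The grid-plus-sort neighbor search described above can be sketched on the CPU. This minimal version assumes a uniform grid whose cell size equals the interaction radius h, so all neighbors lie in the 27 surrounding cells. `np.argsort` stands in for the paper's improved parallel radix sort, and the serial outer loop stands in for one GPU thread per particle. The function name is hypothetical.

```python
import itertools
import numpy as np

def neighbor_pairs_grid(pos, h):
    """All pairs (i < j) closer than h, found via a uniform grid with cell size h.

    Particles are sorted by linear cell index so each cell's particles are contiguous;
    np.argsort stands in for the GPU radix sort, and the outer loop for one GPU thread per particle.
    """
    cell = np.floor(pos / h).astype(np.int64)
    cell -= cell.min(axis=0)                        # shift cell coordinates to start at 0
    dims = cell.max(axis=0) + 1
    key = (cell[:, 0] * dims[1] + cell[:, 1]) * dims[2] + cell[:, 2]
    order = np.argsort(key, kind="stable")
    sorted_key = key[order]
    pairs = set()
    for i in range(len(pos)):
        for off in itertools.product((-1, 0, 1), repeat=3):
            c = cell[i] + off
            if np.any(c < 0) or np.any(c >= dims):
                continue
            k = (c[0] * dims[1] + c[1]) * dims[2] + c[2]
            # Contiguous run of particles in cell k ("cell start/end" arrays on a GPU).
            lo = np.searchsorted(sorted_key, k, side="left")
            hi = np.searchsorted(sorted_key, k, side="right")
            for j in order[lo:hi]:
                if j > i and np.sum((pos[i] - pos[j]) ** 2) < h * h:
                    pairs.add((i, int(j)))
    return pairs
```

Sorting by cell key is what makes the approach GPU-friendly: particles in the same cell end up adjacent in memory, so neighbor loads are coalesced.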

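This paper analyzes the parallelism available in solving the multigroup particle transport Sn equations on unstructured meshes and, to match the characteristics of multi-core cluster systems, designs a hybrid MPI/OpenMP program. The spatial mesh is partitioned by domain decomposition, with MPI message passing between compute nodes. Whenever an MPI process reaches the loop over energy groups, it spawns multiple OpenMP threads, so that the energy groups are computed in parallel within a node. Numerical tests show that hybrid parallel computation of particle transport on unstructured meshes maps well onto the hardware structure of multi-core clusters and scales well, up to 1,024 CPU cores.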

We present a scalable dissipative particle dynamics simulation code, fully implemented on Graphics Processing Units (GPUs) using a hybrid CUDA/MPI programming model, which achieves a 10–30 times speedup on a single GPU over 16 CPU cores and almost linear weak scaling across a thousand nodes. A unified framework is developed that addresses both efficient neighbor-list generation and particle data locality. Our algorithm generates strictly ordered neighbor lists in parallel, and the construction is deterministic and uses neither atomic operations nor sorting. Combined with a two-level particle reordering scheme, these neighbor lists yield optimal data-loading efficiency. A faster in situ scheme for generating Gaussian random numbers is proposed using precomputed binary signatures. We designed custom transcendental functions that are fast and accurate for evaluating the pairwise interaction. The correctness and accuracy of the code are verified through test cases simulating Poiseuille flow and spontaneous vesicle formation. Benchmarks demonstrate the speedup of our implementation over the CPU implementation as well as its strong and weak scalability. A large-scale simulation of spontaneous vesicle formation with 128 million particles further illustrates the practicality of the code for real-world applications.
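The pairwise interaction such a code evaluates is, in standard (Groot-Warren) DPD, the sum of conservative, dissipative, and random forces. The sketch below assumes that standard form, with w_D = w_R² and σ² = 2γk_BT. It does not reproduce the paper's custom transcendental functions or random-number scheme. The default values a = 25 and γ = 4.5 are common choices in the DPD literature, used here as assumptions. Sharing the random number θ between i and j keeps the forces antisymmetric.

```python
import numpy as np

def dpd_pair_force(ri, rj, vi, vj, a=25.0, gamma=4.5, kT=1.0, rc=1.0, dt=0.01, theta=0.0):
    """Force on particle i from particle j in standard (Groot-Warren) DPD.

    theta: the pair's Gaussian random number, shared by i and j so forces stay antisymmetric.
    """
    rij = ri - rj
    r = np.linalg.norm(rij)
    if r >= rc or r == 0.0:
        return np.zeros_like(rij)
    e = rij / r
    wr = 1.0 - r / rc                  # weight for the conservative and random forces
    sigma = np.sqrt(2.0 * gamma * kT)  # fluctuation-dissipation: sigma^2 = 2 gamma kT, w_D = w_R^2
    fc = a * wr
    fd = -gamma * wr**2 * np.dot(e, vi - vj)
    fr = sigma * wr * theta / np.sqrt(dt)
    return (fc + fd + fr) * e
```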

Gadget is a simulation application for N-body and smoothed particle hydrodynamics problems in cosmology and is widely used to solve a range of cosmological problems. N-body simulation follows the motion of N mutually interacting particles, and smoothed particle hydrodynamics is a fluid simulation algorithm that models fluid flow with particles. Most researchers have focused on accelerating Gadget on multi-core CPU or graphics processing unit (GPU) platforms. However, this work has not achieved hybrid CPU–GPU computing, which leaves a large share of CPU computing resources unused. In this paper, we propose a CPU–GPU hybrid parallel strategy to accelerate Gadget-2, a massively parallel structure formation code for cosmological simulations. The strategy uses both the CPU and the GPU to compute the short-range force. To balance the CPU and GPU workloads, we propose a dynamic task allocation scheme based on the performance difference between the CPU and the GPU. For particle counts on the order of millions, our strategy achieved an overall speedup of 18.6, and a speedup of 28.35 for the short-range force calculation, compared with a single-core CPU implementation. Moreover, compared with a GPU platform with 12 CPU cores and one GPU, our hybrid strategy improved overall and short-range-force performance by 6% and 20%, respectively. The strategy also scales well: its performance improves as the problem size increases. It has one limitation, however: the performance gain shrinks as the ratio of CPU cores to GPU cards decreases. Finally, our hybrid strategy improved CPU utilization by 17.14% or more. Copyright © 2013 John Wiley & Sons, Ltd.
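The abstract states only that tasks are allocated according to the CPU/GPU performance difference. A minimal sketch of one such policy follows. It assumes the work (e.g. short-range force particles) is split in proportion to throughput measured in the previous step, so both devices finish at about the same time. The function names and the timing model are hypothetical.

```python
def split_by_throughput(n_items, cpu_rate, gpu_rate):
    """Split n_items between CPU and GPU in proportion to throughput (items per second)."""
    gpu_share = gpu_rate / (cpu_rate + gpu_rate)
    n_gpu = round(n_items * gpu_share)
    return n_items - n_gpu, n_gpu

def rebalance(n_items, n_cpu_prev, t_cpu, n_gpu_prev, t_gpu):
    """Re-split using throughputs measured in the previous step (items done / seconds taken)."""
    return split_by_throughput(n_items, n_cpu_prev / t_cpu, n_gpu_prev / t_gpu)
```

For example, if an even 500/500 split took 2.0 s on the CPU and 0.5 s on the GPU, `rebalance(1000, 500, 2.0, 500, 0.5)` moves to a 200/800 split.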

The material point method (MPM) has attracted increasing attention from the graphics community because it combines the strengths of particle- and grid-based solvers. Like the smoothed particle hydrodynamics (SPH) scheme, MPM uses particles to discretize the simulation domain and represent the fundamental unknowns. This makes it insensitive to geometric and topological changes and readily parallelizable on a GPU. Like grid-based solvers, MPM uses a background mesh to compute spatial derivatives, giving more accurate and more stable results than a purely particle-based scheme. MPM has been very successful in simulating both fluid flow and solid deformation, but less so with multiple interacting fluids and solids, where dynamic fluid-solid interaction poses a major challenge. To address this shortcoming, we propose a new set of mathematical and computational schemes that enable efficient and robust fluid-solid interaction within the MPM framework. These versatile schemes support simulation of both multiphase flow and fully coupled solid-fluid systems. A series of examples demonstrates their capabilities and performance with various interacting fluids and solids, including multiphase flow, fluid-solid interaction, and dissolution.

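Wang Dongdong, Zhuang Lei. Journal of Computer Applications, 2009, 29(6): 1702-1710
Fire is simulated as a fluid with SPH, a particle-interpolation method. The GPU accelerates the computation of particle states, while the CPU computes particle neighbor relations in parallel and controls the particle emission rate. Vorticity-field computation is added to the SPH model at modest cost, adding detail to the particle motion. For rendering, a chromaticity field, directional point spreading, and color sharpening turn the discrete spatial distribution of particles into a reasonably convincing continuous flame image. Because the method is a Lagrangian approach to fluid simulation, the flames are physically plausible, and because it uses an architecture in which the GPU does most of the work with the CPU assisting, the simulation runs in real time.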

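Current high-performance computer architectures are diverse, which poses a great challenge to the development of parallel application software. The domain-specific language OPS is used to parallelize HNSC, a high-order accurate computational fluid dynamics code, for multiple platforms. The code is refactored with the OPS API, and the OPS front and back ends automatically generate pure MPI, OpenMP, MPI+OpenMP, and MPI+CUDA executables. On a system equipped with two I...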

Li Biao, Liu Jie. Computer Engineering & Science, 2020, 42(11): 1922-1928
Particle transport simulation plays an important role in nuclear science and in medical radiation therapy. Based on the Monte Carlo method, this paper proposes a heterogeneous cooperative particle transport algorithm for the Tianhe-2A system. Using the asynchronous communication modes of Tianhe-2A (BCL and ACL), a simple and efficient symmetric communication mode between the CPU and the Matrix2000 accelerator is proposed. On the Matrix2000 accelerator, thread-level parallelism is exploited through OpenMP directives. The original serial data-gathering communication is replaced by a new communication pattern based on a binary tree, which greatly reduces communication time. On Tianhe-2A, the parallel program based on CPU/Matrix2000 heterogeneous cooperative computing scales to 450k cores, with a parallel efficiency of 22.54% relative to 50k cores.
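The paper does not specify its tree layout. As a minimal sketch of why a binary-tree pattern beats serial gathering, the function below emulates a standard tree reduction across P ranks. In round s, each rank whose index is a multiple of 2^(s+1) receives the partial result of the rank 2^s above it. The root then finishes after ⌈log₂ P⌉ rounds instead of P − 1 sequential receives. The function name is hypothetical.

```python
def tree_reduce(values):
    """Emulate a binary-tree reduction across len(values) ranks.

    Returns (total held by rank 0, number of communication rounds).
    """
    vals = list(values)
    n = len(vals)
    step, rounds = 1, 0
    while step < n:
        # All receives within one round are independent and can proceed concurrently.
        for r in range(0, n, 2 * step):
            if r + step < n:
                vals[r] += vals[r + step]   # rank r "receives" its partner's partial sum
        step *= 2
        rounds += 1
    return vals[0], rounds
```

For 10 ranks holding the values 0 to 9, `tree_reduce(range(10))` gives the total 45 in 4 rounds, where a serial gather at the root needs 9 receives.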

We propose a novel compression scheme for storing neighbour lists in iterative solvers that employ Smoothed Particle Hydrodynamics (SPH). The scheme is inspired by Stream VByte but uses a non-linear mapping from data to data bytes, yielding memory savings of up to 87%. It is part of a novel variant of the Cell-Linked-List (CLL) concept, inspired by compact hashing, with improved processing of the cell-particle relations. We show that the resulting neighbour search outperforms compact hashing in both speed and memory consumption. Divergence-Free SPH (DFSPH) scenarios with up to 1.3 billion SPH particles can be processed on a 24-core PC using 172 GB of memory. Scenes with more than 7 billion SPH particles can be processed in a Message Passing Interface (MPI) environment with 112 cores and 880 GB of RAM. The neighbour search is also useful for interactive applications: a DFSPH simulation step for up to 0.2 million particles can be computed in less than 40 ms on a 12-core PC.
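The paper's non-linear byte mapping is not described in enough detail to reproduce. The sketch below shows plain Stream VByte with the standard linear mapping (1 to 4 bytes per value), applied to delta-encoded neighbour indices. It assumes the indices are sorted and fit in 32 bits. The bit order within each control byte follows this sketch's own convention, and the function names are hypothetical.

```python
def svb_encode(values):
    """Stream VByte encoding of non-negative 32-bit integers.

    Control stream: 2 bits per value = (byte length - 1), four values per control byte.
    Data stream: the little-endian bytes of each value, using only as many bytes as needed.
    """
    control = bytearray((len(values) + 3) // 4)
    data = bytearray()
    for k, v in enumerate(values):
        n = max(1, (v.bit_length() + 7) // 8)       # 1..4 bytes
        control[k // 4] |= (n - 1) << (2 * (k % 4))
        data += v.to_bytes(n, "little")
    return bytes(control), bytes(data)

def svb_decode(control, data, count):
    out, pos = [], 0
    for k in range(count):
        n = ((control[k // 4] >> (2 * (k % 4))) & 3) + 1
        out.append(int.from_bytes(data[pos:pos + n], "little"))
        pos += n
    return out

def delta(sorted_ids):
    """Gaps between consecutive sorted neighbour indices are small, so they compress well."""
    return [b - a for a, b in zip([0] + sorted_ids[:-1], sorted_ids)]

def undelta(gaps):
    out, total = [], 0
    for g in gaps:
        total += g
        out.append(total)
    return out
```

Five neighbour indices such as `[1000, 1003, 1010, 1200, 70000]` shrink from 20 raw bytes to 10 bytes (2 control plus 8 data). Separating control and data streams is what lets Stream VByte decode with SIMD shuffles.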

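To achieve real-time, realistic simulation of small-scale fluid scenes, water is modeled with the weakly compressible SPH method, and a hybrid CPU-GPU architecture for fluid computation is proposed. Because neighbor-particle search limits computational efficiency, the whole simulation domain is partitioned with a uniform 3D grid, and neighbor search is implemented with a parallel prefix sum and parallel counting sort. Finally, a CUDA-accelerated Marching Cubes algorithm extracts the fluid surface, and environment mapping renders reflection and refraction to shade it. Experiments show that the proposed modeling and simulation method computes and renders small-scale fluids in real time, reproducing water waves, rolling, and a wooden block bobbing in the water. With 1,048,576 particles, the GPU parallel method achieves a speedup of 60.7 over the CPU method.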
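The prefix-sum-plus-counting-sort step above groups particles by grid cell so that each cell's particles are contiguous. A minimal serial sketch of both steps follows, assuming integer cell ids are already computed. The scan uses the work-efficient Blelloch up-sweep/down-sweep scheme, whose inner loops are independent across i and would map to one GPU thread per element. The function names are hypothetical.

```python
def exclusive_scan(counts):
    """Blelloch-style exclusive prefix sum (up-sweep, then down-sweep)."""
    n = 1
    while n < len(counts):
        n *= 2
    a = list(counts) + [0] * (n - len(counts))   # pad to a power of two
    d = 1
    while d < n:                                  # up-sweep: build partial sums in place
        for i in range(2 * d - 1, n, 2 * d):
            a[i] += a[i - d]
        d *= 2
    a[n - 1] = 0
    d = n // 2
    while d >= 1:                                 # down-sweep: distribute prefixes
        for i in range(2 * d - 1, n, 2 * d):
            a[i - d], a[i] = a[i], a[i] + a[i - d]
        d //= 2
    return a[:len(counts)]

def counting_sort_by_cell(cell_ids, n_cells):
    """Return (order, start, counts): particle indices grouped by cell, and per-cell offsets."""
    counts = [0] * n_cells
    for c in cell_ids:
        counts[c] += 1
    start = exclusive_scan(counts)            # first slot of each cell in the sorted array
    cursor = list(start)
    order = [0] * len(cell_ids)
    for p, c in enumerate(cell_ids):          # a GPU version would typically use atomic increments here
        order[cursor[c]] = p
        cursor[c] += 1
    return order, start, counts
```

Particles of cell c then occupy `order[start[c] : start[c] + counts[c]]`, which is the lookup a neighbor search performs for each of the surrounding cells.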

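To realistically simulate small-scale wind-blown sand movement, the meshless Lagrangian smoothed particle hydrodynamics (SPH) method is adopted. It avoids the problems that Eulerian grid methods suffer from large mesh distortion or deforming boundaries, and it overcomes the difficulty of tracking individual particle trajectories on a fixed Eulerian grid, which gives it a distinct advantage for studying wind-blown sand. However, as the number of SPH particles in the sand flow grows, the method's low efficiency and large computational cost become especially apparent. To improve efficiency, a GPU-accelerated 2D gas-sand two-phase coupled SPH model was built on the CUDA platform. First, the serial code was profiled to find the most time-consuming hotspots suitable for parallelization. Next, the GPU parallel model was validated, yielding the spatiotemporal evolution of the sand-grain cloud at the macroscopic scale and, at the microscopic scale, the saltation trajectories of typical grains and anomalous sharp-cornered trajectories. Finally, CPU and GPU efficiency were compared for three particle counts. The results show that the SPH-GPU parallel method can be further applied in numerical studies of wind-blown sand flow.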
