Similar Literature
 Found 20 similar documents (search time: 125 ms)
1.
A research and development effort on parallel CFD solver software based on viscous Cartesian unstructured grids is presented. The work falls into two parts: grid partitioning and parallel implementation of the solver. This paper describes the parallel implementation of the CFD solver, gives the operational workflow of the parallelization process, and discusses several key issues. Parallel results show that the parallelization approach and methods adopted by the project are effective and the computed results are reliable.

2.
A parallel multigrid solver for unstructured grids (total citations: 2; self-citations: 0; citations by others: 2)
李宗哲, 王正华, 姚路, 曹维. 《软件学报》, 2013, 24(2): 391-404
As an efficient solver for unstructured grids, the multigrid method has excellent time and space characteristics in both its serial and parallel implementations. Taking the discretization of the governing equations as the starting point, this paper describes the workflow of parallel numerical simulation on unstructured grids and notes that multigrid methods are mainly used to solve the large-scale algebraic systems produced by time-marching schemes; it outlines the basic structure of the algorithm and analyzes why it is efficient. It then surveys research trends in geometric and algebraic multigrid, with emphasis on hot topics in their parallelization. For practical unstructured-grid applications, it summarizes the smoothing operators adopted by multigrid solvers, lists some open-source software projects for unstructured grids with brief notes on their functionality, and finally identifies several key problems and future research directions for parallel multigrid solvers on unstructured grids.
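As an illustration of the smoothing operators the survey discusses, below is a minimal sketch of one common smoother, a damped Jacobi sweep for a matrix stored in CSR format (the storage typically used for unstructured-grid systems). All names (rowOff, col, val, diag) are hypothetical, not code from any surveyed solver.

```cuda
// Hedged sketch: one damped Jacobi smoothing sweep for A x = b, A in CSR.
// x_new = x + omega * D^{-1} (b - A x); omega around 2/3 is the usual choice.
#include <cuda_runtime.h>

__global__ void jacobi_smooth(const int* rowOff, const int* col,
                              const double* val, const double* diag,
                              const double* b, const double* x,
                              double* x_new, int n, double omega)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    double Ax = 0.0;
    for (int k = rowOff[i]; k < rowOff[i + 1]; ++k)   // row i of A times x
        Ax += val[k] * x[col[k]];
    x_new[i] = x[i] + omega * (b[i] - Ax) / diag[i];  // damped Jacobi update
}
```

One such kernel launch per level and per pre- or post-smoothing step is the building block a parallel V-cycle would call.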

3.
邓亮, 徐传福, 刘巍, 张理论. 《计算机应用》, 2013, 33(10): 2783-2786
The alternating direction implicit (ADI) scheme is one of the most common discretization schemes for partial differential equations, yet little GPU parallelization work has been done on ADI schemes in practical computational fluid dynamics (CFD) applications. Starting from a finite-volume CFD application, and based on an analysis of the characteristics and computational workflow of the ADI solver, two classes of fine-grained GPU parallel algorithms, one based on grid points and one based on grid lines, were designed under the Compute Unified Device Architecture (CUDA) programming model, and several performance optimizations were discussed. On the Tianhe-1A system, for a single-block structured-grid case of 128×128×128 cells, the GPU parallelization of the inviscid terms, the viscous terms, and the ADI iteration achieved speedups of 100.1x, 40.1x, and 10.3x respectively over a single CPU core; the overall ADI CFD solver achieved a GPU speedup of 17.3x.
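A minimal sketch of the grid-line-based parallelism described above (not the authors' code): one CUDA thread is assigned to each grid line and runs the serial Thomas algorithm along its tridiagonal system. Line length N, line count NLINES, and the contiguous storage layout are hypothetical.

```cuda
// Hedged sketch of line-parallel ADI sweeps: each thread solves one
// tridiagonal line with the Thomas algorithm. a, b, c are the sub-, main-
// and super-diagonals; d holds the RHS and is overwritten by the solution.
#include <cuda_runtime.h>

#define N      128   // points per grid line (hypothetical)
#define NLINES 128   // number of independent lines (hypothetical)

__global__ void thomas_per_line(const double* a, const double* b,
                                const double* c, double* d)
{
    int line = blockIdx.x * blockDim.x + threadIdx.x;
    if (line >= NLINES) return;
    int o = line * N;                       // offset of this line

    double cp[N], dp[N];                    // per-thread scratch (local memory)
    cp[0] = c[o] / b[o];
    dp[0] = d[o] / b[o];
    for (int i = 1; i < N; ++i) {           // forward elimination
        double m = b[o + i] - a[o + i] * cp[i - 1];
        cp[i] = c[o + i] / m;
        dp[i] = (d[o + i] - a[o + i] * dp[i - 1]) / m;
    }
    d[o + N - 1] = dp[N - 1];
    for (int i = N - 2; i >= 0; --i)        // back substitution
        d[o + i] = dp[i] - cp[i] * d[o + i + 1];
}
```

The mutual independence of the lines is exactly what makes this formulation fine-grained enough for the GPU.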

4.
Differential-domain mesh deformation methods preserve the local detail features of a mesh model well, but their computation is time-consuming. Exploiting the high-speed parallel performance of the GPU, a GPU-based differential-domain mesh deformation algorithm was designed and implemented. The GPU computes the differential coordinates of the mesh, the Cholesky factorization of the linear system's coefficient matrix, and the solution of the linear system, so that the encoding and decoding of local mesh details as well as the rendering of the deformation result are carried out entirely on the GPU. Experimental results show that the algorithm effectively accelerates both the computation and the rendering of differential-domain mesh deformation.
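A hedged sketch of the encoding step: computing uniform-weight differential (Laplacian) coordinates per vertex, delta_i = v_i minus the mean of its 1-ring neighbors. The CSR-style adjacency arrays nbrOff/nbrIdx are hypothetical, and the paper's actual weighting scheme may differ.

```cuda
// Sketch: uniform-weight differential coordinates on the GPU, one thread
// per vertex. Vertex adjacency is stored CSR-style (hypothetical layout).
#include <cuda_runtime.h>

__global__ void laplacian_coords(const float3* v, const int* nbrOff,
                                 const int* nbrIdx, float3* delta, int nv)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nv) return;

    float3 sum = make_float3(0.f, 0.f, 0.f);
    int beg = nbrOff[i], end = nbrOff[i + 1];
    for (int k = beg; k < end; ++k) {        // sum the 1-ring neighbors
        float3 n = v[nbrIdx[k]];
        sum.x += n.x; sum.y += n.y; sum.z += n.z;
    }
    float w = 1.0f / (float)(end - beg);     // uniform neighbor weight
    delta[i] = make_float3(v[i].x - w * sum.x,
                           v[i].y - w * sum.y,
                           v[i].z - w * sum.z);
}
```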

5.
A GPU-accelerated line-drawing algorithm for mesh models is proposed. It computes in object space, avoiding the discontinuities of image-space computation. A general framework for GPU-accelerated object-space line drawing is presented; in principle, any feature-line definition can be implemented with GPU acceleration under this framework. Based on the framework, GPU-accelerated rendering of two kinds of feature lines, silhouettes and photic extremum lines (PEL), is implemented…

6.
The top-ranked supercomputers in the world today are all based on hybrid architectures combining large numbers of CPUs and GPUs, and they achieve very high performance on certain special problems, such as FFT-based image processing or N-body particle computations. For partial differential equation problems discretized by finite differences (or grid-based finite elements), however, achieving good performance on CPU/GPU clusters remains a challenge. This paper proposes and tests a hybrid algorithm for this class of cluster architecture. Scalability is achieved through a domain decomposition algorithm, while GPU performance is obtained from an algebraic multigrid method based on smoothed aggregation, avoiding the incomplete-factorization algorithms that perform poorly on GPUs. The numerical experiments use 32 CPU/GPUs to solve partial differential equations with up to thirty million unknowns after finite-difference discretization.

7.
A GPGPU-based Lattice-Boltzmann numerical simulation algorithm (total citations: 5; self-citations: 3; citations by others: 2)
A series of studies on modeling and algorithms for the lattice Boltzmann method (LBM) on GPGPU hardware improved the method's GPU speedup and greatly reduced computation time. The GPU computation pipeline was redesigned: the pixel-buffer off-screen rendering path was abandoned in favor of the newer framebuffer object, and multi-texturing, multi-pass rendering, and ping-pong techniques were used to build an LBM simulation program for the lid-driven cavity, ultimately reducing GPU computation time to one sixth of the CPU time.
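The ping-pong technique named above, restated in modern CUDA terms as a hedged sketch: two device buffers alternate as source and destination every time step, so a step never reads values it has already overwritten. The update kernel is a stand-in, not the paper's LBM streaming/collision step.

```cuda
// Sketch of the ping-pong technique with two device buffers.
#include <cuda_runtime.h>
#include <utility>

__global__ void step(const float* src, float* dst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < n - 1)
        dst[i] = 0.5f * (src[i - 1] + src[i + 1]);   // placeholder update
}

void run(float* bufA, float* bufB, int n, int steps)
{
    float* src = bufA;
    float* dst = bufB;
    for (int t = 0; t < steps; ++t) {
        step<<<(n + 255) / 256, 256>>>(src, dst, n);
        std::swap(src, dst);        // ping-pong: newest data is now in src
    }
    cudaDeviceSynchronize();
}
```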

8.
袁斌. 《图学学报》, 2010, 31(3): 76
The rapid development of computer graphics hardware can be used to accelerate visualization. For non-uniform rectilinear grids, this paper presents a CPU ray-casting algorithm based on a uniform auxiliary grid, a GPU ray-casting algorithm based on an auxiliary texture, and a slice-based 3D-texture volume rendering algorithm, and tests them on an Nvidia GeForce 6800GT graphics card. The results show that the GPU algorithms are far faster than the CPU algorithm, and that the slice-based 3D-texture volume renderer is faster than the GPU ray caster.

9.
To address the poor GPU acceleration of implicit algorithms on unstructured grids, the GPU architecture and its parallel execution model are analyzed, and a GPU-parallel version of the implicit LU-SGS algorithm for a vertex-centered unstructured-grid scheme is designed and implemented. RCM and Metis grid reordering (regrouping) methods are applied to improve the data locality of the unstructured grid and thereby the GPU speedup of the implicit algorithm. The correctness and efficiency of the implementation are verified with a 3D wing test case. The results show that the two reordering (regrouping) methods improve the acceleration by 63% and 69% respectively, and the optimized parallel LU-SGS GPU algorithm achieves a speedup of 27x over the serial CPU algorithm, fully demonstrating the efficiency of the method.
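For reference, a host-side sketch of the reverse Cuthill-McKee (RCM) reordering named above, in its textbook form rather than the paper's implementation: BFS from a low-degree seed, visiting neighbors in order of increasing degree, then reversing the order. It assumes a connected mesh whose cell adjacency is given in CSR form (off/adj).

```cuda
// Hedged sketch of RCM reordering (host code; assumes a connected graph).
#include <algorithm>
#include <queue>
#include <vector>

std::vector<int> rcm(const std::vector<int>& off, const std::vector<int>& adj)
{
    int n = (int)off.size() - 1;
    auto deg = [&](int v) { return off[v + 1] - off[v]; };

    int seed = 0;                                 // pick a minimum-degree seed
    for (int v = 1; v < n; ++v)
        if (deg(v) < deg(seed)) seed = v;

    std::vector<int> order;
    std::vector<char> seen(n, 0);
    std::queue<int> q;
    q.push(seed); seen[seed] = 1;
    while (!q.empty()) {
        int v = q.front(); q.pop();
        order.push_back(v);
        std::vector<int> nbrs(adj.begin() + off[v], adj.begin() + off[v + 1]);
        std::sort(nbrs.begin(), nbrs.end(),       // low-degree neighbors first
                  [&](int a, int b) { return deg(a) < deg(b); });
        for (int u : nbrs)
            if (!seen[u]) { seen[u] = 1; q.push(u); }
    }
    std::reverse(order.begin(), order.end());     // the "reverse" in RCM
    return order;                                 // new ordering of the cells
}
```

Renumbering cells by this ordering clusters each cell's neighbors in memory, which is what improves the locality of the GPU sweeps.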

10.
To improve the efficiency and accuracy of simulating complex flow phenomena in computational fluid dynamics, and to fully exploit the parallelism of graphics hardware, a lattice Boltzmann method (LBM) simulation algorithm based on the CUDA architecture is proposed for a single machine with multiple GPUs. A domain-partitioning strategy distributes the LBM grid evenly across the GPU devices, with one overlapping ghost layer at partition boundaries to simplify the streaming step for those cells and reduce inter-GPU communication; device memory for the computational grid is allocated with sensible use of the global and texture memories in the CUDA memory hierarchy. Multi-threading is used, with one host thread controlling each GPU device, and semaphores serve as the thread-synchronization mechanism for inter-thread data exchange; the simulation then follows the solution procedure of the LBM equations. Experimental results show that two GPUs accelerate the computation to about 1.77x that of a single GPU, while the flow-field grid grows from 4160×4160 on a single GPU to 6144×6144 on two GPUs.
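A minimal sketch of the ghost-layer exchange idea under stated assumptions: the domain is split into horizontal slabs of (rows + 2) × nx floats per GPU, with ghost rows 0 and rows + 1; after each step every device copies its boundary row into its neighbor's ghost row. cudaMemcpyPeer is used here for the copies, and the paper's semaphore-based host-thread coordination is omitted.

```cuda
// Hedged sketch of inter-GPU ghost-row exchange for slab-partitioned LBM.
// field[g] points to device g's slab (hypothetical layout).
#include <cuda_runtime.h>

void exchange_ghosts(float** field, int ngpu, int rows, int nx)
{
    size_t rowBytes = (size_t)nx * sizeof(float);
    for (int g = 0; g + 1 < ngpu; ++g) {
        // last interior row of g -> ghost row 0 of g+1
        cudaMemcpyPeer(field[g + 1], g + 1,
                       field[g] + (size_t)rows * nx, g, rowBytes);
        // first interior row of g+1 -> ghost row rows+1 of g
        cudaMemcpyPeer(field[g] + (size_t)(rows + 1) * nx, g,
                       field[g + 1] + (size_t)nx, g + 1, rowBytes);
    }
    for (int g = 0; g < ngpu; ++g) {   // wait for all copies to complete
        cudaSetDevice(g);
        cudaDeviceSynchronize();
    }
}
```

Enabling peer access (cudaDeviceEnablePeerAccess) lets these copies go directly over the bus instead of staging through host memory.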

11.
Accelerating computational fluid dynamics on the GPU (total citations: 1; self-citations: 0; citations by others: 1)
This paper ports solvers for the compressible Navier-Stokes, incompressible Navier-Stokes, and Euler equations of computational fluid dynamics to NVIDIA GPUs. Three test cases are simulated: a 2D Riemann problem, lid-driven cavity flow, and flow around an RAE2822 airfoil. Compared with the CPU, a maximum speedup of 33.2x is obtained on the GPU platform. To maximize…

12.
In recent years, with the advent of the Compute Unified Device Architecture (CUDA), the use of high-end graphics processing units (GPUs) in scientific computing fields such as image processing and computational fluid dynamics has developed rapidly. The lattice Boltzmann method (LBM), a mesoscopic numerical method, is a newer computational fluid dynamics (CFD) approach; its algorithm is simple, it handles complex boundary conditions well, and the pressure can be obtained directly, so it has been widely applied to multiphase flows, turbulence, porous-media flows, and other areas. Its inherent parallelism makes the LBM particularly well suited to GPU computation. The multiple-relaxation-time (MRT) LBM is only weakly affected by the relaxation factors and has better numerical stability. This paper implements the MRT-LBM on a CUDA-based GPU and verifies the computation with the classic CFD benchmark of 2D lid-driven cavity flow. Cases at as many as 26 Reynolds numbers in the range Re = [10, 10^4] were computed, and the primary-vortex center coordinates for Re = 10^2, 4×10^2, 10^3, 2×10^3, 5×10^3, and 7.5×10^3 were compared with results from the literature. The results agree well with published numerical experiments, verifying the correctness of the implementation and showing the superior numerical stability of the MRT-LBM. The GPU performance of the MRT-LBM is also analyzed and compared with CPU computation; the results show that the GPU greatly accelerates the MRT-LBM, with an NVIDIA Tesla C2050 achieving a speedup of about 60x over a single-core Intel Xeon 5430 CPU.

13.
Most computational fluid dynamics (CFD) simulations require massive computational power, which is usually provided by traditional High Performance Computing (HPC) environments. Although interactivity of the simulation process is highly appreciated by scientists and engineers, due to limitations of typical HPC environments, present CFD simulations are usually executed non-interactively. A recent trend is to harness the parallel computational power of graphics processing units (GPUs) for general-purpose applications. As an alternative to traditional massively parallel computing, GPU computing has also gained popularity in the CFD community, especially for its application to the lattice Boltzmann method (LBM). For instance, Tölke and others presented very efficient implementations of the LBM for 2D as well as 3D space (Toelke J, in Comput Visual Sci. (2008); Toelke J and Krafczyk M, in Int J Comput Fluid Dyn 22(7): 443–456 (2008)). In this work we motivate the use of GPU computing to facilitate interactive CFD simulations. In our approach, the simulation is executed on multiple GPUs instead of traditional HPC environments, which allows the integration of the complete simulation process into a single desktop application. To demonstrate the feasibility of our approach, we show a fully bidirectional fluid-structure interaction for self-induced membrane oscillations in a turbulent flow. The efficiency of the approach allows a 3D simulation close to real time.

14.
Heterogeneous multiprocessor systems, where commodity multicore processors are coupled with graphics processing units (GPUs), have been widely used in high performance computing (HPC). In this work, we focus on the design and optimization of Computational Fluid Dynamics (CFD) applications on such HPC platforms. To fully utilize the computational power of such heterogeneous platforms, we propose to design the performance-critical part of CFD applications, namely the linear equation solvers, in a hybrid way. A hybrid linear solver includes both a CPU version and a GPU version of the code for solving a linear equation system. When a hybrid linear equation solver is invoked during the CFD simulation, the CPU portion and the GPU portion run in parallel on their respective processing devices according to the execution configuration. Furthermore, we propose to build functional performance models (FPMs) of the processing devices and use an FPM-based heterogeneous decomposition method to distribute the workload between the heterogeneous processing devices, in order to ensure a balanced workload and optimized communication overhead. The efficiency of this approach is demonstrated by experiments with numerical simulation of lid-driven cavity flow on both a hybrid server and a hybrid cluster.
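A hedged sketch of the FPM-based decomposition idea: given a measured speed function for each device (its functional performance model), choose the row split at which CPU and GPU finish at the same time. The speed functions below are stand-ins for measured FPMs, not the paper's models.

```cuda
// Host-side sketch: find n_cpu such that n_cpu / s_cpu(n_cpu) ==
// (n_total - n_cpu) / s_gpu(n_total - n_cpu), by bisection. s_cpu/s_gpu
// return rows processed per second when the device holds that many rows.
#include <functional>

int split_rows(int n_total,
               std::function<double(int)> s_cpu,   // measured CPU FPM (stand-in)
               std::function<double(int)> s_gpu)   // measured GPU FPM (stand-in)
{
    int lo = 0, hi = n_total;
    while (hi - lo > 1) {
        int n_cpu = (lo + hi) / 2;
        int n_gpu = n_total - n_cpu;
        double t_cpu = n_cpu / s_cpu(n_cpu);
        double t_gpu = n_gpu / s_gpu(n_gpu);
        if (t_cpu < t_gpu) lo = n_cpu;   // CPU finishes early: give it more rows
        else               hi = n_cpu;
    }
    return lo;   // rows assigned to the CPU; the remainder go to the GPU
}
```

Because device speed generally varies with problem size, using full speed functions rather than single constants is what distinguishes FPM-based balancing from a fixed ratio.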

15.
Graphics processor units (GPU) that are originally designed for graphics rendering have emerged as massively-parallel “co-processors” to the central processing unit (CPU). Small-footprint multi-GPU workstations with hundreds of processing elements can accelerate compute-intensive simulation science applications substantially. In this study, we describe the implementation of an incompressible flow Navier–Stokes solver for multi-GPU workstation platforms. A shared-memory parallel code with identical numerical methods is also developed for multi-core CPUs to provide a fair comparison between CPUs and GPUs. Specifically, we adopt NVIDIA’s Compute Unified Device Architecture (CUDA) programming model to implement the discretized form of the governing equations on a single GPU. Pthreads are then used to enable communication across multiple GPUs on a workstation. We use separate CUDA kernels to implement the projection algorithm to solve the incompressible fluid flow equations. Kernels are implemented on different memory spaces on the GPU depending on their arithmetic intensity. The memory-hierarchy-specific implementation produces significantly faster performance. We present a systematic analysis of speedup and scaling using two generations of NVIDIA GPU architectures and provide a comparison of single and double precision computational performance on the GPU. Using a quad-GPU platform for single precision computations, we observe two orders of magnitude speedup relative to a serial CPU implementation. Our results demonstrate that multi-GPU workstations can serve as a cost-effective small-footprint parallel computing platform to accelerate computational fluid dynamics (CFD) simulations substantially.
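A minimal sketch of the memory-hierarchy idea the abstract credits for its speed: staging a tile plus its halo in fast on-chip shared memory before applying a 5-point Laplacian, the kind of stencil that appears in the projection step. TILE and the guard assumptions (nx, ny multiples of TILE) are hypothetical choices, not the paper's configuration.

```cuda
// Hedged sketch of a shared-memory tiled stencil kernel.
#include <cuda_runtime.h>

#define TILE 16   // block is TILE x TILE; nx, ny assumed multiples of TILE

__global__ void laplacian_tiled(const float* u, float* out, int nx, int ny)
{
    __shared__ float s[TILE + 2][TILE + 2];        // tile plus one-cell halo

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    int tx = threadIdx.x + 1, ty = threadIdx.y + 1;

    s[ty][tx] = u[y * nx + x];                     // stage tile interior
    if (threadIdx.x == 0        && x > 0)       s[ty][0]        = u[y * nx + x - 1];
    if (threadIdx.x == TILE - 1 && x < nx - 1)  s[ty][TILE + 1] = u[y * nx + x + 1];
    if (threadIdx.y == 0        && y > 0)       s[0][tx]        = u[(y - 1) * nx + x];
    if (threadIdx.y == TILE - 1 && y < ny - 1)  s[TILE + 1][tx] = u[(y + 1) * nx + x];
    __syncthreads();                               // tile fully staged

    if (x > 0 && x < nx - 1 && y > 0 && y < ny - 1)
        out[y * nx + x] = s[ty][tx - 1] + s[ty][tx + 1]
                        + s[ty - 1][tx] + s[ty + 1][tx] - 4.0f * s[ty][tx];
}
```

Each interior value is read from global memory once but used up to five times, which is the payoff of placing the tile in shared memory.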

16.
The unsteady flow field around a moving airfoil section is obtained by solving the Euler equations, and the flow solver is parallelized on the GPU using CUDA. An ARMA (auto-regressive moving-average) model is used to identify the unsteady aerodynamic forces; the results produced by the identified model agree very well with full-order CFD computations. By coupling the reduced-order aerodynamic model with the structure, the transonic flutter of the Isogai wing, a standard aeroelastic test case with an S-shaped flutter boundary, is computed. The method presented here greatly improves computational efficiency while maintaining the accuracy of the aeroelastic analysis.

17.
An optimized implementation of a block tridiagonal solver based on the block cyclic reduction (BCR) algorithm is introduced and its portability to graphics processing units (GPUs) is explored. The computations are performed on the NVIDIA GTX480 GPU. The results are compared with those obtained on a single core of an Intel Core i7-920 (2.67 GHz) in terms of calculation runtime. The BCR linear solver achieves a maximum speedup of 5.84x with a block size of 32 over the CPU Thomas algorithm in double precision. The proposed BCR solver is applied to discontinuous Galerkin (DG) simulations on structured grids via an alternating direction implicit (ADI) scheme. The GPU performance of the entire computational fluid dynamics (CFD) code is studied for different compressible inviscid flow test cases. For a general mesh with quadrilateral elements, the ADI-DG solver achieves a maximum total speedup of 7.45x for the piecewise quadratic solution over the CPU platform in double precision.

18.
The Building-Cube Method (BCM) based on equally-spaced Cartesian meshes has been proposed as a next-generation CFD method. Due to the equally-spaced meshes, it is well suited to highly parallel computation. This paper proposes a parallel implementation scheme of BCM on a GPU cluster system, which needs efficient hierarchical parallel processing to exploit the potential of the cluster system. The proposed scheme employs the Red-Black SOR method for the pressure calculations, the most time-consuming part of BCM, to obtain massive data parallelism. By exploiting the coarse-grain and fine-grain parallelism of BCM, the proposed scheme hierarchically assigns equally-divided tasks to the GPU cluster system. Furthermore, to exploit the computational power of the GPUs in the cluster system, the proposed scheme employs efficient data management such as coalesced data transfer and reuse of data in on-chip memory. Experimental results show that the single-GPU implementation can achieve about three times higher performance than the single-CPU one. Moreover, the multiple-GPU implementation can achieve almost ideal scalability. Finally, the possibility of further acceleration of not only the pressure calculation but the whole of BCM is discussed.
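A sketch of the Red-Black SOR update used for the pressure calculation: checkerboard coloring makes all same-color updates independent, so each color becomes one fully parallel kernel launch. The grid layout and the relaxation factor omega below are illustrative, not the paper's values.

```cuda
// Hedged sketch: one color pass of Red-Black SOR for the 2D Poisson
// equation laplacian(p) = f on an nx-by-ny grid with spacing h.
#include <cuda_runtime.h>

__global__ void sor_color(float* p, const float* f, int nx, int ny,
                          float h2, float omega, int color)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x <= 0 || x >= nx - 1 || y <= 0 || y >= ny - 1) return;
    if (((x + y) & 1) != color) return;           // update one color only

    int i = y * nx + x;
    float gs = 0.25f * (p[i - 1] + p[i + 1] + p[i - nx] + p[i + nx]
                        - h2 * f[i]);             // Gauss-Seidel value
    p[i] = (1.f - omega) * p[i] + omega * gs;     // over-relaxation
}

// One SOR sweep = a red pass followed by a black pass (host side):
//   sor_color<<<grid, block>>>(p, f, nx, ny, h*h, 1.7f, 0);  // red
//   sor_color<<<grid, block>>>(p, f, nx, ny, h*h, 1.7f, 1);  // black
```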

19.
To achieve realistic simulation of small-scale wind-blown sand movement, the smoothed particle hydrodynamics (SPH) method, a mesh-free Lagrangian formulation, is adopted. It avoids the various problems that Eulerian grid methods encounter with large grid deformations and deforming boundaries, and it overcomes the difficulty that a fixed Eulerian grid cannot track the trajectory of an arbitrary individual particle, giving it unique advantages for studying wind-sand movement. However, as the number of SPH particles in the wind-sand flow increases, the method's low computational efficiency and large problem scale become especially evident in wind-sand simulation. To improve computational efficiency, a GPU-parallel two-dimensional coupled air-sand SPH model is built on the CUDA hardware/software platform. The serial code is profiled to find the most time-consuming hot spots suitable for parallelization; the GPU parallel model is then validated, reproducing the macroscopic spatio-temporal evolution of the moving sand cloud and, microscopically, the saltation trajectories of typical sand grains and their distinctive cusped trajectories; finally, CPU and GPU efficiency are compared for three different particle counts. The simulation results demonstrate that the SPH-GPU parallel method can be applied further in numerical studies of wind-sand flow.
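As an illustration of the per-particle work being offloaded to the GPU, here is a hedged sketch of the basic SPH density summation with the 2D cubic spline kernel and a naive all-pairs neighbor loop. Production codes use neighbor lists instead of the O(N^2) loop, and all names here are hypothetical.

```cuda
// Sketch: SPH density rho_i = sum_j m_j W(|r_i - r_j|, h), one thread per
// particle, with the standard 2D cubic spline kernel (support radius 2h).
#include <cuda_runtime.h>
#include <math.h>

__device__ float w_cubic2d(float r, float h)
{
    float q = r / h;
    float s = 10.0f / (7.0f * 3.14159265f * h * h);   // 2D normalization
    if (q < 1.0f) return s * (1.0f - 1.5f * q * q + 0.75f * q * q * q);
    if (q < 2.0f) { float t = 2.0f - q; return s * 0.25f * t * t * t; }
    return 0.0f;
}

__global__ void density(const float2* pos, const float* mass,
                        float* rho, int n, float h)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float sum = 0.0f;
    for (int j = 0; j < n; ++j) {                     // naive all-pairs loop
        float dx = pos[i].x - pos[j].x;
        float dy = pos[i].y - pos[j].y;
        sum += mass[j] * w_cubic2d(sqrtf(dx * dx + dy * dy), h);
    }
    rho[i] = sum;
}
```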

20.
In this paper we describe a general purpose, graphics processing unit (GP-GPU)-based approach for solving partial differential equations (PDEs) within advection–reaction–diffusion models. The GP-GPU-based approach provides a platform for solving PDEs in parallel and can thus significantly reduce solution times over traditional CPU implementations. This allows for a more efficient exploration of various advection–reaction–diffusion models, as well as the parameters that govern them. Although the GPU does impose limitations on the size and accuracy of computations, the PDEs describing the advection–reaction–diffusion models of interest to us fit comfortably within these constraints. Furthermore, GPU technology continues to rapidly increase in speed, memory, and precision, thus applying these techniques to larger systems should be possible in the future. We chose to solve the PDEs using two numerical approaches: for the diffusion, a first-order explicit forward Euler solution and a semi-implicit second-order Crank–Nicolson solution; and, for the advection and reaction, a first-order explicit solution. The goal of this work is to provide motivation and guidance to the application scientist interested in exploring the use of the GP-GPU computational framework in the course of their research. In this paper, we present a rigorous comparison of our GPU-based advection–reaction–diffusion code model with a CPU-based analog, finding that the GPU model outperforms the CPU implementation in one-to-one comparisons.
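A minimal sketch of the first-order explicit forward Euler diffusion step described above, on a 2D grid with the usual 5-point Laplacian. D, dt, and h are generic parameters; the stability bound in the comment is the standard explicit-scheme limit, not a value taken from the paper.

```cuda
// Hedged sketch: u_new = u + dt * D * laplacian(u), forward Euler in time.
// Stable only for dt <= h*h / (4*D), the usual explicit-scheme limit.
#include <cuda_runtime.h>

__global__ void diffuse_euler(const float* u, float* u_new, int nx, int ny,
                              float D, float dt, float h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x <= 0 || x >= nx - 1 || y <= 0 || y >= ny - 1) return;

    int i = y * nx + x;
    float lap = (u[i - 1] + u[i + 1] + u[i - nx] + u[i + nx]
                 - 4.0f * u[i]) / (h * h);        // 5-point Laplacian
    u_new[i] = u[i] + dt * D * lap;               // explicit Euler step
}
```

The semi-implicit Crank–Nicolson variant the paper also uses trades this time-step restriction for a linear solve each step.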

