
1.

Implementing lattice Boltzmann computation on graphics hardware (total citations: 14; self-citations: 0; citations by others: 14)

Wei Li Xiaoming Wei Arie Kaufman 《The Visual computer》2003,19(7-8):444-456

The Lattice Boltzmann Model (LBM) is a physically-based approach that simulates the microscopic movement of fluid particles by simple, identical, and local rules. We accelerate the computation of the LBM on general-purpose graphics hardware, by grouping particle packets into 2D textures and mapping the Boltzmann equations completely to the rasterization and frame buffer operations. We apply stitching and packing to further improve the performance. In addition, we propose techniques, namely range scaling and range separation, that systematically transform variables into the range required by the graphics hardware and thus prevent overflow. Our approach can be extended to acceleration of the computation of any cellular automata model.
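As a rough illustration of the "simple, identical, and local rules" mentioned in this abstract, the sketch below performs one collide-and-stream update of a standard D2Q9 BGK lattice Boltzmann scheme in NumPy. It is not the texture-based GPU mapping of the paper: the lattice choice, the relaxation time `tau`, the periodic boundaries, and the function names `equilibrium` and `lbm_step` are all assumptions made for the example.

```python
import numpy as np

# D2Q9 lattice: discrete velocities and their weights (assumed lattice choice).
E = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)


def equilibrium(rho, ux, uy):
    """Second-order equilibrium distributions, shape (9, ny, nx)."""
    eu = E[:, 0, None, None] * ux + E[:, 1, None, None] * uy
    usq = ux ** 2 + uy ** 2
    return W[:, None, None] * rho * (1 + 3 * eu + 4.5 * eu ** 2 - 1.5 * usq)


def lbm_step(f, tau=0.8):
    """One BGK collide-and-stream update on a periodic grid."""
    # Macroscopic density and velocity are local sums over the 9 packets.
    rho = f.sum(axis=0)
    ux = (E[:, 0, None, None] * f).sum(axis=0) / rho
    uy = (E[:, 1, None, None] * f).sum(axis=0) / rho
    # Collision: relax every packet toward its local equilibrium.
    f = f - (f - equilibrium(rho, ux, uy)) / tau
    # Streaming: shift each packet one cell along its own velocity.
    for i, (ex, ey) in enumerate(E):
        f[i] = np.roll(f[i], shift=(ey, ex), axis=(0, 1))
    return f
```

Starting from `equilibrium(rho0, 0, 0)` for some density field `rho0` and calling `lbm_step` repeatedly conserves total mass, which is a convenient sanity check. Every cell applies the same arithmetic to its own nine values, which is the property that makes a mapping to per-pixel graphics operations plausible.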

2.

The computation of strain rate tensor in multiple-relaxation-time lattice Boltzmann model

Wenhuan Zhang Changsheng Huang Yihang Wang Baochang Shi Shibo Kuang Zhenhua Chai 《Computers & Mathematics with Applications》2018,75(8):2888-2902

The multiple-relaxation-time (MRT) lattice Boltzmann (LB) model is an important class of LB model with many advantages over the traditional single-relaxation-time (SRT) LB model. Generally, the computation of the strain rate tensor is crucial for MRT-LB simulations of some complex flows. At present, only two formulae are available to compute the strain rate tensor in the MRT LB model. One uses the non-equilibrium parts of macroscopic moments (Yu formula). The other uses the non-equilibrium parts of density distribution functions (Chai formula). The mathematical expressions of these two formulae are so different that it is unclear which one to choose for computing the strain rate tensor in the MRT LB model. To overcome this problem, this paper presents a theoretical study of the relationship between the Chai and Yu formulae. The results show that the Yu formula can be deduced from the Chai formula, although each has its own advantages and disadvantages. In particular, the Yu formula is computationally more efficient, while the Chai formula is applicable to more lattice patterns of the MRT LB models. Furthermore, the derivation of the Yu formula in a particular lattice pattern from the Chai formula is more convenient than that proposed by Yu et al.

3.

Parallel skyline computation on multicore architectures

Hyeonseung Im Jonghyun Park Sungwoo Park 《Information Systems》2011

With the advent of multicore processors, it has become imperative to write parallel programs if one wishes to exploit the next generation of processors. This paper deals with skyline computation as a case study of parallelizing database operations on multicore architectures. First we parallelize three sequential skyline algorithms, BBS, SFS, and SSkyline, to see if the design principles of sequential skyline computation also extend to parallel skyline computation. Then we develop a new parallel skyline algorithm PSkyline based on the divide-and-conquer strategy. Experimental results show that all the algorithms successfully utilize multiple cores to achieve a reasonable speedup. In particular, PSkyline achieves a speedup approximately proportional to the number of cores when it needs a parallel computation the most.
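The divide-and-conquer idea named in this abstract can be sketched in a few lines of Python. This is a sequential illustration only, not the PSkyline algorithm itself: the function names, the "smaller is better" convention, and the `block` threshold are assumptions for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q if it is no worse in every dimension and better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: List[Point]) -> List[Point]:
    """Naive skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def dc_skyline(points: List[Point], block: int = 4) -> List[Point]:
    """Divide and conquer: local skylines of the two halves, then a merge."""
    if len(points) <= block:
        return skyline(points)
    mid = len(points) // 2
    left = dc_skyline(points[:mid], block)    # independent of the right half
    right = dc_skyline(points[mid:], block)   # independent of the left half
    # A point dominated inside its own half can never be in the global
    # skyline, so merging only the two local skylines is sufficient.
    return skyline(left + right)
```

Because the two recursive calls share no state, they are the natural units to hand to separate cores; only the final merge needs both results.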

4.

General regularized boundary condition for multi-speed lattice Boltzmann models

O. Malaspinas B. Chopard J. Latt 《Computers & Fluids》2011,49(1):29-35

The lattice Boltzmann method is nowadays a common tool for solving computational fluid dynamics problems. One of the difficulties of this numerical approach is the treatment of the boundaries, because of the lack of physical intuition for the behavior of the density distribution functions close to the walls. A massive effort has been made by the scientific community to find appropriate solutions for boundaries. In this paper we present a completely generic way of treating a Dirichlet boundary for two- and three-dimensional flat walls, edges or corners, for weakly compressible flows, applicable for any lattice topology. The proposed algorithm is shown to be second-order accurate and could also be extended for compressible and thermal flows.

5.

Multi-relaxation-time lattice Boltzmann model for axisymmetric flows

Liang Wang Chuguang Zheng 《Computers & Fluids》2010,39(9):1542-2139

In this paper, a lattice-Boltzmann equation (LBE) with multi relaxation times (MRT) is presented for axisymmetric flows. The model is an extension of a recent model with single-relaxation-time [Guo et al., Phys. Rev. E 79, 046708 (2009)], which was developed based on the axisymmetric Boltzmann equation. Due to the use of the MRT collision model, the present model can achieve better numerical stability. The model is validated by some numerical tests including the Hagen-Poiseuille flow, the pulsatile Womersley flow, and the external flow over a sphere. Numerical results are in excellent agreement with analytical solutions or other available data, and the improvement in numerical stability is also confirmed.

6.

Multi-GPU performance of incompressible flow computation by lattice Boltzmann method on GPU cluster

Wang Xian Aoki Takayuki 《Parallel Computing》2011,37(9):521-535

GPGPU has drawn much attention for accelerating non-graphics applications. A simulation using the D3Q19 model of the lattice Boltzmann method was executed successfully on a multi-node GPU cluster using CUDA programming and the MPI library. The GPU code runs on the multi-node GPU cluster TSUBAME of Tokyo Institute of Technology, which is equipped with a total of 680 NVIDIA Tesla GPUs. For multi-GPU computation, a domain partitioning method is used to distribute the computational load across multiple GPUs, and GPU-to-GPU data transfer becomes a severe overhead for the total performance. The parallel results of 1D, 2D and 3D domain partitionings were compared and analyzed. With a 384 × 384 × 384 mesh system and 96 GPUs, the performance of 3D partitioning is about 3-4 times higher than that of 1D partitioning. The performance curve deviates from the ideal line because of the long communication time between GPUs. To hide the communication time, we introduced an overlapping technique between computation and communication, in which the data transfer process and computation are done in two streams simultaneously. Using 8-96 GPUs, the performance increases by a factor of about 1.1-1.3 in the overlapping mode. As a benchmark problem, a large-scale computation of a flow around a sphere at Re = 13,000 was carried out successfully using a 2000 × 1000 × 1000 mesh system and 100 GPUs. For this computation with 2 giga lattice nodes, 6.0 h were needed to process 100,000 time steps. Under this condition, the computational time (2.79 h) and the data communication time (3.06 h) are almost the same.
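The benchmark figures quoted in this abstract can be turned into a throughput estimate with simple arithmetic. The sketch below uses only the numbers stated above; the conversion to million lattice updates per second (MLUPS) and the variable names are additions for illustration, not values reported by the authors.

```python
# Figures taken from the abstract.
nodes = 2000 * 1000 * 1000      # lattice nodes in the benchmark mesh
steps = 100_000                 # time steps processed
wall_hours = 6.0                # total wall-clock time
gpus = 100
compute_hours, comm_hours = 2.79, 3.06

# Derived quantities (not stated in the abstract).
updates = nodes * steps
mlups_total = updates / (wall_hours * 3600) / 1e6   # about 9259 MLUPS overall
mlups_per_gpu = mlups_total / gpus                  # about 92.6 MLUPS per GPU
comm_fraction = comm_hours / (compute_hours + comm_hours)  # about 0.52

print(f"{mlups_total:.0f} MLUPS total, {mlups_per_gpu:.1f} per GPU, "
      f"{comm_fraction:.0%} of measured time in communication")
```

The two reported components sum to 5.85 h rather than 6.0 h, so the communication share here is computed against their sum; how the remaining time is accounted for is not stated in the abstract.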

7.

Memory transfer optimization for a lattice Boltzmann solver on Kepler architecture nVidia GPUs

Mark J. Mawson Alistair J. Revell 《Computer Physics Communications》2014

The Lattice Boltzmann method (LBM) for solving fluid flow is naturally well suited to an efficient implementation for massively parallel computing, due to the prevalence of local operations in the algorithm. This paper presents and analyses the performance of a 3D lattice Boltzmann solver, optimized for third generation nVidia GPU hardware, also known as ‘Kepler’. We provide a review of previous optimization strategies and analyse data read/write times for different memory types. In LBM, the time propagation step (known as streaming) involves shifting data to adjacent locations and is central to parallel performance; here we examine three approaches which make use of different hardware options. Two of these make use of ‘performance enhancing’ features of the GPU: shared memory and the new shuffle instruction found in Kepler based GPUs. These are compared to a standard transfer of data which relies instead on optimized storage to increase coalesced access. It is shown that the simpler approach is most efficient, since the need for large numbers of registers per thread in LBM limits the block size and thus reduces the efficiency of these special features. Detailed results are obtained for a D3Q19 LBM solver, which is benchmarked on nVidia K5000M and K20C GPUs. In the latter case the use of a read-only data cache is explored, and peak performance of over 1036 Million Lattice Updates Per Second (MLUPS) is achieved. The appearance of a periodic bottleneck in the solver performance is also reported, believed to be hardware related; spikes in iteration-time occur with a frequency of around 11 Hz for both GPUs, independent of the size of the problem.

8.

Parallelization of the Kalman filter on multicore computational platforms

O. Rosén A. Medvedev T. Wigren 《Control Engineering Practice》2013,21(9):1188-1194

Parallelization of the Kalman filter algorithm, with emphasis on the specific demands of multicore architecture implementation, is investigated. The approach is based on the nonrestrictive assumption of a banded system matrix. Both time-varying and time-invariant systems can be generally transformed to such a form. The proposed method is applied to a radio interference power estimation problem for which speedup evaluations using up to eight cores are performed. It is shown that the algorithm is capable of achieving linear speedup in the number of cores used, while speedup factors for a parallel BLAS implementation are less than two. An algorithm analysis that provides guidelines to the choice of implementation hardware to meet a desired performance is also provided.

9.

Topology optimization of flow domains using the lattice Boltzmann method

Georg Pingen Anton Evgrafov Kurt Maute 《Structural and Multidisciplinary Optimization》2007,34(6):507-524

We consider the optimal design of two- (2D) and three-dimensional (3D) flow domains using the lattice Boltzmann method (LBM) as an approximation of Navier-Stokes (NS) flows. The problem is solved by a topology optimization approach varying the effective porosity of a fictitious material. The boundaries of the flow domain are represented by potentially discontinuous material distributions. NS flows are traditionally approximated by finite element and finite volume methods. These schemes, while well established as high-fidelity simulation tools using body-fitted meshes, are affected in their accuracy and robustness when regular meshes with zero-velocity constraints along the surface and in the interior of obstacles are used, as is common in topology optimization. Therefore, we study the potential of the LBM for approximating low Mach number incompressible viscous flows for topology optimization. In the LBM the geometry of flow domains is defined in a discontinuous manner, similar to the approach used in material-based topology optimization. In addition, this non-traditional discretization method features parallel scalability and allows for high-resolution, regular fluid meshes. In this paper, we show how the variation of the porosity can be used in conjunction with the LBM for the optimal design of fluid domains, making the LBM an interesting alternative to NS solvers for topology optimization problems. The potential of our topology optimization approach will be illustrated by 2D and 3D numerical examples.

10.

Large eddy simulation of turbulent open duct flow using a lattice Boltzmann approach

M. Fernandino K. Beronov T. Ytrehus 《Mathematics and computers in simulation》2009

Large eddy simulations of turbulent open duct flow are performed using the lattice Boltzmann method (LBM) in conjunction with the Smagorinsky sub-grid scale (SGS) model. The Smagorinsky constant is set to a smaller value than the one usually used in plain channel flow simulations. Results for the mean flow and turbulent fluctuations are compared to experimental data obtained in an open duct of similar dimensions. It is found that the LBM simulation results are in good qualitative agreement with the experiments.
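For readers unfamiliar with the Smagorinsky closure mentioned in this abstract, its textbook form is sketched below. The symbols ($C_s$ for the Smagorinsky constant, $\Delta$ for the filter width, $\bar{S}_{ij}$ for the resolved strain rate tensor, $\nu_0$ for the molecular viscosity, $\tau$ for the relaxation time in lattice units) are the conventional ones and are assumed here; the abstract itself does not state the formula or the value of the constant.

```latex
\nu_t = (C_s \Delta)^2 \,\lvert \bar{S} \rvert ,
\qquad
\lvert \bar{S} \rvert = \sqrt{2\,\bar{S}_{ij}\bar{S}_{ij}} ,
\qquad
\tau = 3\,(\nu_0 + \nu_t) + \tfrac{1}{2}
```

In an LBM setting the eddy viscosity $\nu_t$ is typically added to $\nu_0$ through a locally adjusted relaxation time, as in the last relation, so a smaller $C_s$ directly means less added dissipation.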

11.

Parallel algorithm for the multigrid lattice Boltzmann method

刘智翔 宋安平 徐磊 郑汉垣 张武 《计算机应用》 (Journal of Computer Applications) 2014,34(11):3065-3068

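To address the large number of computational grid cells and the slow convergence of the lattice Boltzmann method in numerical simulations of complex flows, a parallel algorithm for generating multiple Cartesian grids from three-dimensional geometric boundaries is proposed, and a parallel multigrid lattice Boltzmann method (LBM) is built on this grid generation method. By coupling the computations on grids of different scales, the method effectively reduces the number of grid cells and speeds up convergence. Test results also show that the parallel algorithm scales well.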

12.

A flexible Patch-based lattice Boltzmann parallelization approach for heterogeneous GPU-CPU clusters

Christian Feichtinger Johannes Habich Harald Köstler Georg Hager Ulrich Rüde Gerhard Wellein 《Parallel Computing》2011,37(9):536-549

Sustaining a large fraction of single GPU performance in parallel computations is considered to be the major problem of GPU-based clusters. We address this issue in the context of a lattice Boltzmann flow solver that is integrated in the WaLBerla software framework. Our multi-GPU implementation uses a block-structured MPI parallelization and is suitable for load balancing and heterogeneous computations on CPUs and GPUs. The overhead required for multi-GPU simulations is discussed in detail. It is demonstrated that a large fraction of the kernel performance can be sustained for weak scaling on InfiniBand clusters, leading to excellent parallel efficiency. However, in strong scaling scenarios, using multiple GPUs is much less efficient than running CPU-only simulations on IBM BG/P and x86-based clusters. Hence, a cost analysis must determine the best course of action for a particular simulation task and hardware configuration. Finally, we present weak scaling results of heterogeneous simulations conducted on CPUs and GPUs simultaneously, using clusters equipped with varying node configurations.

13.

Non-Darcy flow in disordered porous media: A lattice Boltzmann study

Zhenhua Chai Baochang Shi Jianhua Lu Zhaoli Guo 《Computers & Fluids》2010,39(10):2069-2077

It is well known that the Darcy law is insufficient for describing high-rate flows in porous media. However, it is still an open problem to establish a universal form for the nonlinear correction to Darcy law. In this work, we will investigate numerically the non-Darcy effect on incompressible flows through disordered porous media. Numerical simulations at pore-scale level are carried out with the Reynolds number varying from 0.02 to 30, which covers the Darcy and non-Darcy flow regimes. Three regimes are identified for flow through porous media, i.e., a linear Darcy regime at vanishing Reynolds number, a cubic transitional or weak inertial regime at low but finite Reynolds number, and a quadratic Forchheimer or strong inertial regime at larger Reynolds numbers. Finally, a general correlation is proposed to include the non-Darcy effect, as an extension to the common empirical expressions.
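The three regimes named in this abstract are commonly written as corrections to the pressure-gradient law. The forms below are the standard textbook expressions, given as a reading aid under assumed notation: $u$ is the superficial velocity, $\mu$ the dynamic viscosity, $\rho$ the density, $K$ the permeability, and $\gamma$ and $\beta$ are medium-dependent coefficients. They are not the general correlation proposed in the paper, which the abstract does not state.

```latex
\begin{aligned}
\text{Darcy (linear):} \quad & -\frac{dp}{dx} = \frac{\mu}{K}\,u \\
\text{weak inertia (cubic):} \quad & -\frac{dp}{dx} = \frac{\mu}{K}\,u + \frac{\gamma\,\rho^{2}}{\mu}\,u^{3} \\
\text{Forchheimer (quadratic):} \quad & -\frac{dp}{dx} = \frac{\mu}{K}\,u + \beta\,\rho\,u^{2}
\end{aligned}
```

As the Reynolds number tends to zero, both correction terms vanish faster than the linear term, which is why the Darcy law is recovered in that limit.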

14.

Numerical simulation of the one-dimensional Burgers equation based on the lattice Boltzmann method

兰中周 乐励华 高云 《计算机应用》 (Journal of Computer Applications) 2013,33(9):2432-2435

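For the numerical solution of the one-dimensional Burgers equation with the lattice Boltzmann method (LBM), 2-bit and 4-bit models already exist. In this paper, an appropriate equilibrium distribution function is constructed by choosing a suitable discrete velocity model. Then, using the single-relaxation-time lattice Bhatnagar-Gross-Krook model, the Chapman-Enskog expansion, and the multi-scale technique, a 3-bit lattice Boltzmann model for the one-dimensional Burgers equation, namely the D1Q3 model, is proposed and tested numerically. The experimental results show that the numerical solution agrees well with the analytical solution and that the error is smaller than those reported in the existing literature, which verifies the effectiveness of the lattice Boltzmann model.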

15.

Boundary condition considerations in lattice Boltzmann formulations of wetting binary fluids

H.S. Wiklund S.B. Lindström T. Uesaka 《Computer Physics Communications》2011,(10):2192-2200

We propose a new lattice Boltzmann numerical scheme for binary-fluid surface interactions. The new scheme combines the existing binary free energy lattice Boltzmann method [Swift et al., Phys. Rev. E 54 (1996)] and a new wetting boundary condition for diffuse interface methods in order to eliminate spurious variations in the order parameter at solid surfaces. We use a cubic form for the surface free energy density and also take into account the contribution from free energy in the volume when discretizing the wetting boundary condition. This allows us to eliminate the spurious variation in the order parameter seen in previous implementations. With the new scheme, a larger range of equilibrium contact angles can be reproduced, and capillary intrusion can be simulated at higher accuracy at lower resolution.

16.

An improved entropic lattice Boltzmann model for parallel computation

T. Yasuda N. Satofuka 《Computers & Fluids》2011,45(1):187-190

In this paper, we suggest two kinds of approximation methods based on Taylor series expansion which can solve the non-linear equation in the entropic lattice Boltzmann model without using any iteration method such as the Newton–Raphson method. The advantage of our methods is that they avoid the load imbalance in parallel computation that occurs because the number of iterations differs from one calculation grid point to another. In this study, ELBM simulations using our methods were compared with those using the Newton–Raphson method for the channel flow past a square cylinder at Re = 1000, and the validity of the results and the computational effort were investigated. As a result, it was found that the solutions obtained by our methods are qualitatively and quantitatively reasonable and that the CPU time is shorter than with the Newton–Raphson method.

17.

A lattice Boltzmann modeling and analysis of the thermal convection in a lithium-ion battery

Jibing Jiang Dinggen Li Ruzhen Dou 《Computers & Mathematics with Applications》2019,77(10):2695-2706

Thermal convection is a critical problem in the design of thermal management system, and is widely encountered in electric and hybrid electric vehicles. In the present work, the lattice Boltzmann method is adopted to investigate the thermal convection in the $\mathrm{LiNi}_{x}\mathrm{Co}_{y}\mathrm{Mn}_{z}\mathrm{O}_{2}$ (NCM) lithium-ion battery. The numerical results reveal that the thermal convection model considered in the current study can clearly depict the temperature evolution in the case of the thermal runaway. Additionally, it is found that as the adiabatic boundary condition is adopted, the maximum temperature inside the battery can reach 320 °C at 240 s, which in turn affects the surrounding batteries. To prevent the thermal runaway propagation in such a case, we also analyzed the forced convective heat transfer in this situation, and the numerical results indicate that thermal runaway can be effectively decreased if the value of the surface heat transfer coefficient for battery cell increases up to 200 W m$^{-2}$ K$^{-1}$. Moreover, it is noted that when the temperature inside the battery reaches 110 °C, the subsequent temperature distributions inside the battery have little influence on the surrounding batteries, which suggests that the thermal management of battery pack in both normal charge and discharge process should be considered.

18.

Investigation of deformation and breakup of a falling droplet using a multiple-relaxation-time lattice Boltzmann method

Abbas Fakhari Mohammad Hassan Rahimian 《Computers & Fluids》2011,40(1):156-171

In this paper, a multi-relaxation-time lattice Boltzmann method for multiphase flows is employed to simulate different modes of deformation and fragmentation of an axisymmetric falling droplet under buoyancy force. To show the accuracy of the model, a Laplace law test for stationary drops is conducted first. Then, drop deformation and breakup in a free fall is studied in an axially symmetric pipe. Surface tension effects as well as impacts of gas and drop viscosities are investigated for a wide range of Eötvös, Morton, and Archimedes numbers. The drag coefficient of the drop, as it falls, is measured and compared to the empirical correlations, and reasonable agreement is shown. The findings are further verified by comparing a typical bag breakup mechanism with experimental observations. It is seen that at low Eötvös numbers the drop deforms slightly and reaches a steady state. Increasing the Eötvös number enhances the rate of deformation, and at a high enough Eötvös value the drop breaks up. While the gas viscosity is shown to have a trivial effect on the breakup of the droplet, drop viscosity is the overriding factor in the mechanism of disintegration. Consequently, various breakup modes of the falling droplet are observed just by varying the drop-based Archimedes number. By capturing different breakup mechanisms of a falling droplet such as bag breakup, shear breakup, and, particularly, multimode breakup, the present lattice Boltzmann method shows a clear advantage over the sharp interface tracking schemes that fail to capture dissociation of the interface.

19.

Kernel-assisted and topology-aware MPI collective communications on multicore/many-core platforms

Teng Ma George Bosilca Aurelien Bouteiller Jack J. Dongarra 《Journal of Parallel and Distributed Computing》2013

Multicore Clusters, which have become the most prominent form of High Performance Computing (HPC) systems, challenge the performance of MPI applications with non-uniform memory accesses and shared cache hierarchies. Recent advances in MPI collective communications have alleviated the performance issue exposed by deep memory hierarchies by carefully considering the mapping between the collective topology and the hardware topologies, as well as the use of single-copy kernel assisted mechanisms. However, on distributed environments, a single level approach cannot encompass the extreme variations not only in bandwidth and latency capabilities, but also in the capability to support duplex communications or operate multiple concurrent copies. This calls for a collaborative approach between multiple layers of collective algorithms, dedicated to extracting the maximum degree of parallelism from the collective algorithm by consolidating the intra- and inter-node communications.

20.

Simulation of liquid-liquid mixture flows based on the Lattice Boltzmann model (total citations: 4; self-citations: 0; citations by others: 4)

朱红斌 刘学慧 柳有权 吴恩华 《计算机学报》 (Chinese Journal of Computers) 2006,29(12):2071-2079

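A binary Lattice Boltzmann Model (LBM) is introduced to simulate mixture flows composed of two liquids. Unlike other similar models, it treats the viscosity and the diffusion properties of the fluids separately, so it can easily simulate a variety of miscible or immiscible mixture flows. In addition, because most LBM operations are linear and local, the model is easy to accelerate on a programmable Graphics Processing Unit (GPU), which enables real-time simulation. Several simulation results for binary mixture flows are presented.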