期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Matrix-free GPU implementation of a preconditioned conjugate gradient solver for anisotropic elliptic PDEs

Eike Müller Xu Guo Robert Scheichl Sinan Shi 《Computing and Visualization in Science》2013,16(2):41-58

Many problems in geophysical and atmospheric modelling require the fast solution of elliptic partial differential equations (PDEs) in “flat” three dimensional geometries. In particular, an anisotropic elliptic PDE for the pressure correction has to be solved at every time step in the dynamical core of many numerical weather prediction (NWP) models, and equations of a very similar structure arise in global ocean models, subsurface flow simulations and gas and oil reservoir modelling. The elliptic solve is often the bottleneck of the forecast, and to meet operational requirements an algorithmically optimal method has to be used and implemented efficiently. Graphics Processing Units (GPUs) have been shown to be highly efficient (both in terms of absolute performance and power consumption) for a wide range of applications in scientific computing, and recently iterative solvers have been parallelised on these architectures. In this article we describe the GPU implementation and optimisation of a Preconditioned Conjugate Gradient (PCG) algorithm for the solution of a three dimensional anisotropic elliptic PDE for the pressure correction in NWP. Our implementation exploits the strong vertical anisotropy of the elliptic operator in the construction of a suitable preconditioner. As the algorithm is memory bound, performance can be improved significantly by reducing the amount of global memory access. We achieve this by using a matrix-free implementation which does not require explicit storage of the matrix and instead recalculates the local stencil. Global memory access can also be reduced by rewriting the PCG algorithm using loop fusion and we show that this further reduces the runtime on the GPU. We demonstrate the performance of our matrix-free GPU code by comparing it both to a sequential CPU implementation and to a matrix-explicit GPU code which uses existing CUDA libraries. The absolute performance of the algorithm for different problem sizes is quantified in terms of floating point throughput and global memory bandwidth. 相似文献

2.

基于Python的大规模高性能LBM多相流模拟

徐传福王曦刘舒陈世钊林玉《计算机科学》2020,47(1):17-23

Python由于具有丰富的第三方库、开发高效等优点,已成为数据科学、智能科学等应用领域最流行的编程语言之一。Python强调了对科学与工程计算的支持,目前已积累了丰富的科学与工程计算库和工具。例如,SciPy和NumPy等数学库提供了高效的多维数组操作及丰富的数值计算功能。以往,Python主要作为脚本语言,起到连接数值模拟前处理、求解器和后处理的“胶水”功能,以提升数值模拟的自动化处理水平。近年来,国外已有学者尝试采用Python代码实现求解计算功能,并在高性能计算机上开展了超大规模并行计算研究,取得了不错的效果。由于自身特点,高效大规模Python数值模拟的实现和性能优化与传统基于C/C++和Fortran的数值模拟等具有很大的不同。文中实现了国际上首个完全基于Python的大规模并行三维格子玻尔兹曼多相流模拟代码PyLBMFlow,探索了Python大规模高性能计算和性能优化方法。首先,利用NumPy多维数组和通用函数设计实现了LBM流场数据结构和典型计算内核,通过一系列性能优化并对LBM边界处理算法进行重构,大幅提升了Python的计算效率,相对于基准实现,优化后的串行性能提升了两个量级。在此基础上,采用三维流场区域分解方法,基于mpi4py和Cython实现了MPI+OpenMP混合并行;在天河二号超级计算机上成功模拟了基于D3Q19离散方法和Shan-Chen BGK碰撞模型的气液两相流,算例规模达百亿网格,并行规模达1024个结点,并行效率超过90%。相似文献

3.

An economical method for computation of stationary conditions in gas transport systems

Maksim G. Anuchin Mikhail G. Anuchin A. N. Kuznetsov 《Mathematical Models and Computer Simulations》2016,8(6):651-659

An economic iterative method is proposed for simulating stationary conditions in complex gas transport systems of any topology. The method exploits a fast algebraic pipe model with periodic identification, which is used to adjust the model to the stationary conditions under consideration and to provide the required accuracy of calculation. The identification procedure is based on the solution of stationary hydrodynamic equations. The gas transport system is represented as a list of pipes and a list of nodes. As an initial approximation, we define gas pressures and temperatures in the nodes and thus setting boundary conditions for the stationary flow. In the global iterations of the method, the pressures and temperatures are updated to make the flow disbalance tend to zero. The method allows parallel computing on multiprocessor computers. It is implemented as a separate module in the VOLNA software. Its efficiency is demonstrated through a sample calculation for the gas transport system. A number of related issues are considered, including simulation accuracy, peculiarities in the solution of stationary equations, and the correctness of the calculation setup. The method can be used in combination with any other model of piped compressible gas. 相似文献

4.

常压等离子体射流沉积ZnO及其传感特性研究

韩琦黄淳张健《传感技术学报》2017,30(3)

提出了一种环保、高效、低成本的传感器制作方法.应用等离子体射流喷枪沉积ZnO半导体材料,其中等离子体射流喷枪的气源为N2,电源采用输出频率为20 kHz的高频高压电源.使用Zn(NO3)2的水溶液作为前驱物,经过雾化装置形成气溶胶,气溶胶在等离子体的氛围中成功在衬底上沉积出一层ZnO半导体材料,通过XRD、SEM、UV-Vis等对其进行表征分析.并对基于该方法制备的ZnO湿度传感器性能进行表征.ZnO沉积时间为5 min时的灵敏度最高,低湿和高湿下分别为0.435 pF/% RH和19.634 pF/% RH,同时该样品的迟滞、响应时间与恢复时间亦是最优. 相似文献

5.

A finite difference real ghost fluid method on moving meshes with corner-transport upwind interpolation

Yan Ding Li Yang Li Yuan 《Computers & Fluids》2011,49(1):247-257

The real ghost fluid method (RGFM) [Wang CW, Liu TG, Khoo BC. A real-ghost fluid method for the simulation of multi-medium compressible flow. SIAM J Sci Comput 2006;28:278–302] has been shown to be more robust than previous versions of GFM for simulating multi-medium flow problems with large density and pressure jumps. In this paper, a finite difference RGFM is combined with adaptive moving meshes for one- and two-dimensional problems. A high resolution corner-transport upwind (CTU) method is used to interpolate approximate solutions from old quadrilateral meshes to new ones. Unlike the dimensional splitting interpolation, the CTU method takes into account the transport across corner points, which is physically more sensible. Several one- and two-dimensional examples with large density and pressure jumps are computed. The results show the present moving mesh method can effectively reduce the conservative errors produced by GFM and can increase the computational efficiency. 相似文献

6.

Hybrid algorithm for accelerating the double series of Floquet vector modes

LI Weidong HONG Wei HAO Zhangcheng ZHOU Houxing 《中国科学F辑(英文版)》2006,49(5):616-626

In this paper, a hybrid algorithm for accelerating the double series of Floquet vector modes arising in the analysis of frequency selective surfaces (FSS) is presented. The asymptotic terms with slow convergence in the double series are first accelerated by Poisson transformation and Ewald method, and then the remained series is accelerated by Shank transformation. It results in significant savings in memory and computing time. Numerical examples verify the validity of the hybrid acceleration algorithm. 相似文献

7.

Corrigendum to “Complex flow simulation in natural aquifer” [Adv. Eng. Inform. 23 (2009) 288–293]: An algorithm for parallel flow simulations in the finite element framework

H. Mustapha A. Ghorayeb K. Mustapha 《Advanced Engineering Informatics》2013,27(1):149-156

The high advanced techniques in parallel computing can be employed for a better understanding of groundwater flow fluids. Generally, the geological media are very heterogeneous and contain complex structures. Decomposing these structures into, approximately, equivalent sub-structures for a load-balancing is a major challenge. This paper proposes and analyses a new algorithm to simulate parallel flow fluid in such complex media. Fully parallel software is developed, and two well-known sparse linear solvers, based respectively on a multifrontal Cholesky factorization and an iterative structured multigrid method, are compared. The mixed finite element (MFE) method is used to discretize Darcy’s equation. Numerical examples are presented to show the efficiency and robustness of the algorithm proposed. 相似文献

8.

An enhanced hybrid method for the simulation of highly skewed non-Gaussian stochastic fields

《Computer Methods in Applied Mechanics and Engineering》2005,194(45-47):4824-4844

In this paper, an enhanced hybrid method (EHM) is presented for the simulation of homogeneous non-Gaussian stochastic fields with prescribed target marginal distribution and spectral density function. The presented methodology constitutes an efficient blending of the Deodatis–Micaletti method with a neural network based function approximation. Precisely, the function fitting ability of neural networks based on the resilient back-propagation (Rprop) learning algorithm is employed to approximate the unknown underlying Gaussian spectrum. The resulting algorithm can be successfully applied for simulating narrow-banded fields with very large skewness at a fraction of the computing time required by the existing methods. Its computational efficiency is demonstrated in three numerical examples involving fields that follow the beta and lognormal distributions. 相似文献

9.

A multi-objective perspective on performance assessment and automated selection of single-objective optimization algorithms

《Applied Soft Computing》2020

We build upon a recently proposed multi-objective view onto performance measurement of single-objective stochastic solvers. The trade-off between the fraction of failed runs and the mean runtime of successful runs – both to be minimized – is directly analyzed based on a study on algorithm selection of inexact state-of-the-art solvers for the famous Traveling Salesperson Problem (TSP). Moreover, we adopt the hypervolume indicator (HV) commonly used in multi-objective optimization for simultaneously assessing both conflicting objectives and investigate relations to commonly used performance indicators, both theoretically and empirically. Next to Penalized Average Runtime (PAR) and Penalized Quantile Runtime (PQR), the HV measure is used as a core concept within the construction of per-instance algorithm selection models offering interesting insights into complementary behavior of inexact TSP solvers. 相似文献

10.

A finite-difference solution of Onsager's model for flow in a gas centrifuge

R.J. Ribando 《Computers & Fluids》1984,12(3):235-252

A finite-difference algorithm for computing the secondary flow field in a gas centrifuge is described. The linearized axisymmetric Navier-Stokes equations in primitive variable form are solved by a time-marching technique. Most previous finite-difference solution algorithms for gas centrifuge flows have involved numerically resolving the thin Ekman layers at the ends using an extremely fine mesh. Here instead we apply an analytical matching condition at the ends which has the same effect on the flow outside the Ekman layers as would the direct application of the hydrodynamic and thermal boundary conditions on a highly refined mesh. With this matching at the ends and judicious finite-differencing in the interior, the computational efficiency approaches that of algorithms based on eigenfunction solution of the steady-state problem while retaining the convenience of time-marching finite-difference methods. While non-linear terms, sources and sinks, and curvature are not included at present, the algorithm does lend itself to their later inclusion. 相似文献

11.

Gas-kinetic numerical study of complex flow problems covering various flow regimes

Zhi-Hui Li Lin Bi Han-Xin Zhang Lin Li 《Computers & Mathematics with Applications》2011,61(12):3653-3667

The Boltzmann simplified velocity distribution function equation, as adapted to various flow regimes, is described on the basis of the Boltzmann–Shakhov model from the kinetic theory of gases in this study. The discrete velocity ordinate method of gas-kinetic theory is studied and applied to simulate complex multi-scale flows. On the basis of using the uncoupling technique on molecular movements and collisions in the DSMC method, the gas-kinetic finite difference scheme is constructed by extending and applying the unsteady time-splitting method from computational fluid dynamics, which directly solves the discrete velocity distribution functions. The Gauss-type discrete velocity numerical quadrature technique for flows with different Mach numbers is developed to evaluate the macroscopic flow parameters in the physical space. As a result, the gas-kinetic numerical algorithm is established for studying the three-dimensional complex flows with high Mach numbers from rarefied transition to continuum regimes. On the basis of the parallel characteristics of the respective independent discrete velocity points in the discretized velocity space, a parallel strategy suitable for the gas-kinetic numerical method is investigated and, then, the HPF (High Performance Fortran) parallel programming software is developed for simulating gas dynamical problems covering the full spectrum of flow regimes. To illustrate the feasibility of the present gas-kinetic numerical method and simulate gas transport phenomena covering various flow regimes, the gas flows around three-dimensional spheres and spacecraft-like shapes with different Knudsen numbers and Mach numbers are investigated to validate the accuracy of the numerical methods through HPF parallel computing. The computational results determine the flow fields in high resolution and agree well with the theoretical and experimental data. This computing, in practice, has confirmed that the present gas-kinetic algorithm probably provides a promising approach for resolving hypersonic aerothermodynamic problems with the complete spectrum of flow regimes from the gas-kinetic point of view for solving the mesoscopic Boltzmann model equation. 相似文献

12.

A dynamic task scheduling framework based on chicken swarm and improved raven roosting optimization methods in cloud computing

Shadi Torabi Faramarz Safi-Esfahani 《The Journal of supercomputing》2018,74(6):2581-2626

Scheduling means devoting tasks among computational resources, considering specific goals. Cloud computing is facing a dynamic and rapidly evolving situation. Devoting tasks to the computational resources could be done in numerous different ways. As a consequence, scheduling of tasks in cloud computing is considered as a NP-hard problem. Meta-heuristic algorithms are a proper choice for improving scheduling in cloud computing, but they should, of course, be consistent with the dynamic situation in the field of cloud computing. One of the newest bio-inspired meta-heuristic algorithms is the chicken swarm optimization (CSO) algorithm. This algorithm is inspired by the hierarchical behavior of chickens in a swarm for finding food. The diverse movements of the chickens create a balance between the local and the global search for finding the optimal solution. Raven roosting optimization (RRO) algorithm is inspired by the social behavior of raven and the information flow between the members of the population with the goal of finding food. The advantage of this algorithm lies in using the individual perception mechanism in the process of searching the problem space. In the current work, an ICDSF scheduling framework is proposed. It is a hybrid (IRRO-CSO) meta-heuristic approach based on the improved raven roosting optimization algorithm (IRRO) and the CSO algorithm. The CSO algorithm is used for its efficiency in satisfying the balance between the local and the global search, and IRRO algorithm is chosen for solving the problem of premature convergence and its better performance in bigger search spaces. First, the performance of the proposed hybrid IRRO-CSO algorithm is compared with other imitation-based swarm intelligence methods using benchmark functions (CEC 2017). Then, the capabilities of the proposed scheduling hybrid algorithm (IRRO-CSO) are tested using the NASA-iPSC parallel workload and are compared with the other available algorithms. The obtained results from the implementation of the hybrid IRRO-CSO algorithm in MATLAB show an improvement in the average best fitness compared with the following algorithms: IRRO, RRO, CSO, BAT and PSO. Finally, simulation tests performed in cloud computing environment show improvements in terms of reduction of execution time, reduction of response time and the increase in throughput by using the proposed hybrid IRRO-CSO approach for dynamic scheduling. 相似文献

13.

Investigation on runtime partitioning of elastic mobile applications for mobile cloud computing 总被引：1，自引：0，他引：1

Muhammad Shiraz Ejaz Ahmed Abdullah Gani Qi Han 《The Journal of supercomputing》2014,67(1):84-103

The latest developments in mobile computing technology have increased the computing capabilities of smartphones in terms of storage capacity, features support such as multimodal connectivity, and support for customized user applications. Mobile devices are, however, still intrinsically limited by low bandwidth, computing power, and battery lifetime. Therefore, the computing power of computational clouds is tapped on demand basis for mitigating resources limitations in mobile devices. Mobile cloud computing (MCC) is believed to be able to leverage cloud application processing services for alleviating the computing limitations of smartphones. In MCC, application offloading is implemented as a significant software level solution for sharing the application processing load of smartphones. The challenging aspect of application offloading frameworks is the resources intensive mechanism of runtime profiling and partitioning of elastic mobile applications, which involves additional computing resources utilization on Smart Mobile Devices (SMDs). This paper investigates the overhead of runtime application partitioning on SMD by analyzing additional resources utilization on SMD in the mechanism of runtime application profiling and partitioning. We evaluate the mechanism of runtime application partitioning on SMDs in the SmartSim simulation environment and validate the overhead of runtime application profiling by running prototype application in the real mobile computing environment. Empirical results indicate that additional computing resources are utilized in runtime application profiling and partitioning. Hence, lightweight alternatives with optimal distributed deployment and management mechanism are mandatory for accessing application processing services of computational clouds. 相似文献

14.

高炉炉喉煤气流形态三维模型重建

安剑奇刘洋吴敏曹卫华《控制理论与应用》2014,31(5):624-631

高炉炉喉煤气流分布是表征高炉生产状况的重要因素且无法直接检测;本文提出一种基于炉顶摄像检测的高炉炉喉煤气流形态三维模型重建方法.首先建立检测系统的硬件结构,其次对煤气流形态图像预处理后,采用边缘检测算法提取图像特征,并提出基于方向链码的边缘跟踪算法,计算煤气流形态的图像坐标;最后结合摄像机内部参数并利用坐标系三维变化关系以及坐标映射关系,计算煤气流形态在三维空间的坐标,从而重建煤气流形态的三维模型.通过仿真对比高炉布料后煤气流形态发展的4个阶段的三维模型变化,估计气流形态的变化,从而估计高炉内部的煤气流分布状况,对指导高炉布料操作以及高炉稳顺运行奠定了基础. 相似文献

15.

Parameter tuning evolutionary algorithms for runtime versus cost trade-off in a cloud computing environment

《Simulation Modelling Practice and Theory》2018

The runtime of an evolutionary algorithm can be reduced by increasing the number of parallel evaluations. However, increasing the number of parallel evaluations can also result in wasted computational effort since there is a greater probability of creating solutions that do not contribute to convergence towards the global optimum. A trade-off, therefore, arises between the runtime and computational effort for different levels of parallelization of an evolutionary algorithm. When the computational effort is translated into cost, the trade-off can be restated as runtime versus cost. This trade-off is particularly relevant for cloud computing environments where the computing resources can be exactly matched to the level of parallelization of the algorithm, and the cost is proportional to the runtime and how many instances that are used. This paper empirically investigates this trade-off for two different evolutionary algorithms, NSGA-II and differential evolution (DE) when applied to a multi-objective discrete-event simulation (DES) problem. Both generational and steady-state asynchronous versions of both algorithms are included. The approach is to perform parameter tuning on a simplified version of the DES model. A subset of the best configurations from each tuning experiment is then evaluated on a cloud computing platform. The results indicate that, for the included DES problem, the steady-state asynchronous version of each algorithm provides a better runtime versus cost trade-off than the generational versions and that DE outperforms NSGA-II. 相似文献

16.

可重构资源管理及硬件任务布局的算法研究 总被引：1，自引：0，他引：1

李涛杨愚鲁《计算机研究与发展》2008,45(2):375-382

可重构系统具有微处理器的灵活性和接近于ASIC的计算速度,可重构硬件的动态部分重构能力能够实现计算和重构操作的重叠,使系统能够动态地改变运行任务,可重构资源管理和硬件任务布局方法是提高可重构系统性能的关键.提出了基于任务上边界计算最大空闲矩形的算法(TT-KAMER),能够有效地管理系统的空闲可重构资源;在此基础上使用FF和启发式BF算法进行硬件任务的布局.实验表明,算法能够有效地实现在线资源分配与任务布局,获得较高的资源利用率. 相似文献

17.

Learning dynamic algorithm portfolios

Matteo Gagliolo Jürgen Schmidhuber 《Annals of Mathematics and Artificial Intelligence》2006,47(3-4):295-328

Algorithm selection can be performed using a model of runtime distribution, learned during a preliminary training phase. There is a trade-off between the performance of model-based algorithm selection, and the cost of learning the model. In this paper, we treat this trade-off in the context of bandit problems. We propose a fully dynamic and online algorithm selection technique, with no separate training phase: all candidate algorithms are run in parallel, while a model incrementally learns their runtime distributions. A redundant set of time allocators uses the partially trained model to propose machine time shares for the algorithms. A bandit problem solver mixes the model-based shares with a uniform share, gradually increasing the impact of the best time allocators as the model improves. We present experiments with a set of SAT solvers on a mixed SAT-UNSAT benchmark; and with a set of solvers for the Auction Winner Determination problem. This work was supported by SNF grant 200020-107590/1. 相似文献

18.

基于运行时模型的混合云管理方法

陈星兰兴土李隘鹏郭文忠黄罡《软件学报》2017,28(7):1881-1897

随着云计算技术的普及,涌现出众多不同用途、不同类型的云计算平台.为了满足遗产系统整合和动态资源扩展等需求,常常需要构造混合云来统一管理不同云平台中的计算和存储资源.然而,不同云平台的管理接口和管理机制存在差异,使得开发混合云管理系统难度大、复杂度高.本文提出一种基于运行时模型的混合云管理方法：首先,在云平台管理接口基础上,构造单一云平台的运行时模型;其次,根据云平台领域知识,提出一种云平台软件体系结构的统一模型;最后,通过模型转换,实现云平台统一模型到运行时模型的映射.于是,管理程序可以建立在云平台统一模型的基础上,降低了混合云管理系统开发的难度和复杂度.本文还实现了基于运行时模型的CloudStack和亚马逊EC2混合云管理系统,并对方法的可行性和有效性进行了验证. 相似文献

19.

Lattice Boltzmann simulation of water and gas flow in porous gas diffusion layers in fuel cells reconstructed from micro-tomography

Yuan Gao Xiaoxian Zhang Pratp Rama Rui Chen Hossein Ostadi Kyle Jiang 《Computers & Mathematics with Applications》2013,65(6):891-900

The porous gas diffusion layers (GDLs) are key components in hydrogen fuel cells. During their operation the cells produce water at the cathode, and to avoid flooding, the water has to be removed out of the cells. How to manage the water is therefore an important issue in fuel cell design. In this paper we investigated water flow in the GDLs using a combination of the lattice Boltzmann method and X-ray computed tomography at the micron scale. Water flow in the GDL depends on water–air surface tension and hydrophobicity. To correctly represent the water–gas surface tension, the formations of water droplets in air were simulated, and the water–gas surface tension was obtained by fitting the simulated results to the Young–Laplace formula. The hydrophobicity is represented by the water–gas-fabric contact angle. For a given water–gas surface tension the value of the contact angle was determined by simulating the formations of water droplets on a solid surface with different hydrophobicity. We then applied the model to simulate water intrusion into initially dry GDLs driven by a pressure gradient in attempts to understand the impact of hydrophobicity on water distribution in the GDLs. The structures of the GDL were acquired by X-ray micro-tomography at a resolution of 1.7 microns. The simulated results revealed that with an increase in hydrophobicity, water transport in GDLs changes from piston-flow to channelled flow. 相似文献

20.

On enhanced non-linear free surface flow simulations with a hybrid LBM–VOF model

Christian F. Janßen Stephan T. Grilli Manfred Krafczyk 《Computers & Mathematics with Applications》2013,65(2):211-229

In this paper, we present extensions, extensive validations and applications of our previously published hybrid volume-of-fluid-based (VOF) model for the simulation of free-surface flow problems. For the solution of the flow field, the lattice Boltzmann method is used, where the free surface is represented by a VOF approach. The advection equation for the VOF fill level is discretized with a finite volume method, on the basis of a 3D Piecewise Linear Interface Reconstruction (PLIC) algorithm. The model is validated for several standard free surface benchmarks, such as breaking dam scenarios and a free falling jet. Finally, the hybrid algorithm is applied to the simulation of a wave breaking by overturning during shoaling, which is considered to be a demanding test case, especially for VOF solvers. In this case, the flow field is initialized early in the shoaling process with a solitary wave solution from inviscid, irrotational potential flow. The wave breaking process is then simulated with the 3D transient and turbulent LBM–VOF solver. All validation and benchmark tests confirm the accuracy of the proposed hybrid model. 相似文献