1.

Large-scale homo- and heterogeneous parallel paradigm design based on CFD application PHengLEI

YunBo Wan Zhong Zhao Jie Liu Laiping Zhang Yong Zhang Jianqiang Chen 《Concurrency and Computation》2024,36(5):e7933

The development of computational fluid dynamics (CFD) depends heavily on high-performance computers. Computer hardware has evolved rapidly, yet scalable CFD parallel software remains scarce. In this article, we design a highly scalable CFD parallel paradigm for both homogeneous and heterogeneous supercomputers. The paradigm separates communication from computation and automatically adapts to various solvers and hardware environments, thus reducing programming difficulties and increasing automatic parallelization. Meanwhile, the number of communications is greatly reduced and the scalability of the program is improved by implementing centralized communication and two-level partitioning techniques. Complex flow problems for real aircraft were then computed on different hardware platforms with a grid size of ten billion. The homogeneous computer hardware includes Intel Xeon Gold 6258R and Phytium 2000+ processors, and the heterogeneous computer platforms include NVIDIA Tesla V100 and SW26010 processors. High parallel efficiency was obtained on all computer platforms, verifying that the paradigm has good automatic parallelization, scalability, and stability. The paradigm in this article has important reference value for CFD massively parallel computing and can promote the development and application of CFD technology.

2.

Parallelization of a multiblock flow code: an engineering implementation

《Computers & Fluids》1999,28(4-5):603-614

Current trends in computer hardware are dictating a gradual shift toward the use of clusters of relatively inexpensive but powerful workstations, or massively parallel processing (MPP) machines, for scientific computing. However, most computational fluid dynamics (CFD) codes in use today were developed for large, shared-memory machines and are not readily portable to the distributed computing environment. One major hurdle in porting CFD codes to distributed computing platforms is the difficulty encountered in partitioning the problem so that the computation-to-communication ratio for each compute node (process) is maximized and the idle time during which one node waits for other nodes to transfer data is minimized. In the present work, pertinent issues involved in the parallelization of a widely used multiblock Navier–Stokes code TLNS3D are discussed. An engineering approach is used here to parallelize this code so that minimal deviation from the original (nonparallel) code is incurred. A natural partitioning along grid blocks is adopted in which one or more blocks are distributed to each of the available nodes. An automatic, static load-balancing strategy is employed for equitable distribution of computational work to specified nodes. Both parallel virtual machine (PVM) and message passing interface (MPI) protocols are incorporated for data communication to allow maximum portability to a wide range of computer configurations. Results are presented that are comparable with a priori estimates of performance for distributed computing and that are competitive in terms of central processing unit (CPU) time and wall time usage with large, shared-memory supercomputers.
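The static load-balancing step in this abstract (one or more grid blocks per node, with work spread evenly) can be illustrated with a small sketch. The abstract does not say which strategy TLNS3D uses, so the greedy largest-first heuristic below is an assumption, as are the names `balance_blocks` and `block_cells` and the use of cell counts as a proxy for work.

```python
import heapq

def balance_blocks(block_cells, n_nodes):
    """Assign grid blocks to nodes with a greedy largest-first heuristic.

    block_cells: dict mapping a block id to its cell count (assumed work proxy).
    Returns (assignment, loads): block ids per node and total cells per node.
    """
    # Min-heap of (current load, node index): the lightest node is popped first.
    heap = [(0, node) for node in range(n_nodes)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_nodes)]
    # Place the largest blocks first so that small blocks can even out the loads.
    for block, cells in sorted(block_cells.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)
        assignment[node].append(block)
        heapq.heappush(heap, (load + cells, node))
    loads = [0] * n_nodes
    for load, node in heap:
        loads[node] = load
    return assignment, loads
```

Because the assignment is computed once before the run, it matches the "static" setting of the abstract; it cannot react to load changes during the computation.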

3.

A hardware Memetic accelerator for VLSI circuit partitioning

Stephen Coe Medhat Moussa 《Computers & Electrical Engineering》2007,33(4):233-248

During the last decade, the complexity and size of circuits have been rapidly increasing, placing pressing demands on industry for faster and more efficient CAD tools for VLSI circuit layout. One major problem is the computational requirements for optimizing the place and route operations of a VLSI circuit. Thus, this paper investigates the feasibility of using reconfigurable computing platforms to improve the performance of CAD optimization algorithms for the VLSI circuit partitioning problem. The proposed Genetic algorithm architecture achieves up to 5× speedup over a conventional software implementation while maintaining on average 88% solution quality. Furthermore, a reconfigurable computing based Hybrid Memetic algorithm improves upon this solution while using a fraction of the execution time required by the conventional software based approach.

4.

An Evaluation of Architectural Platforms for Parallel Navier-Stokes Computations

Jayasimha D. N. Hayder M. E. Pillay S. K. 《The Journal of supercomputing》1997,11(1):41-60

We study the computational, communication, and scalability characteristics of a computational fluid dynamics application, which solves the time-accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architectural platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared-memory multiprocessor (the CRAY Y-MP), and distributed-memory multiprocessors with different topologies (the IBM SP and the CRAY T3D). We investigate the impact of various networks connecting the cluster of workstations on the performance of the application and the overheads induced by popular message-passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms. This revised version was published online in June 2006 with corrections to the Cover Date.

5.

Accelerating incompressible flow computations with a Pthreads-CUDA implementation on small-footprint multi-GPU platforms

Julien C. Thibault Inanc Senocak 《The Journal of supercomputing》2012,59(2):693-719

Graphics processing units (GPUs), originally designed for graphics rendering, have emerged as massively-parallel “co-processors” to the central processing unit (CPU). Small-footprint multi-GPU workstations with hundreds of processing elements can accelerate compute-intensive simulation science applications substantially. In this study, we describe the implementation of an incompressible flow Navier–Stokes solver for multi-GPU workstation platforms. A shared-memory parallel code with identical numerical methods is also developed for multi-core CPUs to provide a fair comparison between CPUs and GPUs. Specifically, we adopt NVIDIA’s Compute Unified Device Architecture (CUDA) programming model to implement the discretized form of the governing equations on a single GPU. Pthreads are then used to enable communication across multiple GPUs on a workstation. We use separate CUDA kernels to implement the projection algorithm to solve the incompressible fluid flow equations. Kernels are implemented on different memory spaces on the GPU depending on their arithmetic intensity. The memory hierarchy specific implementation produces significantly faster performance. We present a systematic analysis of speedup and scaling using two generations of NVIDIA GPU architectures and provide a comparison of single and double precision computational performance on the GPU. Using a quad-GPU platform for single precision computations, we observe two orders of magnitude speedup relative to a serial CPU implementation. Our results demonstrate that multi-GPU workstations can serve as a cost-effective small-footprint parallel computing platform to accelerate computational fluid dynamics (CFD) simulations substantially.

6.

LBM computational fluid dynamics software for ultra-large-scale parallel simulation

吕小敬 刘钊 褚学森 石树鹏 孟虹松 黄震春 《计算机科学》 (Computer Science) 2020,47(4):13-17

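The lattice Boltzmann method (LBM) is a computational fluid dynamics method based on the mesoscopic simulation scale and is widely used in theoretical research and engineering. Improving the parallel simulation capability of LBM fluid software is an important topic in high-performance computing and its applications. Based on the Sunway TaihuLight supercomputing system, this study designs and implements a highly scalable LBM computational fluid dynamics software package. Targeting the architecture of the domestic many-core processor SW26010, the paper designs several multi-level parallel techniques that improve the simulation speed and scalability of SWLBM, including data reuse for the 19-point stencil, vectorization of the collision step, and asynchronous master-slave parallelism that hides communication behind computation. With these optimizations, the paper tests numerical simulations with up to 5.6 trillion grid cells; the sustained floating-point performance of SWLBM reaches 4.7 PFlops, and the simulation speed improves 172-fold. Relative to a wind-field simulation on a 10000*10000*5000 grid using one million cores, SWLBM reaches a parallel efficiency of 87% on the full machine with ten million cores. The results indicate that SWLBM can provide practical large-scale parallel simulation solutions for industrial applications.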

7.

FuPerMod: a software tool for the optimization of data-parallel applications on heterogeneous platforms

David Clarke Ziming Zhong Vladimir Rychkov Alexey Lastovetsky 《The Journal of supercomputing》2014,69(1):61-69

Optimization of data-parallel applications for modern HPC platforms requires partitioning the computations between the heterogeneous computing devices in proportion to their speed. Heterogeneous data partitioning algorithms are based on computation performance models of the executing platforms. Their implementation is not trivial as it requires: accurate and efficient benchmarking of computing devices, which may share resources and/or execute different codes; appropriate interpolation methods to predict performance; and advanced mathematical methods to solve the data partitioning problem. In this paper, we present FuPerMod, a software tool that addresses these implementation issues and automates the development of data partitioning code in data-parallel applications for heterogeneous HPC platforms.
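Partitioning "in proportion to their speed" can be made concrete with a short sketch. It assumes the simplest possible performance model, a single constant speed per device, which is far cruder than the benchmark-based models the abstract describes; the function `partition_by_speed` is hypothetical and is not part of FuPerMod's interface.

```python
def partition_by_speed(n_items, speeds):
    """Split n_items work units among devices in proportion to their speeds.

    speeds: positive relative speeds (assumed constant, e.g. from one benchmark).
    Returns a list of integer shares that sums to n_items.
    """
    if n_items < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need n_items >= 0 and positive speeds")
    total = sum(speeds)
    ideal = [n_items * s / total for s in speeds]
    shares = [int(x) for x in ideal]  # round every share down first
    # Hand the leftover units to the devices with the largest fractional parts.
    leftover = n_items - sum(shares)
    by_remainder = sorted(range(len(speeds)),
                          key=lambda i: ideal[i] - shares[i], reverse=True)
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return shares
```

With a constant-speed model every device finishes at roughly the same time only if its speed does not depend on the amount of data it receives, which is the limitation that size-dependent performance models are meant to address.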

8.

A case study on expressiveness and performance of component-oriented parallel programming

Francisco Heron de Carvalho Junior Cenez Araújo de Rezende 《Journal of Parallel and Distributed Computing》2013

Component-oriented programming has been applied to address the requirements of large-scale applications from computational sciences and engineering that present high performance computing (HPC) requirements. However, parallelism continues to be a challenging requirement in the design of CBHPC (Component-Based High Performance Computing) platforms. This paper presents strong evidence about the efficacy and the efficiency of HPE (Hash Programming Environment), a CBHPC platform that provides full support for parallel programming, on the development, deployment and execution of numerical simulation code onto cluster computing platforms.

9.

Computational challenges of viscous incompressible flows (cited by: 1; self-citations: 0; citations by others: 1)

Dochan Kwak Cetin Kiris Chang Sung Kim 《Computers & Fluids》2005,34(3):283-299

Over the past 30 years, numerical methods and simulation tools for incompressible flows have been advanced as a subset of the computational fluid dynamics (CFD) discipline. Although incompressible flows are encountered in many areas of engineering, simulation of compressible flow has been the major driver for developing computational algorithms and tools. This is probably due to the rather stringent requirements for predicting aerodynamic performance characteristics of flight vehicles, while flow devices involving low-speed or incompressible flow could be reasonably well designed without resorting to accurate numerical simulations. As flow devices are required to be more sophisticated and highly efficient, CFD tools become increasingly important in fluid engineering for incompressible and low-speed flow. This paper reviews some of the successes made possible by advances in computational technologies during the same period, and discusses some of the current challenges faced in computing incompressible flows.

10.

Application of the Physis language framework to heterogeneous computing of the high-order WENO numerical scheme

邬萍 孟晨 王龙 《数据与计算发展前沿》 (Frontiers of Data and Computing) 2015,6(5):42-47

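WENO (weighted essentially non-oscillatory) is a high-order numerical scheme widely used in computational fluid dynamics. Because of the complexity of both the algorithm itself and heterogeneous programming, research on the automatic generation of heterogeneous computing code is needed so that more applications can be accelerated. Based on Physis, a domain-specific language framework, this paper implements automatic generation of heterogeneous code for an astronomical application of three-dimensional fifth-order WENO computation. Tests on the supercomputer "元" (Yuan) show that the automatically generated heterogeneous code scales well and reaches 72% of the performance of hand-optimized heterogeneous code, providing a reference for heterogeneous code generation in related fluid computations.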

11.

Complementing computational fluid dynamics methods with classical analytical techniques

《Computers & Fluids》1999,28(4-5):389-425

New aerospace vehicle designs must have greater performance and versatility at affordable cost. This requires multi-disciplinary analysis and optimization which in turn requires more accurate and efficient numerical simulation tools. The need for greater accuracy and efficiency of computational fluid dynamics (CFD) tools is further amplified by the industry trend toward distributed computing (e.g. workstation clusters) and away from supercomputers. Complementary analytic methods coupled with traditional CFD approaches offer the means for increased simulation capability by incorporating more essential physics into solution algorithms and reducing reliance on grid density for achieving accuracy. McDonnell Douglas Aerospace has a focused activity directed at improving affordability of CFD tools with complementary analytic techniques and has developed a strong capability. Results have proven very successful. Several examples of ongoing work are discussed, including improved far-field boundary conditions for CFD codes and analytic-based aerodynamic analysis and design optimization methods.

12.

Parallel computational methods for 3D simulation of a parafoil with prescribed shape changes

《Parallel Computing》1997,23(9):1349-1363

In this paper we describe parallel computational methods for 3D simulation of the dynamics and fluid dynamics of a parafoil with prescribed, time-dependent shape changes. The mathematical model is based on the time-dependent, 3D Navier-Stokes equations governing the incompressible flow around the parafoil and Newton's law of motion governing the dynamics of the parafoil, with the aerodynamic forces acting on the parafoil calculated from the flow field. The computational methods developed for these 3D simulations include a stabilized space-time finite element formulation to accommodate the shape changes, special mesh generation and mesh moving strategies developed for this purpose, iterative solution techniques for the large, coupled nonlinear equation systems involved, and parallel implementation of all these methods on scalable computing systems such as the Thinking Machines CM-5. As an example, we report 3D simulation of a flare maneuver in which the parafoil velocity is reduced by pulling down the flaps. This simulation requires solution of over 3.6 million coupled, nonlinear equations at every time step of the simulation.

13.

An application-centric characterization of domain-based SFC partitioners for parallel SAMR (cited by: 1; self-citations: 0; citations by others: 1)

Steensland J. Chandra S. Parashar M. 《Parallel and Distributed Systems, IEEE Transactions on》2002,13(12):1275-1289

Structured adaptive mesh refinement (SAMR) methods for the numerical solution of partial differential equations yield highly advantageous ratios for cost/accuracy as compared to methods based on static uniform approximations. These techniques are being effectively used in many domains including computational fluid dynamics, numerical relativity, astrophysics, subsurface modeling, and oil reservoir simulation. Distributed implementations of these methods, however, lead to significant challenges in dynamic data-distribution, load-balancing, and runtime management. This paper presents an application-centric characterization of a suite of dynamic domain-based inverse space-filling curve partitioning techniques for the distributed adaptive grid hierarchies that underlie SAMR applications. The overall goal of this research is to formulate policies required to drive a dynamically adaptive metapartitioner for SAMR grid hierarchies capable of selecting the most appropriate partitioning strategy at runtime based on current application and system state. Such a metapartitioner can significantly reduce the execution time of SAMR applications.
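The basic idea behind space-filling curve (SFC) partitioning can be shown on a flat 2D grid: order the cells along the curve, then cut the ordering into contiguous runs of roughly equal weight. The sketch below uses a Morton (Z-order) curve as an assumed stand-in; the abstract does not name the curves studied, and the partitioners it characterizes operate on adaptive grid hierarchies rather than a single uniform grid. The names `morton_key` and `sfc_partition` are hypothetical.

```python
def morton_key(x, y, bits=16):
    """Interleave the bits of non-negative integers x and y (Z-order index)."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (2 * b)
        key |= ((y >> b) & 1) << (2 * b + 1)
    return key

def sfc_partition(cells, n_parts):
    """Partition weighted cells into n_parts contiguous runs along the curve.

    cells: list of (x, y, weight) tuples with integer coordinates and
           positive weights. Returns n_parts lists of (x, y) coordinates.
    """
    ordered = sorted(cells, key=lambda c: morton_key(c[0], c[1]))
    total = sum(w for _, _, w in ordered)
    parts = [[] for _ in range(n_parts)]
    running = 0.0
    for x, y, w in ordered:
        # The midpoint of this cell's weight interval selects its part.
        part = min(int((running + w / 2) * n_parts / total), n_parts - 1)
        parts[part].append((x, y))
        running += w
    return parts
```

Since nearby keys tend to belong to nearby cells, each run stays spatially compact, and repartitioning after refinement only requires updating weights and cutting the same ordering again.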

14.

CFD for incompressible flows at NASA Ames

Dochan Kwak Cetin Kiris 《Computers & Fluids》2009,38(3):504-510

Over the past 30 years, numerical methods and simulation tools for incompressible flows have been advanced as a subset of the computational fluid dynamics (CFD) discipline. Although incompressible flows are encountered in many areas of engineering, the simulation of compressible flows has driven most of the development of computational algorithms and tools at NASA Ames Research Center. This is due to the stringent requirements for predicting the aerodynamic performance of flight vehicles. Conversely, low-speed incompressible flow through or past flow devices did not require the same numerical accuracy. This practice of tolerating relatively low-fidelity solutions in engineering applications has changed, as the design of low-speed flow devices has become more sophisticated, along with stricter efficiency requirements. Accurate and robust CFD tools have become increasingly important in fluid engineering for incompressible and low-speed flow. This paper reviews advances in computational technologies for incompressible flow simulation developed at Ames, and some engineering successes brought about by these advances made during the same period. Additionally, some of the current challenges faced in computing incompressible flows are presented.

15.

Investigation on runtime partitioning of elastic mobile applications for mobile cloud computing (cited by: 1; self-citations: 0; citations by others: 1)

Muhammad Shiraz Ejaz Ahmed Abdullah Gani Qi Han 《The Journal of supercomputing》2014,67(1):84-103

The latest developments in mobile computing technology have increased the computing capabilities of smartphones in terms of storage capacity, feature support such as multimodal connectivity, and support for customized user applications. Mobile devices are, however, still intrinsically limited by low bandwidth, computing power, and battery lifetime. Therefore, the computing power of computational clouds is tapped on an on-demand basis to mitigate resource limitations in mobile devices. Mobile cloud computing (MCC) is believed to be able to leverage cloud application processing services for alleviating the computing limitations of smartphones. In MCC, application offloading is implemented as a significant software level solution for sharing the application processing load of smartphones. The challenging aspect of application offloading frameworks is the resource-intensive mechanism of runtime profiling and partitioning of elastic mobile applications, which involves additional computing resource utilization on Smart Mobile Devices (SMDs). This paper investigates the overhead of runtime application partitioning on SMDs by analyzing the additional resource utilization on SMDs in the mechanism of runtime application profiling and partitioning. We evaluate the mechanism of runtime application partitioning on SMDs in the SmartSim simulation environment and validate the overhead of runtime application profiling by running a prototype application in a real mobile computing environment. Empirical results indicate that additional computing resources are utilized in runtime application profiling and partitioning. Hence, lightweight alternatives with optimal distributed deployment and management mechanisms are mandatory for accessing application processing services of computational clouds.

16.

Parallel Algorithms for Dynamic Shortest Path Problems

Ismail Chabini & Sridevi Ganugapati 《International Transactions in Operational Research》2002,9(3):279-302

The development of intelligent transportation systems (ITS) and the resulting need for the solution of a variety of dynamic traffic network models and management problems require faster‐than‐real‐time computation of shortest path problems in dynamic networks. Recently, a sequential algorithm was developed to compute shortest paths in discrete time dynamic networks from all nodes and all departure times to one destination node. The algorithm is known as algorithm DOT and has an optimal worst‐case running‐time complexity. This implies that no algorithm with a better worst‐case computational complexity can be discovered. Consequently, in order to derive faster algorithms to solve all‐to‐one shortest path problems in dynamic networks, one would need to explore avenues other than the design of sequential solution algorithms only. The use of commercially‐available high‐performance computing platforms to develop parallel implementations of sequential algorithms is an example of such an avenue. This paper reports on the design, implementation, and computational testing of parallel dynamic shortest path algorithms. We develop two shared‐memory and two message‐passing dynamic shortest path algorithm implementations, which are derived from algorithm DOT using the following parallelization strategies: decomposition by destination and decomposition by transportation network topology. The algorithms are coded using two types of parallel computing environments: a message‐passing environment based on the parallel virtual machine (PVM) library and a multi‐threading environment based on the SUN Microsystems Multi‐Threads (MT) library. We also develop a time‐based parallel version of algorithm DOT for the case of minimum time paths in FIFO networks, and a theoretical parallelization of algorithm DOT on an ‘ideal’ theoretical parallel machine.
The performance of the implementations is analyzed and evaluated using large transportation networks, and two types of parallel computing platforms: a distributed network of Unix workstations and a SUN shared‐memory machine containing eight processors. Satisfactory speed‐ups in the running time of sequential algorithms are achieved, in particular for shared‐memory machines. Numerical results indicate that shared‐memory computers constitute the most appropriate type of parallel computing platforms for the computation of dynamic shortest paths for real‐time ITS applications.
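The all-to-one problem in a discrete time dynamic network can be sketched as a backward sweep over departure times. The abstract does not spell out algorithm DOT, so the code below is only an illustration under stated assumptions: travel times are positive integers, they stay fixed from the last time interval onward, and waiting at nodes is not allowed. The function name `backward_time_labels` and the arc representation are hypothetical.

```python
import heapq

def backward_time_labels(n_nodes, arcs, dest, horizon):
    """Minimum travel times from every node and departure time to dest.

    arcs: list of (i, j, d) where d(t) returns a positive integer travel time
          for departure interval t; times are assumed static for t >= horizon - 1.
    Returns labels[t][i] for t in range(horizon); inf means unreachable.
    """
    last = horizon - 1
    inf = float("inf")
    # Step 1: static shortest paths at the last interval (Dijkstra, reversed arcs).
    incoming = [[] for _ in range(n_nodes)]
    for i, j, d in arcs:
        incoming[j].append((i, d(last)))
    final = [inf] * n_nodes
    final[dest] = 0
    heap = [(0, dest)]
    while heap:
        dist, j = heapq.heappop(heap)
        if dist > final[j]:
            continue  # stale heap entry
        for i, cost in incoming[j]:
            if dist + cost < final[i]:
                final[i] = dist + cost
                heapq.heappush(heap, (final[i], i))
    labels = [None] * horizon
    labels[last] = final
    # Step 2: sweep time backwards; labels at later intervals are already final
    # because every travel time is at least one interval long.
    for t in range(last - 1, -1, -1):
        row = [inf] * n_nodes
        row[dest] = 0
        for i, j, d in arcs:
            if i == dest:
                continue
            travel = d(t)
            arrival = min(last, t + travel)
            row[i] = min(row[i], travel + labels[arrival][j])
        labels[t] = row
    return labels
```

Each arc is examined once per time interval in the sweep, and rows for different destinations are independent of one another, which is one way to read the "decomposition by destination" strategy mentioned in the abstract.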

17.

Coupled shell-fluid interaction problems with degenerate shell and three-dimensional fluid elements (cited by: 1; self-citations: 0; citations by others: 1)

R. K. Singh T. Kant A. Kakodkar 《Computers & Structures》1991,38(5-6):515-528

Discrete methods for practical coupled three-dimensional fluid-structure interaction problems are developed. A C⁰ explicitly integrated two-dimensional degenerate shell element and a three-dimensional fluid element are coupled to study shell dynamics, fluid transient and coupled shell-fluid interaction problems. The method of partitioning is used to integrate the fluid and shell meshes in a staggered fashion in an optimum manner. Effective explicit-implicit partitioning is shown to achieve high computational efficiency for this type of problem.

18.

Parallelization of decentralized iterative community clustering using network-weighted Voronoi diagrams on the Spark platform

颜烨 张学文 王立婧 《计算机应用与软件》 (Computer Applications and Software) 2021,38(3):14-21,38

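The rapid growth of data has sharply increased the demand for computing power, and conventional standalone machines cannot provide the data-processing capacity needed for community detection in large-scale networks. To address this problem, a parallel decentralized iterative community clustering method based on network-weighted Voronoi diagrams (NWVD-PDICCM) is proposed. The decentralized iterative community clustering method based on network-weighted Voronoi diagrams (NWVD-DICCM) is used to extract effective community structures from large networks. Combined with parallel clustering, the operations of the DICCM method are converted from a serial process into parallel computation. Graph partitioning during parallel community clustering accelerates the process by minimizing communication among the slave workers. Simulation results show that NWVD-PDICCM can run on a range of computer architecture platforms and implements parallel operation on the Spark platform; compared with several other recent methods, it significantly improves the capacity for processing large-scale network data.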

19.

Multidisciplinary Simulation of the Maneuvering of an Aircraft

C. Farhat K. Pierson C. Degand 《Engineering with Computers》2001,17(1):16-27

A computational methodology for the simulation of the transient aeroelastic response of an unrestrained and flexible aircraft during high-G maneuvers is presented. The key components of this methodology are: (a) a three-field formulation for coupled fluid/structure interaction problems; (b) a second-order time-accurate and geometrically conservative flow solver for CFD computations on unstructured dynamic meshes; (c) a corotational finite element method for the solution of geometrically nonlinear and unrestrained structural dynamics problems; (d) a robust method for updating an unrestrained and unstructured moving fluid mesh; and (e) a second-order time-accurate staggered algorithm for time-integrating the coupled fluid/structure semi-discrete equations of motion. This computational methodology is illustrated with the simulation on a parallel processor of several three-dimensional high-G pullup maneuvers of the Langley Fighter in the transonic regime, using a detailed finite element aeroelastic model.

20.

A high-performance lattice Boltzmann implementation to model flow in porous media (cited by: 1; self-citations: 0; citations by others: 1)

Chongxun Pan Cass T. Miller 《Computer Physics Communications》2004,158(2):89-105

We examine the problem of simulating single and multiphase flow in porous medium systems at the pore scale using the lattice Boltzmann (LB) method. The LB method is a powerful approach, but one which is also computationally demanding; the resolution needed to resolve fundamental phenomena at the pore scale leads to very large lattice sizes, and hence substantial computational and memory requirements that necessitate the use of massively parallel computing approaches. Common LB implementations for simulating flow in porous media store the full lattice, making parallelization straightforward but wasteful. We investigate a two-stage implementation consisting of a sparse domain decomposition stage and a simulation stage that avoids the need to store and operate on lattice points located within a solid phase. A set of five domain decomposition approaches is investigated for single and multiphase flow through both homogeneous and heterogeneous porous medium systems on different parallel computing platforms. An orthogonal recursive bisection method yields the best performance of the methods investigated, showing near linear scaling and substantially less storage and computational time than the traditional approach.
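Orthogonal recursive bisection on a sparse lattice can be sketched in a few lines. The version below assumes that only fluid (non-solid) lattice sites are passed in and that load is balanced by site count alone; the abstract gives no implementation details, so the cut rule and the name `orb_partition` are assumptions for illustration.

```python
def orb_partition(points, n_parts):
    """Split points into n_parts groups by orthogonal recursive bisection.

    points: list of coordinate tuples (assumed to be fluid lattice sites only).
    Returns a list of n_parts lists of points.
    """
    if n_parts == 1:
        return [list(points)]
    if not points:
        return [[] for _ in range(n_parts)]
    # Cut across the axis along which the point set has the largest extent.
    dims = range(len(points[0]))
    axis = max(dims, key=lambda a: max(p[a] for p in points) - min(p[a] for p in points))
    ordered = sorted(points, key=lambda p: p[axis])
    left_parts = n_parts // 2
    # Place the cut so that point counts match the share of parts on each side.
    cut = len(ordered) * left_parts // n_parts
    return (orb_partition(ordered[:cut], left_parts)
            + orb_partition(ordered[cut:], n_parts - left_parts))
```

Because solid sites never enter the point list, each part stores and updates only fluid sites, which is the storage saving the abstract attributes to the sparse decomposition stage.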