Similar literature
 20 similar documents found (search time: 46 ms)
1.
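Fluent-based numerical simulation and parallel computation of a complete aircraft   Total citations: 3, self-citations: 0, citations by others: 3
The commercial CFD software Fluent was used to numerically simulate, with parallel computation, the three-dimensional flow field around an aircraft in subsonic flight. The flow field near the aircraft was obtained, and the software was run in parallel on a high-performance parallel computer. By analyzing and comparing results computed for meshes of different sizes on clusters with different numbers of nodes, the effectiveness of this commercial software on parallel platforms was verified, which also provides a technical reference for large-scale scientific and engineering computing.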

2.
Dynamically allocating computing nodes to parallel applications is a promising technique for improving the utilization of cluster resources. Detailed simulations can help identify allocation strategies and problem decomposition parameters that increase the efficiency of parallel applications. We describe a simulation framework supporting dynamic node allocation which, given a simple cluster model, predicts the running time of parallel applications taking CPU and network sharing into account. Simulations can be carried out without modifying the application code. Thanks to partial direct execution, simulation times and memory requirements are reduced. In partial direct execution simulations, the application's parallel behavior is retrieved via direct execution, and the duration of individual operations is obtained from a performance prediction model or from prior measurements. Simulations may then vary cluster model parameters, operation durations and problem decomposition parameters to analyze their impact on application performance and identify the limiting factors. We implemented the proposed techniques by adding direct execution simulation capabilities to the Dynamic Parallel Schedules parallelization framework. We introduce the concept of dynamic efficiency to express resource utilization efficiency as a function of time. We verify the accuracy of our simulator by comparing the actual running time and dynamic efficiency of parallel program executions with the running time and dynamic efficiency predicted by the simulator under different parallelization and dynamic node allocation strategies.

3.
Clusters of SMPs are hybrid-parallel architectures that combine the main concepts of distributed-memory and shared-memory parallel machines. Although SMP clusters are widely used in the high performance computing community, there exists no single programming paradigm that allows exploiting the hierarchical structure of these machines. Most parallel applications deployed on SMP clusters are based on MPI, the standard API for distributed-memory parallel programming, and thus may miss a number of optimization opportunities offered by the shared memory available within SMP nodes. In this paper we present extensions to the data parallel programming language HPF and associated compilation techniques for optimizing HPF programs on clusters of SMPs. The proposed extensions enable programmers to control key aspects of distributed-memory and shared-memory parallelization at a high level of abstraction. Based on these language extensions, a compiler can adopt a hybrid parallelization strategy which closely reflects the hierarchical structure of SMP clusters by automatically exploiting shared-memory parallelism based on OpenMP within cluster nodes and distributed-memory parallelism utilizing MPI across nodes. We describe the implementation of these features in the VFC compiler and present experimental results which show the effectiveness of these techniques.

4.
Hydrologic modeling requires the handling of a wide range of highly nonlinear processes from the scale of a hill slope to the continental scale, and thus the computational efficiency of the model becomes a critical issue for water resource management. This work is aimed at implementing and evaluating a flexible parallel computing framework for hydrologic simulations by applying OpenMP in the HydroGeoSphere (HGS) model. HGS is a 3D control-volume finite element model that solves the nonlinear coupled equations describing surface–subsurface water flow, solute migration and energy transport. The computing efficiency of HGS is improved by three parallel computing schemes: 1) parallelization of Jacobian matrix assembly, 2) multi-block node reordering for performing LU solve efficiently, and 3) parameter privatization for reducing memory access latency. Regarding the accuracy and consistency of the simulation solutions obtained with parallel computing, differences in the solutions are entirely due to the use of a finite linear solver iteration tolerance, which yields slightly different solutions that all satisfy the convergence tolerance. The maximum difference in the head solution between the serial and parallel simulations is less than 10⁻³ m, using typical convergence tolerances. Using the parallel schemes developed in this work, three key achievements can be summarized: (1) parallelization of a physically-based hydrologic simulator can be performed in a manner that allows the same code to be executed on various shared memory platforms with minimal maintenance; (2) a general, flexible and robust parallel iterative sparse-matrix solver can be implemented in a wide range of numerical models employing either structured or unstructured mesh; and (3) the methodology is flexible, especially for the efficient construction of the coefficient and Jacobian matrices, compared to other parallelized hydrologic models which use parallel library packages.

5.
We compare two recently developed mesoscale models of binary immiscible and ternary amphiphilic fluids. We describe and compare the algorithms in detail and discuss their stability properties. The simulation results for the cases of self-assembly of ternary droplet phases and binary water-amphiphile sponge phases are compared and discussed. Both models require parallel implementation and deployment on large scale parallel computing resources in order to achieve reasonable simulation times for three-dimensional models. The parallelization strategies and performance on two distinct parallel architectures are compared and discussed. Large scale three-dimensional simulation of multiphase fluids requires the extensive use of high performance visualization techniques in order to enable the large quantities of complex data to be interpreted. We report on our experiences with two commercial visualization products: AVS and VTK. We also discuss the application and use of novel computational steering techniques for the more efficient utilization of high performance computing resources. We close the paper with some suggestions for the future development of both models.

6.
Heterogeneous network-based distributed and parallel computing is gaining increasing acceptance as an alternative or complementary paradigm to multiprocessor-based parallel processing as well as to conventional supercomputing. While algorithmic and programming aspects of heterogeneous concurrent computing are similar to their parallel processing counterparts, system issues, partitioning and scheduling, and performance aspects are significantly different. In this paper, we discuss the evolution of heterogeneous concurrent computing, in the context of the parallel virtual machine (PVM) system, a widely adopted software system for network computing. In particular, we highlight the system level infrastructures that are required, aspects of parallel algorithm development that most affect performance, system capabilities and limitations, and tools and methodologies for effective computing in heterogeneous networked environments. We also present recent developments and experiences in the PVM project, and comment on ongoing and future work.

7.
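To address the problems of incremental association rule algorithms based on the Can tree (canonical order tree) in big data environments, namely excessive space consumption by the tree structure, poor frequent pattern mining efficiency, and insufficient parallel performance on MapReduce clusters, this paper proposes MR-PARIRM (MapReduce-based parallel association rules incremental mining algorithm using rough set and merge pruning), a parallel incremental association rule mining algorithm improved with rough sets and merge pruning. First, a rough-set-based similar item merging strategy, RS-SIM (rough set based similar item merge), merges similar items in the dataset, and the Can tree is built from the merged data, which reduces the space occupied by the tree structure. Second, a merge pruning strategy, MPS (merge pruning strategy), prunes and merges propagation paths in the tree, compressing the frequent pattern search space to speed up frequent item mining. Finally, a dynamic scheduling strategy, DSS (dynamic scheduling strategy), dynamically schedules computing tasks on a heterogeneous MapReduce cluster, achieving load balancing and effectively improving the cluster's parallel computing capability. Experimental simulation results show that MR-PARIRM performs relatively well in big data environments and is suitable for parallel processing of large-scale data.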

8.
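[Objective] This paper analyzes the main challenges that artificial intelligence and big data applications pose to computer systems as data volumes grow rapidly and, in light of trends in computer systems, identifies several research directions in high-efficiency computing for AI and big data that urgently need to be addressed. [Literature scope] We broadly surveyed the latest domestic and international research on big data and AI computing on supercomputing and high-performance computing platforms, along with the challenging problems that this work addresses. [Methods] Big data provides increasingly rich training datasets for AI, but it also places higher demands on the computing power of computer systems. In recent years China's supercomputers have ranked among the world's leaders, providing strong computing platform support for large-scale big data and AI applications. [Results] Most current high-performance computing platforms, typified by supercomputers, are heterogeneous parallel systems built from CPUs plus accelerators, whose large numbers of compute cores can supply powerful computing capability for AI and big data applications. [Limitations] Because of their complex architectures, fully exploiting this computing capability and improving computing efficiency remain major challenges. Improving parallel efficiency is especially difficult in AI and big data, which differ from scientific computing. [Conclusions] Research on high-efficiency computing is therefore needed from low-level resource management, task scheduling, basic algorithm design, and communication optimization up to upper-level model parallelization and parallel programming, in order to comprehensively improve the computational energy efficiency of AI and big data applications on high-performance computing platforms.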

9.
We examine the problem of simulating single and multiphase flow in porous medium systems at the pore scale using the lattice Boltzmann (LB) method. The LB method is a powerful approach, but one which is also computationally demanding; the resolution needed to resolve fundamental phenomena at the pore scale leads to very large lattice sizes, and hence substantial computational and memory requirements that necessitate the use of massively parallel computing approaches. Common LB implementations for simulating flow in porous media store the full lattice, making parallelization straightforward but wasteful. We investigate a two-stage implementation consisting of a sparse domain decomposition stage and a simulation stage that avoids the need to store and operate on lattice points located within a solid phase. A set of five domain decomposition approaches is investigated for single and multiphase flow through both homogeneous and heterogeneous porous medium systems on different parallel computing platforms. An orthogonal recursive bisection method yields the best performance of the methods investigated, showing near linear scaling and substantially less storage and computational time than the traditional approach.
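As a rough illustration of the orthogonal recursive bisection idea named in this abstract, the following minimal Python sketch splits a set of lattice sites into groups of near-equal size by repeatedly cutting perpendicular to the axis of largest extent. It is not the authors' implementation: the function name `orb_partition`, the tuple representation of sites, and the proportional cut rule are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]  # hypothetical representation of one fluid (non-solid) lattice site


def orb_partition(points: Sequence[Point], parts: int) -> List[List[Point]]:
    """Split `points` into at most `parts` groups of near-equal size.

    Illustrative sketch only; returns fewer groups when there are too few points.
    """
    if parts <= 1 or len(points) < 2:
        return [list(points)]
    dims = len(points[0])
    # Cut perpendicular to the axis along which the points have the largest extent.
    axis = max(
        range(dims),
        key=lambda d: max(p[d] for p in points) - min(p[d] for p in points),
    )
    ordered = sorted(points, key=lambda p: p[axis])
    left_parts = parts // 2
    # Place the cut so each side receives points in proportion to its share of parts.
    cut = len(ordered) * left_parts // parts
    return (orb_partition(ordered[:cut], left_parts)
            + orb_partition(ordered[cut:], parts - left_parts))
```

Because only the sites passed in are partitioned, solid-phase sites can simply be left out of `points`, which mirrors the sparse storage idea described in the abstract.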

10.
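Streamlines are one of the main methods of flow field visualization. Because of the heavy computation involved, streamline generation for large-scale flow fields usually requires a parallel computing environment such as a high-performance computer, combined with parallel algorithms, to accelerate the computation. With heterogeneous computing systems becoming increasingly common, and in order to fully exploit the computing power of parallel heterogeneous environments and generate streamlines in parallel more efficiently, this paper adopts a technical architecture based on data-parallel primitives combined with distributed message passing, and designs a hybrid parallel streamline generation system suited to heterogeneous clusters. On this basis, data partitioning, data redundancy, and inter-process communication strategies are designed, and a parallel particle tracing algorithm is proposed and implemented. The system was deployed on a domestic (Chinese) supercomputing platform, and experiments were carried out on visualizing the results of large-scale CFD flow field simulations. The paper presents the experimental results and analyzes the speed, scalability, and load balance of the core parallel algorithm, demonstrating the effectiveness and scalability of the system and algorithm.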

11.
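As computer hardware performance improves, inference with pre-trained machine learning models is beginning to appear on personal devices as well. Caffe is a popular deep learning framework that excels at tasks such as image classification, but by default it can only run on a single core and cannot fully exploit the computing power of heterogeneous parallel computing devices. Deep learning places high demands on computing performance; parallelizing it to make full use of all computing devices would improve computing speed and the user experience. Because CP...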

12.
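Based on the divide-and-conquer method for the symmetric tridiagonal eigenvalue problem, an improved hybrid parallel implementation using the MPI/Cilk model is proposed. It combines data parallelism across nodes with multi-task parallelism within nodes, and implements multi-task partitioning and dynamic scheduling of the divide and merge phases of the divide-and-conquer algorithm. Within a node, the Cilk task-parallel model resolves problems such as data dependences and starvation waiting in thread-level parallelism, improving parallelism. Across nodes, the communication flow of the merge process is improved so that processes within a group exchange only complementary data, reducing communication overhead. Numerical experiments demonstrate the advantages of this hybrid parallel algorithm in computational efficiency and scalability.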

13.
This paper shows how a high-level matrix programming language may be used to perform Monte Carlo simulation, bootstrapping, estimation by maximum likelihood and GMM, and kernel regression in parallel on symmetric multiprocessor computers or clusters of workstations. The implementation of parallelization is done in a way such that an investigator may use the programs without any knowledge of parallel programming. A bootable CD that allows rapid creation of a cluster for parallel computing is introduced. Examples show that parallelization can lead to important reductions in computational time. Detailed discussion of how the Monte Carlo problem was parallelized is included as an example for learning to write parallel programs for Octave. JEL Classifications: C13; C14; C15; C63; C87

14.
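俞莉花 (Yu Lihua), 曾国荪 (Zeng Guosun). 《计算机科学》 (Computer Science), 2011, 38(10): 285-290
The heterogeneity of computing environments and the complexity and diversity of application tasks make heterogeneous computing necessary. The aim of heterogeneous computing is to take into account the differences among parallel processing systems and computing tasks and to seek an effective match between system and task, so that parallel tasks execute on the system with the best possible results. At present, time-optimized execution methods in heterogeneous computing are fairly mature, but there has been little research on jointly treating time and energy consumption as the optimization objective. With high-performance computing and green computing as the overall goals, and addressing the allocation, scheduling, and execution of parallel tasks in heterogeneous computing environments, this paper proposes a heterogeneous task model, a heterogeneous computing rate matrix, and a heterogeneous computing power matrix. Using the idea of normalizing energy consumption and time, it gives a heuristic algorithm for optimizing the time and energy consumption of parallel tasks on heterogeneous processors, and confirms the feasibility and effectiveness of the algorithm through case analysis.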
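To make the notions of a rate matrix, a power matrix, and a normalized time-energy objective more concrete, here is a small Python sketch of one possible greedy heuristic. It is an assumption-laden illustration, not the algorithm from the paper: the function `assign_tasks`, the weighting parameter `alpha`, and the normalization by the largest time and energy values are all hypothetical choices made for this example.

```python
from typing import List


def assign_tasks(work: List[float], rate: List[List[float]],
                 power: List[List[float]], alpha: float = 0.5) -> List[int]:
    """Greedily map each task to a processor (hypothetical sketch).

    rate[i][j]  : speed of task i on processor j (work units per second)
    power[i][j] : power drawn by processor j while running task i (watts)
    alpha       : weight on time; (1 - alpha) is the weight on energy
    """
    n_tasks, n_procs = len(work), len(rate[0])
    time = [[work[i] / rate[i][j] for j in range(n_procs)] for i in range(n_tasks)]
    energy = [[power[i][j] * time[i][j] for j in range(n_procs)] for i in range(n_tasks)]
    # Normalize both quantities so they can be combined in one dimensionless cost.
    t_max = max(max(row) for row in time)
    e_max = max(max(row) for row in energy)
    ready = [0.0] * n_procs  # time at which each processor becomes free
    assignment = []
    for i in range(n_tasks):
        def cost(j: int) -> float:
            finish = ready[j] + time[i][j]
            return alpha * finish / t_max + (1 - alpha) * energy[i][j] / e_max
        best = min(range(n_procs), key=cost)
        ready[best] += time[i][best]
        assignment.append(best)
    return assignment
```

With `alpha = 1.0` the sketch reduces to a purely time-driven greedy mapping, and with `alpha = 0.0` it picks the lowest-energy processor for each task; intermediate values trade one against the other.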

15.
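Design and implementation of a cluster OpenMP system   Total citations: 5, self-citations: 0, citations by others: 5
OpenMP has become the programming standard for shared-memory architectures because of its ease of use and support for incremental parallelization. Cluster systems are now the mainstream platform for high-performance computing, so research on cluster OpenMP systems is of great value for advancing the development and adoption of parallel applications. Using the software DSM system JIAJIA as the OpenMP runtime system, together with a front-end compiler, OMP2JIA, the authors implemented the OpenMP/JIAJIA computing environment on a cluster system. To improve performance, they extended the OpenMP directives according to the characteristics of cluster systems and optimized the back-end runtime library. Using 11 OpenMP applications, the authors compared the performance of this environment with that of a hardware cc-NUMA system supporting OpenMP (SGI 2100). The results show that the average speedup of the authors' cluster OpenMP system on 7 machines is 4.62, against 4.55 for the SGI 2100 system, so the two perform comparably.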
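The reported speedups can be turned into parallel efficiencies with the standard definition, efficiency = speedup / number of processors. The short Python sketch below does this arithmetic; the helper name `parallel_efficiency` is made up for the example, and it assumes that the SGI 2100 figure was also measured on 7 processors, which the abstract does not state explicitly.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Parallel efficiency: the fraction of ideal linear speedup achieved."""
    return speedup / processors


# Average speedups reported in the abstract.
cluster_eff = parallel_efficiency(4.62, 7)  # cluster OpenMP/JIAJIA system
sgi_eff = parallel_efficiency(4.55, 7)      # SGI 2100 (7 processors is an assumption)

print(f"cluster: {cluster_eff:.2f}, SGI 2100: {sgi_eff:.2f}")
```

Under that assumption both systems reach roughly two thirds of ideal linear speedup, which is consistent with the abstract's conclusion that their performance is comparable.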

16.
The paper presents a new open‐source framework called KernelHive for multilevel parallelization of computations among various clusters, cluster nodes, and finally, among both CPUs and GPUs for a particular application. An application is modeled as an acyclic directed graph with a possibility to run nodes in parallel and automatic expansion of nodes (called node unrolling) depending on the number of computation units available. A methodology is proposed for parallelization and mapping of an application to the environment that includes selection of devices using a chosen optimizer, selection of best grid configurations for compute devices, optimization of data partitioning and the execution. One of possibly many scheduling algorithms can be selected considering execution time, power consumption, and so on. An easy‐to‐use GUI is provided for modeling and monitoring with a repository of ready‐to‐use constructs and computational kernels. The methodology, execution times, and scalability have been demonstrated for a distributed and parallel password‐breaking example run in a heterogeneous environment with a cluster and servers with different numbers of nodes and both CPUs and GPUs. Additionally, performance of the framework has been compared with an MPI + OpenCL implementation using a parallel geospatial interpolation application employing up to 40 cluster nodes and 320 cores. Copyright © 2015 John Wiley & Sons, Ltd.

17.
This paper deals with the solution of three key problems that must be resolved before massive parallelization can be considered for multibody dynamics. Instead of classical joints, flexible joints with appropriate stiffness and damping are introduced into the multibody system, which makes it possible to derive completely decoupled equations of motion and, as a consequence, to simulate them using massively parallel computing. Such a formulation gives rise to high frequencies in the solution. Therefore, the heterogeneous multiscale method is used for numerical integration. However, three key problems had to be solved before such multibody simulation could be considered for further development. The problems are: the clear distinction between the macro-model and the micro-model in order to actually reduce the eigenvalues of the integrated model, a completely decoupled procedure for estimating the reaction forces at each micro-integration restart, and a suitable choice of the micro-integration time length.

18.
The paper deals with the parallelization of Delaunay triangulation algorithms, giving more emphasis to practical issues and implementation than to theoretical complexity. Two parallel implementations are presented. The first one is built on De Wall, an E^d triangulator based on an original interpretation of the divide & conquer paradigm. The second is based on an incremental construction algorithm. The parallelization strategies are presented and evaluated. The target parallel machine is a distributed computing environment, composed of coarse grain processing nodes. Results of first implementations are reported and compared with the performance of the serial versions running on a Unix workstation.

19.
In this work a computational procedure for a two-scale topology optimization problem using parallel computing techniques is developed. The goal is to obtain simultaneously the best structure and material, minimizing structural compliance. An algorithmic strategy is presented in a form suitable for parallelization. In terms of parallel computing facilities, an IBM Cluster 1350 is used, comprising 70 computing nodes each with two dual core processors, for a total of 280 cores. Scalability studies are performed with mechanical structures of low/moderate dimensions. Finally, the applicability of the proposed methodology is demonstrated by solving a grand challenge problem, the simulation of trabecular bone adaptation.

20.
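Cart3D-based numerical simulation and parallel computation of a complete aircraft   Total citations: 2, self-citations: 0, citations by others: 2
The commercial CFD software Cart3D was used to numerically simulate, with parallel computation, the three-dimensional flow field around an aircraft in subsonic flight. The flow field near the aircraft was obtained, and the software was run in parallel on a high-performance parallel computer. By comparing the results computed with different commercial software packages, the effectiveness of numerical simulation with Cart3D was verified, providing a technical reference for large-scale scientific and engineering computing.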
