Similar literature
1.
We address the problem of porting parallel distributed applications from static homogeneous cluster environments to dynamic heterogeneous Grid resources. We introduce a generic technique for adaptive load balancing of parallel applications on heterogeneous resources and evaluate it using a case study application: a Virtual Reactor for simulation of plasma chemical vapour deposition. This application has a modular architecture with a number of loosely coupled components suitable for distribution over the Grid. It requires large parameter space exploration that allows using Grid resources for high-throughput computing. The Virtual Reactor contains a number of parallel solvers originally designed for homogeneous computer clusters that needed adaptation to the heterogeneity of the Grid. In this paper we study the performance of one of the parallel solvers, apply the technique developed for adaptive load balancing, evaluate the efficiency of this approach and outline an automated procedure for optimal utilization of heterogeneous Grid resources for high-performance parallel computing.
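One simple way to picture load balancing on heterogeneous resources is to split a fixed amount of work in proportion to each resource's measured speed. The sketch below is illustrative only and is not the technique from the paper; the function name `proportional_split` and the speed values passed to it are assumptions made for this example.

```python
def proportional_split(total_items, speeds):
    """Split total_items among workers in proportion to their speeds.

    Hypothetical helper: uses the largest-remainder method so that the
    integer shares always sum exactly to total_items.
    """
    if total_items < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need a non-negative item count and positive speeds")
    total_speed = sum(speeds)
    # Ideal (fractional) share of each worker.
    ideal = [total_items * s / total_speed for s in speeds]
    # Round down first; int() floors because all values are non-negative.
    shares = [int(x) for x in ideal]
    leftover = total_items - sum(shares)
    # Give the remaining items to the workers with the largest fractional parts.
    order = sorted(range(len(speeds)),
                   key=lambda i: ideal[i] - shares[i],
                   reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares
```

In an adaptive setting the speeds would presumably be re-measured periodically and the split recomputed; that feedback loop is left out here.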

2.
Many scientific applications involve grids that lack a uniform underlying structure. These applications are often also dynamic in nature in that the grid structure significantly changes between successive phases of execution. In parallel computing environments, mesh adaptation of unstructured grids through selective refinement/coarsening has proven to be an effective approach. However, achieving load balance while minimizing interprocessor communication and redistribution costs is a difficult problem. Traditional dynamic load balancers are mostly inadequate because they lack a global view of system loads across processors. In this paper, we propose a novel and general-purpose load balancer that utilizes symmetric broadcast networks (SBN) as the underlying communication topology and compare its performance with a successful global load balancing environment, called PLUM, specifically created to handle adaptive unstructured applications. Our experimental results on an IBM SP2 demonstrate that the SBN-based load balancer achieves lower redistribution costs than that under PLUM by overlapping processing and data migration.

3.
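Numerical combustion simulation typically uses unstructured meshes to discretize the computational domain. When parallel simulations are run on unstructured meshes, mesh adaptation causes the computational load on different processes to change frequently and to differ greatly, which leads to low parallel efficiency. An effective way to improve parallel efficiency is dynamic load balancing. This paper proposes a dynamic load balancing method based on the chemical reaction state of the combustion. The method uses different strategies to predict the computational load on each process at different stages of the chemical reaction and, based on the predictions, evens out the computational tasks among processes to achieve load balance. Experimental analysis shows that the method effectively reduces load imbalance among processes and reduces the total simulation run time by 10%.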

4.
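A dynamic load distribution algorithm based on a network parallel computing environment   Total citations: 8 (self-citations: 0, citations by others: 8)
A network parallel computing system has a large number of autonomous computing resources; how to fully exploit their potential performance is precisely the subject of load balancing research. This paper describes a dynamic load distribution algorithm based on a network parallel computing environment. The algorithm distributes the load in the system dynamically according to the system state and the communication relationships among tasks, so as to achieve dynamic load balancing of the system. Tests on application examples show that the algorithm outperforms the stable sender-initiated adaptive algorithm in both stability and performance.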

5.
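An efficient one-dimensional dynamic load balancing method: the multilevel averaging weight method   Total citations: 6 (self-citations: 0, citations by others: 6)
Mo Zeyao (莫则尧), Chinese Journal of Computers (《计算机学报》), 2001, 24(2): 183-190
This paper proposes an efficient one-dimensional dynamic load balancing method suitable for both homogeneous and heterogeneous parallel computing environments, the multilevel averaging weight method, and uses it to resolve the dynamic load imbalance that arises in parallel numerical simulation of multi-material unsteady fluid dynamics with the Lagrangian method. A detailed theoretical analysis is given, together with parallel numerical experiments on two parallel machines organized around a real physical problem.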
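To make the idea of one-dimensional load balancing concrete, the following sketch cuts a sequence of non-negative cell weights into contiguous chunks of roughly equal total weight using prefix sums. It is a generic greedy heuristic written for illustration, not the multilevel averaging weight method described in the paper; the name `partition_1d` is an assumption.

```python
from bisect import bisect_left
from itertools import accumulate


def partition_1d(weights, nparts):
    """Return nparts + 1 cut indices splitting weights into contiguous chunks.

    Hypothetical helper: cut k is placed just after the first element whose
    cumulative weight reaches k/nparts of the total. Weights must be
    non-negative so that the prefix sums are sorted. Greedy, not optimal.
    """
    if nparts < 1:
        raise ValueError("nparts must be at least 1")
    prefix = list(accumulate(weights))  # prefix[i] == sum(weights[:i + 1])
    total = prefix[-1] if prefix else 0
    cuts = [0]
    for k in range(1, nparts):
        target = total * k / nparts
        # First position whose cumulative weight is >= target.
        idx = bisect_left(prefix, target) + 1
        # Keep the cuts monotone and inside the array.
        cuts.append(max(cuts[-1], min(idx, len(weights))))
    cuts.append(len(weights))
    return cuts
```

Chunk `p` then covers `weights[cuts[p]:cuts[p + 1]]`. For a heterogeneous machine the targets could be scaled by relative processor speed instead of `k / nparts`; that variation is not shown.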

6.
In this paper we consider the scalability of parallel space‐filling curve generation as implemented through parallel sorting algorithms. Multiple sorting algorithms are studied and results show that space‐filling curves can be generated quickly in parallel on thousands of processors. In addition, performance models are presented that are consistent with measured performance and offer insight into performance on still larger numbers of processors. At large numbers of processors, the scalability of adaptive mesh refined codes depends on the individual components of the adaptive solver. One such component is the dynamic load balancer. In adaptive mesh refined codes, the mesh is constantly changing resulting in load imbalance among the processors requiring a load‐balancing phase. The load balancing may occur often, requiring the load balancer to perform quickly. One common method for dynamic load balancing is to use space‐filling curves. Space‐filling curves, in particular the Hilbert curve, generate good partitions quickly in serial. However, at tens and hundreds of thousands of processors serial generation of space‐filling curves will hinder scalability. In order to avoid this issue we have developed a method that generates space‐filling curves quickly in parallel by reducing the generation to integer sorting. Copyright © 2007 John Wiley & Sons, Ltd.
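The reduction of space-filling curve partitioning to integer sorting can be sketched in a few lines. The example below is a serial illustration under simplifying assumptions: it uses the Morton (Z-order) curve rather than the Hilbert curve because its key is easier to compute, it works on 2D integer cell coordinates, and it uses Python's built-in sort in place of a parallel sorting algorithm. The names `morton_key` and `sfc_partition` are hypothetical.

```python
def morton_key(x, y, bits=16):
    """Interleave the low `bits` bits of non-negative integers x and y."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (2 * b)       # x bits go to even positions
        key |= ((y >> b) & 1) << (2 * b + 1)   # y bits go to odd positions
    return key


def sfc_partition(cells, nparts):
    """Sort (x, y) cells by Morton key and cut the order into nparts chunks."""
    ordered = sorted(cells, key=lambda c: morton_key(c[0], c[1]))
    n = len(ordered)
    # Chunk p takes indices [p * n // nparts, (p + 1) * n // nparts).
    return [ordered[p * n // nparts:(p + 1) * n // nparts]
            for p in range(nparts)]
```

Because each cell maps to a single integer key, any integer sort can produce the curve order; on a 4 x 4 grid split four ways, each chunk is one 2 x 2 quadrant.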

7.
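The numbers of network users and of multi-operation terminal nodes are growing markedly, which makes it difficult for network computing resources to reach a balanced state. This paper proposes an adaptive load balancing algorithm for multi-operation terminals based on multi-agent technology. The load states of the terminals are defined, and their load information is collected and quantified. On this basis, an agent load balancing structure is built, multi-agent technology is introduced, and, taking into account the working characteristics of the terminal nodes, a network resource model is constructed and a design...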

8.
Particle tracking methods are a versatile computational technique central to the simulation of a wide range of scientific applications. In this paper, we present a new parallel particle tracking framework for the applications of scientific computing. The framework includes the in-element particle tracking method, which is based on the assumption that particle trajectories are computed by problem data localized to individual elements, as well as the dynamic partitioning of particle-mesh computational systems. The ultimate goal of this research is to develop a parallel in-element particle tracking framework capable of interfacing with ordinary differential equation (ODE) solvers of different orders of accuracy. The parallel efficiency of such particle-mesh systems depends on the partitioning of both the mesh elements and the particles; this distribution can change dramatically because of movement of the particles and adaptive refinement of the mesh. To address this problem we introduce a combined load function that is a function of both the particle and mesh element distributions. We present experimental results that detail the performance of this parallel load balancing approach for a three-dimensional particle-mesh test problem on an unstructured, adaptive mesh, and demonstrate the ability to interface with different ODE solvers.

9.
In order to exploit the efficient computing power of many integrated cores on heterogeneous clusters, a multi-level and multi-granularity collaborative parallel computing method is proposed for finite element structural mechanical analysis. Computing tasks are divided into three levels: inter-node parallelism, inter-device parallelism and inter-core parallelism. Through mapping decomposable computing jobs to different hardware layers of a heterogeneous MIC system, the proposed method not only effectively resolves the load balancing problem between CPU and MIC devices, but also significantly reduces the communication overheads of the system. Different engineering simulation case experiments for large scale parallel computing were conducted on the “Tianhe 2” supercomputer. Up to 39000 CPU+MIC cores were employed and the finite element size of the analysis was more than 100 million units. Test results show that the proposed method can achieve good speedup and parallel computing efficiency in large scale parallel computing of finite element structural analysis. The optimized adaptation of finite element structural analysis and heterogeneous MIC computing platform is realized, which can provide reference for parallel porting and performance optimization of similar applications.

10.
A load balancing framework for adaptive and asynchronous applications   Total citations: 1 (self-citations: 0, citations by others: 1)
We describe the design of a flexible load balancing framework and runtime software system for supporting the development of adaptive applications on distributed-memory parallel computers. The runtime system supports a global namespace, transparent object migration, automatic message forwarding and routing, and automatic load balancing. These features can be used at the discretion of the application developer in order to simplify program development and to eliminate complex bookkeeping associated with mobile data objects. An evaluation of this system in the context of a three-dimensional tetrahedral advancing front parallel mesh generator shows that overall runtime improvements of 15 percent compared to common stop-and-repartition load balancing methods, 30 percent compared to explicit intrusive load balancing methods, and 42 percent compared to no load balancing are possible on large processor configurations. At the same time, the overheads attributable to the runtime system are a fraction of 1 percent of the total runtime. The parallel advancing front method is a coarse-grained and highly adaptive application and therefore exercises all of the features of the runtime system.

11.
Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. We present a novel method called PLUM to dynamically balance the processor workloads with a global view. This paper describes the implementation and integration of all major components within our dynamic load balancing strategy for adaptive grid calculations. Mesh adaption, repartitioning, processor assignment, and remapping are critical components of the framework that must be accomplished rapidly and efficiently so as not to cause a significant overhead to the numerical simulation. A data redistribution model is also presented that predicts the remapping cost on the SP2. This model is required to determine whether the gain from a balanced workload distribution offsets the cost of data movement. Results presented in this paper demonstrate that PLUM is an effective dynamic load balancing strategy which remains viable on a large number of processors.
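The trade-off between the gain from a balanced workload and the cost of data movement can be written as a simple decision rule. The sketch below is a deliberately simplified assumption, not PLUM's redistribution model: it takes the per-step time to be the maximum processor load and treats the remapping cost as a single number supplied by the caller. The name `should_remap` and all parameters are hypothetical.

```python
def should_remap(loads_before, loads_after, remap_cost, remaining_steps):
    """Return True if the predicted time saved exceeds the data movement cost.

    Assumption: a synchronised step takes as long as the most heavily
    loaded processor, so the per-step gain is the drop in maximum load.
    """
    gain_per_step = max(loads_before) - max(loads_after)
    return gain_per_step * remaining_steps > remap_cost
```

For example, going from loads `[10, 2, 2, 2]` to `[4, 4, 4, 4]` saves 6 time units per step, so with a remapping cost of 20 the move pays off when 5 steps remain but not when only 3 remain.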

12.
In modern parallel adaptive mesh computations the problem size varies during simulation. In this study we investigate the comparative behavior of four load balancing algorithms when the number of processors is dynamically changed during the lifetime of a multistage parallel computation. The focus is on communication and data movement overheads, total parallel runtime and total resource consumption. We demonstrate the main ideas for the case of six adaptive mesh refinement (AMR) applications with different kinds of growth patterns. The results presented are for a 32-processor Intel cluster connected by Ethernet.

13.
A repartitioning hypergraph model for dynamic load balancing   Total citations: 1 (self-citations: 0, citations by others: 1)
In parallel adaptive applications, the computational structure of the applications changes over time, leading to load imbalances even though the initial load distributions were balanced. To restore balance and to keep communication volume low in further iterations of the applications, dynamic load balancing (repartitioning) of the changed computational structure is required. Repartitioning differs from static load balancing (partitioning) due to the additional requirement of minimizing migration cost to move data from an existing partition to a new partition. In this paper, we present a novel repartitioning hypergraph model for dynamic load balancing that accounts for both communication volume in the application and migration cost to move data, in order to minimize the overall cost. The use of a hypergraph-based model allows us to accurately model communication costs rather than approximate them with graph-based models. We show that the new model can be realized using hypergraph partitioning with fixed vertices and describe our parallel multilevel implementation within the Zoltan load balancing toolkit. To the best of our knowledge, this is the first implementation for dynamic load balancing based on hypergraph partitioning. To demonstrate the effectiveness of our approach, we conducted experiments on a Linux cluster with 1024 processors. The results show that, in terms of reducing total cost, our new model compares favorably to the graph-based dynamic load balancing approaches, and multilevel approaches improve the repartitioning quality significantly.

14.
This paper presents an external parallelization of Constraint Programming (CP) search tree mixing both static and dynamic partitioning. The principle of the parallelization is to partition the CP search tree into a set of sub-trees, then assign each sub-tree to one computing core in order to perform a local search using a sequential CP solver. In this context, static partitioning consists of decomposing the CP variables domains in order to split the CP search tree into a set of disjoint sub-trees to assign them to the cores. This strategy performs well without adding an extra cost to the parallel search, but the problem is the load imbalance between computing cores. On the other hand, dynamic partitioning is based on preservation of the search state to generate, dynamically or on demand, the sub-trees that are assigned to the cores. This strategy offers good load balancing between the different computing cores, but extra computing costs arise due to the initialisation of the search when a sub-tree is migrated from one core to another. In this paper, we propose a new partitioning strategy that mixes the static and dynamic partitioning and enjoys the benefits of each strategy. This mixed partitioning is designed to run on shared and distributed memory architectures. The performances obtained are illustrated by solving the CP problems modelled using the FlatZinc format and solved using the Google OR-Tools solver on top of the parallel Bobpp framework.

15.
Adaptive mesh refinement (AMR) is a type of multiscale algorithm that achieves high resolution in localized regions of dynamic, multidimensional numerical simulations. One of the key issues related to AMR is dynamic load balancing (DLB), which allows large-scale adaptive applications to run efficiently on parallel systems. In this paper, we present an efficient DLB scheme for structured AMR (SAMR) applications. This scheme interleaves a grid-splitting technique with direct grid movements (e.g., direct movement from an overloaded processor to an underloaded processor), for which the objective is to efficiently redistribute workload among all the processors so as to reduce the parallel execution time. The potential benefits of our DLB scheme are examined by incorporating our techniques into a SAMR cosmology application, the ENZO code. Experiments show that by using our scheme, the parallel execution time can be reduced by up to 57% and the quality of load balancing can be improved by a factor of six, as compared to the original DLB scheme used in ENZO.

16.
Unstructured meshes are used in many engineering applications with irregular domains, from elastic deformation problems to crack propagation to fluid flow. Because of their complexity and dynamic behavior, the development of scalable parallel software for these applications is challenging. The Charm++ Parallel Framework for Unstructured Meshes allows one to write parallel programs that operate on unstructured meshes with only minimal knowledge of parallel computing, while making it possible to achieve excellent scalability even for complex applications. Charm++’s message-driven model enables computation/communication overlap, while its run-time load balancing capabilities make it possible to react to the changes in computational load that occur in dynamic physics applications. The framework is highly flexible and has been enhanced with numerous capabilities for the manipulation of unstructured meshes, such as parallel mesh adaptivity and collision detection.

17.
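Xie Yan (谢妍), Tu Bin (涂斌), Lu Benzhuo (卢本卓), Zhang Linbo (张林波), Journal of Software (《软件学报》), 2013, 24(S2): 110-117
This paper explains how to use PHG, a parallel adaptive finite element software platform, to solve the nonlinear Poisson-Boltzmann equation for biomolecular solution systems. It introduces an approach to this class of problems that combines mesh generation with the adaptive computation, so that a suitable mesh is produced automatically and complicated surface mesh generation steps are avoided. Previous mesh generation work involved three steps: (1) TMSmesh generates a triangular mesh of the Gaussian surface; (2) TransforMesh removes self-intersecting triangles; (3) ISO2Mesh improves the quality of the surface mesh. In contrast, the PHG-based adaptive refinement module maintains dynamic load balance while successively adjusting the mesh, and efficiently yields a computational mesh for approximately solving the nonlinear Poisson-Boltzmann equation. A sphere model and the AChE system are computed, and the effectiveness of the method is verified in terms of the decay order of the error indicator and the convergence of the solvation energy, respectively. The mesh generation algorithm is also applied successfully to the gA ion channel.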

18.
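Overset (overlapping) grid techniques are widely used in numerical flow simulation for problems with complex geometries and moving boundaries. Building on an implementation of a parallel implicit hole-cutting algorithm for overset grids, this paper proposes a hybrid overset grid method that combines auxiliary Cartesian grids with multi-block structured grids. The auxiliary Cartesian grids allow the hole boundaries and the grid interpolation relations of the overset grids to be established quickly. By defining weights for the grid cells in overlap regions and binding component grids to the background grid, a parallel distribution scheme for the hybrid grids is established that effectively reduces the communication of overset interpolation information among processes and distributes both computational load and communication load evenly across processes. Tests show that the method can be applied to overset grid systems with tens of millions of cells, scales to thousands of cores, and efficiently establishes the overlap relations of complex grid systems composed of multiple bodies.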

19.
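To address the load imbalance that arises in parallel numerical simulation of three-dimensional molecular dynamics, this paper proposes a simple and effective dynamic load balancing algorithm built on top of static load balancing. Parallel numerical simulation experiments on three-dimensional molecular dynamics show that the algorithm brings the load essentially into dynamic balance and further improves parallel efficiency.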

20.
We present a parallel algorithm for static and dynamic partitioning of unstructured FEM-meshes. The method consists of two parts. First a fast but inaccurate sequential clustering is determined which is used, together with a simple mapping heuristic, to map the mesh initially onto the processors of a parallel system. The second part of the method uses a massively parallel algorithm to remap and optimize the mesh decomposition, taking several cost functions into account which reflect the characteristics of the underlying hardware and the requirements of the numerical solution method supposed to run after the decomposition. The parallel algorithm first calculates the number of nodes that have to be migrated between pairs of clusters in order to obtain an optimal load balancing. In a second step, nodes to be migrated are chosen according to cost functions optimizing the amount of necessary communication and the shapes of subdomains. The latter criterion is extremely important for the convergence behavior of certain numerical solution methods, especially for preconditioned conjugate gradient methods. The parallel parts of the method are implemented in C under Parix to run on the Parsytec GC systems. Results on up to 64 processors are presented and compared to those of other existing methods. © 1998 John Wiley & Sons, Ltd.
