Similar documents
 Found 20 similar documents (search time: 15 ms)
1.
With the proliferation of workstation clusters connected by high-speed networks, providing efficient system support for concurrent applications engaging in nontrivial interaction has become an important problem. Two principal barriers to harnessing parallelism are: (1) efficient mechanisms that achieve transparent dependency maintenance while preserving semantic correctness and (2) scheduling algorithms that match coupled processes to distributed resources while explicitly incorporating their communication costs. This paper describes a set of performance features and their properties and implementation in a system support environment called DUNES that achieves transparent dependency maintenance (IPC, file access, memory access, process creation/termination, process relationships) under dynamic load balancing. The two principal performance features are push/pull-based active and passive end-point caching and communication-sensitive load balancing. Collectively, they mitigate the overhead introduced by the transparent dependency maintenance mechanisms. Communication-sensitive load balancing, in addition, affects the scheduling of distributed resources to application processes where both communication and computation costs are explicitly taken into account. DUNES' architecture endows commodity operating systems with distributed operating system functionality while achieving transparency with respect to their existing application base. DUNES also preserves semantic correctness with respect to single processor semantics. We show performance measurements of a UNIX-based implementation on Sparc and x86 architectures over high-speed LAN environments. We show that significant performance gains in terms of system throughput and parallel application speedup are achievable.

2.
Dynamic balancing of computation and communication load is vital for the execution stability and performance of distributed, parallel simulations deployed on the shared, unreliable resources of large-scale environments. High Level Architecture (HLA) based simulations can experience a decrease in performance due to imbalances that are produced initially and/or during run time. These imbalances are generated by the dynamic load changes of distributed simulations or by unknown, non-managed background processes resulting from the non-dedication of shared resources. Due to the dynamic execution characteristics of elements that compose distributed applications, the computational load and interaction dependencies of each simulation entity change during run time. These dynamic changes lead to an irregular load and communication distribution, which increases resource overhead and latencies. A static partitioning of load is limited to deterministic applications and is incapable of predicting the dynamic changes caused by distributed applications or by external background processes. Therefore, a scheme for balancing the communication and computational load during the execution of distributed simulations is devised in a scalable hierarchical architecture. The proposed balancing system employs local and cluster monitoring mechanisms to observe the distributed load changes and identify imbalances, and repartitioning policies to determine a distribution of load that minimizes imbalances. A migration technique is also employed by the proposed balancing system to perform reliable and low-latency load transfers. Such a system successfully improves the use of shared resources and increases distributed simulations’ performance by minimizing communication latencies and partitioning the load evenly. Experiments and comparative analyses were conducted in order to identify the gains that the proposed balancing scheme provides to large-scale distributed simulations.

3.
Load balancing algorithms are designed essentially to distribute the load equally across processors and maximize their utilization while minimizing the total task execution time. In order to achieve these goals, the load-balancing mechanism should be “fair” in distributing the load across the different processors. This implies that the difference between the heaviest-loaded and the lightest-loaded processors should be minimized. Therefore, the load information on each processor must be kept up to date so that the load-balancing mechanism can be more effective. In this work, we present an application-independent dynamic algorithm for scheduling tasks and load balancing in message-passing systems. We propose a DAG-based Dynamic Load Balancing algorithm for Real time applications (DAG-DLBR) that is designed to work dynamically to cope with possible changes in the load that might occur during runtime. This algorithm addresses the challenge of devising a load balancing scheme which judiciously deals with the hybrid execution of an existing real-time application (represented by a Directed Acyclic Graph (DAG)) together with newly arriving jobs. The main objective of this algorithm is to reduce the response times of the newly arriving jobs while maintaining the time constraints of the existing DAG. To evaluate the performance of the DAG-DLBR algorithm, a comparison with the performance of two common dynamic load balancing algorithms is presented. This comparison is performed by evaluating, experimentally, the execution time of different load balancing algorithms on a homogeneous real parallel machine. In addition, the values of load imbalance, the execution time, and the communication overhead time are evaluated analytically using different benchmarks as test-bed workloads. These workloads cover a wide range of dynamic applications with different task types.
Experimental results illustrate the improved performance of the DAG-DLBR algorithm compared to both distributed and hierarchical algorithms, by at least 12% and 19%, respectively. This improvement holds for all workloads, even highly dependent ones. The DAG-DLBR algorithm achieves lower computation time than both the distributed and the hierarchical algorithms for 4, 8, 12 and 16 processors.
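
The fairness criterion in this abstract (minimizing the gap between the heaviest-loaded and lightest-loaded processors) can be made concrete with a small sketch. The function names and the max-over-mean ratio are illustrative assumptions, not definitions taken from the DAG-DLBR paper.

```python
def load_spread(loads):
    """Gap between the heaviest- and lightest-loaded processors."""
    return max(loads) - min(loads)


def imbalance_ratio(loads):
    """Maximum load divided by mean load (assumed metric); 1.0 means perfectly balanced."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean


# Hypothetical per-processor loads, e.g. queued task counts.
loads = [8, 4, 6, 6]
print(load_spread(loads))      # 4
print(imbalance_ratio(loads))  # 8 divided by the mean load of 6
```

A balancer in this style would pick the migration that most reduces `load_spread`, which is why each processor's load information has to be current.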

4.
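High-performance cluster computing is attracting growing attention. A cluster is typically a group of heterogeneous computer systems connected by a network. A key problem in cluster operation is keeping the load balanced. Most current load-balancing systems support only homogeneous clusters and balance at job-level granularity, which is too coarse to balance the parallel tasks within a parallel program well. This paper proposes a development framework for parallel programs that uses mobile agent technology to provide dynamic task migration and offers programmers a simple development interface that greatly simplifies their work. The system is built with Java on the Aglet platform. Experiments show that the system is flexible and effective.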

5.
Unstructured meshes are used in many engineering applications with irregular domains, from elastic deformation problems to crack propagation to fluid flow. Because of their complexity and dynamic behavior, the development of scalable parallel software for these applications is challenging. The Charm++ Parallel Framework for Unstructured Meshes allows one to write parallel programs that operate on unstructured meshes with only minimal knowledge of parallel computing, while making it possible to achieve excellent scalability even for complex applications. Charm++’s message-driven model enables computation/communication overlap, while its run-time load balancing capabilities make it possible to react to the changes in computational load that occur in dynamic physics applications. The framework is highly flexible and has been enhanced with numerous capabilities for the manipulation of unstructured meshes, such as parallel mesh adaptivity and collision detection.

6.
Load balancing is a key issue in the development of parallel algorithms with irregular structures. Existing load balancing systems each support only one specific programming paradigm and thus are of limited use. The system VDS presented here allows concurrent use of various paradigms such as fork-join, weighted tasks, and static DAGs (directed acyclic graphs that are known in advance). The system provides visual performance evaluation tools to facilitate its efficient application. VDS supports various communication interfaces including PVM and MPI. Thus, VDS applications can be run on architectures ranging from workstation clusters to massively parallel systems.

7.
Modern distributed systems consisting of powerful workstations and high-speed interconnection networks are an economical alternative to special-purpose supercomputers. The technical issues that need to be addressed in exploiting the parallelism inherent in a distributed system include heterogeneity, high-latency communication, fault tolerance and dynamic load balancing. Current software systems for parallel programming provide little or no automatic support for these issues and require users to be experts in fault-tolerant distributed computing. The Paralex system is aimed at exploring the extent to which the parallel application programmer can be liberated from the complexities of distributed systems. Paralex is a complete programming environment and makes extensive use of graphics to define, edit, execute, and debug parallel scientific applications. All of the necessary code for distributing the computation across a network and replicating it to achieve fault tolerance and dynamic load balancing is automatically generated by the system. In this paper we give an overview of Paralex and present our experiences with a prototype implementation.

8.
Considering application behavior in graph partitioning is an arduous task because of the chicken-and-egg problem: the application behavior depends on how the graph is decomposed, while achieving load balance requires knowledge of how the application utilizes the underlying resources. Advances in multi-core processors further complicate the endeavor by introducing hardware diversity and intra-node contention. As an attempt to quantify performance for partitioning refinement, we propose a model that predicts execution times of iterative mesh-based applications running on heterogeneous multi-core clusters. Apart from considering resource heterogeneity, the model takes into account hierarchical communication characteristics, overlap between computation and communication, as well as performance penalties due to intra-node contention. We present a detailed methodology for obtaining key parameters from a real system and highlight potential pitfalls of conventional approaches to obtaining the parameters. Experiments were conducted using a synthetic application benchmark solving a partial differential equation. Evaluation shows good agreement between measured execution times and the performance model.

9.
PC clusters have become popular in parallel processing. They do not involve specialized interprocessor networks, so the latency of data communications is rather long. The programming models for PC clusters are often different from those for parallel machines or supercomputers containing sophisticated interprocessor communication networks. For PC clusters, load balancing among the nodes becomes a more critical issue in attempts to yield high performance. We introduce a new model for program development on PC clusters, namely, the super-programming model (SPM). The workload is modeled as a collection of super-instructions (SIs). We propose that a set of SIs be designed for each application domain. They should constitute an orthogonal set of frequently used high-level operations in the corresponding application domain. Each SI should normally be implemented as a high-level language routine that can execute on any PC. Application programs are modeled as super-programs (SPs), which are coded using SIs. SIs are dynamically assigned to available PCs at runtime. Because the granularity of SIs is known, an upper bound on their execution time can be estimated statically. Therefore, dynamic load balancing becomes an easier task. Our motivation is to support dynamic load balancing and code porting, especially for applications with diverse sets of inputs such as data mining. Here we apply SPM to the implementation of an Apriori-like algorithm for mining association rules. Our experiments show that the average idle time per node is kept very low.
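
The runtime assignment of super-instructions to available PCs can be pictured with a small simulation. This is a minimal sketch under stated assumptions: each SI is reduced to a hypothetical cost estimate, and an SI goes to whichever PC becomes free first; the SPM paper's actual dispatching policy may differ, and `schedule` is an illustrative name.

```python
import heapq


def schedule(si_costs, num_pcs):
    """Assign each SI, in order, to the PC that becomes free first.

    Returns (finish time per PC, PC index chosen for each SI).
    """
    # Heap of (time at which the PC becomes free, PC index).
    free_at = [(0.0, pc) for pc in range(num_pcs)]
    heapq.heapify(free_at)
    assignment = []
    for cost in si_costs:
        t, pc = heapq.heappop(free_at)
        assignment.append(pc)
        heapq.heappush(free_at, (t + cost, pc))
    finish = [0.0] * num_pcs
    for t, pc in free_at:
        finish[pc] = t
    return finish, assignment


# Hypothetical SI cost estimates (arbitrary time units) on two PCs.
finish, assignment = schedule([3, 1, 2, 2], num_pcs=2)
makespan = max(finish)
idle = [makespan - t for t in finish]  # trailing idle time per PC
print(assignment)  # [0, 1, 1, 0]
print(finish)      # [5.0, 3.0]
print(idle)        # [0.0, 2.0]
```

Because the cost of each SI is bounded in advance, the trailing idle time on any PC is at most the cost of the largest SI, which is one way to read the claim that dynamic load balancing becomes easier.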

10.
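谭鹤毅 (Tan Heyi). 测控技术 (Measurement & Control Technology), 2017, 36(6): 109-111
To address the difficulty of finding an optimal load-balancing solution for distributed systems of multi-core nodes, a load-balancing method based on improved extremal optimization is proposed. The method detects load imbalance from the CPU utilization of the nodes, then uses a measurement model to estimate computation and communication overhead so that the improved extremal optimization method can balance the load of the cluster. Simulation and experimental results show that the algorithm improves the computational efficiency of distributed clusters and is an ideal load-balancing algorithm.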

11.
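System virtualization can dynamically reconfigure the computing resources of an application environment. Current dynamic resource configuration methods focus mainly on guaranteeing quality of service for applications with dynamic loads. These methods are driven by application performance and often increase the response latency of the resource control system. To address these problems, a resource-use-status-driven resource reconfiguration scheme (RUSiC) is proposed, which adapts automatically to dynamic load changes to meet the resource demands of application performance. RUSiC is designed as a two-tier resource reconfiguration model that, based on real-time resource use status, adjusts each application to a suitable resource configuration promptly and at low overhead. RUSiC also considers the efficient use of electrical power: in a new configuration it reduces the number of active physical nodes as far as possible, avoiding much unnecessary power consumption and the associated cooling costs. Experimental data show that when application load changes, RUSiC quickly detects and responds to the changed resource demands and uses a small number of active physical nodes while preserving application performance.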

12.
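For computational fluid dynamics methods based on multi-block structured grids, the difficulty in parallel processing is to distribute the many grid data blocks across computing resources efficiently and sensibly so as to achieve load balance in large-scale parallel environments. Focusing on the load-balancing problem, this paper describes work carried out in the CCFD software, including: (1) a two-level graph partitioning strategy for structured grids, in which the fine-level graph partitioning stage balances both computation and communication load; (2) a subdividable overlapping grid system, and a two-level load-balancing model for overlapping grid systems built on it. Test cases show that the adopted load-balancing strategy achieves high parallel efficiency in large-scale parallel environments.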

13.
A repartitioning hypergraph model for dynamic load balancing   Total citations: 1, self-citations: 0, citations by others: 1
In parallel adaptive applications, the computational structure of the applications changes over time, leading to load imbalances even though the initial load distributions were balanced. To restore balance and to keep communication volume low in further iterations of the applications, dynamic load balancing (repartitioning) of the changed computational structure is required. Repartitioning differs from static load balancing (partitioning) due to the additional requirement of minimizing migration cost to move data from an existing partition to a new partition. In this paper, we present a novel repartitioning hypergraph model for dynamic load balancing that accounts for both communication volume in the application and migration cost to move data, in order to minimize the overall cost. The use of a hypergraph-based model allows us to accurately model communication costs rather than approximate them with graph-based models. We show that the new model can be realized using hypergraph partitioning with fixed vertices and describe our parallel multilevel implementation within the Zoltan load balancing toolkit. To the best of our knowledge, this is the first implementation for dynamic load balancing based on hypergraph partitioning. To demonstrate the effectiveness of our approach, we conducted experiments on a Linux cluster with 1024 processors. The results show that, in terms of reducing total cost, our new model compares favorably to the graph-based dynamic load balancing approaches, and multilevel approaches improve the repartitioning quality significantly.
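
As a rough illustration of the trade-off this abstract describes, the sketch below scores a candidate repartition by adding communication volume to migration cost. The connectivity-minus-one volume metric, the weighting factor `alpha` (how many iterations the new partition is expected to be used), and all function and variable names are assumptions made for illustration; they are not taken from the paper or from the Zoltan API.

```python
def comm_volume(nets, part):
    """Connectivity-minus-one metric: a net costs (parts spanned - 1) * weight."""
    total = 0
    for vertices, weight in nets:
        spanned = {part[v] for v in vertices}
        total += (len(spanned) - 1) * weight
    return total


def migration_cost(old_part, new_part, size):
    """Total data size of the vertices whose part changes."""
    return sum(size[v] for v in old_part if old_part[v] != new_part[v])


def total_cost(nets, old_part, new_part, size, alpha):
    """Assumed objective: alpha * communication volume + migration cost."""
    return alpha * comm_volume(nets, new_part) + migration_cost(old_part, new_part, size)


# Hypothetical hypergraph: each net is (set of vertices, weight).
nets = [({"a", "b"}, 3), ({"b", "c"}, 1)]
size = {"a": 2, "b": 2, "c": 2}
old = {"a": 0, "b": 1, "c": 1}
new = {"a": 0, "b": 0, "c": 1}  # candidate: move b next to a

print(total_cost(nets, old, old, size, alpha=2))  # 6 (stay put)
print(total_cost(nets, old, new, size, alpha=2))  # 4 (migrate b)
```

With a small `alpha` the migration cost dominates and staying put wins; with a large `alpha` the saved communication pays for the move. A repartitioner has to weigh exactly this balance.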

14.
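Performance evaluation of cluster dynamic load-balancing systems   Total citations: 18, self-citations: 0, citations by others: 18
唐丹 (Tang Dan), 金海 (Jin Hai), 张永坤 (Zhang Yongkun). 计算机学报 (Chinese Journal of Computers), 2004, 27(6): 803-811
This paper uses stochastic Petri nets to build an abstract model of a cluster dynamic load-balancing system. By refining the node-local processing part of the model, it analyzes the performance of five dynamic load-balancing algorithms and discusses how cluster load characteristics affect the performance of a dynamic load-balancing system. The main conclusions are: (1) dynamic load-balancing algorithms can outperform static load-balancing algorithms; (2) compared with traditional load-balancing algorithms that consider only the CPU ready queue, algorithms that also consider the various I/O request queues can achieve better performance; (3) even under extreme cluster load characteristics, cluster dynamic load-balancing algorithms still perform reasonably well, so implementing even a very simple cluster dynamic load-balancing system is worthwhile.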

15.
In this paper, a new adaptive scheme is presented for dynamic load balancing in a message-passing multicomputer. The scheme is based on using easy-to-implement heuristics and an adaptive threshold in balancing the system load among dispersed nodes. It uses distributed control over all computer nodes as coordinated by an information collector. Four heuristic methods are presented here, which are distinguished by the ranges for location and threshold update policies and by the disciplines used for determining the load transfer destination. A parallel simulator with distributed load balancers is developed on an iPSC/2 hypercube multicomputer. The load balancing scheme is evaluated on the basis of the effects of system utilization, load imbalance, communication and migration overhead, and multicomputer size. Relative merits of the four methods are revealed under various physical configurations of the multicomputer. Potential applications are discussed for parallel execution of AI-oriented programs and distributed semantic network databases.

16.
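A study of a dynamic load-balancing drafting algorithm supporting distributed process migration   Total citations: 1, self-citations: 0, citations by others: 1
Load balancing is a problem that every distributed system must address. The drafting algorithm presented in this paper is independent of network topology, and its ideas can be applied to distributed systems. Its design challenges traditional load-balancing algorithms: it not only overcomes the shortcomings of the bidding algorithm but also goes to considerable lengths to reduce communication overhead and raise processor utilization, making it an efficient strategy for distributed process migration and dynamic load balancing. We implemented the drafting algorithm on a distributed UNIX system and verified its efficiency.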

17.
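An optimized dynamic load partitioning algorithm for parallel discrete event simulation under the optimistic strategy   Total citations: 4, self-citations: 0, citations by others: 4
Dynamic load partitioning is one effective way to improve the run-time performance of parallel discrete event simulation. Existing work tends to consider computational load balance and communication load optimization in isolation, which gives poor overall performance in complex applications. Considering both the computational load and the interaction patterns of the simulation model, this paper proposes a load partitioning model for parallel discrete event simulation based on the capacity-constrained k-partition problem on weighted undirected graphs. Together with a general method for measuring simulation run-time performance, it proposes a heuristic local-search approximate partitioning algorithm based on vertex swapping, which optimizes system communication load subject to computational load balance; the ratio of its approximate solution to the global optimum is no less than (1-1/|N|)(1-ε). Experiments demonstrate the effectiveness and practicality of the algorithm.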
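
To give a feel for vertex-swap local search on a weighted undirected graph, here is a minimal sketch. It assumes unit vertex weights, so swapping two vertices between parts keeps every part's size (and hence the computational balance) unchanged; edge weights stand in for communication load. It is a generic hill-climbing heuristic with hypothetical names, not the algorithm of the paper, and it carries no approximation guarantee.

```python
from itertools import combinations


def cut_weight(edges, part):
    """Total weight of edges whose endpoints lie in different parts."""
    return sum(w for (u, v), w in edges.items() if part[u] != part[v])


def swap_local_search(edges, part):
    """Swap vertex pairs across parts while a swap lowers the cut weight."""
    part = dict(part)  # work on a copy
    improved = True
    while improved:
        improved = False
        best = cut_weight(edges, part)
        for u, v in combinations(sorted(part), 2):
            if part[u] == part[v]:
                continue
            part[u], part[v] = part[v], part[u]  # tentative swap
            cost = cut_weight(edges, part)
            if cost < best:
                best = cost
                improved = True
            else:
                part[u], part[v] = part[v], part[u]  # undo
    return part


# Hypothetical simulation entities with pairwise communication weights.
edges = {("a", "b"): 5, ("c", "d"): 5, ("a", "c"): 1}
start = {"a": 0, "c": 0, "b": 1, "d": 1}
result = swap_local_search(edges, start)
print(cut_weight(edges, start))   # 10
print(cut_weight(edges, result))  # 1
```

The search stops at a local optimum: every accepted swap strictly lowers the cut weight, so it terminates, but a different starting partition may end at a different result.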

18.
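A dynamic load-balancing model based on the trading service   Total citations: 3, self-citations: 0, citations by others: 3
CORBA is a widely adopted distributed object platform that has successfully solved the problem of interoperability between heterogeneous platforms. To improve the performance of CORBA-based distributed systems as a whole, load balancing is an issue CORBA should address. This paper presents a general dynamic load-balancing model based on the trading service. It not only provides CORBA load balancing, but its extension of the trading service is transparent to users, so existing applications can be integrated with it directly. The paper closes with a performance evaluation and analysis of the model.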

19.
Dynamic load balancing schemes are significant for efficiently executing nonuniform problems in highly parallel multicomputer systems. The objective is to minimize the total execution time of single applications. This paper proposes an ARID strategy for distributed dynamic load balancing. Its principle and control protocol are described, and the communication overhead, the effect on system stability and the performance efficiency are analyzed. Finally, simulation experiments are carried out to compare the adaptive strategy with other dynamic load balancing schemes.

20.
The parallel computation capabilities of modern graphics processing units (GPUs) have attracted increasing attention from researchers and engineers who have been conducting high computational throughput studies. However, current single-GPU-based engineering solutions often struggle to fulfill their real-time requirements. Thus, the multi-GPU-based approach has become a popular and cost-effective choice for tackling the demands. In those cases, the computational load balancing over multiple GPU “nodes” is often the key bottleneck that affects the quality and performance of the real-time system. The existing load balancing approaches are mainly based on the assumption that all GPU nodes in the same computer framework are of equal computational performance, which is often not the case due to cluster design and other legacy issues. This paper presents a novel dynamic load balancing (DLB) model for rapid data division and allocation on heterogeneous GPU nodes based on an innovative fuzzy neural network (FNN). In this research, a 5-state parameter feedback mechanism defining the overall cluster and node performance is proposed. The corresponding FNN-based DLB model is capable of monitoring and predicting individual node performance under different workload scenarios. A real-time adaptive scheduler has been devised to reorganize the data inputs to each node when necessary to maintain their runtime computational performance. The devised model has been implemented on two-dimensional (2D) discrete wavelet transform (DWT) applications for evaluation. Experimental results show that this DLB model enables a high computational throughput while meeting the real-time and precision requirements of complex computational tasks.
