《Parallel Computing》1997,22(11):1477-1492
Cluster-based computing, which exploits the aggregate power of a network of workstations, has drawn increasing attention from the parallel processing community. The main problem with this computing environment is the permanently changing workload of individual workstations, which makes the efficiency and the execution time of parallel applications unpredictable. In this paper, we introduce an efficient load balancing scheme which aims at dynamically balancing the workload of data parallel applications in this computing environment. Simulation and experimental studies of our load balancing strategy are performed under various load situations, and it is shown that the strategy can effectively balance the workload among the workstations involved. Further, it is shown that a significant improvement in computing performance can be achieved when using our load balancing strategy as compared to the case where no load balancing is applied, particularly under a heavily loaded system.

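Load balancing problems arise in many applications, and load balancing plays a particularly important role in parallel and distributed computing systems. Network computing environments, typified by clusters of workstations, are currently one of the main research focuses in parallel computing and distributed systems; handling heterogeneity and achieving dynamic load balancing are the keys to network parallel computing on clusters. This paper analyzes the dynamic load balancing problem in parallel computing and proposes some solutions.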

Clusters of workstations are emerging as an important architecture. Programming tools that aid in distributing applications on workstation clusters must address problems of mapping the application, heterogeneity and maximizing system utilization in the presence of varying resource availability. Both computation and communication capabilities may vary with time due to other applications competing for resources, so dynamic load balancing is a key requirement. For greatest benefit, the tool must support a relatively wide class of applications running on clusters with a range of computation and communication capabilities. We have developed a system that supports dynamic load balancing of distributed applications consisting of parallelized DOALL and DOACROSS loops. The focus of the paper is on how the system automatically determines key load balancing parameters using run-time information and information provided by programming tools such as a parallelizing compiler. The parameters discussed are the grain size of the application, the frequency of load balancing, and the parameters that control work movement. Our results are supported by measurements on an implementation for the Nectar system at Carnegie Mellon University and by simulation. © 1997 by John Wiley & Sons, Ltd.

Modern distributed systems consisting of powerful workstations and high-speed interconnection networks are an economical alternative to special-purpose supercomputers. The technical issues that need to be addressed in exploiting the parallelism inherent in a distributed system include heterogeneity, high-latency communication, fault tolerance and dynamic load balancing. Current software systems for parallel programming provide little or no automatic support for these issues and require users to be experts in fault-tolerant distributed computing. The Paralex system is aimed at exploring the extent to which the parallel application programmer can be liberated from the complexities of distributed systems. Paralex is a complete programming environment and makes extensive use of graphics to define, edit, execute, and debug parallel scientific applications. All of the necessary code for distributing the computation across a network and replicating it to achieve fault tolerance and dynamic load balancing is automatically generated by the system. In this paper we give an overview of Paralex and present our experiences with a prototype implementation.

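Owing to the good price/performance ratio of personal computers and workstations and to increasing network speeds, high-performance computing on cluster systems has become a hot topic. In such a heterogeneous computing environment, knowing how load information changes is indispensable for achieving load balance. This paper proposes load monitoring based on mobile agents, which provides important information for load balancing of distributed parallel applications in heterogeneous computing environments. This approach offers good portability, scalability, and flexibility.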

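Numerical combustion simulations usually discretize the computational domain with unstructured meshes. When a simulation runs in parallel on an unstructured mesh, its adaptive scheme causes the computational load on different processes to change frequently and to differ greatly, which results in low parallel efficiency. An effective way to improve parallel efficiency is dynamic load balancing. This paper proposes a dynamic load balancing method tailored to the chemical reaction state of combustion: it applies different strategies to predict the computational load on each process at different stages of the chemical reaction, then evens out the computational tasks across processes according to the predictions to achieve load balance. Experimental analysis shows that the method effectively reduces load imbalance among processes and reduces the overall run time of the simulation by 10%.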

《Parallel Computing》2014,40(5-6):86-99
Simulation of in vivo cellular processes with the reaction–diffusion master equation (RDME) is a computationally expensive task. Our previous software enabled simulation of inhomogeneous biochemical systems for small bacteria over long time scales using the MPD-RDME method on a single GPU. Simulations of larger eukaryotic systems exceed the on-board memory capacity of individual GPUs, and long time simulations of modest-sized cells such as yeast are impractical on a single GPU. We present a new multi-GPU parallel implementation of the MPD-RDME method based on a spatial decomposition approach that supports dynamic load balancing for workstations containing GPUs of varying performance and memory capacity. We take advantage of high-performance features of CUDA for peer-to-peer GPU memory transfers and evaluate the performance of our algorithms on state-of-the-art GPU devices. We present parallel efficiency and performance results for simulations using multiple GPUs as system size, particle counts, and number of reactions grow. We also demonstrate multi-GPU performance in simulations of the Min protein system in E. coli. Moreover, our multi-GPU decomposition and load balancing approach can be generalized to other lattice-based problems.

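Improving the efficiency of heterogeneous HPL (High-performance Linpack) requires fully exploiting the computing power of both the accelerators and the general-purpose CPUs. The accelerators integrate more compute cores and carry out most of the computation, while the general-purpose CPUs handle task scheduling and also take part in the computation. Provided that tasks are partitioned sensibly and the load is balanced, optimizing CPU-side computing performance is especially important for overall efficiency. Optimizing BLAS (Basic Linear Algebra Subprograms) functions for the architectural characteristics of a specific platform often makes fuller use of the general-purpose CPUs' computing power and improves overall system efficiency. The BLIS (BLAS-like Library Instantiation Software) library is an open-source framework for BLAS functions that is easy to develop with, easy to port, and modular. Based on the architecture of the heterogeneous platform and the characteristics of the HPL algorithm, this paper optimizes the BLAS functions of every level called on the CPU side, making full use of the three-level cache, vector instructions, and multithreaded parallelism, and applies auto-tuning to optimize the matrix blocking parameters. The result is the HygonBLIS library; compared with MKL, it improves overall HPL performance in the heterogeneous environment by 11.8%.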

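Improving the efficiency of heterogeneous HPL (high-performance Linpack) requires fully exploiting the computing power of both the accelerators and the general-purpose CPUs. The accelerators integrate more compute cores and carry out most of the computation, while the general-purpose CPUs handle task scheduling and also take part in the computation. Provided that tasks are partitioned sensibly and the load is balanced, optimizing CPU-side computing performance is especially important for overall efficiency. Targeting the architectural characteristics of a specific platform, BLAS (ba...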

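To address the heterogeneity of network computing platforms, the current trend is to combine distributed and parallel technologies. Dynamic load balancing is the key to network parallel computing on clusters, and obtaining the load information of each node is its prerequisite. This paper presents a method that uses mobile agent technology to obtain node load information; the method not only achieves this balancing but also greatly reduces the communication overhead of network parallel computing.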

Load balancing involves assigning to each processor work proportional to its performance, thereby minimizing the execution time of a program. Although static load balancing can solve many problems (e.g., those caused by processor heterogeneity and nonuniform loops) for most regular applications, the transient external load due to multiple users on a network of workstations necessitates a dynamic approach to load balancing. In this paper we show that different load balancing schemes are best for different applications under varying program and system parameters. Therefore, application-driven customized dynamic load balancing becomes essential for good performance. We present a hybrid compile-time and run-time modeling and decision process which selects (customizes) the best scheme, along with automatic generation of parallel code with calls to a run-time library for load balancing.
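
The first sentence of the abstract above (work assigned in proportion to processor performance) can be illustrated with a minimal sketch. This is not the paper's scheme: the function name `proportional_split`, the use of a single relative-speed number per processor, and the largest-remainder rounding are all assumptions made for illustration.

```python
def proportional_split(total_items, speeds):
    """Split total_items across processors in proportion to their speeds.

    Hypothetical helper: `speeds` holds one positive relative-performance
    value per processor. Rounding uses the largest-remainder rule so the
    returned counts always sum to total_items.
    """
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    if not speeds or any(s <= 0 for s in speeds):
        raise ValueError("speeds must be a non-empty list of positive numbers")

    total_speed = sum(speeds)
    # Ideal (fractional) share of each processor.
    ideal = [total_items * s / total_speed for s in speeds]
    # Start from the integer part of every ideal share.
    counts = [int(x) for x in ideal]
    leftover = total_items - sum(counts)
    # Hand out the remaining items to the largest fractional parts first;
    # ties are broken by processor index.
    order = sorted(range(len(speeds)), key=lambda i: (-(ideal[i] - counts[i]), i))
    for i in order[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # Three workstations, the third twice as fast as the other two.
    print(proportional_split(100, [1, 1, 2]))
```

A dynamic scheme would call such a function again whenever the measured speeds change, for example because other users add load to a workstation.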

A Rule-Based Hierarchical Load Balancing Scheduling Model   Total citations: 13, self-citations: 0, citations by others: 13
On massively parallel and distributed systems and on networks of workstations, a critical problem is to increase the utilization efficiency of resources and the response speed of tasks by using an effective load balancing scheduling strategy. This paper analyzes the scheduling strategies of dynamic load balancing and static load balancing, and then proposes a rule-based hierarchical load balancing scheduling model. Finally, the model is compared with other scheduling models.

The main objective of this study is to transform a network of workstations into a load balanced distributed computing system (LBDCS). LBDCS aims to improve the performance of generally underutilized timeshared workstations and of highly CPU intensive independent or parallel applications. It affects the initial placement of the tasks and task migrations later during their execution. One of the important implementation features of LBDCS is that it does not use any intermediary such as PVM (parallel virtual machine) or MPI (message passing interface) for inter-task communication. It defines various metrics to characterize the level of load and dynamically monitors the system and applications to detect load imbalances. The employed load balancing algorithm makes use of predicted load indices which are computed as weighted averages of the past system and application loads. Performance analysis of the system has been conducted using a number of hypothetical applications and two simple real life applications (in this case matrix multiplication and merge-sort). Hypothetical applications provide flexibility for testing the system under tunable application conditions. Using load balancing, an average speedup and efficiency close to 70% of their theoretical upper bounds are observed for different applications. Copyright © 1999 John Wiley & Sons, Ltd.
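
The abstract above says predicted load indices are weighted averages of past loads but does not give the weights or the window length. The sketch below is one plausible reading under stated assumptions: the name `predicted_load`, the newest-first weight order, and the example weights are hypothetical, not taken from LBDCS.

```python
def predicted_load(history, weights):
    """Predict the next load index as a weighted average of recent samples.

    Hypothetical helper: `history` is ordered oldest to newest, and
    weights[0] applies to the newest sample. If fewer samples than weights
    are available, only the matching leading weights are used, and the
    result is renormalized by the sum of the weights actually applied.
    """
    if not history:
        raise ValueError("history must contain at least one sample")
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty list of positive numbers")

    # Most recent samples first, limited to the window the weights cover.
    recent = history[-len(weights):][::-1]
    used = weights[:len(recent)]
    return sum(w * x for w, x in zip(used, recent)) / sum(used)


if __name__ == "__main__":
    # Assumed weights that favor the newest measurement.
    samples = [1.0, 2.0, 4.0]
    print(predicted_load(samples, [0.5, 0.3, 0.2]))
```

Giving the newest sample the largest weight makes the prediction react quickly to a load change while the older samples damp short spikes; other weightings are equally consistent with the abstract.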

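The National Meteorological Administration's quantitative precipitation estimation system for the networked weather radars involves not only a heavy computational load but also a high data throughput, and it has strict real-time requirements. Shortening its execution time would undoubtedly bring great benefits. Given these characteristics, VTune Amplifier XE was used to perform hotspot analysis and parallelism analysis on the serial program, which showed that the program contains substantial thread-level parallelism, and a corresponding parallelization scheme was designed. The program was then parallelized on an Intel quad-core processor platform using two technologies, Win32 multithreading and OpenMP. The program consists mainly of two parts: single-station processing and network processing. Because of limited computing resources, the parallelized single-station processing program gains only about 10% in performance, whereas the network processing program achieves nearly linear performance gains. After the computational load is adjusted, the parallel version reaches a speedup of 5.5. Finally, it is concluded that this parallelization approach suits applications that are compute-intensive and have high data throughput.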

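A Dynamic Task Scheduling Method for Cluster Systems   Total citations: 21, self-citations: 1, citations by others: 21
Fu Qiang (傅强), Zheng Weimin (郑纬民) 《软件学报》 (Journal of Software) 1999,10(1):19-23
Task scheduling is one of the important problems that must be solved to perform parallel computing on cluster systems. For parallel applications that generate tasks dynamically at run time, it is hard to make accurate task allocation decisions, so the task load on the compute nodes may become unbalanced and ultimately cause a significant drop in overall system performance. Task reallocation is therefore needed to maintain load balance. This paper proposes a task allocation and reallocation method that delays the start of task execution as long as possible, thereby avoiding process migration during task reallocation and keeping the scheduling overhead small. Analysis and experimental results show that the method can effectively improve the run-time performance of parallel programs in many cases.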

Tools to support mesh adaptation on massively parallel computers   Total citations: 1, self-citations: 0, citations by others: 1
The scalable execution of parallel adaptive analyses requires the application of dynamic load balancing to repartition the mesh into a set of parts with balanced work load and minimal communication. As the adaptive meshes being generated reach billions of elements and the analyses are performed on massively parallel computers with 100,000s of computing cores, a number of complexities arise that need to be addressed. This paper presents procedures developed to deal with two of them. The first is a procedure to support multiple parts per processor which is used as the mesh increases in size and it is desirable to partition the mesh to a larger number of computing cores than are currently being used. The second is a predictive load balancing method that sets entity weights before dynamic load balancing steps so that the mesh is well balanced after the mesh adaptation step thus avoiding excessive memory spikes that would otherwise occur during mesh adaptation.

Hamdi, Mounir; Pan, Yi; Hamidzadeh, B.; Lim, F. M. 《The Journal of supercomputing》1999,13(2):111-132
Parallel computing on clusters of workstations is receiving much attention from the research community. Unfortunately, many aspects of parallel computing on this kind of platform are not very well understood. Some of these issues include the workstation architectures, the network protocols, the communication-to-computation ratio, the load balancing strategies, and the data partitioning schemes. The aim of this paper is to assess the strengths and limitations of a cluster of workstations by capturing the effects of the above issues. This has been achieved by evaluating the performance of this computing environment in the execution of a parallel ray tracing application through analytical modeling and extensive experimentation. We were successful in illustrating the effect of major factors on the performance and scalability of a cluster of workstations connected by an Ethernet network. Moreover, our analytical model was accurate enough to agree closely with the experimental results. Thus, we feel that such an investigation would be helpful in understanding the strengths and weaknesses of an Ethernet cluster of workstations in the execution of parallel applications.

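Through a study of dynamic load balancing algorithms for cluster systems, this paper addresses the large extra overhead caused by process migration during task reallocation and proposes an effective dynamic load balancing algorithm. Analysis of the experimental results shows that the algorithm can improve the run-time performance of parallel programs.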

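BLAS (basic linear algebra subprograms) is one of the most basic and important low-level mathematical libraries. In a standard BLAS library, the matrix-matrix operations covered by the Level 3 BLAS functions are especially important and are called widely in many large-scale scientific and engineering computing applications. In addition, Level 3 BLAS functions are compute-intensive and play a crucial role in fully exploiting a processor's computing performance. This paper studies many-core parallel optimization techniques for Level 3 BLAS functions on the Chinese-made SW26010-Pro processor. Specifically, a multi-level blocking algorithm is designed according to the memory hierarchy of the SW26010-Pro to exploit the parallelism of matrix operations. On this basis, a data sharing strategy is designed based on the remote memory access (RMA) mechanism to improve data transfer efficiency among the slave cores. Furthermore, triple buffering, parameter tuning, and other methods are used to optimize the algorithm comprehensively, hiding the direct memory access (DMA) overhead and the RMA communication overhead. In addition, using the SW26010-Pro's two hardware pipelines and several vectorized compute and memory-access instructions, hand-written assembly optimizations are applied to several Level 3 BLAS operations, including matrix-matrix multiplication, solution of matrix equations, and matrix transposition, which improves the functions' floating-point efficiency. Experimental results show that the proposed parallel optimization techniques...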

We describe a compiler and run-time system that allow data-parallel programs to execute on a network of heterogeneous UNIX workstations. The programming language supported is Dataparallel C, a SIMD language with virtual processors and a global name space. This parallel programming environment allows the user to take advantage of the power of multiple workstations without adding any message-passing calls to the source program. Because the performance of individual workstations in a multi-user environment may change during the execution of a Dataparallel C program, the run-time system automatically performs dynamic load balancing. We present experimental results that demonstrate the usefulness of dynamic load balancing in a multi-user environment. These results suggest that initially allocating the same amount of work to each processor and letting the dynamic load balancing algorithm adjust the load during program execution yields very good performance. Hence neither the compiler nor the run-time system needs a priori knowledge of the speeds of the machines that will participate in a program execution.
