共查询到20条相似文献,搜索用时 10 毫秒
1.
I. Riakiotakis F. M. Ciorba T. Andronikos G. Papakonstantinou A. T. Chronopoulos 《Concurrency and Computation》2012,24(18):2302-2327
Loops are the richest source of parallelism in scientific applications. A large number of loop scheduling schemes have therefore been devised for loops with and without data dependencies (modeled as dependence distance vectors) on heterogeneous clusters. The loops with data dependencies require synchronization via cross‐node communication. Synchronization requires fine‐tuning to overcome the communication overhead and to yield the best possible overall performance. In this paper, a theoretical model is presented to determine the granularity of synchronization that minimizes the parallel execution time of loops with data dependencies when these are parallelized on heterogeneous systems using dynamic self‐scheduling algorithms. New formulas are proposed for estimating the total number of scheduling steps when a threshold for the minimum work assigned to a processor is assumed. The proposed model uses these formulas to determine the synchronization granularity that minimizes the estimated parallel execution time. The accuracy of the proposed model is verified and validated via extensive experiments on a heterogeneous computing system. The results show that the theoretically optimal synchronization granularity, as determined by the proposed model, is very close to the experimentally observed optimal synchronization granularity, with no deviation in the best case, and within 38.4% in the worst case. Copyright © 2012 John Wiley & Sons, Ltd. 相似文献
2.
Alejandro Acosta Vicente BlancoAuthor VitaeFrancisco AlmeidaAuthor Vitae 《Computers & Electrical Engineering》2013
Actual HPC systems are composed by multicore processors and powerful graphics processing units. Adapting existing code and libraries to these new systems is a fundamental problem due to the important increment on programming difficulties. The heterogeneity, both at architectural and programming levels at the same time, raises the programmability wall. The performance of the code is affected by the large interdependence between the code and the parallel architecture. We have developed a dynamic load balancing library that allows parallel code to be adapted to a wide variety of heterogeneous systems. The overhead introduced by our system is minimal and the cost to the programmer negligible. This system has been successfully applied to solve load imbalance problems appearing in homogeneous and heterogeneous multiGPU platforms. We consider the Dynamic Programming technique as case of study to validate our proposals using different heterogeneous scenarios in multiGPU systems. 相似文献
3.
《Advances in Engineering Software》2003,34(4):185-201
In the present work, the parallelization of the solution of a system of linear equations, in the framework of finite element computational analyses, is dealt with. As the substructuring method is used, the basic idea refers to a way of decomposing the considered spatial domain, discretized by the finite elements, into a finite set of non-overlapping subdomains, each assigned to an individual processor and computationally analysed in parallel. Considering the fact that Personal Computers and Work Stations are still the most frequently used computers, a parallel computational platform can be built by connecting the available computers into a computer network. The incorporated computers being usually of different computational power and memory size, the efficiency of parallel computations on such a heterogeneous distributed system depends mainly on proper load balance. To cope the balance problem, an algorithm for the efficient load balance for structured and free 2D quadrilateral finite element meshes based on the rearrangement of elements among respective sub-domains, has been developed. 相似文献
4.
LAN Youran 《计算机科学技术学报》1996,11(3):195-207
It is desirable in a distributed system to have the system load balanced evenly among the nodes so that the mean job response time is minimized.In this paper,we present a dynamic load balancing mechanism(DLB).It adopts a cntralized approach and is network topology independent.The DLB mechanism employs a set of threscholds which are automatically adjusted as the system load changes.It also provides a simple mechanism for the system to switch between periodic and instantaneous load balancing policies with ease.The performance of the proposed algorithm is evaluated by intensive simulations for various parameters.Te simulation results show that the mean job response time in a system implementing DLB algorithm is significantly lower than the same system without load balancings.Furthermore,compared with a previously proposed algorithm,DLB algorithm demonstrates improved performance,especially when the system is heavily loaded and the load is unevenly distributed. 相似文献
5.
《Journal of Parallel and Distributed Computing》2004,64(4):481-497
The distributed environments constitute a major option for the future development of high-performance computing. In order to be able to efficiently execute parallel applications on such systems, one should ensure a fair utilization of the available resources. Here, we address a number of aspects regarding the generalization of the diffusion algorithms for the case when the processors have different relative speeds and the communication parameters have different values. Although some work has been done in this direction, we propose complementary results and we investigate other variants than those commonly used. In a first step, we discuss general aspects of the generalized diffusion. Bounds are formulated for the convergence factor and an explicit expression is given for the migration flow generated by such algorithms. It is shown that this flow has an important property, that is a scaled projection of all other balancing flows. In the second part, a variant of generalized diffusion is investigated. Complexity results are formulated and it is shown that this algorithm theoretically converges faster than the hydrodynamic algorithm. Comparative tests between different variants of generalized diffusion algorithms are performed. 相似文献
6.
应用于高性能计算领域的通用GPU拥有强大的并行计算能力,以通用GPU作为主处理器的数据分析系统相较于传统数据库能够提供更好的性能。在大数据场景下,如何根据CPU和GPU的资源在处理器之间合理分配工作负载是亟待解决的问题。提出了一种CPU GPU异构数据分析系统上的负载均衡处理策略。该策略采用流水线模型将工作负载分解,基于流水线设计了负载均衡模型,将工作负载合理分配至异构处理器,减少系统总执行时间开销,实现了性能提升。实验结果表明,提出的基于流水线的负载均衡模型能适应不同查询请求下的不同数据量场景,具有良好的性能。 相似文献
7.
Adaptive mesh refinement (AMR) is a type of multiscale algorithm that achieves high resolution in localized regions of dynamic, multidimensional numerical simulations. One of the key issues related to AMR is dynamic load balancing (DLB), which allows large-scale adaptive applications to run efficiently on parallel systems. In this paper, we present an efficient DLB scheme for structured AMR (SAMR) applications. This scheme interleaves a grid-splitting technique with direct grid movements (e.g., direct movement from an overloaded processor to an underloaded processor), for which the objective is to efficiently redistribute workload among all the processors so as to reduce the parallel execution time. The potential benefits of our DLB scheme are examined by incorporating our techniques into a SAMR cosmology application, the ENZO code. Experiments show that by using our scheme, the parallel execution time can be reduced by up to 57% and the quality of load balancing can be improved by a factor of six, as compared to the original DLB scheme used in ENZO. 相似文献
8.
Andreu Moreno Anna Sikora Eduardo César Joan Sorribes Tomàs Margalef 《The Journal of supercomputing》2017,73(9):3738-3760
This work presents a new algorithm, called Heterogeneous Dynamic Pipeline Mapping, that allows for dynamically improving the performance of pipeline applications running on heterogeneous systems. It is aimed at balancing the application load by determining the best replication (of slow stages) and gathering (of fast stages) combination taking into account processors computation and communication capacities. In addition, the algorithm has been designed with the requirement of keeping complexity low to allow its usage in a dynamic tuning tool. For this reason, it uses an analytical performance model of pipeline applications that addresses hardware heterogeneity and which depends on parameters that can be known in advance or measured at run-time. A wide experimentation is presented, including the comparison with the optimal brute force algorithm, a general comparison with the Binary Search Closest algorithm, and an application example with the Ferret pipeline included in the PARSEC benchmark suite. Results, matching those of the best existing algorithms, show significant performance improvements with lower complexity (\(O(N^3\)), where N is the number of pipeline stages). 相似文献
9.
Heiner Ackermann Simon Fischer Martin Hoefer Marcel Sch?ngens 《Distributed Computing》2011,23(5-6):321-330
We consider a dynamic load balancing scenario in which users allocate resources in a non-cooperative and selfish fashion. The perceived performance of a resource for a user decreases with the number of users that allocate the resource. In our dynamic, concurrent model, users may reallocate resources in a round-based fashion. As opposed to various settings analyzed in the literature, we assume that users have quality of service demands. A user has zero utility when falling short of a certain minimum performance threshold and having positive utility otherwise. Whereas various load-balancing protocols have been proposed for the setting without quality of service requirements, we consider protocols that satisfy an additional locality constraint: The behavior of a user depends merely on the state of the resource it currently allocates. This property is particularly useful in scenarios where the state of other resources is not readily accessible. For instance, if resources represent channels in a mobile network, then accessing channel information may require time-intensive measurements. We consider several variants of the model, where the quality of service demands may depend on the user, the resource, or both. For all cases we present protocols for which the dynamics converge to a state in which all users are satisfied. More importantly, the time to reach such a state scales nicely. It is only logarithmic in the number of users, which makes our protocols applicable in large-scale systems. 相似文献
10.
Jose Luis Bosque Pablo Toharia Oscar D. Robles Luis Pastor 《The Journal of supercomputing》2013,65(3):1104-1113
This paper presents a load balancing algorithm specifically designed for heterogeneous clusters, composed of nodes with different computational capabilities. The method is based on a new index, which takes into consideration two levels of processors heterogeneity: the number of cores per node and the computational power of each core. The experimental results show that this index allows achieving balanced workload distributions even on those clusters where heterogeneity can not be neglected. 相似文献
11.
In order to reach challenging performance goals, computer architecture is expected to change significantly in the near future. Heterogeneous chips, equipped with different types of cores and memory, will force application developers to deal with irregular communication patterns, high levels of parallelism, and unexpected behavior.Load balancing among the heterogeneous compute units will be a critical task in order to achieve an effective usage of the computational power provided by such new architectures. In this highly dynamic scenario, Partitioned Global Address Space (PGAS) languages, like Coarray Fortran, appear a promising alternative to standard MPI programming that uses two-sided communications, in particular because of PGAS one-sided semantic and ease of programmability. In this paper, we show how Coarray Fortran can be used for implementing dynamic load balancing algorithms on an exascale compute node and how these algorithms can produce performance benefits for an Asian option pricing problem, running in symmetric mode on Intel Xeon Phi Knights Corner and Knights Landing architectures. 相似文献
12.
Disk load balancing for video-on-demand systems 总被引:5,自引:0,他引:5
For a video-on-demand computer system, we propose a scheme which balances the load on the disks, thereby helping to solve
a performance problem crucial to achieving maximal video throughput. Our load-balancing scheme consists of two components.
The static component determines good assignments of videos to groups of striped disks. The dynamic component uses these assignments,
and features a “DASD dancing” algorithm which performs real-time disk scheduling in an effective manner. Our scheme works
synergistically with disk striping. We examine the performance of the proposed algorithm via simulation experiments. 相似文献
13.
Sheldon Shen 《Acta Informatica》1988,25(6):663-676
Summary This paper explores and applies the concept of cooperation to the load balancing problem in a computer network. We discuss an analytical model and propose a scheme which can be classified as distributed, dynamic, and stochastic. In the case of a homogeneous network, we guarantee that the load is balanced and no communication cost or information exchange is necessary. 相似文献
14.
Distributed strategic interleaving with load balancing 总被引:1,自引:0,他引:1
In a previous paper, we developed an algebraic theory of threads, interleaving of threads, and interaction of threads with services. In the current paper, we assume that the threads and services are distributed over the nodes of a network. We extend the theory developed so far to the distributed case by introducing distributed interleaving strategies that support explicit thread migration and see to load balancing or capability searching by implicit thread migration. The extension to the distributed case provides insight into details of multi-threading that come up in a networked environment. 相似文献
15.
In this paper we consider the application of accelerated techniques in order to increase the rate of convergence of the diffusive iterative load balancing algorithms. In particular, we compare the application of Semi-Iterative, Second Degree and Variable Extrapolation techniques on the basic diffusion method for various types of network graphs. 相似文献
16.
网络中流量的不断增长容易导致流量不均衡、网络拥塞,进而影响用户的体验。因特网服务提供商(ISP)通常采用优化开放最短路径优先(OSPF)权值(OPW)算法应对网络拥塞,然而该算法存在三个方面的问题:1)需要实际流量矩阵;2)容易导致网络震荡;3)OPW已经被证实为NP难题,并且需要采用集中式方法求解。针对OPW算法存在的问题,提出了一种基于逐跳计算的分布式负载均衡算法(DLBH)。首先,为所有节点设置虚拟流量;然后,根据虚拟流量计算所有链路的代价;最后,采用分布式算法计算最优路由。DLBH采用分布式方法解决网络拥塞问题,而OPW只能采用集中式方法解决网络拥塞问题,因此DLBH的扩展性优于OPW的扩展性。理论分析表明,DLBH的时间复杂度远远小于OPW的时间复杂度。实验结果表明,DLBH的最大链路利用率明显低于OPW算法的最大链路利用率,大幅降低了网络拥塞。 相似文献
17.
Dynamic load balancing on Web-server systems 总被引:1,自引:0,他引:1
Popular Web sites cannot rely on a single powerful server nor on independent mirrored-servers to support the ever-increasing request load. Distributed Web server architectures that transparently schedule client requests offer a way to meet dynamic scalability and availability requirements. The authors review the state of the art in load balancing techniques on distributed Web-server systems, and analyze the efficiencies and limitations of the various approaches 相似文献
18.
Willebeek-LeMair M.H. Reeves A.P. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(9):979-993
Dynamic load balancing strategies for minimizing the execution time of single applications running in parallel on multicomputer systems are discussed. Dynamic load balancing (DLB) is essential for the efficient use of highly parallel systems when solving non-uniform problems with unpredictable load estimates. With the evolution of more highly parallel systems, centralized DLB approaches which make use of a high degree of knowledge become less feasible due to the load balancing communication overhead. Five DLB strategies are presented which illustrate the tradeoff between 1) knowledge - the accuracy of each balancing decision, and 2) overhead - the amount of added processing and communication incurred by the balancing process. All five strategies have been implemented on an Inter iPSC/2 hypercube 相似文献
19.
20.
《Journal of Parallel and Distributed Computing》2005,65(11):1397-1405
In this paper, three distributed load-balancing algorithms for dynamic networks are investigated. Dynamic networks are networks in which the topology may change dynamically. The definition of a dynamic network is introduced and its graph model is presented. The main result of this study consists in proving the convergence toward the uniform load distribution of the diffusion algorithm on an arbitrary dynamic network despite communication link failures. We also give two adaptations of this algorithm (the GAE and the relaxed diffusion). Note that the hypotheses of our result are realistic and that for example the network does not have to be maintained connected. To study the behavior of these algorithms, we compare the load evolution by several simulations. 相似文献