
1.

An efficient grid scheduling strategy for data parallel applications

Kashif Hesham Khan Kalim Qureshi Mostafa Abd-El-Barr 《The Journal of supercomputing》2014,68(3):1487-1502

Scheduling large-scale applications in heterogeneous grid systems is a fundamental NP-complete problem that is critical to obtaining good performance and execution cost. Achieving high performance in a grid system requires effective task partitioning, resource management, and load balancing. The heterogeneous and dynamic nature of a grid, as well as the diverse demands of applications running on it, makes grid scheduling a major task. Existing schedulers in wide-area heterogeneous systems require a large amount of information about the application and the grid environment to produce reasonable schedules. However, this information may not be available, may be too expensive to collect, or may increase the runtime overhead of the scheduler to the point that the scheduler is rendered ineffective. We believe that no one scheduler is appropriate for all grid systems and applications. Data parallel applications, in which further data partitioning is possible, can be improved by efficient resource management, smart resource selection, and load balancing, whereas in functional (non-divisible task) parallel applications such partitioning is either impossible, difficult, or expensive in terms of performance. In this paper, we propose a scheduler for data parallel applications (SDPA) which offers an efficient task partitioning and load balancing strategy for data parallel applications in a grid environment. The proposed SDPA offers two major features: maintaining job priority even if an insufficient number of free resources is available, and pre-task assignment to cut the idle time of nodes. The SDPA selects nodes smartly according to the nature of the task and the nodes' resource availability. Simulation results reveal that SDPA improves on the strategies reported in the reviewed literature in terms of execution time, throughput, and waiting time.

2.

Using imbalance metrics to optimize task clustering in scientific workflow executions

《Future Generation Computer Systems》2015

Scientific workflows can be composed of many fine computational granularity tasks. The runtime of these tasks may be shorter than the duration of system overheads, for example, when using multiple resources of a cloud infrastructure. Task clustering is a runtime optimization technique that merges multiple short running tasks into a single job such that the scheduling overhead is reduced and the overall runtime performance is improved. However, existing task clustering strategies only provide a coarse-grained approach that relies on an over-simplified workflow model. In this work, we examine the reasons that cause Runtime Imbalance and Dependency Imbalance in task clustering. Then, we propose quantitative metrics to evaluate the severity of the two imbalance problems. Furthermore, we propose a series of task balancing methods (horizontal and vertical) to address the load balance problem when performing task clustering for five widely used scientific workflows. Finally, we analyze the relationship between these metric values and the performance of proposed task balancing methods. A trace-based simulation shows that our methods can significantly decrease the runtime of workflow applications when compared to a baseline execution. We also compare the performance of our methods with two algorithms described in the literature.
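
As a rough illustration of the two ideas in this abstract, the sketch below measures runtime imbalance among the tasks at one workflow level and then merges those tasks into a fixed number of jobs. The metric (coefficient of variation of task runtimes), the function names, and the greedy longest-first rule are assumptions chosen for illustration; the paper's own metrics and balancing methods may differ.

```python
import heapq
from statistics import mean, pstdev

def runtime_imbalance(runtimes):
    """Coefficient of variation of task runtimes (assumed metric; 0 means balanced)."""
    m = mean(runtimes)
    return pstdev(runtimes) / m if m > 0 else 0.0

def cluster_tasks(runtimes, num_jobs):
    """Greedily merge tasks into num_jobs jobs, longest task first.

    Returns a list of jobs; each job is a list of task indices.
    """
    jobs = [[] for _ in range(num_jobs)]
    heap = [(0.0, j) for j in range(num_jobs)]  # (total runtime so far, job id)
    heapq.heapify(heap)
    for idx in sorted(range(len(runtimes)), key=lambda i: -runtimes[i]):
        total, j = heapq.heappop(heap)          # currently lightest job
        jobs[j].append(idx)
        heapq.heappush(heap, (total + runtimes[idx], j))
    return jobs
```

Placing each task on the currently lightest job keeps the merged jobs close in total runtime, which is the property a horizontal balancing method is after.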

3.

CUIRRE: An open-source library for load balancing and characterizing irregular applications on GPUs

Tao Zhang Wei Shu Min-You Wu 《Journal of Parallel and Distributed Computing》2014

While Graphics Processing Units (GPUs) show high performance for problems with regular structures, they do not perform well for irregular tasks due to the mismatches between irregular problem structures and SIMD-like GPU architectures. In this paper, we introduce a new library, CUIRRE, for improving performance of irregular applications on GPUs. CUIRRE reduces the load imbalance of GPU threads resulting from irregular loop structures. In addition, CUIRRE can characterize irregular applications for their irregularity, thread granularity and GPU utilization. We employ this library to characterize and optimize both synthetic and real-world applications. The experimental results show that an average 1.63× and up to 2.76× performance improvement can be achieved with the centralized task pool approach in the library at a 4.57% average overhead with static loading ratios. To avoid the cost of exhaustive searches of loading ratios, an adaptive loading ratio method is proposed to derive appropriate loading ratios for different inputs automatically at runtime. Our task pool approach outperforms other load balancing schemes such as the task stealing method and the persistent threads method. The CUIRRE library can easily be applied to many other irregular problems.

4.

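A time-balancing-based scheduling algorithm for parallel coarse-grained tasks in computational grids

Hu Yanli, Zhang Weiming, Xiao Weidong, Tang Daquan 《Journal of Chinese Computer Systems》2008,29(1):124-129

Considering the heterogeneous, autonomous, and dynamic nature of grid resources, this paper discusses task scheduling when local users hold preemptive priority and proposes the TBBS (Time-Balancing Based Scheduling) algorithm. A scheduling optimization model is built that selects the best combination of resources for executing a task, with the goal of minimizing the expected completion time. A time-balancing strategy decomposes the task and schedules the subtasks onto the resources, which reduces the delay caused by waiting during subtask synchronization and yields good parallel computing performance. A repeated-scheduling strategy is adopted to adapt to the characteristics of resources in computational grids.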
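
The time-balancing idea in this abstract can be illustrated with a small sketch: divide a divisible workload so that every resource that receives work finishes at the same time T. This is a generic formulation under stated assumptions (constant resource speeds, a known time at which each resource becomes available, positive total work), not the TBBS algorithm itself; the function and parameter names are hypothetical. With speeds s_i and ready times r_i, equal finish times give share_i = s_i (T - r_i) and T = (W + Σ s_i r_i) / Σ s_i.

```python
def time_balanced_split(total_work, speeds, ready_times):
    """Split total_work so all used resources finish at the same time.

    speeds[i] is work units per time unit; ready_times[i] is when resource i
    becomes free. Resources that are not free before the common finish time
    are left unused. Returns (finish_time, shares).
    """
    active = list(range(len(speeds)))
    while True:
        s_sum = sum(speeds[i] for i in active)
        t = (total_work + sum(speeds[i] * ready_times[i] for i in active)) / s_sum
        late = [i for i in active if ready_times[i] >= t]
        if not late:
            break
        active = [i for i in active if i not in late]  # drop and recompute
    shares = [0.0] * len(speeds)
    for i in active:
        shares[i] = speeds[i] * (t - ready_times[i])
    return t, shares
```

Because no resource waits for another at the synchronization point, the idle time at the end of a parallel step is removed, at least as long as the speed estimates hold.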

5.

Experiences from integrating algorithmic and systemic load balancing strategies

Ioana Banicescu Sheikh Ghafoor Vijay Velusamy Samuel H. Russ Mark Bilderback 《Concurrency and Computation》2001,13(2):121-139

Load balancing increases the efficient use of existing resources for parallel and distributed applications. At a coarse level of granularity, advances in runtime systems for parallel programs have been proposed in order to control available resources as efficiently as possible by utilizing idle resources and using task migration. Simultaneously, at a finer granularity level, advances in algorithmic strategies for dynamically balancing computational loads by data redistribution have been proposed in order to respond to variations in processor performance during the execution of a given parallel application. Combining strategies from each level of granularity can result in a system which delivers advantages of both. The resulting integration is systemic in nature and transfers the responsibility of efficient resource utilization from the application programmer to the runtime system. This paper presents the design and implementation of a system that combines an algorithmic fine-grained data parallel load balancing strategy with a systemic coarse-grained task-parallel load balancing strategy, and reports on recent experimental results of running a computationally intensive scientific application under this integrated system. The experimental results indicate that a distributed runtime environment which combines both task and data migration can provide performance advantages with little overhead. It also presents proposals for performance enhancements of the implementation, as well as future explorations for effective resource management. Copyright © 2001 John Wiley & Sons, Ltd.

6.

A Dynamic Load Balancing Framework for Real-time Applications in Message Passing Systems

Ghada F. El Kabbany Nayer M. Wanas Nadia H. Hegazi Samir I. Shaheen 《International journal of parallel programming》2011,39(2):143-182

Load balancing algorithms are designed essentially to distribute the load equally across processors and maximize their utilization while minimizing the total task execution time. To achieve these goals, the load-balancing mechanism should be “fair” in distributing the load across the different processors. This implies that the difference between the heaviest-loaded and the lightest-loaded processors should be minimized. Therefore, the load information on each processor must be kept up to date so that the load-balancing mechanism can be more effective. In this work, we present an application-independent dynamic algorithm for scheduling tasks and load balancing in message passing systems. We propose a DAG-based Dynamic Load Balancing algorithm for Real-time applications (DAG-DLBR) that is designed to work dynamically to cope with possible changes in the load that might occur during runtime. This algorithm addresses the challenge of devising a load balancing scheme which judiciously deals with the hybrid execution of an existing real-time application (represented by a Directed Acyclic Graph (DAG)) together with newly arriving jobs. The main objective of this algorithm is to reduce the response times of the newly arriving jobs while maintaining the time constraints of the existing DAG. To evaluate the performance of the DAG-DLBR algorithm, a comparison with the performance of two common dynamic load balancing algorithms is presented. This comparison is performed by evaluating, experimentally, the execution time of different load balancing algorithms on a homogeneous real parallel machine. In addition, the values of load imbalance, the execution time, and the communication overhead time are evaluated analytically using different benchmarks as test-bed workloads. These workloads cover a wide range of dynamic applications with different task types.
Experimental results illustrate the improved performance of the DAG-DLBR algorithm compared to both distributed and hierarchical-based algorithms by at least 12% and 19%, respectively. This improvement holds for all workloads, even highly dependent ones. The DAG-DLBR algorithm achieves lower computation time than both the distributed and the hierarchical-based algorithms for 4, 8, 12 and 16 processors.

7.

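Research on a grid scheduling algorithm based on task response time prediction

Tian Shengwei, Turgun Ibrahim, Yu Long, Yu Jiong 《Computer Engineering and Applications》2008,44(1):123-125

In view of the autonomy, heterogeneity, and distribution of computing nodes in a grid environment, a dynamic scheduling algorithm based on task response time prediction is proposed. Using historical data together with the request submission time, task completion time, network communication delay, and similar information for the most recently accessed computing nodes, the method predicts each node's future task response time and submits tasks to lightly loaded or better-performing nodes. Experimental results show that the method not only effectively reduces unnecessary delay but also outperforms traditional algorithms such as random scheduling in task response time, task throughput, and the time tasks wait in the scheduler before being scheduled.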
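
A minimal sketch of prediction-driven node selection follows. It assumes an exponentially weighted average as the predictor and treats the observed response time as completion time minus submission time plus network delay; both choices, and all class and method names, are assumptions for illustration rather than the paper's actual prediction model.

```python
class ResponseTimePredictor:
    """Predicts per-node response time from past observations (assumed model)."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha      # weight given to the newest observation
        self.predicted = {}     # node name -> predicted response time

    def record(self, node, submit_time, finish_time, network_delay=0.0):
        """Fold one completed task into the node's prediction."""
        observed = (finish_time - submit_time) + network_delay
        old = self.predicted.get(node)
        if old is None:
            self.predicted[node] = observed
        else:
            self.predicted[node] = self.alpha * observed + (1 - self.alpha) * old

    def pick_node(self, nodes):
        """Choose the node with the lowest predicted response time.

        Nodes with no history are treated as having prediction 0, so they
        are tried first.
        """
        return min(nodes, key=lambda n: self.predicted.get(n, 0.0))
```

A scheduler would call `record` whenever a task completes and `pick_node` whenever a new task arrives, so that recent behavior of each node feeds back into placement.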

8.

A hybrid load balancing policy underlying grid computing environment

《Computer Standards & Interfaces》2007,29(2):161-173

In recent years, network bandwidth and quality have improved drastically, even faster than computer performance. The various communication and computing tasks in fields such as telecommunication, multimedia, information technology, and construction simulation can now be integrated and applied in a distributed computing environment. As research demands for computing resources gradually grow, Grid Computing, which integrates a distributed computing environment with the Internet, has gained more attention. Grid Computing utilizes the idle computing resources (nodes) on the network to facilitate the execution of complicated tasks that require large-scale computing. In other words, the composition of Grid resources is dynamic and varies with time. Thus, when selecting nodes for executing a task, the dynamics of the nodes in the Grid must be considered, and to exploit the resources effectively, the nodes have to be selected according to the properties of the task. This study proposes a hybrid load balancing policy which integrates static and dynamic load balancing technologies to assist in the selection of effective nodes. In addition, if any selected node can no longer provide resources, it can be promptly identified and replaced with a substitute node to maintain the execution performance and the load balancing of the system.

9.

Cacheminer: A runtime approach to exploit cache locality on SMP

Yong Yan Xiaodong Zhang 《Parallel and Distributed Systems, IEEE Transactions on》2000,11(4):357-374

Exploiting cache locality of parallel programs at runtime is a complementary approach to compiler optimization. This is particularly important for applications with dynamic memory access patterns. We propose a memory-layout oriented technique to exploit cache locality of parallel loops at runtime on Symmetric Multiprocessor (SMP) systems. Guided by application-dependent and targeted architecture-dependent hints, our system, called Cacheminer, reorganizes and partitions a parallel loop using the memory-access space of its execution. Through effective runtime transformations, our system maximizes the data reuse in each partitioned data region assigned to a cache, and minimizes the data sharing among the partitioned data regions assigned to all caches. The executions of tasks in the partitions are scheduled in an adaptive and locality-preserved way to minimize the execution time of programs by trading off load balance and locality. We have implemented the Cacheminer runtime library on two commercial SMP servers and a SimCS-simulated SMP. Our simulation and measurement results show that our runtime approach can achieve performance comparable to compiler optimizations for programs with regular computation and memory-access patterns, whose load balance and cache locality can be well optimized by tiling and other program transformations. Moreover, our experimental results show that our approach is able to significantly improve the memory performance of applications with irregular computation and dynamic memory access patterns. These types of programs are usually hard to optimize by static compiler optimizations.

10.

A Proposal to Extend the OpenMP Tasking Model with Dependent Tasks

Alejandro Duran Roger Ferrer Eduard Ayguadé Rosa M. Badia Jesus Labarta 《International journal of parallel programming》2009,37(3):292-305

Tasking in OpenMP 3.0 has been conceived to handle the dynamic generation of unstructured parallelism. New directives have been added allowing the user to identify units of independent work (tasks) and to define points to wait for the completion of tasks (task barriers). In this document we propose extensions to allow the runtime detection of dependencies between generated tasks, broadening the range of applications that can benefit from tasking and improving performance when load balancing or locality are critical issues. The proposed extensions are evaluated on an SGI Altix multiprocessor architecture using a couple of small applications and a prototype runtime system implementation.

11.

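A cache-based feedback scheduling method for grid tasks

Yuan Pingpeng, Cao Wenzhi, Kuang Ping 《Journal of Software》2006,17(11):2314-2323

The goal of grid scheduling is to increase the utilization of grid resources and improve the performance of grid applications; it is one of the key problems to be solved in grids. A great deal of research has been done, in China and abroad, on task scheduling algorithms for grids, and a variety of scheduling algorithms have been proposed. However, these algorithms do not adapt well to the autonomy, dynamics, and distribution of the grid environment. To address the problems of current grid scheduling mechanisms, a dynamic grid scheduling technique is proposed: cache based feedback scheduling (CBFS). The method performs feedback scheduling based on information about recently accessed resources stored in a cache, such as the most recent request submission time and task completion time, and submits tasks to resources with lighter load or better performance. Experimental results show that CBFS not only effectively reduces unnecessary delay but also outperforms traditional algorithms such as random scheduling in the smoothness of task response time, task throughput, and the time tasks wait in the scheduler.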

12.

Dynamic balancing of communication and computation load for HLA-based simulations on large-scale distributed systems (Cited by: 1; self-citations: 0; citations by others: 1)

Robson E. De Grande Azzedine Boukerche 《Journal of Parallel and Distributed Computing》2011,71(1):40-52

Dynamic balancing of computation and communication load is vital for the execution stability and performance of distributed, parallel simulations deployed on the shared, unreliable resources of large-scale environments. High Level Architecture (HLA) based simulations can experience a decrease in performance due to imbalances that are produced initially and/or during run time. These imbalances are generated by the dynamic load changes of distributed simulations or by unknown, non-managed background processes resulting from the non-dedication of shared resources. Due to the dynamic execution characteristics of elements that compose distributed applications, the computational load and interaction dependencies of each simulation entity change during run time. These dynamic changes lead to an irregular load and communication distribution, which increases overhead of resources and latencies. A static partitioning of load is limited to deterministic applications and is incapable of predicting the dynamic changes caused by distributed applications or by external background processes. Therefore, a scheme for balancing the communication and computational load during the execution of distributed simulations is devised in a scalable hierarchical architecture. The proposed balancing system employs local and cluster monitoring mechanisms to observe the distributed load changes and identify imbalances, and repartitioning policies to determine a distribution of load that minimizes imbalances. A migration technique is also employed by this proposed balancing system to perform reliable and low-latency load transfers. Such a system successfully improves the use of shared resources and increases distributed simulations’ performance by minimizing communication latencies and partitioning the load evenly. Experiments and comparative analyses were conducted in order to identify the gains that the proposed balancing scheme provides to large-scale distributed simulations.

13.

Data‐driven execution of fast multipole methods

Hatem Ltaief Rio Yokota 《Concurrency and Computation》2014,26(11):1935-1946

Fast multipole methods (FMMs) have O(N) complexity, are compute bound, and require very little synchronization, which makes them a favorable algorithm on next‐generation supercomputers. Their most common application is to accelerate N‐body problems, but they can also be used to solve boundary integral equations. When the particle distribution is irregular and the tree structure is adaptive, load balancing becomes a non‐trivial question. A common strategy for load balancing FMMs is to use the work load from the previous step as weights to statically repartition the next step. The authors discuss in the paper another approach based on data‐driven execution to efficiently tackle this challenging load balancing problem. The core idea consists of breaking the most time‐consuming stages of the FMMs into smaller tasks. The algorithm can then be represented as a directed acyclic graph where nodes represent tasks and edges represent dependencies among them. The algorithm is executed by asynchronously scheduling the tasks using the QUeueing And Runtime for Kernels (QUARK) runtime environment, in such a way that data dependencies are not violated, which preserves numerical correctness. This asynchronous scheduling results in an out‐of‐order execution. The data‐driven FMM execution outperforms the previous strategy and shows linear speedup on a quad‐socket quad‐core Intel Xeon system. Copyright © 2013 John Wiley & Sons, Ltd.
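
The directed-acyclic-graph execution described in this abstract can be sketched in a few lines: a task is submitted to a worker pool as soon as all of its prerequisites have finished, so independent tasks run out of order. This sketch uses Python's standard `concurrent.futures` thread pool purely for illustration; it is not the runtime used in the paper, and `run_dag` and its parameters are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def run_dag(tasks, deps, max_workers=4):
    """Run callables respecting dependencies.

    tasks: name -> zero-argument callable.
    deps:  name -> iterable of prerequisite names (all must be keys of tasks).
    Returns the order in which tasks completed.
    """
    remaining = {name: set(deps.get(name, ())) for name in tasks}
    dependents = {name: [] for name in tasks}
    for name, pre in remaining.items():
        for p in pre:
            dependents[p].append(name)
    order, running = [], {}   # running maps future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for name, pre in remaining.items():
            if not pre:                       # no prerequisites: ready now
                running[pool.submit(tasks[name])] = name
        while running:
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                name = running.pop(fut)
                fut.result()                  # re-raise any task exception
                order.append(name)
                for child in dependents[name]:
                    remaining[child].discard(name)
                    if not remaining[child]:  # last prerequisite finished
                        running[pool.submit(tasks[child])] = child
    return order
```

For example, with the usual FMM stage names as task labels, `deps = {"M2M": ["P2M"], "M2L": ["M2M"], "L2L": ["M2L"], "L2P": ["L2L"]}` lets an independent near-field task such as "P2P" overlap with the far-field chain.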

14.

Towards a hybrid load balancing policy in grid computing system

Kuo-Qin Yan Shun-Sheng Wang Shu-Ching Wang Chiu-Ping Chang 《Expert systems with applications》2009,36(10):12054-12064

Grid computing has become conventional in distributed systems due to technological advancements and network popularity. Grid computing facilitates distributed applications by integrating available idle network computing resources into formidable computing power. As a result, efficient integration and sharing of resources makes abundant computing resources available to solve complicated problems that a single machine cannot manage. However, grid computing mines resources from accessible idle nodes, and node accessibility varies with time. A node that is currently idle may become occupied within a second and then be unavailable to provide resources. Accordingly, node selection must provide effective and sufficient resources over a long period to allow load assignment. This study proposes a hybrid load balancing policy that integrates static and dynamic load balancing technologies. Essentially, a static load balancing policy is applied to select effective and suitable node sets. This lowers the probability of unbalanced load caused by assigning tasks to ineffective nodes. When a node shows signs that it may be unable to continue providing resources, the dynamic load balancing policy determines whether the node in question is no longer effective for load assignment. The system then obtains a replacement node within a short time to maintain system execution performance.

15.

An improved load-balancing mechanism based on deadline failure recovery on GridSim

Deepak Kumar Patel Devashree Tripathy Chitaranjan Tripathy 《Engineering with Computers》2016,32(2):173-188

Grid computing has emerged as a new field, distinguished from conventional distributed computing. It focuses on large-scale resource sharing, innovative applications and, in some cases, high performance orientation. The Grid serves as a comprehensive and complete system for organizations by which the maximum utilization of resources is achieved. Load balancing is a process which involves resource management and effective load distribution among the resources. Therefore, it is considered to be very important in Grid systems. For a Grid, a dynamic, distributed load balancing scheme provides deadline control for tasks. Because of deadline failures, developing, deploying, and executing long-running applications over the grid remains a challenge. So, deadline failure recovery is an essential factor for Grid computing. In this paper, we propose a dynamic distributed load-balancing technique called “Enhanced GridSim with Load balancing based on Deadline Failure Recovery” (EGDFR) for computational Grids with heterogeneous resources. The proposed algorithm EGDFR is an improved version of the existing EGDC in which we perform load balancing by providing a scheduling system which includes a mechanism for recovery from deadline failure of the Gridlets. Extensive simulation experiments are conducted to quantify the performance of the proposed load-balancing strategy on the GridSim platform. Experiments have shown that the proposed system can considerably improve Grid performance in terms of total execution time, percentage gain in execution time, average response time, resubmitted time and throughput. The proposed load-balancing technique gives 7% better performance than EGDC with a constant number of resources, and 11% better performance than EGDC with a constant number of Gridlets.

16.

Adaptive Workload Management through Elastic Scheduling (Cited by: 6; self-citations: 0; citations by others: 6)

Buttazzo Giorgio Abeni Luca 《Real-Time Systems》2002,23(1-2):7-24

In real-time computing systems, timing constraints imposed on application tasks are typically guaranteed off line using schedulability tests based on fixed parameters and worst-case execution times. However, a precise estimation of tasks' computation times is very hard to achieve, due to the non-deterministic behavior of several low-level processor mechanisms, such as caching, prefetching, and DMA data transfer. The disadvantage of basing the guarantee test on a priori estimates is that an underestimation of computation times may jeopardize the correct behavior of the system, whereas an overestimation will certainly waste system resources and cause a performance degradation. In this paper, we propose a new methodology for automatically adapting the rates of a periodic task set without forcing the programmer to provide a priori estimates of tasks' computation times. Actual executions are monitored by a runtime mechanism and used as feedback signals for predicting the actual load and achieving rate adaptation. Load balancing is performed using an elastic task model, according to which tasks' utilizations are treated as springs with given elastic coefficients.
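
To make the spring analogy concrete, the sketch below compresses task utilizations until their sum fits a desired total, shrinking each task in proportion to its elastic coefficient and pinning any task that would drop below its minimum. It is a simplified illustration in the spirit of the elastic task model; the function name, parameter names, and the exact iteration are assumptions, not code from the paper.

```python
def elastic_compress(nominal, minimum, elastic, u_desired):
    """Shrink task utilizations so that their sum fits within u_desired.

    nominal, minimum, elastic: per-task lists (nominal utilization, lowest
    acceptable utilization, elastic coefficient > 0).
    Returns the new utilizations, or None if even the minimums do not fit.
    """
    n = len(nominal)
    if sum(minimum) > u_desired:
        return None                      # infeasible: cannot compress enough
    if sum(nominal) <= u_desired:
        return list(nominal)             # nothing to compress
    util = list(nominal)
    fixed = set()                        # tasks pinned at their minimum
    while True:
        free = [i for i in range(n) if i not in fixed]
        u_fixed = sum(util[i] for i in fixed)
        u_free_nominal = sum(nominal[i] for i in free)
        e_free = sum(elastic[i] for i in free)
        excess = u_free_nominal - (u_desired - u_fixed)
        hit_minimum = False
        for i in free:
            # A larger elastic coefficient absorbs more of the excess.
            util[i] = nominal[i] - excess * elastic[i] / e_free
            if util[i] < minimum[i]:
                util[i] = minimum[i]
                fixed.add(i)
                hit_minimum = True
        if not hit_minimum:
            return util
```

A task's new period then follows from its computation time divided by its compressed utilization, which is how rate adaptation would use the result.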

17.

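A survey of computation-intensive task scheduling in edge environments

Liu Yanpei, Zhu Yunjing, Bin Yanru, Chen Ningning, Wang Liping 《Computer Engineering and Applications》2022,58(20):28-42

With the rapid growth in the number of mobile devices and the widespread use of computation-intensive applications such as face recognition, the Internet of Vehicles, and virtual reality, sound task scheduling schemes for computation-intensive applications are needed to optimally match tasks with collaborative resources while meeting users' QoS requests, and thereby to address the long latency, high cost, load imbalance, and low resource utilization of edge cloud centers. This survey describes the task scheduling framework, execution process, application scenarios, and performance metrics of computation-intensive applications in edge computing environments. It compares and analyzes task scheduling strategies for such applications whose optimization objectives are time and cost, energy consumption and resource utilization, and load balancing and throughput, and it summarizes the strengths, weaknesses, and applicable scenarios of these strategies. By analyzing the SDN-based edge computing architecture in 5G environments, it presents an SDN-based scheduling strategy for computation-intensive packet tasks at the edge, a scheduling strategy for computation-intensive applications based on deep reinforcement learning, and a multi-objective cross-layer task scheduling strategy for 5G IoV networks. Finally, it summarizes the challenges currently facing task scheduling in edge computing environments from the perspectives of fault-tolerant scheduling, dynamic microservice scheduling, crowd-sensing scheduling, and security and privacy.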

18.

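A task load balancing algorithm for large-scale CFD multi-block structured grids (Cited by: 1; self-citations: 0; citations by others: 1)

Tang Bo, Wang Yongxian 《Computer Engineering & Science》2014,36(7):1213-1220

To address the low adaptability, poor scalability, and inaccurate measurement of communication overhead in existing load balancing algorithms, a task load balancing algorithm for large-scale CFD multi-block structured grids is proposed. Load balance for parallel CFD tasks is achieved by splitting grid blocks, mapping combinations of grid blocks to processes, and adjusting the grid computation load on each process. Experimental results show that the algorithm suits both homogeneous and heterogeneous platforms, and both the case where grid blocks outnumber processes and the case where they are fewer than processes. The algorithm balances the computational load that the whole computational domain places on each process while minimizing the maximum communication overhead among processes.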

19.

Online scheduling and placement of hardware tasks with multiple variants on dynamically reconfigurable field-programmable gate arrays

Thomas Marconi 《Computers & Electrical Engineering》2014

Hardware task scheduling and placement at runtime plays a crucial role in achieving better system performance by exploring dynamically reconfigurable Field-Programmable Gate Arrays (FPGAs). Although a number of online algorithms have been proposed in the literature, no strategy has been engaged in efficient usage of reconfigurable resources by orchestrating multiple hardware versions of tasks. By exploring this flexibility, on one hand, the algorithms can be potentially stronger in performance; on the other hand, they can suffer much more runtime overhead in dynamically selecting the most suitable variant on the fly under the runtime conditions and constraints. In this work, we propose a fast, efficient online task scheduling and placement algorithm that incorporates multiple selectable hardware implementations for each hardware request; the selections reflect trade-offs between the required reconfigurable resources and the task runtime performance. Experimental studies conclusively reveal the superiority of the proposed algorithm over rigid approaches in terms of not only scheduling and placement quality but also faster runtime decisions.

20.

Dynamic Data Migration for Structured AMR Solvers

Markus Nordén Henrik Löf Jarmo Rantakokko Sverker Holmgren 《International journal of parallel programming》2007,35(5):477-491

On cc-NUMA multi-processors, the non-uniformity of main memory latencies motivates the need for co-location of threads and data. We call this special form of data locality geographical locality. In this article, we study the performance of a parallel PDE solver with adaptive mesh refinement (AMR). The solver is parallelized using OpenMP and the adaptive mesh refinement makes dynamic load balancing necessary. Due to the dynamically changing memory access pattern caused by the runtime adaption, it is a challenging task to achieve a high degree of geographical locality. The main conclusions of the study are: (1) that geographical locality is very important for the performance of the solver, (2) that the performance can be improved significantly using dynamic page migration of misplaced data, (3) that a migrate-on-next-touch directive works well whereas the first-touch strategy is less advantageous for programs exhibiting dynamically changing memory access patterns, and (4) that the overhead for such migration is low compared to the total execution time.