Similar literature
 20 similar documents found (search took 62 ms)
1.
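To address the autonomy, heterogeneity, and distribution of compute nodes in grid environments, a dynamic scheduling algorithm based on task response time prediction is proposed. Using historical data together with the request submission time, task completion time, network communication delay, and similar information for recently accessed compute nodes, the method predicts each node's future task response time and submits tasks to lightly loaded or better-performing nodes. Experimental results show that the method not only effectively reduces unnecessary delay but also outperforms traditional algorithms such as random scheduling in task response time, task throughput, and the time tasks wait in the scheduler before being scheduled.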
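
The prediction step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual predictor, so exponential smoothing is an assumption here, and `NodeHistory`, `observe`, `pick_node`, and the smoothing factor `alpha` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NodeHistory:
    """Smoothed response-time estimate for one compute node (hypothetical type)."""
    predicted: Optional[float] = None  # seconds; None until the node is observed

    def observe(self, submit_time: float, finish_time: float,
                network_delay: float, alpha: float = 0.5) -> None:
        # Response time seen by the scheduler: service time plus network delay.
        response = (finish_time - submit_time) + network_delay
        if self.predicted is None:
            self.predicted = response
        else:
            # Exponential smoothing (an assumed predictor): recent
            # observations weigh more than older ones.
            self.predicted = alpha * response + (1 - alpha) * self.predicted


def pick_node(histories: Dict[str, NodeHistory]) -> str:
    """Return the node with the lowest predicted response time.

    Nodes without any history sort first so that they are probed at least once.
    """
    def key(name: str):
        p = histories[name].predicted
        return (p is not None, p if p is not None else 0.0)
    return min(histories, key=key)
```

Each completed task feeds `observe` for the node that ran it, and the scheduler calls `pick_node` before every submission, so a node that slows down loses new work after a few observations.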

2.
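Considering the heterogeneous, autonomous, and dynamic nature of grid resources, this paper discusses task scheduling when local users have preemptive priority and proposes the TBBS (Time-Balancing Based Scheduling Algorithm) algorithm. A scheduling optimization model is built that selects the best combination of resources for executing a task, with the goal of minimizing the expected completion time. A time-balancing strategy decomposes the task and schedules the subtasks onto resources, which reduces the delay caused by waiting when subtasks synchronize and yields good parallel computing performance. A repeated-scheduling strategy is adopted to adapt to the characteristics of resources in computational grids.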
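
The time-balancing idea can be sketched in its simplest form: split a divisible task so that every resource finishes at the same moment. This sketch assumes each resource has a fixed, known speed and ignores the preemption by local users and the expected-completion-time model that TBBS itself handles; `time_balanced_split` and `finish_times` are hypothetical helper names.

```python
from typing import Dict


def time_balanced_split(total_work: float, speeds: Dict[str, float]) -> Dict[str, float]:
    """Split `total_work` in proportion to each resource's speed.

    With shares proportional to speed, every resource needs the same time
    total_work / sum(speeds), so no subtask waits at the synchronization point.
    """
    total_speed = sum(speeds.values())
    if total_speed <= 0:
        raise ValueError("at least one resource must have positive speed")
    return {name: total_work * s / total_speed for name, s in speeds.items()}


def finish_times(shares: Dict[str, float], speeds: Dict[str, float]) -> Dict[str, float]:
    """Time each resource needs to process its share."""
    return {name: shares[name] / speeds[name] for name in shares}
```

For example, 8 units of work on resources with speeds 1 and 3 are split into 2 and 6 units, and both resources finish after 2 time units.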

3.
Task-based approaches with dynamic load balancing are well suited to exploit parallelism in irregular applications. For such applications, the execution time of tasks often cannot be predicted because of input dependencies. Therefore, a static task assignment to execution resources usually does not lead to the best performance. Moreover, dynamic load balancing is also beneficial for heterogeneous execution environments. In this article a new adaptive data structure is proposed for storing and balancing a large number of tasks, allowing efficient and flexible task management. Dynamically adjusted blocks of tasks can be moved between execution resources, enabling efficient load balancing with low overhead that is independent of the actual number of tasks stored. We have integrated the new approach into a runtime system for the execution of task-based applications for shared address spaces. Runtime experiments with several irregular applications with different execution schemes show that the new adaptive runtime system leads to good performance even in situations where other approaches fail to achieve comparable results.

4.
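Yuan Pingpeng (袁平鹏), Cao Wenzhi (曹文治), Kuang Ping (邝坪). Journal of Software (《软件学报》), 2006, 17(11): 2314-2323
The goal of grid scheduling is to improve the utilization of grid resources and the performance of grid applications; it is one of the key problems to be solved in grids. A great deal of research has been done on task scheduling algorithms for grids, and many algorithms have been proposed, but these algorithms do not adapt well to the autonomy, dynamism, and distribution of grid environments. To address the problems with current grid scheduling mechanisms, a dynamic grid scheduling technique is proposed: cache based feedback scheduling (CBFS). The method performs feedback scheduling based on information about recently accessed resources stored in a cache, such as the most recent request submission time and task completion time, and submits tasks to resources with lighter load or better performance. Experimental results show that CBFS not only effectively reduces unnecessary delay but also outperforms traditional algorithms such as random scheduling in the smoothness of task response time, task throughput, and the time tasks wait in the scheduler before being scheduled.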
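
A rough sketch of cache-driven feedback scheduling follows. The abstract does not specify the cache layout, its eviction policy, or what happens when the cache has no usable entry, so the bounded least-recently-updated cache and the random fallback below are assumptions, and `FeedbackCache`, `record`, and `choose` are hypothetical names.

```python
import random
from collections import OrderedDict
from typing import Iterable


class FeedbackCache:
    """Bounded cache of the most recent turnaround time per resource (illustrative)."""

    def __init__(self, capacity: int = 4) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[str, float]" = OrderedDict()

    def record(self, resource: str, submit_time: float, finish_time: float) -> None:
        # Feedback step: store the latest turnaround and mark the entry as fresh.
        self.entries[resource] = finish_time - submit_time
        self.entries.move_to_end(resource)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently updated entry

    def choose(self, resources: Iterable[str], rng=random) -> str:
        candidates = list(resources)
        cached = [r for r in candidates if r in self.entries]
        if not cached:
            # No feedback yet: fall back to random scheduling (assumed behavior).
            return rng.choice(candidates)
        # Prefer the resource whose last task came back fastest.
        return min(cached, key=lambda r: self.entries[r])
```

The scheduler calls `record` whenever a task completes and `choose` before each submission; resources that drop out of the cache are only picked again through the random fallback.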

5.
The problem of load balancing when executing parallel programs on computational systems with distributed memory is currently of great interest. The most general statement of this problem is that for one parallel loop: execution of a heterogeneous loop on a heterogeneous computational system. When stated in this way, the problem is NP-complete even in the case of two nodes, and no acceptable heuristics for solving it have been found. Since the development of heuristics is a rather complicated task, we decided to examine the problem by elementary methods in order to refine (and, possibly, simplify) the original problem statement. The results of our studies are discussed in this paper. Estimates of the efficiency of parallel loop execution as functions of the number of nodes of homogeneous and heterogeneous parallel computational systems are obtained. These estimates show that the use of heterogeneous parallel systems reduces the efficiency even when their communication subsystems are scalable (see the definition in Section 4). The use of local networks (heterogeneous parallel computational systems with nonscalable communication subsystems) for parallel computations with heavy data exchange is not advantageous and is possible only for a small number of nodes (about five). An algorithm for the optimal distribution of data between the nodes of a homogeneous or heterogeneous computational system is suggested. Results of numerical experiments substantiate the conclusions obtained.

6.
Data‐driven programming models such as many‐task computing (MTC) have been prevalent for running data‐intensive scientific applications. MTC applies over‐decomposition to enable distributed scheduling. To achieve extreme scalability, MTC proposes a fully distributed task scheduling architecture that employs as many schedulers as there are compute nodes to make scheduling decisions. Achieving distributed load balancing and best exploiting data locality are two important goals for the best performance of distributed scheduling of data‐intensive applications. Our previous research proposed a data‐aware work‐stealing technique to optimize both load balancing and data locality by using both dedicated and shared task ready queues in each scheduler. Tasks were organized in queues based on the input data size and location. A distributed key‐value store was applied to manage task metadata. We implemented the technique in MATRIX, a distributed MTC task execution framework. In this work, we devise an analytical suboptimal upper bound of the proposed technique, compare MATRIX with other scheduling systems, and explore the scalability of the technique at extreme scales. Results show that the technique is not only scalable but can achieve performance within 15% of the suboptimal solution. Copyright © 2015 John Wiley & Sons, Ltd.

7.
Load balancing increases the efficient use of existing resources for parallel and distributed applications. At a coarse level of granularity, advances in runtime systems for parallel programs have been proposed in order to control available resources as efficiently as possible by utilizing idle resources and using task migration. Simultaneously, at a finer granularity level, advances in algorithmic strategies for dynamically balancing computational loads by data redistribution have been proposed in order to respond to variations in processor performance during the execution of a given parallel application. Combining strategies from each level of granularity can result in a system which delivers advantages of both. The resulting integration is systemic in nature and transfers the responsibility of efficient resource utilization from the application programmer to the runtime system. This paper presents the design and implementation of a system that combines an algorithmic fine-grained data parallel load balancing strategy with a systemic coarse-grained task-parallel load balancing strategy, and reports on recent experimental results of running a computationally intensive scientific application under this integrated system. The experimental results indicate that a distributed runtime environment which combines both task and data migration can provide performance advantages with little overhead. It also presents proposals for performance enhancements of the implementation, as well as future explorations for effective resource management. Copyright © 2001 John Wiley & Sons, Ltd.

8.
Grid computing has become conventional in distributed systems due to technological advancements and network popularity. Grid computing facilitates distributed applications by integrating available idle network computing resources into formidable computing power. As a result, efficient integration and sharing of resources enables abundant computing resources to solve complicated problems that a single machine cannot manage. However, grid computing mines resources from accessible idle nodes, and node accessibility varies with time. A node that is currently idle may become occupied within a second and then be unavailable to provide resources. Accordingly, node selection must provide effective and sufficient resources over a long period to allow load assignment. This study proposes a hybrid load balancing policy to integrate static and dynamic load balancing technologies. Essentially, a static load balancing policy is applied to select effective and suitable node sets. This lowers the probability of unbalanced load caused by assigning tasks to ineffective nodes. When a node shows signs that it may be unable to continue providing resources, the dynamic load balancing policy determines whether the node in question is ineffective for load assignment. The system then obtains a replacement node within a short time to maintain system execution performance.

9.
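Efficient use of computing resources in heterogeneous environments is key to improving task efficiency and cluster utilization. Kubernetes is the preferred solution for container orchestration, but in heterogeneous resource scheduling its scheduler lacks fine-grained GPU information and therefore cannot meet user-defined requirements, and when CPU and GPU nodes are deployed together the scheduler is unaware of the heterogeneous resources, which leads to resource contention. Taking into account the distribution of heterogeneous resources across nodes and their hardware state, a Kubernetes-based fine-grained scheduling strategy for CPU/GPU heterogeneous resources is proposed. The device plugin mechanism is used to collect detailed information about the GPUs on each node and to submit GPU resource metrics to the scheduling algorithm. On top of the original CPU and memory filtering algorithms, filtering on custom GPU information is added to select nodes that meet users' fine-grained requirements. For mixed CPU/GPU deployments, the scheduler's scoring algorithm is improved to detect the application type dynamically: a load balancing algorithm is applied to CPU applications and a smallest-best-fit algorithm to GPU applications, so that the strategy schedules each type of application correctly and makes full use of fragmented resources on GPU nodes when CPU resources are insufficient. Experiments on fine-grained GPU scheduling and on scheduling under mixed CPU/GPU deployment show that the strategy schedules GPUs effectively and avoids resource contention.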
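
The filter-then-score flow in this abstract can be sketched in plain Python. This is not the Kubernetes scheduler API and not the paper's implementation: `Node`, `Pod`, `filter_nodes`, `score`, and `schedule` are hypothetical names, per-card free GPU memory stands in for the "fine-grained GPU information", and the fixed penalty that keeps CPU applications off GPU nodes unless necessary is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    cpu_free: float                      # free CPU cores
    cpu_total: float                     # total CPU cores
    gpu_free_mem: Tuple[int, ...] = ()   # free memory (MiB) per GPU card; empty on CPU nodes


@dataclass
class Pod:
    cpu: float        # requested CPU cores
    gpu_mem: int = 0  # requested GPU memory (MiB); 0 marks a CPU application


def filter_nodes(pod: Pod, nodes: List[Node]) -> List[Node]:
    """Keep nodes with enough CPU and, for GPU pods, one card with enough free memory."""
    fits = []
    for node in nodes:
        if node.cpu_free < pod.cpu:
            continue
        if pod.gpu_mem and not any(m >= pod.gpu_mem for m in node.gpu_free_mem):
            continue
        fits.append(node)
    return fits


def score(pod: Pod, node: Node) -> float:
    if pod.gpu_mem:
        # Best fit for GPU applications: the less memory left on the chosen card, the better.
        leftover = min(m - pod.gpu_mem for m in node.gpu_free_mem if m >= pod.gpu_mem)
        return -float(leftover)
    # Load balancing for CPU applications: prefer the largest free CPU fraction.
    # The penalty (assumed) makes GPU nodes a fallback for spare CPU capacity only.
    penalty = 1.0 if node.gpu_free_mem else 0.0
    return node.cpu_free / node.cpu_total - penalty


def schedule(pod: Pod, nodes: List[Node]) -> Optional[str]:
    candidates = filter_nodes(pod, nodes)
    if not candidates:
        return None
    return max(candidates, key=lambda n: score(pod, n)).name
```

With this scoring, a CPU application lands on the least loaded CPU node, a GPU application lands on the card that leaves the smallest fragment, and a CPU application spills onto a GPU node only when no CPU node passes the filter.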

10.
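General-purpose GPUs used in high-performance computing have powerful parallel computing capabilities, and data analysis systems that use a general-purpose GPU as the main processor can deliver better performance than traditional databases. In big data scenarios, how to distribute the workload sensibly between processors according to the available CPU and GPU resources is a pressing problem. A load balancing strategy for CPU-GPU heterogeneous data analysis systems is proposed. The strategy uses a pipeline model to decompose the workload and designs a pipeline-based load balancing model that distributes the workload sensibly across the heterogeneous processors, reducing the system's total execution time and improving performance. Experimental results show that the proposed pipeline-based load balancing model adapts to different data volumes under different query requests and performs well.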

11.
Dynamic load balancing in heterogeneous systems is a fundamental research topic in parallel computing due to the high availability of such systems. The efficient utilization of the heterogeneous resources can significantly enhance the performance of the parallel system. At the same time, adapting parallel codes to state-of-the-art parallel computers composed of heterogeneous multinode–multicore processors becomes a very hard task because parallel codes are highly dependent on the parallel architectures. That means that applications must be tailored, which requires a great deal of programming effort. We have developed the ALBIC (Adaptive Load Balancing of Iterative Computation) system, which allows for the dynamic load balancing of iterative codes in heterogeneous dedicated and nondedicated Linux-based systems. In order to validate the system, several parallel codes have been analyzed in different scenarios. The results show that the ALBIC approach achieves better performance than the other proposal. This lightweight library eases porting homogeneous parallel codes to heterogeneous platforms, since the code intrusion is low and the programming effort is considerably reduced.

12.
Dynamic balancing of computation and communication load is vital for the execution stability and performance of distributed, parallel simulations deployed on the shared, unreliable resources of large-scale environments. High Level Architecture (HLA) based simulations can experience a decrease in performance due to imbalances that are produced initially and/or during run time. These imbalances are generated by the dynamic load changes of distributed simulations or by unknown, non-managed background processes resulting from the non-dedication of shared resources. Due to the dynamic execution characteristics of elements that compose distributed applications, the computational load and interaction dependencies of each simulation entity change during run time. These dynamic changes lead to an irregular load and communication distribution, which increases overhead of resources and latencies. A static partitioning of load is limited to deterministic applications and is incapable of predicting the dynamic changes caused by distributed applications or by external background processes. Therefore, a scheme for balancing the communication and computational load during the execution of distributed simulations is devised in a scalable hierarchical architecture. The proposed balancing system employs local and cluster monitoring mechanisms to observe the distributed load changes and identify imbalances, and repartitioning policies to determine a distribution of load and minimize imbalances. A migration technique is also employed by this proposed balancing system to perform reliable and low-latency load transfers. Such a system successfully improves the use of shared resources and increases distributed simulations’ performance by minimizing communication latencies and partitioning the load evenly. Experiments and comparative analyses were conducted in order to identify the gains that the proposed balancing scheme provides to large-scale distributed simulations.

13.
In this paper, we present Jcluster, an efficient Java parallel environment that provides some critical services, in particular automatic load balancing and high‐performance communication, for developing parallel applications in Java on a large‐scale heterogeneous cluster. In the Jcluster environment, we implement a task scheduler based on a transitive random stealing (TRS) algorithm. Performance evaluations show that, on a large‐scale cluster, the scheduler based on TRS lets any idle node obtain a task from another node with far fewer steal attempts than random stealing (RS), which is a well‐known dynamic load‐balancing algorithm. For communication performance, we use asynchronous multithreaded transmission to implement a high‐performance PVM‐like and MPI‐like message‐passing interface in pure Java. The communication performance of the Jcluster environment, LAM‐MPI, and mpiJava on LAM‐MPI is evaluated with the Java Grande Forum's pingpong benchmark. Copyright © 2005 John Wiley & Sons, Ltd.

14.
Stream computing applications require minimum latency and high throughput for efficiently processing real-time data. Typically, data-intensive applications where large datasets are required to be moved across execution nodes have low latency requirements. In this paper, a stream-based data processing model is adopted to develop an algorithm for optimally partitioning the input data such that the inter-partition data flow remains minimal. The proposed algorithm improves the execution of data-intensive workflows in heterogeneous computing environments by partitioning the data-intensive workflow and mapping each partition onto the available heterogeneous resources that offer minimum execution time. Minimum data movement between the partitions reduces the latency, which can be further reduced by applying advanced data parallelism techniques. In this paper, we apply a data parallelism technique to the bottleneck (most compute-intensive) task in each partition, which significantly reduces the latency. We study the effectiveness and the performance of the proposed approach by using synthesized workflows and real-world applications, such as Montage and Cybershake. Our evaluation shows that the proposed algorithm provides schedules with approximately 12% reduced latency and nearly 17% enhanced throughput as compared to the existing state-of-the-art algorithms.

15.
In this paper, we present a topology-aware load balancing algorithm for parallel multi-core machines and its proof of asymptotic convergence to an optimal solution. The algorithm, named HwTopoLB, aims to improve the application performance by reducing core idleness and communication delays. HwTopoLB was designed taking into account the properties of current parallel systems composed of multi-core compute nodes, namely their network interconnection, and their complex and hierarchical core topology. The latter comprises multiple levels of cache, and a memory subsystem with NUMA design. These systems provide high processing power at the expense of asymmetric communication costs, which can hamper the performance of parallel applications depending on their communication patterns if ignored. Our load balancing algorithm models asymmetries in terms of latencies and bandwidths, representing the distances and communication costs among hardware components. We have implemented HwTopoLB using the Charm++ Parallel Runtime System and evaluated its performance with two different benchmarks and one application. Our experimental results with HwTopoLB exhibit scalability over clustered multi-core compute nodes, and average performance improvements of 23% over execution without load balancers and 19% over the existing load balancing strategies on different multi-core systems.

16.
FuzzyCLIPS is a rule-based programming language and it is very suitable for developing fuzzy expert systems. However, it usually requires much longer execution time than algorithmic languages such as C and Java. To address this problem, we propose a parallel version of FuzzyCLIPS to parallelize the execution of a fuzzy expert system with data dependence on a cluster system. We have designed some extended parallel syntax following the original FuzzyCLIPS style. To simplify the programming model of parallel FuzzyCLIPS, we hide, as much as possible, the tasks of parallel processing from programmers and implement them in the inference engine by using MPI, the de facto standard for parallel programming for cluster systems. Furthermore, a load balancing function has been implemented in the inference engine to adapt to the heterogeneity of computing nodes. It will intelligently allocate different amounts of workload to different computing nodes according to the results of dynamic performance monitoring. The programmer only needs to invoke the function in the program for better load balancing. To verify our design and evaluate the performance, we have implemented a human resource website. Experimental results show that the proposed parallel FuzzyCLIPS can garner a superlinear speedup and provide a more reasonable response time.

17.
The parallel computation capabilities of modern graphics processing units (GPUs) have attracted increasing attention from researchers and engineers who have been conducting high computational throughput studies. However, current single-GPU-based engineering solutions often struggle to fulfill their real-time requirements. Thus, the multi-GPU-based approach has become a popular and cost-effective choice for tackling the demands. In those cases, the computational load balancing over multiple GPU “nodes” is often the key bottleneck that affects the quality and performance of the real-time system. The existing load balancing approaches are mainly based on the assumption that all GPU nodes in the same computer framework are of equal computational performance, which is often not the case due to cluster design and other legacy issues. This paper presents a novel dynamic load balancing (DLB) model for rapid data division and allocation on heterogeneous GPU nodes based on an innovative fuzzy neural network (FNN). In this research, a 5-state parameter feedback mechanism defining the overall cluster and node performance is proposed. The corresponding FNN-based DLB model is capable of monitoring and predicting individual node performance under different workload scenarios. A real-time adaptive scheduler has been devised to reorganize the data inputs to each node when necessary to maintain their runtime computational performance. The devised model has been implemented on two-dimensional (2D) discrete wavelet transform (DWT) applications for evaluation. Experimental results show that this DLB model enables a high computational throughput while meeting the real-time and precision requirements of complex computational tasks.

18.
Computational Grids are emerging as a new infrastructure for high performance computing. Since the resources in a Grid can be heterogeneous and distributed, mesh-based applications require a mesh partitioner that considers both processor and network heterogeneity. We have developed a heterogeneous mesh partitioner, called PaGrid. PaGrid uses a multilevel graph partitioning approach, augmented by execution time load balancing in the final uncoarsening phase. We show that minimization of total communication cost (e.g., as used by JOSTLE) can lead to significant load being placed on processors connected by slow links, which results in higher application execution times. Therefore, PaGrid balances the estimated execution time of the application across processors. PaGrid performance is compared with two existing mesh partitioners, METIS 4.0 and JOSTLE 3.0, for mapping several application meshes to two models of heterogeneous computational Grids. PaGrid is found to produce significantly better partitions than JOSTLE and slightly better partitions than METIS in most cases, in terms of estimated application execution time averaged over a large number of runs with different random number seeds.

19.
The paper is devoted to the problem of effective query execution in cluster-based systems. An original approach to data placement and replication on the nodes of a cluster system is presented. Based on this approach, a load balancing method for parallel query processing is developed. A method for parallel query execution in cluster systems based on the load balancing method is suggested. Results of computational experiments are presented, and analysis of efficiency of the proposed approaches is performed.

20.
Grid computing has emerged as a new field, distinguished from conventional distributed computing. It focuses on large-scale resource sharing, innovative applications and, in some cases, high performance orientation. The Grid serves as a comprehensive and complete system for organizations by which the maximum utilization of resources is achieved. Load balancing is a process that involves resource management and an effective load distribution among the resources. Therefore, it is considered to be very important in Grid systems. For a Grid, a dynamic, distributed load balancing scheme provides deadline control for tasks. Because of deadline failures, developing, deploying, and executing long-running applications over the Grid remains a challenge, so deadline failure recovery is an essential factor for Grid computing. In this paper, we propose a dynamic distributed load-balancing technique called “Enhanced GridSim with Load balancing based on Deadline Failure Recovery” (EGDFR) for computational Grids with heterogeneous resources. The proposed algorithm EGDFR is an improved version of the existing EGDC in which we perform load balancing by providing a scheduling system that includes a mechanism for recovering from deadline failure of the Gridlets. Extensive simulation experiments are conducted to quantify the performance of the proposed load-balancing strategy on the GridSim platform. Experiments have shown that the proposed system can considerably improve Grid performance in terms of total execution time, percentage gain in execution time, average response time, resubmitted time and throughput. The proposed load-balancing technique gives 7% better performance than EGDC with a constant number of resources, and 11% better performance than EGDC with a constant number of Gridlets.
