Similar Articles
20 similar articles found.
1.
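Grid Dependent-Task Scheduling Based on Optimized Selection of Task-Resource Assignment Graphs   Times cited: 3 (self: 0, others: 3)
Task scheduling is key to achieving high performance in grid application systems. In grid computing, a large application is often decomposed into multiple tasks with dependencies among them. In a wide-area grid environment where individual resources differ greatly, these inter-task dependencies pose new challenges to traditional scheduling strategies. The main work of task scheduling is to allocate resources to tasks and to determine the order in which tasks execute. The possible resource assignments for dependent tasks are represented as a task-resource assignment graph (T-RAG), and on this basis a dependent-task scheduling model based on optimized T-RAG selection is proposed, which transforms the dependent-task scheduling problem into an optimal graph selection problem. Resolving the optimal task-resource assignment graph determines both the resource allocation and the task execution order at once, which together form the optimal schedule. Finally, a task scheduling algorithm based on this model is implemented. A comparison with the ILHA algorithm shows that the proposed algorithm performs better when resources are highly heterogeneous and tasks exchange large amounts of data.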
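As an illustration only, the sketch below treats each complete task-to-resource assignment of a tiny, hypothetical three-task DAG as one candidate T-RAG and keeps the one with the smallest makespan by exhaustive enumeration. The instance data, the cost model (a transfer delay is paid only when parent and child run on different resources), and the fixed topological execution order are assumptions; this brute force is not the paper's selection algorithm and is practical only for very small graphs.

```python
from itertools import product

# Hypothetical toy instance: execution time of each task on each resource.
EXEC_TIME = {
    "A": {"r1": 2, "r2": 4},
    "B": {"r1": 3, "r2": 1},
    "C": {"r1": 4, "r2": 2},
}
# Dependency edges (parent, child) -> data transfer time, paid only when
# parent and child are placed on different resources.
COMM = {("A", "B"): 2, ("A", "C"): 1}
ORDER = ["A", "B", "C"]  # a fixed topological order of the DAG
RESOURCES = ["r1", "r2"]


def makespan(assign):
    """Finish time of the last task for one task -> resource assignment."""
    finish = {}   # task -> finish time
    free_at = {}  # resource -> time at which it becomes idle
    for task in ORDER:
        res = assign[task]
        ready = 0
        for (parent, child), cost in COMM.items():
            if child == task:
                delay = 0 if assign[parent] == res else cost
                ready = max(ready, finish[parent] + delay)
        start = max(ready, free_at.get(res, 0))
        finish[task] = start + EXEC_TIME[task][res]
        free_at[res] = finish[task]
    return max(finish.values())


def best_assignment():
    """Enumerate every assignment (one candidate graph each) and keep the best."""
    best = None
    for combo in product(RESOURCES, repeat=len(ORDER)):
        assign = dict(zip(ORDER, combo))
        span = makespan(assign)
        if best is None or span < best[0]:
            best = (span, assign)
    return best
```

For this instance `best_assignment()` returns `(5, {'A': 'r1', 'B': 'r1', 'C': 'r2'})`: keeping A and B together avoids their larger transfer, while C absorbs a small transfer to run in parallel on the other resource.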

2.
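A Comparison of Several Fault-Tolerance Strategies in Grid Environments   Times cited: 2 (self: 0, others: 2)
Using the SimGrid simulation environment with predefined resource failure patterns, job scheduling is simulated under four fault-tolerance strategies: simple retry, active backup, passive backup, and checkpointing. The average execution time and the final completion time of batch jobs are compared, and the effect of each strategy on these two times is analyzed, with the aim of guiding the design of fault-tolerance schemes in grid environments.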
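The difference between simple retry and checkpointing can be seen in a minimal deterministic model, assuming a single resource that fails at known times, a fixed restart delay, and free checkpoints (checkpoint overhead is ignored). Active and passive backup, which need a second resource, are not modeled, and the function and parameter names are hypothetical.

```python
def completion_time(work, failure_times, checkpoint_interval=None, restart_delay=0.0):
    """Wall-clock time needed to finish `work` time units of computation.

    The resource fails at the given absolute times. With checkpoint_interval=None
    every failure restarts the job from scratch (simple retry); otherwise the job
    resumes from its most recent checkpoint, taken every `checkpoint_interval`
    units of progress. Checkpoint overhead is ignored.
    """
    t = 0.0      # wall-clock start of the current attempt
    saved = 0.0  # work preserved by the last checkpoint
    for f in sorted(failure_times):
        if f < t:
            continue  # failure during a restart delay: ignored in this simple model
        remaining = work - saved
        if t + remaining <= f:
            return t + remaining  # the attempt finishes before this failure
        progress = f - t
        if checkpoint_interval:
            saved += (progress // checkpoint_interval) * checkpoint_interval
        t = f + restart_delay
    return t + (work - saved)
```

With 10 units of work and failures at times 6 and 13, simple retry finishes at 23, while checkpointing every 4 units finishes at 12, because the first failure loses only 2 units of progress instead of 6.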

3.
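A Classification-Based Optimized Scheduling Algorithm for Multiple QoS Attributes   Times cited: 1 (self: 1, others: 0)
Meeting users' quality of service (QoS) requirements is an important goal in grid computing. The distribution, heterogeneity, and dynamism of grid resources make QoS-guided resource scheduling in grid environments a complex problem, especially when user tasks have multiple QoS attributes. This paper uses an economic model to study QoS-controlled resource allocation in grids. With utility maximization as the objective, service quality is quantified by a composite utility function, and an optimized scheduling algorithm is designed that classifies tasks under time and cost constraints; the algorithm satisfies users' multiple QoS attributes. Simulation experiments demonstrate the algorithm's effectiveness.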
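A composite utility of the kind described can be sketched as a weighted sum of normalized time and cost utilities, with infeasible offers (over the deadline or the budget) rejected. The linear utility forms, the weights, and the names are assumptions for illustration; the paper's actual utility functions and its task-classification step are not reproduced here.

```python
def composite_utility(time, cost, deadline, budget, w_time=0.5, w_cost=0.5):
    """Weighted sum of linear time and cost utilities, each in [0, 1].

    Returns None when the offer violates the deadline or the budget.
    """
    if time > deadline or cost > budget:
        return None
    u_time = 1 - time / deadline  # finishing earlier is better
    u_cost = 1 - cost / budget    # paying less is better
    return w_time * u_time + w_cost * u_cost


def pick_resource(offers, deadline, budget, w_time=0.5, w_cost=0.5):
    """offers: {resource: (estimated_time, estimated_cost)}.

    Returns the feasible resource with the highest composite utility, or None.
    """
    best, best_u = None, None
    for resource, (time, cost) in offers.items():
        u = composite_utility(time, cost, deadline, budget, w_time, w_cost)
        if u is not None and (best_u is None or u > best_u):
            best, best_u = resource, u
    return best
```

With offers `{"fast": (2, 8), "cheap": (8, 2), "late": (12, 1)}`, a deadline of 10, and a budget of 10, time-heavy weights (0.7, 0.3) select `"fast"`, cost-heavy weights (0.3, 0.7) select `"cheap"`, and `"late"` is never eligible.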

4.
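The resource environment of a mobile grid is highly dynamic: at any moment, resources may join, leave, fail, or move. A task replication strategy is adopted to tolerate resource unreliability. Resource reliability is characterized by a Weibull distribution, and a task replication model is built on it; the independent-task scheduling problem under the replication strategy is formally described, together with its scheduling objective and constraints; and the scheduling problem is solved with a genetic algorithm. Simulation results show that the scheduling algorithm scales well and that scheduling performance is linearly related to resource reliability.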
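The replication model can be sketched under two assumptions not stated in the abstract: replica failures are independent, and a replica succeeds if its resource survives the task's whole run time t (starting from a fresh resource). With Weibull shape k and scale λ, one resource survives with probability R(t) = exp(-(t/λ)^k), so a task replicated on several resources succeeds unless every replica fails. The helper names and parameter values are hypothetical.

```python
import math


def weibull_reliability(t, shape, scale):
    """Probability that a resource survives beyond time t: exp(-(t/scale)**shape)."""
    return math.exp(-((t / scale) ** shape))


def replicated_success(t, resources):
    """Probability that at least one replica finishes a task of length t.

    resources: list of (shape, scale) Weibull parameters, one per replica.
    Failures are assumed independent across resources.
    """
    p_all_fail = 1.0
    for shape, scale in resources:
        p_all_fail *= 1.0 - weibull_reliability(t, shape, scale)
    return 1.0 - p_all_fail


def replicas_needed(t, shape, scale, target, max_replicas=10):
    """Smallest number of identical replicas reaching the target success probability."""
    for n in range(1, max_replicas + 1):
        if replicated_success(t, [(shape, scale)] * n) >= target:
            return n
    return None
```

For example, with shape 1 and scale 10, a single replica survives a 10-unit task with probability about 0.37, and `replicas_needed(10, 1, 10, 0.9)` returns 6.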

5.
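Because computing resources are widely distributed, heterogeneous, and dynamic, resource management and scheduling in computational grids are highly complex and challenging problems. Performance measurement, analysis, and prediction for distributed systems have become increasingly important, and accurately predicting task running times is crucial to achieving good application performance. This paper describes a load-based model of a task running-time prediction system, which is central to task scheduling and resource allocation algorithms.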
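A load-based run-time predictor can be sketched in two steps, both assumptions rather than the paper's model: smooth recent host-load samples with an exponential moving average, then scale the task's dedicated (unloaded) run time by 1 + predicted load, which roughly approximates the slowdown a CPU-bound task sees when it time-shares a processor with that many other runnable processes.

```python
def smoothed_load(history, alpha=0.5):
    """Exponential moving average of load samples, oldest first."""
    if not history:
        raise ValueError("need at least one load sample")
    estimate = history[0]
    for sample in history[1:]:
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate


def predict_runtime(dedicated_time, load_history, alpha=0.5):
    """Predicted wall-clock run time on a time-shared host.

    Assumes round-robin sharing: with L other runnable processes, a CPU-bound
    task gets roughly 1 / (1 + L) of the processor.
    """
    return dedicated_time * (1 + smoothed_load(load_history, alpha))
```

A task that needs 10 s on an idle host is predicted to take 20 s when the smoothed load is 1.0, for instance after load samples `[0.0, 2.0]` with `alpha=0.5`.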

6.
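Edge computing has been envisioned as an effective way to enhance the computing capability of resource-poor smart devices. Through task offloading, users can send computation-intensive tasks to the edge cloud to meet their resource demands. However, problems of energy consumption, reliability, and latency remain. This paper proposes an energy-aware, fault-tolerant collaborative task execution algorithm that reduces device energy consumption while ensuring that tasks offloaded to the edge cloud execute successfully. Specifically, an energy-aware collaborative task execution model with fault tolerance is first designed; by combining a computation offloading model with a fault-tolerance model, it reduces device energy consumption within the application's completion deadline. Then an energy-aware, fault-tolerant collaborative task execution scheduling algorithm is proposed, consisting of collaborative task execution, initial scheduling, and online scheduling. Collaborative task execution determines each task's execution decision through partial critical path analysis and a one-climb policy; initial scheduling selects a fault-tolerance strategy, either replication or resubmission, for each task executed at the edge, so that the corresponding fault-tolerance measure can be taken when a failure occurs; and online scheduling adjusts the fault-tolerance strategy in real time when failures occur, to ensure that tasks are processed successfully. Finally, extensive simulation experiments on three representative task topologies evaluate three schemes in terms of task completion rate and energy consumption ratio. The results show that, whether the completion deadline, transmission rate, or fault rate varies, the method ensures that tasks complete within the deadline; it is more reliable than collaborative task execution alone, and it reduces device energy consumption by at least 30% compared with local execution.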
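The per-task choice made during initial scheduling can be illustrated with a hypothetical rule that is not the paper's: resubmission uploads a task once and re-uploads it only on failure, so it costs less upload energy in expectation but needs time for two sequential attempts; replication uploads two copies up front, so it needs time for only one attempt but always pays twice. All names, inputs, and the cost model are assumptions.

```python
def choose_fault_tolerance(exec_time, upload_time, upload_energy, fail_prob, slack):
    """Pick a fault-tolerance strategy for one task offloaded to the edge.

    Returns (strategy, expected_upload_energy). Energy spent on the device for
    local execution is not modeled, so it is reported as None.
    """
    attempt = upload_time + exec_time  # time for one offloaded attempt
    if 2 * attempt <= slack:
        # Room for a retry: re-upload only when the first attempt fails.
        return "resubmission", upload_energy * (1 + fail_prob)
    if attempt <= slack:
        # Only one attempt fits: send two copies at once to survive one failure.
        return "replication", 2 * upload_energy
    return "local", None  # offloading cannot meet the slack
```

With a 1 s upload, 2 s of edge execution, 5 J per upload, and a 50% failure probability, a 10 s slack favors resubmission (7.5 J expected), a 4 s slack forces replication (10 J), and a 2 s slack rules out offloading.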

7.
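A task scheduling algorithm based on resource availability prediction is proposed for desktop grid platforms. The algorithm fully considers the behavior that computing resources may exhibit while executing jobs and uses prediction techniques to ensure efficient and reasonable task allocation. When a computing resource behaves abnormally, a fair transition-weight prediction method estimates the resource's likely state in the next stage and computes its reliability probability before subtasks are scheduled to it. An experimental environment was built, and the algorithm was tested on a desktop grid with different values of parameters such as the reliability threshold T and the number N of days of resource history examined. Finally, the results were compared with scheduling policies such as PPS, showing that the proposed algorithm performs well in subtask completion rate and communication round-trip time.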
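One way to turn an availability history into a reliability probability that can be compared against the threshold T is a first-order Markov estimate: count how often each state was followed by each other state over the last N days of samples, then read off the probability that the resource is up in the next interval given its current state. This plain frequency estimate stands in for the paper's fair transition-weight method; the two-state model and the names are assumptions.

```python
from collections import Counter


def transition_probs(history):
    """Estimate P(next state | current state) from consecutive observations."""
    pair_counts = Counter(zip(history, history[1:]))
    from_counts = Counter(history[:-1])
    return {(a, b): n / from_counts[a] for (a, b), n in pair_counts.items()}


def reliability(history, current, target="up"):
    """Estimated probability of moving from `current` to `target` next."""
    return transition_probs(history).get((current, target), 0.0)


def eligible_resources(histories, current_states, threshold):
    """Resources whose estimated chance of being up next is at least `threshold`."""
    return [r for r, h in histories.items()
            if reliability(h, current_states[r]) >= threshold]
```

Raising the threshold T trades the number of usable resources for a lower chance that a subtask is lost to an outage.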

8.
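In hard real-time applications, a hard real-time task that misses its deadline can cause casualties, losses, and other serious consequences. Real-time fault-tolerance techniques are therefore needed to ensure that hard real-time tasks still complete before their deadlines when the system fails. Approaching the problem from the perspective of real-time fault-tolerant scheduling, this paper proposes a real-time fault-tolerant scheduling algorithm for distributed systems, analyzes its time complexity, and gives an example illustrating its scheduling process. The algorithm is called the No Fault-tolerance Requirement Last (NFRL) scheduling algorithm. This real-time fault-tolerant scheduling algorithm …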

9.
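Resources in grid computing are dynamic and heterogeneous, so conventional static job scheduling methods are unsuitable for grid environments; the efficient execution of a class of parallel computations in grids depends on effectively matching grid resources (CPU, network bandwidth, etc.) to jobs. A strategy that schedules jobs based on resource prediction results is proposed. The paper first describes the IAR model, a result of research on grid host load prediction, and proposes a tool for predicting network bandwidth, the network performance plane. Using the resource predictions, a feedback job scheduling model is constructed and evaluated on a class of time-balanced jobs. The results show that, compared with many other methods, the model achieves shorter execution times and better stability.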
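Scheduling a time-balanced job from predicted resources can be sketched as follows, assuming that an AR(1) forecast stands in for the IAR host-load model, that a host's effective rate is its peak rate divided by 1 + predicted load, and that network bandwidth (the paper's network performance plane) is ignored. Work is split in proportion to effective rates so that all hosts should finish at about the same time; the names are hypothetical.

```python
def ar1_forecast(series):
    """One-step AR(1) forecast: x_next = mu + phi * (x_last - mu).

    phi is the least-squares lag-1 coefficient of the mean-centered series.
    """
    mu = sum(series) / len(series)
    dev = [x - mu for x in series]
    num = sum(a * b for a, b in zip(dev[:-1], dev[1:]))
    den = sum(a * a for a in dev[:-1])
    phi = num / den if den else 0.0
    return mu + phi * dev[-1]


def time_balanced_split(total_units, hosts):
    """hosts: {name: (peak_rate, load_history)} -> {name: work units}.

    Units are assigned in proportion to predicted effective rates, with
    largest-remainder rounding so the allocation sums to total_units.
    """
    rates = {}
    for name, (peak, history) in hosts.items():
        load = max(0.0, ar1_forecast(history))  # a negative load forecast is meaningless
        rates[name] = peak / (1 + load)
    total_rate = sum(rates.values())
    shares = {name: total_units * r / total_rate for name, r in rates.items()}
    alloc = {name: int(s) for name, s in shares.items()}
    leftover = total_units - sum(alloc.values())
    by_remainder = sorted(shares, key=lambda n: shares[n] - alloc[n], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc
```

Two hosts with the same peak rate, one predicted at load 1.0 and one idle, receive 10 and 20 of 30 work units, so both should finish at about the same time.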

10.
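Based on the dependencies and deadlines of tasks in a grid workflow, together with resource availability and MIPS (millions of instructions per second), a task priority scheduling algorithm based on grid resource prediction is proposed. The grid task workflow is abstracted as a directed acyclic graph (DAG), the critical path of the workflow is found, and each task's latest start time is computed and used as its priority. The algorithm takes into account user requirements and resource types, as well as the reassignment of tasks whose scheduling fails. Experiments verify the algorithm's effectiveness.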
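The latest-start-time priority reduces to a backward pass over the DAG: a sink task must start by the workflow deadline minus its run time, and any other task must start early enough to finish before its earliest-starting successor. In the sketch below, task run times are assumed to be already estimated (for example, task length in millions of instructions divided by a resource's MIPS rating), communication delays are ignored, and the names are hypothetical. Tasks with the smallest latest start time get the highest priority; when the deadline equals the critical-path length, the tasks on the critical path have zero slack.

```python
def latest_start_times(duration, successors, deadline):
    """Backward pass: LST(t) = min(LST(s) for successors s) - duration(t),
    with sink tasks finishing exactly at the deadline."""
    lst = {}

    def visit(task):
        if task not in lst:
            finish_by = min((visit(s) for s in successors.get(task, [])),
                            default=deadline)
            lst[task] = finish_by - duration[task]
        return lst[task]

    for task in duration:
        visit(task)
    return lst


def priority_order(duration, successors, deadline):
    """Scheduling order: smallest latest start time (least slack) first."""
    lst = latest_start_times(duration, successors, deadline)
    return sorted(lst, key=lambda task: (lst[task], task))
```

For tasks A(2), B(3), C(1), D(2) with edges A→B, A→C, B→D, C→D and a deadline of 10, the latest start times are A=3, B=5, C=7, D=8, so B outranks C because it lies on the critical path A-B-D.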

11.
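Shen Derong, Chen Xiangyu, Lü Li'ang, Shao Yichuan, Yu Ge (申德荣 陈翔宇 吕立昂 邵一川 于戈). Computer Engineering (计算机工程), 2006, 32(21): 124-126, 129
To distribute load evenly within a service grid system and improve resource utilization and system throughput, a dynamic load-balancing system for service grid environments is designed and implemented. A hierarchical load-balancing scheduling model is proposed and the system architecture is presented. A dynamic dual-threshold job allocation algorithm is designed and implemented that jointly considers the number of jobs at each local agent, the performance of each local agent, and the current load. Experimental results show that the algorithm effectively dispatches jobs according to load, improving the utilization of distributed grid resources and reducing job scheduling time.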
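A dual-threshold dispatch rule in this spirit can be sketched as follows. The load index (a blend of queued jobs per unit of agent performance and current CPU utilization), its weight, and the policy details are assumptions: agents below the lower threshold count as lightly loaded and the fastest of them is preferred, agents above the upper threshold are never chosen, and otherwise the least-loaded remaining agent gets the job.

```python
def load_index(queued_jobs, performance, cpu_load, w=0.5):
    """Hypothetical load index: queue pressure blended with CPU utilization."""
    return w * (queued_jobs / performance) + (1 - w) * cpu_load


def dispatch(agents, low, high):
    """agents: {name: (queued_jobs, performance, cpu_load)} -> chosen name or None."""
    index = {name: load_index(*stats) for name, stats in agents.items()}
    light = [a for a in agents if index[a] < low]
    if light:
        # Several agents have spare capacity: send the job to the fastest one.
        return max(light, key=lambda a: (agents[a][1], a))
    normal = [a for a in agents if index[a] <= high]
    if normal:
        # No light agent: pick the least-loaded agent that is not overloaded.
        return min(normal, key=lambda a: (index[a], a))
    return None  # every agent is above the upper threshold; hold the job
```

The two thresholds play different roles: the lower one decides when load no longer matters and raw performance should, and the upper one keeps overloaded agents out of consideration entirely.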

12.
The frequent and volatile unavailability of volunteer-based Grid computing resources challenges Grid schedulers to make effective job placements. The manner in which host resources become unavailable will have different effects on different jobs, depending on their runtime and their ability to be checkpointed or replicated. A multi-state availability model can help improve scheduling performance by capturing the various ways a resource may be available or unavailable to the Grid. This paper uses a multi-state model and analyzes a machine availability trace in terms of that model. Several prediction techniques then forecast resource transitions into the model’s states. We analyze the accuracy of our predictors, which outperform existing approaches. We also propose and study several classes of schedulers that utilize the predictions, and a method for combining scheduling factors. We characterize the inherent tradeoff between job makespan and the number of evictions due to failure, and demonstrate how our schedulers can navigate this tradeoff under various scenarios. Lastly, we propose job replication techniques, which our schedulers utilize to replicate those jobs that are most likely to fail. Our replication strategies outperform others, as measured by improved makespan and fewer redundant operations. In particular, we define a new metric for replication efficiency, and demonstrate that our multi-state availability predictor can provide information that allows our schedulers to be more efficient than others that blindly replicate all jobs or some static percentage of jobs.
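The contrast between replicating only the jobs most likely to fail and replicating blindly can be sketched with predicted failure probabilities as input. The threshold rule and the independence assumption (a replicated job is lost only if both copies fail, each with the same probability) are illustrative; the paper's multi-state predictor and its replication-efficiency metric are not reproduced.

```python
def select_for_replication(fail_prob, threshold):
    """Replicate only jobs whose predicted failure probability reaches the threshold."""
    return sorted(job for job, p in fail_prob.items() if p >= threshold)


def expected_failed_jobs(fail_prob, replicated):
    """Expected number of jobs lost; a replicated job is lost only if both
    independent copies fail."""
    return sum(p * p if job in replicated else p for job, p in fail_prob.items())
```

With predicted failure probabilities `{"j1": 0.5, "j2": 0.1, "j3": 0.4}` and a threshold of 0.3, only j1 and j3 are replicated, giving 0.51 expected lost jobs, compared with 1.0 without replication and 0.42 when all three jobs are replicated at the cost of one extra replica.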

13.
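Decentralized Grid Scheduling Using Hill Climbing   Times cited: 1 (self: 0, others: 1)
To make grid resources easy to extend, grid scheduling should be decentralized. A hill-climbing algorithm is used to optimize resource selection for single-site jobs across as many computing resources as possible. When a grid scheduler receives a single-site job, hill climbing is activated and, following the neighbor relations among grid schedulers, finds the most suitable computing system for the job, where the suitability of each computing system is expressed as the predicted job response time. Experiments simulated the performance difference between decentralized grid scheduling and individual computing systems: local scheduling on each computing system used conservative backfilling, the grid workload was generated from a model, and scheduling performance was measured by the average response time over a segment of the workload. The results show that hill climbing effectively improves the scheduling of single-site jobs even when job submission points are unevenly distributed and runtime estimates are inaccurate.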
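The search can be sketched as steepest-descent hill climbing over the neighbor graph of grid schedulers: starting at the scheduler that received the job, move to the neighbor with the lowest predicted response time while that strictly improves on the current one. The neighbor graph and the response-time predictions are hypothetical inputs, and like any hill climber the search can stop at a local optimum.

```python
def hill_climb(start, neighbors, predicted_response):
    """Return the scheduler (computing system) where the search stops.

    neighbors: {scheduler: [adjacent schedulers]}
    predicted_response: callable giving the predicted job response time.
    """
    current = start
    while True:
        candidates = neighbors.get(current, [])
        best = min(candidates, key=predicted_response, default=None)
        if best is None or predicted_response(best) >= predicted_response(current):
            return current  # no neighbor is strictly better: local optimum
        current = best
```

On the chain s1-s2-s3-s4 with predicted response times 50, 30, 40, and 10, a search starting at s1 stops at s2 even though s4 is better, because s2's only other neighbor, s3, is worse; a search starting at s3 reaches s4.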

14.
The growing complexity and size of High Performance Computing systems (HPCs) lead to frequent job failures, which may cause significant performance degradation. Providing high-performance and reliable computing services requires an in-depth understanding of the characteristics of HPC job failures. In this paper, we present an empirical study of job failures in 10 public workload data sets collected from 8 large-scale HPCs around the world. Multiple analysis methods are applied to provide a comprehensive and in-depth understanding of job failures. To facilitate the design, testing, and management of HPCs, we study job failures along four dimensions: proportion of the workload and of resource consumption, submission inter-arrival time, locality, and runtime. Our results show that job failure rates are significant in most HPCs and that, on average, a failed job consumes more computational resources than a successful job. We also observe that the submission inter-arrival time of failed jobs is better fit by Generalized Pareto and Lognormal distributions, and that the probability of failed job submission follows a “V” shape: it decreases during the first 100 seconds after the submission of the last failed job and increases afterward. The majority of job failures come from a small number of users and applications, and users, rather than applications, are the primary factor related to job failures. We find evidence that the lifetime accuracy of failed jobs (runtime / requested time) always follows the “bathtub curve”. Moreover, job failures exhibit strong locality properties that can support the prediction of both the occurrence and the runtime of failed jobs. Most of these findings are new contributions to the research community, and some reveal important properties of job failures that were previously misunderstood or poorly understood.
The wide range of studies in this paper can directly facilitate fault tolerance, scheduling, workload modeling, and related work in HPCs, leading to better system utility at lower cost.

15.
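A Dependable Grid Job Scheduling Mechanism   Times cited: 1 (self: 1, others: 0)
Tao Yongcai, Shi Lei (陶永才 石磊). Journal of Computer Applications (计算机应用), 2010, 30(8): 2066-2069
To cope with the dynamic nature of grid environments, a dependable grid job scheduling mechanism (DGJS) is proposed. Based on job completion deadlines, DGJS classifies jobs into high-QoS, low-QoS, and no-QoS levels, each with a different scheduling priority. Based on resource availability prediction, DGJS uses a reliability-cost-based job scheduling policy that places jobs on highly reliable resource nodes whenever possible. In addition, DGJS applies different fault-tolerance strategies to jobs at different QoS levels, saving grid resources while still guaranteeing fault tolerance. Experiments show that in a dynamic grid environment, DGJS increases the job success rate and reduces job completion time compared with traditional grid job scheduling algorithms.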
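The mechanism's three ingredients can be sketched with hypothetical rules that are not taken from the paper: a job's QoS level comes from how tight its deadline is relative to its estimated run time, each level maps to a fault-tolerance strategy, and the reliability cost of a node is taken to be runtime / reliability, the expected time when each attempt independently succeeds with probability `reliability` and a failed attempt costs a full run. The tightness factor, the level-to-strategy mapping, and all names are assumptions.

```python
# Hypothetical mapping from QoS level to fault-tolerance strategy.
FAULT_TOLERANCE = {"high": "replication", "low": "checkpointing", "none": "resubmission"}


def qos_level(deadline, runtime, now=0.0, tight=1.5):
    """No deadline -> 'none'; little slack (time left <= tight * runtime) -> 'high';
    otherwise 'low'."""
    if deadline is None:
        return "none"
    return "high" if deadline - now <= tight * runtime else "low"


def pick_node(nodes, work, deadline=None, now=0.0):
    """nodes: {name: (speed, reliability)} with 0 < reliability <= 1.

    Chooses the node with the lowest reliability cost, runtime / reliability,
    skipping nodes that cannot meet the deadline.
    """
    best, best_cost = None, None
    for name, (speed, rel) in nodes.items():
        runtime = work / speed
        if deadline is not None and now + runtime > deadline:
            continue
        cost = runtime / rel
        if best_cost is None or cost < best_cost:
            best, best_cost = name, cost
    return best
```

For 80 units of work, a node with speed 8 and reliability 0.95 (cost about 10.5) beats a faster node with speed 10 and reliability 0.6 (cost about 13.3), unless a deadline of 9 rules the slower node out.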

16.
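A Multi-Objective Evolutionary Algorithm for Grid Job Scheduling with Multiple QoS Constraints   Times cited: 14 (self: 2, others: 12)
For the multi-QoS-constrained grid job scheduling problem in grid computing, with independent jobs as the object of study, the problem is reduced to a multi-objective combinatorial optimization problem. Drawing on an in-depth analysis of multi-objective optimization theory and its evolutionary algorithms, combined with the inherent characteristics of grid job scheduling, a multi-objective evolutionary algorithm for the multi-QoS-constrained grid job scheduling problem is proposed. The algorithm computes the non-dominated solution set over the utility-function indicators of multiple QoS dimensions, attempting to address multi-objective coordination among grid entities, such as grid users and resource managers, across multiple administrative domains. Simulation results show that, on user-level QoS indicators (QoS utility in the time, reliability, and security dimensions) and system-level indicators such as the number of dropped jobs, the algorithm achieves better overall performance than comparable algorithms such as QoS-Min-min and QoS-Sufferage.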
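The core of computing a non-dominated solution set is the Pareto-dominance test. The sketch below filters candidate schedules scored by per-dimension QoS utilities (time, reliability, security; higher is better). It shows only the non-dominated filtering that multi-objective evolutionary algorithms apply to each population, not the paper's full algorithm, and the scores are made up.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective and strictly better in
    at least one (all objectives are utilities to be maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(solutions):
    """solutions: {schedule_id: (time_utility, reliability_utility, security_utility)}.

    Returns the ids of schedules not dominated by any other schedule.
    """
    return [s for s, u in solutions.items()
            if not any(dominates(v, u) for t, v in solutions.items() if t != s)]
```

Given s1 = (0.9, 0.5, 0.5), s2 = (0.6, 0.8, 0.7), s3 = (0.5, 0.4, 0.4), and s4 = (0.9, 0.5, 0.6), the non-dominated set is s2 and s4: s4 beats s1 on security while matching it elsewhere, and s2 beats s3 on every dimension.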

17.
Grid computing is a widely adopted paradigm for federating geographically distributed data centers. Due to their size and complexity, grid systems are often affected by failures that may hinder the correct and timely execution of jobs, causing a non-negligible waste of computing resources. Despite the relevance of the problem, state-of-the-art management solutions for grid systems usually neglect the identification and handling of failures at runtime. Among the primary goals to be considered, we argue for novel approaches capable of achieving scalable integration with efficient monitoring solutions and of fitting large, geographically distributed systems, where dynamic and configurable trade-offs between overhead and targeted granularity are necessary. This paper proposes GAMESH, a Grid Architecture for scalable Monitoring and Enhanced dependable job ScHeduling. GAMESH is conceived as a completely distributed and highly efficient management infrastructure, concentrating on two crucial aspects of large-scale, multi-domain grid environments: (i) the scalable dissemination of monitoring data and (ii) the troubleshooting of job execution failures. GAMESH has been implemented and tested in a real deployment encompassing geographically distributed data centers across Europe. Experimental results show that GAMESH (i) enables the collection of measurements of both computing resources and task scheduling conditions at geographically sparse sites, while imposing a limited overhead on the entire infrastructure, and (ii) provides a failure-aware scheduler able to improve overall system performance, even in the presence of failures, by coordinating local job schedulers across multiple domains.

18.
In grid computing, resource management and fault tolerance services are important issues. The availability of the resources selected for job execution is a primary factor that determines computing performance. In this paper, we propose a resource manager for optimal resource selection. Our resource manager uses a genetic algorithm to automatically select, from the candidate resources, the set of resources that achieves optimal performance. The probability of failure is typically higher in grid computing than in traditional parallel computing, and resource failures can be fatal to job execution. Therefore, a fault tolerance service is essential in computational grids. Moreover, grid services are often expected to meet some minimum level of Quality of Service (QoS) for desirable operation. To address this issue, we also propose a fault tolerance service that satisfies QoS requirements. We extend the conventional notion of failures in distributed systems so that the fault tolerance service can deal with various types of resource failures, including process failures, processor failures, and network failures. We also design and implement a fault detector and a fault manager. The implementation and simulation results indicate that our approaches are promising in that (1) the resource manager finds the optimal set of resources that guarantees efficient job execution, (2) the fault detector detects the occurrence of resource failures, and (3) the fault manager guarantees that submitted jobs complete and that job execution performance improves through job migration even when some failures occur.

19.
A computational grid provides a widely distributed platform for high-end, compute-intensive applications. Grid scheduling assigns submitted jobs to the nodes of the grid so that some characteristic parameter is optimized. The availability of the computational nodes is one such important parameter; it measures the probability that a node is available for job execution. This paper addresses the availability of grid computational nodes for job execution and proposes a model to maximize it. The task scheduling problem in grids is nondeterministic polynomial-time hard (NP-hard), so metaheuristic techniques are often applied to solve it. Genetic algorithms, a metaheuristic technique based on evolutionary computation, have been used to solve such complex optimization problems. This work proposes a technique for the grid scheduling problem that uses a genetic algorithm with the objective of maximizing availability. A simulation experiment was conducted to evaluate the performance of the proposed algorithm, and the results show the effectiveness of the model. A comparative study was also performed. Copyright © 2014 John Wiley & Sons, Ltd.

20.
Adaptive Execution of Jobs in Computational Grid Environment   Times cited: 1 (self: 0, others: 1)
In a computational grid, jobs must adapt to a dynamically changing, heterogeneous environment with the objective of maintaining quality of service. To enable adaptive execution of multiple jobs running concurrently in a computational grid, we propose an integrated performance-based resource management framework supported by a multi-agent system (MAS). The multi-agent system initially allocates jobs to different resource providers based on a resource selection algorithm. Later, at runtime, if the performance of any job degrades or its quality of service cannot be maintained for some reason (resource failure or overloading), the multi-agent system helps the job adapt to the system. This paper focuses on the part of our framework that supports adaptive execution, which is achieved through reallocation and local tuning of jobs. Both mobile and static agents are employed for this purpose. The paper summarizes the design and implementation and demonstrates the efficiency of the framework through experiments on a local grid test bed.

