Similar literature
 20 similar articles found (search time: 140 ms)
1.
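A dependable grid job scheduling mechanism   Total citations: 1, self-citations: 1, citations by others: 0
Tao Yongcai, Shi Lei. Journal of Computer Applications (《计算机应用》), 2010, 30(8): 2066-2069
To address the dynamic nature of grid environments, a dependable grid job scheduling mechanism (DGJS) is proposed. DGJS classifies jobs by completion deadline into high-QoS, low-QoS, and no-QoS classes, and each class has a different scheduling priority. Based on resource availability prediction, DGJS applies a reliability-cost-based scheduling policy that places jobs on highly reliable resource nodes wherever possible. In addition, DGJS applies a different fault-tolerance policy to each QoS class, which provides fault tolerance while conserving grid resources. Experiments show that, in a dynamic grid environment, DGJS achieves a higher job success rate and shorter job completion times than traditional grid job scheduling algorithms.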
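
The abstract above names two ideas: QoS classes that set scheduling priority, and a preference for reliable nodes. The following is a minimal sketch of how they might combine, not the DGJS algorithm itself. The names `QOS_RANK`, `Job`, `Node`, `predicted_availability`, and `schedule` are hypothetical, and the availability values are assumed to come from some external predictor.

```python
from dataclasses import dataclass

# Hypothetical ranking of the three QoS classes: lower rank is scheduled first.
QOS_RANK = {"high": 0, "low": 1, "none": 2}


@dataclass
class Job:
    name: str
    qos: str  # one of "high", "low", "none"


@dataclass
class Node:
    name: str
    predicted_availability: float  # assumed predictor output in [0, 1]
    busy: bool = False


def schedule(jobs, nodes):
    """Assign jobs in QoS order, each to the free node predicted most available."""
    plan = {}
    for job in sorted(jobs, key=lambda j: QOS_RANK[j.qos]):
        free = [n for n in nodes if not n.busy]
        if not free:
            break  # no capacity left; remaining jobs stay unscheduled
        best = max(free, key=lambda n: n.predicted_availability)
        best.busy = True
        plan[job.name] = best.name
    return plan
```

In this sketch the highest-QoS job receives the most reliable free node. A reliability cost in the paper's sense would presumably also weigh execution time, which the abstract does not specify.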

2.
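Resources in grid computing are dynamic and heterogeneous, so conventional static job scheduling methods are ill-suited to grid environments. For one class of parallel computations on the grid, efficient execution depends on effectively matching grid resources (CPU, network bandwidth, and so on) to jobs. This paper proposes a strategy that schedules jobs according to resource prediction results. It first describes a research result on grid host load prediction, the IAR model, and proposes a tool for predicting network bandwidth, the network performance plane. Using the resource predictions, it builds a feedback job scheduling model and runs experiments on a class of time-balancing-based jobs. The results show that, compared with a number of other methods, the model achieves shorter execution times and better stability.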

3.
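A grid is a complex distributed computing system, so studying the distributed deployment and performance analysis of the algorithms by which grid services schedule grid jobs is of considerable importance. A state-space model of the grid service scheduling system considers job queues with different input and output rates. A clearing-type scheduling policy and a service scheduling algorithm are proposed. On this basis, the distributed deployment problem is analyzed, the system's QoS performance metrics are computed, and the relationship of steady-state throughput and steady-state response time to the load factor is identified.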

4.
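Trust relationships are an important factor in grid job scheduling and one of the key technologies affecting the effectiveness and performance of grid computing. This paper introduces a trust mechanism into job scheduling for rendering grids and builds a trust-based job scheduling model for the rendering grid environment. For the scheduling policy, it improves the basic genetic algorithm and proposes a trust-based genetic algorithm. Experimental results show that the algorithm raises the task completion rate and the average trust benefit, making it an effective job scheduling method for rendering grids.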

5.
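Ding Minmin, Jia Yongku. Computer Engineering (《计算机工程》), 2010, 36(21): 286-287, 290
Based on the characteristics of job scheduling in grid computing and on the LSF system from Platform, this paper proposes a plug-in mechanism for managing job scheduling policies in grid systems. Because plug-ins are plug-and-play and easy to extend and implement, the scheduling modules of the grid system are managed as plug-ins, which improves overall scheduling performance and gives third-party software a clean interface.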

6.
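HowU grid adaptive scheduling model   Total citations: 7, self-citations: 0, citations by others: 7
With the rapid growth of the Internet, the computational grid, a high-performance computing platform built on heterogeneous distributed networks, has become a new computing paradigm. This paper describes the implementation techniques of the Globus-based HowU grid, in which grid jobs are submitted through a resource request broker. To use grid resources effectively, it takes "parameter study" tasks as an example and proposes an adaptive grid task scheduling model that fully accounts for the real-time fault-tolerance requirements of tasks. Finally, it compares the performance of this model with the Globus co-allocation strategy and the GRADS allocation framework.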

7.
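Comparison of several fault-tolerance strategies in grid environments   Total citations: 2, self-citations: 0, citations by others: 2
Using the SimGrid simulation environment with resource failures configured in advance, job scheduling is simulated under four fault-tolerance strategies: simple retry, active backup, passive backup, and checkpointing. The mean execution time and the final completion time of batch jobs are compared, and the effect of each strategy on these two times is analyzed, with the aim of guiding the design of fault-tolerance schemes for grid environments.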

8.
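In commercial grid and cloud computing environments, a job has parameters such as arrival time, computation amount, budget, and deadline, where the budget is a function of time. Accurately distinguishing the importance and urgency of jobs is a key problem for a job scheduling system. This paper uses all four parameters to define job priority and proposes a grid job scheduling algorithm based on value density and relative deadline. Scheduling of soft (weak) real-time and hard (strong) real-time grid jobs is simulated separately. The simulation results show that the proposed algorithm outperforms all the compared algorithms in both cases, with a larger advantage for hard real-time jobs.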
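
The abstract does not give the priority formula, so the sketch below only illustrates one plausible way to combine the four parameters. It assumes value density is budget divided by computation amount, relative deadline is deadline minus arrival time, and priority is their ratio. It also treats the budget as a constant, although the abstract says it is a function of time. All names (`Job`, `value_density`, `relative_deadline`, `priority`, `pick_next`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    arrival: float   # arrival time
    length: float    # computation amount
    budget: float    # value paid on completion (assumed constant here)
    deadline: float  # absolute deadline


def value_density(job):
    """Value earned per unit of computation (assumed definition)."""
    return job.budget / job.length


def relative_deadline(job):
    """Time allowed between arrival and deadline (assumed definition)."""
    if job.deadline <= job.arrival:
        raise ValueError("deadline must be later than arrival")
    return job.deadline - job.arrival


def priority(job):
    """Higher for jobs that are both valuable (importance) and tight (urgency)."""
    return value_density(job) / relative_deadline(job)


def pick_next(jobs):
    """Return the job with the highest assumed priority."""
    return max(jobs, key=priority)
```

Dividing by the relative deadline is only one way to express urgency; a weighted sum of the two terms would be another, and the paper may use either or neither.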

9.
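Research on a computational grid job scheduling algorithm based on a performance quantification matrix   Total citations: 1, self-citations: 0, citations by others: 1
The job scheduling algorithm is key to improving the operating efficiency of a computational grid system, and making the scheduling policy more comprehensive by combining multiple factors is a challenging problem. By establishing a performance quantification matrix of grid resources, this paper constructs a job scheduling model and gives a concrete job scheduling algorithm based on it. Performance analysis and simulation experiments show that the algorithm considerably improves running time and resource usage, adapts well to the dynamic and scalable nature of grid systems, and raises job scheduling efficiency.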

10.
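Grid computing is a wide-area computing technology that has developed rapidly in recent years. Its goal is to integrate the Internet into a single, very large-scale computer system so that computing, storage, information, and knowledge resources can be fully shared. Resource management and scheduling are a core part of grid computing. This paper introduces grid resource management, analyzes three classes of resource scheduling strategies, proposes a trust-degree-based resource scheduling strategy, and describes its algorithmic implementation in detail. Quality of service is integrated into the resource scheduling strategy.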

11.
A grid is a distributed computational and storage environment often composed of heterogeneous, autonomously managed subsystems. As a result, varying resource availability becomes commonplace, often resulting in loss and delay of executing jobs. To ensure good grid performance, fault tolerance should be taken into account. Commonly utilized techniques for providing fault tolerance in distributed systems are periodic job checkpointing and replication. While very robust, both techniques can delay job execution if inappropriate checkpointing intervals and replica numbers are chosen. This paper introduces several heuristics that dynamically adapt the above-mentioned parameters based on information on grid status to provide high job throughput in the presence of failure while reducing the system overhead. Furthermore, a novel fault-tolerant algorithm combining checkpointing and replication is presented. The proposed methods are evaluated in a newly developed grid simulation environment, Dynamic Scheduling in Distributed Environments (DSiDE), which allows for easy modeling of dynamic system and job behavior. Simulations are run employing workload and system parameters derived from logs that were collected from several large-scale parallel production systems. Experiments have shown that adaptive approaches can considerably improve system performance, while the preference for one of the solutions depends on particular system characteristics, such as load, job submission patterns, and failure frequency.

12.
In grid computing, resource management and fault tolerance services are important issues. The availability of the selected resources for job execution is a primary factor that determines the computing performance. In this paper, we propose a resource manager for optimal resource selection. Our resource manager uses a genetic algorithm to automatically select, from the candidate resources, the set of resources that achieves optimal performance. Typically, the probability of a failure is higher in grid computing than in traditional parallel computing, and the failure of resources affects job execution fatally. Therefore, a fault tolerance service is essential in computational grids. Grid services are also often expected to meet some minimum levels of Quality of Service (QoS) for a desirable operation. To address this issue, we also propose a fault tolerance service that satisfies QoS requirements. We extend the definition of failures from the conventional notion of failures in distributed systems in order to provide a fault tolerance service that deals with various types of resource failures, which include process failures, processor failures, and network failures. We also design and implement a fault detector and a fault manager. The implementation and simulation results indicate that our approaches are promising in that (1) the resource manager finds the optimal set of resources that guarantees efficient job execution, (2) the fault detector detects the occurrence of resource failures, and (3) the fault manager guarantees that the submitted jobs complete and the performance of job execution is improved due to job migration even if some failures occur.

13.
The grid provides an integrated computer platform composed of differentiated and distributed systems. These resources are dynamic and heterogeneous. In this paper, a novel fault-tolerant grid-scheduling model is presented based on Stochastic Petri Nets (SPN) to account for the heterogeneity and dynamism of the grid system. Also, a new grid-scheduling strategy, the dependable strategy for the shortest expected accomplishing time (DSEAT), is put forward, in which the dependability factor is introduced in the task-dispatching strategy. Finally, the performance of the scheduling strategy based on the fault-tolerant grid-scheduling model is analyzed with a software package named SPNP. The numerical results show that dynamic resources will increase the response time for all classes of tasks to differing degrees. Compared with the shortest expected accomplishing time (SEAT) strategy, the DSEAT strategy can reduce the negative effects of dynamic and autonomic resources to some extent so as to guarantee a high quality of service (QoS).

14.
Adaptive checkpointing strategy to tolerate faults in economy based grid   Total citations: 3, self-citations: 2, citations by others: 1
In this paper, we develop a fault tolerant job scheduling strategy in order to tolerate faults gracefully in an economy based grid environment. We propose a novel adaptive task checkpointing based fault tolerant job scheduling strategy for an economy based grid. The proposed strategy maintains a fault index of grid resources. It dynamically updates the fault index based on successful or unsuccessful completion of an assigned task. Whenever a grid resource broker has tasks to schedule on grid resources, it makes use of the fault index from the fault tolerant schedule manager in addition to using a time optimization heuristic. While scheduling a grid job on a grid resource, the resource broker uses the fault index to apply a different intensity of task checkpointing (inserting checkpoints in a task at different intervals). To simulate and evaluate the performance of the proposed strategy, this paper enhances the GridSim Toolkit-4.0 to exhibit fault tolerance related behavior. We also compare the "checkpointing fault tolerant job scheduling strategy" with the well-known time optimization heuristic in an economy based grid environment. From the measured results, we conclude that even in the presence of faults, the proposed strategy effectively schedules grid jobs, tolerating faults gracefully, and executes more jobs successfully within the specified deadline and allotted budget. It also improves the overall execution time and minimizes the execution cost of grid jobs.
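
As a rough illustration of the fault-index idea described above, the sketch below keeps a per-resource counter that rises on failure and falls on success, and shortens the checkpoint interval as the index grows. The update rule (plus or minus one, floored at zero) and the interval formula are assumptions made for this example; the abstract does not specify either. `FaultIndex` and `checkpoint_interval` are hypothetical names.

```python
class FaultIndex:
    """Per-resource fault history; a higher value means a more fault-prone resource."""

    def __init__(self):
        self._index = {}

    def record(self, resource, succeeded):
        # Assumed rule: +1 on failure, -1 on success, never below zero.
        current = self._index.get(resource, 0)
        if succeeded:
            self._index[resource] = max(0, current - 1)
        else:
            self._index[resource] = current + 1

    def get(self, resource):
        # Resources with no history are treated as fault-free.
        return self._index.get(resource, 0)


def checkpoint_interval(base_interval, fault_index):
    """Checkpoint more often (shorter interval) on more fault-prone resources."""
    return base_interval / (1 + fault_index)
```

A broker built along these lines would call `record` after each task finishes and pass `get(resource)` to `checkpoint_interval` when dispatching the next task to that resource.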

15.
Computational grids are composed of heterogeneous, autonomously managed resources. In such an environment, any resource can join or leave the grid at any time. This makes the grid infrastructure inherently unreliable, resulting in delay and failure of executing jobs. Thus, fault tolerance becomes a vital aspect of the grid for realizing reliability, availability and quality of service. The most common technique for achieving fault tolerance in High Performance Computing is rollback recovery. It relies on the availability of checkpoints and the stability of storage media, so the checkpoints are replicated on storage media. This increases the job execution time if replication is not done properly. Furthermore, dedicating powerful resources solely to checkpoint storage results in loss of the computation power of these resources. It may also result in bottlenecks when the load on the network is high. To address the problem, this paper proposes a checkpoint replication based fault tolerance strategy named Reliable Checkpoint Storage Strategy (RCSS). In RCSS, the checkpoints are replicated on all checkpoint servers in the grid in a distributed manner. This decreases the checkpoint replication time and in turn improves the overall job execution time. Additionally, if a resource fails during execution of a job, RCSS restarts the job from its last valid checkpoint taken from any checkpoint server in the grid. Furthermore, to increase grid performance, CPU cycles of checkpoint servers are also utilized during high load on the network. To evaluate the performance of RCSS, simulations are carried out using GridSim. The simulation results show that RCSS improves intra-cluster checkpoint wave completion time by 12.5 % with a varying number of checkpoint servers. RCSS also reduces checkpoint wave completion time by 50 % with a varying number of clusters. Additionally, RCSS reduces replication time within a cluster by 39.5 %.

16.
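An implementation method for distributed parallel computing in a grid computing environment   Total citations: 2, self-citations: 1, citations by others: 2
Grid computing offers new ways to handle many complex problems. This paper uses the Globus Toolkit to build a grid computing environment and extends it into a supporting environment for distributed parallel computing, providing a new way to implement such computing. Key issues such as task distribution, system communication, and fault tolerance are discussed. Finally, an example of distributed parallel computing in the grid environment is given, with satisfactory experimental results.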

17.
In this paper, the problem of fault tolerance in grid computing is addressed and a novel adaptive task replication based fault tolerant job scheduling strategy for an economy driven grid is proposed. The proposed strategy maintains a fault history of the resources, termed the resource fault index. The fault index entry for a resource is updated based on successful completion or failure of an assigned task by the grid resource. The Grid Resource Broker then replicates the task (submitting the same task to different backup resources) with different intensity, based on the vulnerability of the resource to faults as suggested by the resource fault index. Consequently, in case of a possible fault at a resource, the results of the replicated task(s) on other backup resource(s) can be used. Hence, user job(s) can be completed within the specified deadline and assigned budget, even in the event of faults at the grid resource(s). Through extensive simulations, the performance of the proposed strategy is evaluated and compared with the Time Optimization Strategy and the Checkpointing based Strategy in an economy driven grid environment. The experimental results demonstrate that, in the presence of faults, the proposed fault tolerant strategy improves the number of tasks completed with varied deadline and fixed budget as well as the number of tasks completed with varied budget and fixed deadline. Additionally, the proposed strategy used a smaller percentage of deadline time compared with both the Time Optimization Strategy and the Checkpointing based Strategy. Although the proposed strategy spends a greater percentage of the budget than the Time Optimization Strategy and the Checkpointing based Strategy, this is acceptable for a time optimization strategy, where the main objective is to maximize the tasks completed within a given deadline. It can be concluded from the experiments that the proposed strategy shows improvement in satisfying the user QoS requirements. It can effectively schedule tasks and tolerate faults gracefully even in the presence of failures, but the costs are slightly higher in terms of budget consumption. Hence, the proposed fault tolerant strategy helps in sustaining the user's faith in the grid, by enabling the grid to deliver reliable and consistent performance in the presence of faults.
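
The replication step described above can be sketched as follows: the more fault-prone the chosen resource, the more copies of the task are submitted, and backups go to the least fault-prone alternatives. The mapping from fault index to copy count and the `max_replicas` cap are assumptions made for this example, and `choose_replica_hosts` is a hypothetical name rather than the broker's actual interface.

```python
def choose_replica_hosts(fault_index, primary, max_replicas=3):
    """Return the primary resource plus backup resources for one task.

    fault_index maps resource name -> non-negative fault index.
    Assumed rule: total copies = 1 + fault index of the primary,
    capped at max_replicas.
    """
    copies = min(1 + fault_index[primary], max_replicas)
    # Prefer the backups with the best (lowest) fault history.
    backups = sorted(
        (r for r in fault_index if r != primary),
        key=lambda r: fault_index[r],
    )
    return [primary] + backups[: copies - 1]
```

With a fault-free primary this sketch submits a single copy, which keeps budget consumption low; extra copies are paid for only where the fault history suggests they may be needed. That is consistent with the cost trade-off the abstract reports.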

18.
Fault-tolerant grid architecture and practice   Total citations: 10, self-citations: 0, citations by others: 10
Grid computing has emerged as an effective technology to couple geographically distributed resources and solve large-scale computational problems in wide area networks. Fault tolerance is a significant and complex issue in grid computing systems. Various techniques have been investigated to detect and correct faults in distributed computing systems. Unreliable fault detection is one of the most effective techniques. Globus, as a grid middleware, manages resources in a wide area network. The Globus fault detection service uses the well-known techniques based on unreliable fault detectors to detect and report component failures. However, more powerful techniques are required to detect and correct both system-level and application-level faults in a grid system, and a convenient toolkit is also needed to maintain the consistency in the grid. A fault-tolerant grid platform (FTGP) based on an unreliable fault detector and the Globus fault detection service is presented in this paper. The platform offers effective strategies in three aspects: grid key components, user tasks, and high-level applications.

19.
A hybrid fault tolerance technique in grid computing system   Total citations: 1, self-citations: 0, citations by others: 1
In order to achieve a high level of reliability and availability, the grid infrastructure should be foolproof and fault tolerant. Fault tolerance plays a key role in ensuring the availability and reliability of a grid system. Since the failure of resources affects job execution fatally, a fault tolerance service is essential to satisfy QoS requirements in grid computing.

20.
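An adaptive mechanism for organizing and discovering grid computing resources   Total citations: 4, self-citations: 0, citations by others: 4
Resource discovery is an important research problem in grid computing. Computing resources are the basic resources that support grid applications, so the mechanism for organizing and discovering them is especially important. However, existing techniques and methods remain quite limited in efficiency, scalability, adaptive dynamic evolution, and support for query types. Based on an in-depth analysis of how grid applications demand computing resources, this paper introduces the concept of a primary attribute of computing resources, organizes computing resources by category using balanced binary search trees, and proposes a resource organization and discovery mechanism based on the resource category tree (RCT). First, it discusses the RCT-based organization of computing resources, including the basic concepts and principles of the RCT, a self-organization mechanism that supports dynamic joining and leaving of resources and dynamic changes in resource state, a load-aware adaptive evolution mechanism, and a fault-tolerance mechanism based on backup nodes. Then, under the RCT-based organization, it designs search algorithms supporting four query types and analyzes their complexity. Finally, it evaluates the performance of the RCT through several sets of simulation experiments.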
