Similar literature
 20 similar documents found (search time: 15 ms)
1.
A hybrid fault tolerance technique in grid computing system   Total citations: 1; self-citations: 0; citations by others: 1
To achieve a high level of reliability and availability, a grid infrastructure should be foolproof and fault tolerant. Fault tolerance plays a key role in ensuring the availability and reliability of a grid system. Because resource failures can be fatal to job execution, a fault tolerance service is essential for satisfying QoS requirements in grid computing.

2.
Fault tolerance is the ability of a system to perform its function correctly even in the presence of faults. Fault tolerance techniques (FTTs) are therefore critical for the efficient utilization of expensive resources in high performance grid computing systems, and are an important component of a grid workflow management system. This paper presents a performance evaluation of the most commonly used FTTs in grid computing systems. In this study, we considered different system-centric parameters, such as throughput, turnaround time, waiting time and network delay, for the evaluation of these FTTs. For a comprehensive evaluation, we set up various conditions in which we vary the average percentage of faults in the system, along with different workloads, in order to observe the behavior of the FTTs under these conditions. The empirical evaluation shows that workflow-level alternative task techniques outperform task-level checkpointing techniques. This comparative study will help grid computing researchers understand the behavior and performance of different FTTs in detail.

3.
The resource management system is the central component of distributed network computing systems. There have been many projects focused on network computing that have designed and implemented resource management systems with a variety of architectures and services. In this paper, an abstract model and a comprehensive taxonomy for describing resource management architectures are developed. The taxonomy is used to identify approaches followed in the implementation of existing resource management systems for very large‐scale network computing systems known as Grids. The taxonomy and the survey results are used to identify architectural approaches and issues that have not been fully explored in the research. Copyright © 2001 John Wiley & Sons, Ltd.

4.
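This paper proposes and describes LF-CRSE (low latency and fault tolerance CRSE), a computing resource sharing environment that supports low-latency communication and fault tolerance. LF-CRSE introduces the notion of node functional roles and consists of client function nodes, task servers, worker service providers, and worker nodes, which together form a scalable distributed network architecture. Task caching, task prefetching, and task-server-side computation keep the latency overhead of communication low. At the application level, task partitioning in the branch-and-bound pattern lets LF-CRSE support flexible programming models in both the master-worker and divide-and-conquer styles. Worker-side heartbeat messages and subtask-oriented fault tolerance ensure the correctness of LF-CRSE. A distributed traveling salesman problem with data dependencies was chosen for testing. The experimental results show that the speedup of LF-CRSE rises steadily as workers are added, and that it also performs well in terms of low-latency communication and fault tolerance.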
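
The abstract for LF-CRSE names worker heartbeats and subtask-oriented fault tolerance but gives no protocol details. The following is a minimal sketch of that general idea only, not the LF-CRSE implementation: the class `TaskServer`, its method names, and the fixed timeout are all hypothetical, and time is passed in explicitly so the example is deterministic.

```python
class TaskServer:
    """Hypothetical task server that requeues subtasks of silent workers."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds of silence before a worker is presumed dead
        self.last_seen = {}      # worker id -> time of last heartbeat
        self.assigned = {}       # worker id -> subtasks currently in progress
        self.pending = []        # subtasks waiting to be (re)assigned

    def heartbeat(self, worker, now):
        """Record that a heartbeat message arrived from `worker` at time `now`."""
        self.last_seen[worker] = now

    def assign(self, worker, subtask, now):
        """Hand a subtask to a worker; the assignment also counts as contact."""
        self.heartbeat(worker, now)
        self.assigned.setdefault(worker, []).append(subtask)

    def reap(self, now):
        """Requeue the subtasks of every worker whose heartbeat has timed out."""
        dead = [w for w, t in self.last_seen.items() if now - t > self.timeout]
        for w in dead:
            self.pending.extend(self.assigned.pop(w, []))
            del self.last_seen[w]
        return dead


server = TaskServer(timeout=3)
server.assign("w1", "subtask-1", now=0)
server.assign("w2", "subtask-2", now=0)
server.heartbeat("w1", now=4)   # w1 keeps reporting, w2 goes silent
lost = server.reap(now=5)
```

Only the lost subtask is requeued, rather than the whole job, which is the sense in which the recovery is subtask-oriented in this sketch.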

5.
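Research on a dynamic adaptive grid resource management model   Total citations: 2; self-citations: 1; citations by others: 2
Because resources in grid computing are distributed, autonomous, heterogeneous, and dynamic, managing computational grid resources efficiently is a challenging problem. This paper proposes introducing mobile agent technology into grid resource management. This provides an adaptive wide-area resource environment in which a grid computing system can adapt automatically to changes in its environment, and points out a direction for research on dynamic adaptivity in grid resource management.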

6.
Due to the emergence of grid computing over the Internet, there is a need for a hybrid load balancing algorithm that takes into account the various characteristics of the grid computing environment. Hence, this research proposes a fault-tolerant hybrid load balancing strategy, AlgHybrid_LB, which takes into account grid architecture, computer heterogeneity, communication delay, network bandwidth, resource availability, resource unpredictability and job characteristics. AlgHybrid_LB combines the strong points of neighbor-based and cluster-based load balancing algorithms. Our main objective is to arrive at job assignments that achieve minimum response time and optimal computing node utilization. Major achievements include the low complexity of the proposed approach and a drastic reduction in the number of additional communications induced by load balancing. A simulation of the proposed approach is conducted using the Grid Simulation Toolkit (GridSim). Experimental results show that the proposed algorithm performs very well in a large grid environment.

7.
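The frequent occurrence of faults has become one of the main obstacles to the robust development and large-scale application of grids, which makes research on fault tolerance in grid systems especially important. Based on the characteristics of grid computing, this paper identifies the special fault tolerance requirements of grid environments. Taking users' quality-of-service requirements into account, it establishes a dynamic fault tolerance service architecture comprising grid fault detection and grid fault management, and describes the organization of the fault detection and fault management services and the specific function of each component module. Finally, it presents a complete implementation process for the fault tolerance service.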

8.
Computational grids that couple geographically distributed resources such as PCs, workstations, clusters, and scientific instruments have emerged as a next generation computing platform for solving large-scale problems in science, engineering, and commerce. However, application development, resource management, and scheduling in these environments continue to be a complex undertaking. In this article, we discuss our efforts in developing a resource management system for scheduling computations on resources distributed across the world with varying quality of service (QoS). Our service-oriented grid computing system called Nimrod-G manages all operations associated with remote execution, including resource discovery, trading, and scheduling based on economic principles and a user-defined QoS requirement. The Nimrod-G resource broker is implemented by leveraging existing technologies such as Globus, and provides new services that are essential for constructing industrial-strength grids. We present the results of experiments using the Nimrod-G resource broker for scheduling parametric computations on the World Wide Grid (WWG) resources that span five continents.

9.
In grid computing, grid users who submit applications and resource providers who provide resources have different motivations when they join the grid. Application-centric scheduling aims to optimize the performance of an individual application. Resource-centric scheduling aims to optimize the resource utilization of a resource provider. Because both grid users and resource providers are autonomous, the objectives of application-centric and resource-centric scheduling often conflict. The paper proposes a system-centric scheduling approach that jointly optimizes the objectives of both the grid resources and the grid applications. Utility functions are used to express the objectives of grid resources and applications. The system-centric scheduling policy can be formulated as a joint optimization of the utilities of grid applications and grid resources, which combines the benefits of both application-centric and resource-centric scheduling. Simulations are conducted to study the performance of the system-centric scheduling algorithm. The experimental results show that the system-centric scheduling algorithm yields significantly better performance than the application-centric and resource-centric scheduling algorithms.

10.
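To solve the multi-objective task scheduling problem in dynamic, heterogeneous grid environments, a task scheduling model based on the concept of satisfaction degree is proposed. The optimization objectives of the model are minimum total task execution cost, minimum execution time, and load balance. Combined with a satisfaction objective function, an immune particle swarm optimization (IPSO) heuristic scheduling algorithm is implemented. Experimental results show that, as the evolution proceeds, IPSO improves significantly over PSO in global optimization ability, search speed, and avoidance of premature convergence.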

11.
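Grid computing provides a mechanism for sharing and accessing large, heterogeneous collections of remote resources such as computers, online instruments, storage space, data, and applications. These resources are identified by attributes. Resource attributes vary in how dynamic they are, from static attributes (such as the operating system version) to highly dynamic ones (such as network bandwidth or CPU load). This paper addresses large-scale, dynamic resource discovery in a P2P architecture. In a decentralized architecture, it evaluates a set of request-forwarding algorithms designed to accommodate different resource compositions (including sharing policies and resource types) and degrees of dynamism. To this end, an experimental platform is built to simulate two application characteristics: (1) resources are distributed across nodes, and nodes differ in the number and frequency of the resources they share; (2) there are multiple request patterns for resources.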

12.
We present a new approach to fault tolerance for High Performance Computing systems. Our approach is based on a careful adaptation of the Algorithm-Based Fault Tolerance technique [K. Huang, J. Abraham, Algorithm-based fault tolerance for matrix operations, IEEE Transactions on Computers (Spec. Issue Reliable & Fault-Tolerant Comp.) 33 (1984) 518–528] to the needs of parallel distributed computation. We obtain a strongly scalable mechanism for fault tolerance. We can also detect and correct errors (bit-flips) on the fly during a computation. To assess the viability of our approach, we have developed a fault-tolerant matrix–matrix multiplication subroutine and we propose some models to predict its running time. Our parallel fault-tolerant matrix–matrix multiplication achieves 1.4 TFLOPS on 484 processors (cluster jacquard.nersc.gov) and returns a correct result even when one process failure has occurred. This represents 65% of the machine peak efficiency and less than 12% overhead with respect to the fastest failure-free implementation. We predict (and have observed) that, as we increase the processor count, the overhead of the fault tolerance drops significantly.
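
The checksum idea behind Algorithm-Based Fault Tolerance for matrix multiplication can be illustrated on a single machine. The sketch below shows only the classic full-checksum scheme for detecting and correcting one corrupted entry. It is not the parallel subroutine the abstract describes, the function names are hypothetical, and integer matrices are assumed so that checksums can be compared exactly (floating-point data would need a tolerance).

```python
def matmul(a, b):
    """Plain matrix product for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def with_column_checksum(a):
    """Append one extra row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]

def with_row_checksum(b):
    """Append to each row of b one extra entry holding that row's sum."""
    return [row + [sum(row)] for row in b]

def locate_and_fix(c_full):
    """Detect and correct a single corrupted data entry of a full-checksum product.

    Returns the (row, column) that was repaired, or None if all checksums agree.
    """
    n, m = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(n) if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        others = sum(c_full[i][:m]) - c_full[i][j]
        c_full[i][j] = c_full[i][m] - others   # rebuild entry from the row checksum
        return (i, j)
    raise ValueError("error pattern is not a single corrupted data entry")


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c_full = matmul(with_column_checksum(a), with_row_checksum(b))
c_full[1][0] ^= 4                      # simulate a bit-flip in one result entry
repaired = locate_and_fix(c_full)
result = [row[:2] for row in c_full[:2]]
```

The product of a column-checksum matrix and a row-checksum matrix carries both checksums, so a single bad entry shows up as exactly one inconsistent row and one inconsistent column, and their intersection locates it.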

13.
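Research on an SLA-based resource management model in manufacturing grid   Total citations: 2; self-citations: 0; citations by others: 2
Shen Bin (沈彬), Liu Lilan (刘丽兰), Yu Tao (俞涛). 《计算机应用》 (Journal of Computer Applications), 2006, 26(2): 512-0514
To ensure that the manufacturing resources and services provided by a manufacturing grid platform meet users' functional and quality requirements, an SLA-based resource management model is established and the resource scheduling algorithm in the model is studied in depth. The model uses the quality of service (QoS) attributes of manufacturing resources as evaluation criteria and the service level agreement (SLA) as the constraint, and provides consumers with quality-assured manufacturing services through resource scheduling methods and policies. Finally, rapid prototyping manufacturing is used as an application to verify the practicality of the model and the feasibility of the resource scheduling algorithm.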

14.
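Problems in microgrids such as resource allocation and optimization, power flow analysis, and economic dispatch can be handled with a hierarchical control structure based on cloud computing. This structure allocates and optimizes resources more economically, and integrates the analysis of historical and real-time data to provide users with more accurate analysis and more flexible services.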

15.
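Taking into account the distance between communicating entities, the available bandwidth, and the costs of communication and resource usage, a mathematical model of abstract distance is proposed. Combined with grid resource and grid application models, a locality-based grid resource scheduling algorithm is designed. When selecting resources, the algorithm first considers resources on the same node, and then selects neighboring nodes by abstract distance. Experiments show that locality-based scheduling improves communication overhead, cost, task completion time, and the task execution success rate.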
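
The abstract does not state the abstract-distance formula, so the sketch below assumes one purely for illustration: a weighted sum of hop count, inverse bandwidth, and cost. The function names, dictionary keys, and weights are hypothetical. Only the selection order (same node first, then smallest abstract distance) comes from the abstract.

```python
def abstract_distance(hops, bandwidth, cost, w_hops=1.0, w_bw=1.0, w_cost=1.0):
    """Assumed illustrative metric: farther, slower, or costlier means larger."""
    return w_hops * hops + w_bw / bandwidth + w_cost * cost

def pick_resource(task_node, resources):
    """Prefer a resource on the task's own node, else the nearest by abstract distance.

    Each resource is a dict with keys: name, node, hops, bandwidth, cost.
    """
    local = [r for r in resources if r["node"] == task_node]
    if local:
        # Among local resources, the tie-break by cost is also an assumption.
        return min(local, key=lambda r: r["cost"])["name"]
    nearest = min(resources, key=lambda r: abstract_distance(
        r["hops"], r["bandwidth"], r["cost"]))
    return nearest["name"]


resources = [
    {"name": "a", "node": "n2", "hops": 1, "bandwidth": 10.0, "cost": 2.0},
    {"name": "b", "node": "n3", "hops": 3, "bandwidth": 100.0, "cost": 1.0},
]
remote_choice = pick_resource("n1", resources)   # no local resource: compare distances
local_choice = pick_resource("n3", resources)    # resource "b" sits on the task's node
```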

16.
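Because grid resources are autonomous, distributed, heterogeneous, and dynamically changing, how to manage and schedule them effectively is a key problem in grid computing that has not yet been satisfactorily solved. An agent-based grid resource management model is proposed, dynamic fuzzy knowledge bases are designed for each type of agent, and a fuzzy Q-learning algorithm based on dynamic fuzzy knowledge is studied. The algorithm meets the needs of intelligent adaptability, scalability, and scheduling optimization in grid resource management well. Simulation experiments verify the effectiveness and efficiency of the model and algorithm.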

17.
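Because existing models cannot obtain the required resources across a wider scope, an improved general abstract architecture model is proposed. Through an interface, users publish resources to the grid resource management system and obtain the resources they need from it, so that they can obtain the required resources across a wider scope. A grid running a prototype system of the model was built, and some algorithm research was carried out on it. The experimental results confirm the feasibility and advantages of the model in practice.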

18.
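In a service-centric grid computing architecture, services and users are given trust-level labels in order to schedule tasks securely across trust domains, and confidence policies become one means of securing grid resource scheduling. Based on a service-oriented architecture, a reference model for service resource exchange negotiation is proposed to handle service requests and responses across trust domains. The paper first lists the relevant concepts, presents the structure of the reference model, and describes the security steps of service exchange negotiation across trust domains. Finally, it identifies two important problems facing the reference model, setting a direction for further research on grid service resource allocation models based on service exchange.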

19.
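A grid task scheduling model based on resource prediction   Total citations: 1; self-citations: 0; citations by others: 1
Cheng Hongbing (程宏兵). 《计算机应用》 (Journal of Computer Applications), 2010, 30(9): 2530-2534
Grid task scheduling across multiple domains (or clusters) in a virtual organization is an urgent problem in grid applications because of resource uncertainty (such as dynamism and heterogeneity). An effective grid task scheduling model based on resource prediction, RPTS, is proposed. The model predicts host load in the grid environment with an autoregressive moving average (ARMA) prediction method whose parameters are estimated by weighted least squares. Using these resource predictions together with the modeling results for a class of data-parallel grid tasks, the tasks are preprocessed, matched, and scheduled for execution. RPTS takes full account of the dynamism and heterogeneity of resources in the grid environment and offers a good approach to task scheduling in grid environments. A series of simulation experiments comparing it with several other grid task scheduling methods shows that the RPTS model has the shortest task execution time and good stability.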
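
The abstract gives neither the ARMA model order nor the weighting scheme, so the sketch below is a deliberately reduced stand-in rather than the RPTS predictor. It fits a first-order autoregressive model only (no moving-average term) by weighted least squares, with exponentially decaying weights that favor recent samples. The function names and the `decay` parameter are hypothetical.

```python
def fit_ar1_wls(series, decay=0.9):
    """Fit x[t] - mean = phi * (x[t-1] - mean) by weighted least squares.

    The newest sample pair has weight 1; older pairs are discounted by `decay`.
    Returns (mean, phi).
    """
    mean = sum(series) / len(series)
    x = [v - mean for v in series]
    n = len(x)
    num = den = 0.0
    for t in range(1, n):
        w = decay ** (n - 1 - t)
        num += w * x[t - 1] * x[t]
        den += w * x[t - 1] ** 2
    phi = num / den if den else 0.0   # a constant series carries no AR signal
    return mean, phi

def predict_next(series, decay=0.9):
    """One-step-ahead load forecast from the fitted AR(1) model."""
    mean, phi = fit_ar1_wls(series, decay)
    return mean + phi * (series[-1] - mean)


steady = predict_next([0.5, 0.5, 0.5, 0.5, 0.5])   # flat load stays flat
swing = predict_next([1, -1, 1, -1, 1, -1])        # alternating load flips sign
```

A scheduler could feed such a forecast into its matching step in place of the last observed load, which is the role the abstract assigns to prediction.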

20.
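The paper first presents the characteristics and requirements of access control in grid computing. Neither existing access control techniques nor distributed authorization models meet these requirements. By establishing trust relationships between entities, an access control mechanism based on trust degree is proposed on top of CAS. To improve resource utilization, a Ticket mechanism is proposed. Finally, simulation results are given to verify the effectiveness of the approach.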
