
1.

A hybrid fault tolerance technique in grid computing system (total citations: 1; self-citations: 0; citations by others: 1)

Kalim Qureshi Fiaz Gul Khan Paul Manuel Babar Nazir 《The Journal of supercomputing》2011,56(1):106-128

In order to achieve a high level of reliability and availability, the grid infrastructure should be foolproof and fault tolerant. Fault tolerance plays a key role in assuring the availability and reliability of a grid system. Since the failure of resources affects job execution fatally, a fault tolerance service is essential to satisfy QoS requirements in grid computing.

2.

Semantic-enabled CARE Resource Broker (SeCRB) for managing grid and cloud environment

Thamarai Selvi Somasundaram Kannan Govindarajan Usha Kiruthika Rajkumar Buyya 《The Journal of supercomputing》2014,68(2):509-556

Grid computing is mainly helpful for executing high-performance computing applications. However, conventional grid resources sometimes fail to offer a dynamic application execution environment, and this increases the rate at which the job requests of users are rejected. Integrating emerging virtualization technologies in grid and cloud computing facilitates the provision of dynamic virtual resources in the required execution environment. Resource brokers play a significant role in managing grid and cloud resources as well as identifying potential resources that satisfy users’ application requests. This research paper proposes a semantic-enabled CARE Resource Broker (SeCRB) that provides a common framework to describe grid and cloud resources, and to discover them in an intelligent manner by considering software, hardware and quality of service (QoS) requirements. The proposed semantic resource discovery mechanism classifies the resources into three categories, viz., exact, high-similarity subsume and high-similarity plug-in regions. To achieve the necessary user QoS requirements, we have included a service level agreement (SLA) negotiation mechanism that pairs users’ QoS requirements with matching resources to guarantee the execution of applications, and to achieve the desired QoS of users. Finally, we have implemented the QoS-based resource scheduling mechanism that selects the resources from the SLA negotiation accepted list in an optimal manner. The proposed work is simulated and evaluated by submitting real-world bio-informatics and image processing applications for various test cases. The result of the experiment shows that for jobs submitted to the resource broker, the job rejection rate is reduced while job success and scheduling rates are increased, thus making the resource management system more efficient.

3.

Adaptive checkpointing strategy to tolerate faults in economy based grid (total citations: 3; self-citations: 2; citations by others: 1)

Babar Nazir Kalim Qureshi Paul Manuel 《The Journal of supercomputing》2009,50(1):1-18

In this paper, we develop a fault tolerant job scheduling strategy in order to tolerate faults gracefully in an economy based grid environment. We propose a novel adaptive task checkpointing based fault tolerant job scheduling strategy for an economy based grid. The proposed strategy maintains a fault index of grid resources. It dynamically updates the fault index based on successful or unsuccessful completion of an assigned task. Whenever a grid resource broker has tasks to schedule on grid resources, it makes use of the fault index from the fault tolerant schedule manager in addition to using a time optimization heuristic. While scheduling a grid job on a grid resource, the resource broker uses the fault index to apply different intensities of task checkpointing (inserting checkpoints in a task at different intervals). To simulate and evaluate the performance of the proposed strategy, this paper enhances the GridSim Toolkit-4.0 to exhibit fault tolerance related behavior. We also compare the “checkpointing fault tolerant job scheduling strategy” with the well-known time optimization heuristic in an economy based grid environment. From the measured results, we conclude that even in the presence of faults, the proposed strategy effectively schedules grid jobs, tolerating faults gracefully, and executes more jobs successfully within the specified deadline and allotted budget. It also improves the overall execution time and minimizes the execution cost of grid jobs.
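
The fault-index idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Resource` class, the plus-or-minus-one update rule, and the `base_interval_s / (1 + fault_index)` formula are all assumptions made for illustration, since the abstract says only that the index is updated on task success or failure and that it drives the checkpointing intensity.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    fault_index: int = 0  # hypothetical counter: higher means less reliable


def update_fault_index(resource: Resource, task_succeeded: bool) -> None:
    """Assumed rule: decrement on success (never below 0), increment on failure."""
    if task_succeeded:
        resource.fault_index = max(0, resource.fault_index - 1)
    else:
        resource.fault_index += 1


def checkpoint_interval(resource: Resource, base_interval_s: float = 600.0) -> float:
    """Assumed rule: shrink the interval as the fault index grows."""
    return base_interval_s / (1 + resource.fault_index)


node = Resource("node-a")
for outcome in (False, False, True):  # two failed tasks, then one success
    update_fault_index(node, outcome)
print(node.fault_index, checkpoint_interval(node))
```

With this rule a resource that has never failed keeps the base interval, while a resource with a growing failure history is checkpointed progressively more often.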

4.

An efficient job management of computing service using integrated idle VM resources for high-performance computing based on OpenStack

Han Seok-Hyeon Kim Hyun-Woo Jeong Young-Sik 《The Journal of supercomputing》2019,75(8):4388-4407

In recent years, various studies on OpenStack-based high-performance computing have been conducted. OpenStack combines off-the-shelf physical computing devices and creates a resource pool of logical computing. The configuration of the logical computing resource pool provides computing infrastructure according to the user’s request and can be applied to infrastructure as a service (IaaS), which is a cloud computing service model. OpenStack-based cloud computing can provide various computing services for users using a virtual machine (VM). However, intensive computing service requests from a large number of users during large-scale computing jobs may delay job execution. Moreover, idle VM resources may occur and computing resources are wasted if users do not employ the cloud computing resources. To resolve the computing job delay and waste of computing resources, a variety of studies are required, including computing task allocation, job scheduling, utilization of idle VM resources, and improvements in overall job execution speed as computing service requests increase. Thus, this paper proposes an efficient job management of computing service (EJM-CS) by which idle VM resources are utilized in OpenStack and users’ computing services are processed in a distributed manner. EJM-CS logically integrates idle VM resources, which have different performances, for computing services. EJM-CS reduces resource waste by utilizing idle VM resources. EJM-CS takes multiple computing services rather than a single computing service into consideration. EJM-CS determines the job execution order considering workloads and waiting time according to the job priority of the computing service requester and the computing service type, thereby providing improved performance of overall job execution when computing service requests increase.


5.

Enhancing Availability of Grid Computational Services to Ubiquitous Computing Applications

Roy Nirmalya Das Sajal K. 《Parallel and Distributed Systems, IEEE Transactions on》2009,20(7):953-967

The Grid is an integrated infrastructure that can play the dual roles of a coordinated resource consumer as well as a donator in distributed computing environments. The enormous growth in the use of mobile and embedded devices in ubiquitous computing environments and their interaction with human beings produces a huge amount of data that needs to be processed efficiently anytime, anywhere. However, such devices often have limited resources in terms of CPU, storage, battery power, and communication bandwidth. Thus, there is a need to transfer ubiquitous computing application services to more powerful computational resources. In this paper, we investigate the use of the Grid as a candidate for provisioning computational services to applications in ubiquitous computing environments. In particular, we present a competitive model that describes the possible interaction between the competing resources in the Grid Infrastructure as service providers and ubiquitous applications as subscribers. The competition takes place in terms of quality of service (QoS) and cost offered by different Grid Service Providers (GSPs). We also investigate the job allocation of different GSPs by exploiting the noncooperativeness among the strategies. We present the equilibrium behavior of our model facing global competition under stochastic demand and estimate the guaranteed QoS assurance level by efficiently satisfying the requirements of ubiquitous applications. We have also performed extensive experiments over the Distributed Parallel Computing Cluster (DPCC) and studied the overall job execution performance of different GSPs under a wide range of QoS parameters using different strategies. Our model and performance evaluation results can serve as a valuable reference for designing appropriate strategies in a practical grid environment.

6.

Replication based fault tolerant job scheduling strategy for economy driven grid

Babar Nazir Kalim Qureshi Paul Manuel 《The Journal of supercomputing》2012,62(2):855-873

In this paper, the problem of fault tolerance in grid computing is addressed and a novel adaptive task replication based fault tolerant job scheduling strategy for economy driven grid is proposed. The proposed strategy maintains a fault history of the resources, termed the resource fault index. The fault index entry for a resource is updated based on successful completion or failure of an assigned task by the grid resource. The Grid Resource Broker then replicates the task (submitting the same task to different backup resources) with different intensity, based on the vulnerability of the resource towards faults suggested by the resource fault index. Consequently, in case of a possible fault at a resource, the results of replicated task(s) on other backup resource(s) can be used. Hence, user job(s) can be completed within the specified deadline and assigned budget, even in the event of faults at the grid resource(s). Through extensive simulations, the performance of the proposed strategy is evaluated and compared with the Time Optimization and Checkpointing based Strategy in an economy driven grid environment. The experimental results demonstrate that in the presence of faults, the proposed fault tolerant strategy improves the number of tasks completed with varied deadline and fixed budget as well as the number of tasks completed with varied budget and fixed deadline. Additionally, the proposed strategy used a smaller percentage of deadline time compared to both the Time Optimization and Checkpointing based Strategy. Although the proposed strategy has a percentage of budget spent greater than that of the Time Optimization Strategy and Checkpointing based Strategy, it is accepted as a proposed strategy in time optimization where the main objective is to maximize tasks completed within a given deadline. It can be concluded from the experiments that the proposed strategy shows improvement in satisfying the user QoS requirements. It can effectively schedule tasks and tolerate faults gracefully even in the presence of failures, but the costs are slightly higher in terms of budget consumption. Hence, the proposed fault tolerant strategy helps in sustaining the user's faith in the grid, by enabling the grid to deliver reliable and consistent performance in the presence of faults.
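
As a rough illustration of replicating "with different intensity" according to a resource fault index, the Python sketch below picks backup resources for a task. The mapping from fault index to replica count (one replica per index unit, capped at `max_replicas`) and the choice of the most reliable other resources as backups are assumptions for illustration only; the abstract does not specify either rule.

```python
def replica_count(fault_index: int, max_replicas: int = 3) -> int:
    """Assumed rule: one backup copy per fault-index unit, capped at max_replicas."""
    return min(max_replicas, max(0, fault_index))


def pick_backups(primary: str, fault_indexes: dict[str, int],
                 max_replicas: int = 3) -> list[str]:
    """Choose the most reliable other resources as backups for the primary."""
    n = replica_count(fault_indexes[primary], max_replicas)
    candidates = sorted(
        (name for name in fault_indexes if name != primary),
        key=lambda name: (fault_indexes[name], name),  # lowest fault index first
    )
    return candidates[:n]


# Hypothetical fault indexes for four resources.
indexes = {"r1": 2, "r2": 0, "r3": 5, "r4": 1}
print(pick_backups("r1", indexes))
```

A task placed on a resource with a clean history gets no replicas here, which keeps the extra budget spent on replication proportional to the observed risk.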

7.

Fault-tolerant grid architecture and practice (total citations: 10; self-citations: 0; citations by others: 10)


金海邹德清陈汉华孙建华吴松《计算机科学技术学报》2003,18(4):0-0

Grid computing emerges as an effective technology to couple geographically distributed resources and solve large-scale computational problems in wide area networks. Fault tolerance is a significant and complex issue in grid computing systems. Various techniques have been investigated to detect and correct faults in distributed computing systems. Unreliable fault detection is one of the most effective techniques. Globus as a grid middleware manages resources in a wide area network. The Globus fault detection service uses the well-known techniques based on unreliable fault detectors to detect and report component failures. However, more powerful techniques are required to detect and correct both system-level and application-level faults in a grid system, and a convenient toolkit is also needed to maintain the consistency in the grid. A fault-tolerant grid platform (FTGP) based on an unreliable fault detector and the Globus fault detection service is presented in this paper. The platform offers effective strategies in three aspects: grid key components, user tasks, and high-level applications.

8.

A service-oriented grid job management mechanism (total citations: 14; self-citations: 0; citations by others: 14)

余海燕查礼李伟《计算机研究与发展》2003,40(12):1770-1774

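The emergence of the Open Grid Services Architecture (OGSA) shows that exposing grid resources as services with standardized interfaces has become a trend. However, current grid job management systems mainly target batch jobs submitted as programs, and the resources they manage and their scheduling goals are oriented mainly toward scientific computing, which falls far short of application needs in a service-based grid environment. On the one hand, users' usage pattern is shifting from mainly batch processing to an interactive service-access model; on the other hand, different applications require different levels of quality of service (QoS). To address these problems, a service-oriented job management mechanism is proposed. Acting as a proxy through which users access grid resources (services), it provides users with a job service interface that is transparent, independent of the physical location of resources, and supports sessions. The concept of a service level agreement (SLA) is also introduced to express the different grid service levels that users require, and the job management system maps the QoS properties specified in the SLA to concrete job management behavior according to a customizable service level implementation configuration (SLAP). This job management mechanism has been applied in the Vega Grid system software and provides flexible and effective support for service-grid-based applications.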

9.

A method for fault-tolerant Web service composition using a cloud environment

牛天飞王志坚叶枫张雪洁沈一尘《计算机与数字工程》2012,40(10):95-98

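With the rapid development of Web services, research on the fault tolerance of SOA-based service composition has become a focus of attention in the field. The failure of a component in a service composition causes the execution of the whole composition to fail, and insufficient resources and excessive server load are among the main causes of component failure. Cloud computing, as a new computing paradigm, integrates resources and allocates them dynamically, which offers a new fault tolerance approach for failures caused by resource limits under traditional SOA. The FTEL layer applies middleware technology to service composition fault tolerance and uses the cloud environment to perform service replacement, thereby tolerating component failures caused by resource problems that are hard to handle under traditional SOA.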

10.

A reliable checkpoint storage strategy for grid

Sana Malik Babar Nazir Kalim Qureshi Imran Ali Khan 《Computing》2013,95(7):611-632

Computational grids are composed of heterogeneous, autonomously managed resources. In such an environment, any resource can join or leave the grid at any time. This makes the grid infrastructure unreliable in nature, resulting in delay and failure of executing jobs. Thus, fault tolerance becomes a vital aspect of the grid for realizing reliability, availability and quality-of-service. The most common technique for achieving fault tolerance in High Performance Computing is rollback recovery. It relies on the availability of checkpoints and the stability of storage media. Thus the checkpoints are replicated on storage media. This increases the job execution time if replication is not done in a proper manner. Furthermore, dedicating powerful resources solely as checkpoint storage results in loss of computation power of these resources. It may result in bottlenecks when the load on the network is high. To address the problem, in this paper a checkpoint replication based fault tolerance strategy named Reliable Checkpoint Storage Strategy (RCSS) is proposed. In RCSS, the checkpoints are replicated on all checkpoint servers in the grid in a distributed manner. This decreases the checkpoint replication time and in turn improves the overall job execution time. Additionally, if a resource fails during execution of a job, RCSS restarts the job from its last valid checkpoint taken from any checkpoint server in the grid. Furthermore, to increase the grid performance, CPU cycles of checkpoint servers are also utilized during high load on the network. To evaluate the performance of RCSS, simulations are carried out using GridSim. The simulation results show that RCSS improves intra-cluster checkpoint wave completion time by 12.5 % with a varying number of checkpoint servers. RCSS also reduces checkpoint wave completion time by 50 % with a varying number of clusters. Additionally, RCSS reduces replication time within a cluster by 39.5 %.

11.

A computational economy for grid computing and its implementation in the Nimrod-G resource broker (total citations: 6; self-citations: 0; citations by others: 6)

David Rajkumar Jonathan 《Future Generation Computer Systems》2002,18(8)

Computational grids that couple geographically distributed resources such as PCs, workstations, clusters, and scientific instruments, have emerged as a next generation computing platform for solving large-scale problems in science, engineering, and commerce. However, application development, resource management, and scheduling in these environments continue to be a complex undertaking. In this article, we discuss our efforts in developing a resource management system for scheduling computations on resources distributed across the world with varying quality of service (QoS). Our service-oriented grid computing system called Nimrod-G manages all operations associated with remote execution including resource discovery, trading, scheduling based on economic principles and a user-defined QoS requirement. The Nimrod-G resource broker is implemented by leveraging existing technologies such as Globus, and provides new services that are essential for constructing industrial-strength grids. We present the results of experiments using the Nimrod-G resource broker for scheduling parametric computations on the World Wide Grid (WWG) resources that span five continents.

12.

A time-balancing-based scheduling algorithm for parallel coarse-grained tasks in computational grids

胡艳丽张维明肖卫东汤大权《小型微型计算机系统》2008,29(1):124-129

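Considering the heterogeneous, autonomous, and dynamic characteristics of grid resources, this paper discusses task scheduling when local users hold preemptive priority and proposes the TBBS (Time-Balancing Based Scheduling Algorithm) algorithm. A scheduling optimization model is built that selects the best combination of resources for executing a task, with the goal of minimizing the expected completion time. A time-balancing strategy decomposes the task and schedules the subtasks onto resources, which reduces the delay caused by waiting when subtasks synchronize and yields good parallel computing performance. A repeated scheduling strategy is adopted to adapt to the characteristics of resources in computational grids.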
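
The time-balancing idea mentioned in the abstract above can be illustrated with a small Python sketch: a divisible task is split in proportion to each resource's effective speed so that all subtasks finish at the same moment and none waits at the synchronization point. This is a generic proportional split, not the TBBS algorithm itself; the function name, the speed values, and the assumption that speeds are known and constant are all hypothetical.

```python
def time_balanced_split(total_work: float, speeds: dict[str, float]) -> dict[str, float]:
    """Split work in proportion to speed so every share takes the same time."""
    total_speed = sum(speeds.values())
    if total_speed <= 0:
        raise ValueError("at least one resource must have positive speed")
    return {name: total_work * s / total_speed for name, s in speeds.items()}


# Hypothetical effective speeds (work units per second) after local load.
speeds = {"a": 1.0, "b": 2.0, "c": 3.0}
shares = time_balanced_split(120.0, speeds)
finish_times = {name: shares[name] / speeds[name] for name in speeds}
print(shares)
print(finish_times)
```

An equal split of the same task would leave the fastest resource idle while the slowest one finishes, which is the synchronization delay the abstract says time balancing reduces.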

13.

Modeling and analysis of the effects of QoS and reliability on pricing, profitability, and risk management in multiperiod grid-computing networks

Jose M. Cruz Zugang Liu 《Decision Support Systems》2012,52(3):562-576

In this paper we develop a network equilibrium model for optimal pricing and resource allocation in a Computational Grid Network. We consider a general network economy model with Grid Resource Providers, Grid Resource Brokers and Grid Users. The proposed framework allows for the modeling and theoretical analysis of Computational Grid Markets that considers the non-cooperative behavior of decision-makers in the same tier of the grid computing network (such as, for example, Grid Resource Providers) as well as cooperative behavior between tiers (between Resource Providers and Grid Brokers). We introduce risk management into the decision making process by analyzing the decision-maker's reliability and quality of service (QoS) requirement. We analyze resource allocation patterns as well as equilibrium price based on demand, supply, and cost structure of the grid computing market network. We specifically answer the following questions with several numerical examples: How do system reliability levels affect the QoS levels of the service providers and brokers under competition? How do system reliability levels affect the profits of resource providers and brokers in a competitive market? How do system reliability levels influence the pricing of the services in a competitive environment? How do users' service request types, QoS requirements, and timing concerns affect users' behaviors, costs and risks in equilibrium? How does the market mechanism allocate resources to satisfy the demands of users? We find that for users who request the same services, a certain timing flexibility can not only reduce the costs but also lower the risks. The results indicate that the value of QoS can be efficiently priced based on the heterogeneous service demands.

14.

On the Building of a Job Scheduler System for Globus Grid Environment

Sugree Phatanapherom Putchong Uthayopas 《计算机工程》2002,28(Z1)


15.

A scalable multi-attribute hybrid overlay for range queries on the cloud

Kuan-Chou Lai You-Fu Yu 《Information Systems Frontiers》2012,14(4):895-908

Cloud computing has become a promising paradigm as a next-generation computing model, providing computation, software, data access, and storage services without requiring users to know the location of the physical resources, interconnected across the globe, that provide such services. In such an environment, important issues such as information sharing and resource/service discovery arise. In order to overcome critical limitations in centralized approaches for information sharing and resource/service discovery, this paper proposes a framework of a scalable multi-attribute hybrid overlay featuring decentralized information sharing, flexible resource/service discovery, fault tolerance and load balancing. Additionally, the proposed hybrid overlay integrates a structured P2P system with an unstructured one to support complex queries. Mechanisms such as load balancing and fault tolerance implemented in our proposed system to improve the overall system performance are also discussed. Experimental results show that the proposed approach is feasible and stable, as the proposed hybrid overlay improves system performance by reducing the number of routing hops and balancing the load by migrating requests.

16.

A trust-driven grid scheduling algorithm (total citations: 1; self-citations: 0; citations by others: 1)


李冉于炯侯勇《计算机工程与应用》2009,45(23):118-122

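To address shortcomings in matching tasks to resources in current grid resource management, a trust-driven grid scheduling matching algorithm is proposed, based on a trust benefit function and the concept of matching. The scheduling also takes the benefit values of tasks and resources into account, improving on two previously proposed trust-driven grid scheduling algorithms. The results show that, compared with traditional trust-driven scheduling algorithms, the algorithm achieves better overall performance in terms of trust benefit value, resource benefit value, load balance, and number of failed services.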

17.

Grid Service Discovery with Rough Sets (total citations: 3; self-citations: 0; citations by others: 3)

Maozhen Li Bin Yu Rana O. Zidong Wang 《Knowledge and Data Engineering, IEEE Transactions on》2008,20(6):851-862

The computational grid is rapidly evolving into a service-oriented computing infrastructure that facilitates resource sharing and large-scale problem solving over the Internet. Service discovery becomes an issue of vital importance in utilizing grid facilities. This paper presents ROSSE, a Rough sets-based search engine for grid service discovery. Building on the Rough sets theory, ROSSE is novel in its capability to deal with the uncertainty of properties when matching services. In this way, ROSSE can discover the services that are most relevant to a service query from a functional point of view. Since functionally matched services may have distinct nonfunctional properties related to the quality of service (QoS), ROSSE introduces a QoS model to further filter matched services with their QoS values to maximize user satisfaction in service discovery. ROSSE is evaluated from the aspects of accuracy and efficiency in discovery of computing services.

18.

Research on path finding and load balancing strategies based on trusted QoS

董学文刘启航《信息网络安全》2020,(5):29-38

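Web services are an effective way to invoke resources in cloud computing. A single Web service usually has limited functionality and can complete only a specific task. Service composition can arrange multiple Web services into an effective invocation sequence to deliver more powerful functionality. The rapid surge in published services and service requests has brought new security problems. First, existing service composition schemes all select Web services according to quality of service (QoS), but QoS values are usually supplied by the service publisher, so publishers can advertise false QoS values to deceive users. Second, traditional service composition schemes generate only one optimal path; when malicious requests keep arriving, a service node can be paralyzed, or the whole service composition system can fail. To counter malicious QoS fraud, this article proposes a trusted QoS computation model that evaluates quality of service comprehensively according to the credit of the Web service publisher. To address the inability of a single optimal path to serve a large number of requests, the article proposes a multipath method with path discovery and load balancing. Simulation results show that the proposed method not only raises the success rate of service composition and meets users' needs, but also finds more service composition plans to execute.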
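
To make the trusted-QoS idea in the abstract above concrete, the following Python sketch discounts each advertised QoS value by the publisher's credit before ranking services. The multiplicative rule, the credit range of 0 to 1, and the sample services are assumptions for illustration; the abstract does not give the article's actual computation model.

```python
def trusted_qos(advertised: float, credit: float) -> float:
    """Assumed rule: scale the advertised QoS by publisher credit in [0, 1]."""
    if not 0.0 <= credit <= 1.0:
        raise ValueError("credit must lie in [0, 1]")
    return advertised * credit


# Hypothetical services: name -> (advertised QoS, publisher credit).
services = {"s1": (0.95, 0.4), "s2": (0.80, 0.9), "s3": (0.70, 1.0)}
ranked = sorted(
    services,
    key=lambda name: trusted_qos(*services[name]),
    reverse=True,
)
print(ranked)
```

Here the service with the highest advertised value ranks last because its publisher has low credit, which is the kind of inflated claim a credit-weighted score is meant to neutralize.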

19.

Research on a dynamic fault-tolerant service architecture for grids

姬晓波陈蜀宇田东王荣斌 《计算机应用研究》2008,25(8):2534-2536

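The frequent occurrence of faults has become one of the main obstacles to the robust development and large-scale application of grids, which makes research on the fault tolerance of grid systems particularly important. Based on the characteristics of grid computing, the special fault tolerance requirements of the grid environment are identified. Taking users' quality-of-service requirements into account, a dynamic fault-tolerant service architecture comprising grid fault detection and grid fault management is established, and the organizational structure of the fault detection service and the fault management service, together with the specific functions of each component module, is described. Finally, a complete implementation process of the fault tolerance service is given.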

20.

GAMESH: A grid architecture for scalable monitoring and enhanced dependable job scheduling

《Future Generation Computer Systems》2017

Grid computing is a widely adopted paradigm to federate geographically distributed data centers. Due to their size and complexity, grid systems are often affected by failures that may hinder the correct and timely execution of jobs, thus causing a non-negligible waste of computing resources. Despite the relevance of the problem, state-of-the-art management solutions for grid systems usually neglect the identification and handling of failures at runtime. Among the primary goals to be considered, we claim the need for novel approaches capable of achieving scalable integration with efficient monitoring solutions and of fitting large and geographically distributed systems, where dynamic and configurable tradeoffs between overhead and targeted granularity are necessary. This paper proposes GAMESH, a Grid Architecture for scalable Monitoring and Enhanced dependable job ScHeduling. GAMESH is conceived as a completely distributed and highly efficient management infrastructure, concentrating on two crucial aspects for large-scale and multi-domain grid environments: (i) the scalable dissemination of monitoring data and (ii) the troubleshooting of job execution failures. GAMESH has been implemented and tested in a real deployment encompassing geographically distributed data centers across Europe. Experimental results show that GAMESH (i) enables the collection of measurements of both computing resources and conditions of task scheduling at geographically sparse sites, while imposing a limited overhead on the entire infrastructure, and (ii) provides a failure-aware scheduler able to improve the overall system performance, even in the presence of failures, by coordinating local job schedulers at multiple domains.