Similar literature
 20 similar documents found (search time: 618 ms)
1.
Fault-tolerant grid architecture and practice   Cited by: 10 (self-citations: 0; citations by others: 10)
Grid computing has emerged as an effective technology for coupling geographically distributed resources and solving large-scale computational problems in wide area networks. Fault tolerance is a significant and complex issue in grid computing systems. Various techniques have been investigated to detect and correct faults in distributed computing systems. Unreliable fault detection is one of the most effective techniques. Globus, as grid middleware, manages resources in a wide area network. The Globus fault detection service uses well-known techniques based on unreliable fault detectors to detect and report component failures. However, more powerful techniques are required to detect and correct both system-level and application-level faults in a grid system, and a convenient toolkit is also needed to maintain consistency in the grid. A fault-tolerant grid platform (FTGP) based on an unreliable fault detector and the Globus fault detection service is presented in this paper. The platform offers effective strategies in three areas: grid key components, user tasks, and high-level applications.
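
The abstract above relies on unreliable fault detectors. As a rough illustration only, the following minimal sketch shows the usual timeout-based idea: a component is suspected once no heartbeat has arrived within a timeout, and the suspicion is withdrawn if a later heartbeat shows up. The class name, method names, component names, and timeout value are all hypothetical; this is not the Globus fault detection service API or the FTGP implementation.

```python
class HeartbeatDetector:
    """Timeout-based unreliable failure detector (illustrative sketch)."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds of silence tolerated before suspicion
        self.last_seen = {}      # component name -> time of last heartbeat

    def heartbeat(self, component, now):
        # Record that a heartbeat from `component` arrived at time `now`.
        self.last_seen[component] = now

    def suspects(self, now):
        # A component is only *suspected*: it may merely be slow, which is
        # why this kind of detector is called unreliable.
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


detector = HeartbeatDetector(timeout=5.0)
detector.heartbeat("gatekeeper", now=0.0)
detector.heartbeat("scheduler", now=0.0)
detector.heartbeat("scheduler", now=4.0)
print(detector.suspects(now=6.0))   # only the silent component is suspected
```

Timestamps are passed in explicitly so that the behaviour is deterministic; a real detector would read a clock and tune the timeout to the network.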

2.
The Globus project: a status report   Cited by: 8 (self-citations: 0; citations by others: 8)
The Globus project is a multi-institutional research effort that seeks to enable the construction of computational grids providing pervasive, dependable, and consistent access to high-performance computational resources, despite geographical distribution of both resources and users. Computational grid technology is being viewed as a critical element of future high-performance computing environments that will enable entirely new classes of computation-oriented applications, much as the World Wide Web fostered the development of new classes of information-oriented applications. In this paper, we report on the status of the Globus project as of early 1998. We describe the progress that has been achieved to date in the development of the Globus toolkit, a set of core services for constructing grid tools and applications. We also discuss the Globus Ubiquitous Supercomputing Testbed Organization (GUSTO) that we have constructed to enable large-scale evaluation of Globus technologies, and we review early experiences with the development of large-scale grid applications on the GUSTO testbed.

3.
The advent of service-oriented architectures in Grid environments has fostered the development of applications in distributed deployments. The Globus Toolkit 4 (GT4), with its implementation of stateful Web services via the WS-Resource Framework (WSRF), is a suitable platform for developing these Grid services. Its increased usage in many scientific areas reveals new scenarios where fault tolerance and high availability should be considered. This paper describes a library that manages the automatic replication of WSRF-based Grid services. This functionality can be plugged into existing Grid services, by means of minimal changes in their source code, to achieve state replication through WS-Resources. The architecture of the library and its performance evaluation are described. In particular, two different replica topologies are addressed: ring-based and leaf-to-root complete binary tree, in order to achieve resource state update in logarithmic time with respect to the number of replicas. Finally, the paper describes the integration of the replication library into a service-oriented metascheduler to enhance fault tolerance and to guarantee service availability.
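
To make the logarithmic-time claim in the abstract above concrete, the following sketch counts forwarding rounds for two replica layouts. It assumes a simplified model that is not taken from the paper: in a ring, the update is passed from one replica to the next, and in a complete binary tree stored in heap order, every replica that holds the update forwards it to both children in the same round. The function names are hypothetical and the propagation direction is chosen only for illustration.

```python
def ring_rounds(n):
    """Rounds for an update to travel around a ring of n replicas, one hop per round."""
    return max(n - 1, 0)


def tree_rounds(n):
    """Rounds for an update to reach all n replicas of a complete binary tree.

    Replicas are indexed 0..n-1 in heap order, so the children of replica i
    are 2*i + 1 and 2*i + 2. All replicas in the current frontier forward
    the update in parallel, so each tree level costs one round.
    """
    rounds = 0
    frontier = [0]
    while True:
        children = [c for i in frontier for c in (2 * i + 1, 2 * i + 2) if c < n]
        if not children:
            return rounds
        frontier = children
        rounds += 1


for n in (1, 7, 8, 1000):
    print(n, ring_rounds(n), tree_rounds(n))
```

Under this model the ring needs a number of rounds linear in the number of replicas, while the tree needs a number equal to its depth, which grows as the base-2 logarithm of the replica count.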

4.
Emerging Web-based applications require distributed multimedia information system (DMIS) infrastructures. Examples of such applications abound in the domains of medicine, entertainment, manufacturing, e-commerce, as well as military and critical national infrastructures. Development of DMIS for such applications needs a broad range of technological solutions for organizing, storing, and delivering multimedia information in an integrated, secure and timely manner with guaranteed end-to-end (E2E) quality of presentation (QoP). DMIS are viewed as catalysts for new research in many areas, ranging from basic research to applied technology. This view is a result of the fact that no single monolithic end-to-end architecture for DMIS can meet the wide spectrum of characteristics and requirements of various Web-based multimedia applications. One size does not fit all in this medium of communication. Management of integrated end-to-end QoP and ensuring information security in DMIS, when viewed in conjunction with real-world constraints and system-wide performance requirements, present formidable research and implementation challenges. These challenges encompass all the sub-system components of a DMIS. The ultimate objective of achieving comprehensive end-to-end QoP management relies on the performance and allocation of resources of each of the DMIS sub-system components, including networks, databases, and end-systems. In this paper, we elaborate on these challenges and present a high-level distributed architecture aimed at providing the critical functionality for a DMIS.
Arif Ghafoor

5.
When parallel applications are run in large‐scale distributed environments, such as grids, peer‐to‐peer (P2P) systems, and clouds, the set of resources used can change dynamically as machines crash, reservations end, and new resources become available. It is vital for applications to respond to these changes. Therefore, it is necessary to keep track of the available resources, a problem which is known to be notoriously difficult. In this article we argue that resource tracking must be provided as standard functionality in the lower parts of the software stack. We propose a general solution to resource tracking: the Join–Elect–Leave (JEL) model. JEL provides unified resource tracking for parallel and distributed applications across environments. JEL is a simple yet powerful model based on notifying when resources have Joined or Left the computation. We demonstrate that JEL is suitable for resource tracking in a wide variety of programming models, ranging from the fixed resource sets traditionally used in MPI‐1 to flexible grid‐oriented programming models. We compare several JEL implementations, and show these to perform and scale well in several real‐world scenarios involving grids, clouds and P2P systems applied concurrently, and wide‐area systems with failing resources. Using JEL, we have won the first prize in a number of international distributed computing competitions. Copyright © 2010 John Wiley & Sons, Ltd.
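
The notification idea behind the Join–Elect–Leave model described above can be sketched as a small in-process registry. Everything here is an assumption made for illustration: the class and method names are hypothetical, the election rule (the longest-standing live member wins) is a placeholder, and a real JEL implementation would be distributed rather than a single object.

```python
class ResourceTracker:
    """Single-process sketch of join/leave notifications plus a simple election."""

    def __init__(self):
        self.members = []     # live resources, in join order
        self.listeners = []   # callbacks invoked as callback(event, member)

    def subscribe(self, callback):
        self.listeners.append(callback)

    def _notify(self, event, member):
        for callback in self.listeners:
            callback(event, member)

    def join(self, member):
        # Announce a new resource to every subscriber.
        if member not in self.members:
            self.members.append(member)
            self._notify("joined", member)

    def leave(self, member):
        # Covers both graceful departure and a detected crash.
        if member in self.members:
            self.members.remove(member)
            self._notify("left", member)

    def elect(self):
        # Placeholder rule: the longest-standing live member is the leader.
        return self.members[0] if self.members else None


events = []
tracker = ResourceTracker()
tracker.subscribe(lambda event, member: events.append((event, member)))
tracker.join("node-a")
tracker.join("node-b")
tracker.leave("node-a")
print(events, tracker.elect())
```

An application built on such an interface reacts to the callbacks instead of polling for the current resource set.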

6.
Most health-related issues such as public health outbreaks and epidemiological threats are better understood from a spatial–temporal perspective and clearly demand related geospatial datasets and services so that decision makers may jointly make informed decisions and coordinate response plans. Although current health applications support some geospatial features, these are still disconnected from the wide range of geospatial services and datasets that geospatial information infrastructures may bring into health. In this paper we examine the hypothesis that geospatial information infrastructures, in terms of standards-based geospatial services, technologies, and data models as operational assets already in place, can be exploited by health applications for which the geospatial dimension is of great importance. This may be addressed by defining better collaboration strategies to uncover and promote geospatial assets to the health community. We discuss the value of collaboration, as well as the opportunities that geographic information infrastructures offer to address geospatial challenges in health applications.

7.
Globus Nexus is a professionally hosted Platform-as-a-Service that provides identity, profile and group management functionality for the research community. Many collaborative e-Science applications need to manage large numbers of user identities, profiles, and groups. However, developing and maintaining such capabilities is often challenging given the complexity of modern security protocols and requirements for scalable, robust, and highly available implementations. By outsourcing this functionality to Globus Nexus, developers can leverage best-practice implementations without incurring development and operations overhead. Users benefit from enhanced capabilities such as identity federation, flexible profile management, and user-oriented group management. In this paper we present Globus Nexus, describe its capabilities and architecture, summarize how several e-Science applications leverage these capabilities, and present results that characterize its scalability, reliability, and availability.

8.
Scientific communities are producing a growing number of computation-intensive applications, which calls for the interoperation of distributed infrastructures including Clouds, Grids and private clusters. The European SHIWA and ER-flow projects have enabled the combination of heterogeneous scientific workflows, and their execution in a large-scale system consisting of multiple Distributed Computing Infrastructures. One of the resource management challenges of these projects is parameter study job scheduling. A parameter study job of a workflow generally has a large number of input files to be consumed by independent job instances. In this paper we propose a meta-brokering framework for science gateways to support the execution of such workflows. In order to cope with the high uncertainty and unpredictable load of the utilized distributed infrastructures, we introduce the so-called resource priority services. These tools are capable of determining and dynamically updating priorities of the available infrastructures to be selected for job instances. Our evaluations show that this approach leads to an efficient distribution of job instances among the available computing resources, resulting in a shorter makespan for parameter study workflows.

9.
During the last decade, the number of distributed application domains with temporal requirements has grown significantly, raising the need to explore new concepts and paradigms that allow, on the one hand, the development of dynamic and flexible distributed applications and, on the other hand, the reusability of code. Service‐oriented paradigms have been successfully applied to distributed environments, increasing their flexibility and allowing the reusability of their components. In addition, distributed real‐time Java technologies have been shown to be good candidates for deploying real‐time distributed applications. This paper presents a model for service‐oriented applications on a time‐triggered distributed real‐time Java environment, focusing on the definition of the temporal model of an application and its schedulability, and applying and evaluating this model in real‐time service‐oriented composition algorithms. Copyright © 2012 John Wiley & Sons, Ltd.

10.
Scheduling constitutes an integral feature of Grid computing infrastructures, being also a key to realizing several of the Grid promises. In particular, scheduling can maximize the resources available to end users and accelerate the execution of jobs, while also supporting scalable and autonomic management of the resources comprising a Grid. Grid scheduling functionality hinges on middleware components called meta-schedulers, which undertake to automatically distribute jobs across the dispersed heterogeneous resources of a Grid. In this paper we present the design and implementation of a Grid meta-scheduler, which we call EMPEROR. EMPEROR provides a framework for implementing scheduling algorithms based on performance criteria. In implementing a particular instantiation of this framework, we have devised models for predicting host load and memory resources, and accordingly for estimating the running time of a task. These models hinge on time series analysis techniques and take into account results of the cluster computing literature. Apart from incorporating these models, EMPEROR provides fully fledged Grid scheduling functionality, which complies with OGSA standards as the latter are reflected in the Globus toolkit. Specifically, EMPEROR interfaces to Globus middleware services (i.e., GSI, MDS, GRAM) for discovering resources, implementing the scheduling algorithm and ultimately submitting jobs to local scheduling systems. By and large, EMPEROR is one of the few standards-based meta-schedulers making use of dynamic scheduling information.

11.
The inherent complex nature of current distributed computing architectures hinders the widespread adoption of these systems for mainstream use. In general, users have access to a highly heterogeneous set of compute resources, which may include clusters, grids, desktop grids, clouds, and other compute platforms. This heterogeneity is especially problematic when running parallel and distributed applications. Software is needed which easily combines as many resources as possible into one coherent computing platform. In this paper, we introduce Zorilla: peer‐to‐peer (P2P) middleware that creates a single distributed environment from any available set of compute resources. Zorilla imposes minimal requirements on the resource used, is platform independent, and does not rely on central components. In addition to providing functionality on bare resources, Zorilla can exploit locally available middleware. Zorilla explicitly supports distributed and parallel applications, and allows resources from multiple sites to cooperate in a single computation. Zorilla makes extensive use of both virtualization and P2P techniques. We will demonstrate how virtualization and P2P combine into a simple design, while enhancing functionality and ease of use. Together, these techniques bring our goal a step closer: transparent, easy use of resources, even on very heterogeneous distributed systems. Copyright © 2011 John Wiley & Sons, Ltd.

12.
Parallel Computing, 2007, 33(4-5): 328-338
Large-scale Grid is a computing environment composed of Internet-wide distributed resources shared by a number of applications. Although WSRF and Java-based hosting environment can successfully deal with the heterogeneity of resources and the diversity of applications, the current Grid systems have several limitations to support the dynamic nature of large-scale Grid. This paper proposes DynaGrid, a new framework for building large-scale Grid for WSRF-compliant applications. Compared to the existing Grid systems, DynaGrid provides three new mechanisms: dynamic service deployment, resource migration, and transparent request dispatching. Two core components, ServiceDoor and dynamic service launcher (DSL), have been implemented as WSRF-compliant Web services to realize DynaGrid, which are applicable to any Java-based WSRF hosting environment. We construct a real testbed with DynaGrid on the Globus Toolkit 4 and evaluate the effectiveness of our framework using two practical applications. The evaluation results show that dynamic service deployment and resource migration in DynaGrid bring many advantages to large-scale Grid in terms of performance and reliability with minimal overhead.

13.
Assembling and simultaneously using different types of distributed computing infrastructures (DCI) like Grids and Clouds is an increasingly common situation. Because infrastructures are characterized by different attributes such as price, performance, trust, and greenness, the task scheduling problem becomes more complex and challenging. In this paper we present the design for a fault-tolerant and trust-aware scheduler, which allows executing Bag-of-Tasks applications on elastic and hybrid DCI, following user-defined scheduling strategies. Our approach, named Promethee scheduler, combines a pull-based scheduler with the multi-criteria Promethee decision-making algorithm. Because multi-criteria scheduling leads to the multiplication of the possible scheduling strategies, we propose SOFT, a methodology for finding the optimal scheduling strategies given a set of application requirements. The validation of this method is performed with a simulator that fully implements the Promethee scheduler and recreates a hybrid DCI environment including Internet Desktop Grid, Cloud and Best Effort Grid based on real failure traces. A set of experiments shows that the Promethee scheduler is able to maximize user satisfaction expressed according to three distinct criteria: price, expected completion time and trust, while maximizing the infrastructure useful employment from the resource owner's point of view. Finally, we present an optimization which bounds the computation time of the Promethee algorithm, making realistic the possible integration of the scheduler into a wide range of resource management software.
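
For readers unfamiliar with Promethee, the following sketch computes PROMETHEE II net outranking flows over the three criteria named in the abstract above (price, expected completion time, trust). It is a generic textbook formulation using the simplest "usual" preference function, under which any strict improvement counts as full preference. The candidate names, criterion values, and weights are invented for illustration, and nothing here reproduces the paper's scheduler or its parameter choices.

```python
def promethee_net_flows(alternatives, weights, maximize):
    """Return the PROMETHEE II net flow of each alternative.

    alternatives: name -> {criterion: value}
    weights:      criterion -> weight (assumed to sum to 1)
    maximize:     criterion -> True if larger is better, False if smaller is better
    """
    names = list(alternatives)
    n = len(names)
    if n < 2:
        raise ValueError("need at least two alternatives to compare")

    def preference(a, b):
        # Weighted share of criteria on which a is strictly better than b.
        total = 0.0
        for criterion, weight in weights.items():
            va = alternatives[a][criterion]
            vb = alternatives[b][criterion]
            better = va > vb if maximize[criterion] else va < vb
            if better:
                total += weight
        return total

    flows = {}
    for a in names:
        others = [b for b in names if b != a]
        positive = sum(preference(a, b) for b in others) / (n - 1)
        negative = sum(preference(b, a) for b in others) / (n - 1)
        flows[a] = positive - negative
    return flows


# Hypothetical candidates and weights, for illustration only.
candidates = {
    "cloud": {"price": 0.10, "time": 10, "trust": 0.9},
    "desktop_grid": {"price": 0.00, "time": 30, "trust": 0.5},
    "best_effort_grid": {"price": 0.02, "time": 20, "trust": 0.7},
}
weights = {"price": 0.2, "time": 0.4, "trust": 0.4}
maximize = {"price": False, "time": False, "trust": True}

flows = promethee_net_flows(candidates, weights, maximize)
print(sorted(flows, key=flows.get, reverse=True))
```

Changing the weights changes the ranking, which is why a family of scheduling strategies arises from a single set of criteria.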

14.
The rise of virtualized and distributed infrastructures has led to new challenges in making effective use of compute resources through the design and orchestration of distributed applications. As legacy, monolithic applications are replaced with service-oriented applications, questions arise about the steps to be taken in order to maximize the usefulness of the infrastructures and to provide users with tools for the development and execution of distributed applications. One of the issues to be solved is the existence of multiple cloud solutions that are not interoperable, which forces the user to be locked to a specific provider or to continuously adapt applications. With the objective of simplifying the programmers' challenges, ServiceSs provides a straightforward programming model and an execution framework that helps abstract applications from the actual execution environment. This paper presents how ServiceSs transparently interoperates with multiple providers, implementing the appropriate interfaces to execute scientific applications on federated clouds.

15.
Low-overhead resource monitoring is key to the successful management of distributed high-performance computing environments, particularly when applications have well-defined quality of service (QoS) requirements. The dproc system-level monitoring mechanisms provide tools both for efficiently monitoring system-level events and for notifying remote hosts of events relevant to their operation. Implemented as an extension to the Linux kernel, dproc provides several key functions. First, utilizing the familiar /proc virtual filesystem, dproc extends this interface with resource information collected from both local and remote hosts. Second, to predictably capture and distribute monitoring information, dproc uses a kernel-level group communication facility, termed KECho, which implements events and event channels. Third, and the focus of this paper, is dproc's run-time customizability for resource monitoring, which includes the generation and deployment of monitoring functionality within remote operating system kernels. Using dproc, we show that (a) data streams can be customized according to a client's resource availabilities (dynamic stream management), (b) by dynamically varying distributed monitoring (dynamic filtering of monitoring information), an appropriate balance can be maintained between monitoring overheads and application quality, and (c) by performing monitoring at kernel-level, the information captured enables decision making that takes into account the multiple resources used by applications.

16.
The service‐oriented architecture paradigm can be exploited for the implementation of data and knowledge‐based applications in distributed environments. The Web services resource framework (WSRF) has recently emerged as the standard for the implementation of Grid services and applications. WSRF can be exploited for developing high‐level services for distributed data mining applications. This paper describes Weka4WS, a framework that extends the widely used open source Weka toolkit to support distributed data mining on WSRF‐enabled Grids. Weka4WS adopts the WSRF technology for running remote data mining algorithms and managing distributed computations. The Weka4WS user interface supports the execution of both local and remote data mining tasks. On every computing node, a WSRF‐compliant Web service is used to expose all the data mining algorithms provided by the Weka library. The paper describes the design and implementation of Weka4WS using the WSRF libraries and services provided by Globus Toolkit 4. A performance analysis of Weka4WS for executing distributed data mining tasks in different network scenarios is presented. Copyright © 2008 John Wiley & Sons, Ltd.

17.
The vision of the Internet of Things (IoT) foresees a future Internet incorporating smart physical objects that offer hosted functionality as IoT services. These services, when integrated with traditional enterprise-level services, create ambient intelligence for a wide range of applications. To facilitate seamless access and service life cycle management of large, distributed and heterogeneous IoT resources, service-oriented computing and resource-oriented approaches have been widely used as promising technologies. However, a reference architecture integrating IoT services into either of these two technologies is still an open research challenge. In this article, we adopt the resource-oriented approach to provide an end-to-end integration architecture of front-end IoT devices with the back-end business process applications. The proposed architecture promises programmer-friendly access to IoT services, an event management mechanism to propagate context information of IoT devices, a service replacement facility upon service failure, and a decentralized execution of the IoT-aware business processes.

18.
The grid is a promising infrastructure that can allow scientists and engineers to access resources among geographically distributed environments. Grid computing is a new technology which focuses on aggregating resources (e.g., processor cycles, disk storage, and contents) from a large-scale computing platform. Making grid computing a reality requires a resource broker to manage and monitor available resources. This paper presents a workflow-based resource broker whose main functions are matching available resources with user requests and considering network information statuses during matchmaking in computational grids. The resource broker provides a graphical user interface for accessing available and appropriate resources via user credentials. This broker uses the Ganglia and NWS tools to monitor resource status and network-related information, respectively. We then propose a history-based execution time estimation model to predict the execution time of parallel applications, according to previous execution results. The experimental results show that our model can accurately predict the execution time of embarrassingly parallel applications. We also report on using the Globus Toolkit to construct a grid platform called the TIGER project that integrates resources distributed across five universities in Taichung city, Taiwan, where the resource broker was developed.
Po-Chi Shih

19.
The Media Accelerating Peer Services system extends P2P infrastructures to improve multimedia services across heterogeneous computing platforms. In this article, we present an architecture and resource management and adaptation framework that transcends existing infrastructures to accommodate and accelerate multimedia peer applications and services. We also propose key technology components that support seamless adaptation of resources to enhance quality of service and the building of better tools and applications that utilize the peer-computing network's underlying power.

20.
Computer, 2001, 34(7): 99-101
Distributed mission-critical environments employ a mixture of hard and soft real-time applications that usually expect a guaranteed range of quality of service (QoS). These applications have different levels of criticality and varied structures ranging from periodic independent tasks to distributed pipelines or event-driven modules. The underlying distributed system must evolve and adapt to the high variability in resource demands that competing applications impose. The current industry trend is to use commercial off-the-shelf (COTS) hardware and software components to build distributed environments for mission-critical applications. The paper considers how adding a middleware layer above the COTS components facilitates consistent management of system resources, decreases system complexity, and reduces development costs.
