Similar Literature
20 similar documents found.
1.
The inherent complex nature of current distributed computing architectures hinders the widespread adoption of these systems for mainstream use. In general, users have access to a highly heterogeneous set of compute resources, which may include clusters, grids, desktop grids, clouds, and other compute platforms. This heterogeneity is especially problematic when running parallel and distributed applications. Software is needed that easily combines as many resources as possible into one coherent computing platform. In this paper, we introduce Zorilla: peer-to-peer (P2P) middleware that creates a single distributed environment from any available set of compute resources. Zorilla imposes minimal requirements on the resources used, is platform independent, and does not rely on central components. In addition to providing functionality on bare resources, Zorilla can exploit locally available middleware. Zorilla explicitly supports distributed and parallel applications, and allows resources from multiple sites to cooperate in a single computation. Zorilla makes extensive use of both virtualization and P2P techniques. We demonstrate how virtualization and P2P combine into a simple design, while enhancing functionality and ease of use. Together, these techniques bring our goal a step closer: transparent, easy use of resources, even on very heterogeneous distributed systems. Copyright © 2011 John Wiley & Sons, Ltd.

2.
Cloud computing has permeated the information technology industry in the last few years, and it is now emerging in scientific environments. Science user communities demand a broad range of computing power to satisfy the needs of high-performance applications, drawing on local clusters, high-performance computing systems, and computing grids. Different computational models impose different workloads, and the cloud is already considered a promising paradigm. Scheduling and resource allocation are challenging in any form of computation, and clouds are no exception. Science applications have unique features that differentiate their workloads; hence, their requirements have to be taken into consideration when building a Science Cloud. This paper discusses the main scheduling and resource allocation challenges for any Infrastructure as a Service provider supporting scientific applications.

3.
In multicluster systems, and more generally in grids, jobs may require co-allocation, that is, the simultaneous or coordinated access of single applications to resources of possibly multiple types in multiple locations managed by different resource managers. Co-allocation presents new challenges to resource management in grids, such as locating sufficient resources in geographically distributed sites, allocating and managing resources in multiple, possibly heterogeneous sites for single applications, and coordinating the execution of single jobs at multiple sites. Moreover, as single jobs now may have to rely on multiple resource managers, co-allocation introduces reliability problems. In this paper, we present the design and implementation of a co-allocating grid scheduler named KOALA that meets these co-allocation challenges. In addition, we report the results of an analysis of the performance, in our multicluster testbed, of the co-allocation policies built into KOALA. We also include the results of a performance and reliability test of KOALA conducted while our testbed was unstable. Copyright © 2007 John Wiley & Sons, Ltd.
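As a rough illustration of the placement problem co-allocation poses, the following Python sketch greedily maps the components of one job onto sites with sufficient free processors. The worst-fit policy and all names are illustrative assumptions, not KOALA's actual implementation.

```python
# Illustrative sketch of co-allocation: place the components of one job
# across multiple sites so that each component gets enough processors.
# The greedy "worst-fit" policy here is an assumption for illustration,
# not necessarily one of KOALA's built-in policies.

def co_allocate(component_sizes, free_processors):
    """Map each job component to a site with enough free processors.

    component_sizes: list of processor counts, one per job component.
    free_processors: dict of site name -> currently free processors.
    Returns a dict component index -> site, or None if placement fails.
    """
    free = dict(free_processors)
    placement = {}
    # Place the largest components first to reduce fragmentation.
    for idx in sorted(range(len(component_sizes)),
                      key=lambda i: -component_sizes[i]):
        need = component_sizes[idx]
        # Worst fit: pick the site with the most free processors.
        site = max(free, key=free.get)
        if free[site] < need:
            return None  # insufficient resources; the job must wait or retry
        placement[idx] = site
        free[site] -= need
    return placement

print(co_allocate([8, 4, 4], {"delft": 16, "leiden": 8, "amsterdam": 4}))
```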

4.
This paper presents a convergence of distributed key-value storage systems in clouds and supercomputers. It specifically presents ZHT, a zero-hop distributed key-value store that has been tuned for the requirements of high-end computing systems. ZHT aims to be a building block for future distributed systems, such as parallel and distributed file systems, distributed job management systems, and parallel programming systems. ZHT has several important properties: it is lightweight, allows nodes to join and leave dynamically, is fault tolerant through replication, persistent, and scalable, and it supports unconventional operations such as append, compare-and-swap, and callback in addition to the traditional insert/lookup/remove. We have evaluated ZHT's performance on a variety of systems, ranging from a 64-node Linux cluster and an Amazon EC2 virtual cluster of up to 96 nodes to an IBM Blue Gene/P supercomputer with 8K nodes. We compared ZHT against other key-value stores and found that it offers superior performance for the features and portability it supports. This paper also presents several real systems that have adopted ZHT, namely FusionFS (a distributed file system), IStore (a storage system with erasure coding), MATRIX (distributed scheduling), Slurm++ (distributed HPC job launch), and Fabriq (distributed message queue management); all of these systems have been simplified by building on key-value storage and have been shown to outperform other leading systems, by orders of magnitude in some cases. It is important to highlight that some of these systems are rooted in HPC supercomputers, while others are rooted in clouds and ad hoc distributed systems; through our work, we have shown how versatile key-value storage systems can be across such a variety of environments. Copyright © 2015 John Wiley & Sons, Ltd.
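The operation set listed above is easy to picture with a toy in-memory store. The sketch below is purely illustrative (ZHT is a distributed system; these method names are not its API), but it shows why compare-and-swap makes a key-value store usable as a coordination primitive.

```python
# A toy in-memory key-value store exposing the operation set the paper
# lists for ZHT. Names are illustrative assumptions, not ZHT's API.

class ToyKVStore:
    def __init__(self):
        self._data = {}

    def insert(self, key, value):
        self._data[key] = value

    def lookup(self, key):
        return self._data.get(key)

    def remove(self, key):
        self._data.pop(key, None)

    def append(self, key, value):
        # Append to the existing value instead of overwriting it.
        self._data[key] = self._data.get(key, b"") + value

    def compare_and_swap(self, key, expected, new):
        # Replace the value only if it matches `expected`; in a real
        # distributed store this runs atomically on the server owning
        # the key, which is what makes it usable for coordination.
        if self._data.get(key) == expected:
            self._data[key] = new
            return True
        return False

kv = ToyKVStore()
kv.insert("job/42", b"QUEUED")
assert kv.compare_and_swap("job/42", b"QUEUED", b"RUNNING")
```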

5.
Computational grids hold great promise for utilizing geographically separated heterogeneous resources to solve large-scale complex problems. However, they suffer from a number of major technical hurdles, including distributed resource management and effective job scheduling. The main focus of this work is the online scheduling of real-time applications in distributed environments such as grids. Specifically, we are interested in applications with several independent tasks, each with a prespecified deadline. Our goal is to schedule applications within an optimal overall time while respecting the specified deadlines. To achieve this, resource performance prediction based on workload modeling and queueing techniques is employed. A mathematical neural model is then used to schedule the subtasks of the application. The main contributions of this work are incorporating an impatience factor as well as resource faults into the performance modeling of nondedicated distributed systems, and presenting an efficient, fast parallel scheduling algorithm under time constraints and heterogeneous resources. The proposed model is suitable for implementation on parallel machines and runs in O(1) time. The model was implemented on the GridSim toolkit and evaluated under various conditions and with different parameters. Simulation outcomes show that in approximately 87.8% of cases, our model schedules the tasks in such a way that all constraints are satisfied.
Mohammad Kazem Akbari
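To make the queueing-based prediction step concrete, here is a minimal sketch that uses the standard M/M/1 response-time formula W = 1/(μ − λ) to filter out resources that cannot meet a task's deadline. The paper's model is considerably richer (it also covers impatience and resource faults); this example only illustrates the feasibility check.

```python
# Hedged sketch: an M/M/1 queueing estimate used to decide whether a
# resource can meet a task's deadline. Only the feasibility check is
# shown; the paper's full performance model is more elaborate.

def mm1_response_time(arrival_rate, service_rate):
    """Expected response time W = 1 / (mu - lambda) for a stable M/M/1 queue."""
    if arrival_rate >= service_rate:
        return float("inf")  # unstable queue: never finishes on average
    return 1.0 / (service_rate - arrival_rate)

def feasible_resources(resources, deadline):
    """Keep only resources whose predicted response time fits the deadline.

    resources: dict name -> (arrival_rate, service_rate) in tasks per second.
    """
    return {name: mm1_response_time(lam, mu)
            for name, (lam, mu) in resources.items()
            if mm1_response_time(lam, mu) <= deadline}

print(feasible_resources({"r1": (2.0, 5.0), "r2": (9.0, 10.0)}, deadline=0.5))
```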

6.
Resource scheduling in large-scale distributed systems, such as grids and clouds, is difficult due to the size, dynamism, and volatility of resources. These resources are eclectic and autonomous, and may exhibit different usage policies, levels of participation, capabilities, local load, and reliability. Moreover, applications are likely to exhibit various load patterns and levels, and distributed resources may organize into various overlay topologies for information and query dissemination. Researchers have proposed a wide variety of approaches and policies for mapping offered load onto resources and for solving the various component parts of the scheduling problem. However, production clouds and grids may be underutilized and may not exhibit the load needed to effectively characterize all of the scheduling system inputs. The composition of large-scale systems is also changing, potentially to include more individual and peer-to-peer resources. These factors will influence the effectiveness of proposed scheduling solutions. Therefore, a simulation environment is necessary to study different approaches under different scenarios, especially those that are expected but not currently characteristic of existing systems. This article describes a general-purpose peer-to-peer simulation environment that allows a wide variety of parameters, protocols, strategies, and policies to be varied and studied. As a proof of concept, the simulation environment is applied to a large-scale distributed system problem that includes a core model and related mechanisms. In particular, this article presents a definition of, and possible peer-to-peer solutions for, the large-scale scheduling problem. Moreover, it describes a general simulation model, some policies that can be varied, an implementation, and some sample results.

7.
Grid computing provides a mechanism for sharing and accessing large, heterogeneous collections of remote resources, such as computers, online devices, storage, data, and applications. These resources are identified by their attributes. Resource attributes exhibit varying degrees of dynamism, from static attributes (such as operating system version) to highly dynamic ones (such as network bandwidth or CPU load). In this paper, large-scale, dynamic resource discovery is performed in a P2P architecture. In a decentralized architecture, we evaluate a set of request-forwarding algorithms designed to adapt to different resource compositions (including sharing policies and resource types) and degrees of dynamism. To this end, an experimental platform is built to simulate two application characteristics: (1) resources are distributed across nodes and vary in the quantity and frequency of shared resources; (2) there are diverse request patterns for resources.
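The class of request-forwarding algorithms evaluated in the paper can be pictured as a TTL-limited walk over the overlay. The policy below (random-walk forwarding with a hop budget) is an illustrative assumption, not the paper's specific algorithm.

```python
# Illustrative sketch of decentralized resource discovery by request
# forwarding. The random-walk policy and TTL are assumptions chosen
# for illustration, not the paper's evaluated algorithms.

import random

class Node:
    def __init__(self, resources):
        self.resources, self.neighbors = resources, []

def discover(node, predicate, ttl, visited=None):
    """Forward a request until a matching resource is found or TTL expires.

    predicate: function deciding whether a resource satisfies the request.
    """
    if visited is None:
        visited = set()
    visited.add(id(node))
    for r in node.resources:
        if predicate(r):
            return r
    if ttl == 0:
        return None
    # Random walk: forward to unvisited neighbors, one branch at a time.
    neighbors = [n for n in node.neighbors if id(n) not in visited]
    random.shuffle(neighbors)
    for n in neighbors:
        found = discover(n, predicate, ttl - 1, visited)
        if found is not None:
            return found
    return None

a, b, c = Node([]), Node([]), Node(["gpu"])
a.neighbors, b.neighbors = [b], [c]
print(discover(a, lambda r: r == "gpu", ttl=3))  # -> "gpu"
```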

8.
This paper proposes a coordinated load management protocol for Peer-to-Peer (P2P) coupled federated Grid systems. The participants in the system, such as the resource providers and the consumers who belong to multiple control domains, work together to enable a coordinated federation. The coordinated load management protocol embeds a logical spatial index over a Distributed Hash Table (DHT) space for efficient management of the coordination objects; the DHT-based space serves as a kind of decentralized blackboard system. We show that our coordination protocol has a message complexity that is logarithmic in the number of nodes in the system, which is significantly better than existing broadcast-based coordination protocols. The proposed load management protocol can be applied to efficiently coordinate the resource brokering services of distributed computing systems such as grids and PlanetLab. Resource brokering services are the main components that control the way applications are scheduled, managed, and allocated in a distributed, heterogeneous, and dynamic Grid computing environment. Existing Grid resource brokers and e-Science application workflow schedulers operate in tandem but still lack a coordination mechanism that could lead to efficient application schedules across distributed resources. Further, this lack of coordination degrades the utilization of various resources (such as computing cycles and network bandwidth). The feasibility of the proposed coordinated load management protocol is studied through extensive simulations.
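The logarithmic message complexity comes from routing in the underlying DHT. The sketch below shows the basic idea of mapping coordination objects onto nodes with consistent hashing; locating the owner then costs O(log N) routing hops in DHTs such as Chord or Pastry. The ring and naming here are illustrative, not the paper's spatial index.

```python
# Sketch of storing coordination objects in a DHT: hash each object to
# a point on a ring and assign it to the first node clockwise. Routing
# to that node costs O(log N) messages in Chord/Pastry-style DHTs,
# which is where the protocol's message complexity comes from.

import bisect
import hashlib

def h(key):
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self._points = sorted((h(n), n) for n in nodes)
        self._keys = [p for p, _ in self._points]

    def owner(self, obj_key):
        # First node clockwise from the object's position, with wraparound.
        i = bisect.bisect(self._keys, h(obj_key)) % len(self._points)
        return self._points[i][1]

ring = Ring(["broker-a", "broker-b", "broker-c"])
print(ring.owner("claim/cluster-7/cpu"))  # node coordinating this resource claim
```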

9.
Compute-intensive applications have gradually changed focus from massively parallel supercomputers to capacity as a resource obtained on demand. This is particularly true for the large-scale adoption of cloud computing and MapReduce in industry, while it has been difficult for traditional high-performance computing (HPC) usage in scientific and engineering computing to exploit this type of resource. However, with the strong trend of increasing parallelism rather than faster processors, a growing number of applications target parallelism already at the algorithm level, with loosely coupled approaches based on sampling and ensembles. While these cannot trivially be formulated as MapReduce, they are highly amenable to throughput computing. There are many general and powerful frameworks, but for sampling-based algorithms in scientific computing in particular, there are clear advantages to having a platform and scheduler that are highly aware of the underlying physical problem. Here, we present how these challenges are addressed with combinations of dataflow programming, peer-to-peer techniques, and peer-to-peer networks in the Copernicus platform. This allows automation of sampling-focused workflows, task generation, dependency tracking, and, not least, distributing these to a diverse set of compute resources ranging from supercomputers to clouds and distributed computing (across firewalls and fragile networks). Workflows are defined from modules using existing programs, which makes them reusable without programming requirements. The system achieves resiliency by handling node failures transparently, with checkpointing keeping the loss of computing time minimal, and a single server can manage hundreds of thousands of cores, e.g. for computational chemistry applications.

10.
A mobile ad hoc computational grid is a distributed computing infrastructure that allows mobile nodes to share computing resources in a mobile ad hoc environment. Compared to traditional distributed systems such as grids and clouds, resource allocation in mobile ad hoc computational grids is not straightforward because of node mobility, limited battery power, and an infrastructure-less network environment. Existing schemes are either based on a decentralized architecture that results in poor allocation decisions or assume independent tasks. This paper presents a scheme that allocates interdependent tasks and aims to reduce task completion time and the amount of energy consumed in transmitting data. The scheme comprises two key algorithms: resource selection and resource allocation. The resource selection algorithm is designed to select nodes that remain connected for a longer period, whereas the resource allocation algorithm assigns interdependent tasks to the nodes that are accessible at the minimum transmission power. The scheme is based on a hybrid architecture that results in effective allocation decisions, reduces the communication cost associated with the exchange of control information, and distributes the processing burden among the nodes. The paper also investigates the relationship between data transfer time and transmission energy consumption, and presents a power-based routing protocol to reduce data transfer costs and transmission energy consumption. Copyright © 2014 John Wiley & Sons, Ltd.
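A hedged sketch of the two selection criteria combined above: prefer nodes predicted to stay connected longer and reachable at lower transmission power. The scoring function and weights are illustrative assumptions, not the paper's algorithm.

```python
# Illustrative node-selection sketch: rank candidates by a weighted
# combination of predicted link lifetime and transmission power. The
# normalization and alpha weight are assumptions for illustration.

def select_nodes(candidates, k, alpha=0.5):
    """Pick k nodes maximizing a connectivity/power score.

    candidates: dict name -> (predicted_link_lifetime_s, tx_power_mw).
    """
    max_life = max(life for life, _ in candidates.values())
    max_pw = max(pw for _, pw in candidates.values())

    def score(entry):
        life, pw = entry
        # Longer connectivity is better; lower transmission power is better.
        return alpha * (life / max_life) + (1 - alpha) * (1 - pw / max_pw)

    ranked = sorted(candidates, key=lambda n: score(candidates[n]), reverse=True)
    return ranked[:k]

print(select_nodes({"n1": (120, 80), "n2": (300, 40), "n3": (60, 10)}, k=2))
```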

11.
High-speed wide-area networks are expected to enable innovative applications that integrate geographically distributed, high-performance computing, database, graphics, and networking resources. However, there is as yet little understanding of the higher-level services required to support these applications, or of the techniques required to implement these services in a scalable, secure manner. We report on a large-scale prototyping effort that has yielded some insights into these issues. Building on the hardware base provided by the I-WAY, a national-scale asynchronous transfer mode (ATM) network, we developed an integrated management and application programming system called I-Soft. This system was deployed at most of the 17 I-WAY sites and used by many of the 60 applications demonstrated on the I-WAY network. In this paper we describe the I-Soft design and report on lessons learned from application experiments. We focus on four novel concepts that we believe may have relevance to future, similar systems: point-of-presence machines as a means of simplifying implementation and management; scheduler proxies to integrate local schedulers with computational resource brokers; authentication proxies to provide a uniform authentication environment across multiple administrative domains; and network-aware parallel programming tools to hide heterogeneity and improve performance in heterogeneous environments. Lessons learned in building I-Soft have motivated subsequent research and development efforts in the Globus project. © 1998 John Wiley & Sons, Ltd.

12.
This paper presents a new approach to implementing the adaptability loop in Autonomic Computing (AC) systems, based on adaptable aspects. The approach utilizes the concept of adaptable aspect-oriented programming (AAOP), in which a set of AOP aspects is used to run an application in the manner specified by its adaptability strategy. We present a model execution environment based on this concept, enabling the execution of applications with applied adaptability strategies. In the AAOP-based AC system, the application is instrumented with aspects selected by the system from the set of all available aspects (sensors, effectors, and goal aspects) in such a way that the system can monitor and manage the application. This model can be used to implement systems that are able to monitor an application and its execution environment, and perform actions such as changing the current set of non-functional constraints in response to changes in the application or its environment. The model can be used for various types of non-functional goals and in various programming languages, in both centralized and distributed environments. This paper describes its Java-based implementation and non-functional goals related to resource management. As a consequence, the application uses resources in the way specified by its adaptability strategy. The resource consumption management logic is transparent to the application, meaning that no modifications to the application source code are needed. Copyright © 2010 John Wiley & Sons, Ltd.
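As a loose, cross-language analogy to the aspect-based adaptability loop (the paper's implementation is Java-based), a Python decorator can act as a "sensor" aspect woven around application code without modifying it. All names below are illustrative.

```python
# Conceptual sketch only: a decorator playing the role of a "sensor"
# aspect that monitors a non-functional property (wall-clock time) and
# notifies a handler, which an "effector" could use to adapt. This is
# an analogy to the AAOP idea, not the paper's Java/AspectJ mechanism.

import functools
import time

def timing_sensor(threshold_s, on_violation):
    """Aspect that reports when a call exceeds its time budget."""
    def weave(fn):
        @functools.wraps(fn)
        def advice(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            elapsed = time.perf_counter() - start
            if elapsed > threshold_s:
                on_violation(fn.__name__, elapsed)  # an effector could adapt here
            return result
        return advice
    return weave

@timing_sensor(0.01, lambda name, t: print(f"{name} exceeded budget: {t:.3f}s"))
def compute():
    time.sleep(0.02)  # application code, unmodified by the monitoring logic

compute()
```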

13.
The grid is a promising infrastructure that allows scientists and engineers to access resources across geographically distributed environments. Grid computing is a technology that focuses on aggregating resources (e.g., processor cycles, disk storage, and contents) from a large-scale computing platform. Making grid computing a reality requires a resource broker to manage and monitor available resources. This paper presents a workflow-based resource broker whose main functions are matching available resources with user requests and considering network status information during matchmaking in computational grids. The resource broker provides a graphical user interface for accessing available and appropriate resources via user credentials. The broker uses the Ganglia and NWS tools to monitor resource status and network-related information, respectively. We then propose a history-based execution time estimation model that predicts the execution time of parallel applications from previous execution results. The experimental results show that our model can accurately predict the execution time of embarrassingly parallel applications. We also report on using the Globus Toolkit to construct a grid platform called the TIGER project, which integrates resources distributed across five universities in Taichung city, Taiwan, where the resource broker was developed.
Po-Chi Shih
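A history-based execution time estimate of the kind described can be sketched in a few lines: predict a job's runtime on a resource from the mean of its recent runs there. The window size and fallback default are illustrative assumptions, not the paper's model.

```python
# Sketch of a history-based execution time estimator: the prediction
# is the mean of the most recent runs of an application on a site,
# with a default when no history exists. Parameters are illustrative.

from collections import defaultdict, deque

class HistoryEstimator:
    def __init__(self, window=5, default=600.0):
        self._runs = defaultdict(lambda: deque(maxlen=window))
        self._default = default

    def record(self, app, site, seconds):
        """Store an observed runtime after a job completes."""
        self._runs[(app, site)].append(seconds)

    def predict(self, app, site):
        """Mean of recent runs, or the default if this pair is unseen."""
        hist = self._runs[(app, site)]
        return sum(hist) / len(hist) if hist else self._default

est = HistoryEstimator()
for t in (118.0, 122.0, 125.0):
    est.record("povray", "tiger-node-3", t)
print(est.predict("povray", "tiger-node-3"))  # ~121.7 s
```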

14.
QoS in grid computing
Grid computing is already a mainstream paradigm for resource-intensive scientific applications, but it also promises to become the future model for enterprise applications. The grid enables resource sharing and dynamic allocation of computational resources, thus increasing access to distributed data, promoting operational flexibility and collaboration, and allowing service providers to scale efficiently to meet variable demands. Large-scale grids are complex systems composed of thousands of components from disjoint domains. Planning the capacity to guarantee quality of service (QoS) in such environments is a challenge because global service-level agreements (SLAs) depend on local SLAs. We provide a motivating example for grid computing in an enterprise environment and then discuss how resource allocation affects SLAs.

15.
Predicting the resources that are consumed by a program component is crucial for many parallel or distributed systems. In this context, the main resources of interest are execution time, space, and communication/synchronisation costs. There has recently been significant progress in resource analysis technology, notably in type-based analyses and abstract interpretation. At the same time, parallel and distributed computing are becoming increasingly important. This paper synthesises progress in both areas to survey the state-of-the-art in resource analysis for parallel and distributed computing. We articulate a general model of resource analysis and describe parallel/distributed resource analysis together with its relationship to sequential analysis. We use three parallel or distributed resource analyses as examples and provide a critical evaluation of each. We investigate why the chosen analysis is effective for each application and identify general principles governing when resource analysis is effective. Copyright © 2011 John Wiley & Sons, Ltd.

16.
Parallel programming environments, as tools in distributed parallel systems for developing, designing, and debugging parallel applications and controlling their execution, play an important role in the research, development, and adoption of parallel processing technology. This paper analyzes and discusses the system characteristics and implementation methods of the parallel programming environments Express and PVM, and describes the porting of the Express system to a parallel graph-reduction intelligent workstation.

17.
In recent years, a variety of computational sites and resources have emerged, and users often have access to multiple distributed resources. These sites are heterogeneous in nature, and the performance of different tasks in a workflow varies from one site to another. Additionally, users typically have a limited resource allocation at each site, capped by administrative policies. In such cases, a judicious scheduling strategy is required to map tasks in the workflow to resources so that the workload is balanced among sites and data transfer overhead is minimized. Most existing systems either run the entire workflow at a single site, use naïve approaches to distribute the tasks across sites, or leave it to the user to optimize the allocation of tasks to distributed resources. This results in a significant loss of productivity. We propose a multi-site workflow scheduling technique that uses performance models to predict the execution time on resources and dynamic probes to identify the achievable network throughput between sites. We evaluate our approach on real-world applications using the Swift parallel and distributed execution framework, in two distinct computational environments: multiple geographically distributed clusters and multiple clouds. We show that our approach improves resource utilization and reduces execution time when compared to the default schedule.
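The cost model implied above can be sketched as follows: send a task to the site minimizing predicted execution time plus input transfer time (from a dynamic throughput probe), skipping sites whose allocation is exhausted. The structure and numbers are illustrative assumptions, not Swift's scheduler.

```python
# Illustrative multi-site scheduling cost model: predicted execution
# time (performance model) plus transfer time (network probe), with an
# allocation cap per site. Values and layout are assumptions.

def best_site(task_runtime, input_bytes, sites):
    """sites: dict name -> (runtime_factor, throughput_Bps, allocation_left_s)."""
    best, best_cost = None, float("inf")
    for name, (factor, throughput, allocation) in sites.items():
        exec_time = task_runtime * factor       # site-specific performance model
        transfer = input_bytes / throughput     # from a dynamic network probe
        if exec_time > allocation:
            continue                            # allocation cap reached; skip site
        cost = exec_time + transfer
        if cost < best_cost:
            best, best_cost = name, cost
    return best, best_cost

print(best_site(300.0, 2e9, {
    "cluster-a": (1.0, 5e7, 10_000),   # fast network, normal CPUs
    "cloud-b":   (1.4, 1e7, 50_000),   # slower, but a larger allocation
}))
```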

18.
A distributed system consists of a collection of autonomous heterogeneous resources that provide resource sharing and a common platform for running parallel compute-intensive applications. The differing application characteristics, combined with the heterogeneity and performance variations of the distributed system, make it difficult to find the optimal set of needed resources. When deployed, user applications are usually handled by application domain experts or system administrators who, depending on the infrastructure, provide a scheduling strategy for selecting the best candidate resource from a set of available resources. However, the provided strategy is usually generic, aimed at handling a wide array of applications, and does not take specific application resource requirements into consideration. As such, an intelligent method for selecting the best resources based on expert knowledge is needed. In this paper, we propose a neural network-based multi-agent resource selection technique capable of mimicking the services of an expert user. In addition, to cope with the geographical distribution of the underlying system, we employ a multi-agent coordination mechanism. The proposed neural network-based scheduling framework combined with multi-agent intelligence is a unique approach to the resource selection problem. Results from a simulated environment show the efficiency of our proposed method. Several scheduling simulations were conducted to compare the performance of conventional resource selection methods against the proposed agent-based neural network technique. The results indicate that the agent-based approach outperformed the classical algorithms by reducing the amount of time required to search for suitable resources, irrespective of the resource pool size. Copyright © 2016 John Wiley & Sons, Ltd.
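Conceptually, each agent scores candidate resources with a small neural network. The sketch below uses an untrained two-layer network with made-up features purely for illustration; the paper's network is trained to mimic an expert user's selections.

```python
# Conceptual sketch of neural-network resource scoring. The
# architecture, features, and (untrained, random) weights are
# illustrative assumptions only.

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 8)), np.zeros(8)   # 3 features -> 8 hidden units
W2, b2 = rng.normal(size=8), 0.0                # hidden -> suitability score

def score(features):
    """features: [normalized CPU speed, free memory fraction, 1 - load]."""
    h = np.tanh(features @ W1 + b1)
    return float(h @ W2 + b2)

resources = {"gridA": [0.9, 0.5, 0.8], "gridB": [0.6, 0.9, 0.3]}
best = max(resources, key=lambda r: score(np.array(resources[r])))
print(best)  # whichever resource the (untrained) network scores higher
```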

19.
Cloud computing is a recent advancement wherein IT infrastructure and applications are provided as 'services' to end-users under a usage-based payment model. It can provision virtualized services on the fly, based on requirements (workload patterns and QoS) that vary with time. The application services hosted under the Cloud computing model have complex provisioning, composition, configuration, and deployment requirements. Evaluating the performance of Cloud provisioning policies, application workload models, and resource performance models in a repeatable manner under varying system and user configurations and requirements is difficult to achieve. To overcome this challenge, we propose CloudSim: an extensible simulation toolkit that enables modeling and simulation of Cloud computing systems and application provisioning environments. The CloudSim toolkit supports both system and behavior modeling of Cloud system components such as data centers, virtual machines (VMs), and resource provisioning policies. It implements generic application provisioning techniques that can be extended with ease and limited effort. Currently, it supports modeling and simulation of Cloud computing environments consisting of both single and inter-networked clouds (federations of clouds). Moreover, it exposes custom interfaces for implementing policies and provisioning techniques for allocation of VMs under inter-networked Cloud computing scenarios. Several researchers from organizations such as HP Labs in the U.S.A. are using CloudSim in their investigations of Cloud resource provisioning and energy-efficient management of data center resources. The usefulness of CloudSim is demonstrated by a case study involving dynamic provisioning of application services in a hybrid federated cloud environment. The results of this case study show that the federated Cloud computing model significantly improves application QoS under fluctuating resource and service demand patterns. Copyright © 2010 John Wiley & Sons, Ltd.
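CloudSim itself is a Java toolkit; as a language-neutral illustration of the core abstraction it simulates, the sketch below models a provisioning policy that maps VM requests onto hosts with finite capacity. All class and field names are illustrative, not CloudSim's API.

```python
# Conceptual sketch of the kind of entity a cloud simulator models:
# hosts with finite capacity and a pluggable VM provisioning policy.
# Names are illustrative assumptions, not CloudSim's Java API.

class Host:
    def __init__(self, name, mips, ram_mb):
        self.name, self.free_mips, self.free_ram = name, mips, ram_mb

class FirstFitProvisioner:
    """Allocate each VM request to the first host that can hold it."""
    def __init__(self, hosts):
        self.hosts = hosts

    def allocate(self, vm_mips, vm_ram):
        for h in self.hosts:
            if h.free_mips >= vm_mips and h.free_ram >= vm_ram:
                h.free_mips -= vm_mips
                h.free_ram -= vm_ram
                return h.name
        return None  # rejected; a federated setup could forward the request

dc = FirstFitProvisioner([Host("h0", 4000, 8192), Host("h1", 2000, 4096)])
# Third request spills over to h1; the fourth is rejected (None).
print([dc.allocate(1500, 2048) for _ in range(4)])
```

Swapping in a different policy class (e.g. best fit, or an energy-aware mapper) is the kind of experiment such a simulator makes repeatable.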

20.
Interoperability is a core problem facing existing P2P systems. This paper proposes a cross-protocol P2P resource-sharing model based on a distributed hash table (DHT) structure, in which the resource-sharing information of multiple P2P systems is stored in a distributed manner in the system's underlying DHT network. Each node in the model implements multiple P2P clients, can participate in several P2P systems, and can intelligently discover identical shared resources across different P2P systems. By downloading data in parallel from multiple P2P systems, the system greatly reduces user response time and file download completion time, and provides content-integrity QoS guarantees. Simulation results show that the model outperforms existing P2P systems in robustness, scalability, and user experience.
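The core indexing idea can be sketched simply: derive one DHT key from a file's content hash and record, under that key, every (protocol, source) pair that serves the file, so a downloader can pull chunks from several networks in parallel. The record layout is an illustrative assumption.

```python
# Sketch of cross-protocol resource indexing: the same shared file,
# announced from different P2P systems, lands under one content-hash
# key in a common DHT. A dict stands in for the DHT network here, and
# the record layout is an illustrative assumption.

import hashlib
from collections import defaultdict

dht = defaultdict(list)  # stand-in for the underlying DHT network

def content_key(data):
    return hashlib.sha1(data).hexdigest()

def publish(data, protocol, locator):
    """Announce that `locator` serves this content on `protocol`."""
    dht[content_key(data)].append((protocol, locator))

def sources(data):
    """All known sources across protocols, for parallel download."""
    return dht[content_key(data)]

blob = b"shared-file-bytes"
publish(blob, "BitTorrent", "peer1:6881")
publish(blob, "eDonkey", "peer2:4662")
print(sources(blob))  # fetch chunks from both networks concurrently
```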
