Found 20 similar documents (search time: 15 ms)
The SweGrid Accounting System (SGAS) allocates capacity in collaborative Grid environments by coordinating enforcement of Grid‐wide usage limits as a means to offer usage guarantees and prevent overuse. SGAS employs a credit‐based allocation model where Grid capacity is granted to projects via Grid‐wide quota allowances that can be spent across the Grid resources. The resources collectively enforce these allowances in a soft, real‐time manner. SGAS is built on service‐oriented principles with a strong focus on interoperability and Web services standards. This article covers the SGAS design and implementation, which, besides addressing inherent Grid challenges (scale, security, heterogeneity, decentralization), emphasizes generality and flexibility to produce a customizable system with lightweight integration into different middleware and scheduling system combinations. We focus the discussion around the system design, a flexible allocation model, middleware integration experiences and scalability improvements via a distributed virtual banking system, and finally, an extensive set of testbed experiments. The experiments evaluate the performance of SGAS in terms of response times, request throughput, overall system scalability, and its performance impact on the Globus Toolkit 4 job submission software. We conclude that, for all practical purposes, the quota enforcement overhead incurred by SGAS on job submissions is not a limiting factor for the job‐handling capacity of the job submission software. Copyright © 2008 John Wiley & Sons, Ltd.

The problem of Grid‐middleware interoperability is addressed by the design and analysis of a feature‐rich, standards‐based framework for all‐to‐all cross‐middleware job submission. The architecture is designed with focus on generality and flexibility and builds on extensive use, internally and externally, of (proposed) Web and Grid services standards such as WSRF, JSDL, GLUE, and WS‐Agreement. The external use provides the foundation for easy integration into specific middlewares, which is performed by the design of a small set of plugins for each middleware. Currently, plugins are provided for integration into Globus Toolkit 4 and NorduGrid/ARC. The internal use of standard formats facilitates customization of the job submission service by replacement of custom components for performing specific well‐defined tasks. Most importantly, this enables the easy replacement of resource selection algorithms by algorithms that address the specific needs of a particular Grid environment and job submission scenario. By default, the service implements a decentralized brokering policy, striving to optimize the performance for the individual user by minimizing the response time for each job submitted. The algorithms in our implementation perform resource selection based on performance predictions, and provide support for advance reservations as well as coallocation of multiple resources for coordinated use. The performance of the system is analyzed with focus on overall service throughput (up to over 250 jobs per min) and individual job submission response time (down to under 1 s). Copyright © 2009 John Wiley & Sons, Ltd.

This paper proposes a resource broker with a user-friendly job submission interface, developed on a platform built with the Globus Toolkit. The broker uses a domain-based network information model, together with a dynamic version of it, to measure network status, and it monitors and collects resource status and network-related information as the basis for its brokerage. The paper also proposes a network bandwidth-aware job scheduling algorithm that brokers suitable Grid resources to communication-intensive jobs; the algorithm improves on the authors' previously developed network information model while preserving its advantages. Using timely information, the resource broker effectively matches Grid resources to user requests, thus improving job execution efficiency.

To achieve high-performance distributed data access and computing in a Grid environment, monitoring of resource and network performance is vital. Our proposed Grid network monitoring architecture is modeled by the Grid scheduler. The proposed Grid network monitoring retrieves network metrics using sensors as network monitoring tools. Mobile agents are migrated from the Resource Broker to start the sensors, which measure the network metrics at all Grid resources. The raw data provided by the monitoring tools is used to produce a high-level view of the Grid through a set of internal cost functions. The network cost function is formed by combining network metrics such as bandwidth, round-trip time, jitter, and packet loss to measure network performance. This paper presents a Grid resource brokering strategy that analyzes the network metrics along with the resource metrics to select the Grid resource to which a job is submitted; the proposed approach is integrated with the CARE Resource Broker (CRB) for job submission. The experimental results show that the strategy minimizes the completion time of submitted jobs. The simulation results also show that more jobs are completed with the proposed strategy, which leads to better utilization of the Grid resources.
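
The abstract above says the network cost function combines bandwidth, round-trip time, jitter, and packet loss, but it does not give the formula. The following is a minimal sketch of one way such a function could look, assuming a weighted sum of metrics normalized to [0, 1]. The weights, reference values, and all names (`NetworkMetrics`, `network_cost`, `pick_resource`) are hypothetical and are not taken from the paper or from CRB.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class NetworkMetrics:
    """Measured link metrics for one Grid resource (illustrative fields)."""
    bandwidth_mbps: float
    rtt_ms: float
    jitter_ms: float
    packet_loss: float  # fraction in [0, 1]


def network_cost(
    m: NetworkMetrics,
    weights: Tuple[float, float, float, float] = (0.4, 0.3, 0.2, 0.1),  # assumed
    ref_bandwidth_mbps: float = 1000.0,  # assumed normalization constants
    ref_rtt_ms: float = 500.0,
    ref_jitter_ms: float = 100.0,
) -> float:
    """Return a cost in [0, 1]; lower means a better network path."""
    w_bw, w_rtt, w_jit, w_loss = weights
    # High bandwidth is good, so it contributes (1 - normalized bandwidth).
    bw_term = 1.0 - min(m.bandwidth_mbps / ref_bandwidth_mbps, 1.0)
    # The remaining metrics are "lower is better" and are clipped at 1.
    rtt_term = min(m.rtt_ms / ref_rtt_ms, 1.0)
    jit_term = min(m.jitter_ms / ref_jitter_ms, 1.0)
    loss_term = min(max(m.packet_loss, 0.0), 1.0)
    return w_bw * bw_term + w_rtt * rtt_term + w_jit * jit_term + w_loss * loss_term


def pick_resource(candidates: Dict[str, NetworkMetrics]) -> str:
    """Pick the candidate resource with the lowest network cost."""
    if not candidates:
        raise ValueError("no candidate resources")
    return min(candidates, key=lambda name: network_cost(candidates[name]))
```

A broker following the strategy in the abstract would combine a score like this with resource metrics (load, queue length, and so on) before choosing where to submit; this sketch covers only the network part.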

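Research on Grid Resource Management Architecture Models   Total citations: 16, self-citations: 0, citations by others: 16
Resources in a Grid are geographically distributed, heterogeneous, and owned by multiple organizations, each with different usage, access, and cost models. Managing resources in such a large, distributed environment is a very complex task. This paper introduces and discusses the three main Grid resource management architecture models in current use and their application in Grid computing systems. The hierarchical model is the one used by most current Grid computing systems for resource management; the abstract owner model follows an order-and-delivery pattern for job submission and result collection; the computational economy model combines the essence of the hierarchical and abstract owner models and reflects the application of computational economy in Grid resource management systems. Finally, the paper analyzes the characteristics of the three models and concludes.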

The Grid paradigm for accessing heterogeneous distributed resources has proved to be extremely effective, as many organizations rely on Grid middlewares for their computational needs. Many different middlewares exist, and the result is a proliferation of self-contained, non-interoperable “Grid islands”. This means that different Grids, based on different middlewares, cannot share resources; for example, jobs submitted on one Grid cannot be forwarded for execution on another. To address this problem, standard interfaces are being proposed for some of the important functionalities provided by most Grids, namely job submission and management, authorization and authentication, resource modeling, and others. In this paper we review some recent standards that address interoperability for three types of services: the BES/JSDL specifications for job submission and management, the SAML notation for authorization and authentication, and the GLUE specification for resource modeling. We describe how standards-enhanced Grid components can be used to create interoperable building blocks for a Grid architecture. Furthermore, we describe how existing components from the gLite middleware have been re-engineered to support BES/JSDL, GLUE and SAML. From this experience we draw some conclusions on the strengths and weaknesses of these specifications, and on how they can be improved.

The Data Grid has evolved to be the solution for data-intensive applications, such as High Energy Physics (HEP), astrophysics, and computational genomics. These applications usually have large input data sets to be analyzed, and these input data are widely replicated across the Data Grid to improve performance. The job scheduling performance of traditional computing jobs can be studied using queuing theory. However, with the addition of data transfer, the job scheduling performance is too complex to be modeled. In this research, we study the impact of data transfer on the performance of job scheduling in the Data Grid environment. We propose a parallel downloading system that supports replicating data fragments and downloading the replicated fragments in parallel, to improve job scheduling performance. The performance of the parallel downloading system is compared with that of a non-parallel downloading system, using three scheduling heuristics: Shortest Turnaround Time (STT), Least Relative Load (LRL) and Data Present (DP). Our simulation results show that the proposed parallel download approach greatly improves Data Grid performance for all three scheduling algorithms, in terms of the geometric mean of job turnaround time. The advantage of the parallel downloading system is most evident when the Data Grid has relatively low network bandwidth and relatively high computing power.
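
The abstract above compares the two downloading systems by the geometric mean of job turnaround time. As a minimal sketch of how that metric is computed, the function below takes a list of per-job turnaround times; the function names and the idea of reporting a ratio between two runs are illustrative assumptions, not details from the paper.

```python
import math
from typing import Sequence


def geometric_mean(turnaround_times: Sequence[float]) -> float:
    """Geometric mean of strictly positive job turnaround times.

    Computed as exp(mean(log(t))) so that long job lists do not
    overflow the way a direct product could.
    """
    if not turnaround_times:
        raise ValueError("need at least one turnaround time")
    if any(t <= 0 for t in turnaround_times):
        raise ValueError("turnaround times must be positive")
    log_sum = sum(math.log(t) for t in turnaround_times)
    return math.exp(log_sum / len(turnaround_times))


def improvement_ratio(baseline: Sequence[float], candidate: Sequence[float]) -> float:
    """Ratio of geometric means; a value above 1 means the candidate is faster."""
    return geometric_mean(baseline) / geometric_mean(candidate)
```

Compared with the arithmetic mean, the geometric mean is less dominated by a few very long jobs, which is one common reason to prefer it when turnaround times span orders of magnitude. Python 3.8 and later also provide `statistics.geometric_mean` in the standard library.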

In this paper, a distributed and scalable Grid service management architecture is presented. The proposed architecture is capable of monitoring task submission behaviour and deriving Grid service class characteristics, for use in performing automated computational, storage and network resource-to-service partitioning. This partitioning of Grid resources amongst service classes (each service class is assigned exclusive usage of a distinct subset of the available Grid resources), along with the dynamic deployment of Grid management components dedicated and tuned to the requirements of a particular service class, introduces the concept of Virtual Private Grids. We present two distinct algorithmic approaches for the resource partitioning problem, the first based on Divisible Load Theory (DLT) and the second built on Genetic Algorithms (GA). The advantages and drawbacks of each approach are discussed and their performance is evaluated on a sample Grid topology using NSGrid, an ns-2 based Grid simulator. Results show that the use of this Service Management Architecture in combination with the proposed algorithms improves computational and network resource efficiency, simplifies scheduling decisions, reduces the overall complexity of managing the Grid system, and at the same time improves Grid QoS support (with regard to job response times) by automatically assigning Grid resources to the different service classes prior to scheduling.

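Design and Implementation of an Optimized Job Scheduling Model for High-Performance Computing Environments   Total citations: 1, self-citations: 0, citations by others: 1
A high-performance computing environment aggregates multiple HPC resources distributed across different regions and organizations, offers users a unified access point and usage model, and relies on system middleware to match user job requests to suitable HPC resources. As the environment's application programming interfaces have been opened up and the number of job requests has grown sharply, the immediate scheduling model currently in use fails to process a certain number of requests under highly concurrent job submission, for network and other reasons, and it also lacks flexibility. To address this problem, the environment's job scheduling model is optimized: an environment-level job queue is introduced, job states at the system layer are refined, and job scheduling policies are made more configurable. A system prototype is implemented on top of the environment middleware SCE. In tests under a workload in which a single core service handles nearly 200 job submission requests per minute, no job submission errors caused by system or network problems occurred; of 1,000 jobs in total, nearly 500 job submission command requests completed within 0.3 s and more than 800 completed within 0.5 s.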

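Job management is the interface through which users access Grid services and Grid resources. This paper proposes a general block diagram of a job management system and a Web-based job submission model, analyzes the job submission process in detail, and describes the implementation of key issues such as user management, the submission service, and task processing.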

Grids offer a dramatic increase in the number of available processing and storage resources that can be delivered to applications. However, efficient job submission and management remain far from accessible to ordinary scientists and engineers because of the dynamic and complex nature of Grids. This paper describes a new Globus-based framework that allows easier and more efficient execution of jobs in a ‘submit and forget’ fashion. The framework automatically performs the steps involved in job submission and also watches over the job's efficient execution. In order to obtain a reasonable degree of performance, job execution is adapted to dynamic resource conditions and application demands. Adaptation is achieved by supporting automatic application migration following performance degradation, ‘better’ resource discovery, requirement change, owner decision or remote resource failure. The framework is currently functional on any Grid testbed based on Globus because it does not require new system software to be installed in the resources. The paper also includes practical experiences of the behavior of our framework on the TRGP and UCM‐CAB testbeds. Copyright © 2004 John Wiley & Sons, Ltd.

Pilot-job systems emerged as a computation paradigm to cope with the heterogeneity of large-scale production grids, greatly reducing fault ratios and middleware overheads. They are now widely adopted to sustain the computation of scientific applications on such platforms. However, a model of pilot-job systems is still lacking, making it difficult to build realistic experimental setups for their study (e.g. simulators or controlled platforms). The variability of production conditions, background loads and resource characteristics further complicates this issue. This paper presents a model of pilot-job resource provisioning. Based on a probabilistic model of pilot submission and registration, the number of pilots registered to the application host and the makespan of a divisible-load application are derived. The model takes job failures into account and makes no assumptions about the characteristics of the computing resources, the scheduling algorithm or the background load. Only minimally invasive monitoring of the grid is required. The model is evaluated in production conditions, using logs acquired on a pilot-job server deployed in the biomed virtual organization of the European Grid Infrastructure. Experimental results show that the model accurately describes the number of registered pilots over time periods ranging from a few hours to a few days and under different pilot submission conditions.

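An Execution Time Prediction System for Grid Applications   Total citations: 2, self-citations: 0, citations by others: 2
To make rational use of Grid resources and improve application execution performance, application execution times need to be predicted in real time, giving the task scheduling system and Grid users a basis for scheduling. This work builds an execution time prediction system that periodically generates prediction information, converts it into a unified format, and registers it with the Grid information service. It predicts application execution time using a resource mapping method, and a set of experiments is designed to test system performance. The experimental results show that the system predicts execution time responsively and with low overhead, and that the prediction error is small.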

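Research on Cluster System Job Management in a Grid Environment   Total citations: 2, self-citations: 4, citations by others: 2
Grid computing has gradually become an important new field. Compared with traditional distributed computing, its distinguishing feature is the ability to share all kinds of resources on the network, including geographically distributed computing resources. PBS is a job management system widely used on parallel computers; it can allocate system resources to each job relatively fairly according to user-defined configuration parameters. Managing cluster systems across a Grid environment, however, remains an open research topic. Using Grid system software and cluster management software, this work implements a method for managing cluster system jobs in a Grid environment.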

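The Grid computational market model applies economic concepts to Grid resource management and scheduling. A Grid resource management system based on this model borrows the market regulation mechanisms of competition in human society and performs resource management and task scheduling according to users' economic requirements. This not only lets resource owners and resource consumers each achieve their own economic goals, but also lets consumers use lightly loaded and inexpensive resources, achieving global optimization and rational use of Grid resources as a whole.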

We present algorithms, methods, and software for a Grid resource manager that performs resource brokering and job scheduling in production Grids. This decentralized broker selects computational resources based on actual job requirements, job characteristics, and information provided by the resources, with the aim of minimizing the total time to delivery for the individual application. The total time to delivery includes the time for program execution, batch queue waiting, and transfer of executable and input/output data to and from the resource. The main features of the resource broker include two alternative approaches to advance reservations, resource selection algorithms based on computer benchmark results and network performance predictions, and a basic adaptation facility. The broker is implemented as a built-in component of a job submission client for the NorduGrid/ARC middleware.
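
The abstract above defines total time to delivery as the sum of program execution time, batch queue waiting time, and the time to transfer the executable and input/output data to and from the resource. The sketch below illustrates selection by that criterion, assuming per-resource estimates are already available; the class and function names are hypothetical, and how the broker actually derives its estimates from benchmarks and network predictions is not shown.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DeliveryEstimate:
    """Predicted time components, in seconds, for one candidate resource."""
    stage_in_s: float     # transfer of executable and input data to the resource
    queue_wait_s: float   # predicted batch queue waiting time
    execution_s: float    # predicted program execution time
    stage_out_s: float    # transfer of output data back from the resource

    def total_s(self) -> float:
        """Total time to delivery: the sum of all four components."""
        return (self.stage_in_s + self.queue_wait_s
                + self.execution_s + self.stage_out_s)


def select_resource(estimates: Dict[str, DeliveryEstimate]) -> str:
    """Return the resource name with the smallest total time to delivery."""
    if not estimates:
        raise ValueError("no candidate resources")
    return min(estimates, key=lambda name: estimates[name].total_s())
```

Under this criterion a slower machine with a short queue can win over a faster machine with a long queue, because the decision is made on the sum rather than on execution time alone.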

The Data Grid integrates geographically distributed resources for solving data-intensive scientific applications. Effective scheduling in the Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. Scheduling is a traditional problem in parallel and distributed systems. However, because of the special issues and goals of the Grid, traditional approaches are no longer effective in this environment. Therefore, it is necessary to propose methods specialized for this kind of parallel and distributed system. Another solution is to use a data replication strategy to create multiple copies of files and store them in convenient locations to shorten file access times. To utilize these two concepts, in this paper we develop a job scheduling policy, called hierarchical job scheduling strategy (HJSS), and a dynamic data replication strategy, called advanced dynamic hierarchical replication strategy (ADHRS), to improve data access efficiency in a hierarchical Data Grid. HJSS uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers network characteristics, the number of jobs waiting in the queue, file locations, and the disk read speed of the storage drive at data sources. Moreover, because storage capacity is limited, a good replica replacement algorithm is needed. We present a novel replacement strategy that deletes files in two steps when there is not enough free space for the new replica. First, it deletes the files with the minimum transfer time. Second, if space is still insufficient, it considers the last time the replica was requested, the number of accesses, the size of the replica, and the file transfer time. The simulation results show that our proposed algorithm performs better than other algorithms in terms of job execution time, number of intercommunications, number of replications, hit ratio, computing resource usage and storage usage.

The Data Grid provides massive aggregated computing resources and distributed storage space to deal with data-intensive applications. Because the resources available in the Grid are limited and large volumes of data are produced, efficient use of Grid resources becomes an important challenge. Data replication is a key optimization technique for reducing access latency and managing large data by storing data in a wise manner. Effective scheduling in the Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. In this paper two strategies are proposed. The first is a novel job scheduling strategy called Weighted Scheduling Strategy (WSS), which uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers the number of jobs waiting in a queue, the location of the data required by the job, and the computing capacity of the sites. The second is a dynamic data replication strategy called Enhanced Dynamic Hierarchical Replication (EDHR), which improves file access time. This strategy is an enhanced version of the Dynamic Hierarchical Replication strategy. It uses an economic model for file deletion when there is not enough space for the replica. The economic model is based on the future value of a data file. Best replica placement plays an important role in obtaining maximum benefit from replication as well as in reducing storage cost and mean job execution time, so it is also considered in this paper. The proposed strategies are implemented in OptorSim, the European Data Grid simulator. Experimental results show that the proposed strategies achieve better performance by minimizing the data access time and avoiding unnecessary replication.

We present a framework for a parallel programming model based on remote procedure calls, which bridges large-scale computing resource pools managed by multiple Grid-enabled job scheduling systems. With this system, the user can exploit not only remote servers and clusters, but also the computing resources provided by Grid-enabled job scheduling systems located at different sites. This framework requires a Grid remote procedure call (RPC) system to decouple the computation in a remote node from the Grid RPC mechanism, and it uses document-based communication rather than connection-based communication. We implemented the proposed framework as an extension of the OmniRPC system, which is a Grid RPC system for parallel programming. We designed a general interface to easily adapt the OmniRPC system to various Grid-enabled job scheduling systems, including XtremWeb, CyberGRIP, Condor and Grid Engine. We show the preliminary performance of these implementations using a phylogenetic application. We found that the proposed system can achieve approximately the same performance as OmniRPC and can handle interruptions in worker programs on remote nodes. Yoshihiro Nakajima is a Research Fellow of the Japan Society for the Promotion of Science.
