Similar Articles
1.
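To improve the success rate of cloud job scheduling and guarantee users' quality of service (QoS), a QoS-model-aware cloud job scheduling algorithm is proposed. First, jobs are divided into two types, large jobs and small jobs. Small jobs are scheduled directly on the private cloud platform, while large jobs are scheduled on the hybrid cloud platform, with the lower-weight subtasks preferentially deployed to the public cloud. Finally, simulation experiments are used to evaluate the algorithm's performance. The simulation results show that, compared with other cloud job scheduling algorithms, the algorithm improves the job scheduling success rate, shortens the actual job completion time, and produces scheduling results that satisfy users' quality of service.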
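A minimal sketch of the two-tier placement described above, in Python. The size threshold, the subtask weights, and the `public_slots` count (how many of a large job's lightest subtasks go to the public cloud) are illustrative assumptions; the abstract does not specify how weights or the cut-off are computed, and the QoS model itself is not reproduced.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    name: str
    size: float                                   # hypothetical size estimate
    subtasks: list = field(default_factory=list)  # (subtask_name, weight) pairs

def place_jobs(jobs, size_threshold, public_slots):
    """Small jobs stay on the private cloud; for each large job, its
    `public_slots` lowest-weight subtasks go to the public cloud."""
    placement = {}
    for job in jobs:
        if job.size <= size_threshold:
            placement[job.name] = "private"
            continue
        ranked = sorted(job.subtasks, key=lambda st: st[1])  # lightest first
        for i, (sub_name, _weight) in enumerate(ranked):
            target = "public" if i < public_slots else "private"
            placement[f"{job.name}/{sub_name}"] = target
    return placement
```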

2.
A computational grid provides a widely distributed platform for high-end, compute-intensive applications. Grid scheduling assigns the submitted jobs to the nodes of the grid so that some characteristic parameter is optimized. The availability of the computational nodes is one of the important characteristic parameters; it measures the probability that a node is available for job execution. This paper addresses the availability of grid computational nodes for job execution and proposes a model to maximize it. Task scheduling in grids is nondeterministic polynomial-time hard (NP-hard), so metaheuristic techniques are often applied to solve it. The genetic algorithm, a metaheuristic technique based on evolutionary computation, has been used to solve such complex optimization problems. This work proposes a technique for the grid scheduling problem that uses a genetic algorithm with the objective of maximizing availability. A simulation experiment was conducted to evaluate the performance of the proposed algorithm, and the results show the effectiveness of the model. A comparative study has also been performed. Copyright © 2014 John Wiley & Sons, Ltd.
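The sketch below shows a genetic algorithm of the kind the abstract describes, under an assumed availability model that is not taken from the paper: node failures are exponential with rate `fail_rate[i]`, so the probability that all nodes stay up while busy is exp(-Σ λ_i · busy_i), and schedules that exceed a deadline score zero. The encoding (one gene per job holding its node index), the operators, and the parameters are illustrative choices.

```python
import math
import random

def availability(assign, work, speed, fail_rate, deadline):
    """P(no node fails while busy) under exponential failures; 0 if the deadline is missed."""
    busy = [0.0] * len(speed)
    for job, node in enumerate(assign):
        busy[node] += work[job] / speed[node]
    if max(busy) > deadline:
        return 0.0
    return math.exp(-sum(l * b for l, b in zip(fail_rate, busy)))

def ga_schedule(work, speed, fail_rate, deadline, pop_size=30, generations=200,
                mutation=0.1, seed=0):
    rng = random.Random(seed)
    n_jobs, n_nodes = len(work), len(speed)
    fit = lambda a: availability(a, work, speed, fail_rate, deadline)
    # Seed the population with a round-robin schedule plus random ones.
    pop = [[j % n_nodes for j in range(n_jobs)]]
    pop += [[rng.randrange(n_nodes) for _ in range(n_jobs)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fit, reverse=True)
        next_pop = pop[:2]                                 # elitism: keep the two best
        while len(next_pop) < pop_size:
            p1, p2 = rng.sample(pop[: pop_size // 2], 2)   # parents from the fitter half
            cut = rng.randrange(1, n_jobs) if n_jobs > 1 else 0
            child = p1[:cut] + p2[cut:]                    # one-point crossover
            child = [rng.randrange(n_nodes) if rng.random() < mutation else g
                     for g in child]                       # per-gene mutation
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fit)
    return best, fit(best)
```

Because the two best individuals are always carried over, the best fitness never decreases across generations, so the result is at least as good as the seeded round-robin schedule.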

3.
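As a new service provisioning model for distributed systems, cloud computing encourages researchers to examine its advantages and drawbacks for scientific applications. The dynamic variation of cloud resources makes resource management very difficult. Divisible load theory (DLT) is used to design an efficient job scheduling strategy for cloud computing environments that minimizes the total processing time: with the load balanced across the processors, a closed-form solution is derived that assigns a fraction of the load to each processor. Scheduling jobs in this way lets the cloud provider obtain the maximum service benefit while meeting customers' quality-of-service (QoS) requirements. Finally, the performance of the strategy is quantified through rigorous simulation studies.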
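For illustration, here is the classic closed-form DLT split for a star network in which the master sends load fractions to the processors one after another and does not compute itself. This textbook model is an assumption, since the abstract does not give the exact network model. Equating the finish times of processors i and i+1 gives α_i·w_i = α_{i+1}(z_{i+1} + w_{i+1}), where z and w are the per-unit communication and computation times; normalizing so that the fractions sum to 1 yields the closed form.

```python
def dlt_fractions(comm, comp):
    """Load fractions for a star network with sequential distribution and no
    master computation, chosen so that every processor finishes at once.
    comm[i], comp[i]: time to send / process one unit of load on processor i."""
    ratios = [1.0]
    for i in range(len(comp) - 1):
        # alpha_{i+1} / alpha_i = w_i / (z_{i+1} + w_{i+1})
        ratios.append(ratios[-1] * comp[i] / (comm[i + 1] + comp[i + 1]))
    total = sum(ratios)
    return [r / total for r in ratios]

def finish_times(alpha, comm, comp):
    """Finish time of each processor for a total load of 1."""
    sent, times = 0.0, []
    for a, z, w in zip(alpha, comm, comp):
        sent += a * z              # the shared link is busy until this chunk arrives
        times.append(sent + a * w)
    return times
```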

4.
Virtualization technology makes data centers more dynamic and easier to administer. Today, cloud providers offer customers access to complex applications running on virtualized hardware. Nevertheless, big virtualized data centers become stochastic environments, and the simplification on the user side leads to many challenges for the provider, who has to find cost-efficient configurations and deal with dynamic environments to ensure service level objectives (SLOs). We introduce a software solution that reduces the degree of human intervention needed to manage clouds. It is designed as a multi-agent system (MAS) and placed on top of the Infrastructure as a Service (IaaS) layer. Worker agents allocate resources, configure applications, check the feasibility of requests, and generate cost estimates. They are equipped with application-specific knowledge that allows them to estimate the type and number of necessary resources. During runtime, a worker agent monitors the job and adapts its resources to ensure the specified quality of service, even in noisy clouds where the job instances are influenced by other jobs. The worker agents interact with a scheduler agent, which manages limited resources and performs cost-aware scheduling by assigning jobs to times with low costs. The whole architecture is self-optimizing and able to use public or private clouds. Building a private cloud requires finding a mapping of virtual machines (VMs) to hosts. We present a rule-based mapping algorithm for VMs. It offers an interface where policies can be defined and combined in a generic way. The algorithm performs the initial mapping at request time as well as remapping during runtime, and it deals with policy and infrastructure changes. An energy-aware scheduler and the availability of cheap resources provided by a spot market are analyzed.
We evaluated our approach by building a SaaS stack that assigns resources according to an energy function and ensures the SLOs of two different applications, a brokerage system and a high-performance computing software. Experiments were conducted on a real cloud system and in simulation.
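A minimal sketch of a rule-based VM-to-host mapper with pluggable policies, in the spirit of the interface described above. The policy functions (`fits`, `prefer_powered_on`, `prefer_tight_fit`) and the lexicographic combination of scores are hypothetical; the abstract does not publish the rule language.

```python
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    free_cpu: int
    free_mem: int
    powered_on: bool = True

@dataclass
class VM:
    name: str
    cpu: int
    mem: int

# Hypothetical policy interface: filters decide admissibility, scorers rank
# admissible hosts (higher is better, compared lexicographically in order).
def fits(vm, host):
    return host.free_cpu >= vm.cpu and host.free_mem >= vm.mem

def prefer_powered_on(vm, host):      # energy-aware: avoid waking idle hosts
    return 1.0 if host.powered_on else 0.0

def prefer_tight_fit(vm, host):       # consolidation: least leftover CPU
    return -(host.free_cpu - vm.cpu)

def map_vms(vms, hosts, filters, scorers):
    mapping = {}
    for vm in vms:
        candidates = [h for h in hosts if all(f(vm, h) for f in filters)]
        if not candidates:
            mapping[vm.name] = None   # request cannot be placed
            continue
        best = max(candidates, key=lambda h: tuple(s(vm, h) for s in scorers))
        best.free_cpu -= vm.cpu
        best.free_mem -= vm.mem
        best.powered_on = True
        mapping[vm.name] = best.name
    return mapping
```

Adding a new policy only means passing another function in `filters` or `scorers`, which mirrors the "defined and combined in a generic way" idea; remapping at runtime would rerun `map_vms` on the affected VMs.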

5.
Grids facilitate the creation of wide-area collaborative environments for sharing computing or storage resources and various applications. Inter-connecting distributed Grid sites through a peer-to-peer routing and information dissemination structure (also known as Peer-to-Peer Grids) is essential to avoid the scheduling efficiency bottleneck and single point of failure of centralized or hierarchical scheduling approaches. On the other hand, uncertainty and unreliability are facts of life in distributed infrastructures such as Peer-to-Peer Grids; they are triggered by multiple factors, including scale, dynamism, failures, and incomplete global knowledge. In this paper, a reputation-based Grid workflow scheduling technique is proposed to counter the effect of the inherent unreliability and temporal characteristics of computing resources in large-scale, decentralized Peer-to-Peer Grid environments. The proposed approach builds upon structured peer-to-peer indexing and networking techniques to create a scalable wide-area overlay of Grid sites that supports dependable scheduling of applications. The scheduling algorithm treats the reliability of a Grid resource as a statistical property, which is computed globally in the decentralized Grid overlay from dynamic feedback, or reputation scores, assigned by individual service consumers and mediated via Grid resource brokers. The proposed algorithm dynamically adapts to changing resource conditions and offers significant performance gains over traditional approaches in the event of unsuccessful job execution or resource failure. The results of an extensive trace-driven simulation show that our scheduling technique can reduce the makespan by up to 50% and successfully isolate failure-prone resources from the system.
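One common way to turn consumer feedback into a statistical reliability estimate is a beta reputation score, (s + 1) / (s + f + 2) for s successes and f failures. The sketch uses it together with the simplifying assumption that a failed run is retried from scratch, so the expected time is runtime / reliability. Both are assumptions, not the paper's exact formulas, and the decentralized peer-to-peer aggregation is omitted.

```python
from collections import defaultdict

class ReputationTable:
    """Aggregates success/failure feedback per resource and estimates
    reliability with a Beta(1, 1) prior: (s + 1) / (s + f + 2)."""
    def __init__(self):
        self.success = defaultdict(int)
        self.failure = defaultdict(int)

    def feedback(self, resource, ok):
        if ok:
            self.success[resource] += 1
        else:
            self.failure[resource] += 1

    def reliability(self, resource):
        s, f = self.success[resource], self.failure[resource]
        return (s + 1) / (s + f + 2)

def pick_resource(table, runtimes, min_reliability=0.0):
    """Lowest expected completion time, E[time] = runtime / reliability.
    Resources below `min_reliability` are isolated; raises ValueError if none remain."""
    eligible = {r: t for r, t in runtimes.items()
                if table.reliability(r) >= min_reliability}
    return min(eligible, key=lambda r: eligible[r] / table.reliability(r))
```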

6.
A hybrid cloud integrates private clouds and public clouds into one unified environment. For reasons of economy and efficiency, when users submit computing jobs, the hybrid cloud environment should automatically maximize the utilization rate of the private cloud and minimize the cost of the public cloud. In this paper, we propose the Adaptive-Scheduling-with-QoS-Satisfaction algorithm (AsQ) for the hybrid cloud environment, which raises the resource utilization rate of the private cloud and reduces task response time as much as possible. We exploit runtime estimation and several fast scheduling strategies for near-optimal resource allocation, which results in a high resource utilization rate and low execution time in the private cloud. Moreover, near-optimal allocation in the private cloud reduces the number of tasks that must be executed on the public cloud to meet their deadlines. For the tasks that have to be dispatched to the public cloud, we choose the minimal-cost strategy, based on task characteristics such as workload size and data size, to reduce the cost of using public clouds. Therefore, AsQ can achieve a total optimization with respect to cost and deadline constraints. Many experiments have been conducted to evaluate the performance of AsQ. The results show that AsQ outperforms recent similar algorithms in terms of task waiting time, task execution time, and task finish time, and that it achieves a better QoS satisfaction rate than other similar studies.
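A sketch of the private-first, cheapest-public-fallback decision for a single task. The instance catalog, the per-started-hour billing, and the flat per-GB transfer price are assumptions for illustration; AsQ's runtime estimator and fast scheduling strategies are not reproduced, and data transfer time is ignored.

```python
import math

def place_task(workload, data_gb, deadline_h, private_free_at_h, private_speed,
               instance_types, transfer_price_per_gb, now_h=0.0):
    """Return ("private", finish, 0.0) if the private cloud meets the deadline,
    otherwise the cheapest public option that does: (type_name, finish, cost).
    instance_types: (name, speed, price_per_hour) tuples (hypothetical catalog)."""
    start = max(now_h, private_free_at_h)
    finish = start + workload / private_speed
    if finish <= deadline_h:
        return ("private", finish, 0.0)
    best = None
    for name, speed, price_per_h in instance_types:
        run_h = workload / speed
        finish = now_h + run_h
        if finish > deadline_h:
            continue                               # this type cannot meet the deadline
        cost = price_per_h * math.ceil(run_h) + transfer_price_per_gb * data_gb
        if best is None or cost < best[2]:
            best = (name, finish, cost)
    return best                                    # None if no option meets the deadline
```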

7.
This paper presents the results and experiences of adapting and improving the Many-Task Computing (MTC) framework Kestrel for use with bag-of-tasks applications, and the STAR experiment in particular. Kestrel is a lightweight, highly available job scheduling framework for Virtual Organization Clusters (VOCs) constructed in the cloud. Kestrel uses the Extensible Messaging and Presence Protocol (XMPP) to increase MTC platform scalability and to mitigate faults in Wide Area Network (WAN) communications. Kestrel's architecture is based on the pilot job frameworks used extensively in Grid computing, with fault-tolerant communications inspired by command-and-control botnets. The extensibility of XMPP has allowed the development of protocols for identifying manager nodes, discovering the capabilities of worker agents, and distributing tasks. Presence notifications provided by XMPP allow Kestrel to monitor the global state of the pool and to dispatch tasks based on worker availability. Since its inception, Kestrel has been modified based on its performance in managing operational scientific workloads from the STAR group at Brookhaven National Laboratory. STAR provided a virtual machine image with applications for simulating proton collisions using PYTHIA and GEANT3. A Kestrel-based Virtual Organization Cluster, created on top of Clemson University's Palmetto cluster, CERN, and Amazon EC2, provided over 400,000 CPU hours of computation over the course of a month using an average of 800 virtual machine instances every day, generating nearly seven terabytes of data and the largest PYTHIA production run that STAR has achieved to date.

8.
The objective of this paper is to study the minimization of the maximum completion time, min F_max (the cycle time of the last job of a given family of jobs), using flow shop heuristic scheduling techniques. Three methods are presented: minimize idle time (MIT); Campbell, Dudek and Smith (CDS); and Palmer. An example problem with ten jobs and five machines is used to compare the results of these methods. The objective of the job scheduling policy is to minimize the cycle time of the last job scheduled in the pipeline of a given family of jobs. Three heuristic scheduling methods have been implemented. First, a sub-optimal sequence of jobs to be scheduled is generated. Second, a Petri net-based simulator with a graphical user interface for monitoring the execution of the sequence of tasks on machines is dynamically designed. A deterministic t-timed colored Petri net model has been developed and implemented for flexible manufacturing systems (FMS). Executing the deterministic timed Petri net into a reachability graph allows performance measures to be computed by applying graph traversal algorithms from the initial global state to the desired final state(s) of the production system.
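Two of the constructive heuristics named above can be sketched for a permutation flow shop, with `p[j][k]` the processing time of job j on machine k. Palmer orders jobs by decreasing slope index Σ(2k - m - 1)·p_jk (machines numbered 1..m); CDS builds m - 1 two-machine surrogate problems, solves each with Johnson's rule, and keeps the sequence with the smallest makespan. MIT and the Petri net simulator are not shown.

```python
def makespan(seq, p):
    """Permutation flow shop makespan; p[j][k] = time of job j on machine k."""
    m = len(p[0])
    finish = [0] * m                  # completion time of the latest job on each machine
    for j in seq:
        for k in range(m):
            start = max(finish[k], finish[k - 1] if k > 0 else 0)
            finish[k] = start + p[j][k]
    return finish[-1]

def palmer(p):
    m = len(p[0])
    slope = lambda j: sum((2 * (k + 1) - m - 1) * p[j][k] for k in range(m))
    return sorted(range(len(p)), key=slope, reverse=True)

def johnson(a, b):
    """Optimal order for a two-machine flow shop with times a[j], b[j]."""
    first = sorted((j for j in range(len(a)) if a[j] <= b[j]), key=lambda j: a[j])
    last = sorted((j for j in range(len(a)) if a[j] > b[j]), key=lambda j: b[j], reverse=True)
    return first + last

def cds(p):
    m = len(p[0])
    best = None
    for k in range(1, m):
        a = [sum(row[:k]) for row in p]       # first k machines merged
        b = [sum(row[m - k:]) for row in p]   # last k machines merged
        seq = johnson(a, b)
        if best is None or makespan(seq, p) < makespan(best, p):
            best = seq
    return best
```

For example, with `p = [[3, 2], [1, 4]]` both heuristics put job 1 first, giving a makespan of 7 instead of 9.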

9.
The effectiveness of distributed execution of computationally intensive applications (jobs) largely depends on the quality of the applied scheduling approach. However, most existing non-trivial scheduling algorithms rely on prior knowledge or prediction of application parameters, such as execution time, input and output size, and dependencies, to assign applications to the available computational resources. A major issue is that these parameters are hard to determine in advance, especially if the end user does not have an extensive history of previous application runs. In this work we propose an online method for predicting the execution time of applications whose execution progress can be collected at run-time. Using dynamic progress information, the total job execution time can be predicted by extrapolation. However, the predictions achieved by extrapolation are far from precise and often vary over time as a result of changing application dynamics and varying resource load. Therefore, to compute the actual job execution time we match a number of predefined prediction evolution models against the consecutive extrapolations using nonlinear curve fitting. The "best-fit" coefficients allow a more accurate execution time prediction. The predictions are used to enhance a dynamic scheduling algorithm for workflows introduced in our earlier work. The scheduling algorithm is run with and without curve fitting, showing a performance improvement of up to 15% with curve fitting.
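A sketch of the idea, assuming a single hypothetical evolution model E(t) = a + b/t for the extrapolated estimates. Because this model is linear in a and b, ordinary least squares suffices here, whereas the paper matches several predefined models with nonlinear curve fitting. At completion the extrapolation equals the elapsed time, so the predicted total runtime T is taken as the positive root of a + b/T = T, that is T = (a + sqrt(a² + 4b)) / 2.

```python
import math

def extrapolate(elapsed, progress):
    """Naive linear extrapolation of total runtime from a progress fraction in (0, 1]."""
    return elapsed / progress

def fit_inverse_model(times, estimates):
    """Least-squares fit of estimates ~ a + b / t (hypothetical evolution model)."""
    xs = [1.0 / t for t in times]
    n = len(xs)
    mx, my = sum(xs) / n, sum(estimates) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, estimates))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def predict_total_time(times, estimates):
    """Solve a + b/T = T for the larger root; None if the fit has no real solution."""
    a, b = fit_inverse_model(times, estimates)
    disc = a * a + 4 * b
    if disc < 0:
        return None
    return (a + math.sqrt(disc)) / 2
```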

10.
A Reliable Grid Job Scheduling Mechanism (total citations: 1; self-citations: 1; citations by others: 0)
TAO Yongcai, SHI Lei. Journal of Computer Applications, 2010, 30(8): 2066-2069
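To cope with the dynamic nature of grid environments, a dependable grid job scheduling mechanism (DGJS) is proposed. According to job completion deadlines, DGJS divides jobs into three classes (high-QoS, low-QoS, and no-QoS), each with a different scheduling priority. Based on resource availability prediction, DGJS adopts a reliability-cost-based scheduling strategy that dispatches jobs to highly reliable resource nodes wherever possible. In addition, DGJS applies a different fault-tolerance strategy to each QoS class, saving grid resources while still guaranteeing fault tolerance. Experiments show that, in a dynamic grid environment, DGJS improves the job success rate and reduces job completion time compared with traditional grid job scheduling algorithms.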
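A minimal sketch of the three DGJS ingredients: deadline-based QoS classes, reliability-cost node ranking, and a per-class fault-tolerance policy. The class thresholds, the reliability cost (failure rate × execution time), and the mapping of classes to replication, checkpointing, and resubmission are assumptions; node load and the availability predictor are ignored.

```python
from enum import IntEnum

class QoS(IntEnum):        # lower value = higher scheduling priority
    HIGH = 0
    LOW = 1
    NONE = 2

def qos_level(deadline, runtime, tight_factor=2.0):
    """Hypothetical rule: no deadline -> NONE; deadline < tight_factor x runtime -> HIGH."""
    if deadline is None:
        return QoS.NONE
    return QoS.HIGH if deadline < tight_factor * runtime else QoS.LOW

FAULT_POLICY = {QoS.HIGH: "replicate", QoS.LOW: "checkpoint", QoS.NONE: "resubmit"}

def dgjs_schedule(jobs, nodes):
    """jobs: (name, runtime, deadline) tuples; nodes: {name: (speed, failure_rate)}."""
    levels = {name: qos_level(dl, rt) for name, rt, dl in jobs}
    plan = []
    for name, runtime, _dl in sorted(jobs, key=lambda j: levels[j[0]]):
        # Reliability cost = failure rate x execution time on that node.
        cost = {n: rate * runtime / speed for n, (speed, rate) in nodes.items()}
        ranked = sorted(cost, key=cost.get)
        policy = FAULT_POLICY[levels[name]]
        targets = ranked[:2] if policy == "replicate" else ranked[:1]
        plan.append((name, levels[name].name, policy, targets))
    return plan
```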

11.
In Grids, scheduling decisions are often made on the basis of jobs being either data-intensive or computation-intensive: in data-intensive situations jobs may be pushed to the data, and in computation-intensive situations data may be pulled to the jobs. This kind of scheduling, which does not consider network characteristics, can lead to performance degradation in a Grid environment and may result in large processing queues and job execution delays due to site overloads. In this paper we describe a Data Intensive and Network Aware (DIANA) meta-scheduling approach, which takes data, processing power, and network characteristics into account when making scheduling decisions across multiple sites. Through a practical implementation on a Grid testbed, we demonstrate that the queue and execution times of data-intensive jobs can be significantly improved by introducing the proposed DIANA scheduler. The basic scheduling decisions are dictated by a weighting factor for each potential target location, calculated as a function of network characteristics, processing cycles, and data location and size. The job scheduler provides a global ranking of the computing resources and then selects an optimal one on the basis of this overall access and execution cost. The DIANA approach considers the Grid as a combination of active network elements and treats network characteristics as a first-class criterion in the scheduling decision matrix, along with computation and data. The scheduler can then make informed decisions by taking into account the changing state of the network, the locality and size of the data, and the pool of available processing cycles.
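A sketch of a DIANA-style site ranking. The cost terms (queue wait, compute time, and latency plus transfer time when the data is not local) and the unit weights are illustrative assumptions, not the paper's published formula.

```python
def diana_cost(site, job, w_queue=1.0, w_cpu=1.0, w_net=1.0):
    """Hypothetical combined access-and-execution cost of running `job` at `site`."""
    compute = job["work"] / site["power"]
    transfer = 0.0
    if job["data_site"] != site["name"]:      # data must be moved to this site
        transfer = site["latency"] + job["data_size"] / site["bandwidth"]
    return w_queue * site["queue_wait"] + w_cpu * compute + w_net * transfer

def rank_sites(sites, job, **weights):
    """Global ranking, cheapest first; the scheduler would pick rank_sites(...)[0]."""
    return sorted(sites, key=lambda s: diana_cost(s, job, **weights))
```

In the test data below, the most powerful site ranks last because moving the input over its slow link dominates, which is the kind of case network-blind scheduling gets wrong.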

12.
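With the acceleration of New Infrastructure Construction, cloud computing will gain new development opportunities. As the infrastructure of cloud computing, data centers continually upgrade and replace their servers, which makes computing resources heterogeneous. How to schedule jobs efficiently in heterogeneous cloud environments is one of the current research hotspots. For the multi-objective optimization scheduling problem in heterogeneous cloud environments, a multi-objective reinforcement learning job scheduling method with weights determined by the Analytic Hierarchy Process (AHP) is designed. First, execution time, platform operating energy consumption, ...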
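AHP weights are usually derived from a pairwise-comparison matrix. The sketch below uses the geometric-mean approximation and Saaty's consistency ratio (CR below 0.1 is the usual acceptance threshold); how the method folds the weights into the reinforcement-learning reward is not stated in the truncated abstract, so the weighted-sum `scalarized_reward` is an assumption.

```python
import math

# Saaty's random consistency index by matrix size.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(matrix):
    """Priority weights (geometric-mean method) and consistency ratio CR."""
    n = len(matrix)
    gm = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(gm)
    w = [g / total for g in gm]
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam_max = sum(aw[i] / w[i] for i in range(n)) / n   # principal eigenvalue estimate
    ci = (lam_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri else 0.0
    return w, cr

def scalarized_reward(weights, normalized_costs):
    """Hypothetical RL reward: negative weighted sum of normalized objective costs."""
    return -sum(w * c for w, c in zip(weights, normalized_costs))
```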

13.
A resource broker with a user-friendly interface for job submission, developed on a platform constructed using the Globus toolkit, is proposed. The broker employs a domain-based network information model and a dynamic version of it to measure network status, and it also monitors and collects resource status and network-related information as the basis of its brokerage. A network bandwidth-aware job scheduling algorithm for brokering suitable Grid resources to communication-intensive jobs, which improves on and preserves the advantages of our previously developed network information model, is also proposed. Using timely information, the resource broker effectively matches Grid resources to user requests, thus improving job execution efficiency.

14.
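In today's cloud computing environments, Hadoop has become the de facto standard for big data processing. However, the large scale, high complexity, and dynamism of cloud computing make failures likely, and these failures affect the jobs running on Hadoop. Although Hadoop has built-in failure detection and recovery mechanisms, scheduled jobs still fail because the load on different nodes in the cloud varies. To address this problem, a self-responsive, failure-aware detection and scheduling method is proposed. Based on the differing load capacities in a heterogeneous environment, it classifies servers as fast or slow nodes, assigns jobs to suitable nodes for execution, and adjusts task decisions to prevent task failures as far as possible. Finally, its performance is compared experimentally with the basic scheduler in the Hadoop framework. The results show that the method reduces the job failure rate by up to 19%, shortens job execution time, and also reduces CPU and memory usage.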
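A sketch of the fast/slow node classification and failure-averse task placement. The throughput proxy ((1 - load) / average task time), the mean cut-off, and the `max_load` guard are hypothetical; this is plain Python, not a Hadoop scheduler plugin.

```python
from statistics import mean

def classify_nodes(nodes):
    """nodes: {name: (avg_task_seconds, load in [0, 1])}. Nodes whose effective
    throughput is at or above the cluster mean are 'fast', the rest 'slow'."""
    throughput = {n: (1.0 - load) / t for n, (t, load) in nodes.items()}
    cutoff = mean(list(throughput.values()))
    return {n: ("fast" if tp >= cutoff else "slow") for n, tp in throughput.items()}

def assign_tasks(tasks, nodes, free_slots, max_load=0.9):
    """Fill fast nodes first (lower load first), then slow ones; never use a node
    at or above max_load, so tasks wait rather than risk failing there."""
    kind = classify_nodes(nodes)
    order = sorted(nodes, key=lambda n: (kind[n] != "fast", nodes[n][1]))
    eligible = [n for n in order if nodes[n][1] < max_load]
    slots = dict(free_slots)
    plan, pending = {}, []
    for task in tasks:
        target = next((n for n in eligible if slots[n] > 0), None)
        if target is None:
            pending.append(task)
        else:
            slots[target] -= 1
            plan[task] = target
    return plan, pending
```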

15.
The complexity, heterogeneity, and device mobility of the wireless Grid, together with unpredictable user behavior, demand proper automation of monitoring to meet user needs. Since wireless devices can dynamically join and leave the Grid, and their state may be affected by various parameters (such as battery power, signal strength, the number of jobs submitted to them, and device mobility) that can lead to an overload state, the devices must be monitored so that long-term resource planning can be achieved. This paper proposes a Wireless Grid Monitoring Model using Agents (WiGriMMA) that monitors device mobility and state, communicates the state to the Grid information server (GIS), provides resource availability information, and controls selfish users and device state so that devices are not overloaded. The model is simulated to test its operational effectiveness, considering performance parameters such as resource availability, resource stability, device state, job execution rate, user behavior, and agent overhead. The results show that WiGriMMA performs better than the existing Grid monitoring model (GridView) in terms of resource availability, device state, and job execution rate.

16.
The goal of Grid computing is to integrate the usage of computer resources from cooperating partners in the form of Virtual Organizations (VOs). One of its key functions is to match jobs to execution resources efficiently. For interoperability between VOs, this matching occurs in resource brokering middleware, commonly referred to as the meta-scheduler or meta-broker. In this paper, we present an approach to a meta-scheduler architecture that combines hierarchical and peer-to-peer models for flexibility and extensibility. Interoperability is further promoted by a set of protocols that allow meta-schedulers to maintain sessions and exchange job and resource state using Web Services. Our architecture also incorporates a resource model that enables efficient resource matching across multiple Virtual Organizations, especially where the compute resources and their state are dynamic. Experiments demonstrate these new functional features across three distributed organizations (BSC, FIU, and IBM), which internally use different job scheduling technologies, computing infrastructure, and security mechanisms. Performance evaluations through actual system measurements and simulations provide insights into the architecture's effectiveness and scalability.

17.
In this paper we address a multicriteria scheduling problem for computational Grid systems. We focus on the two-level hierarchical Grid scheduling problem, in which at the first level (the Grid level) a Grid broker makes scheduling decisions and allocates jobs to Grid nodes. Jobs are then sent to the Grid nodes, where local schedulers generate local schedules for each node accordingly. A general approach is presented taking into account preferences of all the stakeholders of Grid scheduling (end-users, Grid administrators, and local resource providers) and assuming a lack of knowledge about job time characteristics. A single-stakeholder, single-criterion version of the approach has been compared experimentally with the existing approaches.

18.
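To address the low utilization of private clouds and the high cost of public clouds in hybrid cloud scheduling, two scheduling strategies based on performance and cost objectives are proposed: a deadline-first strategy and a cost-first strategy. Task and resource models for the hybrid cloud are established, so that suitable scheduling resources can be selected adaptively according to the requirements of the tasks users submit. Tasks with strict deadline requirements can be scheduled preferentially to the public cloud, and cost-sensitive tasks preferentially to the private cloud, while both strategies satisfy the deadline and a given cost constraint. Consequently, compared with other similar baseline scheduling methods, both strategies perform well in terms of completion time, cost, deadline-miss rate, and private cloud utilization, and they show greater adaptability and advantages especially when the task volume is large.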

19.
Large-scale computation is frequently limited by the performance of computer hardware or its associated cost. However, as information and network technologies develop, idle computers all over the world can be utilized and organized to enhance overall computation performance; that is, Grid environments facilitate distributed computation. Hence, the dispatching and scheduling of tasks is an important issue. Previous studies have shown that Grid environments composed of idle computers around the globe can be categorized as a type of Heterogeneous Computing (HC). However, the scheduling heuristics currently applied to HC focus on minimizing makespan rather than reducing cost. In addition, relevant studies usually presume that HC is connected by high-speed links, so communication time is ignored. Further, under a user-pays policy, each execution task is charged when a user dispatches a job to a Grid environment for computation. It is difficult to estimate to which computers, and to how many, a job will be dispatched, and it is impossible to determine in advance whether the heuristics proposed in previous studies will yield the optimal makespan; nor do those heuristics account for actual cost and risk. Therefore, this study proposes the ATCS-MCT (Apparent Tardiness Cost with Setups-Minimum Completion Time) scheduling algorithm, which combines execution time, weight, due date, and communication time factors, and shows that it not only achieves a better makespan than the Min-min heuristic but also reduces costs.
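A sketch of an ATCS-MCT dispatcher. The ATCS index (w/p)·exp(-max(d - p - t, 0)/(k1·p̄))·exp(-s/(k2·s̄)) is the standard one from machine scheduling; treating communication time as the setup time s, the values of k1 and k2, and evaluating each job's index on its minimum-completion-time (MCT) machine are assumptions, not necessarily how the study combines them.

```python
import math

def atcs_index(weight, proc, due, setup, t, p_bar, s_bar, k1=2.0, k2=0.5):
    """Apparent Tardiness Cost with Setups priority index (higher = dispatch sooner)."""
    slack = max(due - proc - t, 0.0)
    setup_term = math.exp(-setup / (k2 * s_bar)) if s_bar > 0 else 1.0
    return (weight / proc) * math.exp(-slack / (k1 * p_bar)) * setup_term

def atcs_mct(jobs, speeds, comm, k1=2.0, k2=0.5):
    """jobs: {name: (work, weight, due)}; speeds: machine speeds;
    comm[name][m]: transfer time of job `name` to machine m.
    Returns (job, machine, completion_time) in dispatch order."""
    n_m = len(speeds)
    ready = [0.0] * n_m
    remaining = dict(jobs)
    schedule = []
    while remaining:
        p_bar = sum(w / s for w, _, _ in remaining.values() for s in speeds) / (len(remaining) * n_m)
        s_bar = sum(comm[j][m] for j in remaining for m in range(n_m)) / (len(remaining) * n_m)
        best = None
        for name, (work, weight, due) in remaining.items():
            # MCT: machine giving this job its earliest completion time.
            m = min(range(n_m), key=lambda i: ready[i] + comm[name][i] + work / speeds[i])
            proc = work / speeds[m]
            idx = atcs_index(weight, proc, due, comm[name][m], ready[m], p_bar, s_bar, k1, k2)
            if best is None or idx > best[0]:
                best = (idx, name, m, proc)
        _, name, m, proc = best
        ready[m] += comm[name][m] + proc
        schedule.append((name, m, ready[m]))
        del remaining[name]
    return schedule
```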

20.
Grid computing is a widely adopted paradigm for federating geographically distributed data centers. Due to their size and complexity, grid systems are often affected by failures that may hinder the correct and timely execution of jobs, causing a non-negligible waste of computing resources. Despite the relevance of the problem, state-of-the-art management solutions for grid systems usually neglect the identification and handling of failures at runtime. Among the primary goals to be considered, we claim the need for novel approaches capable of achieving scalable integration with efficient monitoring solutions and of fitting large, geographically distributed systems, where dynamic and configurable tradeoffs between overhead and targeted granularity are necessary. This paper proposes GAMESH, a Grid Architecture for scalable Monitoring and Enhanced dependable job ScHeduling. GAMESH is conceived as a completely distributed and highly efficient management infrastructure, concentrating on two crucial aspects of large-scale, multi-domain grid environments: (i) the scalable dissemination of monitoring data and (ii) the troubleshooting of job execution failures. GAMESH has been implemented and tested in a real deployment encompassing geographically distributed data centers across Europe. Experimental results show that GAMESH (i) enables the collection of measurements of both computing resources and task scheduling conditions at geographically sparse sites while imposing a limited overhead on the entire infrastructure, and (ii) provides a failure-aware scheduler that improves overall system performance, even in the presence of failures, by coordinating local job schedulers across multiple domains.


Copyright © Beijing Qinyun Technology Development Co., Ltd. | Beijing ICP Registration No. 09084417