Similar Literature
20 similar documents found (search time: 9 ms)
1.
Assembling and simultaneously using different types of distributed computing infrastructures (DCI), such as Grids and Clouds, is an increasingly common situation. Because infrastructures are characterized by different attributes such as price, performance, trust, and greenness, the task scheduling problem becomes more complex and challenging. In this paper we present the design of a fault-tolerant and trust-aware scheduler that executes Bag-of-Tasks applications on elastic and hybrid DCIs, following user-defined scheduling strategies. Our approach, named the Promethee scheduler, combines a pull-based scheduler with the multi-criteria Promethee decision-making algorithm. Because multi-criteria scheduling multiplies the number of possible scheduling strategies, we propose SOFT, a methodology for finding the optimal scheduling strategies given a set of application requirements. The method is validated with a simulator that fully implements the Promethee scheduler and recreates a hybrid DCI environment, including an Internet Desktop Grid, a Cloud, and a Best-Effort Grid, based on real failure traces. A set of experiments shows that the Promethee scheduler maximizes user satisfaction expressed according to three distinct criteria (price, expected completion time, and trust) while maximizing useful employment of the infrastructure from the resource owner's point of view. Finally, we present an optimization that bounds the computation time of the Promethee algorithm, making integration of the scheduler into a wide range of resource management software realistic.
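The abstract does not detail the scheduler's internals, but the core of a PROMETHEE II ranking over the three criteria it names (price, expected completion time, trust) can be sketched briefly. The "usual" preference function, the weights, and the host data below are illustrative assumptions, not the paper's configuration:

```python
# Minimal PROMETHEE II sketch: rank candidate hosts for a task by net
# outranking flow. Criteria values, weights, and the "usual" preference
# function are illustrative assumptions.

def promethee_rank(alternatives, weights, maximize):
    """alternatives: list of tuples of criterion values; maximize[j] says
    whether higher is better on criterion j; weights sum to 1."""
    n = len(alternatives)

    def pref(a, b, j):
        # "Usual" preference function: 1 if a is strictly better on j.
        d = alternatives[a][j] - alternatives[b][j]
        return 1.0 if d != 0 and (d > 0) == maximize[j] else 0.0

    def pi(a, b):  # aggregated preference of a over b
        return sum(w * pref(a, b, j) for j, w in enumerate(weights))

    net = [sum(pi(a, b) - pi(b, a) for b in range(n) if b != a) / (n - 1)
           for a in range(n)]
    return sorted(range(n), key=lambda a: net[a], reverse=True)

# Candidates described by (price, expected completion time, trust).
hosts = [(0.10, 120.0, 0.9), (0.02, 400.0, 0.5), (0.05, 200.0, 0.8)]
print(promethee_rank(hosts, weights=[0.3, 0.4, 0.3],
                     maximize=[False, False, True]))  # best host index first
```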

2.
Workflow applications are a popular paradigm used by scientists to model applications that run on heterogeneous high-performance parallel and distributed computing systems. Today, the growing number and heterogeneity of multi-core parallel systems give almost every scientist access to high-performance computing, yet they also raise additional challenges. One of the critical problems is the power required to operate these systems, for both environmental and financial reasons. To decrease energy consumption in heterogeneous systems, methods such as energy-efficient scheduling are receiving increasing attention. Current schedulers, however, rely on simplistic energy models that do not match reality, use techniques such as DVFS that are not available on all types of systems, or do not treat the problem as a multi-objective optimisation with performance and energy as simultaneous objectives. In this paper, we present a new Pareto-based multi-objective workflow scheduling algorithm, an extension of an existing state-of-the-art heuristic, capable of computing a set of tradeoff-optimal solutions in terms of makespan and energy efficiency. Our approach is based on empirical models that capture the real behaviour of energy consumption in heterogeneous parallel systems. We compare the new approach with a classical mono-objective scheduling heuristic and a state-of-the-art multi-objective optimisation algorithm and demonstrate that it computes better or similar results in different scenarios. We analyse the tradeoff solutions computed by our algorithm under different experimental configurations and observe that in some cases it finds solutions that reduce energy consumption by up to 34.5% with only a slight (2%) increase in makespan.
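As a concrete anchor for the Pareto-based part of the approach, the following is a minimal sketch of filtering candidate schedules down to the non-dominated (tradeoff-optimal) set when both makespan and energy are minimized; the candidate scores are made up for illustration:

```python
# Minimal Pareto-front filter over (makespan, energy) pairs, both minimized.

def dominates(a, b):
    """a dominates b: no worse on both objectives, strictly better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])

def pareto_front(schedules):
    return [s for s in schedules
            if not any(dominates(t, s) for t in schedules if t is not s)]

candidates = [(100.0, 50.0), (102.0, 33.0), (95.0, 60.0), (110.0, 70.0)]
print(pareto_front(candidates))  # (110.0, 70.0) is dominated and dropped
```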

3.
Large-scale distributed systems typically comprise hundreds to millions of entities (applications, users, companies, universities), each of which has only a partial view of the resources (computers, communication links). How to share such resources fairly and efficiently between entities in a distributed way has thus become a critical question.

4.
Automation of the execution of computational tasks is at the heart of improving scientific productivity. In recent years, scientific workflows have become an important abstraction that captures the data processing and computation of large, complex scientific applications. By allowing scientists to model and express entire data-processing steps and their dependencies, workflow management systems relieve scientists of the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today's computational and data-science applications, which process vast amounts of data, keep increasing, there is a compelling case for a new generation of advances in high-performance computing, commonly termed extreme-scale computing, which will bring multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. The paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.

5.
6.
Factory management plays an important role in improving productivity and quality of service in the production process. In particular, the distributed permutation flow shop scheduling problem with multiple factories is a priority in factory automation. This study proposes a novel distributed scheduling model that adds a reentrant characteristic, yielding the distributed reentrant permutation flow shop (DRPFS) scheduling problem: a given set of jobs, each with a number of reentrant layers, is processed in factories that comprise sets of machines with identical properties. The aim of the study is to determine how many factories to use, the assignment of jobs to factories, and the sequence of jobs within each factory, so as to simultaneously satisfy three objectives: minimizing makespan, total cost, and average tardiness. To this end, a novel multi-objective adaptive large neighborhood search (MOALNS) algorithm is developed to find near-optimal solutions on the Pareto front. Various destroy and repair operators are presented to balance intensification and diversification of the search process, as in the generic skeleton sketched below. Numerical computational experiments are carried out to validate the proposed model, and the algorithm's performance is compared with existing methods to confirm its effectiveness and robustness in handling the DRPFS problem.
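The destroy-and-repair mechanism follows the standard adaptive large neighborhood search pattern. A single-objective skeleton of that loop is sketched below (the paper's algorithm maintains a Pareto set over three objectives instead, and its concrete operators, acceptance rule, and weight-update scheme are not reproduced here):

```python
import random

# Generic ALNS skeleton: repeatedly destroy part of a solution, repair it,
# and adapt operator-selection weights according to operator success.

def alns(initial, destroyers, repairers, cost, iters=1000, seed=0):
    rng = random.Random(seed)
    w_destroy = [1.0] * len(destroyers)
    w_repair = [1.0] * len(repairers)
    best = current = initial
    for _ in range(iters):
        d = rng.choices(range(len(destroyers)), w_destroy)[0]
        r = rng.choices(range(len(repairers)), w_repair)[0]
        candidate = repairers[r](destroyers[d](current, rng), rng)
        if cost(candidate) < cost(current):          # greedy acceptance
            current = candidate
            reward = 2.0 if cost(candidate) < cost(best) else 1.0
            w_destroy[d] += reward                    # reinforce good operators
            w_repair[r] += reward
            if cost(candidate) < cost(best):
                best = candidate
    return best
```

In the DRPFS setting, destroy operators would remove a few jobs from the factory assignments and repair operators would reinsert them (greedily or randomly), which is where the intensification/diversification balance is tuned.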

7.
Resource scheduling has long been a hot research topic in cloud computing, yet most current work concentrates on meeting users' time or cost requirements and rarely considers their security requirements during scheduling. To address this, building on a model of resource scheduling for workflow tasks in common cloud environments, this paper proposes a security-constraint model and solves the problem with a variable-neighborhood particle swarm optimization algorithm. Finally, on the CloudSim simulation platform, the algorithm is compared with a max-min ant colony algorithm and a genetic algorithm; the experimental results show that the algorithm offers good usability and optimization capability.
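For orientation, a bare-bones particle swarm skeleton for mapping tasks to virtual machines is sketched below; the security-constraint model and the variable-neighborhood step that distinguish the paper's algorithm are not reproduced, and all coefficients are conventional defaults rather than the paper's settings:

```python
import random

# Toy PSO for task-to-VM assignment: continuous particle positions are
# rounded to VM indices; `fitness` scores a complete assignment (lower is
# better) and is a placeholder for the paper's time/cost/security model.

def pso_schedule(num_tasks, num_vms, fitness, particles=20, iters=100, seed=0):
    rng = random.Random(seed)

    def decode(pos):
        return [min(num_vms - 1, max(0, round(x))) for x in pos]

    swarm = [[rng.uniform(0, num_vms - 1) for _ in range(num_tasks)]
             for _ in range(particles)]
    vel = [[0.0] * num_tasks for _ in range(particles)]
    pbest = [p[:] for p in swarm]
    gbest = min(pbest, key=lambda p: fitness(decode(p)))[:]

    for _ in range(iters):
        for i, p in enumerate(swarm):
            for d in range(num_tasks):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - p[d])
                             + 1.5 * r2 * (gbest[d] - p[d]))
                p[d] += vel[i][d]
            if fitness(decode(p)) < fitness(decode(pbest[i])):
                pbest[i] = p[:]
                if fitness(decode(p)) < fitness(decode(gbest)):
                    gbest = p[:]
    return decode(gbest)
```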

8.
Volunteer computing systems offer high computing power to scientific communities for running large, data-intensive scientific workflows. However, these environments provide only a best-effort infrastructure for executing high-performance jobs. This work schedules scientific, data-intensive workflows on a hybrid of volunteer computing and Cloud resources to enhance the utilization of these environments and increase the percentage of workflows that meet their deadline. The proposed workflow scheduling system partitions a workflow into sub-workflows so as to minimize data dependencies among them. These sub-workflows are then distributed over volunteer resources according to resource proximity and a load-balancing policy, and the execution time of each sub-workflow on the selected volunteer resources is estimated in this phase. If any sub-workflow would miss its sub-deadline because of a long waiting time, it is re-scheduled onto public Cloud resources. This re-scheduling improves system performance by increasing the percentage of workflows that meet the deadline. The proposed Cloud-aware, data-intensive scheduling algorithm increases the percentage of workflows that meet the deadline by a factor of 75% on average, compared with executing the workflows on the volunteer resources alone.
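The rescheduling decision itself is simple to state: a sub-workflow whose estimated finish time on volunteer resources would overrun its sub-deadline is moved to the Cloud. A minimal sketch, with the estimation function left as an assumed placeholder:

```python
# Placement sketch: keep a sub-workflow on volunteer resources only if its
# estimated finish time (including waiting) meets its sub-deadline.

def place_subworkflows(subworkflows, volunteer_finish_estimate, now):
    """subworkflows: list of (name, sub_deadline);
    volunteer_finish_estimate(name) -> estimated run + wait time."""
    placement = {}
    for name, sub_deadline in subworkflows:
        if now + volunteer_finish_estimate(name) <= sub_deadline:
            placement[name] = "volunteer"
        else:
            placement[name] = "cloud"   # re-schedule to keep the deadline
    return placement

est = {"sw1": 50.0, "sw2": 300.0}.get
print(place_subworkflows([("sw1", 100.0), ("sw2", 200.0)], est, now=0.0))
# {'sw1': 'volunteer', 'sw2': 'cloud'}
```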

9.
Unpredictable fluctuations in resource availability often lead to rescheduling decisions that sacrifice the job-completion success rate in batch job scheduling. To overcome this limitation, we consider the problem of assigning a set of sequential batch jobs with demands to a set of resources with constraints such as heterogeneous rescheduling policies and capabilities. The ultimate goal is to find an optimal allocation such that performance benefits in terms of makespan and utilization are maximized according to the principle of Pareto optimality, while keeping the job failure rate close to an acceptably low bound. To this end, we formulate a multihybrid policy decision problem (MPDP) on the primary-backup fault-tolerance model and prove its NP-completeness. The main contribution is to show that our multihybrid job scheduling (MJS) scheme guarantees fault-tolerant performance by adaptively combining jobs and resources with different rescheduling policies in MPDP. Furthermore, we demonstrate that the proposed MJS scheme outperforms five rescheduling heuristics in solution quality, search adaptability, and time efficiency in a set of extensive simulations under various scheduling conditions.

10.
闫歌, 于炯, 杨兴耀. 《计算机应用》, 2014, 34(3): 673-677
Analysis of reliability in existing cloud workflow scheduling algorithms shows that some algorithms improve the reliability of the whole workflow only at the cost of longer runtimes or higher expense. Combining the characteristics of cloud computing, this paper proposes a reliability-based workflow scheduling strategy. The strategy incorporates the reliability of the tasks in the workflow, fully considers task priority order, and applies the idea of replication, reducing the failure rate of data transfers while also reducing transfer time, so that the whole workflow completes sooner with higher overall reliability. Experiments and analysis show that, under this strategy, cloud workflow reliability improves over the Heterogeneous Earliest Finish Time (HEFT) algorithm and its improved variant SHEFTEX for different task counts and communication-to-computation ratios (CCR), while completion time is reduced compared with HEFT.
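Since HEFT serves as the baseline here, it is worth recalling its prioritization step: each task's upward rank is its average compute cost plus the maximum, over its successors, of the edge communication cost plus the successor's rank, and tasks are scheduled in decreasing rank order. A generic sketch (the paper's replication-based strategy builds further logic on top of such an ordering):

```python
# Upward-rank computation used by HEFT for task prioritization.

def heft_order(tasks, succ, w, c):
    """tasks: ids; succ[t]: successor ids; w[t]: mean compute cost;
    c[(t, s)]: mean communication cost of edge t -> s."""
    rank = {}

    def rank_u(t):
        if t not in rank:
            rank[t] = w[t] + max((c[(t, s)] + rank_u(s) for s in succ[t]),
                                 default=0.0)
        return rank[t]

    for t in tasks:
        rank_u(t)
    return sorted(tasks, key=lambda t: rank[t], reverse=True)

# Tiny diamond DAG: 0 -> {1, 2} -> 3.
print(heft_order([0, 1, 2, 3],
                 {0: [1, 2], 1: [3], 2: [3], 3: []},
                 {0: 5.0, 1: 3.0, 2: 4.0, 3: 2.0},
                 {(0, 1): 1.0, (0, 2): 2.0, (1, 3): 1.0, (2, 3): 1.0}))
# [0, 2, 1, 3]: entry task first, exit task last
```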

11.
The ongoing increase of energy consumption by IT infrastructures forces data center managers to find innovative ways to improve energy efficiency. The latter is also a focal point for different branches of computer science due to its financial, ecological, political, and technical consequences. One answer is scheduling combined with dynamic voltage scaling to optimize energy consumption. The reasoning is based on the link between current semiconductor technologies and the energy-state management of processors, where sacrificing performance can save energy. This paper investigates and solves the multi-objective precedence-constrained application scheduling problem on a distributed computing system, with two main aims: creating general algorithms to solve the problem, and examining the problem through a thorough analysis of the results returned by the algorithms. The first aim was achieved in two steps: adapting state-of-the-art multi-objective evolutionary algorithms by designing new operators, and validating them in terms of performance and energy. The second aim was accomplished by executing the algorithms extensively on a large and diverse benchmark and then analysing their relative performance. Finally, the study proves the validity of the proposed method, and points out the best-performing multi-objective algorithm schema and the most important factors for algorithm performance.
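The energy argument behind dynamic voltage scaling can be made concrete with the usual first-order model: dynamic power is roughly P = C·V²·f, and since supply voltage scales roughly with frequency over much of the operating range, power falls roughly cubically while CPU-bound runtime grows only linearly as the clock is lowered. A back-of-the-envelope sketch with purely illustrative constants:

```python
# First-order DVFS tradeoff: P = C * V^2 * f with the simplistic V ~ f
# assumption. All constants are illustrative, not measured values.

def dvfs_tradeoff(work_cycles, freq_ghz, c_eff=1e-9, volts_per_ghz=0.4):
    volts = volts_per_ghz * freq_ghz
    power_w = c_eff * volts ** 2 * (freq_ghz * 1e9)   # dynamic power only
    runtime_s = work_cycles / (freq_ghz * 1e9)
    return runtime_s, power_w * runtime_s

for f in (3.0, 2.0, 1.0):
    t, e = dvfs_tradeoff(9e9, f)
    print(f"{f:.0f} GHz: {t:.1f} s, {e:.2f} J")
# 3 GHz: 3.0 s, 12.96 J / 2 GHz: 4.5 s, 5.76 J / 1 GHz: 9.0 s, 1.44 J
```

This is exactly the performance-versus-energy tension that makes the scheduling problem multi-objective.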

12.
To optimize task execution efficiency and execution cost simultaneously, a multi-objective scheduling optimization algorithm for DAG tasks in cloud computing environments is proposed. The algorithm models the multi-objective optimization problem as the search for a set of balanced, Pareto-optimal solutions and solves the model heuristically. To measure the quality of the balanced multi-objective solutions, an evaluation mechanism based on the hypervolume method is designed, yielding balanced schedules that trade off the conflicting objectives. Simulation tests in a configured cloud environment with three synthetic workflows and two real scientific workflows show that, compared with similar single-objective algorithms and a multi-objective heuristic, the algorithm produces solutions of higher quality and better balance, better matching the resource-usage characteristics and workflow scheduling patterns of real clouds.
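For the two-objective case (time, cost), the hypervolume indicator used for evaluation reduces to the area dominated by the front and bounded by a reference point; a larger value indicates a better front. A minimal sketch (the reference point is a free parameter here, not the paper's choice):

```python
# 2-D hypervolume for a minimization front: area between the front and a
# reference point that every front member dominates.

def hypervolume_2d(front, ref):
    pts = sorted(p for p in front if p[0] <= ref[0] and p[1] <= ref[1])
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:                 # sweep with x ascending
        if y < prev_y:               # each point adds a horizontal strip
            hv += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return hv

front = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]   # (time, cost), both minimized
print(hypervolume_2d(front, ref=(5.0, 5.0)))   # 11.0
```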

13.
Distributed computing infrastructures are commonly used through scientific gateways, but operating these gateways requires substantial human intervention to handle operational incidents. This paper presents a self-healing process that quantifies incident degrees of workflow activities from metrics measuring the long-tail effect, application efficiency, data transfer issues, and site-specific problems. These metrics are simple enough to be computed online, and they make few assumptions about application or resource characteristics. Based on their degree, incidents are classified into levels and associated with sets of healing actions, selected using association rules that model correlations between incident levels. We specifically study the long-tail effect and propose a new algorithm to control task replication. The healing process is parametrized on real application traces acquired in production on the European Grid Infrastructure. Experimental results obtained in the Virtual Imaging Platform show that the proposed method speeds up execution by up to a factor of 4 and consumes up to 26% less resource time than a control execution, while properly detecting unrecoverable errors.
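Of the incident metrics listed, the long-tail effect is the one treated in most depth. A replication trigger can be sketched as: once enough tasks of an activity have completed, any still-running task that has already exceeded the completed tasks' typical duration by a wide margin becomes a replication candidate. The threshold factor below is an assumed knob, not the paper's tuned parameter:

```python
import statistics

# Long-tail detector: flag running tasks whose elapsed time far exceeds the
# median duration of already-completed tasks of the same activity.

def tasks_to_replicate(completed_durations, running_elapsed, factor=2.0):
    if not completed_durations:
        return []                     # nothing to compare against yet
    median = statistics.median(completed_durations)
    return [task for task, elapsed in running_elapsed.items()
            if elapsed > factor * median]

done = [110.0, 95.0, 102.0, 98.0]
running = {"t17": 130.0, "t18": 260.0}
print(tasks_to_replicate(done, running))  # ['t18'] sits in the long tail
```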

14.
15.
Today, almost everyone is connected to the Internet and uses different Cloud solutions to store, deliver, and process data. Cloud computing assembles large networks of virtualized services such as hardware and software resources. The new era in which ICT has penetrated almost all domains (healthcare, aged care, social assistance, surveillance, education, etc.) creates the need for new multimedia, content-driven applications. These applications generate huge amounts of data that must be gathered, processed, and then aggregated in a fault-tolerant, reliable, and secure heterogeneous distributed system made up of a mixture of Cloud systems (public/private), mobile device networks, desktop-based clusters, etc. In this context, dynamic resource provisioning for Big Data application scheduling has become a challenge in modern systems. We propose a resource-aware hybrid scheduling algorithm for different types of applications: batch jobs and workflows. The algorithm hierarchically clusters the available resources into groups in the allocation phase, and task execution is performed in two phases: first, tasks are assigned to groups of resources, and second, a classical scheduling algorithm is applied within each group. The algorithm is suitable for Heterogeneous Distributed Computing, especially for modern High-Performance Computing (HPC) systems in which applications have diverse requirements (both I/O- and computationally intensive), with an emphasis on data from multimedia applications. We evaluate its performance in a realistic CloudSim setting with respect to load balancing, cost savings, dependency assurance for workflows, and computational efficiency, and investigate how these performance metrics are computed at runtime.
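The two-phase structure can be sketched compactly: resources are first grouped, a task is routed to a suitable group, and a classical policy runs inside the group. The grouping key and the shortest-queue policy below are stand-ins for the paper's hierarchical clustering and per-group algorithms:

```python
from collections import defaultdict

# Two-phase scheduling sketch: phase 1 picks the least-loaded group that can
# serve the task's kind; phase 2 applies a classical policy (shortest queue)
# inside that group.

def schedule(tasks, resources):
    """tasks: (name, kind) pairs; resources: (name, kind, group) triples."""
    groups = defaultdict(list)
    for rname, rkind, group in resources:
        groups[group].append({"name": rname, "kind": rkind, "queue": 0})

    assignment = {}
    for tname, tkind in tasks:
        eligible = [g for g in groups.values()
                    if any(r["kind"] == tkind for r in g)]
        group = min(eligible, key=lambda g: sum(r["queue"] for r in g))
        res = min((r for r in group if r["kind"] == tkind),
                  key=lambda r: r["queue"])
        res["queue"] += 1
        assignment[tname] = res["name"]
    return assignment

print(schedule([("job1", "cpu"), ("job2", "cpu"), ("wf1", "io")],
               [("c1", "cpu", "cluster"), ("c2", "cpu", "cluster"),
                ("s1", "io", "cloud")]))
```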

16.
In distributed query processing systems, load balancing plays an important role in maximizing system throughput. When queries can leverage cached intermediate results, improving the cache hit ratio becomes as important as load balancing in query scheduling, especially for computationally expensive queries, and scheduling policies must take into account the dynamic contents of the distributed caching infrastructure. In this paper, we propose and discuss several distributed query scheduling policies that directly consider the available cache contents by employing distributed multidimensional indexing structures and an exponential-moving-average approach to predicting cache contents. These approaches are shown to produce better query plans and faster query response times than traditional scheduling policies that do not predict the dynamic contents of distributed caches. We experimentally demonstrate the utility of the scheduling policies using MQO, a distributed, Grid-enabled, multiple-query processing middleware system we developed to optimize query processing for data analysis and visualization applications.
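The exponential-moving-average idea is compact enough to show inline: treat each probe of a cache (hit = 1, miss = 0) as an observation and keep a smoothed score per cached object, read as the probability that the object is still resident. The smoothing factor is illustrative, not the paper's setting:

```python
# EMA predictor of cache residency: the score is read as the probability
# that a cached intermediate result is still available at a node.

def update_ema(prev_score, observed_hit, alpha=0.3):
    return alpha * (1.0 if observed_hit else 0.0) + (1.0 - alpha) * prev_score

score = 0.5                         # prior: unknown cache state
for hit in (True, True, False, True):
    score = update_ema(score, hit)
print(f"predicted hit probability: {score:.2f}")  # 0.67
```

A scheduler can then weigh this predicted hit probability against node load when choosing where to send a query.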

17.
Scientific workflow orchestration interoperating HTC and HPC resources
In this work we describe our developments towards a unified access method for different types of computing infrastructures at the interoperation level. To this end, we have developed a middleware suite that bridges the otherwise non-interoperable middleware stacks used for building distributed computing infrastructures, UNICORE and gLite. Our solution allows HPC and HTC resources to be accessed and operated transparently from a single interface. Using Kepler as the workflow manager, we provide users with the code integration needed to create scientific workflows that access both types of infrastructure.

18.
Deadline-sensitive workflows require careful coordination of user constraints with resource availability. Current distributed resource-access models provide varying degrees of resource control: from limited or none in grid batch systems to explicit control in cloud systems. Additionally, applications experience variability due to competing user loads, performance variations, failures, etc. These variations affect the quality of service (QoS), which planning strategies leave unaccounted for. In this paper we propose the Workflow ORchestrator for Distributed Systems (WORDS), an architecture based on a least-common-denominator resource model that abstracts away the differences between grid and cloud systems while capturing the QoS properties they provide. We investigate algorithms for effective orchestration (i.e., resource procurement and task mapping) of deadline-sensitive workflows atop the resource abstraction provided in WORDS. Our evaluation compares orchestration methodologies on TeraGrid and Amazon EC2. Experimental results show that WORDS enables effective orchestration at reasonable cost on batch-queue grid and cloud systems, with or without explicit resource control.
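A least-common-denominator resource model of this kind can be pictured as reducing every infrastructure to time-bounded slot offers with a confidence attached; batch queues yield low-confidence offers, clouds with explicit control yield high-confidence ones. The types and fields below are hypothetical illustrations, not the WORDS API:

```python
from dataclasses import dataclass

# Hypothetical least-common-denominator resource offer: both a batch-queue
# grid and a cloud are reduced to "N slots from time t for duration d, with
# confidence p that the offer holds".

@dataclass
class SlotOffer:
    start: float        # earliest start time
    duration: float     # guaranteed slot duration
    slots: int          # concurrent slots available
    confidence: float   # 1.0 for explicit cloud control, <1.0 for batch queues

def pick_offer(offers, needed_slots, needed_time, deadline, now=0.0):
    feasible = [o for o in offers
                if o.slots >= needed_slots
                and o.duration >= needed_time
                and max(o.start, now) + needed_time <= deadline]
    # Prefer the most reliable offer; break ties by earliest start.
    return max(feasible, key=lambda o: (o.confidence, -o.start), default=None)
```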

19.
For task scheduling in cloud computing environments, an improved multi-objective immune-system-based task scheduling algorithm (IMISA) is proposed to find good feasible allocations that reduce both task completion time and execution cost. Unlike traditional fitness assignment, the algorithm divides the antibody population into a non-dominated set and a dominated set: the antibody-antigen affinity of a non-dominated solution is the area of the region it alone dominates, while the affinity of a dominated solution is the area of the polygon it forms with all non-dominated solutions. Clone proportions are computed from these affinities, and offspring are generated by cloning and mutation. Simulation experiments on the CloudSim platform show that, compared with NSGA-II and the multi-objective immune system algorithm (MISA), IMISA finds schedules with shorter completion times and lower execution costs, and produces Pareto sets with better distribution.

20.
GridMD is a C++ class library intended for constructing simulation applications and running them in distributed environments. The library abstracts away the details of distributed environments, so that almost no knowledge of distributed computing is required of a physicist working with the library: she or he just uses GridMD function calls inside the application C++ code to perform parameter sweeps or other tasks that can be distributed at run time. In this paper we briefly review the GridMD architecture. We also describe the job manager component, which submits jobs to a remote system. The C++ source code of our PBS job manager may be used as a standalone tool, and it is freely available, as is the full library source code. As illustrative examples we use simple expression-evaluation codes and a real application: Coulomb cluster explosion simulation by molecular dynamics.
