期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Science in the Cloud: Allocation and Execution of Data-Intensive Scientific Workflows

Claudia Szabo Quan Z. Sheng Trent Kroeger Yihong Zhang Jian Yu 《Journal of Grid Computing》2014,12(2):245-264

An important challenge for the adoption of cloud computing in the scientific community remains the efficient allocation and execution of data-intensive scientific workflows to reduce execution time and the size of transferred data. The transferred data overhead is becoming significant with emerging scientific workflows that have input/output files and intermediate data products ranging in the hundreds of gigabytes. The allocation of scientific workflows on public clouds can be described through a variety of perspectives and parameters, and has been proved to be NP-complete. This paper proposes an evolutionary approach for task allocation on public clouds considering data transfer and execution time. In our framework, a solution is represented using an allocation chromosome that encodes the allocation of tasks to nodes, and an ordering chromosome that defines the execution order according to the scientific workflow representation. We propose a multi-objective optimization that relies on a cloud cost model and employs tailored evolution operators. Starting from a population of possible solutions, we employ crossover and mutation operators on both chromosomes aiming at optimizing the data transferred between nodes as well as the total workflow runtime. The crossover operators combine parts of solutions to reduce data overhead, whereas the mutation operators swamp between parts of the same chromosome according to pre-defined rules. Our experimental study compares between the proposed approach and current state-of-the art approaches using synthetic and real-life workflows. Our algorithm performs similarly to existing heuristics for small workflows and shows up to 80 % improvements for larger synthetic workflows. To further validate our approach we compare between the allocation and scheduling obtained by our approach with that obtained by popular scientific workflow managers, when real workflows with hundreds of tasks are executed on a public cloud. The results show a 10 % improvement in runtime over existing schedulers, caused by a 80 % reduction in transferred data and optimized allocation and ordering of tasks. This improved data locality has greater impact as it can be employed to improve and study data provenance and facilitate data persistence for scientific workflows. 相似文献

2.

允许违反局部时间约束的科学工作流调度策略

陈旺虎段菊俞茂义《计算机工程与科学》2016,38(11):2165-2171

提高科学工作流在云环境中的执行效率、降低执行费用受到广泛关注。用户期望的局部QoS约束与工作流的总体执行效率之间往往存在矛盾。针对该现象,在前期的研究基础上提出一种允许违反局部时间约束的科学工作流调度策略。通过对已聚簇的工作流任务集使用任务后向优先合并的方法,可实现任务间空闲时间片的合理利用,进而优化科学工作流的执行时间;另外,为充分利用任务的松弛时间,提高工作流的整体执行效率,允许部分任务的调度违反局部最晚完成时间的约束。实验结果表明,该策略能提前科学工作流的最早完成时间,提高处理机的利用率,并最终降低工作流的执行费用。相似文献

3.

A Provenance-based Adaptive Scheduling Heuristic for Parallel Scientific Workflows in Clouds

Daniel de Oliveira Kary A. C. S. Oca?a Fernanda Bai?o Marta Mattoso 《Journal of Grid Computing》2012,10(3):521-552

In the last years, scientific workflows have emerged as a fundamental abstraction for structuring and executing scientific experiments in computational environments. Scientific workflows are becoming increasingly complex and more demanding in terms of computational resources, thus requiring the usage of parallel techniques and high performance computing (HPC) environments. Meanwhile, clouds have emerged as a new paradigm where resources are virtualized and provided on demand. By using clouds, scientists have expanded beyond single parallel computers to hundreds or even thousands of virtual machines. Although the initial focus of clouds was to provide high throughput computing, clouds are already being used to provide an HPC environment where elastic resources can be instantiated on demand during the course of a scientific workflow. However, this model also raises many open, yet important, challenges such as scheduling workflow activities. Scheduling parallel scientific workflows in the cloud is a very complex task since we have to take into account many different criteria and to explore the elasticity characteristic for optimizing workflow execution. In this paper, we introduce an adaptive scheduling heuristic for parallel execution of scientific workflows in the cloud that is based on three criteria: total execution time (makespan), reliability and financial cost. Besides scheduling workflow activities based on a 3-objective cost model, this approach also scales resources up and down according to the restrictions imposed by scientists before workflow execution. This tuning is based on provenance data captured and queried at runtime. We conducted a thorough validation of our approach using a real bioinformatics workflow. The experiments were performed in SciCumulus, a cloud workflow engine for managing scientific workflow execution. 相似文献

4.

A security and cost aware scheduling algorithm for heterogeneous tasks of scientific workflow in clouds

《Future Generation Computer Systems》2016

Security is increasingly critical for various scientific workflows that are big data applications and typically take quite amount of time being executed on large-scale distributed infrastructures. Cloud computing platform is such an infrastructure that can enable dynamic resource scaling on demand. Nevertheless, based on pay-per-use and hourly-based pricing model, users should pay attention to the cost incurred by renting virtual machines (VMs) from cloud data centers. Meanwhile, workflow tasks are generally heterogeneous and require different instance series (i.e., computing optimized, memory optimized, storage optimized, etc.). In this paper, we propose a security and cost aware scheduling (SCAS) algorithm for heterogeneous tasks of scientific workflow in clouds. Our proposed algorithm is based on the meta-heuristic optimization technique, particle swarm optimization (PSO), the coding strategy of which is devised to minimize the total workflow execution cost while meeting the deadline and risk rate constraints. Extensive experiments using three real-world scientific workflow applications, as well as CloudSim simulation framework, demonstrate the effectiveness and practicality of our algorithm. 相似文献

5.

一种云环境下科学工作流执行计划的优化方法

郭宏乐陈旺虎马生俊李新田乔保民《计算机工程与科学》2019,41(3):433-439

为降低云环境下科学工作流的执行代价,提出了一种执行计划的优化方法。引入猴群算法,依靠对当前执行计划的层内和层间优化,在保证工作流全局截止时间约束的前提下,通过同层任务的逻辑聚合和任务的层间调整,尽可能减少各层任务数的差异,以避免资源的闲置浪费,缩短任务的等待时间。实验表明,该方法与类似研究相比,可降低资源消耗量,减小总的延迟时间。相似文献

6.

PSO-DS: a scheduling engine for scientific workflow managers

Israel Casas Javid Taheri Rajiv Ranjan Albert Y. Zomaya 《The Journal of supercomputing》2017,73(9):3924-3947

Cloud computing, an important source of computing power for the scientific community, requires enhanced tools for an efficient use of resources. Current solutions for workflows execution lack frameworks to deeply analyze applications and consider realistic execution times as well as computation costs. In this study, we propose cloud user–provider affiliation (CUPA) to guide workflow’s owners in identifying the required tools to have his/her application running. Additionally, we develop PSO-DS, a specialized scheduling algorithm based on particle swarm optimization. CUPA encompasses the interaction of cloud resources, workflow manager system and scheduling algorithm. Its featured scheduler PSO-DS is capable of converging strategic tasks distribution among resources to efficiently optimize makespan and monetary cost. We compared PSO-DS performance against four well-known scientific workflow schedulers. In a test bed based on VMware vSphere, schedulers mapped five up-to-date benchmarks representing different scientific areas. PSO-DS proved its efficiency by reducing makespan and monetary cost of tested workflows by 75 and 78%, respectively, when compared with other algorithms. CUPA, with the featured PSO-DS, opens the path to develop a full system in which scientific cloud users can run their computationally expensive experiments. 相似文献

7.

Cost-Time Efficient Scheduling Plan for Executing Workflows in the Cloud

Amandeep Verma Sakshi Kaushal 《Journal of Grid Computing》2015,13(4):495-506

The emergence of Cloud Computing as a model of service provisioning in distributed systems instigated researchers to explore its pros and cons on executing different large scale scientific applications, i.e., Workflows. One of the most challenging problems in clouds is to execute workflows while minimizing the execution time as well as cost incurred by using a set of heterogeneous resources over the cloud simultaneously. In this paper, we present, Budget and Deadline Constrained Heuristic based upon Heterogeneous Earliest Finish Time (HEFT) to schedule workflow tasks over the available cloud resources. The proposed heuristic presents a beneficial trade-off between execution time and execution cost under given constraints. The proposed heuristic is evaluated for different synthetic workflow applications by a simulation process and comparison is done with state-of-art algorithm i.e. BHEFT. The simulation results show that our proposed scheduling heuristic can significantly decrease the execution cost while producing makespan as good as the best known scheduling heuristic under the same deadline and budget constraints. 相似文献

8.

基于任务分配和数据集副本的科学工作流数据布局

尚蕾刘茜萍《计算机工程》2020,46(5):122-130,138

云环境下科学工作流的数据布局成为当前工作流研究中的一个热点问题,对科学工作流中任务和数据之间多对多关系进行分析,可以发现不同数据布局方案在数据传输上的费用各不相同,在很大程度上影响工作流的运行成本。为降低科学工作流数据集传输费用,提出一种基于任务分配和数据集副本的科学工作流数据布局方法。该方法从任务分配开始,在定量计算任务依赖度的基础上进行任务分配,根据分配结果给出一个基于数据集副本的两阶段数据布局方法,以实现科学工作流运行中传输费用的优化。实例结果表明,与工作流层方法相比,该方法可以有效降低科学工作流的运行成本。相似文献

9.

Cost optimal scheduling in IaaS for dependent workload with particle swarm optimization

Nuttapong Netjinda Booncharoen Sirinaovakul Tiranee Achalakul 《The Journal of supercomputing》2014,68(3):1579-1603

Optimizing cloud provisioning for scientific workflow applications is a challenging problem, since the workflows generally contain dependency between tasks and require specific deadlines. Usually, cloud providers offer many options to the consumers. These options include the number of virtual machines, the type of each virtual machine and the purchasing method for each machine. Currently, cloud provisioning cost optimization is an active research topic. Most of this literature is concerned with task scheduling, cloud option selection, and cloud option selection for scientific workflow applications. However, research that attempts to find solutions which cover both cloud option selection and workflow task scheduling is very limited. In this paper, we focus on optimizing the cost of purchasing infrastructure-as-a-service cloud capabilities to achieve scientific work flow execution within the specific deadlines. The proposed system considers the number of purchased instances, instance types, purchasing options, and task scheduling as constraints in an optimization process. Particle swarm optimization augmented with a variable neighborhood search technique is used to find the optimal solution. Our approach finds the configurations of purchasing options with the optimum budget for a specified workflow application based on the required performance. The solutions from the proposed system show promising performance from the perspectives of the total cost and fitness convergence when compared with other state-of-the-art algorithms. 相似文献

10.

Data-Locality Aware Scientific Workflow Scheduling Methods in HPC Cloud Environments

Jieun Choi Theodora Adufu Yoonhee Kim 《International journal of parallel programming》2017,45(5):1128-1141

Efficient data-aware methods in job scheduling, distributed storage management and data management platforms are necessary for successful execution of data-intensive applications. However, research about methods for data-intensive scientific applications are insufficient in large-scale distributed cloud and cluster computing environments and data-aware methods are becoming more complex. In this paper, we propose a Data-Locality Aware Workflow Scheduling (D-LAWS) technique and a locality-aware resource management method for data-intensive scientific workflows in HPC cloud environments. D-LAWS applies data-locality and data transfer time based on network bandwidth to scientific workflow task scheduling and balances resource utilization and parallelism of tasks at the node-level. Our method consolidates VMs and consider task parallelism by data flow during the planning of task executions of a data-intensive scientific workflow. We additionally consider more complex workflow models and data locality pertaining to the placement and transfer of data prior to task executions. We implement and validate the methods based on fairness in cloud environments. Experimental results show that, the proposed methods can improve performance and data-locality of data-intensive workflows in cloud environments. 相似文献

11.

A delay-based dynamic scheduling algorithm for bag-of-task workflows with stochastic task execution times in clouds

《Future Generation Computer Systems》2017

Bag-of-Tasks (BoT) workflows are widespread in many big data analysis fields. However, there are very few cloud resource provisioning and scheduling algorithms tailored for BoT workflows. Furthermore, existing algorithms fail to consider the stochastic task execution times of BoT workflows which leads to deadline violations and increased resource renting costs. In this paper, we propose a dynamic cloud resource provisioning and scheduling algorithm which aims to fulfill the workflow deadline by using the sum of task execution time expectation and standard deviation to estimate real task execution times. A bag-based delay scheduling strategy and a single-type based virtual machine interval renting method are presented to decrease the resource renting cost. The proposed algorithm is evaluated using a cloud simulator ElasticSim which is extended from CloudSim. The results show that the dynamic algorithm decreases the resource renting cost while guaranteeing the workflow deadline compared to the existing algorithms. 相似文献

12.

On-demand minimum cost benchmarking for intermediate dataset storage in scientific cloud workflow systems

Dong Yuan^{Author Vitae} Yun Yang Author VitaeXiao Liu Author Vitae Jinjun Chen Author Vitae 《Journal of Parallel and Distributed Computing》2011,71(2):316-332

Many scientific workflows are data intensive: large volumes of intermediate datasets are generated during their execution. Some valuable intermediate datasets need to be stored for sharing or reuse. Traditionally, they are selectively stored according to the system storage capacity, determined manually. As doing science on clouds has become popular nowadays, more intermediate datasets in scientific cloud workflows can be stored by different storage strategies based on a pay-as-you-go model. In this paper, we build an intermediate data dependency graph (IDG) from the data provenances in scientific workflows. With the IDG, deleted intermediate datasets can be regenerated, and as such we develop a novel algorithm that can find a minimum cost storage strategy for the intermediate datasets in scientific cloud workflow systems. The strategy achieves the best trade-off of computation cost and storage cost by automatically storing the most appropriate intermediate datasets in the cloud storage. This strategy can be utilised on demand as a minimum cost benchmark for all other intermediate dataset storage strategies in the cloud. We utilise Amazon clouds’ cost model and apply the algorithm to general random as well as specific astrophysics pulsar searching scientific workflows for evaluation. The results show that benchmarking effectively demonstrates the cost effectiveness over other representative storage strategies. 相似文献

13.

Scheduling large-scale scientific workflow on virtual machines with different numbers of vCPUs

Wu Hao Chen Xin Song Xiaoyu Zhang Chi Guo He 《The Journal of supercomputing》2021,77(1):679-710

With the wide deployment of cloud computing in scientific computing, cost minimization is increasingly critical for large-scale scientific workflow. Unfortunately, due to the highly intricate directed acyclic graph (DAG)-based workflow and the flexible usage of virtual machines (VMs) in cloud platform, the existing workflow scheduling approaches are inefficient to strike a balance between the parallelism and the topology of the DAG-based workflow while using the VMs, which causes a low utilization of VMs and consumes more cost. To address these issues, this paper presents a novel task scheduling framework named cost minimization approach with the DAG splitting method (COMSE) for minimizing the cost of running a deadline-constrained large-scale scientific workflow. First, we provide comprehensive theoretical analyses on how to improve the utilization of a resource-balanced multi-vCPU VM for running multiple tasks simultaneously. Second, considering the balance between the parallelism and the topology of a workflow, we simplify the DAG-based workflow, and based on the simplified DAG, a DAG splitting method is devised to preprocess the workflow. Third, since the cloud is charged by hours, we also design an exact algorithm to find the optimal operation pattern for a given schedule to make the consumed instance hours minimum, and this algorithm is named as instance hours minimization by Dijkstra (TOID). Finally, by employing the DAG splitting method and the TOID, the COMSE schedules a deadline-constrained large-scale scientific workflow on the multi-vCPU VMs and incorporates two important objects: minimizing the computation cost and the communication cost. Our solution approach is evaluated through rigorous performance evaluation study using real-word workflows, and the results show that the proposed COMSE approach outperforms existing algorithms in terms of computation cost and communication cost.

相似文献

14.

满足工作流执行时限的可抢占虚拟机实例配置和调度方法研究

廖建锦孙庆骁杨海龙栾钟治钱德沛《计算机工程与科学》2020,42(11):1956-1964

随着云计算的迅速发展,将工作流部署到云计算平台已经成为了常见的选择。相比于传统的本地工作流,云工作流不仅要考虑计算时长等要求,还要考虑其产生的经济开销。而云计算服务商为了提高资源利用率,提供了可抢占虚拟机实例这种非常廉价但是不稳定的资源。针对工作流在云计算中的调度和执行问题,提出一种满足工作流执行时限的可抢占虚拟机实例配置和调度方法。该方法使用马尔科夫模型和动态规划方法,对可抢占虚拟机实例的价格进行预测,并得到成本最低的出价策略。同时,结合工作流的执行时限要求,在估计的出价策略下对工作流中使用的实例进行配置。实验结果显示,相比于全部使用按需付费虚拟机实例,该方法在满足工作流执行时限的前提下最高可以节省89.9%的计算成本。相似文献

15.

云环境下基于聚簇的科学工作流执行优化策略

段菊陈旺虎王润平俞茂义王世凯《计算机应用》2015,35(6):1580-1584

基于云环境下的科学工作流,以提高处理机利用率、降低费用为目标,提出了一种基于聚簇的执行优化策略。该策略首先基于合理的任务复制和分簇,以实现关键任务的尽早调度;在此基础上,对任务簇再次进行聚集,以充分利用任务簇中任务间可能的空闲时间。实验表明,该策略能够提高任务的并行度,提前工作流的最早完成时间,并且在提高处理机的利用率和降低科学工作流的执行费用方面有显著效果。相似文献

16.

基于自适应惩罚函数的云工作流调度协同进化遗传算法

徐健锐朱会娟《计算机科学》2018,45(8):105-112

云计算为大规模科学工作流应用的执行提供了更高效的运行环境。为了解决云环境中科学工作流调度的代价优化问题,提出了一种基于协同进化的工作流调度遗传算法CGAA。该算法将自适应惩罚函数引入严格约束的遗传算法中,通过协同进化的方法,自适应地调整种群个体的交叉与变异概率,以加速算法收敛并防止种群早熟。通过4种科学工作流的仿真实验结果表明,CGAA算法得到的调度方案在满足工作流调度截止时间约束与降低任务执行代价的综合性能方面优于同类型算法。相似文献

17.

混合自适应粒子群工作流调度优化算法

马学森许雪梅蒋功辉乔焰周天保《计算机应用》2023,43(2):474-483

针对具有截止期的云工作流完成时间与执行成本冲突的问题,提出一种混合自适应粒子群工作流调度优化算法（HAPSO）。首先,基于截止期建立有向无环图（DAG）云工作流调度模型;然后,通过范数理想点与自适应权重的结合,将DAG调度模型转化为权衡DAG完成时间和执行成本的多目标优化问题;最后,在粒子群优化（PSO）算法的基础上引入自适应惯性权重、自适应学习因子、花朵授粉算法的概率切换机制、萤火虫算法（FA）和粒子越界处理方法,从而平衡粒子群的全局搜索与局部搜索能力,进而求解DAG完成时间与执行成本的目标优化问题。实验中对比分析了PSO、惯性权重粒子群算法（WPSO）、蚁群算法（ACO）和HAPSO的优化结果。实验结果表明,HAPSO在权衡工作流（30～300任务数）完成时间与执行成本的多目标函数值上降低了40.9%～81.1%,HAPSO在工作流截止期约束下有效权衡了完成时间与执行成本。此外,HAPSO在减少完成时间或降低执行成本的单目标上也有较好的效果,验证了HAPSO的普适性。相似文献

18.

An Evaluation of the Cost and Performance of Scientific Workflows on Amazon EC2 总被引：1，自引：0，他引：1

Gideon Juve Ewa Deelman G. Bruce Berriman Benjamin P. Berman Philip Maechling 《Journal of Grid Computing》2012,10(1):5-21

Workflows are used to orchestrate data-intensive applications in many different scientific domains. Workflow applications typically communicate data between processing steps using intermediate files. When tasks are distributed, these files are either transferred from one computational node to another, or accessed through a shared storage system. As a result, the efficient management of data is a key factor in achieving good performance for workflow applications in distributed environments. In this paper we investigate some of the ways in which data can be managed for workflows in the cloud. We ran experiments using three typical workflow applications on Amazon’s EC2 cloud computing platform. We discuss the various storage and file systems we used, describe the issues and problems we encountered deploying them on EC2, and analyze the resulting performance and cost of the workflows. 相似文献

19.

基于云科学工作流调度的代价与能效优化算法

魏秀然王峰《计算机应用研究》2018,35(7)

为了降低云环境中科学工作流调度的执行代价与数据中心能耗,提出了一种基于能效感知的工作流调度代价最优化算法CWCO-EA。算法在满足截止时间约束下,以最小化工作流执行代价与降低能耗为目标,将工作流的任务调度划分为四步执行。首先,通过代价效用的概念设计虚拟机选择策略,实现了子makespan约束下的任务与最优虚拟机间的映射;其次,通过串行与并行任务合并策略,同步降低了工作流的执行代价与能耗;然后,通过空闲虚拟机重用机制,改善了租用虚拟机的利用率,进一步提高了能效;最后,通过任务松驰策略实现了租用虚拟机的能力回收,节省了能耗。通过四种科学工作流的仿真实验,结果表明,CWCO-EA算法比较同类型算法,在满足截止时间的同时,可以同步降低工作流的执行代价与执行能耗。相似文献

20.

A hybrid evolutionary algorithm for task scheduling and data assignment of data-intensive scientific workflows on clouds

《Future Generation Computer Systems》2017

A growing number of data- and compute-intensive experiments have been modeled as scientific workflows in the last decade. Meanwhile, clouds have emerged as a prominent environment to execute this type of workflows. In this scenario, the investigation of workflow scheduling strategies, aiming at reducing its execution times, became a top priority and a very popular research field. However, few work consider the problem of data file assignment when solving the task scheduling problem. Usually, a workflow is represented by a graph where nodes represent tasks and the scheduling problem consists in allocating tasks to machines to be executed at a predefined time aiming at reducing the makespan of the whole workflow. In this article, we show that the scheduling of scientific workflows can be improved when both task scheduling and the data file assignment problems are treated together. Thus, we propose a new workflow representation, where nodes of the workflow graph represent either tasks or data files, and define the Task Scheduling and Data Assignment Problem (TaSDAP), considering this new model. We formulated this problem as an integer programming problem. Moreover, a hybrid evolutionary algorithm for solving it, named HEA-TaSDAP, is also introduced. To evaluate our approach we conducted two types of experiments: theoretical and practical ones. At first, we compared HEA-TaSDAP with the solutions produced by the mathematical formulation and by other works from related literature. Then, we considered real executions in Amazon EC2 cloud using a real scientific workflow use case (SciPhy for phylogenetic analyses). In all experiments, HEA-TaSDAP outperformed the other classical approaches from the related literature, such as Min–Min and HEFT. 相似文献