Similar Documents
20 similar documents found.
1.
2.
Scientific workflows are increasingly used to manage and share scientific computations and methods to analyze data. A variety of systems have been developed that store the workflows executed and make them part of public repositories. However, workflows are published in the idiosyncratic format of the workflow system used for their creation and execution. Browsing, linking and using the stored workflows and their results often become a challenge for scientists who may only be familiar with one system. In this paper we present an approach for addressing this issue by publishing and exploiting workflows as data on the Web, with a representation that is independent of the workflow system used to create them. To achieve our goal, we follow the Linked Data principles to publish workflow inputs, intermediate results, outputs and codes, and we reuse and extend well-established standards like W3C PROV. We illustrate our approach by publishing workflows and consuming them with different tools designed to address common scenarios for workflow exploitation.
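To make the Linked Data idea more concrete, the following Python sketch emits a few PROV-style RDF statements for a hypothetical workflow run; only the prov: terms are standard W3C PROV vocabulary, while the ex: URIs and the run itself are invented for illustration and are not the dataset the authors actually publish.

```python
# Minimal sketch: serializing a workflow run as PROV-style RDF triples.
# The example URIs (ex:...) are hypothetical; only the prov: terms are standard.
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/run1/"

def triple(s, p, o):
    """Format one N-Triples statement with URI subject, predicate and object."""
    return f"<{s}> <{p}> <{o}> ."

statements = [
    # The workflow execution is a prov:Activity ...
    triple(EX + "execution", RDF_TYPE, PROV + "Activity"),
    # ... that used an input dataset and generated an output dataset.
    triple(EX + "execution", PROV + "used", EX + "input-dataset"),
    triple(EX + "output-dataset", PROV + "wasGeneratedBy", EX + "execution"),
    # Intermediate results can be linked to the data they were derived from.
    triple(EX + "intermediate-result", PROV + "wasDerivedFrom", EX + "input-dataset"),
]

print("\n".join(statements))
```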

3.
Automation of the execution of computational tasks is at the heart of improving scientific productivity. In recent years, scientific workflows have been established as an important abstraction that captures data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today's computational and data science applications that process vast amounts of data keep increasing, there is a compelling case for a new generation of advances in high-performance computing, commonly termed extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. The paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.
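A rough way to picture such a feature-based classification is one record per system, as in the sketch below; the field values and system names are placeholders and do not reproduce the paper's classification of the 15 systems.

```python
from dataclasses import dataclass

@dataclass
class WMSProfile:
    """Illustrative feature record for classifying a workflow management system."""
    name: str
    execution_model: str        # e.g. task-level DAG, in-situ, streaming
    computing_environment: str  # e.g. HPC cluster, cloud, hybrid
    data_access: str            # e.g. shared file system, object store, in-memory

# Hypothetical entries only; they do not reproduce the survey's actual table.
profiles = [
    WMSProfile("SystemA", "task-level DAG", "HPC cluster", "shared file system"),
    WMSProfile("SystemB", "streaming", "cloud", "object store"),
]

# Group systems by execution model to spot gaps in coverage.
by_model = {}
for p in profiles:
    by_model.setdefault(p.execution_model, []).append(p.name)
print(by_model)
```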

4.
5.
6.
In recent years, scientific workflows have emerged as a fundamental abstraction for structuring and executing scientific experiments in computational environments. Scientific workflows are becoming increasingly complex and more demanding in terms of computational resources, thus requiring the use of parallel techniques and high performance computing (HPC) environments. Meanwhile, clouds have emerged as a new paradigm where resources are virtualized and provided on demand. By using clouds, scientists have expanded beyond single parallel computers to hundreds or even thousands of virtual machines. Although the initial focus of clouds was to provide high throughput computing, clouds are already being used to provide an HPC environment where elastic resources can be instantiated on demand during the course of a scientific workflow. However, this model also raises many open, yet important, challenges such as scheduling workflow activities. Scheduling parallel scientific workflows in the cloud is a very complex task, since we have to take into account many different criteria and to exploit the elasticity characteristic for optimizing workflow execution. In this paper, we introduce an adaptive scheduling heuristic for parallel execution of scientific workflows in the cloud that is based on three criteria: total execution time (makespan), reliability and financial cost. Besides scheduling workflow activities based on a 3-objective cost model, this approach also scales resources up and down according to the restrictions imposed by scientists before workflow execution. This tuning is based on provenance data captured and queried at runtime. We conducted a thorough validation of our approach using a real bioinformatics workflow. The experiments were performed in SciCumulus, a cloud workflow engine for managing scientific workflow execution.
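One simple way to fold makespan, reliability and financial cost into a single scheduling score is a weighted sum of normalized objectives, as sketched below; the weights and normalization are assumptions for illustration and do not reproduce the cost model implemented in SciCumulus.

```python
def schedule_cost(makespan_s, failure_prob, dollars,
                  max_makespan_s, max_dollars,
                  w_time=0.5, w_rel=0.3, w_cost=0.2):
    """Weighted-sum score over three normalized criteria (lower is better).

    Hypothetical weights; a real heuristic would take them from the
    restrictions the scientist imposes before execution.
    """
    t = makespan_s / max_makespan_s      # normalized execution time
    r = failure_prob                     # reliability term, already in [0, 1]
    c = dollars / max_dollars            # normalized financial cost
    return w_time * t + w_rel * r + w_cost * c

# Compare two candidate placements of a workflow activity.
print(schedule_cost(3600, 0.02, 1.2, max_makespan_s=7200, max_dollars=5.0))
print(schedule_cost(5400, 0.01, 0.6, max_makespan_s=7200, max_dollars=5.0))
```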

7.
In this paper, we introduce an efficient mechanism to collect, store, and retrieve data provenance information in workflows of multiphysics simulations. Using notifications, we enable the nonintrusive collection of information about workflow events during workflow execution. Combining these events with workflow structure information, which is constant for every execution of a workflow, we obtain the data provenance information for the specific run of the workflow. Data provenance information is structured into a graph that represents workflow events on the basis of their causal dependency. We use a graph database to store this graph and utilize the traversal framework provided to efficiently retrieve data provenance information from the graph by traversing backwards from a data object to every workflow event that is part of its provenance. Finally, we integrate data provenance information with semantics of workflow services to provide complete and meaningful data provenance information. Copyright © 2012 John Wiley & Sons, Ltd.
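The backward traversal described above, starting at a data object and following causal dependencies to every event in its provenance, can be sketched in a few lines of Python over an adjacency map; the event names are hypothetical and the sketch ignores the graph database itself.

```python
from collections import deque

# Edges point from an event or data object to the events it causally depends on.
# Node names are hypothetical placeholders for workflow events.
depends_on = {
    "result.dat":      ["solver-finished"],
    "solver-finished": ["solver-started"],
    "solver-started":  ["mesh-generated"],
    "mesh-generated":  [],
}

def provenance(data_object):
    """Collect every event reachable by traversing causal dependencies backwards."""
    seen, queue = set(), deque([data_object])
    while queue:
        node = queue.popleft()
        for parent in depends_on.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

print(provenance("result.dat"))  # every workflow event in the provenance of result.dat
```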

8.
A growing number of data- and compute-intensive experiments have been modeled as scientific workflows in the last decade. Meanwhile, clouds have emerged as a prominent environment to execute this type of workflow. In this scenario, the investigation of workflow scheduling strategies, aiming at reducing execution times, became a top priority and a very popular research field. However, few works consider the problem of data file assignment when solving the task scheduling problem. Usually, a workflow is represented by a graph where nodes represent tasks, and the scheduling problem consists in allocating tasks to machines to be executed at a predefined time, aiming at reducing the makespan of the whole workflow. In this article, we show that the scheduling of scientific workflows can be improved when the task scheduling and the data file assignment problems are treated together. Thus, we propose a new workflow representation, where nodes of the workflow graph represent either tasks or data files, and define the Task Scheduling and Data Assignment Problem (TaSDAP) considering this new model. We formulate this problem as an integer programming problem. Moreover, a hybrid evolutionary algorithm for solving it, named HEA-TaSDAP, is also introduced. To evaluate our approach we conducted two types of experiments: theoretical and practical ones. First, we compared HEA-TaSDAP with the solutions produced by the mathematical formulation and by other works from the related literature. Then, we considered real executions in the Amazon EC2 cloud using a real scientific workflow use case (SciPhy, for phylogenetic analyses). In all experiments, HEA-TaSDAP outperformed the classical approaches from the related literature, such as Min-Min and HEFT.
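To illustrate the combined representation, the sketch below models a tiny workflow graph whose nodes are either tasks or data files and evaluates a placement by adding transfer time for files stored away from the task's machine; all names, sizes and runtimes are made up, and this naive evaluation stands in for, rather than reproduces, the integer programming formulation and HEA-TaSDAP.

```python
# Sketch of a workflow graph whose nodes are tasks or data files (TaSDAP-style).
# All names, sizes and runtimes are hypothetical.
tasks  = {"t1": 10.0, "t2": 6.0}                 # task -> runtime on a reference machine
files  = {"f_in": 2.0, "f_mid": 1.0}             # data file -> size in GB
reads  = {"t1": ["f_in"], "t2": ["f_mid"]}       # task -> files it consumes
writes = {"t1": ["f_mid"], "t2": []}             # task -> files it produces

def placement_cost(task_machine, file_machine, bandwidth_gb_s=0.1):
    """Total runtime plus transfer time for files not stored where the task runs."""
    total = 0.0
    for task, runtime in tasks.items():
        total += runtime
        for f in reads[task]:
            if file_machine[f] != task_machine[task]:
                total += files[f] / bandwidth_gb_s
    return total

# Co-locating each task with its input files avoids all transfers in this toy case.
print(placement_cost({"t1": "m1", "t2": "m2"}, {"f_in": "m1", "f_mid": "m2"}))
```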

9.
An important challenge for the adoption of cloud computing in the scientific community remains the efficient allocation and execution of data-intensive scientific workflows to reduce execution time and the size of transferred data. The transferred data overhead is becoming significant with emerging scientific workflows that have input/output files and intermediate data products ranging in the hundreds of gigabytes. The allocation of scientific workflows on public clouds can be described through a variety of perspectives and parameters, and has been proved to be NP-complete. This paper proposes an evolutionary approach for task allocation on public clouds considering data transfer and execution time. In our framework, a solution is represented using an allocation chromosome that encodes the allocation of tasks to nodes, and an ordering chromosome that defines the execution order according to the scientific workflow representation. We propose a multi-objective optimization that relies on a cloud cost model and employs tailored evolution operators. Starting from a population of possible solutions, we employ crossover and mutation operators on both chromosomes, aiming at optimizing the data transferred between nodes as well as the total workflow runtime. The crossover operators combine parts of solutions to reduce data overhead, whereas the mutation operators swap parts of the same chromosome according to pre-defined rules. Our experimental study compares the proposed approach with current state-of-the-art approaches using synthetic and real-life workflows. Our algorithm performs similarly to existing heuristics for small workflows and shows up to 80% improvement for larger synthetic workflows. To further validate our approach, we compare the allocation and scheduling obtained by our approach with those obtained by popular scientific workflow managers when real workflows with hundreds of tasks are executed on a public cloud. The results show a 10% improvement in runtime over existing schedulers, caused by an 80% reduction in transferred data and optimized allocation and ordering of tasks. This improved data locality has greater impact as it can be employed to improve and study data provenance and facilitate data persistence for scientific workflows.
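The dual-chromosome encoding can be made concrete with the sketch below, which pairs an allocation chromosome with an ordering chromosome and applies simple swap and reassignment mutations; these generic operators are illustrative stand-ins, not the tailored operators proposed in the paper.

```python
import random
from dataclasses import dataclass, field

random.seed(0)

@dataclass
class Solution:
    """One individual: where each task runs, and in which order tasks are launched."""
    allocation: dict                                # task -> node (allocation chromosome)
    ordering: list = field(default_factory=list)    # execution order (ordering chromosome)

def mutate_allocation(sol, nodes):
    """Reassign one random task to a random node."""
    task = random.choice(list(sol.allocation))
    sol.allocation[task] = random.choice(nodes)

def mutate_ordering(sol):
    """Swap two positions; a real operator would re-check workflow precedence rules."""
    i, j = random.sample(range(len(sol.ordering)), 2)
    sol.ordering[i], sol.ordering[j] = sol.ordering[j], sol.ordering[i]

sol = Solution({"t1": "n1", "t2": "n1", "t3": "n2"}, ["t1", "t2", "t3"])
mutate_allocation(sol, ["n1", "n2", "n3"])
mutate_ordering(sol)
print(sol)
```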

10.
11.
Workflow technology continues to play an important role as a means for specifying and enacting computational experiments in modern science. Reusing and re-purposing workflows allow scientists to do new experiments faster, since the workflows capture useful expertise from others. As workflow libraries grow, scientists face the challenge of finding workflows appropriate for their task, understanding what each workflow does, and reusing relevant portions of a given workflow. We believe that workflows would be easier to understand and reuse if high-level views (abstractions) of their activities were available in workflow libraries. As a first step towards obtaining these abstractions, we report in this paper on the results of a manual analysis performed over a set of real-world scientific workflows from Taverna, Wings, Galaxy and Vistrails. Our analysis has resulted in a set of scientific workflow motifs that outline (i) the kinds of data-intensive activities that are observed in workflows (Data-Operation motifs), and (ii) the different manners in which activities are implemented within workflows (Workflow-Oriented motifs). These motifs are helpful to identify the functionality of the steps in a given workflow, to develop best practices for workflow design, and to develop approaches for automated generation of workflow abstractions.

12.
13.
Improving the execution efficiency and reducing the execution cost of scientific workflows in cloud environments have received wide attention. There is often a conflict between the local QoS constraints expected by users and the overall execution efficiency of the workflow. To address this, and building on our earlier work, we propose a scientific workflow scheduling strategy that allows local time constraints to be violated. By applying backward-priority merging of tasks to the clustered set of workflow tasks, idle time slices between tasks can be exploited, which optimizes the execution time of the scientific workflow. In addition, to make full use of task slack time and improve the overall execution efficiency of the workflow, the scheduling of some tasks is allowed to violate their local latest-finish-time constraints. Experimental results show that this strategy brings forward the earliest completion time of the scientific workflow, improves processor utilization, and ultimately reduces the workflow's execution cost.
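A minimal illustration of the slack idea used above: for each task, the gap between its latest finish time and its earliest possible finish time is the idle time the strategy can consume, and a small overrun corresponds to tolerating a local violation; the task timings below are hypothetical.

```python
# Hypothetical per-task timing (seconds); est = earliest start, lft = latest finish.
tasks = {
    "t1": {"est": 0,  "runtime": 30, "lft": 40},
    "t2": {"est": 30, "runtime": 20, "lft": 80},
    "t3": {"est": 50, "runtime": 25, "lft": 90},
}

def slack(name):
    """Idle time available before the task would violate its local time constraint."""
    t = tasks[name]
    return t["lft"] - (t["est"] + t["runtime"])

# Tasks with large slack are candidates for backward merging onto the same
# machine slot; a small negative value would mean tolerating a local violation.
for name in tasks:
    print(name, slack(name))
```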

14.
Workflow Modeling Based on Timed Petri Nets
With the development of workflow management, many extensions of the Petri net concept have been proposed for workflow modeling. Describing the resources, resource conditions, and the relationships between tasks and resource conditions in a workflow is a complex matter, and the existing Petri nets used to describe workflows are not satisfactory in this respect. This paper places particular emphasis on the concept of resources and, on the basis of timed Petri nets, proposes a new type of Petri net, the Resource Condition/Task net (RC/TN), which is used to model workflows and describe their execution. The proposed modeling method is also applied to the description of office-automation (OA) processes.
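The flavour of a resource condition/task net can be mimicked with a marking-based toy in Python, where places model resource conditions and a task fires only when all of its input conditions hold; the net below is an invented office-automation fragment, not the formal RC/TN definition.

```python
# Toy marking-based model in the spirit of a resource condition/task net.
# Place names and the net structure are hypothetical.
marking = {"clerk-free": 1, "form-submitted": 1, "form-approved": 0}

tasks = {
    # task -> (conditions consumed, conditions produced)
    "approve-form": ({"clerk-free": 1, "form-submitted": 1},
                     {"clerk-free": 1, "form-approved": 1}),
}

def enabled(task):
    """A task is enabled when every input resource condition is satisfied."""
    needed, _ = tasks[task]
    return all(marking.get(p, 0) >= n for p, n in needed.items())

def fire(task):
    """Consume the input conditions and produce the output conditions."""
    needed, produced = tasks[task]
    for p, n in needed.items():
        marking[p] -= n
    for p, n in produced.items():
        marking[p] = marking.get(p, 0) + n

if enabled("approve-form"):
    fire("approve-form")
print(marking)  # the clerk is free again and the form is approved
```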

15.
Bag-of-Tasks (BoT) workflows are widespread in many big data analysis fields. However, there are very few cloud resource provisioning and scheduling algorithms tailored for BoT workflows. Furthermore, existing algorithms fail to consider the stochastic task execution times of BoT workflows, which leads to deadline violations and increased resource renting costs. In this paper, we propose a dynamic cloud resource provisioning and scheduling algorithm which aims to fulfill the workflow deadline by using the sum of the task execution time expectation and standard deviation to estimate real task execution times. A bag-based delay scheduling strategy and a single-type-based virtual machine interval renting method are presented to decrease the resource renting cost. The proposed algorithm is evaluated using the cloud simulator ElasticSim, which is extended from CloudSim. The results show that, compared to existing algorithms, the dynamic algorithm decreases the resource renting cost while guaranteeing the workflow deadline.
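The estimation rule mentioned above, expectation plus standard deviation of the task execution time, is straightforward to state in code; the sample runtimes below are made up, whereas a real system would draw on historical measurements.

```python
import statistics

# Hypothetical historical runtimes (seconds) of one BoT task type.
samples = [52.0, 47.5, 60.2, 55.1, 49.8]

mu = statistics.mean(samples)
sigma = statistics.stdev(samples)

# Conservative estimate used when checking the workflow deadline:
estimated_runtime = mu + sigma
print(round(estimated_runtime, 1))
```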

16.
Typical patterns of using scientific workflows include their periodical execution using a fixed set of computational resources. Using the statistics from multiple runs, one can accurately estimate task execution and communication times in order to apply static scheduling algorithms. Several workflows with known estimates can be combined into a set to improve the resulting schedule. In this paper, we consider the mapping of multiple workflows to partially available heterogeneous resources. The problem is how to fill free time windows with tasks from different workflows, taking into account users' requirements on the urgency of the results of calculations. To estimate the quality of schedules for several workflows with various soft deadlines, we introduce a unified metric incorporating the level of constraint satisfaction and the fairness of resource distribution. The main goal of the work was to develop a set of algorithms implementing different scheduling strategies for multiple workflows with soft deadlines in a non-dedicated environment, and to perform a comparative analysis of these strategies. We study how time restrictions (given by resource providers and users) influence the quality of schedules, and which scheme of grouping and ordering the tasks is the most effective for the batched scheduling of non-urgent workflows. Experiments with several types of synthetic and domain-specific sets of multiple workflows show that: (i) the use of information about time windows and deadlines leads to a significant increase in the quality of static schedules, and (ii) the clustering-based scheduling scheme outperforms task-based and workflow-based schemes. This was confirmed by an evaluation of the studied algorithms on the basis of the CLAVIRE workflow management platform.
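One plausible shape for such a unified metric is sketched below: an average deadline-satisfaction level across workflows blended with Jain's fairness index over resource shares; this particular formulation is an assumption for illustration, not the exact metric defined in the paper.

```python
def deadline_level(finish, soft_deadline):
    """1.0 if the workflow finishes on time, decaying towards 0 as it overruns."""
    return min(1.0, soft_deadline / finish)

def jain_fairness(shares):
    """Jain's index: 1.0 for perfectly equal shares, 1/n in the worst case."""
    n = len(shares)
    return sum(shares) ** 2 / (n * sum(s * s for s in shares))

def unified_metric(finish_times, soft_deadlines, shares, alpha=0.5):
    """Hypothetical blend of constraint satisfaction and fairness (higher is better)."""
    constraint = sum(deadline_level(f, d)
                     for f, d in zip(finish_times, soft_deadlines)) / len(finish_times)
    return alpha * constraint + (1 - alpha) * jain_fairness(shares)

# Two workflows: one meets its soft deadline, one overruns slightly.
print(unified_metric([90, 130], [100, 120], shares=[0.6, 0.4]))
```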

17.
Scientific workflows have emerged as an important tool for combining computational power with data analysis for all scientific domains in e-science, especially in the life sciences. They help scientists to design and execute complex in silico experiments. However, with rising complexity it becomes increasingly impractical to optimize scientific workflows by trial and error. To address this issue, we propose to insert a new optimization phase into the common scientific workflow life cycle. This paper describes the design and implementation of an automated optimization framework for scientific workflows that implements this phase. Our framework was integrated into Taverna, a life-science-oriented workflow management system, and offers a versatile programming interface (API) which enables easy integration of arbitrary optimization methods. We have used this API to develop an example plugin for parameter optimization that is based on a Genetic Algorithm. Two use cases taken from the areas of structural bioinformatics and proteomics demonstrate how our framework facilitates the setup, execution, and monitoring of workflow parameter optimization in high performance computing e-science environments.
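The plugin idea can be pictured as a small optimizer interface that the engine consults between workflow runs; the interface and the simple (1+1)-style search below are hypothetical stand-ins and do not represent the actual API of the framework or its Genetic Algorithm plugin.

```python
import random
random.seed(1)

class ParameterOptimizer:
    """Hypothetical plugin interface; not the actual API exposed by the framework."""
    def propose(self):
        raise NotImplementedError
    def report(self, value, fitness):
        raise NotImplementedError

class OnePlusOneSearch(ParameterOptimizer):
    """Tiny (1+1)-style evolutionary search over a single numeric workflow parameter."""
    def __init__(self, low, high):
        self.low, self.high = low, high
        self.best_value = (low + high) / 2
        self.best_fitness = float("-inf")
    def propose(self):
        step = random.gauss(0, (self.high - self.low) * 0.1)
        return min(self.high, max(self.low, self.best_value + step))
    def report(self, value, fitness):
        if fitness > self.best_fitness:
            self.best_value, self.best_fitness = value, fitness

# Simulated optimization loop: a toy fitness function stands in for a workflow run.
optimizer = OnePlusOneSearch(0.0, 10.0)
for _ in range(30):
    candidate = optimizer.propose()
    optimizer.report(candidate, fitness=-(candidate - 7.3) ** 2)
print(round(optimizer.best_value, 2))
```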

18.
This paper investigates the resource allocation problem for a type of workflow in pervasive computing. These workflows are abstracted from enterprise-level applications in the business or commerce area. The activities in these workflows require not only computing resources, but also human resources. Human involvement introduces additional security concerns. When we plan or allocate resource capacities, we often assume that when a task is allocated to a resource, the resource will accept the task and start the execution once the processor becomes available. However, security policies impose further constraints on task executions, and may therefore affect both application- and system-oriented performance. Authorization is an important aspect of security. This paper investigates the issue of allocating resources for running workflows under role-based authorization control, which is one of the most popular authorization mechanisms. By taking the authorization constraints into account, allocation strategies are developed in this paper for both human resources and computing resources. In the allocation strategy for human resources, an optimization equation is constructed subject to the constraint of the budget available to hire human resources. The optimization equation is then solved to obtain the number of human resources allocated to each authorization role. The allocation strategy for computing resources calculates not only the number of computing resources, but also the proportion of processing capacity in each resource allocated to serve the tasks assuming each role. Simulation experiments have been conducted to verify the effectiveness of the developed allocation strategies. The experimental results show that the allocation strategy developed in this paper outperforms traditional allocation strategies, which do not consider authorization constraints, in terms of both average response time and resource utilization.
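As a heavily simplified flavour of the human-resource side, the sketch below splits an hourly hiring budget across authorization roles in proportion to the workload each role must serve; the proportional rule, the role names and all numbers are assumptions, not the optimization model solved in the paper.

```python
# Hypothetical workload (tasks/hour) and hourly cost per authorization role.
roles = {
    "clerk":   {"arrival_rate": 40.0, "hourly_cost": 20.0},
    "manager": {"arrival_rate": 10.0, "hourly_cost": 45.0},
}
budget_per_hour = 300.0

def allocate(roles, budget):
    """Split the budget proportionally to workload, keeping at least one person per role."""
    total_rate = sum(r["arrival_rate"] for r in roles.values())
    counts = {}
    for name, r in roles.items():
        share = budget * r["arrival_rate"] / total_rate
        counts[name] = max(1, int(share // r["hourly_cost"]))
    return counts

print(allocate(roles, budget_per_hour))
```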

19.
Efficient data-aware methods in job scheduling, distributed storage management and data management platforms are necessary for the successful execution of data-intensive applications. However, research on methods for data-intensive scientific applications is insufficient in large-scale distributed cloud and cluster computing environments, and data-aware methods are becoming more complex. In this paper, we propose a Data-Locality Aware Workflow Scheduling (D-LAWS) technique and a locality-aware resource management method for data-intensive scientific workflows in HPC cloud environments. D-LAWS applies data locality and data transfer time based on network bandwidth to scientific workflow task scheduling and balances resource utilization and task parallelism at the node level. Our method consolidates VMs and considers task parallelism by data flow during the planning of task executions of a data-intensive scientific workflow. We additionally consider more complex workflow models and data locality pertaining to the placement and transfer of data prior to task executions. We implement and validate the methods based on fairness in cloud environments. Experimental results show that the proposed methods can improve the performance and data locality of data-intensive workflows in cloud environments.
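The core locality rule, preferring the node that minimizes input transfer time given file sizes and network bandwidth, can be sketched as follows; node names, file sizes and bandwidth are hypothetical, and the sketch leaves out VM consolidation and fairness.

```python
# Hypothetical cluster state: which node holds each input file, and link bandwidth.
file_location = {"genome.fa": "node-1", "reads.fq": "node-2"}
file_size_gb = {"genome.fa": 3.0, "reads.fq": 12.0}
bandwidth_gb_s = 0.125   # roughly 1 Gbit/s between any two nodes

def transfer_time(task_inputs, node):
    """Seconds spent moving the task's inputs to `node` before it can start."""
    return sum(file_size_gb[f] / bandwidth_gb_s
               for f in task_inputs if file_location[f] != node)

def best_node(task_inputs, nodes):
    """Pick the node with the smallest total input transfer time (data locality)."""
    return min(nodes, key=lambda n: transfer_time(task_inputs, n))

# The task reading the large file is steered towards the node already holding it.
print(best_node(["genome.fa", "reads.fq"], ["node-1", "node-2", "node-3"]))
```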

20.
