首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Scientific workflows have emerged as an important tool for combining the computational power with data analysis for all scientific domains in e-science, especially in the life sciences. They help scientists to design and execute complex in silico experiments. However, with rising complexity it becomes increasingly impractical to optimize scientific workflows by trial and error. To address this issue, we propose to insert a new optimization phase into the common scientific workflow life cycle. This paper describes the design and implementation of an automated optimization framework for scientific workflows to implement this phase. Our framework was integrated into Taverna, a life-science oriented workflow management system and offers a versatile programming interface (API), which enables easy integration of arbitrary optimization methods. We have used this API to develop an example plugin for parameter optimization that is based on a Genetic Algorithm. Two use cases taken from the areas of structural bioinformatics and proteomics demonstrate how our framework facilitates setup, execution, and monitoring of workflow parameter optimization in high performance computing e-science environments.  相似文献   

2.
In the last years, scientific workflows have emerged as a fundamental abstraction for structuring and executing scientific experiments in computational environments. Scientific workflows are becoming increasingly complex and more demanding in terms of computational resources, thus requiring the usage of parallel techniques and high performance computing (HPC) environments. Meanwhile, clouds have emerged as a new paradigm where resources are virtualized and provided on demand. By using clouds, scientists have expanded beyond single parallel computers to hundreds or even thousands of virtual machines. Although the initial focus of clouds was to provide high throughput computing, clouds are already being used to provide an HPC environment where elastic resources can be instantiated on demand during the course of a scientific workflow. However, this model also raises many open, yet important, challenges such as scheduling workflow activities. Scheduling parallel scientific workflows in the cloud is a very complex task since we have to take into account many different criteria and to explore the elasticity characteristic for optimizing workflow execution. In this paper, we introduce an adaptive scheduling heuristic for parallel execution of scientific workflows in the cloud that is based on three criteria: total execution time (makespan), reliability and financial cost. Besides scheduling workflow activities based on a 3-objective cost model, this approach also scales resources up and down according to the restrictions imposed by scientists before workflow execution. This tuning is based on provenance data captured and queried at runtime. We conducted a thorough validation of our approach using a real bioinformatics workflow. The experiments were performed in SciCumulus, a cloud workflow engine for managing scientific workflow execution.  相似文献   

3.
Workflows are used to orchestrate data-intensive applications in many different scientific domains. Workflow applications typically communicate data between processing steps using intermediate files. When tasks are distributed, these files are either transferred from one computational node to another, or accessed through a shared storage system. As a result, the efficient management of data is a key factor in achieving good performance for workflow applications in distributed environments. In this paper we investigate some of the ways in which data can be managed for workflows in the cloud. We ran experiments using three typical workflow applications on Amazon’s EC2 cloud computing platform. We discuss the various storage and file systems we used, describe the issues and problems we encountered deploying them on EC2, and analyze the resulting performance and cost of the workflows.  相似文献   

4.
Nowadays, more and more computer-based scientific experiments need to handle massive amounts of data. Their data processing consists of multiple computational steps and dependencies within them. A data-intensive scientific workflow is useful for modeling such process. Since the sequential execution of data-intensive scientific workflows may take much time, Scientific Workflow Management Systems (SWfMSs) should enable the parallel execution of data-intensive scientific workflows and exploit the resources distributed in different infrastructures such as grid and cloud. This paper provides a survey of data-intensive scientific workflow management in SWfMSs and their parallelization techniques. Based on a SWfMS functional architecture, we give a comparative analysis of the existing solutions. Finally, we identify research issues for improving the execution of data-intensive scientific workflows in a multisite cloud.  相似文献   

5.
6.
7.
A Taxonomy of Workflow Management Systems for Grid Computing   总被引:12,自引:0,他引:12  
With the advent of Grid and application technologies, scientists and engineers are building more and more complex applications to manage and process large data sets, and execute scientific experiments on distributed resources. Such application scenarios require means for composing and executing complex workflows. Therefore, many efforts have been made towards the development of workflow management systems for Grid computing. In this paper, we propose a taxonomy that characterizes and classifies various approaches for building and executing workflows on Grids. We also survey several representative Grid workflow systems developed by various projects world-wide to demonstrate the comprehensiveness of the taxonomy. The taxonomy not only highlights the design and engineering similarities and differences of state-of-the-art in Grid workflow systems, but also identifies the areas that need further research.  相似文献   

8.
Cloud computing offers massive scalability and elasticity required by many scientific and commercial applications. Combining the computational and data handling capabilities of clouds with parallel processing also has the potential to tackle Big Data problems efficiently. Science gateway frameworks and workflow systems enable application developers to implement complex applications and make these available for end-users via simple graphical user interfaces. The integration of such frameworks with Big Data processing tools on the cloud opens new opportunities for application developers. This paper investigates how workflow systems and science gateways can be extended with Big Data processing capabilities. A generic approach based on infrastructure aware workflows is suggested and a proof of concept is implemented based on the WS-PGRADE/gUSE science gateway framework and its integration with the Hadoop parallel data processing solution based on the MapReduce paradigm in the cloud. The provided analysis demonstrates that the methods described to integrate Big Data processing with workflows and science gateways work well in different cloud infrastructures and application scenarios, and can be used to create massively parallel applications for scientific analysis of Big Data.  相似文献   

9.
Scientific workflows are a topic of great interest in the grid community that sees in the workflow model an attractive paradigm for programming distributed wide-area grid infrastructures. Traditionally, the grid workflow execution is approached as a pure best effort scheduling problem that maps the activities onto the grid processors based on appropriate optimization or local matchmaking heuristics such that the overall execution time is minimized. Even though such heuristics often deliver effective results, the execution in dynamic and unpredictable grid environments is prone to severe performance losses that must be understood for minimizing the completion time or for the efficient use of high-performance resources. In this paper, we propose a new systematic approach to help the scientists and middleware developers understand the most severe sources of performance losses that occur when executing scientific workflows in dynamic grid environments. We introduce an ideal model for the lowest execution time that can be achieved by a workflow and explain the difference to the real measured grid execution time based on a hierarchy of performance overheads for grid computing. We describe how to systematically measure and compute the overheads from individual activities to larger workflow regions and adjust well-known parallel processing metrics to the scope of grid computing, including speedup and efficiency. We present a distributed online tool for computing and analyzing the performance overheads in real time based on event correlation techniques and introduce several performance contracts as quality-of-service parameters to be enforced during the workflow execution beyond traditional best effort practices. We illustrate our method through postmortem and online performance analysis of two real-world workflow applications executed in the Austrian grid environment.  相似文献   

10.
The computing environment in most medium-sized and large enterprises involves old main-frame based (legacy) applications and systems as well as new workstation-based distributed computing systems. The objective of the METEOR project is to support multi-system workflow applications that automate enterprise operations. This paper deals with the modeling and specification of workflows in such applications. Tasks in our heterogeneous environment can be submitted through different types of interfaces on different processing entities. We first present a computational model for workflows that captures the behavior of both transactional and non-transactional tasks of different types. We then develop two languages for specifying a workflow at different levels of abstraction: the Workflow Specification Language (WFSL) is a declarative rule-based language used to express the application-level interactions between multiple tasks, while the Task Specification Language (TSL) focuses on the issues related to individual tasks. These languages are designed to address the important issues of inter-task dependencies, data formatting, data exchange, error handling, and recovery. The paper also presents an architecture for the workflow management system that supports the model and the languages. Recommended by: Omran Bukhres and e. Kühn  相似文献   

11.
Cloud computing has established itself as an interesting computational model that provides a wide range of resources such as storage, databases and computing power for several types of users. Recently, the concept of cloud computing was extended with the concept of federated clouds where several resources from different cloud providers are inter-connected to perform a common action (e.g. execute a scientific workflow). Users can benefit from both single-provider and federated cloud environment to execute their scientific workflows since they can get the necessary amount of resources on demand. In several of these workflows, there is a demand for high performance and parallelism techniques since many activities are data and computing intensive and can execute for hours, days or even weeks. There are some Scientific Workflow Management Systems (SWfMS) that already provide parallelism capabilities for scientific workflows in single-provider cloud. Most of them rely on creating a virtual cluster to execute the workflow in parallel. However, they also rely on the user to estimate the amount of virtual machines to be allocated to create this virtual cluster. Most SWfMS use this initial virtual cluster configuration made by the user for the entire workflow execution. Dimensioning the virtual cluster to execute the workflow in parallel is then a top priority task since if the virtual cluster is under or over dimensioned it can impact on the workflow performance or increase (unnecessarily) financial costs. This dimensioning is far from trivial in a single-provider cloud and specially in federated clouds due to the huge number of virtual machine types to choose in each location and provider. In this article, we propose an approach named GraspCC-fed to produce the optimal (or near-optimal) estimation of the amount of virtual machines to allocate for each workflow. GraspCC-fed extends a previously proposed heuristic based on GRASP for executing standalone applications to consider scientific workflows executed in both single-provider and federated clouds. For the experiments, GraspCC-fed was coupled to an adapted version of SciCumulus workflow engine for federated clouds. This way, we believe that GraspCC-fed can be an important decision support tool for users and it can help determining an optimal configuration for the virtual cluster for parallel cloud-based scientific workflows.  相似文献   

12.
13.
14.
Efficient data-aware methods in job scheduling, distributed storage management and data management platforms are necessary for successful execution of data-intensive applications. However, research about methods for data-intensive scientific applications are insufficient in large-scale distributed cloud and cluster computing environments and data-aware methods are becoming more complex. In this paper, we propose a Data-Locality Aware Workflow Scheduling (D-LAWS) technique and a locality-aware resource management method for data-intensive scientific workflows in HPC cloud environments. D-LAWS applies data-locality and data transfer time based on network bandwidth to scientific workflow task scheduling and balances resource utilization and parallelism of tasks at the node-level. Our method consolidates VMs and consider task parallelism by data flow during the planning of task executions of a data-intensive scientific workflow. We additionally consider more complex workflow models and data locality pertaining to the placement and transfer of data prior to task executions. We implement and validate the methods based on fairness in cloud environments. Experimental results show that, the proposed methods can improve performance and data-locality of data-intensive workflows in cloud environments.  相似文献   

15.
16.
Workflow technology continues to play an important role as a means for specifying and enacting computational experiments in modern science. Reusing and re-purposing workflows allow scientists to do new experiments faster, since the workflows capture useful expertise from others. As workflow libraries grow, scientists face the challenge of finding workflows appropriate for their task, understanding what each workflow does, and reusing relevant portions of a given workflow. We believe that workflows would be easier to understand and reuse if high-level views (abstractions) of their activities were available in workflow libraries. As a first step towards obtaining these abstractions, we report in this paper on the results of a manual analysis performed over a set of real-world scientific workflows from Taverna, Wings, Galaxy and Vistrails. Our analysis has resulted in a set of scientific workflow motifs that outline (i) the kinds of data-intensive activities that are observed in workflows (Data-Operation motifs), and (ii) the different manners in which activities are implemented within workflows (Workflow-Oriented motifs). These motifs are helpful to identify the functionality of the steps in a given workflow, to develop best practices for workflow design, and to develop approaches for automated generation of workflow abstractions.  相似文献   

17.
An important challenge for the adoption of cloud computing in the scientific community remains the efficient allocation and execution of data-intensive scientific workflows to reduce execution time and the size of transferred data. The transferred data overhead is becoming significant with emerging scientific workflows that have input/output files and intermediate data products ranging in the hundreds of gigabytes. The allocation of scientific workflows on public clouds can be described through a variety of perspectives and parameters, and has been proved to be NP-complete. This paper proposes an evolutionary approach for task allocation on public clouds considering data transfer and execution time. In our framework, a solution is represented using an allocation chromosome that encodes the allocation of tasks to nodes, and an ordering chromosome that defines the execution order according to the scientific workflow representation. We propose a multi-objective optimization that relies on a cloud cost model and employs tailored evolution operators. Starting from a population of possible solutions, we employ crossover and mutation operators on both chromosomes aiming at optimizing the data transferred between nodes as well as the total workflow runtime. The crossover operators combine parts of solutions to reduce data overhead, whereas the mutation operators swamp between parts of the same chromosome according to pre-defined rules. Our experimental study compares between the proposed approach and current state-of-the art approaches using synthetic and real-life workflows. Our algorithm performs similarly to existing heuristics for small workflows and shows up to 80 % improvements for larger synthetic workflows. To further validate our approach we compare between the allocation and scheduling obtained by our approach with that obtained by popular scientific workflow managers, when real workflows with hundreds of tasks are executed on a public cloud. The results show a 10 % improvement in runtime over existing schedulers, caused by a 80 % reduction in transferred data and optimized allocation and ordering of tasks. This improved data locality has greater impact as it can be employed to improve and study data provenance and facilitate data persistence for scientific workflows.  相似文献   

18.
Scientific workflows are increasingly used to manage and share scientific computations and methods to analyze data. A variety of systems have been developed that store the workflows executed and make them part of public repositories However, workflows are published in the idiosyncratic format of the workflow system used for the creation and execution of the workflows. Browsing, linking and using the stored workflows and their results often becomes a challenge for scientists who may only be familiar with one system. In this paper we present an approach for addressing this issue by publishing and exploiting workflows as data on the Web with a representation that is independent from the workflow system used to create them. In order to achieve our goal, we follow the Linked Data Principles to publish workflow inputs, intermediate results, outputs and codes; and we reuse and extend well established standards like W3C PROV. We illustrate our approach by publishing workflows and consuming them with different tools designed to address common scenarios for workflow exploitation.  相似文献   

19.
Most of the existing scientific workflow systems rely on proprietary concepts and workflow languages. We are convinced that the conventional workflow technology that is established in business scenarios for years is also beneficial for scientists and scientific applications. We are therefore working on a scientific workflow system based on business workflow concepts and technologies. The system offers advanced flexibility features to scientists in order to support them in creating workflows in an explorative manner and to increase robustness of scientific applications. We named the approach Model-as-you-go because it enables users to model and execute workflows in an iterative process that eventually results in a complete scientific workflow. In this paper, we present main ingredients of Model-as-you-go, show how existing workflow concepts have to be extended in order to cover the requirements of scientists, discuss the application of the concepts to BPEL, and introduce the current prototype of the system.  相似文献   

20.
The field of scientific workflow management systems has grown significantly as applications start using them successfully. In 2007, several active researchers in scientific workflow developments presented the challenges for the state of the art in workflow technologies at that time. Many issues have been addressed, but one of them named ‘dynamic workflows and user steering’ remains with many open problems despite the contributions presented in the recent years. This article surveys the early and current efforts in this topic and proposes a taxonomy to identify the main concepts related to addressing issues in dynamic steering of high performance computing (HPC) in scientific workflows. The main concepts are related to putting the human-in-the-loop of the workflow lifecycle, involving user support in real-time monitoring, notification, analysis and interference by adapting the workflow execution at runtime.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号