Similar Literature
20 similar records found
1.
Computational workflows are a powerful paradigm for representing and managing complex applications, particularly in large-scale distributed scientific data analysis. Workflows represent application components that result in individual computations, as well as their interdependencies in terms of dataflow. Workflow systems use these representations to manage various aspects of workflow creation and execution for users, such as the automatic assignment of execution resources. This article describes an approach to automating a new aspect of the process: the selection of application components and data sources. We present a novel approach that enables users to specify varying degrees of detail and amounts of constraints in a workflow request, including constraints on input, intermediate, or output data in the workflow; abstract workflow component classes rather than specific component implementations; and generic reusable workflow templates that express a pre-defined combination of components. The algorithm elaborates the user request into a set of fully ground workflows with specific choices of data sources and codes to be used, so that they can be submitted for mapping and execution. The algorithm searches the space of possible candidate workflows by creating increasingly specialized versions of the original template and eliminating candidates that violate constraints accumulated in the candidate workflow as components and data sources are selected. A novel feature of our approach is that it assumes a distributed architecture where data and component catalogues are separate from the workflow system. The algorithm explicitly poses queries to external catalogues, and therefore any reasoning regarding data or component properties is not assumed to occur within the workflow system. We describe our implementation of this approach in the Wings workflow system. This implementation uses the W3C Web Ontology Language and associated reasoners to implement the workflow system as well as the data and component catalogues. This research demonstrates the use of artificial intelligence techniques to support the kinds of automation envisioned by the scientific community for large-scale distributed scientific data analysis.
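A minimal sketch of the elaboration step described above, assuming hypothetical catalogue and constraint interfaces (not Wings' actual API): candidates are specialized step by step, and any candidate that violates a constraint accumulated so far is pruned immediately.

```python
# Sketch only: step options stand in for queries to external component/data catalogues.
from itertools import product

def satisfies(binding, constraints):
    # Each constraint is a predicate over the (partial) binding; constraints
    # that mention steps not yet bound are deferred rather than failed.
    for check in constraints:
        try:
            if not check(binding):
                return False
        except KeyError:
            pass
    return True

def elaborate(steps, implementations, sources, constraints):
    candidates = [{}]
    for step in steps:
        options = list(product(implementations[step], sources[step]))
        nxt = []
        for cand in candidates:
            for impl, src in options:
                new = {**cand, step: (impl, src)}
                if satisfies(new, constraints):  # prune violators early
                    nxt.append(new)
        candidates = nxt
    return candidates  # fully ground workflows, ready for mapping/execution

# Example: a constraint ties the aligner choice to its input data format.
grounded = elaborate(
    steps=["align", "plot"],
    implementations={"align": ["bwa", "bowtie2"], "plot": ["matplotlib"]},
    sources={"align": ["reads.fastq", "reads.bam"], "plot": ["align.out"]},
    constraints=[lambda b: not (b["align"][0] == "bwa" and b["align"][1].endswith(".bam"))],
)
print(grounded)
```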

2.
With the development of new experimental technologies, biologists are faced with an avalanche of data to be computationally analyzed for scientific advancements and discoveries to emerge. Given the complexity of analysis pipelines, the large number of computational tools, and the enormous amount of data to manage, there is compelling evidence that many, if not most, scientific discoveries will not stand the test of time; increasing the reproducibility of computed results is therefore of paramount importance. The objective we set out in this paper is to place scientific workflows in the context of reproducibility. To do so, we define several kinds of reproducibility that can be reached when scientific workflows are used to perform experiments. We characterize and define the criteria that need to be catered for by reproducibility-friendly scientific workflow systems, and use these criteria to place several representative and widely used workflow systems and companion tools within such a framework. We also discuss the remaining challenges posed by reproducible scientific workflows in the life sciences. Our study was guided by three use cases from the life science domain involving in silico experiments.

3.
Scientific workflows are increasingly used to manage and share scientific computations and methods for analyzing data. A variety of systems have been developed that store executed workflows and make them part of public repositories. However, workflows are published in the idiosyncratic format of the workflow system used for their creation and execution. Browsing, linking, and using the stored workflows and their results often become a challenge for scientists who may only be familiar with one system. In this paper we present an approach that addresses this issue by publishing and exploiting workflows as data on the Web, with a representation that is independent of the workflow system used to create them. To achieve our goal, we follow the Linked Data Principles to publish workflow inputs, intermediate results, outputs, and codes, and we reuse and extend well-established standards like W3C PROV. We illustrate our approach by publishing workflows and consuming them with different tools designed to address common scenarios for workflow exploitation.
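A minimal sketch of publishing one workflow run as system-independent Linked Data with the W3C PROV-O vocabulary, using rdflib; the base URI and resource names are illustrative assumptions, and the paper's own vocabulary extensions are not reproduced here.

```python
from rdflib import Graph, Namespace, RDF

PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("http://example.org/workflows/")  # hypothetical base URI

g = Graph()
g.bind("prov", PROV)

run = EX["run/42"]                # one workflow step execution
reads = EX["data/reads.fastq"]    # its input dataset
aligned = EX["data/aligned.bam"]  # its output dataset

g.add((run, RDF.type, PROV.Activity))
g.add((reads, RDF.type, PROV.Entity))
g.add((aligned, RDF.type, PROV.Entity))
g.add((run, PROV.used, reads))              # prov:used links the input to the run
g.add((aligned, PROV.wasGeneratedBy, run))  # prov:wasGeneratedBy links the output

print(g.serialize(format="turtle"))  # publishable, workflow-system-independent RDF
```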

4.
An important element of e-Science research is the scientific workflow, which is generally long, composed of many computations, and represents a scientific process. Scientific workflows are, in general, difficult to define; one way to define them is to use tools that aggregate semantics to assist in composition. In this context, this paper presents a proposal that aims to make the composition of scientific workflows possible by performing semantic search over Web services and incorporating them into the workflow definition.

5.
Next-generation scientific applications feature complex workflows composed of many computing modules with intricate inter-module dependencies. Supporting such scientific workflows in wide-area networks, especially Grids, and optimizing their performance are crucial to the success of collaborative scientific discovery. We develop a Scientific Workflow Automation and Management Platform (SWAMP), which enables scientists to conveniently assemble, execute, monitor, control, and steer computing workflows in distributed environments via a unified web-based user interface. The SWAMP architecture is built entirely on a seamless composition of web services: its own functionalities are provided, and its interactions with other tools and systems are enabled, through web services for easy access over standard Internet protocols, independent of platform and programming language. SWAMP also incorporates a class of efficient workflow mapping schemes to achieve optimal end-to-end performance based on rigorous performance modeling and algorithm design. The performance superiority of SWAMP over existing workflow mapping schemes is justified by extensive simulations, and the system's efficacy is illustrated by large-scale experiments on real-life scientific workflows for climate modeling through effective system implementation, deployment, and testing on the Open Science Grid.
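A minimal sketch of the kind of workflow-to-resource mapping such schemes address, assuming a simple earliest-finish-time list-scheduling heuristic over a module DAG; this is illustrative only and does not reproduce SWAMP's published mapping algorithms.

```python
def topo_order(modules, deps):
    seen, order = set(), []
    def visit(m):
        if m in seen:
            return
        seen.add(m)
        for p in deps.get(m, ()):
            visit(p)
        order.append(m)
    for m in modules:
        visit(m)
    return order

def map_workflow(modules, deps, resources):
    """modules: name -> work units; deps: name -> predecessors;
    resources: name -> speed. Greedily minimizes each module's finish time."""
    finish, placement, free = {}, {}, {r: 0.0 for r in resources}
    for m in topo_order(modules, deps):
        ready = max((finish[p] for p in deps.get(m, ())), default=0.0)
        best = min(resources,
                   key=lambda r: max(ready, free[r]) + modules[m] / resources[r])
        finish[m] = max(ready, free[best]) + modules[m] / resources[best]
        free[best] = finish[m]
        placement[m] = best
    return placement, finish

placement, finish = map_workflow(
    {"ingest": 4.0, "simulate": 8.0, "render": 2.0},
    {"simulate": {"ingest"}, "render": {"simulate"}},
    {"fast": 2.0, "slow": 1.0},
)
print(placement, finish)  # end-to-end makespan = max of the finish times
```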

6.
In recent years, scientific workflows have emerged as a fundamental abstraction for structuring and executing scientific experiments in computational environments. Scientific workflows are becoming increasingly complex and more demanding in terms of computational resources, thus requiring parallel techniques and high performance computing (HPC) environments. Meanwhile, clouds have emerged as a new paradigm where resources are virtualized and provided on demand. By using clouds, scientists have expanded beyond single parallel computers to hundreds or even thousands of virtual machines. Although the initial focus of clouds was to provide high throughput computing, clouds are already being used to provide an HPC environment where elastic resources can be instantiated on demand during the course of a scientific workflow. However, this model also raises many open, yet important, challenges such as scheduling workflow activities. Scheduling parallel scientific workflows in the cloud is a very complex task, since it must take into account many different criteria and exploit the elasticity characteristic to optimize workflow execution. In this paper, we introduce an adaptive scheduling heuristic for parallel execution of scientific workflows in the cloud that is based on three criteria: total execution time (makespan), reliability, and financial cost. Besides scheduling workflow activities based on a 3-objective cost model, this approach also scales resources up and down according to the restrictions imposed by scientists before workflow execution. This tuning is based on provenance data captured and queried at runtime. We conducted a thorough validation of our approach using a real bioinformatics workflow. The experiments were performed in SciCumulus, a cloud workflow engine for managing scientific workflow execution.
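A minimal sketch of a 3-objective cost model of the kind described, trading off makespan, reliability, and financial cost per activity; the weights, field names, and lack of normalization are illustrative assumptions, not the paper's actual formula.

```python
def activity_cost(exec_time, failure_prob, price, w_time=0.5, w_rel=0.3, w_cost=0.2):
    """Lower is better. In practice each criterion should be normalized to a
    common scale before weighting; omitted here for brevity."""
    return w_time * exec_time + w_rel * failure_prob + w_cost * price

def pick_vm(activity, vm_offers):
    """Choose the VM type minimizing the weighted 3-objective cost."""
    def cost_of(vm):
        exec_time = activity["work"] / vm["speed"]
        return activity_cost(
            exec_time=exec_time,
            failure_prob=vm["failure_prob"],
            price=vm["price_per_hour"] * exec_time / 3600.0,
        )
    return min(vm_offers, key=cost_of)

vm = pick_vm(
    {"work": 7200.0},  # seconds of work on a unit-speed machine
    [
        {"speed": 1.0, "failure_prob": 0.01, "price_per_hour": 0.10},
        {"speed": 4.0, "failure_prob": 0.05, "price_per_hour": 0.50},
    ],
)
print(vm)
```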

7.
In order to design workflows in changing and dynamic environments, a flexible, correct, and rapid realization of models of the activity flow is required. In particular, techniques are needed to design workflows capable of adapting themselves effectively when exceptional situations occur during process execution. The authors present an approach to flexible workflow design based on rules and patterns developed in the framework of the WIDE project. Rules allow a high degree of flexibility during workflow design by modeling exceptional aspects of the workflow separately from the main activity flow. Patterns model frequently occurring exceptional situations in a generalized way by providing the designer with skeletons of rules and suggestions about their instantiation, together with indications of relationships with other rules, with the activity flow, and with related information. Pattern-based design relies on a pattern catalog containing patterns to be reused, and on a formal basis for specializing and instantiating available patterns.

8.
The ability to support Quality of Service (QoS) constraints is an important requirement in some scientific applications. With the increasing use of Cloud computing infrastructures, where access to resources is shared, dynamic, and provisioned on demand, identifying how QoS constraints can be supported becomes an important challenge. However, access to dedicated resources is often not possible in existing Cloud deployments, and only limited QoS guarantees are provided by many commercial providers (often restricted to error rate and availability, rather than particular QoS metrics such as latency or access time). We propose a workflow system architecture which enforces QoS for the simultaneous execution of multiple scientific workflows over a shared infrastructure (such as a Cloud environment). Our approach involves multiple pipeline workflow instances, with each instance having its own QoS requirements. These workflows are composed of a number of stages, with each stage being mapped to one or more physical resources. A stage involves a combination of data access, computation, and data transfer capability. A token-bucket-based data throttling framework is embedded in the workflow system architecture. Each workflow instance stage regulates the amount of data that is injected into the shared resources, allowing bursts of data to be injected while at the same time providing isolation of workflow streams. We demonstrate our approach using the Montage workflow, and develop a Reference net model of the workflow.
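A minimal sketch of the per-stage token-bucket throttle: bursts up to the bucket capacity pass immediately, while the long-run injection rate into the shared resources stays bounded. The parameters and blocking behavior are illustrative assumptions, not the paper's framework.

```python
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = float(rate)          # tokens (bytes) added per second
        self.capacity = float(capacity)  # maximum burst size
        self.tokens = self.capacity
        self.stamp = time.monotonic()

    def consume(self, nbytes):
        """Block until `nbytes` may be injected, then spend the tokens."""
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.stamp) * self.rate)
            self.stamp = now
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return
            time.sleep((nbytes - self.tokens) / self.rate)  # wait for refill

# Each workflow stream gets its own bucket, isolating it from the others.
bucket = TokenBucket(rate=10_000_000, capacity=50_000_000)  # 10 MB/s, 50 MB burst
bucket.consume(25_000_000)  # a 25 MB burst passes immediately
```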

9.
With the emergence of ‘service-oriented science,’ the need arises to orchestrate multiple services to facilitate scientific investigation—that is, to create ‘science workflows.’ We present here our findings in providing a workflow solution for the caGrid service-based grid infrastructure. We choose BPEL and Taverna as candidates, and compare their usability across the lifecycle of a scientific workflow, including workflow composition, execution, and result analysis. Our experience shows that BPEL, as an imperative language, offers a comprehensive set of modeling primitives for workflows of all flavors, whereas Taverna offers a dataflow model and a more compact set of primitives that facilitates dataflow modeling and pipelined execution. We hope that this comparison study not only helps researchers to select a language or tool that meets their specific needs, but also offers some insight into how a workflow language and tool can fulfill the requirements of the scientific community. Copyright © 2009 John Wiley & Sons, Ltd.

10.
Cloud computing has established itself as an interesting computational model that provides a wide range of resources such as storage, databases, and computing power for several types of users. Recently, the concept of cloud computing was extended with the concept of federated clouds, where resources from different cloud providers are inter-connected to perform a common action (e.g. execute a scientific workflow). Users can benefit from both single-provider and federated cloud environments to execute their scientific workflows, since they can get the necessary amount of resources on demand. In several of these workflows there is a demand for high performance and parallelism techniques, since many activities are data- and compute-intensive and can execute for hours, days, or even weeks. Some Scientific Workflow Management Systems (SWfMS) already provide parallelism capabilities for scientific workflows in single-provider clouds. Most of them rely on creating a virtual cluster to execute the workflow in parallel, but they also rely on the user to estimate the number of virtual machines to allocate for this virtual cluster, and most SWfMS keep this initial configuration for the entire workflow execution. Dimensioning the virtual cluster is thus a top-priority task, since an under- or over-dimensioned virtual cluster can hurt workflow performance or (unnecessarily) increase financial costs. This dimensioning is far from trivial in a single-provider cloud, and especially so in federated clouds, due to the huge number of virtual machine types to choose from in each location and provider. In this article, we propose an approach named GraspCC-fed to produce an optimal (or near-optimal) estimate of the number of virtual machines to allocate for each workflow. GraspCC-fed extends a previously proposed GRASP-based heuristic for executing standalone applications to consider scientific workflows executed in both single-provider and federated clouds. For the experiments, GraspCC-fed was coupled to an adapted version of the SciCumulus workflow engine for federated clouds. We believe that GraspCC-fed can be an important decision-support tool for users, helping to determine an optimal configuration of the virtual cluster for parallel cloud-based scientific workflows.
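A minimal GRASP-style sketch of dimensioning a virtual cluster in this spirit: a greedy randomized construction picks a cluster size from a restricted candidate list, then local search explores neighboring sizes. The cost model and neighborhood are illustrative assumptions, not the published GraspCC-fed heuristic.

```python
import random

def cost(n_vms, workload_hours, price_per_hour, deadline_hours):
    makespan = workload_hours / n_vms           # assume perfect scaling
    money = n_vms * makespan * price_per_hour
    penalty = 1e6 if makespan > deadline_hours else 0.0  # deadline violation
    return money + penalty

def grasp(workload_hours, price, deadline, max_vms=128, iters=50, rcl=5, seed=0):
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    ranked = sorted(range(1, max_vms + 1),
                    key=lambda n: cost(n, workload_hours, price, deadline))
    for _ in range(iters):
        n = rng.choice(ranked[:rcl])  # greedy randomized construction
        improved = True
        while improved:               # local search over neighboring sizes
            improved = False
            for m in (n - 1, n + 1):
                if 1 <= m <= max_vms and (cost(m, workload_hours, price, deadline)
                                          < cost(n, workload_hours, price, deadline)):
                    n, improved = m, True
        c = cost(n, workload_hours, price, deadline)
        if c < best_cost:
            best, best_cost = n, c
    return best, best_cost

print(grasp(workload_hours=100.0, price=0.2, deadline=4.0))
```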

11.
Because web services are highly interoperable, they are capable of providing uniform access to underlying technologies, allowing developers to choose between competing services. Workflow languages, such as BPEL, compose and sequence Web service invocations into meaningful, and sometimes repeated, tasks. Their prevalence means there may be multiple Web services that perform the same operation, some better than others depending on the situation. Their potential for being unavailable at critical workflow execution times forces a reliance on such redundant services. One remedy for unavailability and situational-awareness constraints is to use quality-of-service factors and user-directed preferences to assign priorities to workflows and services and perform run-time replacement. In this paper we describe a novel approach to self-adapting workflow reconfiguration. We discuss the implementation of our approach embodied by the Next-generation Workflow Toolkit (NeWT), which supports runtime workflow reconfiguration using BPEL with a commercial workflow engine. A key design feature is the decoupling of user-directed changes regarding service priority from the actual workflow execution, allowing NeWT to effectively manage and recover from workflow changes at any time. We evaluate NeWT by comparing the same example across multiple commercial systems that claim reconfiguration capabilities.
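A minimal sketch of the kind of QoS- and preference-driven ranking used to pick a replacement among redundant services at run time; the field names, QoS score, and weights are illustrative assumptions, not NeWT's actual model.

```python
def rank_services(candidates, user_weight=0.4):
    """Order redundant services: unavailable ones sink to the bottom; the
    rest blend a QoS score (higher is better) with a user-directed priority."""
    def score(svc):
        if not svc["available"]:
            return float("-inf")
        qos = svc["reliability"] - svc["latency_ms"] / 1000.0
        return (1 - user_weight) * qos + user_weight * svc["user_priority"]
    return sorted(candidates, key=score, reverse=True)

services = [
    {"name": "geocodeA", "available": True,  "reliability": 0.99, "latency_ms": 120, "user_priority": 0.2},
    {"name": "geocodeB", "available": True,  "reliability": 0.95, "latency_ms": 40,  "user_priority": 0.9},
    {"name": "geocodeC", "available": False, "reliability": 0.99, "latency_ms": 10,  "user_priority": 1.0},
]
print([s["name"] for s in rank_services(services)])  # the run-time replacement order
```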

12.
Scientific workflows for scientific problem solving are data-centric, so their verification must consider the soundness of both the control flow and the data flow. To this end, this paper analyzes four kinds of control relations and two kinds of data relations in scientific workflows, gives the corresponding soundness definitions, and proposes algorithms that traverse a scientific workflow to obtain the control and data relations it contains, thereby verifying the workflow's soundness. Positive and negative examples demonstrate the effectiveness of the verification method.

13.
Scientific workflow execution often spans multiple self-managing administrative domains to obtain specific processing capabilities. Existing (global) analysis techniques tend to mandate that every domain-specific application unveil all private behaviors for scientific collaboration. In practice, it is infeasible for a domain-specific application to disclose its process details (as a private workflow fragment) for privacy or security reasons. Consequently, coordinating scientific workflows with their distributed domain-specific applications is a challenging endeavor. To address this problem, we propose a collaborative scheduling approach that can deal with temporal dependencies between a scientific workflow and a private workflow fragment. Under this approach, a private workflow fragment can maintain temporal consistency with a scientific workflow in resource sharing and task enactment. An evaluation is also presented to demonstrate the proposed approach for coordinating multiple scientific workflow executions in a concurrent environment.

14.
At present, workflow management systems have not sufficiently dealt with the issues of time, involving time modelling at build-time and time management at run-time; they lack the ability to support the checking of temporal constraints at run-time. Although some approaches have been devised to tackle this problem, they are limited to a single workflow and use only static techniques to verify temporal constraints. In reality, multiple workflows execute concurrently in a workflow management system, and there may well exist resource constraints between these concurrent workflows, which significantly affect the verification of temporal constraints at run-time. This paper proposes a novel approach for dynamic verification of temporal constraints for concurrent workflows. We first investigate resource constraints in workflow management systems, and then define concurrent workflow executions. Based on these definitions, we propose a verification method that analyses the temporal relationships and resource constraints between activities among concurrent workflows.
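A minimal sketch of the core idea, under a deliberately simplified model: activities of concurrent workflows that need the same exclusive resource must be serialized, which can stretch a workflow past its deadline even though each workflow verifies in isolation.

```python
def verify(activities, deadlines):
    """activities: list of (workflow, start, duration, resource).
    Returns, per workflow, whether it still finishes within its deadline
    once shared-resource contention is taken into account."""
    finish = {}
    busy_until = {}  # resource -> earliest time it becomes free
    for wf, start, dur, res in sorted(activities, key=lambda a: a[1]):
        begin = max(start, busy_until.get(res, 0.0))  # wait for the resource
        end = begin + dur
        busy_until[res] = end
        finish[wf] = max(finish.get(wf, 0.0), end)
    return {wf: finish[wf] <= deadlines[wf] for wf in finish}

acts = [
    ("wf1", 0.0, 5.0, "cluster"),
    ("wf2", 1.0, 4.0, "cluster"),  # must wait for wf1 on the shared cluster
]
print(verify(acts, {"wf1": 6.0, "wf2": 8.0}))  # {'wf1': True, 'wf2': False}
```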

15.
Nowadays business process management is becoming a fundamental piece of many industrial processes. To manage the evolution of, and interactions between, business actions, it is important to accurately model the steps to follow and the resources needed by a process. Workflows provide a way of describing the order of execution and the dependencies between the constituent activities of business processes. Workflow monitoring can help to improve processes and avoid delays in industrial environments where concurrent processes are carried out. In this article a new Petri net extension for modelling workflow activities together with their required resources is presented: resource-aware Petri nets (RAPN). An intelligent workflow management system for process monitoring and delay prediction is also introduced. Resource-aware Petri nets include time and resources within the classical Petri net workflow representation, facilitating the task of modelling and monitoring workflows. The workflow management system monitors the execution of workflows and detects possible delays using RAPN. To test this new approach, different services from a medical maintenance environment have been modelled and simulated.
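A minimal sketch of a resource-aware transition firing in this spirit: besides consuming input tokens, a transition needs its resources to be free and takes time, so a completion later than the modelled finish time can be flagged as a delay. This is an illustrative simplification, not the paper's RAPN formalism.

```python
class Transition:
    def __init__(self, name, inputs, outputs, resources, duration):
        self.name, self.inputs, self.outputs = name, inputs, outputs
        self.resources, self.duration = resources, duration

def fire(marking, free_resources, t, clock):
    """Fire `t` if enabled; return the new marking, free set, and finish time."""
    if any(marking.get(p, 0) < 1 for p in t.inputs):
        raise ValueError(f"{t.name}: missing input tokens")
    if not t.resources <= free_resources:
        raise ValueError(f"{t.name}: required resources busy")
    marking = dict(marking)
    for p in t.inputs:
        marking[p] -= 1                     # consume input tokens
    for p in t.outputs:
        marking[p] = marking.get(p, 0) + 1  # produce output tokens
    return marking, free_resources - t.resources, clock + t.duration

diagnose = Transition("diagnose", ["queued"], ["diagnosed"], {"technician"}, 2.0)
m, free, done = fire({"queued": 1}, {"technician", "scanner"}, diagnose, clock=0.0)
print(m, free, done)  # an observed finish later than `done` would flag a delay
```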

16.
17.
In prior work it has been shown that the design of scientific workflows can benefit from a collection-oriented modeling paradigm which views scientific workflows as pipelines of XML stream processors. In this paper, we present approaches for exploiting data parallelism in XML processing pipelines through novel compilation strategies to the MapReduce framework. Pipelines in our approach consist of sequences of processing steps that receive XML-structured data and produce, often through calls to “black-box” (scientific) functions, modified (i.e., updated) XML structures. Our main contributions are (i) the development of a set of strategies for compiling scientific workflows, modeled as XML processing pipelines, into parallel MapReduce networks, and (ii) a discussion of their advantages and trade-offs, based on a thorough experimental evaluation of the various translation strategies. Our evaluation uses the Hadoop MapReduce system as an implementation platform. Our results show that execution times of XML workflow pipelines can be significantly reduced using our compilation strategies. These efficiency gains, together with the benefits of MapReduce (e.g., fault tolerance) make our approach ideal for executing large-scale, compute-intensive XML-based scientific workflows.
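A minimal sketch of the data-parallel idea: split an XML collection into independent items in the map phase, apply the "black-box" scientific function per item, and reassemble the updated collection in the reduce phase. Plain Python stands in for Hadoop here; the actual compilation strategies are not reproduced.

```python
import xml.etree.ElementTree as ET

def map_phase(xml_text, fn):
    """Each child element becomes an independent (index, result) record,
    so fn can be applied to the records in parallel."""
    root = ET.fromstring(xml_text)
    return [(i, fn(elem)) for i, elem in enumerate(root)]

def reduce_phase(tag, records):
    """Reassemble the processed fragments, restoring document order."""
    root = ET.Element(tag)
    for _, elem in sorted(records):
        root.append(elem)
    return ET.tostring(root, encoding="unicode")

def blackbox(elem):  # stand-in for a scientific function updating a fragment
    elem.set("processed", "true")
    return elem

doc = "<collection><sample id='1'/><sample id='2'/></collection>"
print(reduce_phase("collection", map_phase(doc, blackbox)))
```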

18.
Most existing scientific workflow systems rely on proprietary concepts and workflow languages. We are convinced that the conventional workflow technology that has been established in business scenarios for years is also beneficial for scientists and scientific applications. We are therefore working on a scientific workflow system based on business workflow concepts and technologies. The system offers advanced flexibility features to scientists in order to support them in creating workflows in an explorative manner and to increase the robustness of scientific applications. We named the approach Model-as-you-go because it enables users to model and execute workflows in an iterative process that eventually results in a complete scientific workflow. In this paper, we present the main ingredients of Model-as-you-go, show how existing workflow concepts have to be extended in order to cover the requirements of scientists, discuss the application of the concepts to BPEL, and introduce the current prototype of the system.

19.
Workflows are used to formally describe processes of various types, such as business and manufacturing processes. One of the critical tasks of workflow management is the automated discovery of possible flaws in the workflow – workflow verification. In this paper, we formalize the problem of workflow verification as the problem of verifying that there exists a feasible process for each task in the workflow. This problem is tractable for nested workflows, which are workflows with a hierarchical structure similar to hierarchical task networks in planning. However, we show that if extra synchronization, precedence, or causal constraints are added to the nested structure, the workflow verification problem becomes NP-complete. We present a workflow verification algorithm for nested workflows with extra constraints that is based on constraint satisfaction techniques and exploits an incremental temporal reasoning algorithm. We then experimentally demonstrate the efficiency of the proposed techniques on randomly generated workflows with various structures and sizes. The paper concludes with notes on exploiting the presented techniques in the FlowOpt application for modeling, optimizing, visualizing, and analyzing production workflows.
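A minimal sketch of the feasibility check on a plain nested workflow (the tractable case): a task is feasible if it is not excluded, an alternative (OR) node is feasible if some child is, and a sequence/parallel (AND) node is feasible only if all children are. The node encoding is an illustrative assumption, and the paper's incremental temporal reasoning over extra constraints is not reproduced.

```python
def feasible(node, selected):
    """Return whether a feasible process exists; `selected` collects the
    tasks of one such process."""
    kind = node["kind"]
    if kind == "task":
        selected.add(node["name"])
        return not node.get("broken", False)
    children = node["children"]
    if kind == "or":  # alternative decomposition: one feasible child suffices
        for child in children:
            trial = set(selected)
            if feasible(child, trial):
                selected |= trial
                return True
        return False
    # "and": every child of a sequence/parallel block must be feasible
    return all(feasible(child, selected) for child in children)

wf = {"kind": "and", "children": [
    {"kind": "task", "name": "cut"},
    {"kind": "or", "children": [
        {"kind": "task", "name": "weld", "broken": True},
        {"kind": "task", "name": "rivet"},
    ]},
]}
chosen = set()
print(feasible(wf, chosen), chosen)  # True, with {'cut', 'rivet'} selected
```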

20.
The workflow interoperability problem was successfully solved by the SHIWA project when the workflows to be integrated run in the same grid infrastructure; in the more generic case, when the workflows run in different infrastructures, the problem had remained unsolved. In the current paper we show a solution to this problem by introducing a new type of workflow called an infrastructure-aware workflow. These are scientific workflows extended with new node types that enable the on-the-fly creation and destruction of the required infrastructures in the clouds. The paper presents the semantics of these new types of nodes and workflows, and shows how they solve the workflow interoperability problem. The paper also describes how this new type of workflow can be implemented by a new service called Occopus, and how this service can be integrated with existing SHIWA Simulation Platform services, such as the WS-PGRADE/gUSE portal, to provide the functionality required to solve the workflow interoperability problem.
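A minimal sketch of executing an infrastructure-aware workflow: besides ordinary job nodes, dedicated node types build and tear down the required cloud infrastructure on the fly. `deploy`, `destroy`, and `submit` are hypothetical stand-ins and do not reproduce the Occopus API.

```python
def deploy(descriptor):  # stand-in: an orchestrator would build from a descriptor
    print(f"building infrastructure from {descriptor}")
    return {"descriptor": descriptor}

def destroy(infra):      # stand-in: tear the infrastructure down
    print(f"tearing down {infra['descriptor']}")

def submit(job, infra):  # stand-in: run a job on a built infrastructure
    print(f"running {job} on {infra['descriptor']}")

def run(workflow):
    live = {}
    try:
        for node in workflow:
            if node["type"] == "create_infra":
                live[node["name"]] = deploy(node["descriptor"])
            elif node["type"] == "destroy_infra":
                destroy(live.pop(node["name"]))
            else:  # an ordinary job node, bound to an already-built infrastructure
                submit(node["job"], live[node["infra"]])
    finally:
        for infra in live.values():  # never leak cloud resources on failure
            destroy(infra)

run([
    {"type": "create_infra", "name": "cluster", "descriptor": "hadoop.yaml"},
    {"type": "job", "job": "simulate", "infra": "cluster"},
    {"type": "destroy_infra", "name": "cluster"},
])
```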
