期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Common motifs in scientific workflows: An empirical analysis

《Future Generation Computer Systems》2014

Workflow technology continues to play an important role as a means for specifying and enacting computational experiments in modern science. Reusing and re-purposing workflows allow scientists to do new experiments faster, since the workflows capture useful expertise from others. As workflow libraries grow, scientists face the challenge of finding workflows appropriate for their task, understanding what each workflow does, and reusing relevant portions of a given workflow. We believe that workflows would be easier to understand and reuse if high-level views (abstractions) of their activities were available in workflow libraries. As a first step towards obtaining these abstractions, we report in this paper on the results of a manual analysis performed over a set of real-world scientific workflows from Taverna, Wings, Galaxy and Vistrails. Our analysis has resulted in a set of scientific workflow motifs that outline (i) the kinds of data-intensive activities that are observed in workflows (Data-Operation motifs), and (ii) the different manners in which activities are implemented within workflows (Workflow-Oriented motifs). These motifs are helpful to identify the functionality of the steps in a given workflow, to develop best practices for workflow design, and to develop approaches for automated generation of workflow abstractions. 相似文献

2.

Improving workflow modularity using a concern-specific layer on top of Unify

《Science of Computer Programming》2014

Workflows are a popular means of automating processes in many domains, ranging from high-level business process modeling to lower-level web service orchestration. However, state-of-the-art workflow languages offer a limited set of modularization mechanisms. This results in monolithic workflow specifications, in which different concerns are scattered across the workflow and tangled with one another. This hinders the design, evolution, and reusability of workflows expressed in these languages. We address this problem through the Unify framework. This framework enables uniform modularization of workflows by supporting the specification of all workflow concerns – including crosscutting ones – in isolation of each other. These independently specified workflow concerns are connected to each other using workflow-specific connectors. In order to further facilitate the development of workflows, we enable the definition of concern-specific languages (CSLs) on top of the Unify framework. A CSL facilitates the expression of a family of workflow concerns by offering abstractions that map well to the concerns' domain. Thus, domain experts can add concerns to a workflow using concern-specific language constructs. We exemplify the specification of a workflow in Unify, and show the definition and application of two concern-specific languages built on top of Unify. 相似文献

3.

A characterization of workflow management systems for extreme-scale applications

《Future Generation Computer Systems》2017

Automation of the execution of computational tasks is at the heart of improving scientific productivity. Over the last years, scientific workflows have been established as an important abstraction that captures data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compelling case for a new generation of advances in high-performance computing, commonly termed as extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. The paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems. 相似文献

4.

Abstract,link, publish,exploit: An end to end framework for workflow sharing

《Future Generation Computer Systems》2017

Scientific workflows are increasingly used to manage and share scientific computations and methods to analyze data. A variety of systems have been developed that store the workflows executed and make them part of public repositories However, workflows are published in the idiosyncratic format of the workflow system used for the creation and execution of the workflows. Browsing, linking and using the stored workflows and their results often becomes a challenge for scientists who may only be familiar with one system. In this paper we present an approach for addressing this issue by publishing and exploiting workflows as data on the Web with a representation that is independent from the workflow system used to create them. In order to achieve our goal, we follow the Linked Data Principles to publish workflow inputs, intermediate results, outputs and codes; and we reuse and extend well established standards like W3C PROV. We illustrate our approach by publishing workflows and consuming them with different tools designed to address common scenarios for workflow exploitation. 相似文献

5.

Scientific workflows for computational reproducibility in the life sciences: Status,challenges and opportunities

《Future Generation Computer Systems》2017

With the development of new experimental technologies, biologists are faced with an avalanche of data to be computationally analyzed for scientific advancements and discoveries to emerge. Faced with the complexity of analysis pipelines, the large number of computational tools, and the enormous amount of data to manage, there is compelling evidence that many if not most scientific discoveries will not stand the test of time: increasing the reproducibility of computed results is of paramount importance.The objective we set out in this paper is to place scientific workflows in the context of reproducibility. To do so, we define several kinds of reproducibility that can be reached when scientific workflows are used to perform experiments. We characterize and define the criteria that need to be catered for by reproducibility-friendly scientific workflow systems, and use such criteria to place several representative and widely used workflow systems and companion tools within such a framework. We also discuss the remaining challenges posed by reproducible scientific workflows in the life sciences. Our study was guided by three use cases from the life science domain involving in silico experiments. 相似文献

6.

基于任务分配和数据集副本的科学工作流数据布局

尚蕾刘茜萍《计算机工程》2020,46(5):122-130,138

云环境下科学工作流的数据布局成为当前工作流研究中的一个热点问题,对科学工作流中任务和数据之间多对多关系进行分析,可以发现不同数据布局方案在数据传输上的费用各不相同,在很大程度上影响工作流的运行成本。为降低科学工作流数据集传输费用,提出一种基于任务分配和数据集副本的科学工作流数据布局方法。该方法从任务分配开始,在定量计算任务依赖度的基础上进行任务分配,根据分配结果给出一个基于数据集副本的两阶段数据布局方法,以实现科学工作流运行中传输费用的优化。实例结果表明,与工作流层方法相比,该方法可以有效降低科学工作流的运行成本。相似文献

7.

Campus Grids Meet Applications: Modeling, Metascheduling and Integration

Yonghong Yan Barbara M. Chapman 《Journal of Grid Computing》2006,4(2):159-175

Air Quality Forecasting (AQF) is a new discipline that attempts to reliably predict atmospheric pollution. An AQF application has complex workflows and in order to produce timely and reliable forecast results, each execution requires access to diverse and distributed computational and storage resources. Deploying AQF on Grids is one option to satisfy such needs, but requires the related Grid middleware to support automated workflow scheduling and execution on Grid resources. In this paper, we analyze the challenges in deploying an AQF application in a campus Grid environment and present our current efforts to develop a general solution for Grid-enabling scientific workflow applications in the GRACCE project. In GRACCE, an application’s workflow is described using GAMDL, a powerful dataflow language for describing application logic. The GRACCE metascheduling architecture provides the functionalities required for co-allocating Grid resources for workflow tasks, scheduling the workflows and monitoring their execution. By providing an integrated framework for modeling and metascheduling scientific workflow applications on Grid resources, we make it easy to build a customized environment with end-to-end support for application Grid deployment, from the management of an application and its dataset, to the automatic execution and analysis of its results.The work has been performed as part of the University of Houston’s Sun Microsystems Center of Excellence in Geosciences [38]. 相似文献

8.

Science in the Cloud: Allocation and Execution of Data-Intensive Scientific Workflows

Claudia Szabo Quan Z. Sheng Trent Kroeger Yihong Zhang Jian Yu 《Journal of Grid Computing》2014,12(2):245-264

An important challenge for the adoption of cloud computing in the scientific community remains the efficient allocation and execution of data-intensive scientific workflows to reduce execution time and the size of transferred data. The transferred data overhead is becoming significant with emerging scientific workflows that have input/output files and intermediate data products ranging in the hundreds of gigabytes. The allocation of scientific workflows on public clouds can be described through a variety of perspectives and parameters, and has been proved to be NP-complete. This paper proposes an evolutionary approach for task allocation on public clouds considering data transfer and execution time. In our framework, a solution is represented using an allocation chromosome that encodes the allocation of tasks to nodes, and an ordering chromosome that defines the execution order according to the scientific workflow representation. We propose a multi-objective optimization that relies on a cloud cost model and employs tailored evolution operators. Starting from a population of possible solutions, we employ crossover and mutation operators on both chromosomes aiming at optimizing the data transferred between nodes as well as the total workflow runtime. The crossover operators combine parts of solutions to reduce data overhead, whereas the mutation operators swamp between parts of the same chromosome according to pre-defined rules. Our experimental study compares between the proposed approach and current state-of-the art approaches using synthetic and real-life workflows. Our algorithm performs similarly to existing heuristics for small workflows and shows up to 80 % improvements for larger synthetic workflows. To further validate our approach we compare between the allocation and scheduling obtained by our approach with that obtained by popular scientific workflow managers, when real workflows with hundreds of tasks are executed on a public cloud. The results show a 10 % improvement in runtime over existing schedulers, caused by a 80 % reduction in transferred data and optimized allocation and ordering of tasks. This improved data locality has greater impact as it can be employed to improve and study data provenance and facilitate data persistence for scientific workflows. 相似文献

9.

Scientific workflows in data analysis: Bridging expertise across multiple domains

《Future Generation Computer Systems》2017

In this paper, we demonstrate the use of scientific workflows in bridging expertise across multiple domains by re-purposing workflow fragments in the areas of text analysis, image analysis, and analysis of activity in video. We highlight how the reuse of workflows allows scientists to link across disciplines and avail themselves of the benefits of inter-disciplinary research beyond their normal area of expertise. In addition, we present in-depth studies of various tasks, including tasks for text analysis, multimedia analysis involving both images and text, video activity analysis, and analysis of artistic style using deep learning. These tasks show how the re-use of workflow fragments can turn a pre-existing, rudimentary approach into an expert-grade analysis. We also examine how workflow fragments save time and effort while amalgamating expertise in multiple areas such as machine learning and computer vision. 相似文献

10.

Software architectures to integrate workflow engines in science gateways

《Future Generation Computer Systems》2017

Science gateways often rely on workflow engines to execute applications on distributed infrastructures. We investigate six software architectures commonly used to integrate workflow engines into science gateways. In tight integration, the workflow engine shares software components with the science gateway. In service invocation, the engine is isolated and invoked through a specific software interface. In task encapsulation, the engine is wrapped as a computing task executed on the infrastructure. In the pool model, the engine is bundled in an agent that connects to a central pool to fetch and execute workflows. In nested workflows, the engine is integrated as a child process of another engine. In workflow conversion, the engine is integrated through workflow language conversion. We describe and evaluate these architectures with metrics for assessment of integration complexity, robustness, extensibility, scalability and functionality. Tight integration and task encapsulation are the easiest to integrate and the most robust. Extensibility is equivalent in most architectures. The pool model is the most scalable one and meta-workflows are only available in nested workflows and workflow conversion. These results provide insights for science gateway architects and developers. 相似文献

11.

Static analysis of Taverna workflows to predict provenance patterns

《Future Generation Computer Systems》2017

相似文献

12.

Rule-based workflow management for bioinformatics

John S. Conery Julian M. Catchen Michael Lynch 《The VLDB Journal The International Journal on Very Large Data Bases》2005,14(3):318-329

相似文献

13.

Workflow analysis with communication norms

Hans Aldo 《Data & Knowledge Engineering》2003,47(3):349-369

The language/action perspective (LAP) as orginally introduced by Winograd and Flores has inspired several tools and information system design methodologies. The goal of this article is to make the communication norms underlying various LAP workflow loop models (DEMO, ActionWorkflow) explicit and to contrast them with the auditing norms of internal control. It appears that the communicative action paradigm embedded in DEMO and the customer satisfaction orientation of ActionWorkflow lead to norms which resemble the ones required by internal control, but there are some important differences. For that reason, we propose an extended workflow loop model that distinguishes between customer relations and agency relations. Whereas current LAP approaches do not take agency relations explicitly into account, the extended workflow loop model allows us to analyze the effects of delegation on communicative processes. A framework is offered for the normative analysis of workflows based on a number of formalized communication norms. 相似文献

14.

Deriving scientific workflows from algebraic experiment lines: A practical approach

《Future Generation Computer Systems》2017

The exploratory nature of a scientific computational experiment involves executing variations of the same workflow with different approaches, programs, and parameters. However, current approaches do not systematize the derivation process from the experiment definition to the concrete workflows and do not track the experiment provenance down to the workflow executions. Therefore, the composition, execution, and analysis for the entire experiment become a complex task. To address this issue, we propose the Algebraic Experiment Line (AEL). AEL uses a data-centric workflow algebra, which enriches the experiment representation by introducing a uniform data model and its corresponding operators. This representation and the AEL provenance model map concepts from the workflow execution data to the AEL derived workflows with their corresponding experiment abstract definitions. We show how AEL has improved the understanding of a real experiment in the bioinformatics area. By combining provenance data from the experiment and its corresponding executions, AEL provenance queries navigate from experiment concepts defined at high abstraction level to derived workflows and their execution data. It also shows a direct way of querying results from different trials involving activity variations and optionalities, only present at the experiment level of abstraction. 相似文献

15.

启发式规则与GA结合的优化方法求解工作流动态调度优化问题

肖志娇常会友衣杨《计算机科学》2007,34(2):157-160

调度是工作流管理系统的核心问题,是保证工作流正确运行的关键。在工作流环境下,动态调度要比静态调度更切合实际。本文在总结前人工作的基础上,提出了一系列工作流动态调度的启发式规则,并以最小化任务总拖期时间和最大化任务总提前时间为目标,建立了工作流动态调度问题模型。采用启发式规则与遗传算法相结合的优化方法求解工作流动态调度优化问题。仿真结果说明了优化方法的可行性和有效性,同时比较了该方法与多种静态调度方法,进而说明了该方法的优越性。相似文献

16.

WIRES: A Methodology for Developing Workflow Applications

Fabio Casati Maria Grazia Fugini Isabelle Mirbel Barbara Pernici 《Requirements Engineering》2002,7(2):73-106

Workflow management systems are becoming a relevant support for a large class of business applications, and many workflow models as well as commercial products are currently available. While the large availability of tools facilitates the development and the fulfilment of customer requirements, workflow application development still requires methodological guidelines that drive the developers in the complex task of rapidly producing effective applications. In fact, it is necessary to identify and model the business processes, to design the interfaces towards existing cooperating systems, and to manage implementation aspects in an integrated way. This paper presents the WIRES methodology for developing workflow applications under a uniform modelling paradigm – UML modelling tools with some extensions – that covers all the life cycle of these applications: from conceptual analysis to implementation. High-level analysis is performed under different perspectives, including a business and an organisational perspective. Distribution, interoperability and cooperation with external information systems are considered in this early stage. A set of “workflowability” criteria is provided in order to identify which candidate processes are suited to be implemented as workflows. Non-functional requirements receive particular emphasis in that they are among the most important criteria for deciding whether workflow technology can be actually useful for implementing the business process at hand. The design phase tackles aspects of concurrency and cooperation, distributed transactions and exception handling. Reuse of component workflows, available in a repository as workflow fragments, is a distinguishing feature of the method. Implementation aspects are presented in terms of rules that guide in the selection of a commercial workflow management system suitable for supporting the designed processes, coupled with guidelines for mapping the designed workflows onto the model offered by the selected system. 相似文献

17.

A method of workflow scheduling based on colored Petri nets 总被引：1，自引：0，他引：1

Zhijiao Xiao Author VitaeZhong MingAuthor Vitae 《Data & Knowledge Engineering》2011,70(2):230-247

Effective methods of workflow scheduling can improve the performance of workflow systems. Based on the study of existing scheduling methods, a method of workflow scheduling, called phased method, is proposed. This method is based on colored Petri nets. Activities of workflows are divided into several groups to be scheduled in different phases using this method. Details of the method are discussed. Experimental results show that the proposed method can deal with the uncertainties and the dynamic circumstances very well and a satisfactory balance can be achieved between static global optimization and dynamic local optimization. 相似文献

18.

Scripting distributed scientific workflows using Weaver

Peter Bui Li Yu Andrew Thrasher Rory Carmichael Irena Lanc Patrick Donnelly Douglas Thain 《Concurrency and Computation》2012,24(15):1685-1707

相似文献

19.

Using ontologies for verification and validation of workflow-based experiments

《Journal of Web Semantics》2017

相似文献

20.

Using imbalance metrics to optimize task clustering in scientific workflow executions

《Future Generation Computer Systems》2015

Scientific workflows can be composed of many fine computational granularity tasks. The runtime of these tasks may be shorter than the duration of system overheads, for example, when using multiple resources of a cloud infrastructure. Task clustering is a runtime optimization technique that merges multiple short running tasks into a single job such that the scheduling overhead is reduced and the overall runtime performance is improved. However, existing task clustering strategies only provide a coarse-grained approach that relies on an over-simplified workflow model. In this work, we examine the reasons that cause Runtime Imbalance and Dependency Imbalance in task clustering. Then, we propose quantitative metrics to evaluate the severity of the two imbalance problems. Furthermore, we propose a series of task balancing methods (horizontal and vertical) to address the load balance problem when performing task clustering for five widely used scientific workflows. Finally, we analyze the relationship between these metric values and the performance of proposed task balancing methods. A trace-based simulation shows that our methods can significantly decrease the runtime of workflow applications when compared to a baseline execution. We also compare the performance of our methods with two algorithms described in the literature. 相似文献