
1.

Scientific workflows for computational reproducibility in the life sciences: Status, challenges and opportunities

《Future Generation Computer Systems》2017

With the development of new experimental technologies, biologists are faced with an avalanche of data to be computationally analyzed for scientific advancements and discoveries to emerge. Faced with the complexity of analysis pipelines, the large number of computational tools, and the enormous amount of data to manage, there is compelling evidence that many if not most scientific discoveries will not stand the test of time: increasing the reproducibility of computed results is of paramount importance. The objective we set out in this paper is to place scientific workflows in the context of reproducibility. To do so, we define several kinds of reproducibility that can be reached when scientific workflows are used to perform experiments. We characterize and define the criteria that need to be catered for by reproducibility-friendly scientific workflow systems, and use such criteria to place several representative and widely used workflow systems and companion tools within such a framework. We also discuss the remaining challenges posed by reproducible scientific workflows in the life sciences. Our study was guided by three use cases from the life science domain involving in silico experiments.

2.

Modeling and analysis of transactional workflows Total citations: 20, self-citations: 0, citations by others: 20

Ding Ke, Jin Beihong, Feng Yulin 《Chinese Journal of Computers》2003,26(10):1304-1311

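A transactional workflow consists of a number of transactions, and its execution satisfies relaxed atomicity. Only a well-formed transactional workflow can guarantee that every execution satisfies relaxed atomicity. Transactions differ in whether they are compensable and whether they are retriable, and in complex transactional workflows that contain several kinds of control structure, composition mismatches between transactions can make the workflow ill-formed. This paper gives formal definitions of the transactional workflow model and of well-formedness, proposes a theorem for deciding well-formedness that verifies it efficiently through a constructive method, and also designs the transactional workflow description language ISWDL and implements a well-formedness verifier.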
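The well-formedness idea in this abstract can be illustrated for the simplest case, a purely sequential workflow. The sketch below is not the paper's theorem or its ISWDL verifier. It assumes a rule often cited in the flexible-transaction literature: a sequence can always either finish or be fully undone if it consists of compensable steps, then at most one pivot step (neither compensable nor retriable), then only retriable steps. The names `Kind` and `is_well_formed_sequence` are hypothetical.

```python
from enum import Enum


class Kind(Enum):
    COMPENSABLE = "compensable"  # effects can be undone after commit
    PIVOT = "pivot"              # neither compensable nor retriable
    RETRIABLE = "retriable"      # eventually commits if retried


def is_well_formed_sequence(kinds):
    """Return True if the sequence matches compensable* pivot? retriable*.

    Assumed rule (not taken from the paper): once a step that cannot be
    undone has committed, every later step must be retriable so that the
    workflow can always be driven forward to completion.
    """
    in_prefix = True  # still inside the compensable prefix
    for kind in kinds:
        if in_prefix:
            if kind is Kind.COMPENSABLE:
                continue
            in_prefix = False  # a pivot or the first retriable step ends the prefix
        elif kind is not Kind.RETRIABLE:
            return False       # a non-retriable step after the point of no return
    return True
```

Under this assumed rule, a compensable step placed after a retriable one is rejected: if that step fails, the earlier retriable step has already committed and cannot be undone.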

3.

A comparison of using Taverna and BPEL in building scientific workflows: the case of caGrid

Wei Tan Paolo Missier Ian Foster Ravi Madduri David De Roure Carole Goble 《Concurrency and Computation》2010,22(9):1098-1117

With the emergence of ‘service‐oriented science,’ the need arises to orchestrate multiple services to facilitate scientific investigation, that is, to create ‘science workflows.’ We present here our findings in providing a workflow solution for the caGrid service‐based grid infrastructure. We choose BPEL and Taverna as candidates, and compare their usability in the lifecycle of a scientific workflow, including workflow composition, execution, and result analysis. Our experience shows that BPEL as an imperative language offers a comprehensive set of modeling primitives for workflows of all flavors, whereas Taverna offers a dataflow model and a more compact set of primitives that facilitates dataflow modeling and pipelined execution. We hope that this comparison study not only helps researchers to select a language or tool that meets their specific needs, but also offers some insight into how a workflow language and tool can fulfill the requirements of the scientific community. Copyright © 2009 John Wiley & Sons, Ltd.

4.

A hybrid evolutionary algorithm for task scheduling and data assignment of data-intensive scientific workflows on clouds

《Future Generation Computer Systems》2017

A growing number of data- and compute-intensive experiments have been modeled as scientific workflows in the last decade. Meanwhile, clouds have emerged as a prominent environment to execute this type of workflow. In this scenario, the investigation of workflow scheduling strategies, aiming at reducing execution times, became a top priority and a very popular research field. However, few works consider the problem of data file assignment when solving the task scheduling problem. Usually, a workflow is represented by a graph where nodes represent tasks, and the scheduling problem consists in allocating tasks to machines to be executed at a predefined time, aiming at reducing the makespan of the whole workflow. In this article, we show that the scheduling of scientific workflows can be improved when both the task scheduling and the data file assignment problems are treated together. Thus, we propose a new workflow representation, where nodes of the workflow graph represent either tasks or data files, and define the Task Scheduling and Data Assignment Problem (TaSDAP), considering this new model. We formulated this problem as an integer programming problem. Moreover, a hybrid evolutionary algorithm for solving it, named HEA-TaSDAP, is also introduced. To evaluate our approach, we conducted two types of experiments: theoretical and practical ones. At first, we compared HEA-TaSDAP with the solutions produced by the mathematical formulation and by other works from related literature. Then, we considered real executions in the Amazon EC2 cloud using a real scientific workflow use case (SciPhy for phylogenetic analyses). In all experiments, HEA-TaSDAP outperformed the other classical approaches from the related literature, such as Min–Min and HEFT.
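The conventional formulation mentioned in this abstract (a task graph, an allocation of tasks to machines, and the makespan as the objective) can be sketched as follows. This is only an illustration of how a makespan is evaluated for one fixed allocation. It is not HEA-TaSDAP, and it ignores data files and transfer times altogether. The function name `makespan` and its parameters are hypothetical, and it assumes each machine runs its tasks one at a time in the topological order produced by the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def makespan(durations, predecessors, machine_of):
    """Finish time of the last task for a fixed task-to-machine allocation.

    durations:    task -> run time
    predecessors: task -> iterable of tasks that must finish first
    machine_of:   task -> machine the task is allocated to
    """
    machine_free = {}  # machine -> time at which it becomes idle
    finish = {}        # task -> finish time
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(predecessors).static_order():
        machine = machine_of[task]
        ready = max((finish[p] for p in predecessors.get(task, ())), default=0)
        start = max(ready, machine_free.get(machine, 0))
        finish[task] = start + durations[task]
        machine_free[machine] = finish[task]
    return max(finish.values(), default=0)


durations = {"a": 3, "b": 2, "c": 4, "d": 1}
predecessors = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
two_machines = {"a": "m1", "b": "m2", "c": "m1", "d": "m2"}
one_machine = {task: "m1" for task in durations}
print(makespan(durations, predecessors, two_machines))  # 8
print(makespan(durations, predecessors, one_machine))   # 10
```

A scheduler searches over allocations such as `two_machines` and `one_machine` for the one with the smallest value of this function.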

5.

A collaborative scheduling approach for service-driven scientific workflow execution

Wanchun Dou J. Leon Zhao Shaokun Fan 《Journal of Computer and System Sciences》2010,76(6):416-427

Scientific workflow execution often spans multiple self-managing administrative domains to obtain specific processing capabilities. Existing (global) analysis techniques tend to mandate every domain-specific application to unveil all private behaviors for scientific collaboration. In practice, it is infeasible for a domain-specific application to disclose its process details (as a private workflow fragment) for privacy or security reasons. Consequently, it is a challenging endeavor to coordinate a scientific workflow and its distributed domain-specific applications. To address this problem, we propose a collaborative scheduling approach that can deal with temporal dependencies between a scientific workflow and a private workflow fragment. Under this collaborative scheduling approach, a private workflow fragment can maintain temporal consistency with a scientific workflow in resource sharing and task enactment. An evaluation is also presented to demonstrate the proposed approach for coordinating multiple scientific workflow executions in a concurrent environment.

6.

Semantics-based grid workflow composition: technique and implementation

Qiao Hong, Cao Jian 《Computer Applications and Software》2008,25(6):3-5

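As a basic service of the grid environment, large-scale sharing and reuse of grid workflows is a core technology for turning the grid into an intelligent, integrated problem-solving environment. Workflow composition simplifies complex scientific application problems by reusing existing solutions, and semantics-based workflow composition reduces the complexity that arises when grid workflows can only be reused by someone who understands the details of their syntax. The paper first uses knowledge representation techniques to propose a semantic template for grid workflows built on the concept of a goal. It then introduces a prototype workflow management system that implements semantics-based workflow composition, and describes in detail how that composition is realized during workflow modeling.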

7.

Model-as-you-go: An Approach for an Advanced Infrastructure for Scientific Workflows

Mirko Sonntag Dimka Karastoyanova 《Journal of Grid Computing》2013,11(3):553-583

Most of the existing scientific workflow systems rely on proprietary concepts and workflow languages. We are convinced that the conventional workflow technology that has been established in business scenarios for years is also beneficial for scientists and scientific applications. We are therefore working on a scientific workflow system based on business workflow concepts and technologies. The system offers advanced flexibility features to scientists in order to support them in creating workflows in an explorative manner and to increase the robustness of scientific applications. We named the approach Model-as-you-go because it enables users to model and execute workflows in an iterative process that eventually results in a complete scientific workflow. In this paper, we present the main ingredients of Model-as-you-go, show how existing workflow concepts have to be extended in order to cover the requirements of scientists, discuss the application of the concepts to BPEL, and introduce the current prototype of the system.

8.

Formalization of Workflows and Correctness Issues in the Presence of Concurrency

İsmailcem Budak Arpinar Uğur Halici Sena Arpinar Asuman Doğaç 《Distributed and Parallel Databases》1999,7(2):199-248

In this paper, the main components of a workflow system that are relevant to correctness in the presence of concurrency are formalized based on set theory and graph theory. The formalization, which constitutes the theoretical basis of the correctness criterion provided, can be summarized as follows:
- Activities of a workflow are represented through a notation based on set theory to make it possible to formalize the conceptual grouping of activities.
- Control-flow is represented as a special graph based on this set definition, and it includes serial composition, parallel composition, conditional branching, and nesting of individual activities and conceptual activities themselves.
- Data-flow is represented as a directed acyclic graph in conformance with the control-flow graph.

The formalization of correctness of concurrently executing workflow instances is based on this framework by defining two categories of constraints on the workflow environment with which the workflow instances and their activities interact. These categories are:
- Basic constraints that specify the correct states of a workflow environment.
- Inter-activity constraints that define the semantic dependencies among activities, such as an activity requiring the validity of a constraint that is set or verified by a preceding activity.

A basic constraints graph and an inter-activity constraints graph, which are in conformance with the control-flow and data-flow graphs, are then defined to represent these constraints. These graphs are used in formalizing the intervals among activities where an inter-activity constraint should be maintained and the intervals where a basic constraint remains invalid.

A correctness criterion is defined for an interleaved execution of workflow instances using the constraints graphs. A concurrency control mechanism, namely the Constraint Based Concurrency Control technique, is developed based on the correctness criterion. The performance analysis shows the superiority of the proposed technique. Other possible approaches to the problem are also presented.

9.

A survey of provenance representation and query techniques for scientific workflows

Lin Chen, Luo Wanming, Yan Baoping 《Frontiers of Data and Computing》2015,6(6):18-32

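Provenance management is one of the core functions of a scientific workflow system. In the context of scientific workflows, provenance divides into workflow definition provenance and workflow execution provenance, which describe the metadata, process dependencies, and data evolution of the definition and execution phases respectively. This paper focuses on techniques for representing and querying definition and execution provenance, and explains solutions to problems specific to the scientific workflow domain, such as the "black box" problem, the problem of distinguishing dependencies, and fine-grained provenance. It also introduces some existing provenance systems for scientific workflows and offers an outlook on the future of provenance technology.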

10.

A Provenance-based Adaptive Scheduling Heuristic for Parallel Scientific Workflows in Clouds

Daniel de Oliveira Kary A. C. S. Ocaña Fernanda Baião Marta Mattoso 《Journal of Grid Computing》2012,10(3):521-552

In recent years, scientific workflows have emerged as a fundamental abstraction for structuring and executing scientific experiments in computational environments. Scientific workflows are becoming increasingly complex and more demanding in terms of computational resources, thus requiring the use of parallel techniques and high performance computing (HPC) environments. Meanwhile, clouds have emerged as a new paradigm where resources are virtualized and provided on demand. By using clouds, scientists have expanded beyond single parallel computers to hundreds or even thousands of virtual machines. Although the initial focus of clouds was to provide high throughput computing, clouds are already being used to provide an HPC environment where elastic resources can be instantiated on demand during the course of a scientific workflow. However, this model also raises many open, yet important, challenges such as scheduling workflow activities. Scheduling parallel scientific workflows in the cloud is a very complex task since we have to take into account many different criteria and explore the elasticity characteristic to optimize workflow execution. In this paper, we introduce an adaptive scheduling heuristic for parallel execution of scientific workflows in the cloud that is based on three criteria: total execution time (makespan), reliability and financial cost. Besides scheduling workflow activities based on a 3-objective cost model, this approach also scales resources up and down according to the restrictions imposed by scientists before workflow execution. This tuning is based on provenance data captured and queried at runtime. We conducted a thorough validation of our approach using a real bioinformatics workflow. The experiments were performed in SciCumulus, a cloud workflow engine for managing scientific workflow execution.
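One way to picture a cost model over the three criteria named in this abstract is a weighted sum. The abstract does not give the paper's formula, so the sketch below is purely an assumption: the function `pick_vm`, the weights, the normalization by the largest time and cost, and the sample figures are all hypothetical, and the sketch has no connection to the SciCumulus API. It assumes positive times and costs and a reliability between 0 and 1.

```python
def pick_vm(candidates, w_time=1 / 3, w_rel=1 / 3, w_cost=1 / 3):
    """Return the name of the candidate with the lowest weighted score.

    candidates maps a VM name to a dict with estimated "time",
    "reliability" (probability of success) and "cost" for one activity.
    Lower scores are better, so reliability enters as (1 - reliability).
    """
    max_time = max(c["time"] for c in candidates.values())
    max_cost = max(c["cost"] for c in candidates.values())

    def score(c):
        return (w_time * c["time"] / max_time
                + w_rel * (1.0 - c["reliability"])
                + w_cost * c["cost"] / max_cost)

    return min(candidates, key=lambda name: score(candidates[name]))


# Hypothetical estimates for a single workflow activity.
candidates = {
    "small": {"time": 100.0, "reliability": 0.99, "cost": 1.0},
    "large": {"time": 40.0, "reliability": 0.95, "cost": 3.0},
}
print(pick_vm(candidates))                                         # small
print(pick_vm(candidates, w_time=0.8, w_rel=0.1, w_cost=0.1))      # large
```

Changing the weights shifts the choice between the cheaper and the faster machine, which is the kind of trade-off a scientist's restrictions would steer.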

11.

Composing multiple variability artifacts to assemble coherent workflows

Mathieu Acher Philippe Collet Alban Gaignard Philippe Lahire Johan Montagnat Robert B. France 《Software Quality Journal》2012,20(3-4):689-734


12.

A reliable scheduling algorithm for cost-constrained scientific workflows in grid environments

Yan Chaokun, Hu Zhigang, Li Xi, Luo Huimin 《Journal of Chinese Computer Systems》2012,33(4):707-711

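Grid infrastructure is currently the main supporting environment for planning, deploying, and executing scientific workflow applications. However, because grid resources are autonomous, dynamic, and heterogeneous, scheduling scientific workflows effectively while meeting users' QoS constraints is an active research topic. To improve execution reliability for scientific workflow scheduling under a cost constraint, this paper uses a stochastic service model to describe the dynamic service capacity of resource nodes, takes into account the effect of local task load on resource execution performance, and gives a method for evaluating resource reliability. On this basis it proposes RSASW, a reliable scheduling algorithm for scientific workflows under a cost constraint. Simulation results show that, compared with the GAIN3, GreedyTime-CD, and PFAS algorithms, RSASW provides a good reliability guarantee for workflow execution.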

13.

An application integration framework for scientific workflows Total citations: 2, self-citations: 0, citations by others: 2


Wang Chunjie, Cao Jian 《Computer Engineering》2009,35(20):258-260

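Based on an analysis of the concept and characteristics of scientific workflows, and in response to the need for application integration in scientific workflows, this paper proposes a unified framework for application integration. A unified model is abstracted from a variety of applications, and the framework built on this model can integrate them effectively. A scientific workflow management system is implemented on the basis of the model, and an image processing application demonstrates the feasibility of the framework.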

14.

Zhangbing Zhou Zehui Cheng Yueqin Zhu 《Science China Information Sciences》2016,59(11):113101

This article proposes to identify and recommend scientific workflows for reuse and repurposing. Specifically, a scientific workflow is represented as a layer hierarchy that specifies the hierarchical relations between this workflow, its sub-workflows, and activities. Semantic similarity is calculated between layer hierarchies of workflows. A graph-skeleton based clustering technique is adopted for grouping layer hierarchies into clusters. Barycenters in each cluster are identified, which serve as core workflows in this cluster, for facilitating the cluster identification and workflow ranking and recommendation with respect to the requirement of scientists.

15.

A case-based dynamic scientific workflow model

Wen Yuanqiao, Yu Shengsheng 《Computer Science》2007,34(5):120-124

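To improve the ability of scientific workflows to handle uncertainty, this paper builds a tree-structured dynamic scientific workflow model. Combined with case-based reasoning, the model meets the dynamic requirements of scientific workflows and improves the adaptability of the scientific workflow management system. Reuse through case-based reasoning offers an effective way to address the low repeatability of scientific workflows and to achieve multi-level reuse, from a single computation step up to an entire process definition.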

16.

Research on workflow reconstruction techniques Total citations: 1, self-citations: 0, citations by others: 1

Tian Ke, Zhu Qingxin, Xiang Peisu 《Computer Science》2005,32(8):87-90

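Combining advanced workflow technology with traditional enterprise management information systems is increasingly an important means of improving enterprise informatization. Current workflows are model-driven, and defining a complete model is complex and time-consuming; moreover, the actual business process inevitably differs from the process model. This paper introduces the concepts of workflow nets and workflow logs, and proposes an algorithm that reconstructs the business process model from the information contained in the logs. The algorithm can also handle noise in the logs and effectively measure the difference between the process model and the actual business process.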
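The general idea of rebuilding a process model from a log, filtering noise, and measuring how far the model is from observed behavior can be sketched with a directly-follows relation. This is a generic process-mining illustration, not the algorithm from the paper, which works on workflow nets. The function names, the `min_count` noise threshold, and the sample traces are hypothetical.

```python
from collections import Counter


def directly_follows(traces, min_count=2):
    """Return the (a, b) pairs where b directly follows a at least min_count times.

    Pairs seen fewer than min_count times are treated as noise and dropped.
    """
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return {pair for pair, n in counts.items() if n >= min_count}


def mismatch(model_edges, traces):
    """Fraction of observed transitions that the model does not allow."""
    observed = [pair for trace in traces for pair in zip(trace, trace[1:])]
    if not observed:
        return 0.0
    return sum(pair not in model_edges for pair in observed) / len(observed)


# Each trace is the ordered list of activities recorded for one case.
traces = [["A", "B", "C"], ["A", "B", "C"], ["A", "C", "B"]]
model = directly_follows(traces)
print(sorted(model))            # [('A', 'B'), ('B', 'C')]
print(mismatch(model, traces))  # 2 of the 6 observed transitions are not in the model
```

The third trace occurs only once, so its transitions fall below the threshold and are left out of the model; the mismatch score then reports how much logged behavior the model fails to explain.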

17.

A Formal Approach to Support Interoperability in Scientific Meta-workflows

Junaid Arshad Gabor Terstyanszky Tamas Kiss Noam Weingarten Giuliano Taffoni 《Journal of Grid Computing》2016,14(4):655-671


18.

Implementation of a SharePoint-based workflow engine


Jin Feiteng, Zhao Zhengde, Zhang Dong, Lu Zhiguo 《Journal of Image and Graphics》2006,11(11):1552-1556

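A workflow management system can improve an enterprise's productivity, and the workflow engine is its core. To manage workflows more effectively, this paper proposes a new implementation scheme for a workflow engine based on Microsoft's SharePoint collaboration platform: processes are first defined with XML templates that embed C# logic code, and the engine's logical components are then implemented with an Agent service framework. The scheme lets users easily customize processes and extend interfaces to other existing systems, which keeps the workflow management system flexible and extensible.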

19.

Research on a PDM-based workflow management service framework

Yu Wanjun, Liu Dayou, Li Jiafei 《Journal of Chinese Computer Systems》2005,26(10):1863-1868

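Workflow management systems support business modeling and the collaborative execution of business processes within or between organizations. For data-flow handling during workflow process definition and workflow instance execution, this paper proposes a service framework that supports the operation of a workflow management system; the services in the framework are relatively independent yet cooperate with one another. The paper describes the application background and the significance of the services, gives a formal description of the design principles, uses interfaces to express the cooperation between services, and presents an extensible implementation pattern. An embedded workflow management system, KE/PDM-Workflow, was designed and implemented with the framework.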

20.

Virtual workflow system for distributed collaborative scientific applications on Grids

Lizhe Wang Dan Chen Fang Huang 《Computers & Electrical Engineering》2011,37(3):300-310

Grid computing has become an effective computing technique in recent years. This paper develops a virtual workflow system to construct distributed collaborative applications for Grid users. The virtual workflow system consists of three levels: an abstract workflow system, a translator, and a concrete workflow system. The research highlight of the implementation is that this workflow system is developed based on CORBA and Unicore Grid middleware. Furthermore, this implementation can support legacy applications developed with Parco and C++ code. This virtual workflow system provides an efficient GUI for users to organize distributed scientific collaborative applications and execute them on Grid resources. We present the design, implementation, and evaluation of this virtual workflow system in the paper.