期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Infrastructure Aware Scientific Workflows and Infrastructure Aware Workflow Managers in Science Gateways

Peter Kacsuk Gabor Kecskemeti Attila Kertesz Zsolt Nemeth József Kovács Zoltán Farkas 《Journal of Grid Computing》2016,14(4):641-654

The workflow interoperability problem was successfully solved by the SHIWA project if the workflows to be integrated were running in the same grid infrastructure. However, in the more generic case when the workflows were running in different infrastructures the problem has not been solved yet. In the current paper we show a solution for this problem by introducing a new type of workflow called infrastructure-aware workflow. These are scientific workflows extended with new node types that enable the on-the-fly creation and destruction of the required infrastructures in the clouds. The paper shows the semantics of these new types of nodes and workflows and also how they can solve the workflow interoperability problem. The paper also describes how these new type of workflows can be implemented by a new service called Occopus, and how this service can be integrated with the existing SHIWA Simulation Platform services like the WS-PGRADE/gUSE portal to provide the required functionalities of solving the workflow interoperability problem. 相似文献

2.

A Case Study into Using Common Real-Time Workflow Monitoring Infrastructure for Scientific Workflows

Karan Vahi Ian Harvey Taghrid Samak Daniel Gunter Kieran Evans David Rogers Ian Taylor Monte Goode Fabio Silva Eddie Al-Shakarchi Gaurang Mehta Ewa Deelman Andrew Jones 《Journal of Grid Computing》2013,11(3):381-406

Scientific workflow systems support various workflow representations, operational modes, and configurations. Regardless of the system used, end users have common needs: to track the status of their workflows in real time, be notified of execution anomalies and failures automatically, perform troubleshooting, and automate the analysis of the workflow results. In this paper, we describe how the Stampede monitoring infrastructure was integrated with the Pegasus Workflow Management System and the Triana Workflow Systems, in order to add generic real time monitoring and troubleshooting capabilities across both systems. Stampede is an infrastructure that provides interoperable monitoring using a three-layer model: (1) a common data model to describe workflow and job executions; (2) high-performance tools to load workflow logs conforming to the data model into a data store; and (3) a common query interface. This paper describes the integration of Stampede monitoring architecture with Pegasus and Triana and shows the new analysis capabilities that Stampede provides to these workflow systems. The successful integration of Stampede with these workflow engines demonstrates the generic nature of the Stampede monitoring infrastructure and its potential to provide a common platform for monitoring across scientific workflow engines. 相似文献

3.

A Dynamic Cloud Dimensioning Approach for Parallel Scientific Workflows: a Case Study in the Comparative Genomics Domain

Rafaelli Coutinho Yuri Frota Kary Ocaña Daniel de Oliveira Lúcia M. A. Drummond 《Journal of Grid Computing》2016,14(3):443-461

Usually, scientists need to execute experiments that demand high performance computing environments and parallel techniques. This is the scenario found in many bioinformatics experiments modeled as scientific workflows, such as phylogenetic and phylogenomic analyses. To execute these experiments, scientists have adopted virtual machines (VMs) instantiated in clouds. Estimating the number of VMs to instantiate is a crucial task to avoid negative impacts on the execution performance and on the financial costs with under or overestimations. Previously, the necessary number of VMs to execute bioinformatics workflows have been estimated by a GRASP heuristic and have been coupled to a Cloud-based Parallel Scientific Workflow Management System. Although this work was a step forward, this approach only provided a static dimensioning. If the characteristics of the environment change (processing capacity, network speed), this static dimensioning may not be suitable. In this way, it is of interest that the dimensioning is adjusted at runtime. To achieve this, we developed a novel framework for monitoring and dynamically dimensioning resources during the execution of parallel scientific workflows in clouds, called Dynamic Dimensioning of Cloud Computing Framework (DDC-F). We have evaluated DDC-F in real executions of bioinformatics workflows. Experiments showed that DDC-F is able to efficiently calculate the number of VMs necessary to execute bioinformatics workflows of Comparative Genomics (CG), also reducing the financial costs, when compared with other works of the related literature. 相似文献

4.

Supporting Scientists' Everyday Work: Automating Scientific Workflows 总被引：1，自引：0，他引：1

Vigder Mark Vinson Norman G. Singer Janice Stewart Darlene Mews Keith 《Software, IEEE》2008,25(4):52-58

An action research project involving scientists from the National Research Council Canada and the Institute for Ocean Technology analyzed difficulties in using software to collect data and manage processes. The project identified three requirements for increasing research productivity: ease of use for end users, managing scientific workflows, and facilitating software interoperability. On the basis of these requirements, the researchers developed Sweet, a software framework, to help automate scientific workflows. 相似文献

5.

Examining the Challenges of Scientific Workflows 总被引：3，自引：0，他引：3

Gil Y. Deelman E. Ellisman M. Fahringer T. Fox G. Gannon D. Goble C. Livny M. Moreau L. Myers J. 《Computer》2007,40(12):24-32

Workflows have emerged as a paradigm for representing and managing complex distributed computations and are used to accelerate the pace of scientific progress. A recent National Science Foundation workshop brought together domain, computer, and social scientists to discuss requirements of future scientific applications and the challenges they present to current workflow technologies. 相似文献

6.

Distributed Throughput Optimization for Large-Scale Scientific Workflows Under Fault-Tolerance Constraint

Yi Gu Chase Qishi Wu Xin Liu Dantong Yu 《Journal of Grid Computing》2013,11(3):361-379

With the advent of next-generation scientific applications, the workflow approach that integrates various computing and networking technologies has provided a viable solution to managing and optimizing large-scale distributed data transfer, processing, and analysis. This paper investigates a problem of mapping distributed scientific workflows for maximum throughput in faulty networks where nodes and links are subject to probabilistic failures. We formulate this problem as a bi-objective optimization problem to maximize both throughput and reliability. By adapting and modifying a centralized fault-free workflow mapping scheme, we propose a new mapping algorithm to achieve high throughput for smooth data flow in a distributed manner while satisfying a pre-specified bound of the overall failure rate for a guaranteed level of reliability. The performance superiority of the proposed solution is illustrated by both extensive simulation-based comparisons with existing algorithms and experimental results from a real-life scientific workflow deployed in wide-area networks. 相似文献

7.

支持虚拟组织的语义基础设施的动态构建方法研究 总被引：3，自引：0，他引：3

陈旺虎刘晨李厚福王建武《计算机学报》2006,29(7):1127-1136

提出一种从虚拟组织自治域的资源描述中抽取语义,然后聚合为虚拟组织的语义基础设施的方法．该方法引入了一种领域知识学习算法,用以建立当前语境相关的词法空间,以提高语义抽取和聚合的准确性及自动化程度,并且在语义聚合的过程中隐含了虚拟组织语义到自治域语义的映射,更好的支持了虚拟组织应用的构建和跨自治域资源的透明访问．实验表明,该方法能够适应虚拟组织的动态开放环境、有效支持虚拟组织的语义基础设施构建．相似文献

8.

Scientific Software as Workflows: From Discovery to Distribution 总被引：1，自引：0，他引：1

Woollard David Medvidovic Nenad Gil Yolanda Mattmann Chris A. 《Software, IEEE》2008,25(4):37-43

相似文献

9.

多云工作流优化传输费用的数据布局策略

马飞《数字社区&智能家居》2014,(4):2418-2420

科学工作流应用是一种复杂且数据密集型的应用,常应用于结构生物学、高能物理学和神经学等涉及分布式数据源的学科。数据分散存储在基于互联网的云计算平台上,致使科学工作流在执行时伴随着大量的数据传输。云计算是一种按使用量付费的模式,数据传输产生传输费用,尤其在多个工作流相互协同的情况下,将产生更高的传输成本。该文从全局的角度建立基于多工作流数据依赖图的传输成本模型,研究基于二进制粒子群算法（BPSO）的数据布局优化策略,从而减少对云计算传输资源的租赁费用。相似文献

10.

Composing RESTful Services and Collaborative Workflows: A Lightweight Approach

Rosenberg Florian Curbera Francisco Duftler Matthew J. Khalaf Rania 《Internet Computing, IEEE》2008,12(5):24-31

The use of RESTful Web services has gained momentum in the development of distributed applications based on traditional Web standards such as HTTP. In particular, these services can integrate easily into various applications, such as mashups. Composing RESTful services into Web-scale workflows requires a lightweight composition language that's capable of describing both the control and data flow that constitute a workflow. The authors address these issues with Bite, a lightweight and extensible composition language that enables the creation of Web-scale workflows and uses RESTful services as its main composable entities. 相似文献

11.

Verifying Workflows with Cancellation Regions and OR-joins: An Approach Based on Relaxed Soundness and Invariants 总被引：1，自引：0，他引：1

Verbeek H.M.W.; van der Aalst W.M.P.; ter Hofstede A.H.M. 《Computer Journal》2007,50(3):294-314

相似文献

12.

Data-Oriented Scheduling with Dynamic-Clustering Fault-Tolerant Technique for Scientific Workflows in Clouds

Ahmad Z. Jehangiri A. I. Iftikhar M. Umer A. I. Afzal I. 《Programming and Computer Software》2019,45(8):506-516

Programming and Computer Software - Cloud computing is one of the most prominent parallel and distributed computing paradigm. It is used for providing solution to a huge number of scientific and... 相似文献

13.

Overhead Analysis of Scientific Workflows in Grid Environments

Prodan R. Fahringer T. 《Parallel and Distributed Systems, IEEE Transactions on》2008,19(3):378-393

Scientific workflows are a topic of great interest in the grid community that sees in the workflow model an attractive paradigm for programming distributed wide-area grid infrastructures. Traditionally, the grid workflow execution is approached as a pure best effort scheduling problem that maps the activities onto the grid processors based on appropriate optimization or local matchmaking heuristics such that the overall execution time is minimized. Even though such heuristics often deliver effective results, the execution in dynamic and unpredictable grid environments is prone to severe performance losses that must be understood for minimizing the completion time or for the efficient use of high-performance resources. In this paper, we propose a new systematic approach to help the scientists and middleware developers understand the most severe sources of performance losses that occur when executing scientific workflows in dynamic grid environments. We introduce an ideal model for the lowest execution time that can be achieved by a workflow and explain the difference to the real measured grid execution time based on a hierarchy of performance overheads for grid computing. We describe how to systematically measure and compute the overheads from individual activities to larger workflow regions and adjust well-known parallel processing metrics to the scope of grid computing, including speedup and efficiency. We present a distributed online tool for computing and analyzing the performance overheads in real time based on event correlation techniques and introduce several performance contracts as quality-of-service parameters to be enforced during the workflow execution beyond traditional best effort practices. We illustrate our method through postmortem and online performance analysis of two real-world workflow applications executed in the Austrian grid environment. 相似文献

14.

A Provenance-based Adaptive Scheduling Heuristic for Parallel Scientific Workflows in Clouds

Daniel de Oliveira Kary A. C. S. Oca?a Fernanda Bai?o Marta Mattoso 《Journal of Grid Computing》2012,10(3):521-552

In the last years, scientific workflows have emerged as a fundamental abstraction for structuring and executing scientific experiments in computational environments. Scientific workflows are becoming increasingly complex and more demanding in terms of computational resources, thus requiring the usage of parallel techniques and high performance computing (HPC) environments. Meanwhile, clouds have emerged as a new paradigm where resources are virtualized and provided on demand. By using clouds, scientists have expanded beyond single parallel computers to hundreds or even thousands of virtual machines. Although the initial focus of clouds was to provide high throughput computing, clouds are already being used to provide an HPC environment where elastic resources can be instantiated on demand during the course of a scientific workflow. However, this model also raises many open, yet important, challenges such as scheduling workflow activities. Scheduling parallel scientific workflows in the cloud is a very complex task since we have to take into account many different criteria and to explore the elasticity characteristic for optimizing workflow execution. In this paper, we introduce an adaptive scheduling heuristic for parallel execution of scientific workflows in the cloud that is based on three criteria: total execution time (makespan), reliability and financial cost. Besides scheduling workflow activities based on a 3-objective cost model, this approach also scales resources up and down according to the restrictions imposed by scientists before workflow execution. This tuning is based on provenance data captured and queried at runtime. We conducted a thorough validation of our approach using a real bioinformatics workflow. The experiments were performed in SciCumulus, a cloud workflow engine for managing scientific workflow execution. 相似文献

15.

An Evaluation of the Cost and Performance of Scientific Workflows on Amazon EC2 总被引：1，自引：0，他引：1

Gideon Juve Ewa Deelman G. Bruce Berriman Benjamin P. Berman Philip Maechling 《Journal of Grid Computing》2012,10(1):5-21

Workflows are used to orchestrate data-intensive applications in many different scientific domains. Workflow applications typically communicate data between processing steps using intermediate files. When tasks are distributed, these files are either transferred from one computational node to another, or accessed through a shared storage system. As a result, the efficient management of data is a key factor in achieving good performance for workflow applications in distributed environments. In this paper we investigate some of the ways in which data can be managed for workflows in the cloud. We ran experiments using three typical workflow applications on Amazon’s EC2 cloud computing platform. We discuss the various storage and file systems we used, describe the issues and problems we encountered deploying them on EC2, and analyze the resulting performance and cost of the workflows. 相似文献

16.

Negotiation-Based Scheduling of Scientific Grid Workflows Through Advance Reservations

Radu Prodan Marek Wieczorek 《Journal of Grid Computing》2010,8(4):493-510

In its broadest sense, scheduling of Grid applications can be viewed as a negotiation process between a scheduling service optimising user-centric objectives such as execution time, and a resource manager optimising provider-centric metrics such as resource utilisation or fairness. In this paper we enhance an existing list scheduling algorithm designed for minimising the workflow makespan with advance reservation-based negotiation functionality. As an instantiation of the new negotiation phase, we investigate two advance reservation functionality from the resource provider perspective: attentive and progressive. We illustrate through real-world experiments a two-fold benefit of our approach: improved execution predictability from the user’s perspective, and higher resource utilisation fairness through a new progressive allocation strategy from the provider’s perspective. 相似文献

17.

Fine-Grain Interoperability of Scientific Workflows in Distributed Computing Infrastructures

Kassian Plankensteiner Radu Prodan Matthias Janetschek Thomas Fahringer Johan Montagnat David Rogers Ian Harvey Ian Taylor Ákos Balaskó Péter Kacsuk 《Journal of Grid Computing》2013,11(3):429-455

Today there exist a wide variety of scientific workflow management systems, each designed to fulfill the needs of a certain scientific community. Unfortunately, once a workflow application has been designed in one particular system it becomes very hard to share it with users working with different systems. Portability of workflows and interoperability between current systems barely exists. In this work, we present the fine-grained interoperability solution proposed in the SHIWA European project that brings together four representative European workflow systems: ASKALON, MOTEUR, WS-PGRADE, and Triana. The proposed interoperability is realised at two levels of abstraction: abstract and concrete. At the abstract level, we propose a generic Interoperable Workflow Intermediate Representation (IWIR) that can be used as a common bridge for translating workflows between different languages independent of the underlying distributed computing infrastructure. At the concrete level, we propose a bundling technique that aggregates the abstract IWIR representation and concrete task representations to enable workflow instantiation, execution and scheduling. We illustrate case studies using two real-workflow applications designed in a native environment and then translated and executed by a foreign workflow system in a foreign distributed computing infrastructure. 相似文献

18.

一个基于元数据导航的服务工作流装配模型 总被引：3，自引：0，他引：3

王月龙王文俊罗英伟汪小林许卓群《计算机学报》2006,29(7):1105-1115

以城市应急处置业务作为应用背景,提出了一个工作流的分层概念模型和一个与之关联的元数据分层描述规范,在此基础上构建了一个基于元数据导航的、从高层的业务应用到底层的分布、动态资源的逐层绑定的运行机制,基于该机制实现了一个城市应急联动系统IERS(Integrated Emergency Response System)的实验原型．该机制解决了从应急处置业务工作流到底层分布服务和资源的装配问题,增强了应急系统对动态、分布服务环境的支持力度,提高了应急处置业务在执行过程中的自动化水平和自适应性,同时也分解和简化了工作流问题的复杂度．相似文献

19.

Science in the Cloud: Allocation and Execution of Data-Intensive Scientific Workflows

Claudia Szabo Quan Z. Sheng Trent Kroeger Yihong Zhang Jian Yu 《Journal of Grid Computing》2014,12(2):245-264

An important challenge for the adoption of cloud computing in the scientific community remains the efficient allocation and execution of data-intensive scientific workflows to reduce execution time and the size of transferred data. The transferred data overhead is becoming significant with emerging scientific workflows that have input/output files and intermediate data products ranging in the hundreds of gigabytes. The allocation of scientific workflows on public clouds can be described through a variety of perspectives and parameters, and has been proved to be NP-complete. This paper proposes an evolutionary approach for task allocation on public clouds considering data transfer and execution time. In our framework, a solution is represented using an allocation chromosome that encodes the allocation of tasks to nodes, and an ordering chromosome that defines the execution order according to the scientific workflow representation. We propose a multi-objective optimization that relies on a cloud cost model and employs tailored evolution operators. Starting from a population of possible solutions, we employ crossover and mutation operators on both chromosomes aiming at optimizing the data transferred between nodes as well as the total workflow runtime. The crossover operators combine parts of solutions to reduce data overhead, whereas the mutation operators swamp between parts of the same chromosome according to pre-defined rules. Our experimental study compares between the proposed approach and current state-of-the art approaches using synthetic and real-life workflows. Our algorithm performs similarly to existing heuristics for small workflows and shows up to 80 % improvements for larger synthetic workflows. To further validate our approach we compare between the allocation and scheduling obtained by our approach with that obtained by popular scientific workflow managers, when real workflows with hundreds of tasks are executed on a public cloud. The results show a 10 % improvement in runtime over existing schedulers, caused by a 80 % reduction in transferred data and optimized allocation and ordering of tasks. This improved data locality has greater impact as it can be employed to improve and study data provenance and facilitate data persistence for scientific workflows. 相似文献

20.

Workflows for Heliophysics

Anja Le Blanc John Brooke Donal Fellows Marco Soldati David Pérez-Suárez Alessandro Marassi Andrej Santin 《Journal of Grid Computing》2013,11(3):481-503

In this paper we describe how we have introduced workflows into the working practices of a community for whom the concept of workflows is very new, namely the heliophysics community. Heliophysics is a branch of astrophysics which studies the Sun and the interactions between the Sun and the planets, by tracking solar events as they travel throughout the Solar system. Heliophysics produces two major challenges for workflow technology. Firstly it is a systems science where research is currently developed by many different communities who need reliable data models and metadata to be able to work together. Thus it has major challenges in the semantics of workflows. Secondly, the problem of time is critical in heliophysics; the workflows must take account of the propagation of events outwards from the sun. They have to address the four dimensional nature of space and time in terms of the indexing of data. We discuss how we have built an environment for Heliophysics workflows building on and extending the Taverna workflow system and utilising the myExperiment site for sharing workflows. We also describe how we have integrated the workflows into the existing practices of the communities involved in Heliophysics by developing a web portal which can hide the technical details from the users, who can concentrate on the data from their scientific point of view rather than on the methods used to integrate and process the data. This work has been developed in the EU Framework 7 project HELIO, and is being disseminated to the worldwide Heliophysics community, since Heliophysics requires integration of effort on a global scale. 相似文献