
1.

Mapping Abstract Complex Workflows onto Grid Environments (total citations: 18, self-citations: 0, citations by others: 18)

Ewa Deelman James Blythe Yolanda Gil Carl Kesselman Gaurang Mehta Karan Vahi Kent Blackburn Albert Lazzarini Adam Arbree Richard Cavanaugh Scott Koranda 《Journal of Grid Computing》2003,1(1):25-39

In this paper we address the problem of automatically generating job workflows for the Grid. These workflows describe the execution of a complex application built from individual application components. In our work we have developed two workflow generators: the first (the Concrete Workflow Generator, CWG) maps an abstract workflow defined in terms of application-level components to the set of available Grid resources. The second generator (Abstract and Concrete Workflow Generator, ACWG) takes a wider perspective and not only performs the abstract to concrete mapping but also enables the construction of the abstract workflow based on the available components. This system operates in the application domain and chooses application components based on the application metadata attributes. We describe our current ACWG based on AI planning technologies and outline how these technologies can play a crucial role in developing complex application workflows in Grid environments. Although our work is preliminary, CWG has already been used to map high energy physics applications onto the Grid. In one particular experiment, a set of production runs lasted 7 days and resulted in the generation of 167,500 events by 678 jobs. Additionally, ACWG was used to map gravitational physics workflows with hundreds of nodes onto the available resources, resulting in 975 tasks, 1365 data transfers and 975 output files produced.
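The abstract-to-concrete mapping described in this abstract can be pictured with a small sketch. It is not the CWG or ACWG implementation: the function `concretize`, the two catalogue dictionaries, and the site-selection rule (prefer the site that already holds the most inputs) are all assumptions made for illustration. It only shows how run steps and data-transfer steps can both appear in a concrete workflow.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical shapes: an abstract job is (job_id, component, inputs, outputs).
AbstractJob = Tuple[str, str, Sequence[str], Sequence[str]]


def concretize(
    abstract_jobs: List[AbstractJob],
    component_sites: Dict[str, List[str]],
    file_sites: Dict[str, List[str]],
) -> List[tuple]:
    """Turn abstract jobs (given in execution order) into concrete steps.

    Each step is ("transfer", file, source_site, target_site) or
    ("run", job_id, site). Both catalogues are illustrative stand-ins.
    """
    locations: Dict[str, Set[str]] = {f: set(s) for f, s in file_sites.items()}
    plan: List[tuple] = []
    for job_id, component, inputs, outputs in abstract_jobs:
        candidates = sorted(component_sites.get(component, ()))
        if not candidates:
            raise LookupError(f"no site offers component {component!r}")
        # Assumed rule: pick the site that already holds the most inputs;
        # ties go to the alphabetically first site.
        site = max(
            candidates,
            key=lambda s: sum(s in locations.get(f, ()) for f in inputs),
        )
        for f in inputs:
            if not locations.get(f):
                raise LookupError(f"no replica of input {f!r} is known")
            if site not in locations[f]:
                plan.append(("transfer", f, min(locations[f]), site))
                locations[f].add(site)
        plan.append(("run", job_id, site))
        # Outputs become available at the execution site.
        for f in outputs:
            locations.setdefault(f, set()).add(site)
    return plan
```

A real mapper would also weigh resource load, policy and transfer cost; the sketch ignores all of these.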

2.

Campus Grids Meet Applications: Modeling, Metascheduling and Integration

Yonghong Yan Barbara M. Chapman 《Journal of Grid Computing》2006,4(2):159-175

Air Quality Forecasting (AQF) is a new discipline that attempts to reliably predict atmospheric pollution. An AQF application has complex workflows and in order to produce timely and reliable forecast results, each execution requires access to diverse and distributed computational and storage resources. Deploying AQF on Grids is one option to satisfy such needs, but requires the related Grid middleware to support automated workflow scheduling and execution on Grid resources. In this paper, we analyze the challenges in deploying an AQF application in a campus Grid environment and present our current efforts to develop a general solution for Grid-enabling scientific workflow applications in the GRACCE project. In GRACCE, an application’s workflow is described using GAMDL, a powerful dataflow language for describing application logic. The GRACCE metascheduling architecture provides the functionalities required for co-allocating Grid resources for workflow tasks, scheduling the workflows and monitoring their execution. By providing an integrated framework for modeling and metascheduling scientific workflow applications on Grid resources, we make it easy to build a customized environment with end-to-end support for application Grid deployment, from the management of an application and its dataset, to the automatic execution and analysis of its results. The work has been performed as part of the University of Houston’s Sun Microsystems Center of Excellence in Geosciences [38].

3.

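Research and Application of a Time Control Model in Workflows

熊天虹 张祖平 龙军 《计算机系统应用》 (Computer Systems & Applications) 2010,19(11):239-241

Modern enterprises place great weight on completing work tasks on time, yet many workflow solutions in practical use, including widely used domestic and international open-source workflow engines, suffer from uneven task allocation and from tasks that easily time out. This paper proposes introducing a time control model into the workflow engine: the model's task-time measurement platform gauges the time that tasks consume, and this measurement is used to allocate work tasks sensibly and avoid task timeouts. In practical use the task timeout rate dropped markedly, which indicates that the model is effective; it also offers a reference for the architectural design and practical application of enterprise-level workflow engine models.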

4.

A Taxonomy of Workflow Management Systems for Grid Computing (total citations: 12, self-citations: 0, citations by others: 12)

Jia Yu Rajkumar Buyya 《Journal of Grid Computing》2005,3(3-4):171-200

With the advent of Grid and application technologies, scientists and engineers are building more and more complex applications to manage and process large data sets, and execute scientific experiments on distributed resources. Such application scenarios require means for composing and executing complex workflows. Therefore, many efforts have been made towards the development of workflow management systems for Grid computing. In this paper, we propose a taxonomy that characterizes and classifies various approaches for building and executing workflows on Grids. We also survey several representative Grid workflow systems developed by various projects world-wide to demonstrate the comprehensiveness of the taxonomy. The taxonomy not only highlights the design and engineering similarities and differences of state-of-the-art Grid workflow systems, but also identifies the areas that need further research.

5.

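A Reliable Scheduling Algorithm for Cost-Constrained Scientific Workflows in Grid Environments

阎朝坤 胡志刚 李玺 罗慧敏 《小型微型计算机系统》 (Journal of Chinese Computer Systems) 2012,33(4):707-711

Grid infrastructure is currently the main supporting environment for planning, deploying and executing scientific workflow applications. However, because Grid resources are autonomous, dynamic and heterogeneous, how to schedule scientific workflows effectively while guaranteeing users' QoS constraints is an active research topic. For the problem of scheduling scientific workflows under a cost constraint, and in order to improve the reliability of their execution, this paper uses a stochastic service model to describe the dynamic service capability of resource nodes, takes into account the effect of local task load on resource execution performance, and gives a method for evaluating resource reliability. On this basis it proposes RSASW, a reliable scheduling algorithm for scientific workflows under a cost constraint. Simulation results show that, compared with the GAIN3, GreedyTime-CD and PFAS algorithms, RSASW provides a good reliability guarantee for workflow execution.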

6.

Using imbalance metrics to optimize task clustering in scientific workflow executions

《Future Generation Computer Systems》2015

Scientific workflows can be composed of many tasks of fine computational granularity. The runtime of these tasks may be shorter than the duration of system overheads, for example, when using multiple resources of a cloud infrastructure. Task clustering is a runtime optimization technique that merges multiple short running tasks into a single job such that the scheduling overhead is reduced and the overall runtime performance is improved. However, existing task clustering strategies only provide a coarse-grained approach that relies on an over-simplified workflow model. In this work, we examine the reasons that cause Runtime Imbalance and Dependency Imbalance in task clustering. Then, we propose quantitative metrics to evaluate the severity of the two imbalance problems. Furthermore, we propose a series of task balancing methods (horizontal and vertical) to address the load balance problem when performing task clustering for five widely used scientific workflows. Finally, we analyze the relationship between these metric values and the performance of proposed task balancing methods. A trace-based simulation shows that our methods can significantly decrease the runtime of workflow applications when compared to a baseline execution. We also compare the performance of our methods with two algorithms described in the literature.
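To make the idea of runtime-balanced horizontal clustering concrete, here is a minimal sketch. It is not one of the balancing methods evaluated in the paper: it swaps in a generic greedy longest-task-first heuristic, and the names `cluster_level` and `job_runtimes` are hypothetical. It merges the tasks of one workflow level into a fixed number of jobs so that the jobs' total runtimes stay close to each other.

```python
import heapq
from typing import Dict, List, Tuple


def cluster_level(runtimes: Dict[str, float], num_jobs: int) -> List[List[str]]:
    """Merge the tasks of one workflow level into at most num_jobs jobs.

    Assumed heuristic (illustration only): visit tasks from longest to
    shortest and give each one to the job with the smallest load so far.
    """
    if num_jobs < 1:
        raise ValueError("num_jobs must be at least 1")
    # Heap entries are (accumulated runtime, job index); ties break on index.
    heap: List[Tuple[float, int]] = [(0.0, i) for i in range(num_jobs)]
    jobs: List[List[str]] = [[] for _ in range(num_jobs)]
    # Sort by descending runtime, then by name, so the result is deterministic.
    for task, runtime in sorted(runtimes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        jobs[idx].append(task)
        heapq.heappush(heap, (load + runtime, idx))
    return [job for job in jobs if job]


def job_runtimes(jobs: List[List[str]], runtimes: Dict[str, float]) -> List[float]:
    """Total runtime of each clustered job."""
    return [sum(runtimes[t] for t in job) for job in jobs]
```

The spread between the largest and smallest value returned by `job_runtimes` gives a rough feel for runtime imbalance; the paper's own metrics are defined differently and are not reproduced here.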

7.

A semantic framework for automatic generation of computational workflows using distributed data and component catalogues

Yolanda Gil Pedro A. González-Calero Joshua Moody Varun Ratnakar 《Journal of Experimental & Theoretical Artificial Intelligence》2013,25(4):389-467

Computational workflows are a powerful paradigm to represent and manage complex applications, particularly in large-scale distributed scientific data analysis. Workflows represent application components that result in individual computations as well as their interdependences in terms of dataflow. Workflow systems use these representations to manage various aspects of workflow creation and execution for users, such as the automatic assignment of execution resources. This article describes an approach to automating a new aspect of the process: the selection of application components and data sources. We present a novel approach that enables users to specify varying degrees of detail and amount of constraints in a workflow request, including the specification of constraints on input, intermediate or output data in the workflow, abstract workflow component classes rather than specific component implementations, and generic reusable workflow templates that express a pre-defined combination of components. The algorithm elaborates the user request into a set of fully ground workflows with specific choices of data sources and codes to be used so that they can be submitted for mapping and execution. The algorithm searches through the space of possible candidate workflows by creating increasingly more specialized versions of the original template and eliminating candidates that violate constraints accumulated in the candidate workflow as components and data sources are selected. A novel feature of our approach is that it assumes a distributed architecture where data and component catalogues are separate from the workflow system. The algorithm explicitly poses queries to external catalogues, and therefore any reasoning regarding data or component properties is not assumed to occur within the workflow system. We describe our implementation of this approach in the Wings workflow system. This implementation uses the W3C Web Ontology Language and associated reasoners to implement the workflow system as well as the data and component catalogues. This research demonstrates the use of artificial intelligence techniques to support the kinds of automation envisioned by the scientific community for large-scale distributed scientific data analysis.

8.

Workflow Concepts of the Java CoG Kit (total citations: 1, self-citations: 0, citations by others: 1)

Gregor von Laszewski Mike Hategan 《Journal of Grid Computing》2005,3(3-4):239-258

Many scientific simulations and experiments require the coordination of numerous tasks posed by interdisciplinary research teams. Grids can provide access to the necessary high-end resources to conduct such tasks. The complex tasks and their interactions must be supported through convenient tools. To address this issue, we introduce a number of Grid abstractions that make the development of Grid middleware-independent tools possible and allow for the integration of a number of commodity tools. Our vision is implemented through an integrated approach based on a layered architecture that bridges the gap between Grid middleware and scientific applications. Our abstractions include specialized services, a Grid workflow engine and language, and Gridfaces – graphical abstractions that can be employed in science portals and standalone applications.

9.

Grid Service Orchestration Using the Business Process Execution Language (BPEL) (total citations: 6, self-citations: 0, citations by others: 6)

Wolfgang Emmerich Ben Butchart Liang Chen Bruno Wassermann Sarah L. Price 《Journal of Grid Computing》2005,3(3-4):283-304

Modern scientific applications often need to be distributed across Grids. Increasingly, applications rely on services, such as job submission, data transfer or data portal services. We refer to such services as Grid services. While the invocation of Grid services could be hard coded in theory, scientific users want to orchestrate service invocations more flexibly. In enterprise applications, the orchestration of web services is achieved using emerging orchestration standards, most notably the Business Process Execution Language (BPEL). We describe our experience in orchestrating scientific workflows using BPEL. We have gained this experience during an extensive case study that orchestrates Grid services for the automation of a polymorph prediction application. Using this example, we explain the extent to which the BPEL language supports the definition of scientific workflows. We then describe the reliability, performance and scalability that can be achieved by executing a complex scientific workflow with ActiveBPEL, an industrial strength but freely available BPEL engine. The work has been funded by the UK EPSRC through grants GR/R97207/01 (e-Materials) and GR/S90843/01 (OMII Managed Programme).

10.

Multi-Grid, Multi-User Workflows in the P-GRADE Grid Portal

Péter Kacsuk Gergely Sipos 《Journal of Grid Computing》2005,3(3-4):221-238

Computational Grids connect resources and users in a complex way in order to deliver nontrivial qualities of services. According to the current trend, various communities build their own Grids, and due to the lack of generally accepted standards these Grids are usually not interoperable. As a result, large scale sharing of resources is prevented by the isolation of Grid systems. Similarly, people are isolated, because the collaborative work of Grid users is not supported by current environments. Each user accesses Grids as an individual person without having the possibility of organizing teams that could overcome the difficulties of application development and execution more easily. The paper describes a new workflow-oriented portal concept that solves both problems. It enables the interoperability of various Grids during the execution of workflow applications, and supports users in developing and running their Grid workflows in a collaborative way. The paper also introduces a classification model that can be used to identify workflow-oriented Grid portals based on two general features: ability to access multiple Grids, and support for collaborative problem solving. Using the approach, the different potential portal types are introduced, their unique features are discussed, and current portals and Problem Solving Environments (PSE) are classified. The P-GRADE Portal as a Globus-based implementation for the classification model is also presented. The work described in this paper is supported by the Hungarian Grid project (IHM 4671/1/2003), by the Hungarian OTKA project (No. T042459) and a collaboration project with the University of Reading.

11.

Cloud-aware data intensive workflow scheduling on volunteer computing systems

《Future Generation Computer Systems》2015

Volunteer computing systems offer high computing power to scientific communities for running large data intensive scientific workflows. However, these computing environments provide only a best-effort infrastructure for executing high performance jobs. This work aims to schedule scientific and data intensive workflows on a hybrid of the volunteer computing system and Cloud resources to enhance the utilization of these environments and increase the percentage of workflows that meet the deadline. The proposed workflow scheduling system partitions a workflow into sub-workflows so as to minimize data dependencies among the sub-workflows. These sub-workflows are then scheduled and distributed on volunteer resources according to the proximity of resources and the load balancing policy. The execution time of each sub-workflow on the selected volunteer resources is estimated in this phase. If any sub-workflow misses its sub-deadline due to a large waiting time, we consider re-scheduling this sub-workflow onto public Cloud resources. This re-scheduling improves the system performance by increasing the percentage of workflows that meet the deadline. The proposed Cloud-aware data intensive scheduling algorithm increases the percentage of workflows that meet the deadline by 75% on average with respect to the execution of workflows on the volunteer resources alone.
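The re-scheduling rule in this abstract (move a sub-workflow to the Cloud when waiting plus execution on volunteer resources would overrun its sub-deadline) can be sketched as follows. The `SubWorkflow` fields and the `place` function are hypothetical names, and the single-inequality test is an assumed simplification; the paper's partitioning, proximity and load-balancing steps are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SubWorkflow:
    name: str
    est_wait: float      # estimated queue wait on the volunteer resource (s)
    est_runtime: float   # estimated execution time on that resource (s)
    sub_deadline: float  # time budget assigned to this sub-workflow (s)


def place(subs: List[SubWorkflow]) -> Dict[str, str]:
    """Return 'volunteer' or 'cloud' for each sub-workflow.

    Assumed rule: stay on the volunteer resource only if the estimated
    wait plus runtime fits within the sub-deadline.
    """
    placement: Dict[str, str] = {}
    for sw in subs:
        fits = sw.est_wait + sw.est_runtime <= sw.sub_deadline
        placement[sw.name] = "volunteer" if fits else "cloud"
    return placement
```

In practice the estimates themselves are uncertain, so a safety margin on the comparison would likely be needed.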

12.

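An Optimization Method for Scientific Workflow Execution Plans in Cloud Environments

郭宏乐 陈旺虎 马生俊 李新田 乔保民 《计算机工程与科学》 (Computer Engineering & Science) 2019,41(3):433-439

To reduce the execution cost of scientific workflows in cloud environments, an optimization method for execution plans is proposed. The method introduces the monkey algorithm (猴群算法) and relies on intra-level and inter-level optimization of the current execution plan. While guaranteeing the workflow's global deadline constraint, it logically aggregates tasks within the same level and adjusts tasks between levels so as to reduce, as far as possible, the difference in the number of tasks across levels, thereby avoiding idle, wasted resources and shortening task waiting time. Experiments show that, compared with similar work, the method reduces resource consumption and the total delay time.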

13.

Utility functions for adaptively executing concurrent workflows

Kevin Lee Norman W. Paton Rizos Sakellariou Alvaro A. A. Fernandes 《Concurrency and Computation》2011,23(6):646-666


14.

Towards critical region reliability support for Grid workflows

Guo-Zhong Jiong Jing-Sha 《Journal of Parallel and Distributed Computing》2009,69(12):989-995

Many Directed Acyclic Graph (DAG)-based workflow applications have timing constraints such that each processing of a workflow needs to be finished within its deadline. There have been some studies to improve the performance of time-constrained workflow processing. Few of them, however, have taken into account the fact that successful execution of a workflow within its deadline is also affected by the ‘normal state’ and ‘abnormal state’ of Grid resources occurring in successive turns and by the relative difference in execution time between tasks on the critical path and tasks on the non-critical path. To solve the problem, we first put forward some new concepts, such as the critical region and the reliability of the critical region, and then present a scheduling algorithm. Based on a finite-state continuous-time Markov process, the algorithm selects a resource combination scheme which has the lowest expenditure under a certain credit level of the resource reliability on the critical path in the DAG-based workflow. The simulation confirms the validity of the theoretical analysis.
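Since this abstract leans on the notion of a critical path in a DAG-based workflow, a short sketch of how such a path can be computed may help. It shows only the standard longest-path calculation over a topological order; the paper's critical region, reliability model and Markov analysis are not reproduced. The function name `critical_path` and its input format are assumptions, and every task named as a predecessor is assumed to have a duration.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, Iterable, List, Optional, Tuple


def critical_path(
    durations: Dict[str, float],
    preds: Dict[str, Iterable[str]],
) -> Tuple[List[str], float]:
    """Return (tasks on the longest path, its total duration).

    durations maps each task to its execution time; preds maps a task to
    the tasks that must finish before it starts.
    """
    graph = {t: set(preds.get(t, ())) for t in durations}
    finish: Dict[str, float] = {}
    best_pred: Dict[str, Optional[str]] = {}
    for t in TopologicalSorter(graph).static_order():
        # The predecessor that finishes last determines when t can start.
        p = max(graph[t], key=lambda q: finish[q], default=None)
        best_pred[t] = p
        finish[t] = durations[t] + (finish[p] if p is not None else 0.0)
    end = max(finish, key=lambda t: finish[t])
    path: List[str] = []
    node: Optional[str] = end
    while node is not None:
        path.append(node)
        node = best_pred[node]
    path.reverse()
    return path, finish[end]
```

Tasks that do not appear in the returned path lie on non-critical paths and have some slack before they delay the workflow.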

15.

Dynamic Instrumentation, Performance Monitoring and Analysis of Grid Scientific Workflows

Hong-Linh Truong Thomas Fahringer Schahram Dustdar 《Journal of Grid Computing》2005,3(1-2):1-18

While existing work concentrates on developing QoS models of business workflows and Web services, few tools have been developed to support the monitoring and performance analysis of scientific workflows in Grids. This paper describes novel Grid services for dynamic instrumentation of Grid-based applications, performance monitoring and analysis of Grid scientific workflows. We describe a Grid dynamic instrumentation service that provides a widely accessible interface for other services and users to conduct the dynamic instrumentation of Grid applications during the runtime. We introduce a Grid performance analysis service for Grid scientific workflows. The analysis service utilizes various types of data including workflow graphs, monitoring data of resources, execution status of activities, and performance measurements obtained from the dynamic instrumentation of invoked applications, and provides a rich set of functionalities and features to support the online monitoring and performance analysis of scientific workflows. Workflows and their relevant information including performance metrics are stored and utilized for comparing the performance of constructs of different workflows and for supporting multi-workflow analysis. The work described in this paper is supported in part by the Austrian Science Fund as part of the Aurora Project under contract SFBF1104 and by the European Union through the IST-2002-511385 project K-WfGrid.

16.

A Provenance-based Adaptive Scheduling Heuristic for Parallel Scientific Workflows in Clouds

Daniel de Oliveira Kary A. C. S. Ocaña Fernanda Baião Marta Mattoso 《Journal of Grid Computing》2012,10(3):521-552

In recent years, scientific workflows have emerged as a fundamental abstraction for structuring and executing scientific experiments in computational environments. Scientific workflows are becoming increasingly complex and more demanding in terms of computational resources, thus requiring the usage of parallel techniques and high performance computing (HPC) environments. Meanwhile, clouds have emerged as a new paradigm where resources are virtualized and provided on demand. By using clouds, scientists have expanded beyond single parallel computers to hundreds or even thousands of virtual machines. Although the initial focus of clouds was to provide high throughput computing, clouds are already being used to provide an HPC environment where elastic resources can be instantiated on demand during the course of a scientific workflow. However, this model also raises many open, yet important, challenges such as scheduling workflow activities. Scheduling parallel scientific workflows in the cloud is a very complex task since we have to take into account many different criteria and to explore the elasticity characteristic for optimizing workflow execution. In this paper, we introduce an adaptive scheduling heuristic for parallel execution of scientific workflows in the cloud that is based on three criteria: total execution time (makespan), reliability and financial cost. Besides scheduling workflow activities based on a 3-objective cost model, this approach also scales resources up and down according to the restrictions imposed by scientists before workflow execution. This tuning is based on provenance data captured and queried at runtime. We conducted a thorough validation of our approach using a real bioinformatics workflow. The experiments were performed in SciCumulus, a cloud workflow engine for managing scientific workflow execution.

17.

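Engine Design and Resource Scheduling of a Scientific Workflow Application Platform for Supercomputing Environments

李于锋 莫则尧 肖永浩 赵士操 段博文 《计算机应用研究》 (Application Research of Computers) 2019,36(6)

The complexity of high-performance computer architectures places higher demands on users; moreover, in engineering practice and scientific experiments, several application programs usually have to cooperate to solve a complex problem. Addressing the usability of supercomputing resources and the need to integrate and coordinate many kinds of software, the authors developed a scientific workflow application platform for supercomputing environments, designed an asynchronous, concurrent process execution engine, adopted a design strategy that separates the scheduling algorithm from the scheduler and the engine, and give a resource scheduling scheme. They propose a local resource pooling technique and a resource reservation algorithm, compare and analyze the performance of five commonly used scheduling algorithms, and offer recommendations for choosing among them. Practical use shows that the engine supports flexible execution of complex workflows and that the resource scheduling scheme enables efficient execution of workflow applications in supercomputing environments.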

18.

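A Simulation Model for Loosely Coupled Inter-Organizational Workflows (total citations: 5, self-citations: 0, citations by others: 5)

程绍武 徐晓飞 王刚 李全龙 《软件学报》 (Journal of Software) 2006,17(12):2461-2470

To address the simulation modeling of loosely coupled inter-organizational workflows, and taking colored Petri nets as the theoretical basis, this paper introduces color sets, color functions, resource places, waiting places, busy places, start transitions, end transitions, roles, organizations, time functions, resource functions and transition functions into IOWF (inter-organizational workflow), and proposes the colored multi-dimension inter-organizational workflow net CMD/IOWF (colored multi-dimension/IOWF). Based on CMD/IOWF, it discusses how multiple workflow instances of different projects in loosely coupled inter-organizational workflows are coupled through resource sharing and activity synchronization, together with the simulation-related modeling of resource constraints and time. By defining inputs/outputs, states, events, a time advance function and a state transition function, it gives a simulation model for multiple loosely coupled inter-organizational workflows. Simulation analysis based on this model can yield the key performance indicators of such workflows, including the average execution time and average execution cost of each project's workflow instances and their distribution over organizations, and the resource utilization of each project. Finally, an example verifies the effectiveness of the proposed model.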

19.

A model, design, and implementation of an efficient multithreaded workflow execution engine with data streaming, caching, and storage constraints

Pawel Czarnul 《The Journal of supercomputing》2013,63(3):919-945

The paper proposes a model, design, and implementation of an efficient multithreaded engine for execution of distributed service-based workflows with data streaming defined on a per task basis. The implementation takes into account capacity constraints of the servers on which services are installed and the workflow data footprint if needed. Furthermore, it also considers storage space of the workflow execution engine and its cost. Caching service output data is implemented to speed up the execution of the workflow. Input data is partitioned into data packets, which are passed and processed by services previously selected for workflow tasks so that the aforementioned constraints are met. Performance impact of the proposed mechanisms is investigated for workflow structures common in acyclic directed graph workflow applications. It is shown for a real workflow with distributed processing of digital media content that the initial budget needs to be properly distributed between the cost of services and the cost of intermediate storage to obtain good workflow execution times.

20.

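A Metadata-Navigated Assembly Model for Service Workflows (total citations: 3, self-citations: 0, citations by others: 3)

王月龙 王文俊 罗英伟 汪小林 许卓群 《计算机学报》 (Chinese Journal of Computers) 2006,29(7):1105-1115

With urban emergency response operations as the application background, this paper proposes a layered conceptual model of workflows and an associated layered metadata description specification. On this basis it builds a metadata-navigated runtime mechanism that binds, layer by layer, from high-level business applications down to low-level distributed, dynamic resources, and uses this mechanism to implement an experimental prototype of IERS (Integrated Emergency Response System), an urban emergency response system. The mechanism solves the problem of assembling emergency response business workflows onto the underlying distributed services and resources, strengthens the emergency system's support for dynamic, distributed service environments, raises the level of automation and adaptivity of emergency response operations during execution, and also decomposes and reduces the complexity of the workflow problem.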