期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

On the efficiency of several VM provisioning strategies for workflows with multi-threaded tasks on clouds

Marc E. Frincu Stéphane Genaud Julien Gossa 《Computing》2014,96(11):1059-1086

Cloud computing promises the delivery of on-demand pay-per-use access to unlimited resources. Using these resources requires more than a simple access to them as most clients have certain constraints in terms of cost and time that need to be fulfilled. Therefore certain scheduling heuristics have been devised to optimize the placement of client tasks on allocated virtual machines. The applications can be roughly divided in two categories: independent bag-of-tasks and workflows. In this paper we focus on the latter and investigate a less studied problem, i.e., the effect the virtual machine allocation policy has on the scheduling outcome. For this we look at how workflow structure, execution time, virtual machine instance type affect the efficiency of the provisioning method when cost and makespan are considered. To aid our study we devised a mathematical model for cost and makespan in case single or multiple instance types are used. While the model allows us to determine the boundaries for two of our extreme methods, the complexity of workflow applications calls for a more experimental approach to determine the general relation. For this purpose we considered synthetically generated workflows that cover a wide range of possible cases. Results have shown the need for probabilistic selection methods in case small and heterogeneous execution times are used, while for large homogeneous ones the best algorithm is clearly noticed. Several other conclusions regarding the efficiency of powerful instance types as compared to weaker ones, and of dynamic methods against static ones are also made. 相似文献

2.

Cloud-aware data intensive workflow scheduling on volunteer computing systems

《Future Generation Computer Systems》2015

Volunteer computing systems offer high computing power to the scientific communities to run large data intensive scientific workflows. However, these computing environments provide the best effort infrastructure to execute high performance jobs. This work aims to schedule scientific and data intensive workflows on hybrid of the volunteer computing system and Cloud resources to enhance the utilization of these environments and increase the percentage of workflow that meets the deadline. The proposed workflow scheduling system partitions a workflow into sub-workflows to minimize data dependencies among the sub-workflows. Then these sub-workflows are scheduled to distribute on volunteer resources according to the proximity of resources and the load balancing policy. The execution time of each sub-workflow on the selected volunteer resources is estimated in this phase. If any of the sub-workflows misses the sub-deadline due to the large waiting time, we consider re-scheduling of this sub-workflow into the public Cloud resources. This re-scheduling improves the system performance by increasing the percentage of workflows that meet the deadline. The proposed Cloud-aware data intensive scheduling algorithm increases the percentage of workflow that meet the deadline with a factor of 75% in average with respect to the execution of workflows on the volunteer resources. 相似文献

3.

A semantic framework for automatic generation of computational workflows using distributed data and component catalogues

Yolanda Gil Pedro A. González-Calero Joshua Moody Varun Ratnakar 《人工智能实验与理论杂志》2013,25(4):389-467

Computational workflows are a powerful paradigm to represent and manage complex applications, particularly in large-scale distributed scientific data analysis. Workflows represent application components that result in individual computations as well as their interdependences in terms of dataflow. Workflow systems use these representations to manage various aspects of workflow creation and execution for users, such as the automatic assignment of execution resources. This article describes an approach to automating a new aspect of the process: the selection of application components and data sources. We present a novel approach that enables users to specify varying degrees of detail and amount of constraints in a workflow request, including the specification of constraints on input, intermediate or output data in the workflow, abstract workflow component classes rather than specific component implementations, and generic reusable workflow templates that express a pre-defined combination of components. The algorithm elaborates the user request into a set of fully ground workflows with specific choices of data sources and codes to be used so that they can be submitted for mapping and execution. The algorithm searches through the space of possible candidate workflows by creating increasingly more specialized versions of the original template and eliminating candidates that violate constraints cumulated in the candidate workflow as components and data sources are selected. A novel feature of our approach is that it assumes a distributed architecture where data and component catalogues are separate from the workflow system. The algorithm explicitly poses queries to external catalogues, and therefore any reasoning regarding data or component properties is not assumed to occur within the workflow system. We describe our implementation of this approach in the Wings workflow system. This implementation uses the W3C Web Ontology Language and associated reasoners to implement the workflow system as well as the data and component catalogues. This research demonstrates the use of artificial intelligence techniques to support the kinds of automation envisioned by the scientific community for large-scale distributed scientific data analysis. 相似文献

4.

A scalable Cloud-based system for data-intensive spatial analysis

R.?O.?Sinnott Email author W.?Voorsluys 《International Journal on Software Tools for Technology Transfer (STTT)》2016,18(6):587-605

Advances in Cloud computing technology and the availability of affordable and easy to use Cloud services are enabling a multitude of scientific applications to use these resources as primary or secondary computing infrastructure. The urban and built environment research domain is one area that can benefit greatly from Cloud computing. The global population growth and increase in the size and population of cities raise many challenges for governments, planners and researchers alike. The Australian Urban Research Infrastructure Network (AURIN—http://www.aurin.org.au) project has been tasked with developing an advanced platform (e-Infrastructure) across Australia to tackle these challenges. The platform leverages large-scale Cloud resources to provide federated data access to, at present over 1100 data sets from major and often definitive government and industry data-rich organisations, and for scalable data processing and visualisation. The original AURIN tools were developed using the object modelling system (OMS) and supported integrated workflows to define and enact/re-enact scientific processes. More recently the work has evolved to focus more on delivery of a workbench offering a rich range of tools delivered through an extensible workflow environment. In this paper, we provide the background to AURIN including the scientific drivers that are shaping the work and the realisation of the Cloud-based AURIN environment. We focus in particular on the workflow environment and show how it seamlessly utilizes the Cloud for urban research processes focused especially on data-intensive spatial analysis. We illustrate the utilisation of this workflow environment across a range of case studies reflecting urban research activities. 相似文献

5.

Scheduling and flexible control of bandwidth and in-transit services for end-to-end application workflows

《Future Generation Computer Systems》2016

End-to-end scientific application workflows that integrate high-end experiments and instruments with large scale simulations and end-user displays are becoming increasingly important. These workflows require complex couplings and data sharing between distributed components involving large data volumes and present varying hard (in-time data delivery) and soft (in-transit processing) quality of service (QoS) requirements. As a result, supporting efficient data transport is critical for such workflows. In this paper, we leverage software-defined networking (SDN) to address issues of data transport service control and resource provisioning to meet varying QoS requirements from multiple coupled workflows sharing the same service medium. Specifically, we present a flexible control and a disciplined resource scheduling approach for data transport services for science networks. Furthermore, we emulate an SDN testbed on top of the FutureGrid virtualized testbed and use it to evaluate our approach for a realistic scientific workflow. Our results show that SDN-based control and resource scheduling based on simple intuitive models can meet the requirements of the targeted workflows with high resource utilization. 相似文献

6.

允许违反局部时间约束的科学工作流调度策略

陈旺虎段菊俞茂义《计算机工程与科学》2016,38(11):2165-2171

提高科学工作流在云环境中的执行效率、降低执行费用受到广泛关注。用户期望的局部QoS约束与工作流的总体执行效率之间往往存在矛盾。针对该现象,在前期的研究基础上提出一种允许违反局部时间约束的科学工作流调度策略。通过对已聚簇的工作流任务集使用任务后向优先合并的方法,可实现任务间空闲时间片的合理利用,进而优化科学工作流的执行时间;另外,为充分利用任务的松弛时间,提高工作流的整体执行效率,允许部分任务的调度违反局部最晚完成时间的约束。实验结果表明,该策略能提前科学工作流的最早完成时间,提高处理机的利用率,并最终降低工作流的执行费用。相似文献

7.

SLA enactment for large-scale healthcare workflows on multi-Cloud

《Future Generation Computer Systems》2015

相似文献

8.

A Formal Approach to Support Interoperability in Scientific Meta-workflows

Junaid Arshad Gabor Terstyanszky Tamas Kiss Noam Weingarten Giuliano Taffoni 《Journal of Grid Computing》2016,14(4):655-671

相似文献

9.

Resource management for bursty streams on multi-tenancy cloud environments

《Future Generation Computer Systems》2016

The number of applications that need to process data continuously over long periods of time has increased significantly over recent years. The emerging Internet of Things and Smart Cities scenarios also confirm the requirement for real time, large scale data processing. When data from multiple sources are processed over a shared distributed computing infrastructure, it is necessary to provide some Quality of Service (QoS) guarantees for each data stream, specified in a Service Level Agreement (SLA). SLAs identify the price that a user must pay to achieve the required QoS, and the penalty that the provider will pay the user in case of QoS violation. Assuming maximization of revenue as a Cloud provider’s objective, then it must decide which streams to accept for storage and analysis; and how many resources to allocate for each stream. When the real-time requirements demand a rapid reaction, dynamic resource provisioning policies and mechanisms may not be useful, since the delays and overheads incurred might be too high. Alternatively, idle resources that were initially allocated for other streams could be re-allocated, avoiding subsequent penalties. In this paper, we propose a system architecture for supporting QoS for concurrent data streams to be composed of self-regulating nodes. Each node features an envelope process for regulating and controlling data access and a resource manager to enable resource allocation, and selective SLA violations, while maximizing revenue. Our resource manager, based on a shared token bucket, enables: (i) the re-distribution of unused resources amongst data streams; and (ii) a dynamic re-allocation of resources to streams likely to generate greater profit for the provider. We extend previous work by providing a Petri-net based model of system components, and we evaluate our approach on an OpenNebula-based Cloud infrastructure. 相似文献

10.

MidHDC: Advanced topics on middleware services for heterogeneous distributed computing. Part 2

《Future Generation Computer Systems》2017

Currently distributes systems support different computing paradigms like Cluster Computing, Grid Computing, Peer-to-Peer Computing, and Cloud Computing all involving elements of heterogeneity. These computing distributed systems are often characterized by a variety of resources that may or may not be coupled with specific platforms or environments. All these topics challenge today researchers, due to the strong dynamic behavior of the user communities and of resource collections they use.The second part of this special issue presents advances in allocation algorithms, service selection, VM consolidation and mobility policies, scheduling multiple virtual environments and scientific workflows, optimization in scheduling process, energy-aware scheduling models, failure Recovery in shared Big Data processing systems, distributed transaction processing middleware, data storage, trust evaluation, information diffusion, mobile systems, integration of robots in Cloud systems. 相似文献

11.

A Provenance-based Adaptive Scheduling Heuristic for Parallel Scientific Workflows in Clouds

Daniel de Oliveira Kary A. C. S. Oca?a Fernanda Bai?o Marta Mattoso 《Journal of Grid Computing》2012,10(3):521-552

In the last years, scientific workflows have emerged as a fundamental abstraction for structuring and executing scientific experiments in computational environments. Scientific workflows are becoming increasingly complex and more demanding in terms of computational resources, thus requiring the usage of parallel techniques and high performance computing (HPC) environments. Meanwhile, clouds have emerged as a new paradigm where resources are virtualized and provided on demand. By using clouds, scientists have expanded beyond single parallel computers to hundreds or even thousands of virtual machines. Although the initial focus of clouds was to provide high throughput computing, clouds are already being used to provide an HPC environment where elastic resources can be instantiated on demand during the course of a scientific workflow. However, this model also raises many open, yet important, challenges such as scheduling workflow activities. Scheduling parallel scientific workflows in the cloud is a very complex task since we have to take into account many different criteria and to explore the elasticity characteristic for optimizing workflow execution. In this paper, we introduce an adaptive scheduling heuristic for parallel execution of scientific workflows in the cloud that is based on three criteria: total execution time (makespan), reliability and financial cost. Besides scheduling workflow activities based on a 3-objective cost model, this approach also scales resources up and down according to the restrictions imposed by scientists before workflow execution. This tuning is based on provenance data captured and queried at runtime. We conducted a thorough validation of our approach using a real bioinformatics workflow. The experiments were performed in SciCumulus, a cloud workflow engine for managing scientific workflow execution. 相似文献

12.

Bi-level fuzzy based advanced reservation of Cloud workflow applications on distributed Grid resources

Sahar Adabi Ali Movaghar Amir Masoud Rahmani 《The Journal of supercomputing》2014,67(1):175-218

The increasing demand on execution of large-scale Cloud workflow applications which need a robust and elastic computing infrastructure usually lead to the use of high-performance Grid computing clusters. As the owners of Cloud applications expect to fulfill the requested Quality of Services (QoS) by the Grid environment, an adaptive scheduling mechanism is needed which enables to distribute a large number of related tasks with different computational and communication demands on multi-cluster Grid computing environments. Addressing the problem of scheduling large-scale Cloud workflow applications onto multi-cluster Grid environment regarding the QoS constraints declared by application’s owner is the main contribution of this paper. Heterogeneity of resource types (service type) is one of the most important issues which significantly affect workflow scheduling in Grid environment. On the other hand, a Cloud application workflow is usually consisting of different tasks with the need for different resource types to complete which we call it heterogeneity in workflow. The main idea which forms the soul of all the algorithms and techniques introduced in this paper is to match the heterogeneity in Cloud application’s workflow to the heterogeneity in Grid clusters. To obtain this objective a new bi-level advanced reservation strategy is introduced, which is based upon the idea of first performing global scheduling and then conducting local scheduling. Global-scheduling is responsible to dynamically partition the received DAG into multiple sub-workflows that is realized by two collaborating algorithms: (1) The Critical Path Extraction algorithm (CPE) which proposes a new dynamic task overall critically value strategy based on DAG’s specification and requested resource type QoS status to determine the criticality of each task; and (2) The DAG Partitioning algorithm (DAGP) which introduces a novel dynamic score-based approach to extract sub-workflows based on critical paths by using a new Fuzzy Qualitative Value Calculation System to evaluate the environment. Local-scheduling is responsible for scheduling tasks on suitable resources by utilizing a new Multi-Criteria Advance Reservation algorithm (MCAR) which simultaneously meets high reliability and QoS expectations for scheduling distributed Cloud-base applications. We used the simulation to evaluate the performance of the proposed mechanism in comparison with four well-known approaches. The results show that the proposed algorithm outperforms other approaches in different QoS related terms. 相似文献

13.

JS4Cloud: script‐based workflow programming for scalable data analysis on cloud platforms

Fabrizio Marozzo Domenico Talia Paolo Trunfio 《Concurrency and Computation》2015,27(17):5214-5237

相似文献

14.

Using a suite of ontologies for preserving workflow-centric research objects

《Journal of Web Semantics》2015

Scientific workflows are a popular mechanism for specifying and automating data-driven in silico experiments. A significant aspect of their value lies in their potential to be reused. Once shared, workflows become useful building blocks that can be combined or modified for developing new experiments. However, previous studies have shown that storing workflow specifications alone is not sufficient to ensure that they can be successfully reused, without being able to understand what the workflows aim to achieve or to re-enact them. To gain an understanding of the workflow, and how it may be used and repurposed for their needs, scientists require access to additional resources such as annotations describing the workflow, datasets used and produced by the workflow, and provenance traces recording workflow executions.In this article, we present a novel approach to the preservation of scientific workflows through the application of research objects—aggregations of data and metadata that enrich the workflow specifications. Our approach is realised as a suite of ontologies that support the creation of workflow-centric research objects. Their design was guided by requirements elicited from previous empirical analyses of workflow decay and repair. The ontologies developed make use of and extend existing well known ontologies, namely the Object Reuse and Exchange (ORE) vocabulary, the Annotation Ontology (AO) and the W3C PROV ontology (PROVO). We illustrate the application of the ontologies for building Workflow Research Objects with a case-study that investigates Huntington’s disease, performed in collaboration with a team from the Leiden University Medial Centre (HG-LUMC). Finally we present a number of tools developed for creating and managing workflow-centric research objects. 相似文献

15.

多云工作流优化传输费用的数据布局策略

马飞《数字社区&智能家居》2014,(4):2418-2420

科学工作流应用是一种复杂且数据密集型的应用,常应用于结构生物学、高能物理学和神经学等涉及分布式数据源的学科。数据分散存储在基于互联网的云计算平台上,致使科学工作流在执行时伴随着大量的数据传输。云计算是一种按使用量付费的模式,数据传输产生传输费用,尤其在多个工作流相互协同的情况下,将产生更高的传输成本。该文从全局的角度建立基于多工作流数据依赖图的传输成本模型,研究基于二进制粒子群算法（BPSO）的数据布局优化策略,从而减少对云计算传输资源的租赁费用。相似文献

16.

A Survey of Data-Intensive Scientific Workflow Management

Ji Liu Esther Pacitti Patrick Valduriez Marta Mattoso 《Journal of Grid Computing》2015,13(4):457-493

Nowadays, more and more computer-based scientific experiments need to handle massive amounts of data. Their data processing consists of multiple computational steps and dependencies within them. A data-intensive scientific workflow is useful for modeling such process. Since the sequential execution of data-intensive scientific workflows may take much time, Scientific Workflow Management Systems (SWfMSs) should enable the parallel execution of data-intensive scientific workflows and exploit the resources distributed in different infrastructures such as grid and cloud. This paper provides a survey of data-intensive scientific workflow management in SWfMSs and their parallelization techniques. Based on a SWfMS functional architecture, we give a comparative analysis of the existing solutions. Finally, we identify research issues for improving the execution of data-intensive scientific workflows in a multisite cloud. 相似文献

17.

Optimizing virtual machine allocation for parallel scientific workflows in federated clouds

《Future Generation Computer Systems》2015

Cloud computing has established itself as an interesting computational model that provides a wide range of resources such as storage, databases and computing power for several types of users. Recently, the concept of cloud computing was extended with the concept of federated clouds where several resources from different cloud providers are inter-connected to perform a common action (e.g. execute a scientific workflow). Users can benefit from both single-provider and federated cloud environment to execute their scientific workflows since they can get the necessary amount of resources on demand. In several of these workflows, there is a demand for high performance and parallelism techniques since many activities are data and computing intensive and can execute for hours, days or even weeks. There are some Scientific Workflow Management Systems (SWfMS) that already provide parallelism capabilities for scientific workflows in single-provider cloud. Most of them rely on creating a virtual cluster to execute the workflow in parallel. However, they also rely on the user to estimate the amount of virtual machines to be allocated to create this virtual cluster. Most SWfMS use this initial virtual cluster configuration made by the user for the entire workflow execution. Dimensioning the virtual cluster to execute the workflow in parallel is then a top priority task since if the virtual cluster is under or over dimensioned it can impact on the workflow performance or increase (unnecessarily) financial costs. This dimensioning is far from trivial in a single-provider cloud and specially in federated clouds due to the huge number of virtual machine types to choose in each location and provider. In this article, we propose an approach named GraspCC-fed to produce the optimal (or near-optimal) estimation of the amount of virtual machines to allocate for each workflow. GraspCC-fed extends a previously proposed heuristic based on GRASP for executing standalone applications to consider scientific workflows executed in both single-provider and federated clouds. For the experiments, GraspCC-fed was coupled to an adapted version of SciCumulus workflow engine for federated clouds. This way, we believe that GraspCC-fed can be an important decision support tool for users and it can help determining an optimal configuration for the virtual cluster for parallel cloud-based scientific workflows. 相似文献

18.

A security and cost aware scheduling algorithm for heterogeneous tasks of scientific workflow in clouds

《Future Generation Computer Systems》2016

Security is increasingly critical for various scientific workflows that are big data applications and typically take quite amount of time being executed on large-scale distributed infrastructures. Cloud computing platform is such an infrastructure that can enable dynamic resource scaling on demand. Nevertheless, based on pay-per-use and hourly-based pricing model, users should pay attention to the cost incurred by renting virtual machines (VMs) from cloud data centers. Meanwhile, workflow tasks are generally heterogeneous and require different instance series (i.e., computing optimized, memory optimized, storage optimized, etc.). In this paper, we propose a security and cost aware scheduling (SCAS) algorithm for heterogeneous tasks of scientific workflow in clouds. Our proposed algorithm is based on the meta-heuristic optimization technique, particle swarm optimization (PSO), the coding strategy of which is devised to minimize the total workflow execution cost while meeting the deadline and risk rate constraints. Extensive experiments using three real-world scientific workflow applications, as well as CloudSim simulation framework, demonstrate the effectiveness and practicality of our algorithm. 相似文献

19.

A collaborative scheduling approach for service-driven scientific workflow execution

Wanchun Dou J. Leon Zhao Shaokun Fan 《Journal of Computer and System Sciences》2010,76(6):416-427

Scientific workflow execution often spans multiple self-managing administrative domains to obtain specific processing capabilities. Existing (global) analysis techniques tend to mandate every domain-specific application to unveil all private behaviors for scientific collaboration. In practice, it is infeasible for a domain-specific application to disclose its process details (as a private workflow fragment) for privacy or security reasons. Consequently, it is a challenging endeavor to coordinate scientific workflows and its distributed domain-specific applications. To address this problem, we propose a collaborative scheduling approach that can deal with temporal dependencies between a scientific workflow and a private workflow fragment. Under this collaborative scheduling approach, a private workflow fragment could maintain the temporal consistency with a scientific workflow in resource sharing and task enactments. Further, an evaluation is also presented to demonstrate the proposed approach for coordinating multiple scientific workflow executions in a concurrent environment. 相似文献

20.

Dynamic checking of temporal constraints for concurrent workflows

《Electronic Commerce Research and Applications》2005,4(2):124-142

At present, workflow management systems have not sufficiently dealt with the issues of time, involving time modelling at build-time and time management at run-time. They are lack of the ability to support the checking of temporal constraints at run-time. Although some approaches have been devised to tackle this problem, they are limited to a single workflow and use only static techniques to verify temporal constraints. In reality, there are multiple workflows executing concurrently in a workflow management system. There may well exist resource constraints between these concurrent workflows, which affect significantly the verification of temporal constraints at run-time. This paper proposes a novel approach for dynamic verification of temporal constraints for concurrent workflows. We first investigate resource constraints in workflow management systems, and then define concurrent workflow executions. Based on these definitions, we propose a verification method by analysing the temporal relationship and resource constraints between activities among concurrent workflows. 相似文献