Similar Literature
 Found 20 similar articles; search took 31 ms
1.
Automation of the execution of computational tasks is at the heart of improving scientific productivity. In recent years, scientific workflows have been established as an important abstraction that captures data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today's computational and data science applications that process vast amounts of data keep increasing, there is a compelling case for a new generation of advances in high-performance computing, commonly termed extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. The paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.

2.
Scientific workflows have become a valuable tool for large-scale data processing and analysis. This has led to the creation of specialized online repositories to facilitate workflow sharing and reuse. Over time, these repositories have grown to sizes that call for advanced methods to support workflow discovery, in particular for similarity search. Effective similarity search requires both high-quality algorithms for the comparison of scientific workflows and efficient strategies for indexing, searching, and ranking of search results. Yet, the graph structure of scientific workflows poses severe challenges to each of these steps. Here, we present a complete system for effective and efficient similarity search in scientific workflow repositories, based on the Layer Decomposition approach to scientific workflow comparison. Layer Decomposition specifically accounts for the directed dataflow underlying scientific workflows and, compared to other state-of-the-art methods, delivers the best results for similarity search at comparably low runtimes. Stacking Layer Decomposition with even faster, structure-agnostic approaches allows us to use proven, off-the-shelf tools for workflow indexing to further reduce runtimes and scale similarity search to sizes of current repositories.

3.
Modeling and Managing Interactions among Business Processes   Total citations: 3 (self-citations: 0, citations by others: 3)
Most workflow management systems (WfMSs) only support the separate and independent execution of business processes. However, processes often need to interact with each other, in order to synchronize the execution of their activities, to exchange process data, to request execution of services, or to notify progress in process execution. Recent market trends also raise the need for cooperation and interaction between processes executed in different organizations, posing additional challenges. In fact, in order to reduce costs and provide better services, companies are pushed to increase cooperation and to form virtual enterprises, where business processes span across organizational boundaries and are composed of cooperating workflows executed in different organizations. Workflow interaction in a cross-organizational environment is complicated by the heterogeneity of workflow management platforms on top of which workflows are defined and executed and by the different and possibly competing business policies and business goals that drive process execution in each organization. In this paper we propose a model and system that enable interaction between workflows executed in the same or in different organizations. We extend traditional workflow models by allowing workflows to publish and subscribe to events, and by enabling the definition of points in the process execution where events should be sent or received. Event notifications are managed by a suitable event service that is capable of filtering and correlating events, and of dispatching them to the appropriate target workflow instances. The extended model can be easily mapped onto any workflow model, since event-specific constructs can be specified by means of ordinary workflow activities, for which we provide the implementation. In addition, the event service is easily portable to different platforms, and does not require integration with the WfMS that supports the cooperating workflows. Therefore, the proposed approach is applicable in virtually any environment and is independent of the specific platform adopted.
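
The publish/subscribe interaction described in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the class `EventService`, its methods, and the event names are hypothetical, and event correlation is left out. It only shows workflow instances subscribing with a filter and an event service dispatching matching events to them.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]
Filter = Callable[[Event], bool]


class EventService:
    """Toy event service (hypothetical): filters events and dispatches them."""

    def __init__(self) -> None:
        # event type -> list of (workflow instance id, filter predicate)
        self._subs: Dict[str, List[Tuple[str, Filter]]] = defaultdict(list)
        # workflow instance id -> events delivered to that instance so far
        self.inbox: Dict[str, List[Event]] = defaultdict(list)

    def subscribe(self, instance_id: str, event_type: str,
                  accept: Filter = lambda e: True) -> None:
        """Register a workflow instance for an event type, with an optional filter."""
        self._subs[event_type].append((instance_id, accept))

    def publish(self, event_type: str, payload: Event) -> int:
        """Deliver the event to every subscriber whose filter accepts it.

        Returns the number of instances that received the event.
        """
        delivered = 0
        for instance_id, accept in self._subs[event_type]:
            if accept(payload):
                self.inbox[instance_id].append(payload)
                delivered += 1
        return delivered
```

In this sketch a "send event" activity in one workflow would call `publish`, and a "receive event" activity in another would read from its own `inbox`; the two workflows never reference each other directly, which is the decoupling the abstract relies on.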

4.
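Workflow adaptation is an important task in workflow reuse. In current semantic workflow adaptation based on streams, the reusable fragments of workflows, a retrieved workflow cannot be adapted when the workflow stream repository contains no stream that is structurally similar to a stream in the retrieved semantic workflow. To handle this case, an improved scheme is proposed: adapting semantic workflows based on the behavioral features of streams. The behavioral features of a stream are expressed by its task direct-adjacency relation set. For each stream in the retrieved semantic workflow that is inconsistent with the change request, an anchor-set data index and stream matching rules are used to filter the workflow stream repository and obtain a set of candidate matching streams. The candidate set is then verified against the behavioral similarity of the streams and the change request, yielding the matching stream that is most consistent with the change request and sufficiently similar. The change request is then updated, and each retrieved matching stream replaces the original stream, so that defects in the retrieved semantic workflow are corrected step by step until the adapted semantic workflow is obtained. Experimental results show that, compared with existing stream-based adaptation algorithms, the proposed algorithm produces a set of adapted semantic workflows of better overall quality and is more adaptable. The algorithm can provide business process managers with good-quality adapted semantic workflows as references when modeling workflows for new business requirements, and helps considerably to improve the efficiency and quality of workflow reuse.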
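
To make the idea of comparing streams by behavior rather than structure concrete, here is a small sketch. The abstract does not give the similarity formula, so the Jaccard index over direct-adjacency pairs is an assumption, as are the function names and the `threshold` parameter; the anchor-set index and the change-request check are omitted.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Relation = Tuple[str, str]  # (task, task that directly follows it)


def adjacency_set(edges: Iterable[Relation]) -> Set[Relation]:
    """Behavioral profile of a stream: its set of direct-adjacency task pairs."""
    return set(edges)


def behavioural_similarity(a: Set[Relation], b: Set[Relation]) -> float:
    """Jaccard index of two adjacency sets (an assumed measure, not the paper's)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def best_match(query: Set[Relation],
               candidates: Dict[str, Set[Relation]],
               threshold: float = 0.5) -> Optional[str]:
    """Return the name of the most similar candidate at or above the threshold."""
    best_name: Optional[str] = None
    best_score = threshold
    for name, relations in candidates.items():
        score = behavioural_similarity(query, relations)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Two streams with different graph shapes can still share most adjacency pairs, which is why a behavioral profile of this kind can find a match where a structural comparison finds none.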

5.
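Scientific workflow applications are complex and data-intensive, and are commonly used in disciplines that involve distributed data sources, such as structural biology, high-energy physics, and neuroscience. Because the data are stored in a distributed manner on Internet-based cloud computing platforms, executing a scientific workflow involves a large amount of data transfer. Cloud computing follows a pay-per-use model, so data transfer incurs transfer fees; when multiple workflows cooperate, the transfer cost is higher still. This paper builds, from a global perspective, a transfer cost model based on a multi-workflow data dependency graph, and studies a data placement optimization strategy based on the binary particle swarm optimization (BPSO) algorithm, thereby reducing the rental cost of cloud transfer resources.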
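
The following sketch shows what a transfer cost function and one binary-PSO update step could look like. It is an illustration under stated assumptions, not the paper's model: it assumes only two data centers (so a placement is a bit vector), a flat price per gigabyte, and the common sigmoid-based BPSO update; all names and parameter values (`w`, `c1`, `c2`, `v_max`) are hypothetical.

```python
import math
import random
from typing import List, Tuple


def transfer_cost(placement: List[int],
                  tasks: List[Tuple[int, List[int]]],
                  sizes_gb: List[float],
                  price_per_gb: float) -> float:
    """Cost of moving every input dataset that is not stored where its task runs.

    placement[d] is the data center (0 or 1) holding dataset d;
    each task is (data center it runs on, indices of its input datasets).
    """
    cost = 0.0
    for site, inputs in tasks:
        for d in inputs:
            if placement[d] != site:
                cost += sizes_gb[d] * price_per_gb
    return cost


def bpso_step(position: List[int], velocity: List[float],
              personal_best: List[int], global_best: List[int],
              rng: random.Random,
              w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
              v_max: float = 6.0) -> Tuple[List[int], List[float]]:
    """One binary-PSO update: velocities are real, positions are resampled bits."""
    new_pos: List[int] = []
    new_vel: List[float] = []
    for x, v, p, g in zip(position, velocity, personal_best, global_best):
        v = w * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
        v = max(-v_max, min(v_max, v))       # clamp to keep the sigmoid stable
        prob_one = 1.0 / (1.0 + math.exp(-v))  # probability that the bit is 1
        new_vel.append(v)
        new_pos.append(1 if rng.random() < prob_one else 0)
    return new_pos, new_vel
```

A full optimizer would evaluate `transfer_cost` for each particle after every step and update the personal and global bests accordingly; that loop is left out to keep the sketch short.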

6.
Scientific workflows are increasingly used to manage and share scientific computations and methods to analyze data. A variety of systems have been developed that store the workflows executed and make them part of public repositories. However, workflows are published in the idiosyncratic format of the workflow system used for the creation and execution of the workflows. Browsing, linking and using the stored workflows and their results often becomes a challenge for scientists who may only be familiar with one system. In this paper we present an approach for addressing this issue by publishing and exploiting workflows as data on the Web with a representation that is independent from the workflow system used to create them. In order to achieve our goal, we follow the Linked Data Principles to publish workflow inputs, intermediate results, outputs and codes; and we reuse and extend well established standards like W3C PROV. We illustrate our approach by publishing workflows and consuming them with different tools designed to address common scenarios for workflow exploitation.

7.
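Almost every industry involves workflows, which coordinate the execution of tasks. Workflows composed of components can improve system reusability and speed up system development and configuration. However, syntactic and semantic differences between components mean that they cannot be assembled directly. This paper proposes ontology-extended workflow components, uses ontologies to resolve metadata mapping between different components, and implements a framework for component-based, ontology-extended workflows, together with a concrete implementation.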

8.
The field of scientific workflow management systems has grown significantly as applications start using them successfully. In 2007, several active researchers in scientific workflow developments presented the challenges for the state of the art in workflow technologies at that time. Many issues have been addressed, but one of them, named 'dynamic workflows and user steering', still has many open problems despite the contributions presented in recent years. This article surveys the early and current efforts in this topic and proposes a taxonomy to identify the main concepts related to addressing issues in dynamic steering of high performance computing (HPC) in scientific workflows. The main concepts are related to putting the human in the loop of the workflow lifecycle, involving user support in real-time monitoring, notification, analysis and interference by adapting the workflow execution at runtime.

9.
The paper presents a platform for distributed computing, developed using the latest software technologies and computing paradigms to enable big data mining. The platform, called ClowdFlows, is implemented as a cloud-based web application with a graphical user interface which supports the construction and execution of data mining workflows, including web services used as workflow components. As a web application, the ClowdFlows platform poses no software requirements and can be used from any modern browser, including mobile devices. The constructed workflows can be declared either as private or public, which enables sharing the developed solutions, data and results on the web and in scientific publications. The server-side software of ClowdFlows can be multiplied and distributed to any number of computing nodes. From a developer's perspective the platform is easy to extend and supports distributed development with packages. The paper focuses on big data processing in the batch and real-time processing mode. Big data analytics is provided through several algorithms, including novel ensemble techniques, implemented using the map-reduce paradigm and a special stream mining module for continuous parallel workflow execution. The batch mode and real-time processing mode are demonstrated with practical use cases. Performance analysis shows the benefit of using all available data for learning in distributed mode compared to using only subsets of data in non-distributed mode. The ability of ClowdFlows to handle big data sets and its nearly perfect linear speedup are demonstrated.

10.
Volunteer computing systems offer high computing power to the scientific communities to run large data-intensive scientific workflows. However, these computing environments provide only best-effort infrastructure for executing high-performance jobs. This work aims to schedule scientific and data-intensive workflows on a hybrid of volunteer computing and Cloud resources to enhance the utilization of these environments and increase the percentage of workflows that meet the deadline. The proposed workflow scheduling system partitions a workflow into sub-workflows to minimize data dependencies among the sub-workflows. These sub-workflows are then scheduled onto volunteer resources according to the proximity of resources and the load balancing policy. The execution time of each sub-workflow on the selected volunteer resources is estimated in this phase. If any sub-workflow misses its sub-deadline because of a long waiting time, it is considered for re-scheduling onto public Cloud resources. This re-scheduling improves system performance by increasing the percentage of workflows that meet the deadline. The proposed Cloud-aware data-intensive scheduling algorithm increases the percentage of workflows that meet the deadline by a factor of 75% on average with respect to the execution of workflows on the volunteer resources.
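
The re-scheduling rule in this abstract (move a sub-workflow to the Cloud when its estimated finish on volunteer resources misses the sub-deadline) can be sketched as follows. The data class, field names, and the simple "wait time plus run time" estimate are assumptions for illustration; the paper's partitioning, proximity, and load-balancing logic is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SubWorkflow:
    name: str
    wait_time: float      # estimated queueing time on volunteer resources
    run_time: float       # estimated execution time on volunteer resources
    sub_deadline: float   # time budget assigned to this sub-workflow


def assign_resources(subs: List[SubWorkflow]) -> Dict[str, str]:
    """Keep a sub-workflow on volunteer resources unless it would miss its sub-deadline."""
    plan: Dict[str, str] = {}
    for s in subs:
        estimated_finish = s.wait_time + s.run_time
        plan[s.name] = "volunteer" if estimated_finish <= s.sub_deadline else "cloud"
    return plan
```

For example, a sub-workflow with `wait_time=50`, `run_time=30`, and `sub_deadline=60` would be sent to the Cloud, while one that finishes within its budget stays on volunteer resources.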

11.
12.
13.
14.
Security is increasingly critical for scientific workflows that are big data applications and typically take a considerable amount of time to execute on large-scale distributed infrastructures. A cloud computing platform is one such infrastructure, and it enables dynamic resource scaling on demand. Nevertheless, under a pay-per-use, hourly pricing model, users should pay attention to the cost incurred by renting virtual machines (VMs) from cloud data centers. Meanwhile, workflow tasks are generally heterogeneous and require different instance series (i.e., computing optimized, memory optimized, storage optimized, etc.). In this paper, we propose a security and cost aware scheduling (SCAS) algorithm for heterogeneous tasks of scientific workflows in clouds. Our proposed algorithm is based on a meta-heuristic optimization technique, particle swarm optimization (PSO), the coding strategy of which is devised to minimize the total workflow execution cost while meeting the deadline and risk rate constraints. Extensive experiments using three real-world scientific workflow applications, as well as the CloudSim simulation framework, demonstrate the effectiveness and practicality of our algorithm.

15.
Large-scale applications can be expressed as a set of tasks with data dependencies between them, also known as application workflows. Due to the scale and data processing requirements of these applications, they require Grid computing and storage resources. So far, the focus has been on developing easy-to-use interfaces for composing these workflows and finding an optimal mapping of tasks in the workflow to the Grid resources in order to minimize the completion time of the application. After this mapping is done, a workflow execution engine is required to run the workflow over the mapped resources. In this paper, we show that the performance of the workflow execution engine in executing the workflow can also be a critical factor in determining the workflow completion time. Using Condor as the workflow execution engine, we examine the various factors that affect the completion time of a fine-granularity astronomy workflow. We show that changing the system parameters that influence these factors and restructuring the workflow can drastically reduce the completion time of this class of workflows. We also examine the effect of the optimizations developed for the astronomy application on a coarser-granularity biology application. We were able to reduce the completion time of the Montage and the Tomography application workflows by 90% and 50%, respectively.

16.
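Cloud computing provides a more efficient execution environment for large-scale scientific workflow applications. To address cost optimization in scientific workflow scheduling in cloud environments, a coevolution-based genetic algorithm for workflow scheduling, CGAA, is proposed. The algorithm introduces an adaptive penalty function into a strictly constrained genetic algorithm and, through coevolution, adaptively adjusts the crossover and mutation probabilities of individuals in the population, which accelerates convergence and prevents premature convergence of the population. Simulation experiments on four scientific workflows show that the schedules produced by CGAA outperform those of comparable algorithms in overall performance, that is, in meeting the workflow deadline constraint while reducing task execution cost.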

17.
The exploratory nature of a scientific computational experiment involves executing variations of the same workflow with different approaches, programs, and parameters. However, current approaches do not systematize the derivation process from the experiment definition to the concrete workflows and do not track the experiment provenance down to the workflow executions. Therefore, the composition, execution, and analysis for the entire experiment become a complex task. To address this issue, we propose the Algebraic Experiment Line (AEL). AEL uses a data-centric workflow algebra, which enriches the experiment representation by introducing a uniform data model and its corresponding operators. This representation and the AEL provenance model map concepts from the workflow execution data to the AEL derived workflows with their corresponding experiment abstract definitions. We show how AEL has improved the understanding of a real experiment in the bioinformatics area. By combining provenance data from the experiment and its corresponding executions, AEL provenance queries navigate from experiment concepts defined at high abstraction level to derived workflows and their execution data. It also shows a direct way of querying results from different trials involving activity variations and optionalities, only present at the experiment level of abstraction.

18.
End-to-end scientific application workflows that integrate high-end experiments and instruments with large scale simulations and end-user displays are becoming increasingly important. These workflows require complex couplings and data sharing between distributed components involving large data volumes and present varying hard (in-time data delivery) and soft (in-transit processing) quality of service (QoS) requirements. As a result, supporting efficient data transport is critical for such workflows. In this paper, we leverage software-defined networking (SDN) to address issues of data transport service control and resource provisioning to meet varying QoS requirements from multiple coupled workflows sharing the same service medium. Specifically, we present a flexible control and a disciplined resource scheduling approach for data transport services for science networks. Furthermore, we emulate an SDN testbed on top of the FutureGrid virtualized testbed and use it to evaluate our approach for a realistic scientific workflow. Our results show that SDN-based control and resource scheduling based on simple intuitive models can meet the requirements of the targeted workflows with high resource utilization.

19.
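Improving the execution efficiency of scientific workflows in cloud environments and reducing their execution cost have received wide attention. The local QoS constraints that users expect often conflict with the overall execution efficiency of the workflow. To address this, building on earlier work, a scientific workflow scheduling strategy that allows local time constraints to be violated is proposed. Applying backward-priority task merging to the clustered set of workflow tasks makes reasonable use of idle time slots between tasks and thereby optimizes the workflow's execution time. In addition, to make full use of task slack time and improve overall execution efficiency, the scheduling of some tasks is allowed to violate their local latest-finish-time constraints. Experimental results show that the strategy brings forward the earliest completion time of the scientific workflow, improves processor utilization, and ultimately reduces the workflow's execution cost.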

20.
Social workflows pervade people's everyday life. Whenever a group of people works together on a challenging or multifaceted task, a social workflow begins. Unlike traditional business workflows, such social workflows aim at supporting processes that contain personal tasks and data. In this work, we envision a social workflow service as part of a social network that enables private individuals to construct social workflows according to their specific needs and to keep track of the workflow execution. The proposed features for a social workflow service could help individuals to accomplish their private goals. The presented idea is contrasted with established research areas and applications to show the degree of novelty of this work. It is shown how novel ideas for knowledge management, facilitated by a process-oriented case-based reasoning approach, support private individuals and how they can obtain an appropriate social workflow through sharing and reuse of respective experience. Two empirical studies confirm the potential benefits of a social workflow service in general and the core features of the developed concept.

