Similar Documents
1.
李学俊  吴洋  刘晓  程慧敏  朱二周  杨耘 《软件学报》2016,27(7):1861-1875
Scientific workflows are complex, data-intensive applications. How to place their data effectively in a hybrid-cloud environment is a major problem for scientific workflows, and the security requirements of hybrid clouds pose new challenges for research on scientific cloud workflow data placement. Most traditional placement methods distribute datasets using a load-balancing partition model; this achieves well-balanced placements, but the resulting transfer time is not optimal. To address this shortcoming, and drawing on the characteristics of data placement in hybrid clouds, this work first designs a matrix partition model based on the degree of data-dependency destruction, which generates the partition that damages data dependencies the least; it then proposes a data-center-oriented placement method that, guided by the partition model, places strongly dependent datasets in the same data center whenever possible, reducing cross-data-center transfer time. Experimental results show that the method effectively shortens cross-data-center data transfer time during scientific workflow execution.
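The partition model itself is not given in the abstract; as a rough illustration of the idea, the Python sketch below (made-up tasks, a toy exhaustive two-way split, and names of our own invention, not the paper's algorithm) builds a dataset-dependency matrix and picks the split that severs the least dependency weight:

```python
from itertools import combinations

# Illustrative tasks, each consuming a set of datasets.
tasks = [{"d1", "d2"}, {"d2", "d3"}, {"d4", "d5"}, {"d3", "d4"}]
datasets = sorted(set().union(*tasks))

# Dependency matrix: dep[a][b] counts the tasks that use both a and b.
dep = {a: {b: 0 for b in datasets} for a in datasets}
for t in tasks:
    for a, b in combinations(sorted(t), 2):
        dep[a][b] += 1
        dep[b][a] += 1

def destruction(part_a, part_b):
    """Dependency weight severed by splitting the datasets across two centers."""
    return sum(dep[a][b] for a in part_a for b in part_b)

# Exhaustive two-way split for a tiny instance; the paper's matrix model
# scales this up, and a real placement also respects capacity limits.
best = min(
    ((set(s), set(datasets) - set(s))
     for r in range(1, len(datasets))
     for s in combinations(datasets, r)),
    key=lambda p: destruction(*p),
)
print("partition:", best, "severed weight:", destruction(*best))
```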

2.
Scientific workflow applications are complex, data-intensive applications, common in disciplines that involve distributed data sources, such as structural biology, high-energy physics, and neuroscience. Because the data are stored across Internet-based cloud computing platforms, executing a scientific workflow entails large volumes of data transfer. Cloud computing is a pay-per-use model, so data transfers incur transfer charges; when multiple workflows cooperate, the transfer cost grows even higher. From a global perspective, this paper builds a transfer cost model based on the data dependency graph of multiple workflows and studies a data placement optimization strategy based on binary particle swarm optimization (BPSO), reducing the rental fees for cloud transfer resources.
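The BPSO encoding is not specified in the abstract; the minimal sketch below assumes a one-bit-per-dataset placement (0/1 for one of two data centers) and a placeholder cost function, and shows the standard sigmoid-of-velocity binary PSO update such a strategy would build on:

```python
import math
import random

random.seed(1)
N_DATASETS, N_PARTICLES, ITERS = 8, 20, 100

def transfer_cost(bits):
    # Placeholder objective: adjacent datasets placed in different centers
    # cost one transfer each; a real model would use sizes and bandwidths.
    return sum(bits[i] != bits[i + 1] for i in range(len(bits) - 1))

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

pos = [[random.randint(0, 1) for _ in range(N_DATASETS)] for _ in range(N_PARTICLES)]
vel = [[0.0] * N_DATASETS for _ in range(N_PARTICLES)]
pbest = [p[:] for p in pos]
gbest = min(pos, key=transfer_cost)[:]

for _ in range(ITERS):
    for i in range(N_PARTICLES):
        for d in range(N_DATASETS):
            r1, r2 = random.random(), random.random()
            # Standard BPSO velocity update (inertia 0.7, c1 = c2 = 1.5).
            vel[i][d] = (0.7 * vel[i][d]
                         + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                         + 1.5 * r2 * (gbest[d] - pos[i][d]))
            # Sigmoid of the velocity is the probability that the bit is 1.
            pos[i][d] = 1 if random.random() < sigmoid(vel[i][d]) else 0
        if transfer_cost(pos[i]) < transfer_cost(pbest[i]):
            pbest[i] = pos[i][:]
    gbest = min(pbest, key=transfer_cost)[:]

print("best placement:", gbest, "cost:", transfer_cost(gbest))
```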

3.
王润平  陈旺虎  段菊 《计算机仿真》2015,32(3):421-425,437
In cloud computing environments, how a data-intensive scientific workflow's data files are placed across multiple data centers strongly affects execution efficiency. Based on the dependencies among the workflow's datasets, and focusing on the differences in processing capacity and network performance among the data centers running the workflow, this paper proposes a data placement strategy, together with a placement-aware task scheduling strategy, that improves workflow execution performance. Analysis and experiments show that these strategies effectively reduce cross-data-center data transfers at runtime and shorten workflow running time, thereby improving overall execution efficiency.

4.
刘漳辉  赵旭  林兵  陈星 《计算机科学》2021,48(11):199-207
In hybrid cloud environments, a sound data placement strategy is essential for the efficient execution of scientific workflows. Traditional placement strategies mostly assume a deterministic environment, but in real networks data transfer times are uncertain, owing to differing loads across data centers, bandwidth fluctuation, network congestion, and the characteristics of the machines themselves. To address this, based on fuzzy theory and with the objective of minimizing fuzzy data transfer time, this paper proposes a Fuzzy Adaptive Discrete Particle Swarm Optimization Algorithm Based on Genetic Algorithm Operators (FGA-DPSO) that places scientific workflow data sensibly while satisfying dataset privacy requirements and data center capacity constraints. Experimental results show that the algorithm effectively reduces the fuzzy data transfer time of scientific workflows in hybrid clouds.
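One common way to model fuzzy transfer time is with triangular fuzzy numbers; the sketch below assumes TFN addition and graded-mean defuzzification, which may differ from the paper's exact fuzzy model, and does not reproduce FGA-DPSO itself:

```python
from dataclasses import dataclass

@dataclass
class TFN:
    """Triangular fuzzy number (a, m, b): optimistic, modal, pessimistic."""
    a: float
    m: float
    b: float

    def __add__(self, other):
        return TFN(self.a + other.a, self.m + other.m, self.b + other.b)

    def defuzzify(self):
        # Graded-mean integration, one common ranking rule for TFNs.
        return (self.a + 4 * self.m + self.b) / 6

# Uncertain per-link transfer times for one candidate placement (made-up).
links = [TFN(2.0, 3.0, 5.0), TFN(1.0, 1.5, 4.0), TFN(0.5, 0.8, 1.2)]

total = links[0]
for t in links[1:]:
    total = total + t

print("fuzzy total:", total, "crisp estimate:", round(total.defuzzify(), 3))
```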

5.
Traditional data placement strategies for scientific workflows reduce data transfer time but cannot also keep the load balanced across data centers. This work therefore proposes a placement strategy based on multi-objective optimization. A placement plan is first generated for the fixed datasets; the multi-objective optimization algorithm KnEA then places the non-fixed datasets, yielding the global placement plan. Because knee points converge better than ordinary non-dominated individuals, and the algorithm balances multiple objectives against one another, KnEA obtains placements that do well on both data transfer time and load balance. Comparative experiments demonstrate the strategy's effectiveness.

6.
吴修国 《计算机科学》2014,41(10):154-159,190
Replica management is an important component of cloud storage systems and matters greatly for system reliability and performance. In general, the fewer the replicas in a cloud environment, the higher the transfer cost; too many replicas, however, raise the storage cost, which can drive the total cost up. From the perspective of reducing data management cost, this work studies minimum-cost replica management strategies that balance storage cost against transfer cost, covering a data management cost model, a test of whether creating a replica is necessary, and an approximately minimum-cost replica placement strategy. Experiments using the Amazon cloud platform's data management cost model show that the minimum-cost strategy meets user requirements such as response time while effectively reducing the data center's management cost, encouraging enterprises (users) to manage their data on cloud platforms and promoting the healthy development of the cloud computing ecosystem.
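The storage-versus-transfer trade-off can be illustrated with a toy cost model; all prices, the demand figure, and the locality assumption below are illustrative (not actual Amazon pricing). Replicas are added while the marginal transfer saving exceeds the marginal storage cost:

```python
# Toy trade-off: add replicas while the transfer saving from one more replica
# exceeds its storage cost. Prices and demand are made-up numbers.
SIZE_GB = 50.0
STORE_PER_GB = 0.023        # monthly storage price per GB (illustrative)
XFER_PER_GB = 0.09          # price per GB transferred (illustrative)
MONTHLY_READS_GB = 2000.0   # data read per month

def monthly_cost(replicas):
    storage = replicas * SIZE_GB * STORE_PER_GB
    # Crude locality model: a 1/replicas fraction of reads still cross centers.
    transfer = (MONTHLY_READS_GB / replicas) * XFER_PER_GB
    return storage + transfer

replicas = 1
while monthly_cost(replicas + 1) < monthly_cost(replicas):
    replicas += 1

print("cost-minimizing replica count:", replicas,
      "monthly cost:", round(monthly_cost(replicas), 2))
```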

7.
Data Placement and Task Scheduling Algorithms in Cloud Computing
Cloud computing over massive data often suffers from long data transfer times. Most existing data placement and task scheduling algorithms use a static replica count and an imprecise transfer metric. This paper proposes a data placement and task scheduling algorithm that adjusts the number of replicas dynamically and measures transfers by time. Based on access frequency and storage size, the algorithm adjusts replica counts on the fly, reducing both the storage wasted on rarely accessed replicas and the cross-node transfers needed for frequently accessed ones. Considering the differing network bandwidth between nodes, it adopts data transfer time as the transfer metric, improving the metric's precision. Experimental results show that, except when both the task set and the network are small, the algorithm effectively reduces data transfer time, by nearly 50% when the task set and the number of network nodes are large.
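A minimal sketch of the abstract's two ideas, under assumed scaling rules and made-up numbers: replica counts that track access frequency and size, and source selection ranked by transfer time rather than by transferred bytes:

```python
# Two ideas from the abstract, as toy functions: replica counts that track
# access frequency and size, and source selection ranked by transfer time.
# The scaling rule and all numbers are illustrative assumptions.

def replica_count(accesses_per_day, size_gb, min_r=1, max_r=5):
    # Popular, smaller data earns more replicas; the divisor is arbitrary.
    score = accesses_per_day / max(size_gb, 1.0)
    return max(min_r, min(max_r, round(score / 10)))

def transfer_time_s(size_gb, bandwidth_mbps):
    return size_gb * 8 * 1000 / bandwidth_mbps  # GB -> megabits, then Mb/s

# Pick the replica whose link to the consumer is fastest in *time*.
sources = {"nodeA": 200.0, "nodeB": 850.0}  # node -> bandwidth (Mbps)
size = 12.0
best = min(sources, key=lambda n: transfer_time_s(size, sources[n]))
print(replica_count(300, size), best,
      round(transfer_time_s(size, sources[best]), 1), "s")
```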

8.
When massive data are placed in data centers, each data item usually has multiple replicas, and the service provider pays a large electricity bill to run the servers storing them. Meanwhile, to keep the replicas consistent, replicas placed in different data centers must be synchronized over the inter-data-center network, incurring high transfer charges. This work models the data placement problem with the goal of minimizing the placement cost of multi-replica data and proposes DDDP, a placement algorithm based on partitioning data groups and data centers. Data are partitioned into groups, data centers are partitioned into subsets according to users' access latency requirements, and each data group is placed into the data center subset that satisfies its latency requirement at the lowest placement cost. Simulation results show that DDDP reduces the placement cost of storing data in data centers more effectively than the NPR algorithm.
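A toy version of the DDDP selection step, with illustrative centers, prices, and a simplified pairwise synchronization term (the paper's actual cost model is not reproduced):

```python
from itertools import combinations

# Toy DDDP step: restrict candidate centers to those meeting the data group's
# latency bound, then pick the replica subset with the lowest placement cost
# (storage plus a pairwise synchronization term). Values are illustrative.
centers = {  # name -> (user access latency in ms, monthly cost per replica)
    "dc1": (20, 9.0), "dc2": (35, 6.5), "dc3": (60, 4.0), "dc4": (80, 3.0),
}
SYNC_COST = 2.0  # cost per pair of replica-holding centers kept in sync

def place(latency_bound_ms, n_replicas):
    eligible = [c for c, (lat, _) in centers.items() if lat <= latency_bound_ms]
    return min(
        combinations(eligible, n_replicas),
        key=lambda sub: sum(centers[c][1] for c in sub)
        + SYNC_COST * len(sub) * (len(sub) - 1) / 2,
    )

print(place(latency_bound_ms=70, n_replicas=2))  # cheapest eligible pair
```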

9.
A Workflow-Oriented Data Placement Method for Cloud Computing Environments
Workflow management systems in cloud environments are well suited to cross-organizational business collaboration that needs high computing performance and large-scale storage, with broad application prospects in emergency management, supply chain management, healthcare, and other fields. However, with many concurrent workflow instances, when workflow tasks need data at different locations, especially client-side private data, storing that data effectively and optimizing its transfer become a challenge. The workflow-oriented data placement method proposed here places data on the cloud or the client according to the data's privacy requirements and dynamically adjusts the placement at runtime following the control flow, reducing data transfer. Experiments show that the method effectively reduces data transfer during workflow execution.

10.
范菁  沈杰  熊丽荣 《计算机科学》2015,42(Z11):400-405
Scheduling workflows that contain sensitive data in a hybrid cloud mainly involves assigning workflow tasks across the hybrid cloud subject to data security and workflow deadline constraints, mapping tasks to computing resources, and optimizing the scheduling cost. This work uses integer programming to model and solve the hybrid-cloud workflow scheduling problem with three kinds of constraints: data sensitivity, deadline, and scheduling cost. To speed up solving the model, candidate assignments of workflow tasks to the hybrid cloud are screened by the Pareto-optimality principle, shrinking the size of the model. Experiments show that ruling out unreasonable task assignments first effectively shrinks the integer program and shortens its solution time, obtaining near-optimal schedules at the price of a small error.
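The Pareto screening step can be sketched independently of the integer program; the task-to-resource options below are illustrative:

```python
# Pareto screening of task-to-resource options before the integer program:
# drop any option dominated in both cost and time by another option for the
# same task. The options below are illustrative.

def pareto_front(options):
    """Keep (resource, cost, time) options that are not strictly dominated."""
    def dominates(p, q):
        return p[1] <= q[1] and p[2] <= q[2] and (p[1] < q[1] or p[2] < q[2])
    return [q for q in options if not any(dominates(p, q) for p in options)]

task_options = [
    ("private_vm", 0.0, 40.0),     # free but slow
    ("public_small", 0.2, 25.0),
    ("public_large", 0.8, 10.0),
    ("public_medium", 0.9, 26.0),  # dominated by public_small -> pruned
]
print(pareto_front(task_options))  # three options survive
```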

11.
An important challenge for the adoption of cloud computing in the scientific community remains the efficient allocation and execution of data-intensive scientific workflows to reduce execution time and the size of transferred data. The transferred data overhead is becoming significant with emerging scientific workflows that have input/output files and intermediate data products ranging in the hundreds of gigabytes. The allocation of scientific workflows on public clouds can be described through a variety of perspectives and parameters, and has been proved to be NP-complete. This paper proposes an evolutionary approach for task allocation on public clouds considering data transfer and execution time. In our framework, a solution is represented using an allocation chromosome that encodes the allocation of tasks to nodes, and an ordering chromosome that defines the execution order according to the scientific workflow representation. We propose a multi-objective optimization that relies on a cloud cost model and employs tailored evolution operators. Starting from a population of possible solutions, we employ crossover and mutation operators on both chromosomes, aiming to optimize the data transferred between nodes as well as the total workflow runtime. The crossover operators combine parts of solutions to reduce data overhead, whereas the mutation operators swap parts of the same chromosome according to pre-defined rules. Our experimental study compares the proposed approach with current state-of-the-art approaches using synthetic and real-life workflows. Our algorithm performs similarly to existing heuristics for small workflows and shows up to 80% improvement for larger synthetic workflows. To further validate our approach, we compare the allocation and scheduling obtained by our approach with those obtained by popular scientific workflow managers when real workflows with hundreds of tasks are executed on a public cloud. The results show a 10% improvement in runtime over existing schedulers, caused by an 80% reduction in transferred data and optimized allocation and ordering of tasks. This improved data locality has greater impact, as it can be employed to improve and study data provenance and facilitate data persistence for scientific workflows.

12.
A growing number of data- and compute-intensive experiments have been modeled as scientific workflows in the last decade. Meanwhile, clouds have emerged as a prominent environment to execute this type of workflow. In this scenario, the investigation of workflow scheduling strategies aiming at reducing execution time became a top priority and a very popular research field. However, few works consider the problem of data file assignment when solving the task scheduling problem. Usually, a workflow is represented by a graph whose nodes represent tasks, and the scheduling problem consists in allocating tasks to machines to be executed at a predefined time, aiming to reduce the makespan of the whole workflow. In this article, we show that the scheduling of scientific workflows can be improved when both the task scheduling and the data file assignment problems are treated together. Thus, we propose a new workflow representation, where nodes of the workflow graph represent either tasks or data files, and define the Task Scheduling and Data Assignment Problem (TaSDAP) considering this new model. We formulated this problem as an integer programming problem. Moreover, a hybrid evolutionary algorithm for solving it, named HEA-TaSDAP, is also introduced. To evaluate our approach we conducted two types of experiments: theoretical and practical ones. First, we compared HEA-TaSDAP with the solutions produced by the mathematical formulation and by other works from the related literature. Then, we considered real executions in the Amazon EC2 cloud using a real scientific workflow use case (SciPhy, for phylogenetic analyses). In all experiments HEA-TaSDAP outperformed the classical approaches from the related literature, such as Min–Min and HEFT.

13.
Efficient data-aware methods in job scheduling, distributed storage management, and data management platforms are necessary for the successful execution of data-intensive applications. However, research on such methods for data-intensive scientific applications in large-scale distributed cloud and cluster computing environments is insufficient, and data-aware methods are becoming more complex. In this paper, we propose a Data-Locality Aware Workflow Scheduling (D-LAWS) technique and a locality-aware resource management method for data-intensive scientific workflows in HPC cloud environments. D-LAWS applies data locality and data transfer time based on network bandwidth to scientific workflow task scheduling and balances resource utilization and task parallelism at the node level. Our method consolidates VMs and considers task parallelism by data flow when planning the task executions of a data-intensive scientific workflow. We additionally consider more complex workflow models and data locality pertaining to the placement and transfer of data prior to task executions. We implement and validate the methods based on fairness in cloud environments. Experimental results show that the proposed methods can improve the performance and data locality of data-intensive workflows in cloud environments.

14.
Improving the execution efficiency and lowering the execution cost of scientific workflows in the cloud have received wide attention. The local QoS constraints that users expect often conflict with a workflow's overall execution efficiency. Addressing this, and building on our earlier work, this paper proposes a scientific workflow scheduling strategy that allows local time constraints to be violated. Applying backward-priority task merging to the clustered workflow task set puts the idle time slices between tasks to good use and thereby optimizes workflow execution time; moreover, to fully exploit task slack and raise overall execution efficiency, the schedule may violate the local latest-finish-time constraints of some tasks. Experimental results show that the strategy brings forward the earliest finish time of a scientific workflow, raises processor utilization, and ultimately lowers execution cost.
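A loose sketch of backward merging under relaxed local time constraints; the task data, the tolerance value, and the back-to-front packing rule are illustrative assumptions, not the paper's algorithm:

```python
# Back-to-front packing with a relaxed local time constraint: each task is
# pulled as late as its (tolerance-relaxed) latest-finish bound and the idle
# gap before its successor allow. Task data and tolerance are illustrative.

# (name, duration, latest allowed finish); listed in execution order.
tasks = [("t1", 4, 10.0), ("t2", 3, 14.0), ("t3", 5, 30.0)]
MAKESPAN_END = 30.0
TOLERANCE = 2.0  # a local latest-finish may be exceeded by this much

schedule = []
cursor = MAKESPAN_END
for name, dur, latest_finish in reversed(tasks):
    finish = min(cursor, latest_finish + TOLERANCE)
    schedule.append((name, finish - dur, finish))
    cursor = finish - dur

for name, start, finish in reversed(schedule):
    print(f"{name}: {start:.1f} -> {finish:.1f}")
```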

15.
Security is increasingly critical for various scientific workflows that are big data applications and typically take a considerable amount of time to execute on large-scale distributed infrastructures. The cloud computing platform is such an infrastructure, one that can enable dynamic resource scaling on demand. Nevertheless, under the pay-per-use, hourly-based pricing model, users should pay attention to the cost incurred by renting virtual machines (VMs) from cloud data centers. Meanwhile, workflow tasks are generally heterogeneous and require different instance series (i.e., computing optimized, memory optimized, storage optimized, etc.). In this paper, we propose a security- and cost-aware scheduling (SCAS) algorithm for the heterogeneous tasks of scientific workflows in clouds. Our proposed algorithm is based on a meta-heuristic optimization technique, particle swarm optimization (PSO), whose coding strategy is devised to minimize the total workflow execution cost while meeting the deadline and risk-rate constraints. Extensive experiments using three real-world scientific workflow applications and the CloudSim simulation framework demonstrate the effectiveness and practicality of our algorithm.

16.
A scientific workflow usually consists of a mix of fine- and coarse-grained computational tasks with varied runtime requirements. It has been observed that fine-grained tasks incur more scheduling overhead than their execution time when executed on widely distributed platforms. Task clustering is extensively used in such situations as a runtime optimization method: multiple short-duration tasks are combined into a cluster to be scheduled on a single resource, which minimizes the scheduling overhead of the fine-grained tasks. However, task grouping curtails the degree of parallelism and hence needs to be done optimally. Though a number of task clustering techniques have been developed to reduce the impact of system overheads, they fail to identify the appropriate number of clusters at each level of the workflow needed to achieve the maximum possible parallelism. This work proposes a level-based autonomic Workflow-and-Platform Aware (WPA) task clustering technique that takes into consideration both the workflow structure and the underlying resource set size. It aims to achieve the maximum possible parallelism among the tasks at a level of the workflow while minimizing system overheads and resource wastage. A comparative study with current state-of-the-art task clustering approaches on four well-known scientific workflows shows that the proposed method significantly reduces the overall workflow execution time and, at the same time, consolidates the load onto the minimum possible resources.
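A minimal sketch of a level-wise clustering rule in the WPA spirit, assuming the number of clusters per level is capped by the resource count (the exact WPA heuristic is not reproduced):

```python
from math import ceil

# Level-wise clustering: at each workflow level, form at most as many
# clusters as there are resources, so short tasks get grouped without
# losing the parallelism the platform can actually deliver.
def cluster_level(tasks, n_resources):
    n_clusters = min(len(tasks), n_resources)
    size = ceil(len(tasks) / n_clusters)
    return [tasks[i:i + size] for i in range(0, len(tasks), size)]

levels = {1: ["a1", "a2", "a3", "a4", "a5", "a6", "a7"], 2: ["b1", "b2"]}
N_RESOURCES = 3
for lvl, tasks in levels.items():
    print("level", lvl, "->", cluster_level(tasks, N_RESOURCES))
```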

17.
杜清华  张凯 《计算机工程》2022,48(7):13-21+28
To handle complex data analysis tasks, researchers have designed and built cross-platform data processing systems that combine multiple platforms. The choice of platform for each operator in a cross-platform workflow is critical to system performance, because the same operator implemented on different platforms can perform very differently. Platform selection is usually done with cost-based optimization, but existing cost models estimate cost inaccurately because they cannot mine the latent information in cross-platform workflows. This paper proposes an efficient cross-platform workflow optimization method that uses a GGFN model as the cost model, taking operator and workflow features as input. A graph attention mechanism captures the structure of the DAG-shaped cross-platform workflow and the information of each operator's neighboring nodes, while a gated recurrent unit remembers the operators' execution history, enabling accurate cost estimation. On this basis, an enumeration algorithm for operator platforms is designed around the characteristics of cross-platform workflows, combining the GGFN-based cost model with lazy greedy pruning to choose a suitable implementation platform for each operator. Experimental results show that the method improves the execution performance of cross-platform workflows threefold and cuts running time by more than 60%.

18.
Cloud computing, an important source of computing power for the scientific community, requires enhanced tools for efficient use of its resources. Current solutions for workflow execution lack frameworks that deeply analyze applications and consider realistic execution times as well as computation costs. In this study, we propose cloud user–provider affiliation (CUPA) to guide workflow owners in identifying the tools required to run their applications. Additionally, we develop PSO-DS, a specialized scheduling algorithm based on particle swarm optimization. CUPA encompasses the interaction of cloud resources, the workflow manager system, and the scheduling algorithm. Its featured scheduler, PSO-DS, is capable of converging to a strategic distribution of tasks among resources that efficiently optimizes makespan and monetary cost. We compared the performance of PSO-DS against four well-known scientific workflow schedulers. In a test bed based on VMware vSphere, the schedulers mapped five up-to-date benchmarks representing different scientific areas. PSO-DS proved its efficiency by reducing the makespan and monetary cost of the tested workflows by 75 and 78%, respectively, compared with the other algorithms. CUPA, with its featured PSO-DS, opens the path to a full system in which scientific cloud users can run their computationally expensive experiments.

