Similar Documents
20 similar documents retrieved (search time: 343 ms).
1.
Two key metrics measure the data-mining performance of big data workloads: timeliness and accuracy. As stream records travel from their source through a message queue and into Flink for computation, differences in network transmission speed and in the computing power of individual nodes cause the arrival order at the computing framework to be locally out of order with respect to event time. Under streams whose degree of disorder is uncertain, the traditional watermark mechanism for window jobs cannot guarantee both the timeliness and the accuracy of job results. To address this problem, a stream-data micro-cluster model is established. A local disorder-degree algorithm computes, from the local event-time disorder within the micro-clusters, a disorder degree that represents the current stream, and a dynamic watermark adjustment strategy is designed so that the watermark grows or shrinks according to this disorder degree. Finally, the event-time-window-based dynamic watermark adjustment strategy is implemented in Apache Flink. Experimental results show that, under elastic or uncertainly disordered streams, the strategy effectively balances the accuracy and timeliness of window jobs.
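The abstract does not give the formula that maps the disorder degree onto a watermark delay, so the following is only a minimal Python sketch of the general idea, not the paper's implementation: keep a sliding micro-batch of recent event timestamps, estimate a local disorder degree from the frequency and magnitude of out-of-order arrivals, and scale the watermark lag between a lower and upper bound accordingly. All names (`DynamicWatermark`, `disorder_degree`) and the scaling constants are illustrative assumptions.

```python
from collections import deque

class DynamicWatermark:
    """Illustrative sketch: adjust the watermark lag from a local disorder degree."""

    def __init__(self, window=100, min_lag_ms=200, max_lag_ms=10_000):
        self.recent = deque(maxlen=window)   # micro-cluster of recent event times
        self.max_event_time = 0
        self.min_lag_ms = min_lag_ms
        self.max_lag_ms = max_lag_ms
        self.lag_ms = min_lag_ms

    def disorder_degree(self):
        """Fraction of adjacent arrivals whose event times are out of order,
        weighted by how late they arrive relative to the preceding record."""
        if len(self.recent) < 2:
            return 0.0
        late = [max(0, self.recent[i - 1] - self.recent[i])
                for i in range(1, len(self.recent))]
        out_of_order = sum(1 for d in late if d > 0) / (len(self.recent) - 1)
        avg_lateness = sum(late) / max(1, len(late))
        return out_of_order * (1 + avg_lateness / 1000.0)

    def on_event(self, event_time_ms):
        self.recent.append(event_time_ms)
        self.max_event_time = max(self.max_event_time, event_time_ms)
        # Scale the lag between its bounds by the current disorder degree.
        d = min(1.0, self.disorder_degree())
        self.lag_ms = int(self.min_lag_ms + d * (self.max_lag_ms - self.min_lag_ms))

    def current_watermark(self):
        # The watermark trails the max event time by the adaptive lag.
        return self.max_event_time - self.lag_ms
```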

2.
The default task scheduling strategy of the Flink stream processing system largely ignores cluster heterogeneity and the resources actually available on each node, leaving the overall cluster load unbalanced. This work studies the real-time performance of distributed nodes and the cluster job environment and, based on the heterogeneous distribution of the actual environment, designs a node-priority adjustment method for heterogeneous Flink clusters: using cluster metrics from the Ganglia scalable distributed monitoring system, it dynamically adjusts a node priority index that reflects the current job environment. On this basis, a dynamic adaptive scheduling strategy for Flink is proposed: it monitors node heterogeneity in real time, updates the node priority index during task execution according to the current job environment, and assigns each task to the best execution node. Experimental results show that, compared with Flink's default task scheduling strategy, the adaptive strategy based on node-priority adjustment reduces the running time of the WordCount benchmark by about 6% on average, and lets a heterogeneous Flink cluster achieve higher node resource utilization and task execution efficiency while keeping latency low.
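The paper's exact priority formula is not given in the abstract, so the sketch below only illustrates the general shape of the approach: combine per-node monitoring metrics (of the kind Ganglia reports) into a single priority index and place the next task on the highest-scoring node. The metric names, weights, and helper functions are assumptions for illustration.

```python
def node_priority_index(metrics, weights=None):
    """Illustrative priority index: a weighted sum of spare-capacity metrics.
    The metric names and weights are assumptions, not the paper's formula."""
    weights = weights or {"cpu_idle": 0.4, "mem_free_ratio": 0.3,
                          "load_one_inv": 0.2, "net_free_ratio": 0.1}
    return sum(weights[k] * metrics[k] for k in weights)

def pick_execution_node(nodes):
    """Assign the next task to the node with the highest priority index."""
    return max(nodes, key=lambda n: node_priority_index(n["metrics"]))

# Example: two heterogeneous workers with different spare capacity.
nodes = [
    {"host": "worker-1", "metrics": {"cpu_idle": 0.20, "mem_free_ratio": 0.30,
                                     "load_one_inv": 0.25, "net_free_ratio": 0.60}},
    {"host": "worker-2", "metrics": {"cpu_idle": 0.75, "mem_free_ratio": 0.55,
                                     "load_one_inv": 0.80, "net_free_ratio": 0.70}},
]
print(pick_execution_node(nodes)["host"])   # -> worker-2
```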

3.
Hadoop has become a fundamental platform for cloud computing research, and MapReduce is its computational model for distributed big-data processing. To address data distribution, data locality, and job execution flow in heterogeneous clusters, a DAG-based MapReduce scheduling algorithm is proposed. Cluster nodes are partitioned by computing capability, MapReduce jobs are converted into a DAG model, and the upward-rank computation is improved so that ranks are more accurate and task priorities are ordered more reasonably in a heterogeneous cluster. Taking node computing capability, data locality, and cluster utilization into account, the algorithm selects suitable data nodes for task placement and execution, reducing the completion time of the current task. Experiments show that the algorithm distributes data reasonably, effectively improves data locality, reduces communication overhead, and shortens the schedule length of the whole job set, thereby improving cluster utilization.
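The abstract does not specify how the upward-rank computation is improved, so the sketch below shows only the textbook (HEFT-style) upward-rank baseline over a task DAG: a task's rank is its average cost plus the maximum, over its successors, of communication cost plus the successor's rank, and tasks are prioritized by descending rank. The example DAG and cost tables are made up for illustration.

```python
from functools import lru_cache

def upward_ranks(tasks, succ, avg_cost, avg_comm):
    """Classic upward-rank priority list for a task DAG:
    rank_u(t) = avg_cost(t) + max_{s in succ(t)} (avg_comm(t, s) + rank_u(s))."""

    @lru_cache(maxsize=None)
    def rank(t):
        succs = succ.get(t, [])
        if not succs:
            return avg_cost[t]
        return avg_cost[t] + max(avg_comm[(t, s)] + rank(s) for s in succs)

    # Higher rank => earlier in the scheduling priority list.
    return sorted(tasks, key=rank, reverse=True)

# Tiny example DAG: t1 -> t2, t1 -> t3, t2 -> t4, t3 -> t4
tasks = ["t1", "t2", "t3", "t4"]
succ = {"t1": ["t2", "t3"], "t2": ["t4"], "t3": ["t4"]}
avg_cost = {"t1": 10, "t2": 8, "t3": 12, "t4": 6}
avg_comm = {("t1", "t2"): 2, ("t1", "t3"): 3, ("t2", "t4"): 1, ("t3", "t4"): 2}
print(upward_ranks(tasks, succ, avg_cost, avg_comm))  # ['t1', 't3', 't2', 't4']
```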

4.
To address the fixed task-progress ratio and passive straggler selection of speculative execution algorithms on heterogeneous clusters, an adaptive task scheduling algorithm based on the computing-capability gap between fast and slow node sets is proposed. The algorithm quantifies the capability difference between node sets so that they are scheduled separately, and updates the fast and slow node sets in time through dynamic feedback of node and task rates, improving node-set resource utilization and task parallelism. Within the two node sets, stragglers are identified by a dynamically adjusted task-progress ratio, and a fast node is actively chosen to run a backup task for each straggler as a substitute execution, improving task execution efficiency. Compared with the Longest Approximate Time to End (LATE) algorithm, the proposed algorithm shortens the execution time of a short-job set, a mixed job set, and a mixed job set with degraded node performance by 5.21%, 20.51%, and 23.86% respectively, while launching noticeably fewer backup tasks. The algorithm lets tasks adapt actively to node differences and effectively improves overall job execution efficiency while reducing backup tasks.
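The paper's exact rule for adjusting the progress ratio is not given in the abstract; the following is a loose Python sketch of the idea of a dynamically adjusted straggler threshold: flag a task as a straggler when its progress rate falls below a threshold derived from the mean rate, with the tolerance widened when rates are widely dispersed across nodes. The constants and the dispersion rule are assumptions.

```python
import statistics

def find_stragglers(tasks, base_ratio=0.2, spread_weight=0.5):
    """Illustrative straggler detection with a dynamically adjusted threshold.
    `tasks` maps task id -> observed progress rate (progress per second)."""
    rates = list(tasks.values())
    mean = statistics.mean(rates)
    spread = statistics.pstdev(rates) / mean if mean else 0.0
    # Dynamic ratio: more dispersion among nodes -> a more tolerant threshold.
    ratio = base_ratio * (1 + spread_weight * spread)
    threshold = mean * (1 - ratio)
    return [t for t, r in tasks.items() if r < threshold]

tasks = {"map-1": 0.95, "map-2": 1.05, "map-3": 0.40, "map-4": 1.00}
print(find_stragglers(tasks))   # map-3 is flagged for a backup task on a fast node
```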

5.
刘仲, 周兴铭. Chinese Journal of Computers (《计算机学报》), 2006, 29(10): 1757-1763
A scalable distributed dynamic interval mapping algorithm supporting weighted data distribution is proposed. When storage nodes change, the algorithm immediately rebalances the distribution of data objects according to the available resources, migrating data objects in parallel from all storage nodes while keeping the number of migrated objects minimal. On this basis, a distributed node-address computation algorithm is proposed: compute nodes learn autonomously through a view-correction algorithm and automatically adapt to the new system scale, eliminating the performance bottleneck of the existing centralized access path and giving the system high scalability.

6.
On the Hadoop platform, the default FIFO scheduler, the Capacity Scheduler, and the Fair Scheduler all follow a strict queue order during scheduling, so some tasks are scheduled onto nodes that do not satisfy data locality. To address this, a locality-based scheduling algorithm, delay scheduling, is proposed. While preserving fairness, when a scheduled job cannot launch a local task, the algorithm lets that task wait for a short time and schedules other jobs first. Experimental results show that this algorithm shortens the average job response time, effectively increases cluster throughput, and improves cluster resource utilization.
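Since the abstract describes the core delay-scheduling rule directly, a minimal Python sketch of it follows: scan jobs in fair-share order, skip a job that cannot launch a data-local task on the free node for a bounded number of scheduling opportunities, and fall back to a non-local launch once it has waited long enough. The toy `Job` model, its methods, and the skip limit are assumptions, not Hadoop's actual scheduler code.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    """Toy job model: pending tasks, each tagged with the nodes holding its data."""
    name: str
    tasks: list = field(default_factory=list)   # [(task_id, {nodes with local data})]
    skips: int = 0

    def local_task_for(self, node):
        for tid, nodes in self.tasks:
            if node in nodes:
                return tid
        return None

    def any_task(self):
        return self.tasks[0][0] if self.tasks else None

def delay_schedule(jobs, free_node, max_skips=3):
    """Delay scheduling sketch: prefer data-local launches; a job that cannot
    launch locally is skipped for a few scheduling opportunities."""
    for job in jobs:                             # jobs already in fair-share order
        task = job.local_task_for(free_node)
        if task is not None:
            job.skips = 0
            return job.name, task                # data-local launch
        if job.skips >= max_skips:
            job.skips = 0
            return job.name, job.any_task()      # waited long enough: run non-locally
        job.skips += 1                           # let this job wait a little longer
    return None, None

jobs = [Job("A", [("a1", {"node2"})]), Job("B", [("b1", {"node1"})])]
print(delay_schedule(jobs, "node1"))   # ('B', 'b1'): A is skipped, B runs locally
```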

7.
To remove the performance bottleneck of high-speed network traffic classification systems, a load-balancing algorithm for parallel traffic classification is proposed. The algorithm consists of two parts, static pre-allocation and dynamic adaptive adjustment: load is statically pre-allocated using a hash flow table, and the hash flow table is remapped according to dynamic feedback from the processing nodes. Experiments comparing the algorithm with a static hash algorithm and the SHI algorithm show that it achieves better load balance, a lower packet loss rate, and a lower flow remapping rate, meeting the load-balancing requirements of parallel traffic classification systems.
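A minimal Python sketch of the two-part scheme follows, under stated assumptions: flows are hashed into a bucket table that is statically pre-assigned to processing nodes, and when node feedback reports imbalance, buckets are remapped from the hottest to the coldest node. The bucket count, overload tolerance, and class name are assumptions, not the paper's parameters.

```python
import hashlib

class FlowBalancer:
    """Sketch of static hash pre-allocation plus feedback-driven remapping."""

    def __init__(self, nodes, buckets=1024):
        self.nodes = list(nodes)
        # Static pre-allocation: buckets assigned round-robin to nodes.
        self.table = [self.nodes[i % len(self.nodes)] for i in range(buckets)]

    def _bucket(self, flow_tuple):
        digest = hashlib.md5(repr(flow_tuple).encode()).hexdigest()
        return int(digest, 16) % len(self.table)

    def node_for(self, flow_tuple):
        return self.table[self._bucket(flow_tuple)]

    def rebalance(self, load):
        """`load` maps node -> reported utilization (0..1). Move one bucket at a
        time from the most to the least loaded node while the gap is large."""
        hot = max(self.nodes, key=lambda n: load[n])
        cold = min(self.nodes, key=lambda n: load[n])
        if load[hot] - load[cold] < 0.2:          # tolerance threshold (assumption)
            return
        for i, owner in enumerate(self.table):
            if owner == hot:
                self.table[i] = cold              # remap one bucket's flows
                break

balancer = FlowBalancer(["n1", "n2", "n3"])
print(balancer.node_for(("10.0.0.1", "10.0.0.2", 443, 51234, "tcp")))
balancer.rebalance({"n1": 0.9, "n2": 0.4, "n3": 0.5})
```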

8.
Traditional classic job scheduling algorithms are simple to implement and efficient to execute in cluster applications, but in heterogeneous clusters they lack dynamic feedback on the runtime state of online nodes and lack load-balancing capability, which lowers computing resource utilization and system throughput. To solve this, a job load-balancing scheduling algorithm based on host performance measurement (HPM) is designed for heterogeneous clusters. The algorithm collects the state information of online nodes and job response times to select a set of trusted nodes, computes the HPM value of each trusted node, generates a candidate set of job-assignment nodes using load-balancing rules, assigns jobs to compute nodes according to pre-defined priority rules, and finally updates the runtime state of each node. Experimental results show that when scheduling jobs of the same type in a heterogeneous cluster, the algorithm outperforms the traditional classic algorithms in total completion time and load-balancing performance.

9.
This work implements a dynamic merge algorithm, aimed mainly at real-time cross-node, cross-table paginated queries over distributed structured data. In a distributed database, tables are split into sub-tables stored on multiple data nodes, so both single-table and multi-table queries require merging of data; this algorithm handles the merging of the intermediate data. It adopts two-way merging as the merge strategy, which guarantees a high degree of node concurrency and balances the merge workload across the compute nodes. The merge process is dynamic rather than fixing the node pairings at the start of the task, which makes the algorithm adaptive and avoids the data waiting that a pre-determined merge plan may cause. Experimental results show that as the number of participating nodes grows, the algorithm clearly outperforms single-node merging and multi-node merging with a pre-set merge plan.
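A small Python sketch of the dynamic two-way merge idea follows: instead of fixing a merge tree up front, repeatedly pick two partitions that are currently ready and merge them pairwise until one fully ordered result remains. Readiness is simulated here by partition length; in the real system it would come from node completion signals, so that detail is an assumption.

```python
import heapq

def two_way_merge(left, right):
    """Merge two sorted partitions (heapq.merge performs a streaming 2-way merge)."""
    return list(heapq.merge(left, right))

def dynamic_merge(partitions):
    """Dynamic pairing sketch: no pre-assigned partners; the two partitions that
    are 'ready' next are merged, so no node waits on a fixed counterpart."""
    ready = sorted(partitions, key=len)
    while len(ready) > 1:
        a = ready.pop(0)
        b = ready.pop(0)
        merged = two_way_merge(a, b)          # would run on one compute node
        ready.append(merged)
        ready.sort(key=len)                   # keep the queue ordered by readiness
    return ready[0] if ready else []

parts = [[1, 5, 9], [2, 3], [4, 8, 10, 12], [6, 7, 11]]
print(dynamic_merge(parts))   # fully ordered result for a paginated query
```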

10.
To address the low node coverage of hybrid wireless sensor networks, an iterative node-coverage optimization algorithm based on improved particle swarm optimization (PSO) is proposed. The node-coverage model of the hybrid network is first converted into the problem of dynamically searching for the node deployment that maximizes the coverage rate. Improved PSO then maps candidate deployment schemes to particles and their weights; particle weights are adjusted according to the particle-distance clustering degree and particle information entropy, and the local and global best values are updated according to particle fitness. Finally, particle positions and velocities are computed iteratively, and the deployment with the best coverage rate is output. Simulation results show that the algorithm effectively improves network coverage and converges quickly.
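For reference, the sketch below shows a plain PSO loop searching for a sensor deployment that maximizes a grid-sampled coverage rate; the paper's refinements (entropy- and clustering-based weight adjustment) are deliberately omitted, and the field size, sensing radius, and PSO constants are assumptions.

```python
import random

def coverage(positions, radius=10.0, field=100.0, grid=20):
    """Coverage rate: fraction of grid sample points within `radius` of a sensor."""
    step = field / grid
    samples = [(i * step, j * step) for i in range(grid) for j in range(grid)]
    covered = sum(
        1 for (x, y) in samples
        if any((x - sx) ** 2 + (y - sy) ** 2 <= radius ** 2 for sx, sy in positions)
    )
    return covered / len(samples)

def pso_deploy(n_sensors=10, particles=20, iters=50, field=100.0,
               w=0.7, c1=1.5, c2=1.5):
    """Standard PSO velocity/position update searching for maximum coverage."""
    dim = n_sensors * 2
    X = [[random.uniform(0, field) for _ in range(dim)] for _ in range(particles)]
    V = [[0.0] * dim for _ in range(particles)]
    pbest, pbest_fit = [x[:] for x in X], [0.0] * particles
    gbest, gbest_fit = X[0][:], 0.0

    def fit(x):
        return coverage(list(zip(x[0::2], x[1::2])), field=field)

    for _ in range(iters):
        for i in range(particles):
            f = fit(X[i])
            if f > pbest_fit[i]:
                pbest_fit[i], pbest[i] = f, X[i][:]
            if f > gbest_fit:
                gbest_fit, gbest = f, X[i][:]
        for i in range(particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (pbest[i][d] - X[i][d])
                           + c2 * r2 * (gbest[d] - X[i][d]))
                X[i][d] = min(field, max(0.0, X[i][d] + V[i][d]))
    return gbest, gbest_fit

best, rate = pso_deploy()
print(f"best coverage rate: {rate:.2f}")
```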

11.
A Grid Scheduling Method Based on a Hierarchical Scheduling Strategy and Dynamic Data Replication
To schedule tasks and replicate data effectively in a grid and thereby reduce task execution time, a task scheduling algorithm (ISS) and an optimized dynamic data replication algorithm (ODHRA) are proposed and combined into a single scheme. ISS considers the number of tasks waiting in the queue, the location of the data a task requires, and the computing capacity of each site; using hierarchical scheduling over the network structure and appropriate weight coefficients, it computes a combined task cost and finds the best region of compute nodes. ODHRA analyzes data transfer time, storage access latency, replica requests waiting in the storage queue, and the distance between nodes to select the best replica location among many replicas, and combines this with replica placement and replica management to reduce file access time. Simulation results show that the proposed scheme outperforms other algorithms in average task execution time.

12.
朱洁, 李雯睿, 赵红, 李滢. Journal of Computer Applications (《计算机应用》), 2015, 35(12): 3383-3386
To address the low execution efficiency of jobs with a high resource share under current hierarchical-queue job scheduling algorithms, a resource-matching maximum-set algorithm is proposed. The algorithm analyzes job characteristics and introduces completion degree, waiting time, priority, and rescheduling count as urgency factors, giving precedence to jobs with a high resource share or a long waiting time to improve fairness. A dual-queue structure is used to select high-urgency jobs within the total available resources, and among job sets with different resource shares the set containing the most jobs is chosen, achieving scheduling balance. A case comparison with the max-min fairness algorithm shows that the algorithm lowers the average waiting time of the job set and improves resource utilization. Experimental results show that the algorithm shortens the execution time of single-type job sets with different resource shares by 18.73%, with a 27.26% reduction for jobs with a high resource share; for mixed job sets the corresponding reductions are 22.36% and 30.28%. The proposed algorithm effectively reduces the waiting of jobs with a high resource share and improves overall job execution efficiency.
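The abstract names the urgency factors but not how they are combined, so the following is only a hedged Python sketch: fold the four factors into a single urgency score with assumed weights and normalizers, then admit high-urgency jobs while their combined demand fits the available resources. The field names, weights, and the `pick_jobs` helper are all illustrative assumptions.

```python
def urgency(job, w_completion=0.3, w_wait=0.3, w_priority=0.2, w_resched=0.2):
    """Illustrative urgency score built from the four factors in the abstract
    (completion degree, waiting time, priority, rescheduling count)."""
    return (w_completion * job["completion"]               # 0..1
            + w_wait * min(1.0, job["wait_s"] / 600.0)     # saturate at 10 min
            + w_priority * job["priority"] / 10.0          # priority scale 0..10
            + w_resched * min(1.0, job["rescheduled"] / 5.0))

def pick_jobs(jobs, available):
    """Admit high-urgency jobs whose combined demand fits the available resources."""
    chosen, used = [], 0
    for job in sorted(jobs, key=urgency, reverse=True):
        if used + job["demand"] <= available:
            chosen.append(job["name"])
            used += job["demand"]
    return chosen

jobs = [
    {"name": "big",   "completion": 0.7, "wait_s": 900, "priority": 8, "rescheduled": 2, "demand": 6},
    {"name": "small", "completion": 0.1, "wait_s": 60,  "priority": 3, "rescheduled": 0, "demand": 2},
]
print(pick_jobs(jobs, available=8))   # the long-waiting, high-share job goes first
```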

13.

In recent years, various studies on OpenStack-based high-performance computing have been conducted. OpenStack combines off-the-shelf physical computing devices into a pool of logical computing resources. Configuring this logical resource pool provides computing infrastructure according to the user's request and can be applied to infrastructure as a service (IaaS), a cloud computing service model. OpenStack-based cloud computing can provide various computing services to users through virtual machines (VMs). However, intensive computing service requests from a large number of users during large-scale computing jobs may delay job execution. Moreover, VM resources may sit idle and be wasted if users do not employ the cloud computing resources. To resolve the job delay and the waste of computing resources, a variety of studies are required, including computing task allocation, job scheduling, utilization of idle VM resources, and improvement of overall job execution speed as computing service requests increase. Thus, this paper proposes efficient job management of computing services (EJM-CS), by which idle VM resources in OpenStack are utilized and users' computing services are processed in a distributed manner. EJM-CS logically integrates idle VM resources with different performance levels for computing services, reducing resource waste, and it considers multiple computing services rather than a single service. It determines the job execution order from workloads and waiting times according to the job priority of the computing service requester and the computing service type, thereby improving overall job execution performance when computing service requests increase.


14.
Data Grid integrates geographically distributed resources for solving data-intensive scientific applications. Effective scheduling in a Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. Scheduling is a traditional problem in parallel and distributed systems; however, due to the special issues and goals of the Grid, traditional approaches are no longer effective in this environment, so methods specialized for this kind of parallel and distributed system are necessary. Another solution is to use a data replication strategy to create multiple copies of files and store them in convenient locations to shorten file access times. To utilize these two concepts, this paper develops a job scheduling policy, called hierarchical job scheduling strategy (HJSS), and a dynamic data replication strategy, called advanced dynamic hierarchical replication strategy (ADHRS), to improve data access efficiency in a hierarchical Data Grid. HJSS uses hierarchical scheduling to reduce the search time for an appropriate computing node; it considers network characteristics, the number of jobs waiting in the queue, file locations, and the disk read speed of the storage drives at data sources. Moreover, because storage capacity is limited, a good replica replacement algorithm is needed. A novel replacement strategy is presented that deletes files in two steps when free space is not enough for a new replica: first, it deletes the files with the minimum transfer time; second, if space is still insufficient, it considers the time the replica was last requested, the number of accesses, the replica size, and the file transfer time. Simulation results show that the proposed algorithm performs better than other algorithms in terms of job execution time, number of intercommunications, number of replications, hit ratio, computing resource usage, and storage usage.
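The abstract describes the two-step replacement only in outline, so the Python sketch below is one possible reading, not the paper's algorithm: step 1 evicts replicas that are cheap to re-fetch (below an assumed transfer-time threshold); step 2, if space is still short, ranks the rest by a combined score of recency, access count, size, and transfer time. The score, the threshold, and the field names are assumptions.

```python
import time

def free_space_for(new_size, replicas, free_space, cheap_threshold=5.0):
    """Two-step replica replacement sketch. `replicas` is a list of dicts with
    keys: size, transfer_time, last_request, access_count. Returns the victims."""
    victims = []

    # Step 1: replicas that are cheap to re-transfer are evicted first.
    for r in sorted(replicas, key=lambda r: r["transfer_time"]):
        if free_space >= new_size or r["transfer_time"] > cheap_threshold:
            break
        free_space += r["size"]
        victims.append(r)

    # Step 2: combined score of recency, popularity, size, and transfer cost.
    if free_space < new_size:
        remaining = [r for r in replicas if r not in victims]

        def score(r):
            age = time.time() - r["last_request"]
            return age / (1 + r["access_count"]) * r["size"] / (1 + r["transfer_time"])

        for r in sorted(remaining, key=score, reverse=True):
            if free_space >= new_size:
                break
            free_space += r["size"]
            victims.append(r)

    return victims
```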

15.
朱洁, 赵红, 李雯睿. Journal of Computer Applications (《计算机应用》), 2014, 34(11): 3227-3230
Single-queue job scheduling in a Hadoop cluster causes short jobs to wait and leaves resources underutilized; multi-queue scheduling balances fairness against execution efficiency, but introduces problems such as manual parameter configuration, resource contention between queues, and algorithmic complexity. To address these problems, a three-queue job scheduling algorithm is proposed. By distinguishing job types, dynamically adjusting job priorities, configuring a shared resource pool, and allowing job preemption, it balances job demands, simplifies the scheduling flow for ordinary jobs, and improves parallel execution. The algorithm was compared with first-in-first-out (FIFO) scheduling under three workloads: a high proportion of short jobs, a balanced mix of job types, and mostly ordinary jobs with occasional long and short jobs; in all three cases the three-queue algorithm ran faster than FIFO. The results show that when short jobs dominate, the improvement is modest, but when different job types coexist with a balanced distribution the gain is significant, which matches the design intent of prioritizing short jobs, simplifying the flow for ordinary jobs, and still accommodating long jobs, and improves overall job execution efficiency.

16.
In a mobile edge computing (MEC) system, a terminal device with limited local computing power and battery energy can decide whether to offload delay-sensitive tasks to an edge node for execution. To handle randomly generated user tasks and dynamically changing system resources during offloading, an asynchronous reward deep deterministic policy gradient (ARDDPG) algorithm is proposed. Unlike the traditional strategy of allocating resources to independent tasks sequentially, the algorithm performs resource allocation in the time slot in which a task is generated, without waiting for the previous task to finish, and obtains the task's computation reward asynchronously. Under latency constraints, ARDDPG jointly optimizes the offloading decision, dynamic bandwidth allocation, and computing resource allocation, and trains a neural network with deep deterministic policy gradient to explore the best achievable performance. Simulation results show that, compared with a random policy, a baseline policy, and DQN, ARDDPG effectively reduces the task drop rate and the system's latency and energy consumption under different latency constraints and task generation rates.

17.
Science is increasingly becoming data-driven. The ability of a geographically distributed community of scientists to access and analyze large amounts of data has emerged as a significant requirement for furthering science. In a data-intensive computing environment with a vast number of nodes, resources are inevitably unreliable, which has a great effect on task execution and scheduling. Novel algorithms are needed to schedule jobs on trusted nodes, ensure fast communication, reduce job execution time, lower the rate of failed executions, and improve the security of the execution environment for important data. In this paper, a trust-mechanism-based task scheduling model is presented. Borrowing from the trust relationships among people in social settings, a trust relationship is built among computing nodes, and the trustworthiness of nodes is evaluated using a Bayesian cognitive method. By integrating node trustworthiness into the Dynamic Level Scheduling (DLS) algorithm, the Trust-Dynamic Level Scheduling (Trust-DLS) algorithm is proposed. Moreover, a benchmark is constructed to span a range of data-intensive computing characteristics for evaluating the proposed method. Theoretical analysis and simulations show that the Trust-DLS algorithm can efficiently meet the trust requirements of data-intensive workloads at a small additional time cost, and ensures that tasks execute securely in large-scale data-intensive computing environments.
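The paper's Bayesian model is not detailed in the abstract, so the following is a generic Python sketch, assuming a Beta-Bernoulli trust update (posterior mean as the trust value) and a simple blend of estimated finish time and distrust when choosing a node, loosely mirroring how Trust-DLS folds trustworthiness into dynamic level scheduling. The blend, weights, and function names are assumptions.

```python
def update_trust(alpha, beta, succeeded):
    """Beta-Bernoulli trust update: successes increment alpha, failures beta;
    the trust value is the posterior mean alpha / (alpha + beta)."""
    if succeeded:
        alpha += 1
    else:
        beta += 1
    return alpha, beta, alpha / (alpha + beta)

def trust_weighted_choice(nodes, est_finish, trust, trust_weight=0.5):
    """Pick the node minimizing a blend of normalized finish time and distrust."""
    worst = max(est_finish.values())

    def cost(n):
        return (1 - trust_weight) * est_finish[n] / worst + trust_weight * (1 - trust[n])

    return min(nodes, key=cost)

# A node starts from a uniform prior Beta(1, 1) and earns trust over interactions.
a, b, t = 1, 1, 0.5
for ok in (True, True, False, True):
    a, b, t = update_trust(a, b, ok)
print(round(t, 2))                      # posterior mean after 4 interactions -> 0.67

print(trust_weighted_choice(["n1", "n2"],
                            {"n1": 120, "n2": 150},
                            {"n1": 0.55, "n2": 0.90}))
```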

18.
A Partial-Order-Based Scheduling Algorithm for Timed Submission of Dependent Network Jobs
The partial-order-based scheduling algorithm for timed submission of dependent network jobs has been applied in a large job management system. Its basic idea is to find an optimal execution sequence for dependent network jobs submitted at scheduled times, so as to reduce the waiting time of interdependent jobs during execution. The algorithm first sorts the network jobs requested for simultaneous submission according to their partial-order dependency relation, forming a dependent job group; it then derives a rank value for each network job and finally produces an optimal submission sequence, greatly reducing the execution time of dependent jobs. Experience with the production system shows that the algorithm is highly effective for the fast execution of dependent network jobs submitted at scheduled times in a job management system.
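The abstract's "rank value" of a job is naturally read as its depth in the dependency partial order; the Python sketch below computes such ranks with a topological pass and orders jobs rank by rank so no job is submitted before its predecessors. How the paper actually defines the rank value is not given, so this is only an interpretation, and the example jobs are made up.

```python
from collections import defaultdict, deque

def job_ranks(jobs, depends_on):
    """Order dependent jobs by a partial order: each job's rank is the length of
    the longest dependency chain leading to it; lower ranks are submitted first."""
    indeg = {j: 0 for j in jobs}
    succ = defaultdict(list)
    for job, preds in depends_on.items():
        for p in preds:
            succ[p].append(job)
            indeg[job] += 1

    rank = {j: 0 for j in jobs}
    queue = deque(j for j in jobs if indeg[j] == 0)
    while queue:
        j = queue.popleft()
        for s in succ[j]:
            rank[s] = max(rank[s], rank[j] + 1)
            indeg[s] -= 1
            if indeg[s] == 0:
                queue.append(s)

    # Submission sequence: lower rank first, ties kept in input order.
    return sorted(jobs, key=lambda j: rank[j])

jobs = ["extract", "clean", "load", "report"]
depends_on = {"clean": ["extract"], "load": ["clean"], "report": ["load", "clean"]}
print(job_ranks(jobs, depends_on))   # ['extract', 'clean', 'load', 'report']
```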

19.
We consider the problem of scheduling jobs arriving over time in a multiprocessor setting, with immediate dispatching and no job migration. The goal is to minimize both the total flow time (total time in the system) and the total completion time. Previous studies have shown that while preemption (interrupting a job and later continuing its execution) is essential to make a scheduling algorithm efficient, migration (continuing the execution on a different machine) is not. Still, current non-migratory online algorithms rely on a central queue of unassigned jobs, which is not an option in large computing systems such as the Web. We introduce a simple online non-migratory algorithm IMD, which employs immediate dispatching, i.e., it immediately assigns released jobs to one of the machines. We show that the performance of this algorithm is within a logarithmic factor of the optimal migratory offline algorithm with respect to the total flow time, and within a small constant factor of the optimal migratory offline algorithm with respect to the total completion time. This solves an open problem posed by Awerbuch et al. (STOC 99).

20.
Adaptive Execution of Jobs in Computational Grid Environment
In a computational grid, jobs must adapt to a dynamically changing heterogeneous environment with the objective of maintaining quality of service. In order to enable adaptive execution of multiple jobs running concurrently in a computational grid, we propose an integrated performance-based resource management framework supported by a multi-agent system (MAS). The multi-agent system initially allocates the jobs to different resource providers based on a resource selection algorithm. Later, at runtime, if the performance of any job degrades or quality of service cannot be maintained for some reason (resource failure or overloading), the multi-agent system assists the job in adapting to the system. This paper focuses on the part of our framework that supports adaptive execution, which is achieved through reallocation and local tuning of jobs. Mobile as well as static agents are employed for this purpose. The paper summarizes the design and implementation and demonstrates the efficiency of the framework through experiments on a local grid test bed.
