
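Research and implementation of a Flink-oriented load balancing task scheduling algorithm
Cite this article:LI Wen-jia, SHI Lan, JI Hang-xu, LUO Yi-peng. Research and implementation of a Flink-oriented load balancing task scheduling algorithm[J]. Computer Engineering & Science, 2022, 44(7): 1141-1151.
Authors:LI Wen-jia  SHI Lan  JI Hang-xu  LUO Yi-peng
Affiliation:(1. College of Computer Science and Engineering, Northeastern University, Shenyang 110169, Liaoning; 2. School of Software, Liaoning University of Technology, Jinzhou 121000, Liaoning)
Funding:Key R&D Program of the Ministry of Science and Technology (2018YFB1004402)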
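Abstract:Apache Flink is one of today's mainstream distributed big data computing engines, and task scheduling is a key problem in distributed computing systems. Because clusters are heterogeneous and operators differ in complexity, load imbalance is unavoidable in Flink. To address this, a resource-feedback-based load balancing task scheduling algorithm, RFTS, is proposed. Using three modules (real-time resource monitoring, area division, and a task scheduling algorithm based on glowworm swarm optimization), RFTS reassigns tasks that are waiting on heavily loaded machines to lightly loaded machines, balancing the cluster load and improving cluster utilization and execution efficiency. Experiments on the TPC-C and TPC-H datasets show that RFTS improves the performance of Apache Flink in both execution time and throughput.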

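Keywords:Apache Flink  load balancing task scheduling algorithm based on resource feedback  real-time resource monitoring  area division  glowworm swarm optimization algorithm
Received:2021-11-10
Revised:2022-01-17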

Research and implementation of a Flink-oriented load balancing task scheduling algorithm
LI Wen-jia,SHI Lan,JI Hang-xu,LUO Yi-peng.Research and implementation of a Flink-oriented load balancing task scheduling algorithm[J].Computer Engineering & Science,2022,44(7):1141-1151.
Authors:LI Wen-jia  SHI Lan  JI Hang-xu  LUO Yi-peng
Affiliation:(1.College of Computer Science and Engineering,Northeastern University,Shenyang 110169; 2.School of Software,Liaoning University of Technology,Jinzhou 121000,China)
Abstract:Apache Flink is one of the mainstream distributed big data computing engines, and task scheduling is a key issue in distributed computing systems. Because clusters are heterogeneous and operators differ in complexity, uneven load inevitably arises in Flink. To solve this problem, a load balancing task scheduling algorithm based on resource feedback, named RFTS, is proposed. Through three modules (real-time resource monitoring, area division, and a task scheduling algorithm based on glowworm swarm optimization), tasks in the waiting queues of overloaded machines are allocated to lightly loaded machines, which reduces load unevenness across the cluster and improves cluster utilization and execution efficiency. Experiments on the TPC-C and TPC-H datasets show that RFTS effectively improves the performance of Apache Flink in terms of execution time and throughput.
Keywords:Apache Flink  load balancing task scheduling algorithm based on resource feedback  real-time resource monitoring  area division  glowworm swarm optimization algorithm  
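
The abstract's idea of dividing machines into areas by load and moving waiting tasks from overloaded machines to lightly loaded ones can be illustrated with a minimal Python sketch. It is not the RFTS algorithm itself: the abstract does not give the load metric, the thresholds, or the glowworm swarm optimization formulas, so the scheduling step here is a plain greedy stand-in, and the `Machine` class, the `high`/`low` thresholds, and the task costs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    capacity: float      # hypothetical abstract capacity units
    running_load: float  # load from tasks that are already executing
    waiting: list = field(default_factory=list)  # costs of queued tasks

    def utilization(self) -> float:
        # Running plus queued work, relative to capacity.
        return (self.running_load + sum(self.waiting)) / self.capacity


def divide_areas(machines, high=0.8, low=0.5):
    """Split machines into overloaded and lightly loaded areas (assumed thresholds)."""
    overloaded = [m for m in machines if m.utilization() > high]
    light = [m for m in machines if m.utilization() < low]
    return overloaded, light


def rebalance(machines, high=0.8, low=0.5):
    """Greedy stand-in for the scheduling step (not glowworm swarm optimization)."""
    moves = []
    overloaded, light = divide_areas(machines, high, low)
    if not light:
        return moves
    for src in overloaded:
        # Try the largest waiting tasks first; only waiting tasks are moved.
        for cost in sorted(src.waiting, reverse=True):
            if src.utilization() <= high:
                break  # source is no longer overloaded
            dst = min(light, key=Machine.utilization)
            new_load = dst.running_load + sum(dst.waiting) + cost
            if new_load / dst.capacity > high:
                continue  # this move would overload the target
            src.waiting.remove(cost)
            dst.waiting.append(cost)
            moves.append((src.name, dst.name, cost))
    return moves
```

Calling `rebalance` on a list of `Machine` objects returns the `(source, target, cost)` moves it made and updates each machine's waiting queue in place. A real implementation would replace the fixed thresholds with monitored resource feedback and the greedy target choice with the paper's optimization step.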