
Research of Hybrid Resource Scheduling Framework of Heterogeneous Clusters for Dataflow
Citation: TANG Xiao-Chun, ZHAO Quan, FU Ying, ZHU Zi-Yu, DING Zhao, HU Xiao-Xue, LI Zhan-Huai. Research of Hybrid Resource Scheduling Framework of Heterogeneous Clusters for Dataflow[J]. Journal of Software, 2022, 33(12): 4704-4726.
Authors: TANG Xiao-Chun  ZHAO Quan  FU Ying  ZHU Zi-Yu  DING Zhao  HU Xiao-Xue  LI Zhan-Huai
Affiliation: School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
Funding: National Key Research and Development Program of China (2018YFB1003400)
Received: 2020/11/23
Revised: 2021/1/25

Abstract: The Dataflow model unifies batch processing and stream processing in big data computing. However, existing cluster resource scheduling frameworks for big data computing target either stream processing or batch processing, and so do not suit batch and stream processing jobs that share cluster resources. In addition, when GPUs are used for big data analytics, the lack of an effective way to decouple CPU and GPU resources lowers resource utilization. Based on an analysis of existing cluster resource scheduling frameworks, this work designs and implements HRM, a hybrid resource scheduling framework that is aware of batch and stream processing applications. Built on a shared-state architecture, HRM combines optimistic and pessimistic locking protocols to satisfy the different resource requirements of stream processing jobs and batch processing jobs. On compute nodes, it provides flexible binding of CPU-GPU resources and adopts a queue stacking technique, which not only meets the real-time requirements of stream processing jobs but also reduces feedback delay and enables GPU sharing. In simulations of large-scale job scheduling, the scheduling delay of HRM is only about 75% of that of a centralized scheduling framework. In tests with real workloads, CPU utilization improves by more than 25% when batch and stream processing share the cluster under HRM. With the fine-grained job scheduling method, GPU utilization increases by more than 2 times and job completion time is reduced by about 50%.
Keywords: dataflow model  batch processing  stream processing  application-aware  CPU-GPU  queue stacking
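
The abstract mentions that HRM combines optimistic and pessimistic locking over shared cluster state, but does not give the protocol details or say which job class uses which protocol. The following is only a minimal sketch of the two general techniques side by side, not HRM's implementation. The names `Node`, `optimistic_allocate`, and `pessimistic_allocate` are hypothetical and chosen for illustration.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical shared-state record for one compute node (not from HRM)."""
    free_cpus: int
    version: int = 0  # bumped on every successful allocation
    lock: threading.Lock = field(default_factory=threading.Lock)


def optimistic_allocate(node: Node, seen_version: int, cpus: int) -> bool:
    """Commit only if the node is unchanged since the caller's snapshot."""
    with node.lock:  # short critical section covering the commit only
        if node.version != seen_version or node.free_cpus < cpus:
            return False  # conflict or shortage: caller re-reads and retries
        node.free_cpus -= cpus
        node.version += 1
        return True


def pessimistic_allocate(node: Node, cpus: int) -> bool:
    """Hold the lock across the whole check-and-allocate decision."""
    with node.lock:
        if node.free_cpus < cpus:
            return False
        node.free_cpus -= cpus
        node.version += 1
        return True


node = Node(free_cpus=8)
snapshot = node.version                      # scheduler A reads the state
b_ok = pessimistic_allocate(node, 4)         # scheduler B allocates meanwhile
first_try = optimistic_allocate(node, snapshot, 2)       # stale snapshot
second_try = optimistic_allocate(node, node.version, 2)  # retry, fresh snapshot
```

In this sketch the optimistic path pays nothing up front but may have to retry after a conflict, while the pessimistic path never conflicts but serializes schedulers on the lock. That trade-off is the usual reason for mixing the two approaches.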