Research of Hybrid Resource Scheduling Framework of Heterogeneous Clusters for Dataflow (original Chinese title: 面向Dataflow的异构集群混合式资源调度框架研究)

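Cite this article:	TANG Xiao-Chun, ZHAO Quan, FU Ying, ZHU Zi-Yu, DING Zhao, HU Xiao-Xue, LI Zhan-Huai. Research of Hybrid Resource Scheduling Framework of Heterogeneous Clusters for Dataflow [J]. Journal of Software (软件学报), 2022, 33(12): 4704-4726.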

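Author names:	TANG Xiao-Chun (汤小春), ZHAO Quan (赵全), FU Ying (符莹), ZHU Zi-Yu (朱紫钰), DING Zhao (丁朝), HU Xiao-Xue (胡小雪), LI Zhan-Huai (李战怀)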

Author affiliation:	School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi 710129, China

Funding:	National Key Research and Development Program of China (2018YFB1003400)

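Abstract (translated from the Chinese original):	The Dataflow model unifies batch processing and stream processing in big data computing. However, existing cluster resource scheduling frameworks for big data computing target either stream processing or batch processing, and do not suit batch and stream jobs that share cluster resources. In addition, when GPUs are used for big data analytics, the lack of an effective way to decouple CPU and GPU resources lowers resource utilization. Building on an analysis of existing cluster resource scheduling frameworks, this work designs and implements HRM, a hybrid resource scheduling framework that is aware of batch and stream processing applications. HRM is based on a shared-state architecture and combines an optimistic locking protocol with a pessimistic locking protocol to satisfy the different resource requirements of stream jobs and batch jobs. On compute nodes, it provides flexible binding of CPU and GPU resources and uses queue stacking, which meets the real-time requirements of stream jobs, reduces feedback delay, and enables GPU sharing. In simulations of large-scale job scheduling, the scheduling delay of HRM is only about 75% of that of a centralized scheduling framework. In tests with real workloads, when batch and stream processing share a cluster, HRM raises CPU utilization by more than 25%; with the fine-grained job scheduling method, GPU utilization increases by more than 2 times and job completion time is reduced by 50%.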
Keywords (translated):	dataflow model; batch processing; stream processing; job awareness; CPU-GPU; queue stacking
Received:	2020-11-23
Revised:	2021-01-25
Research of Hybrid Resource Scheduling Framework of Heterogeneous Clusters for Dataflow

TANG Xiao-Chun, ZHAO Quan, FU Ying, ZHU Zi-Yu, DING Zhao, HU Xiao-Xue, LI Zhan-Huai. Research of Hybrid Resource Scheduling Framework of Heterogeneous Clusters for Dataflow [J]. Journal of Software, 2022, 33(12): 4704-4726.

Authors:	TANG Xiao-Chun, ZHAO Quan, FU Ying, ZHU Zi-Yu, DING Zhao, HU Xiao-Xue, LI Zhan-Huai

Affiliation:	School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China

Abstract:	The Dataflow model integrates the batch processing and stream processing of big data computing. Nevertheless, existing cluster resource scheduling frameworks for big data computing are oriented either to stream processing or to batch processing, and are ill-suited to batch and stream processing jobs that share cluster resources. In addition, when GPUs are used for big data analysis and computation, resource usage efficiency drops because there is no effective method for decoupling CPU and GPU resources. Based on an analysis of existing cluster scheduling frameworks, a hybrid resource scheduling framework called HRM, which is aware of batch and stream processing applications, is designed and implemented. Built on a shared-state architecture, HRM combines optimistic and pessimistic locking protocols to satisfy the different resource requirements of stream processing jobs and batch processing jobs. On computing nodes, it provides flexible binding of CPU-GPU resources and adopts queue stacking, which not only meets the real-time requirements of stream processing jobs but also reduces feedback delay and enables sharing of GPU resources. In simulations of large-scale job scheduling, the scheduling delay of HRM is only about 75% of that of the centralized scheduling framework. In tests with actual workloads, CPU resource utilization increases by more than 25% when batch processing and stream processing share the cluster. With the fine-grained job scheduling method, GPU utilization increases by more than 2 times and job completion time is reduced by about 50%.

Keywords:	dataflow model; batch processing; stream processing; application-aware; CPU-GPU; queue overlap
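
The abstract says that HRM combines optimistic and pessimistic locking over a shared-state architecture, but it does not give the protocol details. As a generic illustration only, and not the authors' implementation, the Python sketch below contrasts the two ways of claiming CPU cores from a shared node record. The names `NodeState`, `claim_optimistic`, and `claim_pessimistic` are hypothetical, and the version-counter validation is an assumed, textbook form of optimistic concurrency control.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class NodeState:
    """Shared view of one node's free CPU cores (hypothetical model)."""
    free_cpus: int
    version: int = 0
    lock: threading.Lock = field(default_factory=threading.Lock)


def claim_optimistic(node, seen_version, cpus):
    """Plan on an unlocked snapshot; commit only if the node is unchanged."""
    with node.lock:  # short critical section: validate, then commit
        if node.version != seen_version or node.free_cpus < cpus:
            return False  # conflict: the caller must re-read and retry
        node.free_cpus -= cpus
        node.version += 1
        return True


def claim_pessimistic(node, cpus):
    """Hold the lock across both the availability check and the update."""
    with node.lock:
        if node.free_cpus < cpus:
            return False
        node.free_cpus -= cpus
        node.version += 1
        return True


node = NodeState(free_cpus=8)
snapshot = node.version                          # a scheduler reads version 0
assert claim_pessimistic(node, 4)                # another claim lands first
assert not claim_optimistic(node, snapshot, 2)   # stale snapshot is rejected
assert claim_optimistic(node, node.version, 2)   # retry on fresh state succeeds
```

In this sketch the optimistic path lets several schedulers plan in parallel and pays a retry cost only on conflict, while the pessimistic path never retries but serializes claims on the node. Which job type HRM routes through which path is not stated in the abstract.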
