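Research Progress and Trends in Data Center Network Traffic Scheduling
Cite this article: LI Wen-Xin, QI Heng, XU Ren-Hai, ZHOU Xiao-Bo, LI Ke-Qiu. Research Progress and Trends in Data Center Network Traffic Scheduling[J]. Chinese Journal of Computers, 2020, 43(4): 600-617.
Authors: LI Wen-Xin  QI Heng  XU Ren-Hai  ZHOU Xiao-Bo  LI Ke-Qiu
Affiliations: School of Computer Science and Technology, Dalian University of Technology, Dalian 116024; College of Intelligence and Computing, Tianjin University, Tianjin 300350
Abstract: In recent years, traffic scheduling has become a hot research topic in networking. The problem is chiefly to decide when, and at what rate, each data flow in the network is transmitted, and it strongly affects both network performance and application performance. In data centers that host many large-scale Internet applications, however, traffic scheduling faces challenges tied to the traffic model: volatile traffic matrices, mixed traffic types, and traffic bursts. Moreover, as data centers keep growing, it also faces challenges tied to the network model: dynamic network bandwidth, random network congestion, and diverse network objectives. To raise attention to and understanding of data center traffic scheduling, and to advance flow scheduling techniques in practice, this paper analyzes and compares related work on data center network flow scheduling along three dimensions: scheduling objective, scheduling method, and scheduling entity. It draws the following conclusion: existing work mainly uses distributed, centralized, or hybrid scheduling to schedule intra-datacenter, inter-datacenter, or datacenter-to-user flows efficiently, in order to achieve goals such as bandwidth guarantees, deadline guarantees, minimum flow completion time, minimum Coflow completion time, fairness guarantees, and minimum flow transmission cost. Finally, the paper identifies four future directions for data center flow scheduling and states the corresponding open research problems.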

Keywords: data center networks  flow scheduling  Coflow scheduling  completion time  priority queues  bandwidth guarantee  deadline guarantee

Data Center Network Flow Scheduling Progress and Trends
LI Wen-Xin, QI Heng, XU Ren-Hai, ZHOU Xiao-Bo, LI Ke-Qiu. Data Center Network Flow Scheduling Progress and Trends[J]. Chinese Journal of Computers, 2020, 43(4): 600-617.
Authors: LI Wen-Xin  QI Heng  XU Ren-Hai  ZHOU Xiao-Bo  LI Ke-Qiu
Affiliation: School of Computer Science and Technology, Dalian University of Technology, Dalian 116024; College of Intelligence and Computing, Tianjin University, Tianjin 300350
Abstract: Modern data centers serve as the underlying infrastructure for many applications, including online Internet services, data-parallel computing, machine learning, and cloud computing. What these applications and services have in common is that they generate massive numbers of data flows in the network. From the perspective of both the network operator and the applications or users, data center networks must be used effectively and efficiently. Flow scheduling is a promising technique for improving data center network performance and has therefore attracted much recent research interest. Flow scheduling mainly determines when and at what rate to send each flow in the network so that the desired objectives (e.g., minimizing flow completion time (FCT) or meeting deadlines) can be achieved. In this survey, we first describe the fundamental problems and challenges of scheduling flows in data center networks. There are two major challenges. First, the data center flow model is complicated by the dynamics, burstiness, and mixed nature of the traffic. Second, the network model is also complex because of dynamic network bandwidth, random network congestion, and diverse network optimization objectives. With these challenges in mind, this paper compares and summarizes related work on data center flow scheduling along three dimensions: scheduling optimization goals, scheduling methods, and scheduling entities. In terms of optimization goal, existing work falls into six categories: bandwidth guarantee, deadline guarantee, minimum FCT, minimum coflow completion time (CCT), fairness guarantee, and minimum traffic transmission cost. In terms of scheduling method, existing work falls into three kinds: distributed, centralized, and hybrid scheduling. In terms of scheduling entity, existing work can be classified into three kinds: intra-datacenter, inter-datacenter, and datacenter-client flow scheduling. Although many flow scheduling solutions have been proposed, most are still at the research stage and far from industry adoption. Low-complexity, low-cost, high-performance flow scheduling schemes need further exploration. Therefore, at the end of the paper, we point out four potential research directions for data center flow scheduling, along with the corresponding unresolved research problems. First, most flow scheduling schemes rely on limited switch functions (e.g., priority queues), whereas modern programmable switches are flexible enough to support more complex network functions. Hence, programming flow scheduling on switches is one potential research direction. Second, existing solutions need to hook packets in the end-host network stack to tag priorities in the packet header, and such tagging incurs substantial overhead, making them inapplicable to high-speed networks. With 40G, 100G, and even 200G networks arriving, scheduling flows in such high-speed networks is another direction. Third, traditional model-based flow scheduling is sub-optimal, and machine learning offers a new option for efficient flow scheduling. Hence, machine-learning-assisted flow scheduling is the third potential direction. Finally, as geo-distributed machine learning and federated learning become important workloads, scheduling inter-datacenter flows (especially tiny flows) under security constraints to reduce FCT is also a potential direction.
Keywords:datacenter networks  flow scheduling  coflow scheduling  completion time  priority queues  bandwidth guarantee  deadline guarantee
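
Illustrative note: the abstract describes flow scheduling as deciding when and at what rate each flow is sent, with minimum FCT as one objective. The Python sketch below is not taken from the surveyed paper. It assumes a single bottleneck link, flows that all arrive at time zero, and known flow sizes, and it compares two textbook policies: smallest-flow-first and equal rate sharing. The function names and the example sizes are hypothetical.

```python
from typing import List


def sjf_completion_times(sizes: List[float], capacity: float) -> List[float]:
    """Send flows one at a time, smallest first, each at full link capacity."""
    finished, clock = [], 0.0
    for size in sorted(sizes):
        clock += size / capacity
        finished.append(clock)
    return finished


def fair_share_completion_times(sizes: List[float], capacity: float) -> List[float]:
    """Split link capacity equally among all unfinished flows."""
    ordered = sorted(sizes)
    finished, clock, sent = [], 0.0, 0.0
    for i, size in enumerate(ordered):
        active = len(ordered) - i  # flows still sharing the link
        # Each active flow must send (size - sent) more units at capacity / active.
        clock += (size - sent) * active / capacity
        sent = size
        finished.append(clock)
    return finished


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


if __name__ == "__main__":
    flows = [3.0, 1.0]  # hypothetical flow sizes, in units of data
    link = 1.0          # hypothetical link capacity, in units per second
    print(mean(sjf_completion_times(flows, link)))         # 2.5
    print(mean(fair_share_completion_times(flows, link)))  # 3.0
```

Under these assumptions the last flow finishes at the same time in both policies, but sending the small flow first lowers the average FCT. Real data center schedulers must also handle unknown flow sizes, staggered arrivals, and multiple links, which this sketch ignores.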
This article is indexed in databases including VIP and Wanfang Data.