Virtual Network Function Placement Optimization Algorithm Based on Improved Deep Reinforcement Learning
Citation: Lun TANG, Lanqin HE, Qinyi LIAN, Qi TAN. Virtual Network Function Placement Optimization Algorithm Based on Improved Deep Reinforcement Learning[J]. Journal of Electronics & Information Technology, 2021, 43(6): 1724-1732. doi: 10.11999/JEIT200297
Authors: Lun TANG, Lanqin HE, Qinyi LIAN, Qi TAN
Affiliation: 1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; 2. Key Laboratory of Mobile Communications Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; 3. College of International Communications, China Three Gorges University, Yichang 443002, China
Funding: National Natural Science Foundation of China (62071078), Science and Technology Research Project of the Chongqing Municipal Education Commission (KJZD-M201800601), Chongqing Major Theme Special Project (cstc2019jscx-zdztzxX0006)
Abstract: Considering the Service Function Chain (SFC) placement optimization problem caused by the dynamic arrival of network service requests under the Network Function Virtualization/Software Defined Network (NFV/SDN) architecture, a Virtual Network Function (VNF) placement optimization algorithm based on improved deep reinforcement learning is proposed. Firstly, a stochastic optimization model based on a Markov Decision Process (MDP) is established to perform online SFC placement and dynamic resource allocation; the model jointly optimizes SFC placement cost and delay cost, subject to the SFC delay constraint as well as server CPU and physical link bandwidth resource constraints. Secondly, because VNF placement and resource allocation involve large state and action spaces and unknown state transition probabilities, an intelligent VNF placement algorithm based on deep reinforcement learning is proposed to obtain a near-optimal VNF placement policy and resource allocation policy. Finally, since the agent's ε-greedy exploration and exploitation leads to low learning efficiency and slow convergence, an action exploration and exploitation method based on value-function differences is proposed, and a dual experience replay pool is further adopted to address the low utilization of experience samples. Simulation results show that the algorithm accelerates the convergence of the neural network and simultaneously optimizes SFC placement cost and SFC end-to-end delay.
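The exploration rule named in the abstract is only described at a high level. The Python sketch below is one plausible reading rather than the authors' exact method: it assumes the agent explores with a probability that shrinks as the gap (the value-function difference) between its best and second-best Q-value grows, so near-ties trigger exploration while clear winners are exploited. The function name, the exponential schedule, and the scale parameter are illustrative assumptions.

```python
import numpy as np

_rng = np.random.default_rng()

def select_action(q_values, scale=1.0):
    """Pick a VNF placement action from a 1-D array of Q(s, a) estimates."""
    q_sorted = np.sort(q_values)[::-1]        # Q-values in descending order
    gap = q_sorted[0] - q_sorted[1]           # value-function difference
    explore_prob = np.exp(-gap / scale)       # small gap -> explore more often
    if _rng.random() < explore_prob:
        return int(_rng.integers(len(q_values)))   # explore: random action
    return int(np.argmax(q_values))                # exploit: greedy action
```

Unlike ε-greedy with a fixed or annealed ε, this rule spends exploration budget only where the value estimates are still ambiguous, which is one way such a scheme could speed up convergence.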

Keywords: Virtual Network Function (VNF); Deep reinforcement learning; Service Function Chain (SFC) end-to-end delay; Service Function Chain (SFC) placement cost
Received: 2020-04-21
Revised: 2021-01-22
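The dual experience replay pool is likewise only named, not specified, in the abstract. A minimal sketch under stated assumptions: transitions whose reward clears a threshold go to a separate pool (e.g. successful SFC placements), and each training batch mixes samples from both pools so that rarer high-value experiences are reused more often. The class name, the reward-threshold split, and the mixing ratio are assumptions for illustration.

```python
import random
from collections import deque

class DualReplayPool:
    """Two replay buffers sampled together to raise experience utilization."""

    def __init__(self, capacity=10000, reward_threshold=0.0, good_ratio=0.5):
        self.good = deque(maxlen=capacity)       # e.g. successful SFC placements
        self.ordinary = deque(maxlen=capacity)   # all other transitions
        self.reward_threshold = reward_threshold
        self.good_ratio = good_ratio

    def push(self, state, action, reward, next_state, done):
        pool = self.good if reward >= self.reward_threshold else self.ordinary
        pool.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Batch may be smaller than batch_size while the pools are still filling.
        n_good = min(int(batch_size * self.good_ratio), len(self.good))
        n_ord = min(batch_size - n_good, len(self.ordinary))
        return random.sample(self.good, n_good) + random.sample(self.ordinary, n_ord)
```

Sampling high-reward transitions at a fixed share of every batch is one simple way to keep scarce but informative samples from being drowned out by ordinary ones.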
