Multi-dimensional Resource Optimization of Service Function Chain Based on Deep Reinforcement Learning

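Cite this article:	WANG Xiao, TANG Lun, HE Xiaoyu, CHEN Qianbin. Multi-dimensional Resource Optimization of Service Function Chain Based on Deep Reinforcement Learning[J]. Computer Engineering and Applications, 2021, 57(4): 68-76.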

Authors:	WANG Xiao, TANG Lun, HE Xiaoyu, CHEN Qianbin

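Affiliation:	1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; 2. Key Laboratory of Mobile Communication Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China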

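Funding:	National Natural Science Foundation of China; Science and Technology Research Program of the Chongqing Municipal Education Commission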

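Abstract:	In a Network Function Virtualization (NFV) environment, it is crucial for operators to save resources and reduce operating costs while guaranteeing the quality of service of users' Service Function Chains (SFCs). Jointly considering SFC deployment and radio access network resource allocation, this paper proposes a joint multi-dimensional SFC resource allocation algorithm based on deep reinforcement learning. An environment-aware SFC resource allocation mechanism is constructed, and an SFC deployment cost minimization model is established under constraints on user delay requirements, wireless rate requirements, and resource capacity. To account for the dynamics of the wireless environment, the optimization problem is transformed into a model-free discrete-time Markov Decision Process (MDP). Because the MDP has a continuous state space and a high-dimensional action space, the Deep Deterministic Policy Gradient (DDPG) reinforcement learning algorithm is used to solve it, yielding a resource allocation strategy that minimizes the deployment cost. Simulation results show that the algorithm effectively reduces the SFC deployment cost and end-to-end transmission delay while satisfying constraints such as performance requirements and resource capacity.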
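Keywords:	network function virtualization; service function chain deployment; radio resource allocation; reinforcement learning; deep deterministic policy gradient algorithm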
Multi-dimensional Resource Optimization of Service Function Chain Based on Deep Reinforcement Learning

WANG Xiao, TANG Lun, HE Xiaoyu, CHEN Qianbin. Multi-dimensional Resource Optimization of Service Function Chain Based on Deep Reinforcement Learning[J]. Computer Engineering and Applications, 2021, 57(4): 68-76.

Authors:	WANG Xiao, TANG Lun, HE Xiaoyu, CHEN Qianbin

Affiliation:	1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; 2. Key Laboratory of Mobile Communication, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Abstract:	In a Network Function Virtualization (NFV) environment, it is crucial for operators to reduce resource consumption and operating costs while ensuring the quality of service of users' Service Function Chains (SFCs). This paper jointly considers SFC deployment and radio access network resource allocation, and proposes a multi-dimensional SFC resource allocation algorithm based on deep reinforcement learning. First, an environment-aware SFC resource allocation mechanism is built, and an SFC deployment cost minimization model is established subject to constraints on user delay requirements, wireless rate requirements, and resource capacity. Second, considering the dynamics of the wireless environment, the optimization problem is transformed into a model-free discrete-time Markov Decision Process (MDP). Because the MDP's state space is continuous and its action space is high-dimensional, the Deep Deterministic Policy Gradient (DDPG) reinforcement learning algorithm is used to solve the problem, yielding a resource allocation strategy that minimizes the deployment cost. Simulation results show that the algorithm effectively reduces the SFC deployment cost and end-to-end transmission delay while satisfying the performance requirements and resource capacity constraints.

Keywords:	network function virtualization; service function chain deployment; radio resource allocation; reinforcement learning; deep deterministic policy gradient
This article is indexed in Wanfang Data and other databases.