A Satellite Edge Network Service Function Chain Deployment Method Based on Natural Gradient Actor-Critic Reinforcement Learning

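Cite this article:	GAO Yuan, FANG Hai, ZHAO Yang, YANG Xu. A Satellite Edge Network Service Function Chain Deployment Method Based on Natural Gradient Actor-Critic Reinforcement Learning[J]. Journal of Electronics & Information Technology, 2023, 45(2): 455-463.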

Authors:	GAO Yuan, FANG Hai, ZHAO Yang, YANG Xu

Affiliation:	Xi’an Institute of Space Radio Technology, Xi’an 710100, China

Funding:	National Key Research and Development Program of China (2020YFB1808003)

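Abstract:	Given the high dynamics of low Earth orbit (LEO) satellite networks and the complexity of the space environment, providing a fast online Service Function Chain (SFC) deployment method has become an urgent problem in LEO satellite edge networks. Taking into account constraints such as node and link capacity, as well as handover costs such as service migration, this paper proposes an online SFC deployment method based on a natural gradient Actor-Critic reinforcement learning architecture for LEO satellite networks equipped with Multi-access Edge Computing (MEC) servers. First, in view of the highly dynamic environment of LEO satellite networks, the real-time capacity constraints and migration costs are modeled. Second, a Markov Decision Process (MDP) is introduced to describe the state transitions of the LEO satellite network, taking into account factors such as service migration and satellite coordinates. Finally, an online SFC deployment reinforcement learning method based on the natural gradient is proposed; unlike the standard gradient, the natural gradient method performs updates at the model level, which keeps neural network training from becoming trapped in local optima. Simulation results show that the proposed method approaches the global optimum and outperforms standard-gradient reinforcement learning deployment methods in end-to-end delay.
Keywords:	service function chain; reinforcement learning; low Earth orbit satellite network; service migration
Received:	2021-11-30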
A Satellite Edge Network Service Function Chain Deployment Method Based on Natural Gradient Actor-Critic Reinforcement Learning

GAO Yuan,FANG Hai,ZHAO Yang,YANG Xu.A Satellite Edge Network Service Function Chain Deployment Method Based on Natural Gradient Actor-Critic Reinforcement Learning[J].Journal of Electronics & Information Technology,2023,45(2):455-463.

Authors:	GAO Yuan, FANG Hai, ZHAO Yang, YANG Xu

Affiliation:	Xi’an Institute of Space Radio Technology, Xi’an 710100, China

Abstract:	In view of the high dynamics of low Earth orbit (LEO) satellite networks and the complexity of the space environment, online provisioning of Service Function Chains (SFCs) has become a key problem in satellite edge networks. Considering node and link capacity constraints and the switching costs of service migration, an online SFC deployment method based on natural gradient actor-critic reinforcement learning is proposed for LEO satellites equipped with Multi-access Edge Computing (MEC) servers. First, the real-time capacity constraints and migration costs are formulated to reflect the high environmental dynamics of LEO satellite networks. Second, a Markov Decision Process (MDP) that incorporates migration costs and satellite coordinates is introduced to describe the state transitions in LEO satellite networks. Finally, a natural gradient-based online SFC deployment method is proposed; compared with the standard gradient, it helps neural network training escape local optima. Simulation results show that the proposed method asymptotically approaches the global optimum and outperforms existing standard-gradient methods in terms of end-to-end delay.
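
The contrast the abstract draws between standard and natural gradients can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes a hypothetical two-node placement choice with a one-parameter sigmoid policy, exact expected gradients instead of a learned critic, and made-up delay values. The natural gradient divides the standard gradient by the Fisher information of the policy, so the step size is measured in terms of change to the action distribution rather than to the raw parameter, and the update does not stall when the policy is saturated.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(theta, rewards, lr, steps, natural, damping=1e-3):
    """Gradient ascent on J(theta) = (1 - p) * r0 + p * r1, with p = sigmoid(theta).

    p is the probability of placing the function on node 1 (hypothetical setup).
    """
    r0, r1 = rewards
    for _ in range(steps):
        p = sigmoid(theta)
        grad = p * (1.0 - p) * (r1 - r0)   # standard gradient dJ/dtheta
        if natural:
            fisher = p * (1.0 - p)         # Fisher information of Bernoulli(p) w.r.t. theta
            grad = grad / (fisher + damping)  # damped natural gradient
        theta += lr * grad
    return sigmoid(theta)

# Assumed rewards: negative end-to-end delay (ms) of two candidate nodes.
rewards = (-30.0, -10.0)   # node 1 has the lower delay
theta0 = -6.0              # policy starts almost certain of the worse node

p_std = train(theta0, rewards, lr=0.1, steps=50, natural=False)
p_nat = train(theta0, rewards, lr=0.1, steps=50, natural=True)
print(f"P(node 1) after 50 steps: standard={p_std:.4f}, natural={p_nat:.4f}")
```

With these assumed values, the standard-gradient policy barely moves away from the worse node within 50 steps, while the natural-gradient policy shifts almost all of its probability to the better one. The damping term is a common practical safeguard against dividing by a near-zero Fisher value and is likewise an assumption of this sketch.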
