Dynamic Spectrum Resource Allocation in Internet of Vehicles Based on SAC Reinforcement Learning

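Cite this article:	HUANG Yufan, PENG Nuoheng, LIN Yan, FAN Jiancun, ZHANG Yijin, YU Yanqiu. Dynamic Spectrum Resource Allocation in Internet of Vehicles Based on SAC Reinforcement Learning[J]. Computer Engineering, 2021, 47(9): 34-43.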

Authors:	HUANG Yufan, PENG Nuoheng, LIN Yan, FAN Jiancun, ZHANG Yijin, YU Yanqiu

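Affiliations:	1. School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; 2. School of Information and Communications Engineering, Xi'an Jiaotong University, Xi'an 710049, China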

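Funding:	National Natural Science Foundation of China (62001225, 62071236); Fundamental Research Funds for the Central Universities (30920021127, 30919011227); Jiangsu Provincial Natural Science Foundation for Youth (BK20190454).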

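Abstract:	To address the scarcity of spectrum resources in the Internet of Vehicles (IoV), a multi-agent dynamic spectrum resource allocation scheme based on the Soft Actor-Critic (SAC) reinforcement learning algorithm is proposed. With the goal of maximizing the total channel capacity and the payload delivery success rate, a spectrum resource allocation model for Vehicle-to-Vehicle (V2V) links is established. Each V2V link is treated as a single agent, and a multi-agent Markov decision process model is constructed. A neural network is designed using the SAC algorithm, and the agents are trained by maximizing the sum of entropy and cumulative reward, so that the V2V links optimize spectrum resource allocation through continued learning. Simulation results show that, compared with spectrum resource allocation schemes based on Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG), the proposed scheme completes spectrum sharing among IoV links more efficiently and achieves a higher channel transmission rate and payload delivery success rate.
Keywords:	Internet of Vehicles; resource allocation; multi-agent reinforcement learning; Soft Actor-Critic algorithm; spectrum allocation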
Received:	2020-08-18
Revised:	2020-11-08
Dynamic Spectrum Resource Allocation in Internet of Vehicles Based on SAC Reinforcement Learning

HUANG Yufan, PENG Nuoheng, LIN Yan, FAN Jiancun, ZHANG Yijin, YU Yanqiu. Dynamic Spectrum Resource Allocation in Internet of Vehicles Based on SAC Reinforcement Learning[J]. Computer Engineering, 2021, 47(9): 34-43.

Authors:	HUANG Yufan, PENG Nuoheng, LIN Yan, FAN Jiancun, ZHANG Yijin, YU Yanqiu

Affiliation:	1. School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; 2. School of Information and Communications Engineering, Xi'an Jiaotong University, Xi'an 710049, China

Abstract:	To address the scarcity of spectrum resources in the Internet of Vehicles (IoV), a multi-agent dynamic spectrum allocation scheme based on Soft Actor-Critic (SAC) reinforcement learning is proposed. The scheme aims to maximize the total channel capacity and the payload delivery success rate. To this end, a spectrum resource allocation model for Vehicle-to-Vehicle (V2V) links is constructed. Each V2V link is regarded as an agent, and the problem is modeled as a multi-agent Markov decision process. The SAC reinforcement learning algorithm is then used to design a neural network. The agents are trained by maximizing the sum of entropy and cumulative reward, so that the V2V links optimize the allocation of spectrum resources through repeated learning. Simulation results show that, compared with spectrum resource allocation schemes based on Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG), the proposed scheme implements spectrum sharing between V2V links more efficiently and improves both the channel transmission rate and the payload delivery success rate.
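
The training objective mentioned in the abstract, the sum of entropy and cumulative reward, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the reward design, state space, or network architecture, so the function names `policy_entropy` and `soft_return`, the entropy weight `alpha`, the discount factor `gamma`, and the idea of a discrete distribution over candidate sub-bands are all assumptions made for illustration. The sketch only evaluates the entropy-augmented return of one sampled trajectory, which is the quantity a maximum-entropy method such as SAC seeks to maximize in expectation.

```python
import math


def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution.

    `probs` is a hypothetical distribution over candidate sub-bands.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def soft_return(rewards, policies, alpha=0.2, gamma=0.99):
    """Entropy-augmented discounted return of a single trajectory.

    Each step contributes reward plus alpha times the entropy of the
    policy used at that step. `alpha` and `gamma` are illustrative
    values, not taken from the paper.
    """
    total = 0.0
    for t, (reward, probs) in enumerate(zip(rewards, policies)):
        total += (gamma ** t) * (reward + alpha * policy_entropy(probs))
    return total


# Example: a two-step trajectory with four candidate sub-bands.
uniform = [0.25, 0.25, 0.25, 0.25]   # exploratory policy, high entropy
peaked = [0.97, 0.01, 0.01, 0.01]    # nearly deterministic policy
print(soft_return([1.0, 1.0], [uniform, uniform]))
print(soft_return([1.0, 1.0], [peaked, peaked]))
```

With equal rewards, the uniform policy yields the larger entropy-augmented return, which shows how the entropy term rewards continued exploration of sub-bands in addition to the task reward.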

Keywords:	Internet of Vehicles (IoV); resource allocation; multi-agent reinforcement learning; Soft Actor-Critic (SAC) algorithm; spectrum allocation
This article has been indexed by Wanfang Data and other databases.