
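Dynamic Resource Allocation Based on K-armed Bandit for Multi-UAV Air-Ground Network
Cite this article: MA Nan, XU Kui, XIA Xiaochen, XIE Wei, XU Jianhui, SHEN Maiying. Dynamic Resource Allocation Based on K-armed Bandit for Multi-UAV Air-Ground Network[J]. Journal of Electronics & Information Technology, 2022, 44(9): 3117-3125.
Authors: MA Nan  XU Kui  XIA Xiaochen  XIE Wei  XU Jianhui  SHEN Maiying
Affiliation: The Army Engineering University of PLA, Nanjing 210007, China
Funding: National Natural Science Foundation of China (62071485, 61901519, 61771486, 62001513), Basic Research Program of Jiangsu Province (BK20192002), Natural Science Foundation of Jiangsu Province (BK20201334, BK20181335, BK20200579)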
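Abstract: To address dynamic resource allocation in multi-UAV air-ground networks equipped with massive MIMO, this paper proposes a K-armed bandit-based reinforcement learning algorithm that jointly optimizes the user selection and power allocation strategies of multiple UAVs, with the goal of maximizing system throughput. First, users are clustered by geographic location, and the cluster center nodes are used to plan the UAV flight paths. Second, without considering end-to-end communication between UAVs, the multi-UAV resource allocation problem is transformed into multiple mutually independent agent reinforcement learning problems. Finally, an episodic multi-agent, multi-state K-armed bandit algorithm is proposed to jointly optimize user selection and power allocation. Defining the position index of the UAV at each time step as the state space lets each UAV adapt dynamically to changes in its own position and in the channel. Simulation results show that the proposed scheme autonomously adjusts its resource allocation strategy as the environment state changes and effectively improves total system throughput compared with existing schemes.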

Keywords: UAV air-ground network    dynamic resource allocation    multi-agent reinforcement learning    K-armed bandit    massive MIMO
Received: 2021-08-25

Dynamic Resource Allocation Based on K-armed Bandit for Multi-UAV Air-Ground Network
MA Nan,XU Kui,XIA Xiaochen,XIE Wei,XU Jianhui,SHEN Maiying.Dynamic Resource Allocation Based on K-armed Bandit for Multi-UAV Air-Ground Network[J].Journal of Electronics & Information Technology,2022,44(9):3117-3125.
Authors:MA Nan  XU Kui  XIA Xiaochen  XIE Wei  XU Jianhui  SHEN Maiying
Affiliation:The Army Engineering University of PLA, Nanjing 210007, China
Abstract: To address resource allocation in Unmanned Aerial Vehicle (UAV) enabled air-ground networks with massive MIMO, a K-armed bandit-based reinforcement learning algorithm is proposed that jointly optimizes user selection and power allocation to maximize the total throughput of ground users. First, users are clustered by geographic location, and the cluster center nodes are used to plan the UAV trajectories. Second, without considering UAV-to-UAV communication links, the multi-UAV resource allocation problem is transformed into mutually independent multi-agent reinforcement learning problems. Finally, an episode-based multi-agent, multi-state K-armed bandit algorithm is proposed to jointly optimize user selection and power allocation. Defining the position index of the UAV as the state space lets each UAV adapt dynamically to changes in its position and channel state. Simulation results show that the proposed algorithm adaptively adjusts the resource allocation strategy according to the channel conditions and effectively improves total system throughput compared with existing schemes.
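Keywords: UAV air-ground network    dynamic resource allocation    multi-agent reinforcement learning    K-armed bandit    massive MIMO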
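
The abstract describes independent agents that each run a K-armed bandit per state, with the UAV position index as the state. The Python sketch below illustrates that general idea only; it is not the paper's algorithm. Epsilon-greedy selection, the sample-average update, the `toy_throughput` reward, and every name and parameter value are assumptions chosen for illustration. In a real setting each arm would stand for one (user subset, power level) pair and the reward would be the measured throughput.

```python
import random


class MultiStateBandit:
    """One agent (one UAV): a separate K-armed bandit for each state."""

    def __init__(self, n_states, n_arms, epsilon=0.2, seed=0):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # q[s][a]: running mean reward of arm a in state s; n[s][a]: pull count.
        self.q = [[0.0] * n_arms for _ in range(n_states)]
        self.n = [[0] * n_arms for _ in range(n_states)]

    def select(self, state):
        """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        row = self.q[state]
        return max(range(self.n_arms), key=row.__getitem__)

    def update(self, state, arm, reward):
        """Incremental sample-average update of the action-value estimate."""
        self.n[state][arm] += 1
        self.q[state][arm] += (reward - self.q[state][arm]) / self.n[state][arm]

    def greedy_policy(self):
        """Best-known arm for every state."""
        return [max(range(self.n_arms), key=row.__getitem__) for row in self.q]


def toy_throughput(state, arm, n_arms, rng):
    """Hypothetical reward: arm (state % n_arms) yields the highest throughput."""
    mean = 1.0 if arm == state % n_arms else 0.2
    return mean + rng.uniform(-0.05, 0.05)  # bounded noise


def train(n_agents=2, n_states=3, n_arms=4, episodes=500, seed=0):
    """Agents learn independently; one episode is one pass along the trajectory."""
    env_rng = random.Random(seed)
    agents = [MultiStateBandit(n_states, n_arms, seed=seed + 1 + i)
              for i in range(n_agents)]
    for _ in range(episodes):
        for state in range(n_states):  # state = position index on the path
            for agent in agents:
                arm = agent.select(state)
                reward = toy_throughput(state, arm, n_arms, env_rng)
                agent.update(state, arm, reward)
    return agents


if __name__ == "__main__":
    for i, agent in enumerate(train()):
        print(f"agent {i} greedy policy per state: {agent.greedy_policy()}")
```

Keeping one value table per state means a choice that works well at one position does not bias the estimate at another, which is one simple way to let the policy follow position-dependent channel conditions.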