Received: 2021-07-02
Revised: 2021-09-06

MARL-based Design of Multi-Unmanned Aerial Vehicle Assisted Communication System with Hybrid Gaming Mode
WU Guanhan, JIA Weimin, ZHAO Jianwei, GAO Feifei, YAO Minli. MARL-based Design of Multi-Unmanned Aerial Vehicle Assisted Communication System with Hybrid Gaming Mode[J]. Journal of Electronics & Information Technology, 2022, 44(3): 940-950. doi: 10.11999/JEIT210662
Authors:WU Guanhan  JIA Weimin  ZHAO Jianwei  GAO Feifei  YAO Minli
Affiliation:1. Rocket Force University of Engineering, Xi’an 710038, China; 2. Jiuquan Satellite Launch Center, Jiuquan 735000, China; 3. Tsinghua University, Beijing 100084, China
Abstract:As a development direction of future 6G, integrated space-air-ground communication compensates for the insufficient coverage of current wireless networks. In this paper, a Multi-Unmanned Aerial Vehicle (Multi-UAV) assisted communication algorithm based on Multi-Agent Reinforcement Learning (MARL) is proposed. It finds an approximate Nash equilibrium of the hybrid game formed by users and UAVs, thereby jointly optimizing UAV trajectory design, multidimensional resource scheduling, and user access strategy in a dynamic environment. The continuous decision process is modeled as a Markov game, and the Proximal Policy Optimization (PPO) algorithm is extended to the multi-agent domain under a Centralized Training with Distributed Execution (CTDE) mechanism. Two policy output modes are designed for the hybrid action space, in which discrete and continuous actions coexist, and the implementation is further improved with a Beta policy. Finally, simulation experiments verify the effectiveness of the algorithm.
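For context, the single-agent PPO clipped surrogate objective that the abstract describes extending to the multi-agent CTDE setting can be sketched as follows. This is the standard textbook form of PPO, not the authors' multi-agent variant; the function and parameter names are illustrative assumptions.

```python
def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample clipped surrogate objective of PPO.

    ratio = pi_new(a|s) / pi_old(a|s) is the importance-sampling ratio;
    clipping it to [1 - eps, 1 + eps] bounds how far a single update
    can move the policy, which stabilizes training.
    """
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))   # clip ratio to the trust band
    return min(ratio * advantage, clipped * advantage)  # pessimistic (lower) bound
```

Taking the minimum of the clipped and unclipped terms makes the objective a pessimistic bound, so the update never profits from pushing the ratio outside the clip band.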
Keywords:Multi-Unmanned Aerial Vehicle (Multi-UAV) assisted communications; Multi-Agent Reinforcement Learning (MARL); Hybrid game; Nash equilibrium
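The abstract notes that the continuous part of the hybrid action space is handled with a Beta policy. The paper's exact parameterization (network outputs, action bounds) is not reproduced here; the following is only a minimal sketch of the general idea, with an assumed function name and bounds: sample on (0, 1) from Beta(α, β), then rescale affinely to the physical action range.

```python
import random

def sample_beta_action(alpha: float, beta: float, low: float, high: float) -> float:
    """Sample a bounded continuous action from a Beta(alpha, beta) policy.

    Beta samples lie in [0, 1]; an affine map places them onto the
    physical action range [low, high] (e.g. a UAV velocity or
    transmit-power bound).
    """
    x = random.betavariate(alpha, beta)   # x in [0, 1]
    return low + (high - low) * x         # rescale to [low, high]
```

Because all of the Beta distribution's probability mass already lies inside the bounds, no clipping of out-of-range samples is needed, avoiding the boundary bias of an unbounded Gaussian policy.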
Indexed in: Wanfang Data, among other databases.