Online Trajectory Optimization for the UAV-Enabled Base Station Multicasting System Based on Reinforcement Learning
Cite this article: ZHANG Guangchi, YAN Yulin, CUI Miao, CHEN Wei, ZHANG Jing. Online Trajectory Optimization for the UAV-Enabled Base Station Multicasting System Based on Reinforcement Learning[J]. Journal of Electronics & Information Technology, 2022, 44(3): 969-975.
Authors: ZHANG Guangchi, YAN Yulin, CUI Miao, CHEN Wei, ZHANG Jing
Affiliations: 1. School of Information Engineering, Guangdong University of Technology, Guangzhou 510006, China; 2. Institute of Environmental Geology Exploration of Guangdong Province, Guangzhou 510080, China; 3. China Academy of Electronics and Information Technology, Beijing 100043, China
Funding: Guangdong Special Support Program; Science and Technology Planning Project of Guangdong Province
Received: 2021-05-19
Keywords: UAV base station; online trajectory optimization; reinforcement learning

Abstract: To address the communication delay minimization problem in an Unmanned Aerial Vehicle (UAV) enabled Base Station (BS) multicasting system, this paper proposes an online trajectory optimization algorithm. In this system, a UAV BS sends common information to multiple ground users simultaneously, and the locations of the ground users are random in each communication task. To ensure that every ground user receives the complete common information while accounting for the limited energy of the UAV, the paper takes as its objective the minimization of the average time the UAV BS needs to complete a communication task. The problem is first cast as a Markov Decision Process (MDP); the communication delay is then introduced into the action-value function; finally, the Q-Learning algorithm is used to learn and optimize the UAV trajectory online, thereby minimizing the average communication delay. Simulation results show that, compared with the benchmark schemes, the proposed scheme effectively optimizes the trajectory of the UAV BS online and reduces the time needed to complete the multicast task.
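
The abstract names the ingredients (an MDP, a delay-aware action-value function, and Q-Learning) but not the paper's actual state, action, or reward definitions. The following is only a minimal tabular Q-Learning sketch under assumptions that are not taken from the paper: a hypothetical 5x5 grid of UAV positions, five move actions, a toy distance-based rate model, a data demand of 3 units per user, and a reward of -1 per time slot so that maximizing return shortens the task. The state here is the UAV cell alone, which ignores user positions and remaining data, so it is a simplification rather than the authors' formulation.

```python
import math
import random

# All constants below are hypothetical and chosen only for illustration.
ACTIONS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]  # hover, east, west, north, south
GRID = 5         # UAV positions form a GRID x GRID lattice
ALTITUDE = 1.0   # fixed UAV altitude, in grid units
DEMAND = 3.0     # amount of common data every user must receive
MAX_SLOTS = 200  # safety cap on the length of one task


def rate(uav, user):
    """Toy rate model: log2(1 + 1/d^2), with d the UAV-to-user 3D distance."""
    d2 = (uav[0] - user[0]) ** 2 + (uav[1] - user[1]) ** 2 + ALTITUDE ** 2
    return math.log2(1.0 + 1.0 / d2)


def move(uav, action):
    """Apply a move action and clip the result to the grid."""
    dx, dy = ACTIONS[action]
    return (min(max(uav[0] + dx, 0), GRID - 1),
            min(max(uav[1] + dy, 0), GRID - 1))


def q_update(q, s, a, reward, s_next, done, alpha=0.1, gamma=0.95):
    """Standard Q-Learning update for a single transition."""
    target = reward if done else reward + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def run_task(q, rng, epsilon, learn=True):
    """Run one multicast task with random user locations; return its duration in slots."""
    users = [(rng.randrange(GRID), rng.randrange(GRID)) for _ in range(3)]
    received = [0.0] * len(users)
    uav = (0, 0)
    for slot in range(1, MAX_SLOTS + 1):
        if rng.random() < epsilon:  # explore
            a = rng.randrange(len(ACTIONS))
        else:                       # exploit the current estimate
            a = max(range(len(ACTIONS)), key=lambda i: q[uav][i])
        nxt = move(uav, a)
        received = [r + rate(nxt, u) for r, u in zip(received, users)]
        done = min(received) >= DEMAND  # task ends when the slowest user is served
        if learn:
            q_update(q, uav, a, -1.0, nxt, done)  # -1 per slot penalizes delay
        uav = nxt
        if done:
            return slot
    return MAX_SLOTS


def train(episodes=500, seed=0):
    """Train over many random tasks; return the Q-table and per-task durations."""
    rng = random.Random(seed)
    q = {(x, y): [0.0] * len(ACTIONS) for x in range(GRID) for y in range(GRID)}
    durations = [run_task(q, rng, epsilon=0.2) for _ in range(episodes)]
    return q, durations
```

Calling `q, durations = train()` and comparing the mean of the first and last hundred entries of `durations` gives a rough view of whether the learned policy shortens the average task. A richer state (for example, one that encodes user positions or the remaining data) would likely be needed to approach the behavior reported in the paper.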
This article is indexed in Wanfang Data and other databases.