
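A Novel UAV Path Planning Method Based on Reinforcement-Learning-Improved SAR
Cite this article: ZHOU Wen-juan, ZHANG Chao-qun, TANG Wei-dong, YI Yun-heng, LIU Wen-wu, QIN Wei-dong. A novel UAV path planning method based on reinforcement-learning-improved SAR[J]. Control and Decision, 2024, 39(4): 1203-1211.
Authors: ZHOU Wen-juan  ZHANG Chao-qun  TANG Wei-dong  YI Yun-heng  LIU Wen-wu  QIN Wei-dong
Affiliations: College of Artificial Intelligence, Guangxi Minzu University, Nanning 530006; College of Artificial Intelligence, Guangxi Minzu University, Nanning 530006; Guangxi Key Laboratory of Hybrid Computation and IC Design Analysis, Nanning 530006
Funding: National Natural Science Foundation of China (62062011); Guangxi Minzu University Graduate Innovation Program (gxun-chxs2021057).
Abstract: The search and rescue optimization algorithm (SAR), proposed in 2020, is a metaheuristic optimization algorithm that simulates search and rescue behavior and is used to solve constrained optimization problems in engineering. However, SAR converges slowly, and its individuals cannot adaptively select operations. To address these problems, a new SAR algorithm improved by reinforcement learning (RLSAR) is proposed. The algorithm redesigns the local search and global search operations of SAR and adds a path adjustment operation. A reinforcement learning model is trained with the asynchronous advantage actor-critic algorithm (A3C) so that SAR individuals gain the ability to select operators adaptively. All agents are trained in a dynamic environment in which the number, positions and sizes of the threat areas are randomly generated. Exploratory experiments are then carried out on the trained model from three aspects: the contribution of each action, the length of the paths planned under different threat areas, and the sequence of operations executed by each individual. The experimental results show that RLSAR converges faster than the standard SAR, the differential evolution algorithm and the squirrel search algorithm, and that it can successfully plan a more economical, safe and effective feasible path for a UAV in a randomly generated three-dimensional dynamic environment. This indicates that the proposed algorithm can serve as an effective UAV path planning method.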

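Keywords: reinforcement learning; search and rescue optimization algorithm; asynchronous advantage actor-critic algorithm; path planning; path adjustment; unmanned aerial vehicle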

A novel modified search and rescue optimization algorithm based on reinforcement learning for UAV path planning
ZHOU Wen-juan, ZHANG Chao-qun, TANG Wei-dong, YI Yun-heng, LIU Wen-wu, QIN Wei-dong. A novel modified search and rescue optimization algorithm based on reinforcement learning for UAV path planning[J]. Control and Decision, 2024, 39(4): 1203-1211.
Authors:ZHOU Wen-juan  ZHANG Chao-qun  TANG Wei-dong  YI Yun-heng  LIU Wen-wu  QIN Wei-dong
Affiliation: College of Artificial Intelligence, Guangxi Minzu University, Nanning 530006, China; College of Artificial Intelligence, Guangxi Minzu University, Nanning 530006, China; Guangxi Key Laboratory of Hybrid Computation and IC Design Analysis, Nanning 530006, China
Abstract: The search and rescue optimization algorithm (SAR), proposed in 2020, is a metaheuristic that simulates search and rescue behavior and is used to solve constrained engineering optimization problems. However, SAR converges slowly, and its individuals cannot adaptively select operations. To address this, a modified version of SAR based on reinforcement learning, namely RLSAR, is proposed. It redesigns the local search and global search operations of SAR and adds a path adjustment operation. The asynchronous advantage actor-critic algorithm (A3C) is used to train the reinforcement learning model so that SAR individuals acquire the ability to select operators adaptively. All agents are trained in a dynamic environment in which the number, locations and sizes of the threat areas are randomly generated. Exploratory experiments are then conducted on the trained model from three aspects: the contribution of each action, the path length planned under different threat areas, and the operation sequence executed by each individual. The results show that RLSAR converges faster than the standard SAR, the differential evolution algorithm and the squirrel search algorithm. Furthermore, it can successfully plan a more economical, safe and effective feasible path for an unmanned aerial vehicle (UAV) in a randomly generated three-dimensional dynamic environment, which indicates that the proposed algorithm can serve as an effective path planning method for UAVs.
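Keywords: reinforcement learning; search and rescue optimization algorithm; asynchronous advantage actor-critic algorithm; path planning; path adjustment; unmanned aerial vehicle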
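
Illustrative sketch: the abstract describes individuals that learn to choose among a local search, a global search and a path adjustment operation. The Python below is a minimal toy version of that idea, not the authors' RLSAR. It makes several assumptions that are not in the source: a single individual instead of a population, spherical threat areas, a stateless gradient-bandit (REINFORCE-style) policy substituted for A3C, and three hypothetical operators (`local_search`, `global_search`, `path_adjustment`) whose internals are invented for illustration. All function names, parameters and the penalty weight are hypothetical.

```python
import math
import random

def path_length(waypoints):
    """Total Euclidean length of a polyline of 3-D waypoints."""
    return sum(math.dist(a, b) for a, b in zip(waypoints, waypoints[1:]))

def inside_threat(point, threats):
    """True if the point lies inside any (center, radius) sphere."""
    return any(math.dist(point, c) < r for c, r in threats)

def cost(waypoints, threats, penalty=100.0):
    """Path length plus an assumed fixed penalty per waypoint inside a threat."""
    hits = sum(inside_threat(p, threats) for p in waypoints)
    return path_length(waypoints) + penalty * hits

def perturb(waypoints, scale, rng):
    """Move one interior waypoint by Gaussian noise; endpoints stay fixed."""
    new = [tuple(p) for p in waypoints]
    i = rng.randrange(1, len(new) - 1)
    new[i] = tuple(x + rng.gauss(0.0, scale) for x in new[i])
    return new

def local_search(waypoints, threats, rng):
    """Hypothetical small-step operator."""
    return perturb(waypoints, 0.5, rng)

def global_search(waypoints, threats, rng):
    """Hypothetical large-step operator."""
    return perturb(waypoints, 5.0, rng)

def path_adjustment(waypoints, threats, rng):
    """Hypothetical repair operator: push interior waypoints out of threat spheres."""
    new = [tuple(p) for p in waypoints]
    for i in range(1, len(new) - 1):
        for c, r in threats:
            d = math.dist(new[i], c)
            if 0.0 < d < r:
                # Rescale the offset from the center to 1.05 * radius.
                new[i] = tuple(ci + (x - ci) * (1.05 * r / d)
                               for x, ci in zip(new[i], c))
    return new

OPERATORS = [local_search, global_search, path_adjustment]

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def optimize(start, goal, threats, n_waypoints=5, steps=300, lr=0.1, seed=0):
    """Greedy search in which a learned softmax policy picks the operator."""
    rng = random.Random(seed)
    interior = [
        tuple(s + (g - s) * t / (n_waypoints + 1) for s, g in zip(start, goal))
        for t in range(1, n_waypoints + 1)
    ]
    path = [tuple(start)] + interior + [tuple(goal)]
    prefs = [0.0] * len(OPERATORS)
    best = cost(path, threats)
    for _ in range(steps):
        probs = softmax(prefs)
        k = rng.choices(range(len(OPERATORS)), weights=probs)[0]
        candidate = OPERATORS[k](path, threats, rng)
        c = cost(candidate, threats)
        reward = 1.0 if c < best else 0.0
        # Gradient-bandit update: raise the preference of operators that helped.
        for j in range(len(prefs)):
            indicator = 1.0 if j == k else 0.0
            prefs[j] += lr * reward * (indicator - probs[j])
        if c < best:
            path, best = candidate, c
    return path, best, softmax(prefs)

if __name__ == "__main__":
    threats = [((5.0, 5.0, 4.0), 2.0)]  # one assumed spherical threat area
    path, best, probs = optimize((0, 0, 0), (10, 10, 10), threats)
    print("final cost:", round(best, 3))
    print("operator probabilities:", [round(p, 3) for p in probs])
```

The returned probabilities show which operators the toy policy came to prefer in this one environment. A state-dependent actor-critic, as named in the abstract, would instead condition that choice on features of the individual and its surroundings.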