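Self-Learning Method of UAV Track Planning Strategy in Complex Environment with Multiple Constraints (多约束复杂环境下UAV航迹规划策略自学习方法)
Cite this article: QIU Yue, ZHENG Baitong, CAI Chao. 多约束复杂环境下UAV航迹规划策略自学习方法[J]. 计算机工程 (Computer Engineering), 2021, 47(5): 44-51.
Authors: QIU Yue (邱月)  ZHENG Baitong (郑柏通)  CAI Chao (蔡超)
Affiliation: National Key Laboratory for Multi-Spectral Information Processing Technologies, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074
Abstract: In complex multi-constraint environments, most unmanned aerial vehicle (UAV) track planning methods cannot obtain prior knowledge from historical experience, so they adapt poorly to changing environments. This paper proposes a self-learning method for track planning strategy based on deep reinforcement learning. The UAV state and action modes are designed from the flight constraints, reducing the scale of the track planning search in both width and depth; the reward and punishment function is designed from the track optimization objective; and a Monte Carlo tree search guided by a convolutional neural network...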

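Keywords: deep reinforcement learning  Monte Carlo tree search  track planning strategy  strategy self-learning  multiple constraints  complex environment
Received: 2020-02-25
Revised: 2020-04-28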

Self-Learning Method of UAV Track Planning Strategy in Complex Environment with Multiple Constraints
QIU Yue, ZHENG Baitong, CAI Chao. Self-Learning Method of UAV Track Planning Strategy in Complex Environment with Multiple Constraints[J]. Computer Engineering, 2021, 47(5): 44-51.
Authors:QIU Yue  ZHENG Baitong  CAI Chao
Affiliation:National Key Laboratory for Multi-Spectral Information Processing Technologies, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
Abstract: In complex multi-constraint environments, most Unmanned Aerial Vehicle (UAV) track planning methods cannot draw prior knowledge from historical experience, which leaves them poorly adapted to changing environments. To address this problem, this paper proposes a self-learning method for track planning strategy based on deep reinforcement learning. The UAV state and action modes are designed from the flight constraints so as to reduce both the width and the depth of the track planning search. The reward and punishment function is designed from the track optimization objective. A Monte Carlo Tree Search (MCTS) algorithm guided by a convolutional neural network is then used to learn the track planning strategy. Simulation results show that the strategy obtained by the proposed self-learning method generalizes. Compared with a network without iterative training, it needs only 17% as many NN-MCTS simulations to guide the UAV safely to the destination in an unknown environment, without collision and with all constraints satisfied.
Keywords: deep reinforcement learning  Monte Carlo Tree Search (MCTS)  track planning strategy  strategy self-learning  multiple constraints  complex environment
This article is indexed in VIP (维普), Wanfang Data (万方数据), and other databases.