Received:	2020-02-25
Revised:	2020-04-28

Self-Learning Method of UAV Track Planning Strategy in Complex Environment with Multiple Constraints

Cite this article:	QIU Yue, ZHENG Baitong, CAI Chao. Self-Learning Method of UAV Track Planning Strategy in Complex Environment with Multiple Constraints[J]. Computer Engineering, 2021, 47(5): 44-51.

Authors:	QIU Yue, ZHENG Baitong, CAI Chao

Affiliation:	National Key Laboratory for Multi-Spectral Information Processing Technologies, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China

Abstract:	In complex environments with multiple constraints, most Unmanned Aerial Vehicle (UAV) track planning methods cannot obtain prior knowledge from historical experience, so they adapt poorly to changing environments. To address this problem, this paper proposes a self-learning method for track planning strategies based on deep reinforcement learning. The UAV state and action modes are designed from the flight constraints, which reduces the scale of the track planning search in both breadth and depth, and a reward-and-penalty function is designed from the track optimization objectives. A Monte Carlo Tree Search (MCTS) algorithm guided by a convolutional neural network (NN-MCTS) is then used to learn the track planning strategy. Simulation results show that the learned strategy is able to generalize. Compared with a network without iterative training, the strategy obtained by this method needs only 17% as many NN-MCTS simulations to guide the UAV safely to its destination in an unknown environment, without collisions and while satisfying the constraints.

Keywords:	deep reinforcement learning; Monte Carlo Tree Search (MCTS); track planning strategy; strategy self-learning; complex environment with multiple constraints
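The abstract names the ingredients (states and actions shaped by flight constraints, a reward-and-penalty function, and MCTS guided by a convolutional network) but not their exact form. What follows is a minimal, hypothetical sketch in Python, not the authors' implementation. It assumes a 10 x 10 grid, a turn limit of ±45° per step (narrowing the search), a MAX_STEPS cap on track length (bounding its depth), a ±1 terminal reward, and a distance heuristic in `prior_and_value` standing in for the trained CNN. Node selection uses the AlphaZero-style PUCT rule, a common choice for network-guided MCTS; the abstract does not say which selection rule the paper uses. All names (`State`, `prior_and_value`, `puct_select`, `simulate`, `plan`) are illustrative.

```python
import math
from dataclasses import dataclass, field

# Hypothetical grid world standing in for the paper's environment.
GRID = 10                                     # map is GRID x GRID cells
GOAL = (8, 8)
OBSTACLES = {(4, 4), (4, 5), (5, 4), (5, 5)}
MAX_STEPS = 30                                # depth limit: longest allowed track
# Eight headings; index i points i * 45 degrees counter-clockwise from +x.
HEADINGS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]
TURNS = (-1, 0, 1)                            # assumed turn limit: -45, 0 or +45 degrees


@dataclass(frozen=True)
class State:
    x: int
    y: int
    heading: int  # index into HEADINGS
    steps: int    # number of moves taken so far


def step(s: State, turn: int) -> State:
    h = (s.heading + turn) % 8
    dx, dy = HEADINGS[h]
    return State(s.x + dx, s.y + dy, h, s.steps + 1)


def terminal_reward(s: State):
    """Assumed reward: +1 at the goal; -1 on collision, leaving the map or hitting MAX_STEPS."""
    if (s.x, s.y) == GOAL:
        return 1.0
    if not (0 <= s.x < GRID and 0 <= s.y < GRID) or (s.x, s.y) in OBSTACLES:
        return -1.0
    if s.steps >= MAX_STEPS:
        return -1.0
    return None  # not terminal


def prior_and_value(s: State):
    """Placeholder for the CNN: uniform move priors and a distance-based value in [-1, 1]."""
    dist = math.hypot(GOAL[0] - s.x, GOAL[1] - s.y)
    value = 1.0 - 2.0 * dist / math.hypot(GRID, GRID)
    return {t: 1.0 / len(TURNS) for t in TURNS}, value


@dataclass
class Node:
    prior: float
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)  # turn -> Node

    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def puct_select(node: Node, c_puct: float = 1.5):
    """Pick the (turn, child) pair maximizing Q + c * P * sqrt(N_parent) / (1 + N_child)."""
    sqrt_n = math.sqrt(node.visits)
    return max(node.children.items(),
               key=lambda kv: kv[1].q() + c_puct * kv[1].prior * sqrt_n / (1 + kv[1].visits))


def simulate(root: Node, root_state: State) -> None:
    """One MCTS simulation: select down the tree, expand/evaluate the leaf, back up the value."""
    node, s, path = root, root_state, [root]
    while node.children:
        turn, node = puct_select(node)
        s = step(s, turn)
        path.append(node)
    value = terminal_reward(s)
    if value is None:  # non-terminal leaf: expand it using the placeholder evaluator
        priors, value = prior_and_value(s)
        node.children = {t: Node(p) for t, p in priors.items()}
    for n in path:  # single agent, so no sign flip during backup
        n.visits += 1
        n.value_sum += value


def plan(start: State, n_sims: int = 200):
    """Run MCTS from the current state, commit to the most-visited turn, repeat until terminal."""
    s, track = start, [(start.x, start.y)]
    while terminal_reward(s) is None:
        root = Node(prior=1.0)
        for _ in range(n_sims):
            simulate(root, s)
        turn = max(root.children, key=lambda t: root.children[t].visits)
        s = step(s, turn)
        track.append((s.x, s.y))
    return track, terminal_reward(s)


if __name__ == "__main__":
    track, reward = plan(State(0, 0, heading=1, steps=0))
    print("reward:", reward)
    print("track:", track)
```

Restricting each step to three turn options is what narrows the tree: a 10-move lookahead over 3 options has 3^10 = 59,049 move sequences, versus 8^10 = 1,073,741,824 if all eight headings were allowed. The sketch omits the iterative network training the abstract refers to and keeps the placeholder evaluator fixed.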
This article is indexed in databases including VIP and Wanfang Data.