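Maneuver decision of UCAV in air combat based on deep reinforcement learning
Cite this article: LI Yongfeng, SHI Jingping, ZHANG Weiguo, JIANG Wei. Maneuver decision of UCAV in air combat based on deep reinforcement learning[J]. Journal of Harbin Institute of Technology, 2021, 53(12): 33-41.
Authors: LI Yongfeng (李永丰)  SHI Jingping (史静平)  ZHANG Weiguo (章卫国)  JIANG Wei (蒋维)
Affiliations: School of Automation, Northwestern Polytechnical University, Xi'an 710029; School of Automation, Northwestern Polytechnical University, Xi'an 710029; Shaanxi Province Key Laboratory of Flight Control and Simulation Technology (Northwestern Polytechnical University), Xi'an 710029
Funding: National Natural Science Foundation of China (7,6,61573286); Natural Science Foundation of Shaanxi Province (2019JM-3,0JQ-218)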
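Abstract: When an unmanned combat aerial vehicle (UCAV) makes autonomous maneuver decisions in air combat, it faces large-scale computation and is easily affected by the opponent's unpredictable maneuvering. To address this problem, an autonomous air combat maneuver decision model for UCAVs based on a deep reinforcement learning algorithm is proposed. With this algorithm, the UCAV can make maneuver decisions autonomously during air combat to gain an advantageous position. First, based on the aircraft control system, a six-degree-of-freedom UCAV model was built on the MATLAB/Simulink simulation platform, and suitable air combat actions were selected as the maneuver outputs. On this basis, the decision model for autonomous air combat maneuvering was designed: a combat evaluation model was constructed from the relative motion of the two sides, the extent of the missile attack zone was analyzed, and the corresponding advantage function was used as the evaluation criterion for deep reinforcement learning. The UCAV was then trained in stages of increasing difficulty, and the optimal maneuver control commands were analyzed through a study of the deep Q network. As a result, the UCAV can select appropriate maneuvers in different situations, assess the battlefield situation independently, and make tactical decisions, thereby improving combat effectiveness. Simulation results show that the method enables the UCAV to select tactical actions autonomously in air combat and to reach an advantageous position quickly, greatly improving its combat efficiency.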

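Keywords: unmanned combat aerial vehicle  deep reinforcement learning  autonomous air combat maneuver decision  six-degree-of-freedom  advantage function  deep Q network
Received: 2020/5/22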

Maneuver decision of UCAV in air combat based on deep reinforcement learning
LI Yongfeng, SHI Jingping, ZHANG Weiguo, JIANG Wei. Maneuver decision of UCAV in air combat based on deep reinforcement learning[J]. Journal of Harbin Institute of Technology, 2021, 53(12): 33-41.
Authors: LI Yongfeng  SHI Jingping  ZHANG Weiguo  JIANG Wei
Abstract: When an unmanned combat aerial vehicle (UCAV) makes autonomous maneuver decisions in air combat, it faces large-scale computation and is susceptible to the opponent's uncertain maneuvering. To tackle these problems, this study proposes a decision-making model for autonomous UCAV maneuvering in air combat based on a deep reinforcement learning algorithm. With this algorithm, the UCAV can autonomously make maneuver decisions during air combat to achieve a dominant position. First, based on the aircraft control system, a six-degree-of-freedom UCAV model was built on the MATLAB/Simulink simulation platform, and appropriate air combat actions were selected as the maneuver output. On this basis, the decision-making model for autonomous UCAV maneuvering in air combat was designed. An operational evaluation model was constructed from the relative motion of the two sides, the range of the missile attack area was analyzed, and the corresponding advantage function was taken as the evaluation basis for deep reinforcement learning. The UCAV was then trained in stages, from easy to difficult, and the optimal maneuver control command was analyzed by investigating the deep Q network. The UCAV could thereby select suitable maneuver actions in different situations and evaluate the battlefield situation independently, making tactical decisions that improve combat effectiveness. Simulation results suggest that the proposed method enables the UCAV to choose tactical actions independently in air combat and to reach a dominant position quickly, which greatly improves its combat efficiency.
Keywords: unmanned combat aerial vehicle (UCAV)  deep reinforcement learning  autonomous maneuver decision in air combat  six-degree-of-freedom  advantage function  deep Q network
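
The abstract describes scoring the situation with an advantage function derived from the relative motion of the two aircraft and choosing among discrete maneuvers with a deep Q network. The Python sketch below illustrates that general idea only; it is not the authors' implementation. The maneuver names, the angle-only form of the advantage, the epsilon-greedy rule, and all function names are assumptions made for illustration, and the missile attack zone, the six-degree-of-freedom dynamics, and the network itself are omitted.

```python
import numpy as np

# Hypothetical maneuver library; the abstract does not list the actual actions.
MANEUVERS = ("hold", "accelerate", "decelerate",
             "turn_left", "turn_right", "climb", "dive")


def _angle_between(u, v):
    """Angle in radians between two non-zero vectors."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def angle_advantage(own_pos, own_vel, enemy_pos, enemy_vel):
    """Illustrative angular advantage in [0, 1] (assumed form, not the paper's).

    1.0 means we are behind the opponent with our nose on it; 0.0 means the
    roles are reversed. Positions must differ so the line of sight is defined.
    """
    los = np.asarray(enemy_pos, float) - np.asarray(own_pos, float)
    # Angle between our velocity and the line of sight to the opponent.
    ata = _angle_between(np.asarray(own_vel, float), los)
    # Angle between the opponent's velocity and the same line of sight.
    aa = _angle_between(np.asarray(enemy_vel, float), los)
    return 1.0 - (ata + aa) / (2.0 * np.pi)


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a maneuver index: random with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target that a deep Q network is regressed toward."""
    if done:
        return float(reward)
    return float(reward) + gamma * float(np.max(next_q_values))
```

In this sketch, a UCAV directly behind an opponent flying in the same direction scores 1.0, and a head-on geometry scores 0.5. The advantage value would serve as the reward passed to `td_target`, and the index returned by `epsilon_greedy` would select an entry of `MANEUVERS`.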
This article is indexed in Wanfang Data and other databases.