
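Loitering Munition Penetration Control Decision Based on Deep Reinforcement Learning
Cite this article:GAO Ang,DONG Zhiming,YE Hongbing,SONG Jinghua,GUO Qisheng.Loitering Munition Penetration Control Decision Based on Deep Reinforcement Learning[J].Acta Armamentarii,2021,42(5):1101-1110.
Authors:GAO Ang  DONG Zhiming  YE Hongbing  SONG Jinghua  GUO Qisheng
Affiliation:(1.Military Exercise and Training Center, Army Academy of Armored Forces, Beijing 100072, China; 2.Xiangnan University, Chenzhou 423099, Hunan, China)
Funding:Military Scientific Research Program (41405030302, 41401020301)
Abstract:The loitering munition penetration control decision (LMPCD) problem is an important research direction under the “multi-domain war” operational concept. To address this problem, an LMPCD model based on a Markov decision process is established. The LMPCD function and the flight state-action value function are fitted, an LMPCD framework based on the actor-critic method is constructed, a method for solving the deep reinforcement learning model with the deep deterministic policy gradient algorithm is given, and an optimal decision network for loitering munition penetration control is generated. In 1 000 loitering munition penetration simulation tests, the mission success rate was 82.1% and the average decision time was 1.48 ms, which verifies the effectiveness of the LMPCD model and its solution process.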

Keywords:loitering munition  deep reinforcement learning  Markov decision process  penetration  control decision

Loitering Munition Penetration Control Decision Based on Deep Reinforcement Learning
GAO Ang,DONG Zhiming,YE Hongbing,SONG Jinghua,GUO Qisheng.Loitering Munition Penetration Control Decision Based on Deep Reinforcement Learning[J].Acta Armamentarii,2021,42(5):1101-1110.
Authors:GAO Ang  DONG Zhiming  YE Hongbing  SONG Jinghua  GUO Qisheng
Affiliation:(1.Military Exercise and Training Center, Army Academy of Armored Forces, Beijing 100072, China; 2.Xiangnan University, Chenzhou 423099, Hunan, China)
Abstract:Loitering munition penetration control decision (LMPCD) is an important research direction under the “multi-domain war” concept, and real-time route planning for loitering munition penetration is of considerable military significance. Traditional knowledge-, reasoning-, and planning-based methods cannot explore and discover new knowledge outside their predefined framework. Bionic optimization methods are suited to path planning in static environments, such as the traveling salesman problem, and are difficult to apply to loitering munition penetration, which involves a highly dynamic environment and demands real-time decision-making. To address the limitations of these two classes of methods, the applicability of deep reinforcement learning is analyzed, and domain knowledge of loitering munitions is introduced into each element of the deep reinforcement learning algorithm. The flight motion model of the loitering munition is analyzed; its state space, action space, and reward function are designed; the algorithm framework for LMPCD is analyzed; and the training process of the LMPCD algorithm is designed. In a simulation test of 1 000 loitering munition penetration runs, the penetration success rate was 82.1% and the average decision time was 1.48 ms, which verifies the effectiveness of the algorithm training process and the control decision model.
Keywords:loitering munition  deep reinforcement learning  Markov decision process  penetration  control decision
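
The abstract names the deep deterministic policy gradient algorithm within an actor-critic framework but gives no implementation details. As a rough illustration only, the sketch below shows two standard ingredients of that algorithm family: the critic's bootstrapped regression target and the soft (Polyak) update of target-network weights. The function names, the discount factor `gamma`, and the mixing rate `tau` are assumptions chosen for illustration; they are not taken from the paper and do not represent the authors' model.

```python
from typing import List, Sequence


def td_target(reward: float, next_q: float, done: bool, gamma: float = 0.99) -> float:
    """Critic regression target y = r + gamma * Q'(s', mu'(s')).

    next_q is assumed to be the target critic's value for the next state and
    the target actor's action. No bootstrapping when the episode has ended.
    """
    return reward + (0.0 if done else gamma * next_q)


def soft_update(target: Sequence[float], online: Sequence[float], tau: float = 0.005) -> List[float]:
    """Polyak averaging: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]


if __name__ == "__main__":
    # Hypothetical numbers, only to exercise the two functions.
    print(td_target(reward=1.0, next_q=2.0, done=False, gamma=0.5))
    print(soft_update(target=[0.0, 0.0], online=[1.0, 2.0], tau=0.5))
```

In a full training loop, the critic would be regressed toward `td_target` on sampled transitions, and `soft_update` would be applied to both target networks after each gradient step.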
This article is indexed in CNKI, Wanfang Data, and other databases.