
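Intelligent Wargame Deduction Based on the BN-DDPG Lightweight Reinforcement Learning Algorithm
Cite this article:李卓远, 张德平. Intelligent Wargame Deduction Based on the BN-DDPG Lightweight Reinforcement Learning Algorithm[J]. 计算机系统应用 (Computer Systems & Applications), 2023, 32(4): 293-299
Authors:李卓远 (LI Zhuo-Yuan)  张德平 (ZHANG De-Ping)
Affiliation:College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106
Funding:National Defense Basic Scientific Research Fund (JCKY2020605C003)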
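Abstract (translated from the Chinese original):Combining wargaming with intelligent algorithms has become a research hotspot in military applications. Using deep reinforcement learning (DRL) to make the decision process in simulated deduction intelligent can markedly reduce the influence of human experience on decisions and improve the efficiency and flexibility of deduction. Existing DRL-based decision models take too long to train and consume too much computing power to meet the real-time requirements of combat tasks. This paper proposes an intelligent deduction method based on a lightweight deep deterministic policy gradient algorithm (BN-DDPG). Following the deduction rules, a Markov decision process describes the decision behavior during deduction, and an agent training network is built on the actor-critic architecture, in which the actor network uses a custom hybrid binary neural network to reduce computation. A dual-buffer-pool structure is built from the state and return value of each experience sample, and samples are drawn by prioritizing environmental similarity, which improves training efficiency. Finally, the method is validated on a self-developed simulation deduction platform. The results show that BN-DDPG simplifies model training, speeds up convergence, and significantly improves the accuracy of deduction decisions.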

Keywords:intelligent deduction  deep reinforcement learning  binary neural network  autonomous decision-making
Received:2022-08-25
Revised:2022-09-27

Intelligent Wargame Deduction Based on BN-DDPG Lightweight Reinforcement Learning Algorithm
LI Zhuo-Yuan, ZHANG De-Ping. Intelligent Wargame Deduction Based on BN-DDPG Lightweight Reinforcement Learning Algorithm[J]. Computer Systems & Applications, 2023, 32(4): 293-299
Authors:LI Zhuo-Yuan  ZHANG De-Ping
Affiliation:College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Abstract:The integration of wargaming and intelligent algorithms has become a research hotspot in the field of military applications. Using deep reinforcement learning (DRL) to make the decision-making process in simulation deduction intelligent can significantly reduce the impact of human experience on that process and improve deduction efficiency and flexibility. Because of their long training time and high computational cost, existing decision-making models based on DRL algorithms cannot meet the real-time requirements of combat tasks. This study introduces an intelligent deduction method based on the lightweight binary neural network-deep deterministic policy gradient (BN-DDPG) algorithm. According to the deduction rules, a Markov decision process is used to describe the decision behavior during deduction. Building on the actor-critic architecture, an agent training network is constructed in which the actor network uses a custom hybrid binary neural network to reduce the amount of computation. In addition, a dual-buffer-pool structure is built according to the state and return value of experience samples, and samples are drawn by prioritizing environmental similarity to improve training efficiency. Finally, the method is validated with a case study on a self-developed simulation deduction platform. The results show that the BN-DDPG algorithm can simplify the model training process, accelerate model convergence, and significantly improve the accuracy of deduction decisions.
Keywords:intelligent deduction  deep reinforcement learning  binary neural network  autonomous decision-making
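
The abstract names two mechanisms without giving their details: binarized weights in the actor network, and a dual buffer pool sampled by environmental similarity. The following is a minimal sketch of both ideas, not the authors' implementation. Every specific here is an assumption made for illustration: sign-based binarization, splitting the two pools by a reward threshold, measuring similarity as the inverse of Euclidean distance between states, and all names (`binarize`, `binary_linear`, `DualReplayBuffer`, `reward_threshold`, `high_fraction`).

```python
import math
import random


def binarize(values):
    """Map each real value to +1.0 or -1.0 (sign function; 0 maps to +1.0)."""
    return [1.0 if v >= 0 else -1.0 for v in values]


def binary_linear(inputs, weights):
    """Forward pass of one layer whose weights are binarized before use.

    `weights` holds one row per output unit. With weights restricted to +/-1,
    each product reduces to adding or subtracting an input.
    """
    return [sum(w * x for w, x in zip(binarize(row), inputs)) for row in weights]


class DualReplayBuffer:
    """Two experience pools, split by a reward threshold (assumed rule)."""

    def __init__(self, reward_threshold=0.0):
        self.reward_threshold = reward_threshold
        self.high = []  # transitions with reward >= threshold
        self.low = []   # all other transitions

    def add(self, state, action, reward, next_state):
        pool = self.high if reward >= self.reward_threshold else self.low
        pool.append((state, action, reward, next_state))

    def sample(self, current_state, batch_size, high_fraction=0.5, rng=random):
        """Draw a batch from both pools, favouring states near `current_state`."""
        n_high = round(batch_size * high_fraction)
        if not self.high:
            n_high = 0           # nothing to draw from the high pool
        if not self.low:
            n_high = batch_size  # nothing to draw from the low pool
        n_low = batch_size - n_high
        return (self._draw(self.high, current_state, n_high, rng)
                + self._draw(self.low, current_state, n_low, rng))

    @staticmethod
    def _draw(pool, current_state, k, rng):
        if k == 0 or not pool:
            return []
        # Assumed similarity measure: 1 / (1 + Euclidean distance), always > 0.
        weights = [1.0 / (1.0 + math.dist(t[0], current_state)) for t in pool]
        # random.choices samples with replacement, proportional to the weights.
        return rng.choices(pool, weights=weights, k=k)
```

In a full DDPG training loop, a layer like `binary_linear` would typically need a gradient approximation for the non-differentiable sign function, which this forward-only sketch leaves out.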