Deep Attention Reinforcement Learning Method Based on an Autoregressive Prediction Model
Cite this article: LIANG Xing-Xing, FENG Yang-He, HUANG Jin-Cai, WANG Qi, MA Yang, LIU Zhong. Deep attention reinforcement learning method based on an autoregressive prediction model[J]. Journal of Software, 2020, 31(4): 948-966.
Authors: LIANG Xing-Xing  FENG Yang-He  HUANG Jin-Cai  WANG Qi  MA Yang  LIU Zhong
Affiliation: College of Systems Engineering, National University of Defense Technology, Changsha 410072, China
Funding: National Natural Science Foundation of China (71701205)
Abstract: In recent years, deep reinforcement learning has demonstrated strong intelligence and broad applicability in a variety of decision-making and planning problems, with success stories such as AlphaGo, OpenAI Five, and AlphaStar. However, conventional deep reinforcement learning's heavy dependence on computing resources and its inefficient use of data severely limit its application to complex real-world tasks. Traditional model-based reinforcement learning algorithms learn the latent dynamics of the environment and can therefore exploit sample information fully, raising data efficiency and speeding up training; how to build an accurate environment model quickly, however, remains the central difficulty of the model-based approach. Combining the advantages of model-based and model-free reinforcement learning, this paper proposes a deep attention reinforcement learning method based on a temporal autoregressive prediction model. An autoencoder compresses observations into a latent state space, an autoregressive model built on those latents serves as the environment prediction model, an attention mechanism combines the model's predictions to estimate the value function of each decision state, and all modules are trained jointly end to end for efficient training. Experiments on classic control tasks such as CartPole-V0 show that the model builds an environment prediction model efficiently and combines model-based and model-free reinforcement learning effectively, achieving high sample efficiency. Finally, an empirical study on the intelligent missile-penetration planning problem shows that the proposed learning model outperforms traditional penetration planning in specific scenarios.
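The world-model half of the method pairs a variational autoencoder with a mixture-density-network RNN. Below is a minimal PyTorch sketch of those two components under assumed sizes (a CartPole-like 4-dimensional observation, an 8-dimensional latent, a 5-component mixture); the module names and dimensions are illustrative assumptions, not the authors' released code.

```python
# Sketch of the two world-model components named in the abstract: a VAE that
# compresses observations into a latent state, and an MDN-RNN that acts as the
# autoregressive environment (prediction) model over those latents.
import torch
import torch.nn as nn

OBS_DIM, LATENT_DIM, ACTION_DIM, HIDDEN, N_MIX = 4, 8, 1, 64, 5  # assumed sizes

class VAE(nn.Module):
    """Compresses an observation into a Gaussian latent state z."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(OBS_DIM, 2 * LATENT_DIM)   # outputs (mu, log_var)
        self.dec = nn.Linear(LATENT_DIM, OBS_DIM)

    def forward(self, obs):
        mu, log_var = self.enc(obs).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterization
        return self.dec(z), z, mu, log_var

class MDNRNN(nn.Module):
    """Autoregressive model: predicts a mixture density over the next latent."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(LATENT_DIM + ACTION_DIM, HIDDEN, batch_first=True)
        self.mdn = nn.Linear(HIDDEN, N_MIX * (2 * LATENT_DIM + 1))

    def forward(self, z, action, hidden=None):
        out, hidden = self.rnn(torch.cat([z, action], dim=-1), hidden)
        logits, mu_next, log_sigma = self.mdn(out).split(
            [N_MIX, N_MIX * LATENT_DIM, N_MIX * LATENT_DIM], dim=-1)
        return logits, mu_next, log_sigma, hidden

obs = torch.randn(1, 1, OBS_DIM)                       # one observation step
recon, z, mu, log_var = VAE()(obs)                     # compress to latent z
logits, mu_next, log_sigma, h = MDNRNN()(z, torch.zeros(1, 1, ACTION_DIM))
```

In training, the VAE would be fit with a reconstruction-plus-KL loss and the MDN-RNN with the negative log-likelihood of the next latent under the predicted mixture; per the abstract, the paper trains all modules jointly end to end.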

Keywords: attention mechanism; deep reinforcement learning; actor-critic algorithm; variational auto-encoder; mixture density network-recurrent neural network
Received: 2019-05-31
Revised: 2019-07-29

Novel Deep Reinforcement Learning Algorithm Based on Attention-based Value Function and Autoregressive Environment Model
LIANG Xing-Xing, FENG Yang-He, HUANG Jin-Cai, WANG Qi, MA Yang, LIU Zhong. Novel Deep Reinforcement Learning Algorithm Based on Attention-based Value Function and Autoregressive Environment Model[J]. Journal of Software, 2020, 31(4): 948-966.
Authors:LIANG Xing-Xing  FENG Yang-He  HUANG Jin-Cai  WANG Qi  MA Yang  LIU Zhong
Affiliation: College of Systems Engineering, National University of Defense Technology, Changsha 410072, China
Abstract: Deep reinforcement learning (DRL) has recently shown promise in sequential decision-making and intelligent scheduling problems, and examples such as AlphaGo, OpenAI Five, and AlphaStar demonstrate the strong generalization capability of the paradigm. However, DRL's inefficient use of collected experience restricts its extension to more practical scenarios and complicated tasks. As a complement, model-based reinforcement learning can capture the dynamics of the environment well and reduce the amount of experience that must be sampled. This paper aggregates model-based and model-free reinforcement learning algorithms into an end-to-end framework in which an autoregressive environment model is constructed and an attention layer is incorporated to forecast the state value function. Experiments on classic control tasks such as CartPole-V0 confirm the effectiveness of the proposed framework in simulating the environment and improving the utility of the collected dataset. Finally, a missile-penetration planning mission, as a practical instantiation, is completed successfully with the framework.
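The attention layer mentioned here can be read as the current latent state attending over latents rolled out by the environment model to produce a value estimate. The module below is a hedged PyTorch illustration of that idea; the class name, head count, and shapes are assumptions, not the paper's exact architecture.

```python
# Sketch of an attention-based value head: the critic attends from the current
# latent state over model-predicted future latents to estimate V(s).
import torch
import torch.nn as nn

class AttentionCritic(nn.Module):
    def __init__(self, latent_dim=8, hidden=64):
        super().__init__()
        self.attn = nn.MultiheadAttention(latent_dim, num_heads=1, batch_first=True)
        self.v = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, 1))

    def forward(self, z_now, z_pred):
        # z_now: (B, 1, D) current latent; z_pred: (B, T, D) model rollouts.
        ctx, _ = self.attn(z_now, z_pred, z_pred)  # query the predicted futures
        return self.v(ctx.squeeze(1))              # scalar state value

critic = AttentionCritic()
value = critic(torch.randn(2, 1, 8), torch.randn(2, 5, 8))  # (B, 1) values
```

In an actor-critic setup of the kind the keywords indicate, this value head would replace the usual feed-forward critic, letting the model-based rollouts inform the model-free update.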
Keywords: attention mechanism; deep reinforcement learning; actor-critic algorithm; variational auto-encoder (VAE); mixture density network-recurrent neural network (MDN-RNN)