Transfer reinforcement learning algorithm with double Q-learning
Cite this article: Zeng Rui, Zhou Jian, Liu Manlu, Zhang Junjun, Chen Zhuo. Transfer reinforcement learning algorithm with double Q-learning[J]. Application Research of Computers, 2021, 38(6): 1699-1703. DOI: 10.19734/j.issn.1001-3695.2020.09.0232
Authors: Zeng Rui  Zhou Jian  Liu Manlu  Zhang Junjun  Chen Zhuo
Affiliations: School of Manufacturing Science and Engineering, Southwest University of Science and Technology, Mianyang, Sichuan 621000, China; Sichuan Provincial Key Laboratory of Robot Technology Used for Special Environment, Southwest University of Science and Technology, Mianyang, Sichuan 621000, China; School of Information Engineering, Southwest University of Science and Technology, Mianyang, Sichuan 621000, China
Funding: National "13th Five-Year" Nuclear Energy Development Project (20161295); National Science and Technology Major Project (2019ZX06002022)
Abstract: Deep reinforcement learning explores a large number of environment samples during training, so the algorithm takes a long time to converge; reusing or transferring knowledge learned on a previous task (the source task) has the potential to speed up convergence when the algorithm learns a new task (the target task). To improve learning efficiency, this paper proposes a transfer reinforcement learning algorithm with double Q-network learning. Based on the actor-critic framework, it transfers knowledge of the source task's optimal value function so that the value-function network in the target task evaluates the policy more accurately and guides the policy to update quickly toward the optimal policy. Applied to OpenAI Gym and to an experiment in which a manipulator reaches a target position in three-dimensional space, the algorithm achieves better results than conventional deep reinforcement learning algorithms; the experiments show that it converges faster and explores more stably during training.

Keywords: deep reinforcement learning  double Q-network learning  actor-critic framework  transfer learning
Received: 2020-09-06
Revised: 2021-05-09

Transfer reinforcement learning algorithm with double Q-learning
Zeng Rui, Zhou Jian, Liu Manlu, Zhang Junjun and Chen Zhuo. Transfer reinforcement learning algorithm with double Q-learning[J]. Application Research of Computers, 2021, 38(6): 1699-1703. DOI: 10.19734/j.issn.1001-3695.2020.09.0232
Authors: Zeng Rui  Zhou Jian  Liu Manlu  Zhang Junjun  Chen Zhuo
Affiliation: Southwest University of Science and Technology, Mianyang, Sichuan 621000, China
Abstract: Deep reinforcement learning explores a large number of environment samples during training, which causes the algorithm to take a long time to converge. Reusing or transferring the knowledge learned on a previous task (the source task) has the potential to speed up convergence when the algorithm learns a new task (the target task). To improve learning efficiency, this paper proposed a transfer reinforcement learning algorithm with double Q-learning. Based on the actor-critic framework, the algorithm transferred the knowledge of the source task's optimal value function so that the value-function network of the target task evaluated the policy more accurately and guided the policy to update quickly toward the optimal policy. In OpenAI Gym and in an experiment where a manipulator reaches a target position in three-dimensional space, the algorithm achieved better results than conventional deep reinforcement learning algorithms. Experiments show that the proposed transfer reinforcement learning algorithm with double Q-learning converges faster and explores more stably during training.
Keywords: deep reinforcement learning (DRL)   double Q-learning   actor-critic framework   transfer learning (TL)
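The transfer mechanism described in the abstract — two Q estimators in the double Q-learning style, one of them warm-started with value-function knowledge from a source task — can be sketched in a minimal tabular form. Everything below (the chain MDP, the `q_source` table, all names and constants) is an illustrative assumption for exposition, not the paper's actual network-based implementation:

```python
import random

random.seed(0)

N, GAMMA, ALPHA, EPS = 5, 0.9, 0.5, 0.1  # states, discount, step size, exploration

def step(s, a):
    """Toy chain dynamics: action 1 moves right, 0 moves left; reward 1 at the right end."""
    s2 = max(0, min(N - 1, s + (1 if a == 1 else -1)))
    return s2, float(s2 == N - 1), s2 == N - 1

# Stand-in for knowledge transferred from the source task: Q-values whose
# greedy policy already points toward the goal (hypothetical, filled analytically).
q_source = [[0.0, GAMMA ** (N - 2 - s)] for s in range(N - 1)] + [[0.0, 0.0]]

qa = [row[:] for row in q_source]    # estimator warm-started from the source task
qb = [[0.0, 0.0] for _ in range(N)]  # estimator learned from scratch on the target task

for _ in range(200):
    s = 0
    for _ in range(50):
        if random.random() < EPS:    # epsilon-greedy over the sum of both estimators
            a = random.randrange(2)
        else:
            a = max(range(2), key=lambda x: qa[s][x] + qb[s][x])
        s2, r, done = step(s, a)
        if random.random() < 0.5:    # update A using B's evaluation of A's argmax
            a_star = max(range(2), key=lambda x: qa[s2][x])
            qa[s][a] += ALPHA * (r + (0.0 if done else GAMMA * qb[s2][a_star]) - qa[s][a])
        else:                        # update B using A's evaluation of B's argmax
            b_star = max(range(2), key=lambda x: qb[s2][x])
            qb[s][a] += ALPHA * (r + (0.0 if done else GAMMA * qa[s2][b_star]) - qb[s][a])
        s = s2
        if done:
            break

greedy = [max(range(2), key=lambda x: qa[s][x] + qb[s][x]) for s in range(N - 1)]
print(greedy)  # greedy policy in the non-terminal states; 1 means "move right"
```

In the paper the two estimators are neural networks acting as critics inside an actor-critic loop; the tabular sketch only illustrates how a transferred value function can warm-start one of the double-Q estimators so that early policy evaluations already point toward good actions.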
This article is indexed in Wanfang Data and other databases.