Transfer reinforcement learning algorithm with double Q-learning
Cite this article: Zeng Rui, Zhou Jian, Liu Manlu, Zhang Junjun, Chen Zhuo. Transfer reinforcement learning algorithm with double Q-learning[J]. Application Research of Computers, 2021, 38(6): 1699-1703. DOI: 10.19734/j.issn.1001-3695.2020.09.0232
Authors: Zeng Rui  Zhou Jian  Liu Manlu  Zhang Junjun  Chen Zhuo
Affiliations: School of Manufacturing Science and Engineering, Southwest University of Science and Technology, Mianyang, Sichuan 621000, China; Sichuan Provincial Key Laboratory of Robot Technology Used for Special Environment, Southwest University of Science and Technology, Mianyang, Sichuan 621000, China; School of Information Engineering, Southwest University of Science and Technology, Mianyang, Sichuan 621000, China
Funding: National "13th Five-Year" Nuclear Energy Development Project (20161295); National Science and Technology Major Project (2019ZX06002022)
Abstract: Deep reinforcement learning explores a large number of environment samples during training, so the algorithm takes a long time to converge; reusing or transferring knowledge learned on a previous task (the source task) has the potential to speed up convergence when the algorithm learns a new task (the target task). To improve learning efficiency, this paper proposes a transfer reinforcement learning algorithm with double Q-network learning. Based on the actor-critic framework, it transfers knowledge of the source task's optimal value function so that the value-function network in the target task evaluates the policy more accurately and guides the policy to update quickly toward the optimal policy. Applied to OpenAI Gym and to an experiment in which a manipulator reaches a target position in three-dimensional space, the algorithm achieves better results than conventional deep reinforcement learning algorithms; the experiments show that it converges faster and explores more stably during training.

Keywords: deep reinforcement learning  double Q-network learning  actor-critic framework  transfer learning
Received: 2020-09-06
Revised: 2021-05-09

Transfer reinforcement learning algorithm with double Q-learning
Zeng Rui, Zhou Jian, Liu Manlu, Zhang Junjun and Chen Zhuo. Transfer reinforcement learning algorithm with double Q-learning[J]. Application Research of Computers, 2021, 38(6): 1699-1703. DOI: 10.19734/j.issn.1001-3695.2020.09.0232
Authors: Zeng Rui  Zhou Jian  Liu Manlu  Zhang Junjun  Chen Zhuo
Affiliation: Southwest University of Science and Technology, Mianyang, Sichuan 621000, China
Abstract: Deep reinforcement learning explores a large number of environment samples during training, which causes the algorithm to take a long time to converge. Reusing or transferring the knowledge learned on a previous task (the source task) has the potential to speed up convergence when the algorithm learns a new task (the target task). To improve learning efficiency, this paper proposed a transfer reinforcement learning algorithm with double Q-learning. Based on the actor-critic framework, the algorithm transferred the knowledge of the source task's optimal value function so that the value-function network of the target task evaluated the policy more accurately and guided the policy to update quickly toward the optimal policy. In OpenAI Gym and in an experiment where a manipulator reaches a target position in three-dimensional space, the algorithm achieved better results than conventional deep reinforcement learning algorithms. Experiments show that the proposed transfer reinforcement learning algorithm with double Q-learning converges faster and explores more stably during training.
Keywords: deep reinforcement learning (DRL)   double Q-learning   actor-critic framework   transfer learning (TL)
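The transfer mechanism described in the abstract — two Q estimators in the double Q-learning style, one of them warm-started with value-function knowledge from a source task — can be sketched in a minimal tabular form. Everything below (the chain MDP, the `q_source` table, all names and constants) is an illustrative assumption for exposition, not the paper's actual network-based implementation:

```python
import random

random.seed(0)

N, GAMMA, ALPHA, EPS = 5, 0.9, 0.5, 0.1  # states, discount, step size, exploration

def step(s, a):
    """Toy chain dynamics: action 1 moves right, 0 moves left; reward 1 at the right end."""
    s2 = max(0, min(N - 1, s + (1 if a == 1 else -1)))
    return s2, float(s2 == N - 1), s2 == N - 1

# Stand-in for knowledge transferred from the source task: Q-values whose
# greedy policy already points toward the goal (hypothetical, filled analytically).
q_source = [[0.0, GAMMA ** (N - 2 - s)] for s in range(N - 1)] + [[0.0, 0.0]]

qa = [row[:] for row in q_source]    # estimator warm-started from the source task
qb = [[0.0, 0.0] for _ in range(N)]  # estimator learned from scratch on the target task

for _ in range(200):
    s = 0
    for _ in range(50):
        if random.random() < EPS:    # epsilon-greedy over the sum of both estimators
            a = random.randrange(2)
        else:
            a = max(range(2), key=lambda x: qa[s][x] + qb[s][x])
        s2, r, done = step(s, a)
        if random.random() < 0.5:    # update A using B's evaluation of A's argmax
            a_star = max(range(2), key=lambda x: qa[s2][x])
            qa[s][a] += ALPHA * (r + (0.0 if done else GAMMA * qb[s2][a_star]) - qa[s][a])
        else:                        # update B using A's evaluation of B's argmax
            b_star = max(range(2), key=lambda x: qb[s2][x])
            qb[s][a] += ALPHA * (r + (0.0 if done else GAMMA * qa[s2][b_star]) - qb[s][a])
        s = s2
        if done:
            break

greedy = [max(range(2), key=lambda x: qa[s][x] + qb[s][x]) for s in range(N - 1)]
print(greedy)  # greedy policy in the non-terminal states; 1 means "move right"
```

In the paper the two estimators are neural networks acting as critics inside an actor-critic loop; the tabular sketch only illustrates how a transferred value function can warm-start one of the double-Q estimators so that early policy evaluations already point toward good actions.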
This article is indexed in Wanfang Data and other databases.