Dual-Network DQN Algorithm Based on Second-Order Temporal Difference Error
Cite this article: CHEN Jianping, ZHOU Xin, FU Qiming, GAO Zhen, FU Baochuan, WU Hongjie. Dual-Network DQN Algorithm Based on Second-Order Temporal Difference Error[J]. Computer Engineering, 2020, 46(5): 78-85, 93.
Authors: CHEN Jianping, ZHOU Xin, FU Qiming, GAO Zhen, FU Baochuan, WU Hongjie
Affiliation: School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China; Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
Fund projects: Jiangsu Provincial Key Research and Development Program; Postgraduate Research and Practice Innovation Program of Jiangsu Province; National Natural Science Foundation of China
Abstract: To address the poor convergence stability of the Deep Q-Network (DQN) algorithm caused by overestimation, the concept of the N-order Temporal Difference (TD) error is proposed on the basis of the traditional TD error, and a dual-network DQN algorithm based on the second-order TD error is designed. A value function update formula based on the second-order TD error is constructed, and a dual-network model is built on top of the DQN algorithm, yielding two isomorphic value function networks that represent the value functions of two successive rounds and cooperatively update the network parameters, so as to improve the stability of value function estimation in the DQN algorithm. Experimental results on the OpenAI Gym platform show that the proposed algorithm achieves better convergence stability than the classical DQN algorithm on the Mountain Car and Cart Pole problems.

Keywords: Deep Reinforcement Learning (DRL); Markov Decision Process (MDP); Deep Q-Network (DQN); second-order Temporal Difference (TD) error; gradient descent
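For readers who want a concrete picture of the update the abstract describes, below is a minimal Python (PyTorch) sketch of one gradient-descent step driven by a second-order TD error computed from two isomorphic Q-networks. Since this record does not give the paper's formulas, the specific definition used here (the second-order error as the difference between the first-order TD errors computed with the current-round and previous-round networks) and all identifiers (q_curr, q_prev, second_order_td_loss, the layer sizes) are illustrative assumptions, not the authors' exact formulation.

import torch
import torch.nn as nn

# Hypothetical sketch: two isomorphic Q-networks, one holding the
# current-round estimate (q_curr) and one the previous-round estimate
# (q_prev), as the abstract describes.  The second-order TD error is
# ASSUMED here to be the difference of the two first-order TD errors;
# the paper's exact formula may differ.

def make_q_net(n_obs: int, n_act: int) -> nn.Module:
    return nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(), nn.Linear(64, n_act))

n_obs, n_act, gamma = 4, 2, 0.99            # Cart Pole-like dimensions
q_curr = make_q_net(n_obs, n_act)           # value function of round k
q_prev = make_q_net(n_obs, n_act)           # value function of round k-1
q_prev.load_state_dict(q_curr.state_dict())
optim = torch.optim.Adam(q_curr.parameters(), lr=1e-3)

def second_order_td_loss(s, a, r, s2, done):
    """First-order TD errors under both networks, then their difference."""
    q_sa = q_curr(s).gather(1, a)                          # Q_k(s, a)
    with torch.no_grad():
        target_curr = r + gamma * (1 - done) * q_curr(s2).max(1, keepdim=True).values
        target_prev = r + gamma * (1 - done) * q_prev(s2).max(1, keepdim=True).values
        q_sa_prev = q_prev(s).gather(1, a)                 # Q_{k-1}(s, a)
    delta_curr = target_curr - q_sa                        # first-order TD error, round k
    delta_prev = target_prev - q_sa_prev                   # first-order TD error, round k-1
    delta2 = delta_curr - delta_prev                       # assumed second-order TD error
    return (delta2 ** 2).mean()

# One gradient-descent step on a dummy transition batch.
s  = torch.randn(8, n_obs); s2 = torch.randn(8, n_obs)
a  = torch.randint(n_act, (8, 1)); r = torch.randn(8, 1)
done = torch.zeros(8, 1)
loss = second_order_td_loss(s, a, r, s2, done)
optim.zero_grad(); loss.backward(); optim.step()
q_prev.load_state_dict(q_curr.state_dict())  # roll round k into round k-1

Copying q_curr into q_prev after each step stands in for the abstract's "two successive rounds"; the paper's cooperative parameter update between the two networks may be more elaborate than this hard copy.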
