Improved algorithm for deep Q net
Citation: Xia Zongtao, Qin Jin. Improved algorithm for deep Q net [J]. Application Research of Computers, 2019, 36(12).
Authors: Xia Zongtao, Qin Jin
Affiliation: College of Computer Science and Technology, Guizhou University, Guiyang, Guizhou
Funding: National Natural Science Foundation of China (61562009)
Abstract: The deep Q-network (DQN) suffers from a serious overestimation problem, which reduces the agent's ability to find the optimal policy. To mitigate this problem, this paper proposed a correction function that improves the evaluation function of the deep Q-network. When the selected action is the optimal action, the correction function equals 1 and the current state-action value is left unchanged; when the selected action is not optimal, the correction function is less than 1 and the current state-action value is scaled down. This widens the gap between optimal and non-optimal state-action values, reducing the impact of overestimation. Experiments show that the improved algorithm achieves better performance on Atari 2600 video games and in OpenAI Gym, indicating that it finds a better policy than the deep Q-network.
Keywords: deep Q net  overestimation  correction function  state-action value
Received: 2018-07-25  Revised: 2019-11-13
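The abstract describes the correction mechanism only qualitatively; the exact form of the correction function is not given here. A minimal sketch of the idea, assuming a hypothetical constant scaling factor `beta` < 1 for non-optimal actions (the paper's actual correction function may differ):

```python
def corrected_q(q_values, selected_action, beta=0.9):
    """Sketch of the correction idea from the abstract: keep the value of
    the greedy (optimal) action unchanged, and shrink the values of all
    other actions by a factor less than 1, widening the gap between
    optimal and non-optimal state-action values.

    beta is a hypothetical constant, not taken from the paper.
    """
    # Index of the action with the highest estimated state-action value.
    optimal_action = max(range(len(q_values)), key=lambda a: q_values[a])
    # Correction function: 1 for the optimal action, beta (< 1) otherwise.
    correction = 1.0 if selected_action == optimal_action else beta
    return correction * q_values[selected_action]

q = [1.0, 3.0, 2.0]
print(corrected_q(q, 1))  # optimal action: value unchanged
print(corrected_q(q, 2))  # non-optimal action: value scaled down
```

In a DQN training loop, such a correction would be applied when computing target values, so that overestimated non-optimal actions contribute less to the bootstrapped target.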