Improved algorithm for deep Q net
Citation: Xia Zongtao, Qin Jin. Improved algorithm for deep Q net [J]. Application Research of Computers, 2019, 36(12).
Authors: Xia Zongtao, Qin Jin
Affiliation: College of Computer Science and Technology, Guizhou University, Guiyang, Guizhou
Funding: National Natural Science Foundation of China (61562009)
Abstract: The deep Q-network (DQN) suffers from a serious overestimation problem, which reduces the agent's ability to find the optimal policy. To mitigate this problem, this paper proposed a correction function that improves the evaluation function of the deep Q-network. When the selected action is the optimal action, the correction function equals 1 and the current state-action value is left unchanged; when the selected action is not optimal, the correction function is less than 1 and the current state-action value is scaled down. This widens the gap between optimal and non-optimal state-action values, reducing the impact of overestimation. Experiments show that the improved algorithm achieves better performance on Atari 2600 video games and in OpenAI Gym, indicating that it finds a better policy than the deep Q-network.
Keywords: deep Q net  overestimation  correction function  state-action value
Received: 2018-07-25  Revised: 2019-11-13
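The abstract describes the correction mechanism only qualitatively; the exact form of the correction function is not given here. A minimal sketch of the idea, assuming a hypothetical constant scaling factor `beta` < 1 for non-optimal actions (the paper's actual correction function may differ):

```python
def corrected_q(q_values, selected_action, beta=0.9):
    """Sketch of the correction idea from the abstract: keep the value of
    the greedy (optimal) action unchanged, and shrink the values of all
    other actions by a factor less than 1, widening the gap between
    optimal and non-optimal state-action values.

    beta is a hypothetical constant, not taken from the paper.
    """
    # Index of the action with the highest estimated state-action value.
    optimal_action = max(range(len(q_values)), key=lambda a: q_values[a])
    # Correction function: 1 for the optimal action, beta (< 1) otherwise.
    correction = 1.0 if selected_action == optimal_action else beta
    return correction * q_values[selected_action]

q = [1.0, 3.0, 2.0]
print(corrected_q(q, 1))  # optimal action: value unchanged
print(corrected_q(q, 2))  # non-optimal action: value scaled down
```

In a DQN training loop, such a correction would be applied when computing target values, so that overestimated non-optimal actions contribute less to the bootstrapped target.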