Reinforcement Exploration Strategy Based on Best Sub-Strategy Memory
Citation: ZHOU Ruipeng, QIN Jin. Reinforcement Exploration Strategy Based on Best Sub-Strategy Memory[J]. Computer Engineering, 2022, 48(2): 106-112.
Authors: ZHOU Ruipeng, QIN Jin
Affiliation: College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
Funding: National Natural Science Foundation of China (61562009); Science and Technology Foundation of Guizhou Province (Qiankehe Support [2020]3Y004)
Abstract: Existing reinforcement learning exploration strategies suffer from over-exploration, which slows the convergence of the agent. To address this problem, a reward-sorted storage table (M table) is designed and the ε-greedy algorithm is improved, yielding a reinforcement exploration strategy based on best sub-strategy memory. Samples with reward values greater than zero are stored in the M table in the form of sub-strategies and sorted in descending order of reward. Throughout training, sub-strategies in the table are replaced by similar samples with higher rewards, so the table gradually accumulates a set of actions that effectively produce the current best rewards, making exploration targeted rather than random. On top of the ε-greedy algorithm, the exploration probability is split so that the agent also explores by consulting the M table, which yields the M-Epsilon-Greedy (MEG) exploration strategy. Under this strategy, with a certain probability the agent matches the current state against the sub-strategies in the M table; if a similar sub-strategy is found, its associated action is returned to the agent and executed. Experimental results show that the proposed strategy effectively alleviates over-exploration and, compared with DQN-series algorithms and the non-DQN A2C algorithm, achieves higher average rewards on Atari 2600 game control problems.
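The following is a minimal Python sketch of the MEG mechanism described in the abstract, assuming cosine similarity as the state-similarity measure and a fixed probability split between M-table lookup and uniform random exploration; the class name MEGExploration, its method names, and all default parameter values are illustrative assumptions rather than the paper's actual implementation.

import random

import numpy as np


class MEGExploration:
    # Sketch of the M-table plus epsilon-greedy (MEG) idea: positive-reward
    # sub-strategies are kept sorted by reward, and a share of the exploration
    # probability is spent looking up a similar stored state instead of acting
    # uniformly at random.

    def __init__(self, capacity=100, epsilon=0.1, meg_prob=0.5, sim_threshold=0.9):
        self.capacity = capacity            # maximum number of stored sub-strategies
        self.epsilon = epsilon              # total exploration probability
        self.meg_prob = meg_prob            # share of exploration routed through the M table
        self.sim_threshold = sim_threshold  # minimum cosine similarity for a match
        self.m_table = []                   # entries: (reward, state, action), sorted by reward desc

    def store(self, state, action, reward):
        # Only samples with reward greater than zero are kept as sub-strategies.
        if reward <= 0:
            return
        self.m_table.append((float(reward), np.asarray(state, dtype=np.float32), int(action)))
        self.m_table.sort(key=lambda entry: entry[0], reverse=True)
        del self.m_table[self.capacity:]    # drop the lowest-reward entries beyond capacity

    def _similar_action(self, state):
        # Return the action of the highest-reward stored state that is similar enough.
        s = np.asarray(state, dtype=np.float32).ravel()
        for _, stored, action in self.m_table:      # already in descending reward order
            t = stored.ravel()
            denom = np.linalg.norm(s) * np.linalg.norm(t)
            if denom > 0 and float(np.dot(s, t)) / denom >= self.sim_threshold:
                return action
        return None

    def select_action(self, state, greedy_action, n_actions):
        # Exploit with probability 1 - epsilon; otherwise explore, preferring the M table.
        if random.random() >= self.epsilon:
            return greedy_action
        if random.random() < self.meg_prob:
            action = self._similar_action(state)
            if action is not None:
                return action
        return random.randrange(n_actions)

In a DQN-style agent, select_action would stand in for the usual ε-greedy choice, with greedy_action taken from the argmax of the Q-network and store called whenever a transition yields a positive reward.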

Keywords: reinforcement learning; over-exploration; M-Epsilon-Greedy (MEG) exploration; similarity; best sub-strategy
Received: 2020-12-04
Revised: 2021-01-28
