A Class of Forgetting Algorithms for Value-Based Reinforcement Learning (一类值函数激励学习的遗忘算法)

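Cite this article:	CHEN Huan-wen, XIE Li-juan, XIE Jian-Ping. A class of forgetting algorithms for value-based reinforcement learning [J]. Journal of Computer Research and Development, 2001, 38(4): 487-494.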

Authors:	CHEN Huan-wen, XIE Li-juan, XIE Jian-Ping

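Affiliations:	1. Department of Mathematics and Computer Science, Changsha University of Electric Power; 2. Network Center, Changsha Communications University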

Funding:	Supported by the National Natural Science Foundation of China (60075019)

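Abstract:	Reinforcement learning of value functions over large state spaces is currently a topic of intense interest and considerable difficulty in the international reinforcement learning community. This paper introduces basic principles of forgetting from the psychology of memory into value-function reinforcement learning, yielding a class of forgetting algorithms suited to it. The paper first briefly introduces the basic concepts for solving Markov decision problems, compares off-policy and on-policy reinforcement learning algorithms, and outlines the standard SARSA(λ) algorithm. After analyzing some characteristics of human memory and forgetting, it proposes an intelligent forgetting criterion and then modifies SARSA(λ) into Forget-SARSA(λ), an algorithm with a forgetting capability. Finally, experimental results are presented.
Keywords:	reinforcement learning; SARSA(λ) algorithm; Markov decision process; forgetting algorithm; value function; artificial intelligence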
A CLASS OF FORGETTING ALGORITHMS FOR THE VALUE-BASED REINFORCEMENT LEARNING

CHEN Huan-wen,XIE Li-juan,XIE Jian-Ping.A CLASS OF FORGETTING ALGORITHMS FOR THE VALUE-BASED REINFORCEMENT LEARNING[J].Journal of Computer Research and Development,2001,38(4):487-494.

Authors:	CHEN Huan-wen XIE Li-juan XIE Jian-Ping

Abstract:	Solving problems with large state spaces is one of the interesting and difficult challenges in current reinforcement learning (RL). This paper combines the basic principles of forgetting from memory psychology with value-based RL, yielding a class of forgetting algorithms suited to such problems. It briefly introduces the basic concepts for solving Markov decision problems, compares off-policy and on-policy algorithms, and outlines the standard SARSA(λ) method. After analyzing some characteristics of human memory and forgetting, it proposes a forgetting rule for the RL agent and extends SARSA(λ) into Forget-SARSA(λ), a variant with a forgetting function. Finally, experimental results are presented.

This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
