
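A Class of Forgetting Algorithms for Value-Function Reinforcement Learning
Citation: CHEN Huan-wen, XIE Li-juan, XIE Jian-ping. A class of forgetting algorithms for value-function reinforcement learning[J]. Journal of Computer Research and Development, 2001, 38(4): 487-494.
Authors: CHEN Huan-wen, XIE Li-juan, XIE Jian-ping
Affiliations: 1. Department of Mathematics and Computer Science, Changsha University of Electric Power
2. Network Center, Changsha Communications University
Funding: Supported by the National Natural Science Foundation of China (60075019)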
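Abstract: Value-function reinforcement learning over large state spaces is one of the most active and difficult problems in current international reinforcement learning research. This paper introduces the basic principles of forgetting from memory psychology into value-function reinforcement learning, producing a class of forgetting algorithms suited to it. First, the basic concepts for solving Markov decision problems are briefly introduced, the differences between off-policy and on-policy reinforcement learning algorithms are compared, and the standard SARSA(λ) algorithm is outlined. After some characteristics of human memory and forgetting are analyzed, an intelligent forgetting criterion is proposed, and the SARSA(λ) algorithm is then extended into a Forget-SARSA(λ) algorithm with a forgetting function. Finally, experimental results are presented.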

Keywords: reinforcement learning; SARSA(λ) algorithm; Markov decision process; forgetting algorithm; value function; artificial intelligence
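To make the off-policy/on-policy distinction mentioned in the abstract concrete, here is a minimal illustrative sketch (not code from the paper) contrasting the one-step bootstrap targets of SARSA, an on-policy method, and Q-learning, an off-policy method. The function names and the toy Q-table are hypothetical.

```python
def sarsa_target(Q, r, s_next, a_next, gamma):
    """On-policy (SARSA): bootstrap from the action the behavior policy actually takes next."""
    return r + gamma * Q[s_next][a_next]


def q_learning_target(Q, r, s_next, gamma):
    """Off-policy (Q-learning): bootstrap from the greedy action, whatever action is executed."""
    return r + gamma * max(Q[s_next])


# Hypothetical 2-state, 2-action Q-table (rows: states, columns: actions).
Q = [[0.0, 0.0],
     [1.0, 5.0]]

# Suppose an exploring (e.g. epsilon-greedy) policy picks action 0 in the next state 1.
print(sarsa_target(Q, r=0.0, s_next=1, a_next=0, gamma=0.9))  # 0.9
print(q_learning_target(Q, r=0.0, s_next=1, gamma=0.9))       # 4.5
```

The two targets coincide whenever the executed next action is the greedy one; they differ exactly when an exploring policy deviates from the greedy policy, which is why an on-policy method learns the value of the policy it actually follows.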

A CLASS OF FORGETTING ALGORITHMS FOR THE VALUE-BASED REINFORCEMENT LEARNING
CHEN Huan-wen, XIE Li-juan, XIE Jian-ping. A CLASS OF FORGETTING ALGORITHMS FOR THE VALUE-BASED REINFORCEMENT LEARNING[J]. Journal of Computer Research and Development, 2001, 38(4): 487-494.
Authors: CHEN Huan-wen, XIE Li-juan, XIE Jian-ping
Abstract: One of the interesting and difficult problems in current reinforcement learning (RL) research is solving problems with large-scale state spaces. The basic principles of forgetting from memory psychology are combined with value-based reinforcement learning, yielding a class of forgetting algorithms suited to overcoming these RL problems. In this paper, the basic concepts for solving Markov decision problems are briefly introduced, the differences between off-policy and on-policy algorithms are compared, and the standard SARSA(λ) method is outlined. After some characteristics of human memory and forgetting are analyzed, a forgetting rule for the RL agent is proposed, and the SARSA(λ) algorithm is then extended into a Forget-SARSA(λ) algorithm with a forgetting function. Finally, experimental results are presented.
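Keywords: reinforcement learning; SARSA(λ) algorithm; Markov decision process; forgetting algorithm; value function; artificial intelligence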
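The abstract outlines the standard SARSA(λ) algorithm that Forget-SARSA(λ) builds on but does not state the forgetting rule itself, so the sketch below covers only the standard tabular baseline: accumulating eligibility traces with ε-greedy action selection. It is an illustration under assumptions, not the authors' implementation; the chain environment (`chain_step`), the helper names, and all parameter values are hypothetical, and no forgetting mechanism is included.

```python
import random


def epsilon_greedy(Q, s, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.randrange(len(Q[s]))
    best = max(Q[s])
    return rng.choice([a for a, q in enumerate(Q[s]) if q == best])


def chain_step(s, a, n_states):
    """Hypothetical toy environment: a 1-D chain where action 1 moves right and action 0 moves left.
    Entering the rightmost state yields reward 1 and ends the episode."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    done = s_next == n_states - 1
    return s_next, (1.0 if done else 0.0), done


def sarsa_lambda(n_states=6, n_actions=2, episodes=200, alpha=0.1, gamma=0.95,
                 lam=0.9, epsilon=0.1, max_steps=1000, seed=0):
    """Tabular SARSA(lambda) with accumulating eligibility traces (standard baseline only)."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        # Eligibility traces are reset at the start of every episode.
        E = [[0.0] * n_actions for _ in range(n_states)]
        s = 0
        a = epsilon_greedy(Q, s, epsilon, rng)
        for _ in range(max_steps):
            s_next, r, done = chain_step(s, a, n_states)
            if done:
                delta = r - Q[s][a]  # terminal transition: nothing to bootstrap from
            else:
                # On-policy: the next action comes from the same epsilon-greedy policy
                # and is the action actually executed on the next step.
                a_next = epsilon_greedy(Q, s_next, epsilon, rng)
                delta = r + gamma * Q[s_next][a_next] - Q[s][a]
            E[s][a] += 1.0  # accumulating trace for the visited pair
            # Every pair moves in proportion to its trace; traces then decay by gamma * lambda.
            for i in range(n_states):
                for j in range(n_actions):
                    Q[i][j] += alpha * delta * E[i][j]
                    E[i][j] *= gamma * lam
            if done:
                break
            s, a = s_next, a_next
    return Q


Q = sarsa_lambda()
# Greedy action in each non-terminal state (1 = move right toward the reward).
greedy = [max(range(2), key=lambda a: Q[s][a]) for s in range(5)]
print(greedy)
```

Sweeping the whole table on every step costs O(|S||A|) per step, which is acceptable for a toy table but is exactly the kind of cost that grows with large state spaces, the setting the abstract targets.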