A Class of Generalized Algorithms of Average-Reward Reinforcement Learning Based on Eligibility Traces

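Cite this article:	Chen Huanwen, Xie Jianping. A Class of Generalized Algorithms of Average-Reward Reinforcement Learning Based on Eligibility Traces[J]. Computer Engineering and Applications, 2002, 38(1): 65-68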

Authors:	Chen Huanwen, Xie Jianping

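Affiliations:	1. Department of Mathematics and Computer Science, Changsha University of Electric Power, Changsha 410077; 2. Network Center, Changsha Communications University, Changsha 410076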

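Funding:	National Natural Science Foundation of China; Scientific Research Fund of the Hunan Provincial Department of Education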

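Abstract:	The assumption of unichain or communicating MDPs in average-reward reinforcement learning is dropped. Building on eligibility traces and the discounted-reward SARSA(λ) algorithm, traditional average-reward reinforcement learning is generalized into a class of generalized average-reward reinforcement learning algorithms, and preliminary comparative experiments on the performance of these algorithms are reported.
Keywords:	reinforcement learning; Markov decision processes; average reward; eligibility traces
Article ID:	1002-8331-(2002)01-0065-04
Revised:	2001-06-01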
A Class of Generalized Algorithms of Average-Reward Reinforcement Learning Based on Eligibility Traces

Chen Huanwen, Xie Jianping. A Class of Generalized Algorithms of Average-Reward Reinforcement Learning Based on Eligibility Traces[J]. Computer Engineering and Applications, 2002, 38(1): 65-68

Authors:	Chen Huanwen, Xie Jianping

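Affiliation:	1. Department of Mathematics and Computer Science, Changsha University of Electric Power, Changsha 410077; 2. Network Center, Changsha Communications University, Changsha 410076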

Abstract:	The assumption of unichain or communicating MDPs in average-reward reinforcement learning is removed. Classical average-reward reinforcement learning methods are generalized using eligibility traces and the discounted SARSA(λ) algorithm. A class of generalized algorithms for average-reward reinforcement learning is proposed, and preliminary empirical results comparing the performance of these new algorithms are presented.

Keywords:	Reinforcement learning; Markov decision processes (MDPs); Average rewards; Eligibility traces