
A Class of Generalized Algorithms of Average-Reward Reinforcement Learning Based on Eligibility Traces
Cite this article: Chen Huanwen, Xie Jianping. A Class of Generalized Algorithms of Average-Reward Reinforcement Learning Based on Eligibility Traces[J]. Computer Engineering and Applications, 2002, 38(1): 65-68
Authors: Chen Huanwen, Xie Jianping
Affiliations: 1. Department of Mathematics and Computer Science, Changsha University of Electric Power, Changsha 410077
2. Network Center, Changsha Communications University, Changsha 410076
Funding: National Natural Science Foundation of China; Scientific Research Fund of the Hunan Provincial Education Department
Article ID: 1002-8331-(2002)01-0065-04
Revised: 2001-06-01

Abstract: The assumption that the MDP is unichain or communicating, which average-reward reinforcement learning normally requires, is dropped. Building on eligibility traces and the discounted-reward SARSA(λ) algorithm, the classical average-reward reinforcement learning methods are generalized, yielding a class of generalized average-reward reinforcement learning algorithms. Preliminary experiments comparing the performance of the new algorithms are presented.

Keywords: Reinforcement learning; Markov decision processes (MDPs); Average reward; Eligibility traces
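
For orientation, the ingredients named in the abstract (SARSA(λ), eligibility traces, an average-reward criterion) can be combined as in the minimal tabular sketch below. It is a generic textbook-style illustration, not the algorithms proposed in the paper, whose details this record does not give. The function name `avg_reward_sarsa_lambda`, the toy environment `two_state_step`, all parameter values, and the choice to estimate the average reward as a running mean of observed rewards are assumptions made for this example.

```python
import random


def two_state_step(state, action):
    """Hypothetical toy MDP: staying in state 1 pays 1, everything else pays 0."""
    if action == "stay":
        return (1.0 if state == 1 else 0.0), state
    return 0.0, 1 - state  # "go" switches between state 0 and state 1


def avg_reward_sarsa_lambda(step, actions, start, n_steps=2000,
                            alpha=0.1, beta=0.01, lam=0.5,
                            epsilon=0.1, seed=0):
    """Tabular average-reward SARSA with accumulating eligibility traces."""
    rng = random.Random(seed)
    q = {}       # action-value estimates, missing entries count as 0
    trace = {}   # eligibility trace per (state, action) pair
    rho = 0.0    # running estimate of the average reward per step

    def value(s, a):
        return q.get((s, a), 0.0)

    def choose(s):
        # Epsilon-greedy action selection (assumed exploration scheme).
        if rng.random() < epsilon:
            return rng.choice(actions)
        return max(actions, key=lambda a: value(s, a))

    s, a = start, choose(start)
    for _ in range(n_steps):
        r, s2 = step(s, a)
        a2 = choose(s2)
        # Average-reward TD error: the reward is measured relative to rho
        # instead of discounting the next value.
        delta = r - rho + value(s2, a2) - value(s, a)
        rho += beta * (r - rho)
        trace[(s, a)] = trace.get((s, a), 0.0) + 1.0
        for key in list(trace):
            q[key] = q.get(key, 0.0) + alpha * delta * trace[key]
            trace[key] *= lam  # traces decay by lambda only (no discount)
        s, a = s2, a2
    return q, rho


if __name__ == "__main__":
    q, rho = avg_reward_sarsa_lambda(two_state_step, ["stay", "go"], start=0)
    print("estimated average reward:", rho)
```

The only structural difference from discounted SARSA(λ) in this sketch is the TD error, which subtracts the average-reward estimate `rho` rather than multiplying the next value by a discount factor.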
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.