A Multi-Step R-Learning Algorithm
Cite this article: Hu Guanghua, Wu Cangpu. A multi-step R-learning algorithm[J]. Journal of Beijing Institute of Technology (English Edition), 1999, 8(3): 245-250.
Authors: Hu Guanghua, Wu Cangpu
Affiliation: Department of Automatic Control, Beijing Institute of Technology, Beijing 100081, China
Funding: National Natural Science Foundation of China (Grant No. 69674005)
Abstract: Aim — To study reinforcement learning algorithms for controlled Markov chains under the average-reward criterion, that is, to find by trial and error, without prior knowledge of the state-transition matrix or the reward function, an optimal control policy that maximizes the long-run expected average reward per stage. Methods — By combining the one-step learning algorithm for average-reward problems with temporal-difference learning, a multi-step reinforcement learning algorithm, the R(λ)-learning algorithm, is proposed. Results and Conclusion — The existing R-learning algorithm becomes the special case of the new algorithm with λ = 0, and the new algorithm is a natural extension of the discounted-reward Q(λ)-learning algorithm to average-reward problems. Simulation results show that R(λ)-learning with intermediate values of λ clearly outperforms one-step R-learning.

Keywords: reinforcement learning; average reward; R-learning; Markov decision processes; temporal-difference learning
Received: 1998-10-13

Incremental Multi-Step R-Learning
Hu Guanghua, Wu Cangpu. Incremental Multi-Step R-Learning[J]. Journal of Beijing Institute of Technology, 1999, 8(3): 245-250.
Authors:Hu Guanghua and Wu Cangpu
Affiliation: Department of Automatic Control, Beijing Institute of Technology, Beijing 100081, China
Abstract: Aim — To investigate a model-free, multi-step reinforcement learning algorithm for the average-reward criterion. Methods — By combining the R-learning algorithm with temporal-difference (TD(λ)) learning for average-reward problems, a novel incremental algorithm, R(λ)-learning, was proposed. Results and Conclusion — The proposed algorithm is a natural extension of Q(λ)-learning, the multi-step discounted-reward reinforcement learning algorithm, to the average-reward case. Simulation results show that R(λ)-learning with intermediate λ values yields a significant performance improvement over one-step R-learning.
Keywords: reinforcement learning; average reward; R-learning; Markov decision processes; temporal difference learning
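
The abstract describes R(λ)-learning only at a high level. Below is a minimal tabular sketch of the idea, combining R-learning (average-reward Q-learning) with TD(λ)-style eligibility traces. The toy environment, the reset/step interface, the hyperparameter names, and the Watkins-style trace cut after exploratory actions are illustrative assumptions, not the paper's own notation or experimental setup.

import numpy as np

class TwoStateMDP:
    """Toy 2-state, 2-action MDP used only to exercise the sketch (assumed, not from the paper)."""
    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
        self.s = 0
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):
        # action 1 tends to move to the other state, where the reward is higher
        if a == 1 and self.rng.random() < 0.9:
            self.s = 1 - self.s
        r = 2.0 if self.s == 1 else 0.1
        return self.s, r, False

def r_lambda_learning(env, n_states, n_actions, steps=20000,
                      alpha=0.1, beta=0.01, lam=0.5, epsilon=0.1, seed=0):
    """Sketch of R(lambda)-learning: R-learning plus eligibility traces."""
    rng = np.random.default_rng(seed)
    R = np.zeros((n_states, n_actions))    # relative (average-adjusted) action values
    e = np.zeros_like(R)                   # eligibility traces
    rho = 0.0                              # running estimate of the average reward per stage
    s = env.reset()
    for _ in range(steps):
        greedy = rng.random() >= epsilon
        a = int(R[s].argmax()) if greedy else int(rng.integers(n_actions))
        s_next, r, done = env.step(a)
        # temporal-difference error under the average-reward criterion
        delta = r - rho + R[s_next].max() - R[s, a]
        e[s, a] += 1.0                     # accumulating trace for the visited pair
        R += alpha * delta * e             # multi-step backup through all traced pairs
        if greedy:
            rho += beta * delta            # only greedy steps update the average-reward estimate
            e *= lam                       # decay traces; lam = 0 recovers one-step R-learning
        else:
            e[:] = 0.0                     # cut traces after an exploratory action (Watkins-style)
        s = env.reset() if done else s_next
    return R, rho

R, rho = r_lambda_learning(TwoStateMDP(), n_states=2, n_actions=2)
print("estimated average reward:", round(rho, 3))

With λ = 0 the trace decay removes all multi-step credit and the update reduces to one-step R-learning, which matches the paper's statement that R-learning is the λ = 0 special case of R(λ)-learning.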