A Multi-Step R-Learning Algorithm
Cite this article: Hu Guanghua, Wu Cangpu. A multi-step R-learning algorithm[J]. Journal of Beijing Institute of Technology (English Edition), 1999, 8(3): 245-250.
Authors: Hu Guanghua, Wu Cangpu
Affiliation: Department of Automatic Control, Beijing Institute of Technology, Beijing 100081, China
Funding: National Natural Science Foundation of China (Grant No. 69674005)
Abstract: Aim — To study reinforcement learning algorithms for controlled Markov chains under the average-reward criterion, that is, to find by trial and error, without prior knowledge of the state-transition matrix or the reward function, an optimal control policy that maximizes the long-run expected average reward per stage. Methods — By combining the one-step learning algorithm for average-reward problems with temporal-difference learning, a multi-step reinforcement learning algorithm, the R(λ)-learning algorithm, is proposed. Results and Conclusion — The existing R-learning algorithm becomes the special case of the new algorithm with λ = 0, and the new algorithm is a natural extension of the discounted-reward Q(λ)-learning algorithm to average-reward problems. Simulation results show that R(λ)-learning with intermediate values of λ clearly outperforms one-step R-learning.

Keywords: reinforcement learning; average reward; R-learning; Markov decision processes; temporal-difference learning
Received: 1998-10-13

Incremental Multi-Step R-Learning
Hu Guanghua, Wu Cangpu. Incremental Multi-Step R-Learning[J]. Journal of Beijing Institute of Technology, 1999, 8(3): 245-250.
Authors:Hu Guanghua and Wu Cangpu
Affiliation: Department of Automatic Control, Beijing Institute of Technology, Beijing 100081, China
Abstract: Aim — To investigate a model-free, multi-step reinforcement learning algorithm for the average-reward criterion. Methods — By combining the R-learning algorithm with temporal-difference (TD(λ)) learning for average-reward problems, a novel incremental algorithm, R(λ)-learning, was proposed. Results and Conclusion — The proposed algorithm is a natural extension of Q(λ)-learning, the multi-step discounted-reward reinforcement learning algorithm, to the average-reward case. Simulation results show that R(λ)-learning with intermediate λ values yields a significant performance improvement over one-step R-learning.
Keywords: reinforcement learning; average reward; R-learning; Markov decision processes; temporal difference learning
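
The abstract describes R(λ)-learning only at a high level. Below is a minimal tabular sketch of the idea, combining R-learning (average-reward Q-learning) with TD(λ)-style eligibility traces. The toy environment, the reset/step interface, the hyperparameter names, and the Watkins-style trace cut after exploratory actions are illustrative assumptions, not the paper's own notation or experimental setup.

import numpy as np

class TwoStateMDP:
    """Toy 2-state, 2-action MDP used only to exercise the sketch (assumed, not from the paper)."""
    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
        self.s = 0
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):
        # action 1 tends to move to the other state, where the reward is higher
        if a == 1 and self.rng.random() < 0.9:
            self.s = 1 - self.s
        r = 2.0 if self.s == 1 else 0.1
        return self.s, r, False

def r_lambda_learning(env, n_states, n_actions, steps=20000,
                      alpha=0.1, beta=0.01, lam=0.5, epsilon=0.1, seed=0):
    """Sketch of R(lambda)-learning: R-learning plus eligibility traces."""
    rng = np.random.default_rng(seed)
    R = np.zeros((n_states, n_actions))    # relative (average-adjusted) action values
    e = np.zeros_like(R)                   # eligibility traces
    rho = 0.0                              # running estimate of the average reward per stage
    s = env.reset()
    for _ in range(steps):
        greedy = rng.random() >= epsilon
        a = int(R[s].argmax()) if greedy else int(rng.integers(n_actions))
        s_next, r, done = env.step(a)
        # temporal-difference error under the average-reward criterion
        delta = r - rho + R[s_next].max() - R[s, a]
        e[s, a] += 1.0                     # accumulating trace for the visited pair
        R += alpha * delta * e             # multi-step backup through all traced pairs
        if greedy:
            rho += beta * delta            # only greedy steps update the average-reward estimate
            e *= lam                       # decay traces; lam = 0 recovers one-step R-learning
        else:
            e[:] = 0.0                     # cut traces after an exploratory action (Watkins-style)
        s = env.reset() if done else s_next
    return R, rho

R, rho = r_lambda_learning(TwoStateMDP(), n_states=2, n_actions=2)
print("estimated average reward:", round(rho, 3))

With λ = 0 the trace decay removes all multi-step credit and the update reduces to one-step R-learning, which matches the paper's statement that R-learning is the λ = 0 special case of R(λ)-learning.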