Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms
Authors: Satinder Singh, Tommi Jaakkola, Michael L. Littman, Csaba Szepesvári
Affiliations: (1) AT&T Labs-Research, 180 Park Avenue, Florham Park, NJ 07932, USA; (2) Department of Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; (3) Department of Computer Science, Duke University, Durham, NC 27708-0129, USA; (4) Mindmaker Ltd., Konkoly Thege M. u. 29-33, Budapest, 1121, Hungary
Abstract: An important application of reinforcement learning (RL) is to finite-state control problems, and one of the most difficult problems in learning for control is balancing the exploration/exploitation tradeoff. Existing theoretical results for RL give very little guidance on reasonable ways to perform exploration. In this paper, we examine the convergence of single-step on-policy RL algorithms for control. On-policy algorithms cannot separate exploration from learning and therefore must confront the exploration problem directly. We prove convergence results for several related on-policy algorithms with both decaying exploration and persistent exploration. We also provide examples of exploration strategies that can be followed during learning that result in convergence to both optimal values and optimal policies.
Keywords: reinforcement-learning, on-policy, convergence, Markov decision processes
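
The setting studied in the paper is tabular, single-step on-policy learning (e.g. Sarsa(0)) combined with exploration that either decays over time or persists. The sketch below is not the authors' code; it is a minimal illustration of one algorithm of this family, assuming a hypothetical five-state chain MDP and a 1/k epsilon-greedy decay schedule (exploration that vanishes in the limit while every action is tried infinitely often).

```python
# Minimal sketch: tabular Sarsa(0) with decaying epsilon-greedy exploration
# on a hypothetical 5-state chain MDP (not the authors' experimental setup).
import random
from collections import defaultdict

N_STATES = 5        # states 0..4; state 4 is terminal
ACTIONS = [0, 1]    # 0 = left, 1 = right
GAMMA = 0.95

def step(state, action):
    """Hypothetical chain dynamics: moving right eventually reaches the goal."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

def epsilon_greedy(Q, state, epsilon):
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def sarsa(episodes=2000, alpha=0.1):
    Q = defaultdict(float)
    for k in range(1, episodes + 1):
        epsilon = 1.0 / k           # decaying exploration: greedy in the limit
        s = 0
        a = epsilon_greedy(Q, s, epsilon)
        done = False
        while not done:
            s2, r, done = step(s, a)
            a2 = epsilon_greedy(Q, s2, epsilon)
            # single-step on-policy update: bootstrap on the action actually taken
            target = r + (0.0 if done else GAMMA * Q[(s2, a2)])
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s, a = s2, a2
    return Q

if __name__ == "__main__":
    Q = sarsa()
    greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
    print(greedy)   # expected to prefer action 1 (right) in every non-terminal state
```

Because the behavior policy and the policy being evaluated are the same, decaying epsilon this way keeps exploration and learning coupled, which is exactly the situation the paper's convergence results address.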
This article is indexed in SpringerLink and other databases.