Reducing reinforcement learning to KWIK online regression
Authors: Lihong Li, Michael L. Littman
Affiliations: 1. Yahoo! Research, 4401 Great America Parkway, Santa Clara, CA, 95054, USA
2. Rutgers Laboratory for Real-Life Reinforcement Learning (RL3), Department of Computer Science, Rutgers University, Piscataway, NJ, 08854, USA
Abstract:One of the key problems in reinforcement learning (RL) is balancing exploration and exploitation. Another is learning and acting in large Markov decision processes (MDPs) where compact function approximation has to be used. This paper introduces REKWIRE, a provably efficient, model-free algorithm for finite-horizon RL problems with value function approximation (VFA) that addresses the exploration-exploitation tradeoff in a principled way. The crucial element of this algorithm is a reduction of RL to online regression in the recently proposed KWIK learning model. We show that, if the KWIK online regression problem can be solved efficiently, then the sample complexity of exploration of REKWIRE is polynomial. Therefore, the reduction suggests a new and sound direction to tackle general RL problems. The efficiency of our algorithm is verified on a set of proof-of-concept experiments where popular, ad hoc exploration approaches fail.
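The reduction hinges on the KWIK ("Knows What It Knows") learning protocol: on each input, the learner must either return an accurate prediction or admit uncertainty, and the true label is revealed only after an "I don't know" answer. As a minimal sketch (not the REKWIRE algorithm itself), the classic memorization learner for deterministic targets over a finite input space illustrates these semantics; the class name `KWIKMemorizer` and its methods are illustrative, not from the paper:

```python
class KWIKMemorizer:
    """A toy KWIK online regressor for a deterministic target over a
    finite input space: it predicts only when it is certain, and
    otherwise returns None to signal "I don't know"."""

    def __init__(self):
        self.memory = {}  # input -> observed label

    def predict(self, x):
        # Predict only if this input has been seen before;
        # otherwise admit uncertainty by returning None.
        return self.memory.get(x)

    def observe(self, x, y):
        # Under the KWIK protocol, the label is revealed only
        # after a "don't know" answer; store it for future queries.
        self.memory[x] = y


learner = KWIKMemorizer()
print(learner.predict("s1"))   # None: unknown input, "I don't know"
learner.observe("s1", 0.5)
print(learner.predict("s1"))   # 0.5: now known, so it must predict
```

The number of "I don't know" answers such a learner can give is bounded (here, by the size of the input space), which is the kind of guarantee the paper's sample-complexity result builds on.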
This article is indexed by SpringerLink and other databases.