Reducing reinforcement learning to KWIK online regression
Authors: Lihong Li, Michael L. Littman
Affiliations: 1. Yahoo! Research, 4401 Great America Parkway, Santa Clara, CA, 95054, USA
2. Rutgers Laboratory for Real-Life Reinforcement Learning (RL3), Department of Computer Science, Rutgers University, Piscataway, NJ, 08854, USA
Abstract:One of the key problems in reinforcement learning (RL) is balancing exploration and exploitation. Another is learning and acting in large Markov decision processes (MDPs) where compact function approximation has to be used. This paper introduces REKWIRE, a provably efficient, model-free algorithm for finite-horizon RL problems with value function approximation (VFA) that addresses the exploration-exploitation tradeoff in a principled way. The crucial element of this algorithm is a reduction of RL to online regression in the recently proposed KWIK learning model. We show that, if the KWIK online regression problem can be solved efficiently, then the sample complexity of exploration of REKWIRE is polynomial. Therefore, the reduction suggests a new and sound direction to tackle general RL problems. The efficiency of our algorithm is verified on a set of proof-of-concept experiments where popular, ad hoc exploration approaches fail.
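The reduction hinges on the KWIK ("Knows What It Knows") learning protocol: on each input, the learner must either return an accurate prediction or admit uncertainty, and the true label is revealed only after an "I don't know" answer. As a minimal sketch (not the REKWIRE algorithm itself), the classic memorization learner for deterministic targets over a finite input space illustrates these semantics; the class name `KWIKMemorizer` and its methods are illustrative, not from the paper:

```python
class KWIKMemorizer:
    """A toy KWIK online regressor for a deterministic target over a
    finite input space: it predicts only when it is certain, and
    otherwise returns None to signal "I don't know"."""

    def __init__(self):
        self.memory = {}  # input -> observed label

    def predict(self, x):
        # Predict only if this input has been seen before;
        # otherwise admit uncertainty by returning None.
        return self.memory.get(x)

    def observe(self, x, y):
        # Under the KWIK protocol, the label is revealed only
        # after a "don't know" answer; store it for future queries.
        self.memory[x] = y


learner = KWIKMemorizer()
print(learner.predict("s1"))   # None: unknown input, "I don't know"
learner.observe("s1", 0.5)
print(learner.predict("s1"))   # 0.5: now known, so it must predict
```

The number of "I don't know" answers such a learner can give is bounded (here, by the size of the input space), which is the kind of guarantee the paper's sample-complexity result builds on.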
This article is indexed by SpringerLink and other databases.