Transfer in variable-reward hierarchical reinforcement learning

Authors: Neville Mehta, Sriraam Natarajan, Prasad Tadepalli, Alan Fern

Affiliation: (1) School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, 97330, USA

Abstract: Transfer learning seeks to leverage previously learned tasks to achieve faster learning in a new task. In this paper, we consider transfer learning in the context of related but distinct Reinforcement Learning (RL) problems. In particular, our RL problems are derived from Semi-Markov Decision Processes (SMDPs) that share the same transition dynamics but have different reward functions that are linear in a set of reward features. We formally define the transfer learning problem in the context of RL as learning an efficient algorithm to solve any SMDP drawn from a fixed distribution after experiencing a finite number of them. Furthermore, we introduce an online algorithm to solve this problem, Variable-Reward Reinforcement Learning (VRRL), which compactly stores the optimal value functions for several SMDPs and uses them to optimally initialize the value function for a new SMDP. We generalize our method to a hierarchical RL setting where the different SMDPs share the same task hierarchy. Our experimental results in a simplified real-time strategy domain show that significant transfer learning occurs in both flat and hierarchical settings. Transfer is especially effective in the hierarchical setting, where the overall value functions are decomposed into subtask value functions that are more widely amenable to transfer across different SMDPs.

Keywords: Hierarchical reinforcement learning · Transfer learning · Average-reward learning · Multi-criteria learning
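Since the transfer mechanism in the abstract hinges on rewards being linear in a set of shared reward features, the value-function reuse it describes can be sketched compactly. The Python sketch below illustrates only that reuse idea, under simplifying assumptions not drawn from the paper: a tabular state space, and each previously solved task summarized by a per-state table of expected reward-feature totals. The class and method names are hypothetical, and the paper's average-reward and hierarchical machinery is omitted.

```python
import numpy as np

class ValueFunctionStore:
    """Sketch of value-function reuse for variable-reward tasks.

    Assumptions (illustrative, not from the paper): states are indexed
    0..n_states-1, rewards are linear in shared features, r = w . phi(s, a),
    and each stored task is summarized by a table X of shape
    (n_states, n_features) whose row X[s] holds the expected reward-feature
    totals from state s under that task's learned policy.
    """

    def __init__(self, n_states, n_features):
        self.n_states = n_states
        self.n_features = n_features
        self.tables = []  # one (n_states, n_features) table per stored task

    def initialize_value(self, w):
        """Initial scalar value function for a new task with weight vector w:
        at every state, take the best value promised by any stored table,
        V0(s) = max_i  w . X_i[s]."""
        if not self.tables:
            return np.zeros(self.n_states)
        # (k, n_states, n_features) @ (n_features,) -> (k, n_states)
        projected = np.stack(self.tables) @ np.asarray(w)
        return projected.max(axis=0)

    def maybe_store(self, X_new, w, tol=1e-3):
        """Keep the newly learned table only if it improves on the
        initialization somewhere, i.e. the stored set did not already
        cover this reward weighting."""
        gain = X_new @ np.asarray(w) - self.initialize_value(w)
        if float(gain.max()) > tol:
            self.tables.append(X_new)
```

Transfer shows up here as a warm start: initialize_value gives the learner, at every state, a value already achievable by replaying some stored policy, so learning for a new weight vector only has to improve on the best previously solved task rather than start from zero. In the hierarchical setting described in the abstract, the same storage and initialization would apply to each subtask value function rather than to one monolithic table.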
|