Technical Update: Least-Squares Temporal Difference Learning
Authors: Justin A. Boyan
Affiliation: ITA Software, 141 Portland Street, Cambridge, MA 02139, USA
Abstract: TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it may make inefficient use of data, and it requires the user to manually tune a stepsize schedule for good performance. For the case of linear value function approximation and λ = 0, the Least-Squares TD (LSTD) algorithm of Bradtke and Barto (1996, Machine Learning, 22(1–3):33–57) eliminates all stepsize parameters and improves data efficiency. This paper updates Bradtke and Barto's work in three significant ways. First, it presents a simpler derivation of the LSTD algorithm. Second, it generalizes from λ = 0 to arbitrary values of λ; at the extreme of λ = 1, the resulting new algorithm is shown to be a practical, incremental formulation of supervised linear regression. Third, it presents a novel and intuitive interpretation of LSTD as a model-based reinforcement learning technique.
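To make the batch form of LSTD(λ) described in the abstract concrete, the following NumPy sketch accumulates the standard least-squares statistics (an eligibility-trace-weighted matrix A and vector b) and solves for the linear value-function weights in one step, with no stepsize parameter. The function name lstd_lambda, the (features, reward, next-features, done) tuple format, and the small ridge term reg are illustrative assumptions, not details from the paper.

import numpy as np

def lstd_lambda(transitions, n_features, gamma=0.95, lam=0.0, reg=1e-6):
    """Batch LSTD(lambda) sketch: estimate weights theta such that
    V(s) ~= phi(s) @ theta from a list of (phi, r, phi_next, done)
    transition tuples.  gamma, lam, and the ridge term reg are
    illustrative choices, not values from the paper."""
    A = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    z = np.zeros(n_features)              # eligibility trace
    for phi, r, phi_next, done in transitions:
        z = lam * gamma * z + phi         # decay and accumulate trace
        # Terminal transitions have zero successor features.
        next_phi = np.zeros(n_features) if done else phi_next
        A += np.outer(z, phi - gamma * next_phi)
        b += z * r
        if done:
            z = np.zeros(n_features)      # reset trace at episode end
    # Solve A theta = b; the tiny ridge term guards against singular A.
    return np.linalg.solve(A + reg * np.eye(n_features), b)

With lam=0.0 this reduces to Bradtke and Barto's original LSTD; with lam=1.0 the trace sums all features back to the episode start, matching the abstract's observation that LSTD(1) behaves like an incremental formulation of supervised linear regression on Monte Carlo returns.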
Keywords: reinforcement learning; temporal difference learning; value function approximation; linear least-squares methods
This article is indexed in SpringerLink and other databases.