Linear Least-Squares Algorithms for Temporal Difference Learning
Authors: Bradtke, Steven J.; Barto, Andrew G.
Affiliation: (1) GTE Data Services, One E Telecom Pkwy, DC B2H, Temple Terrace, FL 33637; (2) Dept. of Computer Science, University of Massachusetts, Amherst, MA 01003-4610
Abstract: We introduce two new temporal difference (TD) algorithms based on the theory of linear least-squares function approximation. We define an algorithm we call Least-Squares TD (LS TD) for which we prove probability-one convergence when it is used with a function approximator linear in the adjustable parameters. We then define a recursive version of this algorithm, Recursive Least-Squares TD (RLS TD). Although these new TD algorithms require more computation per time-step than do Sutton's TD(λ) algorithms, they are more efficient in a statistical sense because they extract more information from training experiences. We describe a simulation experiment showing the substantial improvement in learning rate achieved by RLS TD in an example Markov prediction problem. To quantify this improvement, we introduce the TD error variance of a Markov chain, σ_TD, and experimentally conclude that the convergence rate of a TD algorithm depends linearly on σ_TD. In addition to converging more rapidly, LS TD and RLS TD do not have control parameters, such as a learning rate parameter, thus eliminating the possibility of achieving poor performance by an unlucky choice of parameters.
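As a rough illustration of the two estimators named in the abstract, the sketch below shows the general linear least-squares TD idea: the batch version solves a linear system accumulated from observed transitions, and the recursive version maintains the inverse of that system incrementally via a Sherman-Morrison-style update, so no step-size parameter appears. This is a minimal sketch, not code from the paper; the feature representation, the discount factor `gamma`, and the identifiers (`lstd_batch`, `RecursiveLSTD`, `init_scale`) are illustrative assumptions.

```python
# Minimal sketch of batch LS TD and a recursive variant for linear value
# prediction. Names, the discount factor, and the initialization scheme are
# assumptions for illustration, not the paper's exact formulation.
import numpy as np


def lstd_batch(transitions, n_features, gamma=1.0):
    """Batch LS TD: solve A @ theta = b built from observed transitions.

    `transitions` is an iterable of (phi, reward, phi_next) triples, where
    phi and phi_next are feature vectors of the current and next states.
    """
    A = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    for phi, reward, phi_next in transitions:
        A += np.outer(phi, phi - gamma * phi_next)
        b += reward * phi
    # Least-squares solve; early on, A may be singular and need regularization.
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return theta


class RecursiveLSTD:
    """Recursive LS TD: update an estimate of A^{-1} one transition at a time."""

    def __init__(self, n_features, gamma=1.0, init_scale=1.0):
        self.gamma = gamma
        self.theta = np.zeros(n_features)
        # P approximates A^{-1}; init_scale only sets the initial guess.
        self.P = np.eye(n_features) * init_scale

    def update(self, phi, reward, phi_next):
        d = phi - self.gamma * phi_next                 # feature-space temporal difference
        k = self.P @ phi / (1.0 + d @ (self.P @ phi))   # gain vector
        self.theta += k * (reward - d @ self.theta)     # correct the value estimate
        self.P -= np.outer(k, d @ self.P)               # Sherman-Morrison update of A^{-1}
        return self.theta
```

Note that, consistent with the abstract's claim, neither routine takes a learning-rate parameter; the price is O(n²) computation per step (or a matrix solve for the batch version) rather than the O(n) per-step cost of TD(λ) with linear features.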
Keywords: Reinforcement learning; Markov decision problems; temporal difference methods; least-squares
This article has been indexed in SpringerLink and other databases.