Least Squares Policy Evaluation Algorithms with Linear Function Approximation
Authors:A. Nedić, D. P. Bertsekas
Affiliation:(1) Department of Electrical Engineering and Computer Science, M.I.T., Cambridge, MA 02139, USA
Abstract:We consider policy evaluation algorithms within the context of infinite-horizon dynamic programming problems with discounted cost. We focus on discrete-time dynamic systems with a large number of states, and we discuss two methods, which use simulation, temporal differences, and linear cost function approximation. The first method is a new gradient-like algorithm involving least-squares subproblems and a diminishing stepsize, which is based on the λ-policy iteration method of Bertsekas and Ioffe. The second method is the LSTD(λ) algorithm recently proposed by Boyan, which for λ = 0 coincides with the linear least-squares temporal-difference algorithm of Bradtke and Barto. At present, there is only a convergence result by Bradtke and Barto for the LSTD(0) algorithm. Here, we strengthen this result by showing the convergence of LSTD(λ), with probability 1, for every λ ∈ [0, 1].
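To make the second method concrete, the following is a minimal sketch of an LSTD(λ) estimate in the form in which it is commonly stated, not the authors' own code. The abstract gives no formulas, so the update rules are an assumption: an eligibility vector z = αλ·z + φ(i), a matrix A accumulating z·(φ(i) − α·φ(j))ᵀ, a vector b accumulating z·g(i, j), and a final solve of A·r = b. The names `lstd_lambda`, `solve_linear`, and `cycle_transitions`, and the two-state test chain, are hypothetical and chosen only for illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for k in range(col, n + 1):
                M[row][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def lstd_lambda(transitions, n_features, alpha, lam):
    """Estimate weights r so that phi(i)^T r approximates the discounted cost.

    transitions: iterable of (phi, cost, phi_next) along one simulated
    trajectory; alpha is the discount factor, lam is lambda in [0, 1].
    Assumed (textbook-style) updates, not taken from the source.
    """
    n = n_features
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    z = [0.0] * n  # eligibility vector
    for phi, cost, phi_next in transitions:
        z = [alpha * lam * zi + p for zi, p in zip(z, phi)]
        d = [p - alpha * pn for p, pn in zip(phi, phi_next)]
        for i in range(n):
            b[i] += z[i] * cost
            for j in range(n):
                A[i][j] += z[i] * d[j]
    return solve_linear(A, b)


def cycle_transitions(steps):
    """Hypothetical deterministic chain 0 -> 1 -> 0 with one-hot features."""
    phi = {0: [1.0, 0.0], 1: [0.0, 1.0]}
    cost = {0: 1.0, 1: 0.0}
    state, out = 0, []
    for _ in range(steps):
        nxt = 1 - state
        out.append((phi[state], cost[state], phi[nxt]))
        state = nxt
    return out


r = lstd_lambda(cycle_transitions(50), n_features=2, alpha=0.5, lam=0.5)
print(r)
```

In this toy chain the features are one-hot and the transitions deterministic, so the exact costs solve J(0) = 1 + 0.5·J(1) and J(1) = 0.5·J(0), giving J = (4/3, 2/3); the estimate should match these values up to rounding for any λ in [0, 1]. With stochastic transitions or fewer features than states, the result would only be an approximation that improves with trajectory length.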
This article is indexed in SpringerLink and other databases.