

A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning
Authors:David Choi  Benjamin Van Roy
Affiliation:(1) Lincoln Laboratory, Massachusetts Institute of Technology, 244 Wood Street, Lexington, MA 02420-9108, USA;(2) Departments of Management Science and Engineering and Electrical Engineering, Stanford University, Stanford, CA 94305, USA
Abstract:The traditional Kalman filter can be viewed as a recursive stochastic algorithm that approximates an unknown function via a linear combination of prespecified basis functions given a sequence of noisy samples. In this paper, we generalize the algorithm to one that approximates the fixed point of an operator that is known to be a Euclidean norm contraction. Instead of noisy samples of the desired fixed point, the algorithm updates parameters based on noisy samples of functions generated by application of the operator, in the spirit of Robbins–Monro stochastic approximation. The algorithm is motivated by temporal-difference learning, and our developments lead to a possibly more efficient variant of temporal-difference learning. We establish convergence of the algorithm and explore efficiency gains through computational experiments involving optimal stopping and queueing problems.
Acknowledgments:This research was supported in part by NSF CAREER Grant ECS-9985229, and by the ONR under Grant MURI N00014-00-1-0637.
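The abstract describes the update scheme only at a high level. Below is a minimal illustrative sketch of that general idea, not the paper's algorithm: a recursive least-squares ("Kalman-style") parameter update driven by noisy evaluations of a known Euclidean-norm contraction applied to the current estimate, with a gain that decays in the Robbins–Monro fashion. The operator F, the indicator basis functions, the noise level, and all variable names are assumptions made for this example.

# Illustrative sketch only -- not the algorithm from the paper.
import numpy as np

rng = np.random.default_rng(0)
n = 5  # dimension of the unknown fixed point


def F(x):
    # A simple elementwise contraction with modulus 0.5; its fixed point is 2.
    return 0.5 * x + 1.0


theta = np.zeros(n)       # current estimate of the fixed point
P = 100.0 * np.eye(n)     # recursive least-squares "covariance" matrix

for t in range(2000):
    i = rng.integers(n)              # sample one coordinate to update
    phi = np.zeros(n)
    phi[i] = 1.0                     # indicator basis function for that coordinate
    target = F(theta)[i] + 0.1 * rng.standard_normal()  # noisy sample of (F theta)(i)

    # Kalman/RLS gain and rank-one covariance update; the effective stepsize
    # on each coordinate decays roughly like 1/k, as in Robbins-Monro.
    K = P @ phi / (1.0 + phi @ P @ phi)
    theta = theta + K * (target - phi @ theta)
    P = P - np.outer(K, phi @ P)

print(theta)  # each entry should approach the fixed-point value 2

In the temporal-difference setting discussed in the paper, the contraction would be a (projected) Bellman-type operator sampled along a trajectory rather than an operator evaluated directly, but the recursive gain structure is analogous.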
Keywords:Dynamic programming  Kalman filter  Optimal stopping  Queueing  Recursive least-squares  Reinforcement learning  Temporal-difference learning
This record is indexed in SpringerLink and other databases.