Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes
Authors: Vladislav B. Tadić
Affiliation: Department of Automatic Control and Systems Engineering, University of Sheffield, Sheffield S1 3JD, United Kingdom
Abstract: This paper analyzes the mean-square asymptotic behavior of temporal-difference learning algorithms with constant step sizes and linear function approximation. The analysis is carried out for the case of a discounted cost function associated with a Markov chain with a finite-dimensional state space. Under mild conditions, an upper bound on the asymptotic mean-square error of these algorithms is determined as a function of the step size; under the same assumptions, this bound is shown to be linear in the step size. The main results of the paper are illustrated with examples related to M/G/1 queues and nonlinear AR models with Markov switching.
Editor: Robert Schapire
Keywords: Temporal-difference learning, Neuro-dynamic programming, Reinforcement learning, Stochastic approximation, Markov chains
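
As a concrete illustration of the setting described in the abstract, the following is a minimal sketch (not the paper's exact algorithm or examples) of TD(0) with linear function approximation and a constant step size, run on a small finite Markov chain with a discounted cost. The transition matrix, per-state costs, and feature vectors below are arbitrary illustrative assumptions, not the M/G/1 queue or Markov-switching AR examples from the paper.

```python
import numpy as np

# Minimal sketch: TD(0) with linear function approximation, constant step size.
# The chain, costs, and features are illustrative assumptions only.

rng = np.random.default_rng(0)

n_states = 5                                          # finite state space {0, ..., 4}
P = rng.dirichlet(np.ones(n_states), size=n_states)   # row-stochastic transition matrix
cost = rng.uniform(0.0, 1.0, size=n_states)           # per-state cost g(s)
gamma = 0.9                                           # discount factor
alpha = 0.01                                          # constant step size

d = 3
Phi = rng.normal(size=(n_states, d))                  # feature matrix, row s = phi(s)

theta = np.zeros(d)                                   # parameters: V(s) is approximated by phi(s)' theta
s = 0
for t in range(200_000):
    s_next = rng.choice(n_states, p=P[s])
    # Temporal-difference error: delta = g(s) + gamma * V(s') - V(s)
    delta = cost[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
    # Constant-step-size update along the feature direction phi(s)
    theta += alpha * delta * Phi[s]
    s = s_next

print("estimated values Phi @ theta:", Phi @ theta)
```

In this setting, the paper's result roughly says that the asymptotic mean-square error of the parameter estimate admits an upper bound proportional to the step size: a smaller alpha yields a tighter asymptotic bound, at the cost of slower adaptation.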