Potential-based online policy iteration algorithms for Markov decision processes
Authors: Hai-Tao Fang, Xi-Ren Cao
Affiliation: Lab. of Syst. & Control, Acad. of Math. & Syst. Sci., Beijing, China
Abstract: Performance potentials play a crucial role in performance sensitivity analysis and policy iteration of Markov decision processes. The potentials can be estimated on a single sample path of a Markov process. In this paper, we propose two potential-based online policy iteration algorithms for performance optimization of Markov systems. The algorithms are based on online estimation of potentials and stochastic approximation. We prove that with these two algorithms the optimal policy can be attained after a finite number of iterations. A simulation example is given to illustrate the main ideas and the convergence rates of the algorithms.
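The general idea of estimating potentials on a sample path and using them for policy improvement can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a simple batch variant without stochastic approximation, and the two-state model, the names `P`, `F`, `simulate`, `estimate_potentials`, `improve`, and `policy_iteration`, and the parameters `horizon`, `path_len`, and `max_iters` are all assumptions made for illustration. The potential g(i) is approximated by averaging truncated sums of (reward minus estimated average reward) over the visits to state i on one simulated path.

```python
import random

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions.
# P[a][i][j] is the transition probability i -> j under action a,
# F[a][i] is the one-step reward in state i under action a.
P = {0: [[0.9, 0.1], [0.5, 0.5]],
     1: [[0.2, 0.8], [0.1, 0.9]]}
F = {0: [1.0, 0.0],
     1: [0.5, 2.0]}

def simulate(policy, steps, rng):
    """Generate one sample path (states, rewards) under a stationary policy."""
    x, states, rewards = 0, [], []
    for _ in range(steps):
        a = policy[x]
        states.append(x)
        rewards.append(F[a][x])
        x = 0 if rng.random() < P[a][x][0] else 1
    return states, rewards

def estimate_potentials(states, rewards, horizon=30):
    """Estimate g(i) as the average, over visits to i, of the truncated sum
    of (reward - eta) over the next `horizon` steps of the same path."""
    eta = sum(rewards) / len(rewards)          # estimated average reward
    prefix = [0.0]                             # prefix sums of centered rewards
    for r in rewards:
        prefix.append(prefix[-1] + r - eta)
    totals, counts = [0.0, 0.0], [0, 0]
    for n in range(len(states) - horizon):
        i = states[n]
        totals[i] += prefix[n + horizon] - prefix[n]
        counts[i] += 1
    g = [t / max(c, 1) for t, c in zip(totals, counts)]
    return g, eta

def improve(g):
    """Policy improvement: in each state choose the action that maximizes
    f(i, a) + sum_j p(j | i, a) * g(j)."""
    def score(a, i):
        return F[a][i] + sum(p * gj for p, gj in zip(P[a][i], g))
    return [max(P, key=lambda a: score(a, i)) for i in range(2)]

def policy_iteration(path_len=20000, seed=0, max_iters=20):
    """Alternate sample-path estimation and improvement until the policy
    stops changing (or max_iters is reached)."""
    rng = random.Random(seed)
    policy, eta = [0, 0], 0.0
    for _ in range(max_iters):
        states, rewards = simulate(policy, path_len, rng)
        g, eta = estimate_potentials(states, rewards)
        new_policy = improve(g)
        if new_policy == policy:
            break
        policy = new_policy
    return policy, eta
```

Calling `policy_iteration()` returns the final policy and the average reward estimated on the last path. For this toy model, exact policy iteration worked out by hand ends at the policy that takes action 1 in both states, with average reward 11/6; the simulated version is expected to reach the same policy because the improvement margins are large relative to the estimation noise, though that depends on the chosen path length and horizon.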