From Perturbation Analysis to Markov Decision Processes and Reinforcement Learning
Authors: Xi-Ren Cao
Affiliation: Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
Abstract: Perturbation analysis (PA), Markov decision processes (MDPs), and reinforcement learning (RL) share a common goal: to make decisions that improve system performance based on information obtained by analyzing the current system behavior. In this paper, we study the relations among these closely related fields. We show that MDP solutions can be derived naturally from the performance sensitivity analysis provided by PA. The performance potential plays an important role in both PA and MDPs; it also offers a clear, intuitive interpretation of many results. Reinforcement learning, TD(λ), neuro-dynamic programming, and related methods are efficient ways of estimating the performance potentials and related quantities from sample paths. The sensitivity point of view of PA, MDPs, and RL brings new insight into learning and optimization. In particular, gradient-based optimization can be applied to parameterized systems with large state spaces, and gradient-based policy iteration can be applied to some nonstandard MDPs, such as systems with correlated actions. Potential-based on-line approaches and their advantages are also discussed.
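As a brief illustration of the sensitivity view summarized in the abstract, in the standard ergodic-chain setting (P the transition matrix, f the reward vector, π the steady-state distribution, η = πf the average reward, e the all-ones vector), the performance potential g solves the Poisson equation, and policy iteration and policy gradients follow from the performance difference and derivative formulas:

(I - P)\,g + \eta e = f, \qquad \eta = \pi f,
\eta' - \eta = \pi' \bigl[ (f' + P' g) - (f + P g) \bigr],
\left.\frac{d\eta_\delta}{d\delta}\right|_{\delta = 0} = \pi \bigl[ (P' - P) g + (f' - f) \bigr], \quad P_\delta = P + \delta (P' - P), \; f_\delta = f + \delta (f' - f).

The following is a minimal Python sketch, not the paper's exact algorithm, of how a TD-style update can estimate the potentials g and the average reward η from a single sample path; the two-state chain P, the reward f, and the step sizes alpha and beta are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

P = np.array([[0.7, 0.3],        # assumed transition matrix of an ergodic chain
              [0.4, 0.6]])
f = np.array([1.0, 5.0])         # assumed reward per state

g = np.zeros(2)                  # potential estimates (defined up to an additive constant)
eta = 0.0                        # average-reward estimate
alpha, beta = 0.01, 0.001        # assumed step sizes (beta << alpha)

x = 0
for t in range(200_000):
    x_next = rng.choice(2, p=P[x])                 # simulate one transition
    td_error = f[x] - eta + g[x_next] - g[x]       # temporal-difference error for the Poisson equation
    g[x] += alpha * td_error                       # update the potential of the current state
    eta += beta * (f[x] - eta)                     # track the average reward on a slower timescale
    x = x_next

print("estimated eta:", eta, "estimated potentials:", g - g[0])

The printed potentials are normalized by subtracting g[0], since potentials are only determined up to an additive constant.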
Keywords: Potentials  Poisson equations  gradient-based policy iteration  perturbation realization  Q-learning  TD(λ)
This article is indexed in SpringerLink and other databases.