The $n$th-Order Bias Optimality for Multichain Markov Decision Processes

Authors:	Xi-Ren Cao, Junyu Zhang

Affiliation:	Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong

Abstract:	In this paper, we propose a new approach to the theory of finite multichain Markov decision processes (MDPs) under different performance optimization criteria. We first introduce the concept of the $n$th-order bias. Then, using the average-reward and bias difference formulas derived in this paper, we develop an optimization theory for finite MDPs that covers, in a unified way, the complete spectrum from average optimality and bias optimality to all high-order bias optimality criteria. The approach is simple, direct, natural, and intuitive; it relies neither on Laurent series expansion nor on discounted MDPs. We also propose one-phase policy iteration algorithms for bias and high-order bias optimal policies, which are more efficient than the two-phase algorithms in the literature. Furthermore, we derive high-order bias optimality equations. This research is part of our effort to develop a sensitivity-based learning and optimization theory.
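The quantities the abstract builds on (average reward, bias, and higher-order biases of a fixed stationary policy in a multichain MDP) can be illustrated numerically. The sketch below is not the authors' method: it uses the standard deviation-matrix formulas, with the limiting matrix P* computed as the spectral projector for eigenvalue 1, the average reward eta = P* r, and the bias g_1 = H r with H = (I - P + P*)^{-1}(I - P*). The recursion g_{k+1} = -H g_k for higher orders, and the helper names `null_space`, `limiting_matrix`, and `bias_hierarchy`, are assumptions introduced here for illustration; the paper's own definitions and indexing may differ.

```python
import numpy as np


def null_space(A: np.ndarray, tol: float = 1e-9) -> np.ndarray:
    """Orthonormal basis (as columns) of the right null space of A, via SVD."""
    _, s, vh = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return vh[rank:].T


def limiting_matrix(P: np.ndarray) -> np.ndarray:
    """Cesaro limit P* of a finite stochastic matrix P (works for multichain P).

    Eigenvalue 1 of a stochastic matrix is semisimple, so P* equals the
    spectral projector W (V W)^{-1} V built from right (W) and left (V)
    eigenvectors for eigenvalue 1.
    """
    A = np.eye(P.shape[0]) - P
    W = null_space(A)        # columns w with P w = w
    V = null_space(A.T).T    # rows v with v P = v
    return W @ np.linalg.solve(V @ W, V)


def bias_hierarchy(P: np.ndarray, r: np.ndarray, order: int) -> list:
    """Return [eta, g_1, ..., g_order] for a fixed stationary policy.

    eta = P* r is the average reward and g_1 = H r is the bias, where
    H = (I - P + P*)^{-1} (I - P*) is the deviation matrix.
    ASSUMPTION (hypothetical indexing): g_{k+1} = -H g_k for k >= 1.
    """
    n = P.shape[0]
    Ps = limiting_matrix(P)
    H = np.linalg.solve(np.eye(n) - P + Ps, np.eye(n) - Ps)
    terms = [Ps @ r]
    g = H @ r
    for _ in range(order):
        terms.append(g)
        g = -H @ g
    return terms


# Multichain example: states 0 and 1 are absorbing, state 2 is transient.
P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.5, 0.5, 0.0]])
r = np.array([1.0, 0.0, 2.0])
eta, g1, g2 = bias_hierarchy(P, r, order=2)
# eta ~ [1, 0, 0.5], g1 ~ [0, 0, 1.5], g2 ~ [0, 0, -1.5]
```

The average reward is not constant across states here, which is what makes the chain multichain. One can check that the computed bias satisfies the multichain Poisson equation eta + g_1 = r + P g_1 together with P* g_1 = 0. The SVD-based projector is used instead of simulating long runs of P^k because it also handles periodic chains, where P^k does not converge.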

Keywords: