Solving Controlled Markov Set-Chains With Discounting via Multipolicy Improvement
Authors: Hyeong Soo Chang; E. K. P. Chong
Affiliation: Dept. of Comput. Sci. & Eng., Sogang Univ., Seoul
Abstract: We consider Markov decision processes (MDPs) whose state transition probability distributions are not known exactly but are known to lie within given intervals, so-called "controlled Markov set-chains," under an infinite-horizon discounted reward criterion. We present formal methods for improving multiple policies when solving such controlled Markov set-chains. Our multipolicy improvement methods follow the spirit of parallel rollout and policy switching for solving MDPs. In particular, these methods are useful for online control of Markov set-chains and for designing policy iteration (PI) type algorithms. We develop a PI-type algorithm and prove that it converges to an optimal policy.
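The policy-switching idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it assumes a pessimistic (worst-case) evaluation criterion, interval bounds given independently for each state-action pair, and a made-up two-state, two-action model. All names (`worst_case_expectation`, `pessimistic_value`, `policy_switching`, `lo`, `hi`, `reward`) are hypothetical and chosen only for illustration.

```python
def worst_case_expectation(lo, hi, v):
    """Minimise sum(p[i] * v[i]) over lo <= p <= hi with sum(p) == 1.

    Greedy rule: start from the lower bounds and push the remaining
    probability mass onto the next states with the smallest values.
    """
    p = list(lo)
    slack = 1.0 - sum(lo)
    for i in sorted(range(len(v)), key=lambda j: v[j]):
        add = min(hi[i] - lo[i], slack)
        p[i] += add
        slack -= add
    return sum(pi * vi for pi, vi in zip(p, v))


def pessimistic_value(policy, lo, hi, reward, gamma, sweeps=2000):
    """Worst-case discounted value of a stationary policy (fixed-point iteration)."""
    n = len(policy)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [
            reward[s][policy[s]]
            + gamma * worst_case_expectation(lo[s][policy[s]], hi[s][policy[s]], v)
            for s in range(n)
        ]
    return v


def policy_switching(policies, values):
    """In each state, follow the base policy whose value is largest there."""
    n = len(policies[0])
    switched = []
    for s in range(n):
        best = max(range(len(policies)), key=lambda k: values[k][s])
        switched.append(policies[best][s])
    return switched


# Hypothetical toy model: lo[s][a][t] and hi[s][a][t] bound P(t | s, a).
lo = [[[0.6, 0.2], [0.1, 0.7]], [[0.3, 0.5], [0.7, 0.1]]]
hi = [[[0.8, 0.4], [0.3, 0.9]], [[0.5, 0.7], [0.9, 0.3]]]
reward = [[1.0, 0.0], [0.0, 2.0]]  # reward[s][a]
gamma = 0.9

policies = [[0, 0], [1, 1]]  # two base policies, one action per state
values = [pessimistic_value(pi, lo, hi, reward, gamma) for pi in policies]
switched = policy_switching(policies, values)
v_switched = pessimistic_value(switched, lo, hi, reward, gamma)
```

Under these assumptions, the switched policy's worst-case value is at least as large in every state as that of each base policy, which is the kind of multipolicy improvement property a PI-type scheme can build on.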