Simulation-based optimization of Markov decision processes: An empirical process theory approach
Authors: Rahul Jain, Pravin Varaiya
Affiliation:
  a. EE Department, University of Southern California, Los Angeles, CA 90089, USA
  b. ISE Department, University of Southern California, Los Angeles, CA 90089, USA
  c. EECS Department, University of California, Berkeley, CA 94720, USA
Abstract: We generalize and build on the PAC learning framework for Markov decision processes developed in Jain and Varaiya (2006). The reward function is allowed to depend on both the state and the action, and both the state and action spaces may be countably infinite. We estimate the value function of a Markov decision process, which assigns to each policy its expected discounted reward. This expected reward can be estimated as the empirical average of the reward over many independent simulation runs. We derive bounds on the number of runs needed for the empirical average to converge to the expected reward uniformly over a class of policies, in terms of the VC or pseudo-dimension of the policy class. We then propose a framework for obtaining an ε-optimal policy from simulation, and we provide the sample complexity of this approach.
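The simulation-based optimization scheme described in the abstract can be illustrated with a short sketch: estimate each policy's value by the empirical average of its discounted reward over independent simulation runs, then return the policy whose estimate is largest. The sketch below is illustrative only; the simulator interface (reset/step), the finite truncation horizon, and the names simulate_return, estimate_value, and select_policy are assumptions for this example, not constructs from the paper, which works with the exact discounted value and derives the required number of runs from the pseudo-dimension of the policy class.

```python
def simulate_return(policy, simulator, gamma, horizon):
    """One independent simulation run: truncated discounted reward.

    Assumes `simulator` exposes reset() -> state and
    step(state, action) -> (next_state, reward), and `policy` maps a
    state to an action.  Truncating at `horizon` introduces a bias of
    at most gamma**horizon * r_max / (1 - gamma) for rewards bounded
    in absolute value by r_max.
    """
    state = simulator.reset()
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        action = policy(state)
        state, reward = simulator.step(state, action)
        total += discount * reward
        discount *= gamma
    return total


def estimate_value(policy, simulator, gamma, horizon, n_runs):
    """Empirical average of the discounted reward over n_runs independent runs."""
    return sum(simulate_return(policy, simulator, gamma, horizon)
               for _ in range(n_runs)) / n_runs


def select_policy(policies, simulator, gamma, horizon, n_runs):
    """Return the policy (by name) with the largest empirical value estimate.

    If n_runs is large enough that every estimate is within eps/2 of the
    true expected discounted reward, uniformly over the class, then the
    selected policy is eps-optimal within that class.
    """
    estimates = {name: estimate_value(pi, simulator, gamma, horizon, n_runs)
                 for name, pi in policies.items()}
    best = max(estimates, key=estimates.get)
    return best, estimates
```

The uniform-convergence requirement is what ties this sketch back to the paper: the number of runs n_runs must be chosen so that the estimation error is small simultaneously for every policy in the class, and the abstract's bounds express that sample size in terms of the VC or pseudo-dimension of the policy class.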
Keywords: Markov decision processes; Learning algorithms; Monte Carlo simulation; Stochastic control; Optimization