An analysis of model-based Interval Estimation for Markov Decision Processes
Authors: Alexander L. Strehl
Affiliation: a Yahoo! Inc., 701 First Avenue, Sunnyvale, California 94089, USA
b Computer Science Department, Rutgers University, 110 Frelinghuysen Road, Piscataway, NJ 08854, USA
Abstract: Several algorithms for learning near-optimal policies in Markov Decision Processes have been analyzed and proven efficient. Empirical results have suggested that Model-based Interval Estimation (MBIE) learns efficiently in practice, effectively balancing exploration and exploitation. This paper presents a theoretical analysis of MBIE and a new variation called MBIE-EB, proving their efficiency even under worst-case conditions. The paper also introduces a new performance metric, average loss, and relates it to its less “online” cousins from the literature.
Keywords: Reinforcement learning; Learning theory; Markov Decision Processes
This article is indexed in ScienceDirect and other databases.