首页 | 本学科首页   官方微博 | 高级检索  
     


Regret bounds for sleeping experts and bandits
Authors:Robert Kleinberg  Alexandru Niculescu-Mizil  Yogeshwer Sharma
Affiliation:1. Department of Computer Science, Cornell University, Ithaca, NY, 14853, USA
2. Mathematical Sciences Department, IBM T.J. Watson Research Center, Yorktown Heights, NY, 10598, USA
Abstract:We study on-line decision problems where the set of actions that are available to the decision algorithm varies over time. With a few notable exceptions, such problems remained largely unaddressed in the literature, despite their applicability to a large number of practical problems. Departing from previous work on this “Sleeping Experts” problem, we compare algorithms against the payoff obtained by the best ordering of the actions, which is a natural benchmark for this type of problem. We study both the full-information (best expert) and partial-information (multi-armed bandit) settings and consider both stochastic and adversarial rewards models. For all settings we give algorithms achieving (almost) information-theoretically optimal regret bounds (up to a constant or a sub-logarithmic factor) with respect to the best-ordering benchmark.
Keywords:
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号