New algorithms of the Q-learning type
Authors: Shalabh Bhatnagar, K Mohan Babu
Affiliations: (a) Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560 012, India
(b) Motorola India Electronics Ltd., Bangalore, India
Abstract: We propose two algorithms for Q-learning that use the two-timescale stochastic approximation methodology. The first updates the Q-values of all feasible state-action pairs at each instant, while the second updates the Q-values of states with actions chosen according to the 'current' randomized policy updates. Convergence of both algorithms is proved. Finally, numerical experiments with the proposed algorithms on a routing application in communication networks are presented for a few different settings.
Keywords: Q-learning; Reinforcement learning; Markov decision processes; Two-timescale stochastic approximation; SPSA
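
For context, the classical single-timescale tabular Q-learning update that the abstract builds on can be sketched as follows. This is not the authors' two-timescale method, whose details the abstract does not give. It only mirrors the idea of updating every feasible state-action pair at each instant. The function name `q_learning_all_pairs`, the toy two-state MDP, the reward formulation, the discount factor, and the step-size exponent are all assumptions chosen for illustration.

```python
import random


def q_learning_all_pairs(P, R, gamma=0.5, iters=2000, seed=0):
    """Classical tabular Q-learning that updates every (state, action) pair
    once per iteration, using one simulated transition per pair.

    P[s][a] is a list of (next_state, probability) pairs (hypothetical layout).
    R[s][a] is the one-step reward for taking action a in state s.
    """
    rng = random.Random(seed)
    n_states = len(P)
    Q = [[0.0 for _ in P[s]] for s in range(n_states)]
    for n in range(1, iters + 1):
        # Diminishing step size: the sum diverges, the sum of squares converges.
        step = n ** -0.6
        for s in range(n_states):
            for a in range(len(P[s])):
                next_states, probs = zip(*P[s][a])
                # Simulate one transition from (s, a).
                s_next = rng.choices(next_states, weights=probs)[0]
                target = R[s][a] + gamma * max(Q[s_next])
                Q[s][a] += step * (target - Q[s][a])
    return Q


# Toy deterministic MDP (assumed for illustration): two states, two actions.
P = [
    [[(0, 1.0)], [(1, 1.0)]],  # state 0: action 0 stays, action 1 moves to state 1
    [[(1, 1.0)], [(0, 1.0)]],  # state 1: action 0 stays, action 1 moves to state 0
]
R = [
    [0.0, 1.0],
    [2.0, 0.0],
]

Q = q_learning_all_pairs(P, R)
greedy = [max(range(len(row)), key=row.__getitem__) for row in Q]
print(greedy)
```

In this toy problem the greedy policy moves from state 0 to state 1 and then stays there. A two-timescale scheme of the kind named in the abstract would run a second recursion with a different step-size schedule alongside the Q-value recursion; that part is not sketched here.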
This article is indexed in ScienceDirect and other databases.