New algorithms of the Q-learning type |
| |
Authors: | Shalabh Bhatnagar, K Mohan Babu |
| |
Affiliation: | a Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560 012, India; b Motorola India Electronics Ltd., Bangalore, India |
| |
Abstract: | We propose two Q-learning algorithms based on the two-timescale stochastic approximation methodology. The first updates the Q-values of all feasible state-action pairs at each instant, while the second updates the Q-values of states with actions chosen according to the ‘current’ randomized policy updates. We prove convergence of both algorithms. Finally, we present numerical experiments that apply the proposed algorithms to routing in communication networks in a few different settings. |
| |
Keywords: | Q-learning; Reinforcement learning; Markov decision processes; Two-timescale stochastic approximation; SPSA |
This article is indexed in ScienceDirect and other databases. |
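As a rough illustration of what "updating the Q-values of all feasible state-action pairs at each instant" can look like, the sketch below applies the standard tabular Q-learning update to every pair in each iteration. It is not the authors' method: it uses a single step-size sequence and no SPSA, so the two-timescale structure from the abstract is not reproduced. The toy MDP (`P`, `COST`), the discount factor `GAMMA`, the step-size rule, and all function names are assumptions made up for this example; costs are minimized here only because routing is usually posed that way.

```python
import random

# Hypothetical toy MDP (not from the paper).
# P[s][a] lists (next_state, probability); COST[s][a] is the one-step cost.
P = {
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 0.8), (0, 0.2)]},
}
COST = {0: {0: 1.0, 1: 2.0}, 1: {0: 0.5, 1: 3.0}}
GAMMA = 0.9  # assumed discount factor


def sample_next_state(rng, s, a):
    """Draw one next state from the simulator for the pair (s, a)."""
    states, probs = zip(*P[s][a])
    return rng.choices(states, weights=probs, k=1)[0]


def synchronous_q_learning(num_iters=20000, seed=0):
    """Standard Q-learning update applied to every feasible pair per iteration."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in P[s]} for s in P}
    for n in range(1, num_iters + 1):
        step = 1.0 / n ** 0.6  # assumed diminishing step size
        new_q = {s: dict(q[s]) for s in P}
        for s in P:
            for a in P[s]:
                s_next = sample_next_state(rng, s, a)
                # Cost-minimizing target: one-step cost plus discounted best next value.
                target = COST[s][a] + GAMMA * min(q[s_next].values())
                new_q[s][a] = q[s][a] + step * (target - q[s][a])
        q = new_q  # all pairs move together, based on the previous iterate
    return q


def greedy_policy(q):
    """Pick the lowest-cost action in each state."""
    return {s: min(q[s], key=q[s].get) for s in q}


if __name__ == "__main__":
    q_values = synchronous_q_learning()
    print(greedy_policy(q_values))
```

The second algorithm in the abstract would instead update only the pair formed by the current state and an action drawn from the current randomized policy; that variant is not sketched here because the abstract does not specify how the policy is updated.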