New algorithms of the Q-learning type

Authors:	Shalabh Bhatnagar, K Mohan Babu

Affiliations:	(a) Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560 012, India; (b) Motorola India Electronics Ltd., Bangalore, India

Abstract:	We propose two algorithms for Q-learning that use the two-timescale stochastic approximation methodology. The first updates the Q-values of all feasible state-action pairs at each instant, while the second updates the Q-values of states paired with actions chosen according to the current iterate of the randomized policy, which is itself being updated. We prove that both algorithms converge. Finally, we present numerical experiments with the proposed algorithms on a routing application in communication networks, under several different settings.

Keywords:	Q-learning; Reinforcement learning; Markov decision processes; Two-timescale stochastic approximation; SPSA
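The abstract describes the algorithms only at a high level, so the following is a hypothetical illustration of the general two-timescale idea rather than the authors' method. It loosely mirrors the first variant: the Q-values of all state-action pairs are updated at every step on a faster step-size schedule a_n, while a softmax randomized policy is updated on a slower schedule b_n, with b_n / a_n → 0. The two-state MDP, the step-size exponents, the softmax parameterization, the advantage-based policy step, and the names `two_timescale_q` and `softmax` are all assumptions made for illustration. The keywords mention SPSA, which this sketch does not use.

```python
import math
import random

# Hypothetical 2-state, 2-action MDP (not from the paper):
# P[s][a] is the next-state distribution, R[s][a] the one-step reward.
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # state 0: action 0 mostly stays, action 1 mostly moves to state 1
    [[0.1, 0.9], [0.8, 0.2]],  # state 1: action 0 mostly stays, action 1 mostly moves to state 0
]
R = [
    [0.0, 0.0],
    [1.0, 0.0],  # only (state 1, action 0) pays a reward
]


def softmax(prefs):
    """Convert action preferences into a randomized policy (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def two_timescale_q(P, R, gamma=0.9, n_iters=5000, seed=0):
    """Illustrative two-timescale scheme (an assumption, not the paper's exact algorithm).

    Fast timescale: Q-values of ALL state-action pairs are updated at every step,
    bootstrapping on the expected Q-value of the next state under the current
    randomized policy.
    Slow timescale: softmax preferences move toward actions with positive
    advantage Q(s, a) - sum_b pi(s, b) Q(s, b).
    """
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    theta = [[0.0] * n_actions for _ in range(n_states)]

    for n in range(n_iters):
        a_n = 1.0 / (n + 1) ** 0.6  # faster step size
        b_n = 1.0 / (n + 1)         # slower step size: b_n / a_n -> 0
        pi = [softmax(theta[s]) for s in range(n_states)]

        # Fast timescale: synchronous update of every (s, a), one sampled transition each.
        new_Q = [row[:] for row in Q]
        for s in range(n_states):
            for a in range(n_actions):
                s_next = rng.choices(range(n_states), weights=P[s][a])[0]
                v_next = sum(pi[s_next][b] * Q[s_next][b] for b in range(n_actions))
                new_Q[s][a] += a_n * (R[s][a] + gamma * v_next - Q[s][a])
        Q = new_Q

        # Slow timescale: nudge the randomized policy toward higher-valued actions.
        for s in range(n_states):
            v_s = sum(pi[s][b] * Q[s][b] for b in range(n_actions))
            for a in range(n_actions):
                theta[s][a] += b_n * (Q[s][a] - v_s)

    return Q, theta


if __name__ == "__main__":
    Q, theta = two_timescale_q(P, R)
    greedy = [max(range(len(row)), key=row.__getitem__) for row in Q]
    print("greedy actions:", greedy)
    print("policy:", [softmax(t) for t in theta])
```

For this toy MDP, solving the Bellman optimality equation by hand gives action 1 as optimal in state 0 and action 0 in state 1, so the greedy actions read off the learned Q-values can be checked against that. Bootstrapping on the policy-weighted average of Q(s', ·) rather than on max_b Q(s', b) is one way to let the slower policy update, instead of the max operator, drive improvement.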