Exploring selfish reinforcement learning in repeated games with stochastic rewards |
| |
Authors: | Katja Verbeeck, Ann Nowé, Johan Parent, Karl Tuyls |
| |
Affiliation: | (1) Computational Modeling Lab (COMO), Vrije Universiteit Brussel, Brussels, Belgium; (2) Institute for Knowledge and Agent Technology (IKAT), University of Maastricht, Maastricht, The Netherlands |
| |
Abstract: | In this paper we introduce a new multi-agent reinforcement learning algorithm, called exploring selfish reinforcement learning
(ESRL). ESRL allows agents to reach optimal solutions in repeated non-zero sum games with stochastic rewards by using coordinated
exploration. First, two ESRL algorithms are presented, one for common interest games and one for conflicting interest games. Both
ESRL algorithms are based on the same idea: an agent explores by temporarily excluding some of its local actions from
its private action space, giving the team of agents the opportunity to look for better solutions in a reduced joint action
space. At a later stage these two algorithms are transformed into one generic algorithm that does not assume that the type
of the game is known in advance. ESRL is able to find the Pareto optimal solution in common interest games without communication.
In conflicting interest games ESRL needs only limited communication to learn a fair periodical policy, resulting in a good
overall policy. ESRL agents are independent in the sense that they base their decisions only on their own action choices
and rewards. They are also flexible in learning different solution concepts, and they can
handle stochastic, possibly delayed rewards as well as asynchronous action selection. A real-life experiment, adaptive
load balancing of parallel applications, is included. |
| |
Keywords: | Multi-agent reinforcement learning; Learning automata; Non-zero sum games |
This article is indexed in SpringerLink and other databases. |
|
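The exclusion idea described in the abstract can be illustrated with a small sketch. This is not the ESRL algorithm from the paper. It is a simplified illustration under several assumptions: two agents, a common interest game with Bernoulli rewards, a linear reward-inaction learning automaton as the learner (suggested only by the keywords), and synchronized phases. The names `learn_phase`, `explore_with_exclusions`, and the example payoff table `game` are hypothetical.

```python
import random


def sample_action(probs, rng):
    """Draw an action from a dict {action: probability}."""
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions])[0]


def learn_phase(allowed, success_prob, rng, steps=2000, lr=0.05, eval_steps=500):
    """Run one learning phase over the currently allowed private action sets.

    Returns the joint action the agents settle on and its estimated
    average reward.
    """
    # Each agent keeps a private probability vector over its allowed actions.
    probs = [{a: 1.0 / len(acts) for a in acts} for acts in allowed]
    for _ in range(steps):
        joint = tuple(sample_action(p, rng) for p in probs)
        # Common interest: one stochastic (Bernoulli) reward, seen by every agent.
        reward = 1.0 if rng.random() < success_prob[joint] else 0.0
        for p, chosen in zip(probs, joint):
            # Linear reward-inaction: shift probability towards the chosen
            # action on success, do nothing on failure.
            for a in p:
                target = 1.0 if a == chosen else 0.0
                p[a] += lr * reward * (target - p[a])
    # Each agent settles on its own most probable action.
    settled = tuple(max(p, key=p.get) for p in probs)
    # Estimate the value of the settled joint action from sampled rewards only.
    wins = sum(rng.random() < success_prob[settled] for _ in range(eval_steps))
    return settled, wins / eval_steps


def explore_with_exclusions(success_prob, n_actions, n_agents=2, seed=0):
    """Alternate learning phases with private exclusion of the settled action."""
    rng = random.Random(seed)
    allowed = [list(range(n_actions)) for _ in range(n_agents)]
    best_joint, best_value = None, -1.0
    while True:
        joint, value = learn_phase(allowed, success_prob, rng)
        if value > best_value:
            best_joint, best_value = joint, value
        if any(len(acts) == 1 for acts in allowed):
            break  # nothing left to exclude
        # Exploration step: every agent drops the action it just settled on,
        # so the next phase searches a reduced joint action space.
        allowed = [[a for a in acts if a != chosen]
                   for acts, chosen in zip(allowed, joint)]
    return best_joint, best_value


# Hypothetical 2x2 common interest game: success probability per joint action.
game = {(0, 0): 0.9, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.4}
best_joint, best_value = explore_with_exclusions(game, n_actions=2)
print(best_joint, round(best_value, 2))
```

In this toy game the first phase may settle on either diagonal joint action. Excluding the settled actions forces the second phase onto the other one, so both candidates are evaluated and the better one is kept. Each agent uses only its own action choices and the reward it observes, which matches the independence property stated in the abstract.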