Rationality of reward sharing in multi-agent reinforcement learning |
| |
Authors: | Kazuteru Miyazaki, Shigenobu Kobayashi |
| |
Affiliation: | (1) National Institution for Academic Degrees, 3-29-1 Ootsuka Bunkyo-ku, 112-0012 Tokyo, Japan; (2) Tokyo Institute of Technology, 4259 Nagatsuta Midori, 226-8502 Yokohama, Japan |
| |
Abstract: | In multi-agent reinforcement learning systems, it is important to share a reward among all agents. We focus on the Rationality Theorem of Profit Sharing 5) and analyze how to share a reward among all profit sharing agents. When an agent gets a direct reward R (R>0), an indirect reward μR (μ≥0) is given to the other agents. We have derived a necessary and sufficient condition for preserving rationality, expressed in terms of the following quantities: M and L are the maximum numbers of all conflicting rules and of rational rules for the same sensory input, W and W_o are the maximum episode lengths of a direct-reward agent and an indirect-reward agent, and n is the number of agents. This theory is derived by avoiding the least desirable situation, in which the expected reward per action is zero. Therefore, applying this theorem lets us benefit from several efficient aspects of reward sharing. Through numerical examples, we confirm the effectiveness of this theorem. Kazuteru Miyazaki, Dr. Eng.: He is an associate professor in the Faculty of Assessment and Research for Degrees at the National Institution for Academic Degrees. He obtained his BEng. from Meiji University in 1991, and his Dr. Eng. from Tokyo Institute of Technology in 1996. His research interests are in Machine Learning and Robotics. He has published over 30 research papers and received several awards. He is a member of the Japan Society of Mechanical Engineers (JSME), the Japanese Society for Artificial Intelligence (JSAI), and the Society of Instrument and Control Engineers of Japan (SICE). Shigenobu Kobayashi, Dr. Eng.: He received his Dr. Eng. from Tokyo Institute of Technology in 1974. He is a professor at the Dept. of Computational Intelligence and Systems Science, Tokyo Institute of Technology. His research interests include artificial intelligence, emergent systems, evolutionary computation and reinforcement learning. |
| |
Keywords: | Reinforcement Learning; Multi-agent System; Profit Sharing; Rationality Theorem; Direct and Indirect Rewards |
This article is indexed in SpringerLink and other databases. |
|