Multiagent reinforcement learning through merging individually learned value functions
ZHANG Hua-xiang, HUANG Shang-teng. Multiagent reinforcement learning through merging individually learned value functions[J]. Journal of Harbin Institute of Technology, 2005, 12(3): 346-350.
Authors:ZHANG Hua-xiang  HUANG Shang-teng
Affiliation:1. Information and Management School, Shandong Normal University, Jinan 250014,China
2. Dept. of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai 200030, China
Abstract:In cooperative multiagent systems, learning the optimal policies of multiple agents is very difficult: the numbers of joint states and actions grow exponentially with the number of agents, so the agents' action policies quickly become intractable. A value function maps each state (or state-action pair) to the expected return of following a policy, and by learning value functions an agent can learn its optimal action policy for a task. If a task can be decomposed into several subtasks and the agents have learned the optimal value functions for each subtask, this knowledge can help the agents learn the optimal action policies for the whole task when they act simultaneously. A novel multiagent online reinforcement learning algorithm, LU-Q, is proposed that merges the agents' independently learned optimal value functions. By applying a transformation to the individually learned value functions, the constraints on the optimal value functions of each subtask are loosened. In each learning iteration of LU-Q, the agents' joint action set in a state is processed: some actions of that state are pruned from the available action set according to the multiagent value function defined in LU-Q. As the available action set of each state shrinks gradually over the iterations of LU-Q, the convergence of the value functions is accelerated. The effectiveness, soundness and convergence of LU-Q are analyzed, and the experimental results show that the learning performance of LU-Q is better than that of standard Q-learning.
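The abstract only sketches LU-Q at a high level; the exact transformation applied to the individual value functions and the exact pruning rule are not given here. The Python sketch below is therefore only an illustrative reading of the idea, under stated assumptions: per-agent Q-tables learned on subtasks are merged by summing their values (an assumed merging rule, not the paper's transformation), and a joint-action Q-learning loop prunes, in each state, the joint actions whose merged value falls too far below the best one. The env interface (reset/step) and all parameter names are hypothetical.

```python
import random
from itertools import product

def lu_q_sketch(env, individual_qs, n_agents, actions,
                episodes=500, alpha=0.1, gamma=0.95,
                epsilon=0.1, prune_margin=1.0):
    """Joint-action Q-learning in which the available joint action set of each
    state is pruned using the agents' independently learned value functions.

    individual_qs[i][(state, a)] holds agent i's Q-value for its own subtask
    (assumed already trained); env.reset() returns a state and
    env.step(joint_action) returns (next_state, reward, done)."""
    joint_actions = list(product(actions, repeat=n_agents))
    Q = {}        # joint Q-table: (state, joint_action) -> value
    allowed = {}  # cache of pruned joint action sets, keyed by state

    def merged_value(state, joint_a):
        # Assumed merging rule: sum of the agents' individual Q-values.
        return sum(individual_qs[i].get((state, a), 0.0)
                   for i, a in enumerate(joint_a))

    def candidate_actions(state):
        # Keep only joint actions whose merged value is within prune_margin
        # of the best merged value in this state (the best action always stays).
        if state not in allowed:
            best = max(merged_value(state, ja) for ja in joint_actions)
            allowed[state] = [ja for ja in joint_actions
                              if merged_value(state, ja) >= best - prune_margin]
        return allowed[state]

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            acts = candidate_actions(state)
            if random.random() < epsilon:
                joint_a = random.choice(acts)                       # explore
            else:                                                   # exploit
                joint_a = max(acts, key=lambda ja: Q.get((state, ja), 0.0))
            next_state, reward, done = env.step(joint_a)
            best_next = max(Q.get((next_state, ja), 0.0)
                            for ja in candidate_actions(next_state))
            target = reward + (0.0 if done else gamma * best_next)
            Q[(state, joint_a)] = ((1 - alpha) * Q.get((state, joint_a), 0.0)
                                   + alpha * target)
            state = next_state
    return Q
```

Because the pruned action sets are computed once per state from the already-learned subtask values, the maximization in each Q-learning backup runs over a smaller set of joint actions, which is the mechanism the abstract credits for the accelerated convergence.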
Keywords:reinforcement learning  multiagent  value function
Chinese index terms:computer technology  expert system  knowledge engineering  evaluation function
This article has been indexed in CNKI, VIP (Weipu), Wanfang Data and other databases.