Multi-agent collaboration based on RGMAAC algorithm under partial observability
WANG Zi-hao, ZHANG Yan-xin, HUANG Zhi-qing, YIN Chen-kun. Multi-agent collaboration based on RGMAAC algorithm under partial observability[J]. Control and Decision, 2023, 38(5): 1267-1277.
Authors: WANG Zi-hao, ZHANG Yan-xin, HUANG Zhi-qing, YIN Chen-kun
Affiliation: School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100091, China; Department of Information Science, Beijing University of Technology, Beijing 100124, China
Abstract: Multi-agent deep reinforcement learning (MADRL) applies the ideas and algorithms of deep reinforcement learning to the learning and control of multi-agent systems, and is an important method for developing multi-agent systems with swarm agents. Existing MADRL studies mainly design algorithms under the assumption that the environment is fully observable or that communication resources are unlimited. However, partial observability is an objective problem in practical applications of multi-agent systems: for example, the observation range of an agent is usually limited, and complete environmental information is unavailable outside the observable range, which makes multi-agent collaboration difficult. To address partial observability in real-world scenarios, this paper extends the deep reinforcement learning algorithm Actor-Critic to multi-agent systems based on the paradigm of centralized training and distributed execution, adds communication channels and gating mechanisms between agents, and proposes the recurrent gated multi-agent Actor-Critic (RGMAAC) algorithm. Agents can communicate efficiently based on their historical action-observation sequences, and finally make behavior decisions using local observations, historical observation sequences, and the observations explicitly shared by other agents through the communication channels. Meanwhile, a multi-agent task requiring synchronous and fast arrival at target points is designed based on the multi-agent particle environment, and two reward functions and two task scenarios are designed respectively. The experimental results show that, when partial observability problems clearly arise in the task scenario, agents trained with the RGMAAC algorithm perform well and are superior to the baseline algorithm in terms of stability.
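The abstract does not include implementation details, but the architecture it describes (a recurrent actor that keeps a memory of the action-observation history and mixes in messages from other agents through a learned gate) can be sketched as follows. This is a minimal illustration in PyTorch; the module names, dimensions, and the mean-pooling message aggregation are assumptions for exposition, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code) of a recurrent gated actor
# in the spirit of RGMAAC: a GRU cell maintains a memory of the agent's
# observation history, and a sigmoid gate controls how much of the
# communication received from other agents enters the policy input.
import torch
import torch.nn as nn


class RecurrentGatedActor(nn.Module):
    def __init__(self, obs_dim, msg_dim, hidden_dim, action_dim):
        super().__init__()
        # GRU encodes the history of local observations into a memory state.
        self.rnn = nn.GRUCell(obs_dim, hidden_dim)
        # Message head: what this agent broadcasts over the channel.
        self.msg_head = nn.Linear(hidden_dim, msg_dim)
        # Gate: weights in (0, 1) deciding how much incoming
        # communication to pass through.
        self.gate = nn.Linear(hidden_dim + msg_dim, msg_dim)
        self.policy = nn.Linear(hidden_dim + msg_dim, action_dim)

    def forward(self, obs, hidden, incoming_msgs):
        # obs: (batch, obs_dim); hidden: (batch, hidden_dim)
        # incoming_msgs: (batch, n_other_agents, msg_dim)
        hidden = self.rnn(obs, hidden)              # update history memory
        pooled = incoming_msgs.mean(dim=1)          # aggregate others' messages
        g = torch.sigmoid(self.gate(torch.cat([hidden, pooled], dim=-1)))
        gated_msg = g * pooled                      # gated communication
        action = torch.tanh(self.policy(torch.cat([hidden, gated_msg], dim=-1)))
        out_msg = self.msg_head(hidden)             # message to broadcast next step
        return action, hidden, out_msg
```

Under the centralized-training/distributed-execution paradigm the abstract names, a centralized critic would additionally see all agents' observations and actions during training, while at execution time each actor above relies only on its local observation, its recurrent memory, and the messages received over the channel.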
Keywords: multi-agent; deep reinforcement learning; partial observability; multi-agent deep deterministic policy gradient; inter-agent communication