Attentional Intention and Communication for Multi-agent Learning

Citation: Yu Wen-Wu, Yang Xiao-Ya, Li Hai-Chang, Wang Rui, Hu Xiao-Hui. Attentional intention and communication for multi-agent learning. Acta Automatica Sinica, 2023, 49(11): 2311−2325. doi: 10.16383/j.aas.c210430
Authors: YU Wen-Wu, YANG Xiao-Ya, LI Hai-Chang, WANG Rui, HU Xiao-Hui
Affiliations: 1. Science and Technology on Integrated Information System Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China
Funding: Supported by the National Key Research and Development Program of China (2019YFB1405100) and the National Natural Science Foundation of China (61802380, 61802016)
Abstract: In multi-agent communication and cooperation tasks under partially observable environments, most existing studies use only the hidden-layer information of the network at the current time step, which limits the sources of communicated information. This paper studies how to train a set of independent policies from a team reward and how to improve the cooperative performance of those independent policies. We propose the multi-agent attentional intention and communication (MAAIC) algorithm, which adds an intention-information module to broaden the sources of communicated information and improves the communication mode. Each agent's historically best-performing network is taken as its intention network, from which policy intention information is extracted and retained as a vector in chronological order; an attention mechanism then combines this historical intention information with the agent's current observation history to infer more effective messages as input for decision-making. The effectiveness of the algorithm is verified by comparative experiments on the StarCraft Multi-Agent Challenge.
Keywords: Multi-agent systems, reinforcement learning, intention communication, attention mechanism
Received: 2021-05-18
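The abstract's core mechanism, i.e. attending over a chronological stack of historical intention vectors with the agent's current hidden state as the query, can be sketched with plain scaled dot-product attention. This is a minimal illustration only, not the authors' implementation: the vector dimensions, the single-query form, and the function name are assumptions, and the real MAAIC module would sit inside each agent's recurrent policy network.

```python
import numpy as np

def attend_over_intentions(query, intentions):
    """Scaled dot-product attention: the agent's current hidden state
    (query, shape (d,)) attends over T historical intention vectors
    (intentions, shape (T, d)) kept in chronological order, and returns
    one aggregated message vector of shape (d,)."""
    d = query.shape[-1]
    scores = intentions @ query / np.sqrt(d)   # similarity per time step, (T,)
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    weights /= weights.sum()                   # attention weights sum to 1
    return weights @ intentions                # weighted sum over time, (d,)

# Toy example: 5 historical intention vectors of dimension 8.
rng = np.random.default_rng(0)
intentions = rng.standard_normal((5, 8))       # stacked intention history
query = rng.standard_normal(8)                 # current hidden state
message = attend_over_intentions(query, intentions)  # shape (8,)
```

The softmax weighting lets the agent emphasize whichever past intention is most relevant to its current observation history, rather than using only the latest hidden state.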
