Multi-agent reinforcement learning based on observation relation extraction
Cite this article: Xu Shuqing, Zang Chuanzhi. Multi-agent reinforcement learning based on observation relation extraction[J]. Application Research of Computers, 2022, 39(10).
Authors: Xu Shuqing, Zang Chuanzhi
Affiliation: Shenyang Institute of Automation, Chinese Academy of Sciences; Shenyang University of Technology
Funding: National Natural Science Foundation of China (92067205); Natural Science Foundation of Liaoning Province (2020-KF-11-02); Open Fund of the State Key Laboratory of Robotics (2020-Z11)
Abstract: To address the difficulty of policy learning in multi-agent systems (MAS), caused by environmental non-stationarity and the mutual influence of agents' decisions, this paper proposes a method named observation relation extraction (ORE). ORE uses a complete graph to model the relations between different parts of an agent's observation and applies an attention mechanism to compute the importance of each of those relations. Applying this method to value-decomposition-based multi-agent reinforcement learning algorithms yields multi-agent reinforcement learning algorithms based on observation relation extraction. Experimental results on StarCraft micromanagement scenarios (StarCraft multi-agent challenge, SMAC) show that, compared with the original algorithms, the value-decomposition multi-agent algorithms equipped with the ORE structure perform better in both convergence speed and final performance.

Keywords: multi-agent, reinforcement learning, attention mechanism, observation space
Received: 2022-03-02
Revised: 2022-05-10

Multi-agent reinforcement learning based on observation relation extraction
Xu Shuqing and Zang Chuanzhi.Multi-agent reinforcement learning based on observation relation extraction[J].Application Research of Computers,2022,39(10).
Authors:Xu Shuqing and Zang Chuanzhi
Affiliation: Shenyang Institute of Automation, Chinese Academy of Sciences; Shenyang University of Technology
Abstract: In order to overcome the challenges of policy learning in MAS, such as the non-stationary environment and the mutual influence of agent decisions, this paper proposed a method named ORE, which used a complete graph to model the relations between different parts of each agent's observation and took advantage of the attention mechanism to calculate the importance of those relations. By applying this method to multi-agent reinforcement learning algorithms based on value decomposition, this paper proposed multi-agent reinforcement learning algorithms based on observation relation extraction. Experimental results on SMAC show that the proposed algorithms with ORE lead to better performance than the original algorithms in terms of both convergence speed and final performance.
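The core mechanism the abstract describes — treating the parts of one agent's observation as nodes of a complete graph and using attention to weight every pairwise relation — can be sketched with plain scaled dot-product self-attention over observation segments. This is an illustrative reconstruction, not the authors' implementation: the number of parts, the feature sizes, the single attention head, and the random stand-in projection matrices are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def observation_relation_extraction(obs_parts, d_k=16, rng=None):
    """Self-attention over the parts of one agent's observation.

    obs_parts: (n_parts, d_in) array; each row is one segment of the
    observation (e.g. the features of one observed entity).  Because
    every part attends to every other part, the attention weight matrix
    realises a complete graph over the observation segments, with each
    weight giving the importance of one pairwise relation.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    n_parts, d_in = obs_parts.shape
    # Random projections stand in for learned query/key/value matrices.
    W_q = rng.standard_normal((d_in, d_k)) / np.sqrt(d_in)
    W_k = rng.standard_normal((d_in, d_k)) / np.sqrt(d_in)
    W_v = rng.standard_normal((d_in, d_k)) / np.sqrt(d_in)
    Q, K, V = obs_parts @ W_q, obs_parts @ W_k, obs_parts @ W_v
    attn = softmax(Q @ K.T / np.sqrt(d_k))   # (n_parts, n_parts) relation weights
    return attn @ V, attn                     # aggregated features, relation weights

# Toy observation: 5 parts, 8 features each.
obs = np.random.default_rng(1).standard_normal((5, 8))
feats, attn = observation_relation_extraction(obs)
# Each row of attn is a distribution over the 5 observation parts.
```

In the full algorithm, the aggregated features would replace (or augment) the raw observation as input to each agent's individual value network before value-decomposition mixing; here the output is simply returned for inspection.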
Keywords: multi-agent  reinforcement learning  attention mechanism  observation space
