Research on collaborative strategy based on GAED-MADDPG multi-agent reinforcement learning
Citation: Zou Changjie, Zheng Jiaoling. Research on collaborative strategy based on GAED-MADDPG multi-agent reinforcement learning [J]. Application Research of Computers (计算机应用研究), 2020, 37(12): 3656-3661.
Authors: Zou Changjie, Zheng Jiaoling
Affiliation: School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China
Abstract: Most current multi-agent reinforcement learning algorithms adopt a centralized-learning, decentralized-action framework, which can take too long to converge or fail to converge altogether. To shorten the agents' collective learning time, this paper proposes a multi-agent group learning strategy. A recurrent neural network predicts a grouping matrix for the agents, and a mechanism for sharing experience among the agents inside each group improves the team's learning efficiency. To compensate for the information that agents can no longer share across groups, the paper introduces the concept of an information trace, which passes a small amount of global information among all agents. To strengthen the retention of good experience within a group, it proposes a birth-death process that delays the death of a group's best agents. In experiments, training time is reduced relative to MADDPG by 12% in the maze task and by 17% in the capture-the-flag task.

Keywords: reinforcement learning, group collaboration, deep learning, group wisdom
Received: 2019-09-30
Revised: 2020-11-01

Research on collaborative strategy based on GAED-MADDPG multi-agent reinforcement learning
Affiliation: School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China
Abstract: At present, multi-agent reinforcement learning algorithms mostly adopt a framework with centralized learning and decentralized action. This framework may take too long to converge, or may not converge at all. To speed up the agents' collective learning, this paper proposes a novel multi-agent group learning strategy. It uses a recurrent neural network (RNN) to predict a grouping matrix for the agents and shares experience among the agents within each group, improving the group's learning efficiency. Meanwhile, the paper proposes the concept of an information trace to remedy the loss of cross-group information sharing caused by grouping. To strengthen the retention of excellent experience within a group, the paper proposes delaying the death time of a group's excellent agents. Finally, the results show that, compared with MADDPG, training time is reduced by 12% in the maze experiment and by 17% in the capture-the-flag experiment.
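The abstract only outlines the two core mechanisms, so the following is a minimal illustrative sketch (not the authors' code): it assumes the grouping matrix has already been produced by the RNN predictor, and shows one plausible reading of within-group experience sharing and of the information trace as a small global summary passed to every agent. All names, buffer shapes, and the summary statistic are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 6 agents, each with a small replay buffer whose
# rows stand in for (state, action, reward) transitions.
n_agents = 6
buffers = [rng.normal(size=(10, 3)) for _ in range(n_agents)]

# Grouping matrix G (n_agents x n_groups). In the paper this comes
# from an RNN predictor; here it is a fixed one-hot assignment.
G = np.zeros((n_agents, 2))
G[:3, 0] = 1.0   # group 0: agents 0-2
G[3:, 1] = 1.0   # group 1: agents 3-5

def share_within_groups(buffers, G):
    """Pool experience inside each group: every agent's buffer is
    extended with the transitions of its group-mates."""
    shared = []
    for i in range(len(buffers)):
        group = np.argmax(G[i])
        mates = [j for j in range(len(buffers)) if np.argmax(G[j]) == group]
        shared.append(np.concatenate([buffers[j] for j in mates], axis=0))
    return shared

def information_trace(buffers, k=2):
    """A tiny global signal broadcast to *all* agents regardless of
    group: the mean of each agent's k most recent transitions
    (one possible reading of the paper's 'information trace')."""
    return np.stack([b[-k:].mean(axis=0) for b in buffers])

shared = share_within_groups(buffers, G)
trace = information_trace(buffers)
print(shared[0].shape)  # (30, 3): own 10 transitions plus 2 group-mates'
print(trace.shape)      # (6, 3): one global summary row per agent
```

Under this reading, grouping trades global communication cost for denser local experience reuse, while the information trace keeps a thin global channel open so groups do not learn in complete isolation.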
Keywords:reinforcement learning  group collaboration  deep learning  group wisdom
This article is indexed by Wanfang Data and other databases.