A Survey of Multi-Agent Deep Reinforcement Learning
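Cite this article: LIANG Xing-Xing, FENG Yang-He, MA Yang, CHENG Guang-Quan, HUANG Jin-Cai, WANG Qi, ZHOU Yu-Zhen, LIU Zhong. A survey of multi-agent deep reinforcement learning [J]. Acta Automatica Sinica, 2020, 46(12): 2537-2557.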
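Authors: LIANG Xing-Xing (梁星星), FENG Yang-He (冯旸赫), MA Yang (马扬), CHENG Guang-Quan (程光权), HUANG Jin-Cai (黄金才), WANG Qi (王琦), ZHOU Yu-Zhen (周玉珍), LIU Zhong (刘忠)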
Affiliation: 1. College of Systems Engineering, National University of Defense Technology, Changsha 410072
Funding: National Natural Science Foundation of China (71701205, 62073333)
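Abstract: In recent years, deep reinforcement learning (DRL) has achieved major breakthroughs in many complex sequential decision-making problems. By combining the strong representation capability of deep learning with the effective policy-search capability of reinforcement learning, DRL has become a promising learning paradigm for realizing artificial intelligence. However, applying DRL to the research and application of multi-agent systems still faces many difficulties and challenges: multi-agent learning in partially observable environments, typified by StarCraft Ⅱ, still struggles to achieve satisfactory results. This paper briefly introduces representative DRL algorithms and related techniques, such as the deep Q-network and deep policy gradient algorithms. It then categorizes existing multi-agent DRL algorithms from the perspective of the communication process into three mainstream forms: full communication with centralized decision-making, full communication with autonomous decision-making, and limited communication with autonomous decision-making. Finally, it discusses key problems in multi-agent DRL, including training architecture, sample augmentation, robustness, and opponent modeling, and analyzes the research hotspots and development prospects of the field.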

Keywords: multi-agent system; deep learning; deep reinforcement learning; artificial general intelligence
Received: 2018-06-04

Deep Multi-Agent Reinforcement Learning: A Survey
Affiliation:1.College of Systems Engineering, National University of Defense Technology, Changsha 410072
Abstract: Recent years have witnessed the great success of deep reinforcement learning (DRL) in addressing complicated problems, and DRL is widely used to learn effective policies in sequential decision-making tasks. Recognized as a promising learning paradigm, DRL combines the powerful representation ability of deep learning with the strong policy-improvement capability of reinforcement learning, driving the development of artificial intelligence into a new era. Although DRL has shown great power in typical applications, effective multi-agent DRL still requires further exploration; one challenging task is guiding multiple agents to play StarCraft Ⅱ, whose environment is partially observable and dynamic. To help DRL better accommodate multi-agent environments and overcome these challenges, we briefly introduce the foundations of reinforcement learning and representative DRL algorithms, including deep Q-learning, deep policy gradient algorithms, and their extensions, and then review representative and state-of-the-art multi-agent DRL algorithms. We further examine the dominant approaches to multi-agent decision-making and, from the perspective of the communication process, categorize them into three mainstream classes: full-communication centralized learning, full-communication decentralized learning, and limited-communication decentralized learning. Finally, we discuss key problems in multi-agent DRL, such as training architecture, sample augmentation, robustness, and opponent modeling, and highlight future research directions.
Keywords: multi-agent system; deep learning; deep reinforcement learning; artificial general intelligence