
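A Group-Agent Reinforcement Learning Algorithm Based on Role Tracking
Cite this article: Zhang Shuangmin, Shi Chunyi. A Group-Agent Reinforcement Learning Algorithm Based on Role Tracking [J]. Journal of Computer Research and Development, 2005, 42(2): 203-209.
Authors: Zhang Shuangmin, Shi Chunyi
Affiliation: Department of Computer Science and Technology, Tsinghua University, Beijing 100084
Funding: National Natural Science Foundation of China (60173011); National High Technology Research and Development Program of China ("863" Program) (2001AA113120)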
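Abstract: In multi-agent systems, learning lets agents continually extend and reinforce their existing knowledge and capabilities and choose reasonable actions that maximize their own benefit. However, most current work on agent learning is limited to the single-agent setting or considers only confrontation between individual agents. It does not consider confrontation between groups of agents or the roles agents play within a team, and it relies entirely on perceived utility to infer the opponents' strategies, so the algorithms converge slowly. This paper therefore extends single-agent learning to group-agent learning in non-communicating group-confrontation environments. Taking into account the particularities of different learning problems, it adds a role attribute to the learning model, proposes a group-agent reinforcement learning algorithm based on role tracking, and analyzes it experimentally. During learning, the opponents' roles are tracked dynamically, the learning rate is set dynamically according to how well each opponent's role matches its behavior, and the minimax-Q algorithm is used to update the utility value of each state. This speeds up the convergence of learning and improves on the work of Bowling, Littman, and others.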

Keywords: MAS  reinforcement learning  role matching  group confrontation  learning rate
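The role-tracking step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how roles are represented, how the match value is computed, or in which direction the match value moves the learning rate, so everything below is an assumption: each hypothetical role is modelled as a fixed action distribution (`ROLE_PROFILES`), the belief over an opponent's role is updated with Bayes' rule, the match value is the probability of the observed action under that belief, and the hypothetical `learning_rate` maps a poor match (the opponent is acting out of role) to a larger step and a good match to a smaller one.

```python
from typing import Dict

# Hypothetical role profiles P(action | role); illustrative only, not from the paper.
ROLE_PROFILES: Dict[str, Dict[str, float]] = {
    "attacker": {"advance": 0.7, "hold": 0.2, "retreat": 0.1},
    "defender": {"advance": 0.1, "hold": 0.6, "retreat": 0.3},
}


class RoleTracker:
    """Tracks a belief over one opponent's role from its observed actions (Bayes' rule)."""

    def __init__(self, profiles: Dict[str, Dict[str, float]], smoothing: float = 0.05):
        self.profiles = profiles
        n = len(profiles)
        self.belief = {role: 1.0 / n for role in profiles}  # uniform prior over roles
        # Mixing toward uniform keeps every role possible, so a role switch can be re-detected.
        self.smoothing = smoothing

    def match(self, action: str) -> float:
        """Match value in [0, 1]: probability of `action` under the current role belief."""
        return sum(self.belief[r] * self.profiles[r].get(action, 0.0) for r in self.profiles)

    def update(self, action: str) -> None:
        """Replace the belief with the (smoothed) posterior after observing `action`."""
        post = {r: self.belief[r] * self.profiles[r].get(action, 0.0) for r in self.profiles}
        z = sum(post.values())
        n = len(self.profiles)
        if z == 0.0:
            # Action impossible under every profile: fall back to a uniform belief.
            post = {r: 1.0 for r in self.profiles}
            z = float(n)
        self.belief = {
            r: (1.0 - self.smoothing) * post[r] / z + self.smoothing / n for r in self.profiles
        }

    def most_likely_role(self) -> str:
        return max(self.belief, key=self.belief.get)


def learning_rate(match_value: float, alpha_min: float = 0.05, alpha_max: float = 0.5) -> float:
    """Assumed schedule: low role/behaviour match -> larger step; high match -> smaller step."""
    m = min(max(match_value, 0.0), 1.0)
    return alpha_max - (alpha_max - alpha_min) * m


if __name__ == "__main__":
    tracker = RoleTracker(ROLE_PROFILES)
    for observed in ["advance", "advance", "advance"]:
        alpha = learning_rate(tracker.match(observed))
        tracker.update(observed)
        print(observed, round(alpha, 4), tracker.most_likely_role())
```

A run of "advance" observations shifts the belief toward the hypothetical "attacker" role and raises the match value, so the step size shrinks. The `smoothing` term keeps every role's belief at or above `smoothing / n`, so a later role change by the opponent can still be picked up.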

A Multi-Agent Reinforcement Learning Method Based on Role Tracking
Zhang Shuangmin, Shi Chunyi. A Multi-Agent Reinforcement Learning Method Based on Role Tracking [J]. Journal of Computer Research and Development, 2005, 42(2): 203-209.
Authors: Zhang Shuangmin, Shi Chunyi
Abstract: In a multi-agent system, an agent can continually extend and improve its knowledge and capabilities through learning and then choose reasonable actions to maximize its benefit. However, most current work is restricted to the single-agent setting or considers only confrontation between two individual agents, not confrontation between groups. It also ignores agents' roles within a group and estimates opponents' policies solely from observed benefits, which makes the algorithms converge slowly to a stationary policy. To address these limitations, the single-agent learning process is extended to group-agent learning in non-communicating group-confrontation environments. Considering the particularities of different learning problems, a reinforcement learning algorithm based on role tracking is proposed and analyzed experimentally. The role concept is added to the learning model by tracking the opponents' roles dynamically during learning. The learning rate is determined by computing how well the opponents' roles match their actions, and the value of every state is updated with the minimax-Q algorithm to speed up convergence, improving on the work of Bowling and Littman.
Keywords: MAS  reinforcement learning  role matching  group confrontation  learning rate
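The minimax-Q update mentioned in the abstract (Littman's algorithm for zero-sum Markov games) can be sketched as follows: Q(s, a, o) is moved toward r + γV(s'), and V(s) is the maximin value of the matrix game Q(s, ·, ·) over the learner's mixed strategies. To stay dependency-free, this sketch assumes the learner has exactly two actions, in which case the maximin value can be found exactly by checking the endpoints and the pairwise intersections of the opponent's payoff lines; with more actions the value is normally obtained with a linear-program solver such as `scipy.optimize.linprog`. The names `matrix_game_value_2xn` and `MinimaxQ` are hypothetical, and the learning rate `alpha` is passed in on each update so that any schedule, for example one driven by a role-match value, can be plugged in. This is not the paper's implementation.

```python
from collections import defaultdict
from itertools import combinations
from typing import List, Tuple


def matrix_game_value_2xn(q: List[List[float]]) -> Tuple[float, float]:
    """Exact maximin value of a zero-sum matrix game in which the learner has 2 actions.

    q[a][o] is the learner's payoff for its action a (0 or 1) against opponent action o.
    Returns (value, p), where p is the probability of playing action 0.
    """
    n_opp = len(q[0])

    def worst_case(p: float) -> float:
        # Expected payoff of the mixed strategy (p, 1 - p) against the opponent's best reply.
        return min(p * q[0][o] + (1.0 - p) * q[1][o] for o in range(n_opp))

    # worst_case is concave and piecewise linear in p, so its maximum lies at an
    # endpoint or where two of the opponent's payoff lines intersect.
    candidates = [0.0, 1.0]
    for o1, o2 in combinations(range(n_opp), 2):
        denom = (q[0][o1] - q[1][o1]) - (q[0][o2] - q[1][o2])
        if denom != 0:
            p = (q[1][o2] - q[1][o1]) / denom
            if 0.0 <= p <= 1.0:
                candidates.append(p)
    best_p = max(candidates, key=worst_case)
    return worst_case(best_p), best_p


class MinimaxQ:
    """Tabular minimax-Q for a 2-action learner (hypothetical helper, not from the paper)."""

    def __init__(self, n_opp_actions: int, gamma: float = 0.9):
        self.gamma = gamma
        # Q[s][a][o], initialised to zero for unseen states.
        self.q = defaultdict(lambda: [[0.0] * n_opp_actions for _ in range(2)])
        self.v = defaultdict(float)         # V(s), the maximin state value
        self.pi = defaultdict(lambda: 0.5)  # probability of action 0 in state s

    def update(self, s, a: int, o: int, r: float, s_next, alpha: float) -> None:
        # Move Q(s, a, o) toward the one-step target r + gamma * V(s').
        target = r + self.gamma * self.v[s_next]
        self.q[s][a][o] = (1.0 - alpha) * self.q[s][a][o] + alpha * target
        # Re-solve the matrix game at s to refresh V(s) and the learner's mixed policy.
        self.v[s], self.pi[s] = matrix_game_value_2xn(self.q[s])
```

For matching pennies, `matrix_game_value_2xn([[1.0, -1.0], [-1.0, 1.0]])` returns `(0.0, 0.5)`, the familiar uniform mixed strategy. Solving the stage game for a maximin value, rather than taking a plain max over actions as in ordinary Q-learning, is what makes the learned policy robust to the opponent's best reply.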
This article is indexed in databases including CNKI, VIP (Weipu), and Wanfang Data.