Research on Markov Game-Based Multi-Agent Reinforcement Learning Model and Algorithms
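Cite this article:GAO Yang, ZHOU Zhi-Hua, HE Jia-Zhou, CHEN Shi-Fu. Research on Markov game-based multi-agent reinforcement learning model and algorithms[J]. Journal of Computer Research and Development, 2000, 37(3): 257-263.
Author names:GAO Yang  ZHOU Zhi-Hua  HE Jia-Zhou  CHEN Shi-Fu
Affiliation:State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093
Funding:This work was supported by the National Natural Science Foundation of China (grant no. 69775019) and the National Doctoral Program Foundation of China (grant no. 97028428).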
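Abstract (translated from the Chinese):In an MDP, a single agent can find the optimal solution to a problem through reinforcement learning. In a multi-agent system (MAS), however, the MDP model no longer applies. Likewise, the minimax-Q algorithm can only solve MAS learning problems that are modeled as zero-sum games. This paper adopts non-zero-sum Markov games as the learning framework for multi-agent systems and proposes a metagame reinforcement learning model together with a metagame-Q algorithm. It is proved theoretically that the metagame-Q algorithm converges to the metagame optimal solution of the non-zero-sum Markov game.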

Keywords (translated from the Chinese):metagame  reinforcement learning  multi-agent system  artificial intelligence

RESEARCH ON MARKOV GAME-BASED MULTIAGENT REINFORCEMENT LEARNING MODEL AND ALGORITHMS
GAO Yang,ZHOU Zhi-Hua,HE Jia-Zhou,CHEN Shi-Fu.RESEARCH ON MARKOV GAME-BASED MULTIAGENT REINFORCEMENT LEARNING MODEL AND ALGORITHMS[J].Journal of Computer Research and Development,2000,37(3):257-263.
Authors:GAO Yang  ZHOU Zhi-Hua  HE Jia-Zhou  CHEN Shi-Fu
Abstract:In a Markov decision process (MDP), a single agent can find the optimal policy for a problem by reinforcement learning. The MDP model, however, does not carry over to multi-agent systems, and the minimax-Q learning algorithm can only solve zero-sum Markov games. In this paper, non-zero-sum Markov games are adopted as a framework for multi-agent reinforcement learning, and a metagame reinforcement learning model and a metagame-Q learning algorithm are proposed. It is proved that the metagame-Q algorithm converges to the optimal metagame solution of the non-zero-sum Markov game.
Keywords:metagame  reinforcement learning  multi-agent system  non-zero-sum Markov games
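
The abstract contrasts single-agent Q-learning in an MDP with minimax-Q for zero-sum Markov games. As a rough illustration of that contrast only, the sketch below shows one tabular update of each kind. It is not the paper's metagame-Q algorithm, which the abstract does not specify. All function names, the learning rate `alpha`, and the discount `gamma` are assumptions made for this example, and the stage game is restricted to two actions per player so that its value has a closed form.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Matrix2x2 = List[List[float]]


def q_learning_update(q: Dict[Tuple[Hashable, Hashable], float],
                      state: Hashable, action: Hashable, reward: float,
                      next_state: Hashable, actions: Sequence[Hashable],
                      alpha: float = 0.1, gamma: float = 0.9) -> float:
    """One tabular Q-learning step for a single agent in an MDP."""
    # Bootstrap from the best action value in the next state (0.0 if unseen).
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def zero_sum_value_2x2(m: Matrix2x2) -> float:
    """Value of a 2x2 zero-sum matrix game for the row (maximizing) player."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))  # best pure guarantee for the row player
    minimax = min(max(a, c), max(b, d))  # best pure guarantee for the column player
    if maximin == minimax:
        # A pure-strategy saddle point exists.
        return float(maximin)
    # No saddle point: use the closed-form mixed-strategy value.
    return (a * d - b * c) / (a + d - b - c)


def minimax_q_update(q: Dict[Hashable, Matrix2x2], state: Hashable,
                     a: int, o: int, reward: float, next_state: Hashable,
                     alpha: float = 0.1, gamma: float = 0.9) -> float:
    """One minimax-Q style step; q[s][a][o] is indexed by own and opponent action."""
    # The max over actions in Q-learning is replaced by the stage-game value.
    v_next = zero_sum_value_2x2(q[next_state])
    q[state][a][o] += alpha * (reward + gamma * v_next - q[state][a][o])
    return q[state][a][o]
```

The only structural difference between the two updates is the bootstrap term: a maximum over the agent's own actions in the MDP case, and the value of a zero-sum stage game in the two-player case. A non-zero-sum setting, as studied in the paper, would need a different solution concept at that step.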
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.