Research on Markov Game-Based Multiagent Reinforcement Learning Model and Algorithms

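Cite this article:	GAO Yang, ZHOU Zhi-Hua, HE Jia-Zhou, CHEN Shi-Fu. Research on Markov game-based multiagent reinforcement learning model and algorithms[J]. Journal of Computer Research and Development, 2000, 37(3): 257-263.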

Authors:	GAO Yang, ZHOU Zhi-Hua, HE Jia-Zhou, CHEN Shi-Fu

Affiliation:	State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093

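Funding:	This work was supported by the National Natural Science Foundation of China (grant No. 69775019) and the National Doctoral Program Foundation (grant No. 97028428).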

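Abstract:	In an MDP, a single agent can find the optimal solution to a problem through reinforcement learning. In a multi-agent system (MAS), however, the MDP model no longer applies. Likewise, the minimax-Q algorithm can only solve MAS learning problems that adopt the zero-sum game model. This paper adopts non-zero-sum Markov games as the learning framework for multi-agent systems and proposes a metagame reinforcement learning model and a metagame-Q algorithm. It is proved theoretically that the metagame-Q algorithm converges to the metagame optimal solution of the non-zero-sum Markov game.
Keywords:	metagame; reinforcement learning; multi-agent system; artificial intelligence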
RESEARCH ON MARKOV GAME-BASED MULTIAGENT REINFORCEMENT LEARNING MODEL AND ALGORITHMS

GAO Yang, ZHOU Zhi-Hua, HE Jia-Zhou, CHEN Shi-Fu. RESEARCH ON MARKOV GAME-BASED MULTIAGENT REINFORCEMENT LEARNING MODEL AND ALGORITHMS[J]. Journal of Computer Research and Development, 2000, 37(3): 257-263.

Authors:	GAO Yang, ZHOU Zhi-Hua, HE Jia-Zhou, CHEN Shi-Fu

Abstract:	In a Markov decision process (MDP), a single agent can find the optimal policy for a problem through reinforcement learning. The MDP model, however, does not carry over to multi-agent systems, and the minimax-Q learning algorithm can only solve zero-sum Markov games. In this paper, non-zero-sum Markov games are adopted as the framework for multi-agent reinforcement learning, and a metagame reinforcement learning model and a metagame-Q algorithm are proposed. It is proved that the metagame-Q algorithm converges to the metagame optimal solution of the non-zero-sum Markov game.
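
The contrast the abstract draws between single-agent learning in an MDP and minimax-Q in a zero-sum Markov game can be illustrated with a minimal sketch. It is not the paper's metagame-Q algorithm, whose details are not given in this record. All names (`q_learning_update`, `maximin_value`, `minimax_q_update`) and the nested-dict Q-table layout are assumptions made for illustration, and the maximin here is taken over pure strategies only, whereas minimax-Q in general solves for a mixed strategy.

```python
def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Single-agent MDP backup: bootstrap from the best own action in s_next.

    q is a hypothetical table laid out as q[state][action] -> float.
    """
    target = r + gamma * max(q[s_next].values())
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]


def maximin_value(q_state):
    """Worst-case value of a state in a zero-sum game, q_state[a][o] -> float.

    Simplification (assumption): pure strategies only. The agent picks the
    action whose worst outcome over opponent actions o is best.
    """
    return max(min(row.values()) for row in q_state.values())


def minimax_q_update(q, s, a, o, r, s_next, alpha=0.1, gamma=0.9):
    """Zero-sum Markov game backup: bootstrap from the maximin value of s_next.

    q is a hypothetical table laid out as q[state][action][opponent_action].
    """
    target = r + gamma * maximin_value(q[s_next])
    q[s][a][o] += alpha * (target - q[s][a][o])
    return q[s][a][o]


if __name__ == "__main__":
    # Single-agent table: two states, two actions.
    q_single = {"s0": {"l": 0.0, "r": 0.0}, "s1": {"l": 1.0, "r": 2.0}}
    print(q_learning_update(q_single, "s0", "l", 1.0, "s1"))

    # Zero-sum table: the extra index is the opponent's action.
    q_game = {
        "s0": {"a0": {"o0": 0.0, "o1": 0.0}, "a1": {"o0": 0.0, "o1": 0.0}},
        "s1": {"a0": {"o0": 1.0, "o1": -1.0}, "a1": {"o0": 0.5, "o1": 0.2}},
    }
    print(minimax_q_update(q_game, "s0", "a0", "o0", 1.0, "s1"))
```

The only difference between the two updates is the bootstrap term: a plain maximum over the agent's own actions versus a worst-case value that assumes the opponent's payoff is the exact negative of the agent's. That assumption is what ties minimax-Q to zero-sum games and motivates a different solution concept for the non-zero-sum case.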

Keywords:	metagame; reinforcement learning; multi-agent system; non-zero-sum Markov games
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
