Reinforcement Learning Model Based on Regret for Multi-Agent Conflict Games
Citation: XIAO Zheng, ZHANG Shi-Yong. Reinforcement Learning Model Based on Regret for Multi-Agent Conflict Games [J]. Journal of Software, 2008, 19(11): 2957-2967.
Authors: XIAO Zheng, ZHANG Shi-Yong
Affiliation: Department of Computing and Information Technology, Fudan University, Shanghai 200433, China
Abstract: For conflict games, a rational yet conservative action-selection method is studied: minimizing the agent's regret in the worst case. Under this method, the agent's current policy incurs the smallest possible future loss, and a Nash equilibrium mixed strategy can be obtained without any information about the other agents. Based on regret, a reinforcement learning model and its algorithm for conflict games in complex multi-agent environments are proposed. The model introduces the cross-entropy distance to build the belief-updating process, which further optimizes the action-selection policy in conflict games. The convergence of the algorithm is verified under a Markov repeated-game model, and the relationship between beliefs and the optimal policy is analyzed. Moreover, compared with an extension of Q-learning under the MMDP (multi-agent Markov decision process) model, the algorithm greatly reduces the number of conflicts, enhances the coordination of agent behavior, improves system performance, and helps keep the system stable.
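The worst-case regret minimization described in the abstract can be illustrated with a small sketch. The payoff matrix below is purely hypothetical (not taken from the paper): rows are our agent's actions, columns are the other agent's actions, and we pick the action whose maximum regret over all opponent actions is smallest.

```python
import numpy as np

# Hypothetical payoff matrix for a two-agent conflict game:
# rows = our actions, columns = the other agent's actions.
payoff = np.array([
    [3.0, 0.0],
    [2.0, 2.0],
])

# Regret of playing row a against column o: the payoff given up
# relative to the best response to o.
best_per_column = payoff.max(axis=0)   # best achievable payoff against each opponent action
regret = best_per_column - payoff      # regret[a, o] >= 0

# Worst-case regret of each of our actions, and the minimax-regret choice.
worst_case = regret.max(axis=1)
action = int(worst_case.argmin())
print(action, worst_case)              # prints: 1 [2. 1.]
```

Here the conservative action 1 is chosen: it can lose at most 1 against any opponent behavior, while action 0 risks a regret of 2. The paper extends this one-shot selection rule into a full reinforcement learning algorithm over Markov repeated games.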

Keywords: Markov game; reinforcement learning; conflict game; conflict resolution
Received: 28 June 2007
Revised: 24 August 2007

Reinforcement Learning Model Based on Regret for Multi-Agent Conflict Games
XIAO Zheng and ZHANG Shi-Yong. Reinforcement Learning Model Based on Regret for Multi-Agent Conflict Games [J]. Journal of Software, 2008, 19(11): 2957-2967.
Authors:XIAO Zheng and ZHANG Shi-Yong
Abstract: For conflict games, a rational but conservative action selection method is investigated, namely, minimizing the regret function in the worst case. By this method, the loss that the current policy may incur in the future is the lowest, and a Nash equilibrium mixed policy is obtained without information about other agents. Based on regret, a reinforcement learning model and its algorithm for conflict games under a complex multi-agent environment are put forward. The model also builds the agents' belief-updating process on the concept of cross-entropy distance, which further optimizes the action selection policy for conflict games. Based on a Markov repeated-game model, this paper demonstrates the convergence of the algorithm and analyzes the relationship between belief and the optimal policy. Additionally, compared with an extended Q-learning algorithm under the MMDP (multi-agent Markov decision process), the proposed algorithm dramatically decreases the number of conflicts, enhances coordination among agents, improves system performance, and helps to maintain system stability.
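The abstract's belief-updating idea rests on a cross-entropy (KL-divergence) distance between distributions over the other agent's actions. The paper's actual update rule is in the full text; the sketch below only shows the generic ingredients, with a hypothetical observation count and an assumed smoothing rate:

```python
import numpy as np

def cross_entropy_distance(p, q, eps=1e-12):
    """KL divergence D(p || q), a common 'cross-entropy distance'
    between two discrete probability distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Belief over a (hypothetical) opponent's two actions, nudged toward the
# empirical frequency of the actions observed so far.
belief = np.array([0.5, 0.5])
observed = np.array([7.0, 3.0])        # hypothetical action counts
empirical = observed / observed.sum()

distance = cross_entropy_distance(empirical, belief)  # how far the belief is from the data
lr = 0.2                                # assumed smoothing rate
new_belief = (1 - lr) * belief + lr * empirical       # e.g. [0.54, 0.46]
```

As the belief approaches the empirical distribution, the distance shrinks toward zero; a rule of this shape lets the action-selection policy track the other agent's revealed behavior while staying conservative.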
Keywords: Markov game; reinforcement learning; conflict game; conflict resolution
This article is indexed by CNKI, VIP, Wanfang Data, and other databases.