
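A multi-agent decision-making method based on the Monte Carlo Q-value function
Cite this article: ZHANG Jian, PAN Yao-zong, YANG Hai-tao, SUN Shu, ZHAO Hong-li. A multi-agent decision-making method based on the Monte Carlo Q-value function[J]. Control and Decision (控制与决策), 2020, 35(3): 637-644.
Authors: ZHANG Jian, PAN Yao-zong, YANG Hai-tao, SUN Shu, ZHAO Hong-li
Affiliations: Space Engineering University, PLA Strategic Support Force, Beijing 101416, China; Unit 63628 of the PLA, Sanhe, Hebei 065201, China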
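Abstract: Multi-agent decision making is an active research topic in artificial intelligence. Compared with single-agent decision making, the policy search space of a multi-agent problem is larger. Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general model for multi-agent decision making in uncertain environments and have attracted considerable attention since they were introduced, but solving Dec-POMDPs has high computational complexity and a large memory footprint. To address this, a new Q-value function representation is proposed, the Monte Carlo Q-value function ($Q_{MC}$), and it is proved theoretically that $Q_{MC}$ is an upper bound of the optimal Q-value function $Q^\ast$, which guarantees that heuristic search finds the optimal solution. An adaptive sampling method is used to balance convergence accuracy against solving time. Combining the precision of heuristic search with the generality of Monte Carlo random sampling, a $Q_{MC}$-based Monte Carlo clustering/expansion algorithm (CEMC) is proposed. CEMC integrates Q-value function computation with policy search, so it avoids storing all value functions and computes them only on demand. Experimental results show that CEMC outperforms, in both running time and memory usage, the best-performing existing heuristic method that uses compact Q-value functions.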

Keywords: multi-agent decision making; Monte Carlo; value function; Markov decision

Multi-agent decision making using Monte Carlo Q-value function
ZHANG Jian, PAN Yao-zong, YANG Hai-tao, SUN Shu and ZHAO Hong-li. Multi-agent decision making using Monte Carlo Q-value function[J]. Control and Decision, 2020, 35(3): 637-644.
Authors:ZHANG Jian  PAN Yao-zong  YANG Hai-tao  SUN Shu and ZHAO Hong-li
Affiliation: Space Engineering University, PLA Strategic Support Force, Beijing 101416, China; Space Engineering University, PLA Strategic Support Force, Beijing 101416, China; Space Engineering University, PLA Strategic Support Force, Beijing 101416, China; The 63628 Army of PLA, Sanhe 065201, China; and Space Engineering University, PLA Strategic Support Force, Beijing 101416, China
Abstract: Multi-agent decision making is an active research topic in artificial intelligence. Compared with single-agent decision making problems, multi-agent decision making problems have a larger policy space. Decentralized partially observable Markov decision processes (Dec-POMDPs) are general models for multi-agent decision making under uncertainty and have attracted much attention among researchers. However, solving Dec-POMDPs has high computational complexity and requires a large amount of memory. This article presents a new Q-value function representation, the Monte Carlo Q-value function ($Q_{MC}$), which is proved to be an upper bound of the optimal Q-value function $Q^*$. This guarantees that heuristic search can find the optimal policy. An adaptive sampling method is used to balance the precision of convergence against solving time. An algorithm called clustering and expansion for Monte Carlo (CEMC), based on $Q_{MC}$, is proposed; it combines the precision of heuristic search with the generality of Monte Carlo random sampling. The algorithm integrates Q-value function solving with policy search and calculates value functions only as needed, which avoids the need to store all Q-value functions. The experiments show that the proposed method outperforms the state-of-the-art heuristic method that uses compact Q-value functions in both running time and memory usage.
Keywords: multi-agent decision making; Monte Carlo; value function; Markov decision
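
The abstract mentions an adaptive sampling method that trades convergence accuracy against solving time, but gives no details of it. The following is only a generic sketch of that idea, not the paper's $Q_{MC}$ or CEMC: it averages sampled returns for one fixed action and stops once a normal-approximation confidence half-width drops below a tolerance. The names `adaptive_mc_estimate` and `make_toy_sampler`, the stopping rule, and the one-step toy task are all assumptions made for illustration.

```python
import math
import random


def adaptive_mc_estimate(sample_return, tol=0.05, min_samples=30,
                         max_samples=100_000, z=1.96):
    """Average i.i.d. sampled returns until z * standard error < tol.

    Returns the running mean and the number of samples drawn.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's algorithm)
    while n < max_samples:
        x = sample_return()
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= min_samples:
            # Standard error of the mean from the unbiased sample variance.
            std_err = math.sqrt(m2 / (n - 1) / n)
            if z * std_err < tol:
                break
    return mean, n


def make_toy_sampler(rng, p_success=0.7, reward=1.0):
    """Hypothetical one-step task: the action pays `reward` with
    probability `p_success` and 0 otherwise, so its true value is 0.7."""
    def sample_return():
        return reward if rng.random() < p_success else 0.0
    return sample_return


rng = random.Random(0)
loose_estimate, loose_n = adaptive_mc_estimate(make_toy_sampler(rng), tol=0.10)
tight_estimate, tight_n = adaptive_mc_estimate(make_toy_sampler(rng), tol=0.02)
print("tol=0.10:", loose_estimate, "after", loose_n, "samples")
print("tol=0.02:", tight_estimate, "after", tight_n, "samples")
```

A tighter tolerance makes the estimator draw many more samples before it stops, which is the accuracy-versus-time trade-off in its simplest form. A plain sample mean like this one is not an upper bound on the true value; the upper-bound property claimed for $Q_{MC}$ depends on the paper's own construction, which the abstract does not spell out.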
This article is indexed in Wanfang Data (万方数据) and other databases.