Multi-agent decision making using Monte Carlo Q-value function

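Citation:	ZHANG Jian, PAN Yao-zong, YANG Hai-tao, SUN Shu, ZHAO Hong-li. Multi-agent decision making using Monte Carlo Q-value function[J]. Control and Decision, 2020, 35(3): 637-644.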

Authors:	ZHANG Jian, PAN Yao-zong, YANG Hai-tao, SUN Shu, ZHAO Hong-li

Affiliations:	Space Engineering University, PLA Strategic Support Force, Beijing 101416, China; PLA Unit 63628, Sanhe, Hebei 065201, China

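Abstract:	Multi-agent decision making is an active research topic in artificial intelligence. Compared with single-agent decision making, multi-agent decision making has a much larger policy search space. Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general model for multi-agent decision making under uncertainty. They have attracted considerable attention since they were proposed, but solving Dec-POMDPs has high computational complexity and a large memory footprint. To address this, a new Q-value function representation is proposed: the Monte Carlo Q-value function ($Q_{MC}$). It is proved theoretically that $Q_{MC}$ is an upper bound on the optimal Q-value function $Q^\ast$, which guarantees that heuristic search finds the optimal solution. An adaptive sampling method balances convergence accuracy against solving time. Combining the exactness of heuristic search with the generality of Monte Carlo random sampling, a clustering and expansion algorithm for Monte Carlo (CEMC) based on $Q_{MC}$ is proposed. CEMC integrates Q-value function computation with policy search, so it avoids storing all value functions and computes them only on demand. Experimental results show that CEMC outperforms the current best heuristic methods that use compact Q-value functions in both running time and memory usage.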
Keywords:	multi-agent decision making; Monte Carlo; value function; Markov decision
Multi-agent decision making using Monte Carlo Q-value function

ZHANG Jian, PAN Yao-zong, YANG Hai-tao, SUN Shu, ZHAO Hong-li. Multi-agent decision making using Monte Carlo Q-value function[J]. Control and Decision, 2020, 35(3): 637-644.

Authors:	ZHANG Jian, PAN Yao-zong, YANG Hai-tao, SUN Shu and ZHAO Hong-li

Affiliation:	Space Engineering University, PLA Strategic Support Force, Beijing 101416, China (ZHANG Jian, PAN Yao-zong, YANG Hai-tao, ZHAO Hong-li); PLA Unit 63628, Sanhe 065201, China (SUN Shu)

Abstract:	Multi-agent decision making is an active research topic in artificial intelligence. Compared with single-agent decision making, multi-agent decision making has a larger policy space. Decentralized partially observable Markov decision processes (Dec-POMDPs) are general models for multi-agent decision making under uncertainty and have attracted considerable attention from researchers. However, solving Dec-POMDPs has high computational complexity and requires a large amount of memory. This article presents a new Q-value function representation, the Monte Carlo Q-value function ($Q_{MC}$). $Q_{MC}$ is proved to be an upper bound on the optimal Q-value function $Q^*$, which guarantees that heuristic search can find the optimal policy. An adaptive sampling method balances convergence precision against solving time. An algorithm based on $Q_{MC}$ is proposed, called clustering and expansion for Monte Carlo (CEMC). It combines the precision of heuristic search with the generality of Monte Carlo random sampling. CEMC integrates Q-value function solving with policy search and computes value functions only as needed, which avoids storing all Q-value functions. Experiments show that CEMC outperforms state-of-the-art heuristic methods that use compact Q-value functions in both running time and memory usage.
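The abstract does not specify the sampling rule behind $Q_{MC}$ or its adaptive sampling method. The sketch below is a generic illustration only, not the authors' CEMC algorithm. It rests on two assumptions:

- A Q-value is estimated as the mean of sampled returns.
- "Adaptive sampling" means drawing samples until the standard error of that mean falls below a tolerance.

Under those assumptions, the sketch shows how sampling can trade accuracy against solving time. The function `adaptive_mc_estimate`, its parameters (`tol`, `min_samples`, `max_samples`), and the hypothetical rollout sampler are all illustrative.

```python
import math
import random
from typing import Callable


def adaptive_mc_estimate(
    sample_return: Callable[[random.Random], float],
    tol: float = 0.05,
    min_samples: int = 30,
    max_samples: int = 10_000,
    seed: int = 0,
) -> tuple[float, int]:
    """Hypothetical adaptive Monte Carlo estimator of an expected return.

    Draws returns from `sample_return` until the standard error of the
    running mean drops below `tol` (after at least `min_samples` draws)
    or until `max_samples` is reached. Returns (mean, samples_used).
    Running mean and variance use Welford's online algorithm.
    """
    rng = random.Random(seed)
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    while n < max_samples:
        x = sample_return(rng)
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= min_samples:
            # Standard error of the mean: sample std / sqrt(n).
            std_err = math.sqrt(m2 / (n - 1) / n)
            if std_err < tol:
                break  # estimate is precise enough; stop sampling early
    return mean, n


if __name__ == "__main__":
    # Hypothetical rollout: a noisy return centred on 2.0 for one fixed joint action.
    q_hat, used = adaptive_mc_estimate(lambda rng: rng.gauss(2.0, 1.0), tol=0.05)
    print(f"estimate={q_hat:.3f} after {used} samples")
```

The two parameters control the trade-off:

- A tighter `tol` yields a more accurate estimate at the cost of more samples.
- `max_samples` caps the time spent on any single estimate.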

Keywords:	multi-agent decision making; Monte Carlo; value function; Markov decision
This article is indexed in Wanfang Data and other databases.
