Sampling-Based Approximate Algorithm for POMDP
Cite this article: CHEN Mao, CHEN Xiao-ping. Sampling-Based Approximate Algorithm for POMDP[J]. Computer Simulation, 2006, 23(5): 64-67.
Authors: CHEN Mao, CHEN Xiao-ping
Affiliation: Department of Computer Science, University of Science and Technology of China, Hefei, Anhui 230027, China
Abstract: The partially observable Markov decision process (POMDP) is a problem model that describes a robot's action selection in dynamic, uncertain environments. For POMDP models with sparse transition matrices, this paper proposes a fast approximate solution algorithm. The algorithm first samples the belief space using the policy produced by the QMDP algorithm, then quickly builds the POMDP value function with a point-based iteration algorithm, and from that value function derives an approximately optimal action-selection policy. On the same POMDP test models, the policy produced by this algorithm earns rewards comparable to those of policies produced by other approximate algorithms, yet it computes faster and the vector set representing its policy is smaller than the sets produced by existing approximate algorithms. It is therefore better suited than those algorithms to solving large-scale POMDP models with sparse state transition matrices.

Keywords: decision making; sampling; approximation; algorithm
Article ID: 1006-9348(2006)05-0064-04
Received: 2005-02-22
Revised: 2005-02-22

Sampling Based Approximate Algorithm for POMDP
CHEN Mao, CHEN Xiao-ping. Sampling Based Approximate Algorithm for POMDP[J]. Computer Simulation, 2006, 23(5): 64-67.
Authors: CHEN Mao, CHEN Xiao-ping
Affiliation: Department of Computer Science, University of Science and Technology of China (USTC), Hefei, Anhui 230026, China
Abstract: The partially observable Markov decision process (POMDP) is a problem model that describes sequential decision making for a robot in a dynamic, uncertain environment. This paper introduces a fast approximate algorithm for POMDP models with sparse state transition matrices. The algorithm first uses the policy produced by the QMDP approximate algorithm to sample the belief space. It then applies a point-based iteration algorithm to the sampled beliefs to build the POMDP value function, from which an approximately optimal action-selection policy is derived. On the same experimental models, the policy generated by this algorithm earns rewards comparable to those of other approximate algorithms, but it runs faster and represents its policy with a smaller vector set. It is therefore better suited than other approximate algorithms to solving large POMDPs with sparse state transition matrices.
Keywords: decision making; sampling; approximation; algorithm
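
The abstract describes a three-step pipeline: compute a QMDP policy from the underlying MDP, simulate that policy to collect belief points, and run point-based value iteration over the sampled beliefs to obtain an alpha-vector set. The Python sketch below illustrates that general pipeline under illustrative assumptions; the array names (T, Z, R), the dense numpy encoding, and all function names are placeholders chosen for this sketch, not the paper's implementation, which targets sparse transition matrices.

# Minimal sketch of QMDP-guided belief sampling + point-based value iteration.
# Assumed (illustrative) POMDP encoding, not from the paper:
#   T[a, s, s'] : transition probabilities, R[a, s] : expected reward,
#   Z[a, s', o] : observation probabilities.
import numpy as np

def qmdp_q_values(T, R, gamma=0.95, iters=200):
    # Value iteration on the underlying MDP; returns Q_MDP of shape (A, S).
    V = np.zeros(T.shape[1])
    for _ in range(iters):
        V = (R + gamma * T @ V).max(axis=0)
    return R + gamma * T @ V

def qmdp_action(b, Q):
    # QMDP policy: action maximizing the belief-weighted MDP Q-values.
    return int(np.argmax(Q @ b))

def belief_update(b, a, o, T, Z):
    # Bayes filter: b'(s') is proportional to Z[a, s', o] * sum_s T[a, s, s'] * b(s).
    b2 = Z[a, :, o] * (b @ T[a])
    return b2 / b2.sum() if b2.sum() > 0 else b

def sample_beliefs(T, Z, Q, b0, n_steps=200, seed=0):
    # Collect belief points by simulating the POMDP under the QMDP policy.
    rng = np.random.default_rng(seed)
    S, O = T.shape[1], Z.shape[2]
    beliefs, b, s = [b0], b0.copy(), rng.choice(S, p=b0)
    for _ in range(n_steps):
        a = qmdp_action(b, Q)
        s = rng.choice(S, p=T[a, s])
        o = rng.choice(O, p=Z[a, s])
        b = belief_update(b, a, o, T, Z)
        beliefs.append(b.copy())
    return beliefs

def point_based_value_iteration(beliefs, T, Z, R, gamma=0.95, iters=30):
    # Keep one alpha-vector per sampled belief; standard point-based backup.
    A, S, _ = T.shape
    O = Z.shape[2]
    Gamma = [np.zeros(S)]  # simple zero initialization for brevity
    for _ in range(iters):
        new_Gamma = []
        for b in beliefs:
            best_val, best_alpha = -np.inf, None
            for a in range(A):
                alpha_a = R[a].astype(float)
                for o in range(O):
                    # gamma * sum_{s'} T[a,s,s'] * Z[a,s',o] * alpha(s'),
                    # using the alpha-vector that scores best at belief b.
                    g = [gamma * (T[a] * Z[a, :, o]) @ alpha for alpha in Gamma]
                    alpha_a = alpha_a + g[int(np.argmax([b @ x for x in g]))]
                if b @ alpha_a > best_val:
                    best_val, best_alpha = b @ alpha_a, alpha_a
            new_Gamma.append(best_alpha)
        Gamma = new_Gamma
    return Gamma  # value function as an alpha-vector set

Recording, for each alpha-vector, the action that produced it turns the resulting vector set into an executable policy; the size of this set is the compact policy representation the abstract compares across algorithms. Replacing the dense arrays with scipy.sparse matrices would reflect the sparse-transition setting the paper targets.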