A solution based on Independent-Tasks POMDP problems*


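Cite this article:	FANG Jun-heng, ZHU Fei, LIU Quan, FU Yu-chen, LING Xing-hong. A solution based on Independent-Tasks POMDP problems*[J]. Application Research of Computers, 2016, 33(1).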

Author names:	FANG Jun-heng, ZHU Fei, LIU Quan, FU Yu-chen, LING Xing-hong

Author affiliation:	Institute of Computer Science and Technology, Soochow University (all five authors)

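Funding:	National Natural Science Foundation of China (61103045, 61272005, 61272244, 61303108, 61373094); Natural Science Foundation of Jiangsu Province (BK2012616); Natural Science Research Project of Jiangsu Higher Education Institutions (13KJB520020); Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University (93K172014K04)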

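Abstract (translated from the Chinese):	A partially observable Markov decision process (POMDP) is an extension of the Markov decision process (MDP). POMDPs are commonly used to model agents that make decisions in partially observable stochastic environments. To address the poor scalability of methods that solve the complete POMDP, this paper proposes decomposing a factored POMDP into a set of restricted POMDPs, solving each of these models independently to obtain a value function, and then combining the value functions of the restricted POMDPs to obtain a policy for the complete POMDP. The paper mainly describes how to identify the state variables associated with independent tasks and how to construct a model restricted to a single task. The method is applied to two RockSample problems of different sizes, and the experimental results show that it obtains good policies.
Keywords (translated from the Chinese):	POMDP; point-based algorithms; mutually independent tasks; factored POMDP; restricted POMDPs
Received:	2014/9/18
Revised:	2015/11/23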
A solution based on Independent-Tasks POMDP problems

FANG Jun-heng, ZHU Fei, LIU Quan, FU Yu-chen, LING Xing-hong. A solution based on Independent-Tasks POMDP problems[J]. Application Research of Computers, 2016, 33(1).

Authors:	FANG Jun-heng, ZHU Fei, LIU Quan, FU Yu-chen, LING Xing-hong

Affiliation:	Institute of Computer Science and Technology, Soochow University (all authors)

Abstract:	A partially observable Markov decision process (POMDP) is an extension of a Markov decision process (MDP). POMDPs are widely used to model agents acting in a stochastic environment under partial observability. Because solvers for the complete POMDP scale poorly, we propose decomposing a factored POMDP into a set of restricted POMDPs and solving each such model independently to obtain a value function. The value functions of the restricted POMDPs are then combined to form a policy for the complete POMDP. In this paper, we mainly explain how to identify the state variables that correspond to independent tasks and how to create a model restricted to a single task. Applied to RockSample domains of two different sizes, the method is shown experimentally to obtain good policies.

Keywords:	POMDP; point-based algorithms; independent tasks; factored POMDP; restricted POMDPs
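
The abstract says the value functions of the restricted POMDPs are combined into a policy for the complete POMDP, but it does not state the combination rule. The sketch below is therefore only one plausible reading, not the authors' method: it assumes each restricted value function is a set of action-tagged alpha vectors (as point-based solvers typically produce), that each task keeps its own belief over its restricted state space, and that the agent picks the action with the largest sum of per-task Q-values. The names `q_value`, `combined_action`, the task and action labels, and all numbers are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: an alpha vector tagged with its action.
AlphaVector = Tuple[str, Sequence[float]]


def q_value(alphas: List[AlphaVector], belief: Sequence[float], action: str) -> float:
    """Best value of `action` at `belief` under one restricted value function."""
    values = [
        sum(a * b for a, b in zip(vec, belief))
        for act, vec in alphas
        if act == action
    ]
    # Assumption: an action with no vector in this restricted model contributes 0.
    return max(values) if values else 0.0


def combined_action(
    task_alphas: Dict[str, List[AlphaVector]],
    task_beliefs: Dict[str, Sequence[float]],
    actions: Sequence[str],
) -> str:
    """Pick the action maximizing the sum of per-task Q-values (assumed rule)."""
    def total(action: str) -> float:
        return sum(
            q_value(task_alphas[task], task_beliefs[task], action)
            for task in task_alphas
        )
    return max(actions, key=total)


# Toy two-rock example; each task's states are [rock good, rock bad].
task_alphas = {
    "rock1": [("sample1", [10.0, -10.0]), ("check1", [4.0, 1.0])],
    "rock2": [("sample2", [10.0, -10.0]), ("check2", [4.0, 1.0])],
}
task_beliefs = {"rock1": [0.9, 0.1], "rock2": [0.5, 0.5]}
actions = ["sample1", "check1", "sample2", "check2"]

best = combined_action(task_alphas, task_beliefs, actions)
print(best)
```

With these made-up numbers the agent is fairly sure rock 1 is good, so sampling it scores highest. Summing Q-values is only reasonable when the tasks really are independent, which is the structural property the paper sets out to identify.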
This article is indexed in Wanfang Data and other databases.