A Policy-and Value-Iteration Algorithm for POMDP

Citation:	Sun Yong, Wu Bo, Feng Yanpeng. A Policy-and Value-Iteration Algorithm for POMDP[J]. Journal of Computer Research and Development, 2008, 45(10).

Authors:	Sun Yong, Wu Bo, Feng Yanpeng

Affiliation:	School of Electronics & Information Engineering, Shenzhen Polytechnic, Shenzhen, Guangdong 518055

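Abstract:	Partially observable Markov decision processes (POMDPs) are solved by introducing a belief-state space, which turns a non-Markovian problem into a Markovian one; their ability to describe the real world makes them an important branch of research on stochastic decision processes. This paper introduces the basic principles and the decision process of POMDPs and proposes a POMDP decision algorithm based on policy iteration and value iteration. Drawing on ideas from linear programming and dynamic programming, the algorithm addresses the "curse of dimensionality" that arises when the belief-state space is large, and obtains an approximately optimal solution to the Markov decision problem. Experimental data show that the algorithm is feasible and effective.
Keywords:	partially observable Markov decision; decision algorithm; agent; value iteration; policy iteration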
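
The belief-state construction described in the abstract rests on the standard Bayesian belief update, b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s), which compresses the whole action/observation history into a probability distribution over hidden states. The Python sketch below illustrates only that update and is not the authors' implementation; the two-state "tiger" example, its 0.85 listening accuracy, and the name `update_belief` are assumptions chosen for illustration.

```python
from typing import Dict

# Hypothetical model encoding (an assumption for this sketch):
#   T[a][s][s2] = P(s2 | s, a),  O[a][s2][o] = P(o | s2, a)
Belief = Dict[str, float]


def update_belief(belief: Belief, action: str, obs: str,
                  T: Dict[str, Dict[str, Dict[str, float]]],
                  O: Dict[str, Dict[str, Dict[str, float]]]) -> Belief:
    """Bayes filter: b'(s2) is proportional to O(obs|s2,a) * sum_s T(s2|s,a) * b(s)."""
    unnormalized = {}
    for s2 in belief:
        predicted = sum(T[action][s][s2] * belief[s] for s in belief)  # prediction step
        unnormalized[s2] = O[action][s2][obs] * predicted                # correction step
    total = sum(unnormalized.values())  # equals P(obs | belief, action)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical two-state "tiger" example: listening leaves the tiger where it is
# and reports the correct side with probability 0.85.
STATES = ["left", "right"]
T = {"listen": {s: {s2: 1.0 if s == s2 else 0.0 for s2 in STATES} for s in STATES}}
O = {"listen": {
    "left": {"hear_left": 0.85, "hear_right": 0.15},
    "right": {"hear_left": 0.15, "hear_right": 0.85},
}}

if __name__ == "__main__":
    b = {"left": 0.5, "right": 0.5}
    for heard in ["hear_left", "hear_left", "hear_right"]:
        b = update_belief(b, "listen", heard, T, O)
        print(heard, {s: round(p, 3) for s, p in b.items()})
```

Each update needs only the previous belief, the action, and the newest observation, which is why the process over beliefs is Markovian even though the process over raw observations is not.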
Abstract (English version):	Partially observable Markov decision processes (POMDPs) turn a non-Markovian process into a Markovian one over the belief-state space. Because they can describe the real world, POMDPs have become an important branch of stochastic decision processes. The traditional algorithms for solving POMDPs are value iteration and policy iteration. However, exact solution algorithms for such POMDPs are typically computationally intractable for all but the smallest problems. First, the authors describe ...
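
Exact value iteration over the continuous belief space is what makes POMDPs intractable beyond small problems. To illustrate how approximation sidesteps this, the Python sketch below runs value iteration with alpha-vectors that are backed up only at a fixed grid of belief points (a point-based backup in the style of PBVI). This is a well-known technique swapped in for illustration, not the algorithm proposed in the paper; the tiger-problem rewards, the discount factor 0.95, the 11-point grid, and all function names are assumptions.

```python
from typing import List, Sequence

# Hypothetical tiger POMDP (an assumption for this sketch). Indices:
# states 0=tiger-left, 1=tiger-right; actions 0=listen, 1=open-left, 2=open-right;
# observations 0=hear-left, 1=hear-right.
S, A, Z = 2, 3, 2
GAMMA = 0.95
R = [[-1.0, -100.0, 10.0],   # R[s][a] when the tiger is behind the left door
     [-1.0, 10.0, -100.0]]   # R[s][a] when the tiger is behind the right door
T = [  # T[a][s][s2]: listening keeps the state, opening a door resets it uniformly
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.5, 0.5], [0.5, 0.5]],
]
O = [  # O[a][s2][z]: listening is 85% accurate, observations after opening are noise
    [[0.85, 0.15], [0.15, 0.85]],
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.5, 0.5], [0.5, 0.5]],
]
Vector = List[float]


def dot(b: Sequence[float], alpha: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(b, alpha))


def backup(b: Vector, gamma_set: List[Vector]) -> Vector:
    """One point-based Bellman backup at belief b; returns the best alpha-vector there."""
    best_alpha: Vector = []
    best_value = float("-inf")
    for a in range(A):
        alpha_a = [R[s][a] for s in range(S)]
        for z in range(Z):
            # Project every alpha-vector one step back through (a, z); keep the best at b.
            projections = [
                [GAMMA * sum(T[a][s][s2] * O[a][s2][z] * alpha[s2] for s2 in range(S))
                 for s in range(S)]
                for alpha in gamma_set
            ]
            g = max(projections, key=lambda vec: dot(b, vec))
            alpha_a = [alpha_a[s] + g[s] for s in range(S)]
        v = dot(b, alpha_a)
        if v > best_value:
            best_alpha, best_value = alpha_a, v
    return best_alpha


def point_based_value_iteration(beliefs: List[Vector], iterations: int) -> List[Vector]:
    # Start from a pessimistic lower bound: (minimum reward) / (1 - gamma) in every state.
    lower = min(min(row) for row in R) / (1.0 - GAMMA)
    gamma_set = [[lower] * S]
    for _ in range(iterations):
        gamma_set = [backup(b, gamma_set) for b in beliefs]
    return gamma_set


def value(b: Vector, gamma_set: List[Vector]) -> float:
    """Piecewise-linear convex value function: max over alpha-vectors."""
    return max(dot(b, alpha) for alpha in gamma_set)


if __name__ == "__main__":
    grid = [[p / 10, 1 - p / 10] for p in range(11)]  # 11 hypothetical belief points
    vectors = point_based_value_iteration(grid, iterations=200)
    for b in ([0.5, 0.5], [0.85, 0.15], [1.0, 0.0]):
        print(b, round(value(b, vectors), 2))
```

Because each iteration keeps one alpha-vector per belief point, the cost per iteration grows with the size of the belief set rather than exponentially with the planning horizon, at the price of a lower-bound approximation of the optimal value function.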

This article is indexed in CNKI, VIP, Wanfang Data and other databases.
