
Dynamic Merge Reinforcement Learning Algorithm for Solving POMDP

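Cite this article:	YIN Changming, WANG Hanxing, CHEN Huanwen, XIE Lijuan. Dynamic Merge Reinforcement Learning Algorithm for Solving POMDP[J]. Computer Engineering, 2005, 31(22): 4-6, 148.

Authors:	YIN Changming, WANG Hanxing, CHEN Huanwen, XIE Lijuan

Affiliations:	College of Science, Shanghai University, Shanghai 200436; College of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410077; College of Science, Shanghai University, Shanghai 200436; College of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410077

Funding:	National Natural Science Foundation of China (60075019)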

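Abstract:	Modeling a reinforcement learning problem as a POMDP adapts well to, and is effective for, problems with large state spaces. However, because solving a POMDP is far harder than solving an ordinary Markov decision process (MDP), many problems remain open. Against this background, this paper proposes a method for solving POMDPs under certain special constraints: the dynamic merge reinforcement learning algorithm. The method uses the concept of regions to build a regional system over the environment's state space. The agent pursues its optimal objective on each region of the system independently and in parallel, which speeds up computation. The optimal value functions of the components are then combined in a prescribed way to obtain the optimal solution of the POMDP.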
Keywords:	Partially observable Markov decision process; Reinforcement learning; Dynamic merge; Belief state
Article ID:	1000-3428(2005)22-0004-03
Received:	2004-08-25
Revised:	2004-08-25
Dynamic Merge Reinforcement Learning Algorithm for Solving POMDP

YIN Changming, WANG Hanxing, CHEN Huanwen, XIE Lijuan. Dynamic Merge Reinforcement Learning Algorithm for Solving POMDP[J]. Computer Engineering, 2005, 31(22): 4-6, 148.

Authors:	YIN Changming, WANG Hanxing, CHEN Huanwen, XIE Lijuan

Affiliation:	1. College of Science, Shanghai University, Shanghai 200436; 2. College of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410077

Abstract:	This paper proposes a new algorithm for solving a POMDP under certain restrictive conditions: the dynamic merge reinforcement learning method. The algorithm adopts the concept of regions and builds a regional system on the state space of the environment. To speed up computation, the agent searches for its optimal sub-goal in each region of the regional system separately and in parallel, and then merges the per-region optimal solutions to obtain a globally optimal solution for the POMDP.

Keywords:	Partially observable Markov decision process; Reinforcement learning; Dynamic merge; Belief state
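
The abstract describes solving on each region separately and then merging the per-region value functions, but this record gives no detail on how the regions are built or how the merge is done. The sketch below is therefore only an illustration of the general idea and not the authors' algorithm. It assumes a small, fully observable MDP rather than a POMDP, a hand-chosen partition of the states, and a simple merge rule that copies each region's values into one global vector. All names (`solve_region`, `solve_by_regions`, `P`, `R`, `regions`) are hypothetical.

```python
import numpy as np


def solve_region(P, R, region, V_frozen, gamma, tol, max_iters=10_000):
    """Value iteration restricted to one region (illustrative assumption).

    P: (A, S, S) transition probabilities, R: (S, A) rewards.
    Values of states outside `region` stay fixed at V_frozen.
    """
    V = V_frozen.copy()
    for _ in range(max_iters):
        # Q[s, a] for the states of this region only.
        Q = R[region] + gamma * np.einsum("asn,n->sa", P[:, region, :], V)
        new = Q.max(axis=1)
        delta = np.max(np.abs(new - V[region]))
        V[region] = new
        if delta < tol:
            break
    return V[region]


def solve_by_regions(P, R, regions, gamma=0.9, tol=1e-10, max_sweeps=1_000):
    """Solve each region against a frozen snapshot, then merge; repeat."""
    V = np.zeros(R.shape[0])
    for _ in range(max_sweeps):
        V_old = V.copy()
        # Every region uses the same snapshot V_old, so these solves are
        # independent of one another and could be run in parallel.
        local = [solve_region(P, R, reg, V_old, gamma, tol) for reg in regions]
        # Merge step (assumed rule): write each region's values back.
        for reg, v in zip(regions, local):
            V[reg] = v
        if np.max(np.abs(V - V_old)) < tol:
            break
    return V


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_actions, n_states = 2, 6
    P = rng.random((n_actions, n_states, n_states))
    P /= P.sum(axis=2, keepdims=True)  # make each row a distribution
    R = rng.random((n_states, n_actions))
    print(solve_by_regions(P, R, regions=[[0, 1, 2], [3, 4, 5]]))
```

With a discount factor below 1, repeating the solve-and-merge sweep is a contraction whose fixed point is the ordinary optimal value function, which is why this toy version agrees with plain value iteration. Extending it to belief states would require the constraints and the merge rule given in the paper itself.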
This article is indexed in databases including CNKI, VIP, and Wanfang Data.
