Value Iteration Algorithm for POMDP Based on Recurrent Convolutional Neural Network

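Cite this article:	YU Danning, NI Kun, LIU Yunlong. Value Iteration Algorithm for POMDP Based on Recurrent Convolutional Neural Network[J]. Computer Engineering, 2021, 47(2): 90-94, 102.

Author names:	YU Danning, NI Kun, LIU Yunlong

Author affiliation:	School of Aerospace Engineering, Xiamen University, Xiamen, Fujian 361102, China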

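Abstract:	QMDP-net, a value iteration algorithm for partially observable Markov decision processes (POMDP) based on a convolutional neural network, performs well without prior knowledge, but it suffers from optimization difficulties such as unstable training and parameter sensitivity. This paper proposes RQMDP-net, a POMDP value iteration algorithm based on a recurrent convolutional neural network. It uses a gated recurrent unit network to implement the value iteration update, strengthening the network's sequential processing ability while preserving the convolutional character of the input and recurrent weight matrices. Experimental results show that RQMDP-net reaches a navigation accuracy of 98.5% on 10×10 grid-map planning tasks and improves on QMDP-net by up to 5.8 percentage points on 36×36 grid-map planning tasks, with faster network convergence and stronger navigation planning ability.
Keywords:	partially observable Markov decision process; value iteration; convolutional neural network; recurrent convolutional neural network; agent planning
Received:	2019-12-25
Revised:	2020-02-04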
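
The abstract describes QMDP-net as value iteration realized with convolutions. As a rough illustration of that idea only, the sketch below runs value iteration on a small grid as "shift the value map once per action, then take the max over actions", and then picks an action by weighting Q-values with a belief, which is the standard QMDP rule. The grid, rewards, discount factor, and all function names are assumptions made for this example and are not taken from the paper.

```python
# Sketch only: grid, rewards, discount, and names are illustrative assumptions.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]  # up, down, left, right, stay


def q_iteration(reward, obstacles, gamma=0.9, iterations=50):
    """Run a fixed number of Bellman backups; return (Q per free cell, value map)."""
    rows, cols = len(reward), len(reward[0])
    value = [[0.0] * cols for _ in range(rows)]
    q = {}
    for _ in range(iterations):
        q = {}
        for r in range(rows):
            for c in range(cols):
                if (r, c) in obstacles:
                    continue
                cell_q = []
                for dr, dc in ACTIONS:
                    nr, nc = r + dr, c + dc
                    # Leaving the map or hitting an obstacle keeps the agent in place.
                    if not (0 <= nr < rows and 0 <= nc < cols) or (nr, nc) in obstacles:
                        nr, nc = r, c
                    # Each action behaves like a one-hot 3x3 kernel shifting the value map.
                    cell_q.append(reward[r][c] + gamma * value[nr][nc])
                q[(r, c)] = cell_q
        # Max over the action "channels" gives the next value map; obstacles stay at 0.
        value = [[max(q[(r, c)]) if (r, c) in q else 0.0 for c in range(cols)]
                 for r in range(rows)]
    return q, value


def qmdp_action(belief, q):
    """Return the index of the action with the highest belief-weighted Q-value."""
    scores = [sum(p * q[s][a] for s, p in belief.items()) for a in range(len(ACTIONS))]
    return max(range(len(ACTIONS)), key=scores.__getitem__)
```

For example, with a reward of 1 only in the bottom-right cell of a 3×3 map, `qmdp_action({(2, 1): 1.0}, q)` selects the "right" action. QMDP-net learns the kernels and rewards end to end rather than fixing them by hand as done here.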
Value Iteration Algorithm for POMDP Based on Recurrent Convolutional Neural Network

YU Danning, NI Kun, LIU Yunlong. Value Iteration Algorithm for POMDP Based on Recurrent Convolutional Neural Network[J]. Computer Engineering, 2021, 47(2): 90-94, 102.

Authors:	YU Danning, NI Kun, LIU Yunlong

Affiliation:	School of Aerospace Engineering, Xiamen University, Xiamen, Fujian 361102, China

Abstract:	QMDP-net, a value iteration algorithm for Partially Observable Markov Decision Processes (POMDP) based on a Convolutional Neural Network (CNN), performs well without prior knowledge. However, it suffers from optimization problems such as unstable training and parameter sensitivity. To address these problems, this paper proposes RQMDP-net, a value iteration algorithm for POMDP based on a Recurrent Convolutional Neural Network (RCNN). The value iteration update is implemented with a Gated Recurrent Unit (GRU) network, which preserves the convolutional structure of the input and recurrent weight matrices while enhancing the sequential processing ability of the network. Experimental results show that the navigation accuracy of RQMDP-net reaches 98.5% on 10×10 grid-map planning tasks and is up to 5.8 percentage points higher than that of QMDP-net on 36×36 grid-map planning tasks, demonstrating faster network convergence and stronger planning ability in navigation tasks.

Keywords:	Partially Observable Markov Decision Process (POMDP); value iteration; Convolutional Neural Network (CNN); Recurrent Convolutional Neural Network (RCNN); agent planning
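
The abstract states that the value iteration update is carried out by a GRU whose input and recurrent weights keep a convolutional form. The sketch below shows one generic single-channel convolutional GRU step under that reading. It is an assumption-laden illustration, not the RQMDP-net architecture: the kernel names (`wz`, `uz`, `wr`, `ur`, `wh`, `uh`), the 3×3 kernel size, the absence of biases, and the gate convention are all choices made for this example.

```python
import math


def conv2d_same(grid, kernel):
    """3x3 cross-correlation with zero padding; output has the same shape as grid."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for i in range(3):
                for j in range(3):
                    rr, cc = r + i - 1, c + j - 1
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += kernel[i][j] * grid[rr][cc]
            out[r][c] = total
    return out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def combine(fn, *grids):
    """Apply fn elementwise across grids of identical shape."""
    return [[fn(*cells) for cells in zip(*rows)] for rows in zip(*grids)]


def conv_gru_step(x, h, k):
    """One ConvGRU update of hidden map h given input map x.

    k is a dict of six hypothetical 3x3 kernels: input kernels wz, wr, wh
    and recurrent kernels uz, ur, uh. Matrix products of a plain GRU are
    replaced by convolutions, so weights are shared across grid cells.
    """
    z = combine(lambda a, b: sigmoid(a + b),
                conv2d_same(x, k["wz"]), conv2d_same(h, k["uz"]))  # update gate
    r = combine(lambda a, b: sigmoid(a + b),
                conv2d_same(x, k["wr"]), conv2d_same(h, k["ur"]))  # reset gate
    rh = combine(lambda a, b: a * b, r, h)
    cand = combine(lambda a, b: math.tanh(a + b),
                   conv2d_same(x, k["wh"]), conv2d_same(rh, k["uh"]))  # candidate state
    # One common convention: blend the old state and the candidate with the update gate.
    return combine(lambda zi, hi, ci: (1.0 - zi) * hi + zi * ci, z, h, cand)
```

In a value-iteration setting, `x` could carry the reward or map features and `h` the running value estimate, with `conv_gru_step` applied once per planning iteration. How RQMDP-net actually assigns these roles is not stated in the abstract.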