Value Iteration Algorithm for POMDP Based on Recurrent Convolutional Neural Network
Cite this article: YU Danning, NI Kun, LIU Yunlong. Value Iteration Algorithm for POMDP Based on Recurrent Convolutional Neural Network[J]. Computer Engineering, 2021, 47(2): 90-94, 102.
Authors: YU Danning  NI Kun  LIU Yunlong
Affiliation: School of Aerospace Engineering, Xiamen University, Xiamen, Fujian 361102, China
Abstract: QMDP-net, a value iteration algorithm for the Partially Observable Markov Decision Process (POMDP) based on a Convolutional Neural Network (CNN), performs well without prior knowledge. However, it suffers from optimization difficulties such as unstable training and parameter sensitivity. To address these problems, this paper proposes RQMDP-net, a POMDP value iteration algorithm based on a Recurrent Convolutional Neural Network (RCNN). The value iteration update is implemented with a Gated Recurrent Unit (GRU) network, which preserves the convolutional structure of the input and recurrent weight matrices while strengthening the network's ability to process sequences. Experimental results show that RQMDP-net reaches a navigation accuracy of 98.5% on 10×10 grid-map planning tasks and improves on QMDP-net by up to 5.8 percentage points on 36×36 grid-map planning tasks, demonstrating faster network convergence and stronger planning ability in navigation tasks.
Keywords: Partially Observable Markov Decision Process (POMDP)  value iteration  Convolutional Neural Network (CNN)  Recurrent Convolutional Neural Network (RCNN)  agent planning
Received: 2019-12-25
Revised: 2020-02-04
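The abstract states that the value iteration update is carried out by a GRU whose input and recurrent weights remain convolutional. The block below is a minimal sketch of that general idea, not the authors' RQMDP-net: it assumes a single-channel 3×3 convolutional GRU cell with random, untrained kernels, a hypothetical reward map as input, and the hidden state playing the role of the value map. The names `conv2d_same`, `conv_gru_step`, `params`, and `K` are illustrative assumptions; the abstract does not give channel counts, kernel sizes, or the training procedure.

```python
import math
import random


def conv2d_same(x, k):
    """3x3 cross-correlation with zero padding; output has the shape of x."""
    rows, cols = len(x), len(x[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            s = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < rows and 0 <= jj < cols:
                        s += k[di + 1][dj + 1] * x[ii][jj]
            out[i][j] = s
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def random_kernel(rng, scale=0.5):
    """Hypothetical untrained 3x3 kernel; a real model would learn these."""
    return [[rng.uniform(-scale, scale) for _ in range(3)] for _ in range(3)]


def conv_gru_step(x, h, p):
    """One convolutional GRU update of the hidden map h given the input map x."""
    rows, cols = len(h), len(h[0])
    cells = [(i, j) for i in range(rows) for j in range(cols)]

    # Update gate z and reset gate r: convolutions replace the dense
    # input and recurrent weight matrices of an ordinary GRU.
    xz, hz = conv2d_same(x, p["Wz"]), conv2d_same(h, p["Uz"])
    xr, hr = conv2d_same(x, p["Wr"]), conv2d_same(h, p["Ur"])
    z = [[0.0] * cols for _ in range(rows)]
    r_h = [[0.0] * cols for _ in range(rows)]
    for i, j in cells:
        z[i][j] = sigmoid(xz[i][j] + hz[i][j])
        r_h[i][j] = sigmoid(xr[i][j] + hr[i][j]) * h[i][j]

    # Candidate state from the input and the reset-gated hidden state.
    xc, hc = conv2d_same(x, p["Wc"]), conv2d_same(r_h, p["Uc"])

    # Gated blend of the old state and the candidate.
    new_h = [[0.0] * cols for _ in range(rows)]
    for i, j in cells:
        cand = math.tanh(xc[i][j] + hc[i][j])
        new_h[i][j] = (1.0 - z[i][j]) * h[i][j] + z[i][j] * cand
    return new_h


rng = random.Random(0)
params = {name: random_kernel(rng)
          for name in ("Wz", "Uz", "Wr", "Ur", "Wc", "Uc")}

size = 10                      # 10x10 grid, as in the smaller task above
reward = [[0.0] * size for _ in range(size)]
reward[7][7] = 1.0             # hypothetical goal cell
value = [[0.0] * size for _ in range(size)]

K = 20                         # assumed number of recurrent iterations
for _ in range(K):
    value = conv_gru_step(reward, value, params)
```

Because every step is a gated blend of the previous state and a tanh-bounded candidate, the hidden map stays within [-1, 1] across iterations. A plain convolutional value iteration update has no such gating.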
This article is indexed in VIP (维普), Wanfang Data (万方数据), and other databases.