Model-Based Factored Bayesian Online Reinforcement Learning (一种基于模型的可分解贝叶斯在线强化学习)

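Cite this article:	WU Bo, ZHENG Hong-yan, FENG Yan-peng, CHEN Xin. Model-Based Factored Bayesian Online Reinforcement Learning[J]. Acta Electronica Sinica (电子学报), 2014, 42(7): 1429-1434.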

Author names:	WU Bo (仵博), ZHENG Hong-yan (郑红燕), FENG Yan-peng (冯延蓬), CHEN Xin (陈鑫)

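Author affiliations:	1. Education Technology and Information Center, Shenzhen Polytechnic, Shenzhen, Guangdong 518055; 2. School of Information Science and Engineering, Central South University, Changsha, Hunan 410083; 3. Hunan Engineering Laboratory for Advanced Control and Intelligent Automation, Changsha, Hunan 410083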

Funding:	National Natural Science Foundation of China (No. 61074058, No. 60874042); Natural Science Foundation of Shenzhen

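Abstract (translated from the Chinese):	Bayesian reinforcement learning involves a very large number of parameters and converges slowly, which prevents online learning. To address this, a model-based factored Bayesian reinforcement learning method is proposed. First, the learning parameters are given a factored representation, which reduces the number of parameters to be learned. Then, Bayesian learning from prior knowledge and observed data is used to optimize the balance between exploration and exploitation. Finally, a point-based Bayesian reinforcement learning method is used to make the learning process converge quickly, thereby achieving online learning. Simulation results show that the algorithm can meet the performance requirements of real-time systems.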
Keywords (translated from the Chinese):	Markov decision processes; Bayesian reinforcement learning; dynamic Bayesian networks
Received:	2013-08-30
Model-Based Factored Bayesian Online Reinforcement Learning

WU Bo, ZHENG Hong-yan, FENG Yan-peng, CHEN Xin. Model-Based Factored Bayesian Online Reinforcement Learning[J]. Acta Electronica Sinica, 2014, 42(7): 1429-1434.

Authors:	WU Bo, ZHENG Hong-yan, FENG Yan-peng, CHEN Xin

Affiliation:	1. Education Technology and Information Center, Shenzhen Polytechnic, Shenzhen, Guangdong 518055, China; 2. School of Information Science and Engineering, Central South University, Changsha, Hunan 410083, China; 3. Hunan Engineering Laboratory for Advanced Control and Intelligent Automation, Changsha, Hunan 410083, China

Abstract:	The enormous number of parameters and slow convergence are the major obstacles to online learning in model-based Bayesian reinforcement learning. To address them, this paper presents a model-based factored Bayesian reinforcement learning approach. First, a factored representation is used to describe the dynamics with fewer parameters. Then, based on prior knowledge and observed data, Bayesian learning is used to optimize the exploration-exploitation tradeoff. Finally, a point-based Bayesian reinforcement learning approach is proposed to speed up convergence and thereby achieve online learning. The experimental results show that the proposed approach approximates the underlying Bayesian reinforcement learning task well while meeting real-time performance requirements.

Keywords:	Markov decision processes; Bayesian reinforcement learning; dynamic Bayesian networks
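
The abstract's first two steps (a factored representation with fewer parameters, then Bayesian learning from prior knowledge and observed data) can be illustrated with a minimal Python sketch. This is not the paper's algorithm, which this record does not reproduce. It assumes, for illustration only, binary state variables, a dynamic Bayesian network in which each next-step variable has a small fixed number of parents, and Dirichlet priors over each conditional distribution. All function and class names are hypothetical.

```python
def flat_param_count(n_vars, n_actions, arity=2):
    """Free parameters of a flat transition table P(s' | s, a)."""
    n_states = arity ** n_vars
    # One distribution over n_states outcomes per (state, action) pair.
    return n_states * n_actions * (n_states - 1)


def factored_param_count(parent_counts, n_actions, arity=2):
    """Free parameters of a DBN where variable i has parent_counts[i] parents."""
    # One distribution over `arity` outcomes per (parent assignment, action).
    return sum(n_actions * arity ** k * (arity - 1) for k in parent_counts)


class DirichletCPT:
    """Dirichlet posterior over one conditional distribution (assumed prior)."""

    def __init__(self, arity=2, prior=1.0):
        # Prior knowledge is encoded as pseudo-counts.
        self.counts = [prior] * arity

    def update(self, value):
        # Observing an outcome adds one to its count (conjugate update).
        self.counts[value] += 1.0

    def mean(self):
        # Posterior mean estimate of the outcome probabilities.
        total = sum(self.counts)
        return [c / total for c in self.counts]


if __name__ == "__main__":
    # 10 binary variables, 4 actions, 2 parents per variable (assumed sizes).
    print(flat_param_count(10, 4))
    print(factored_param_count([2] * 10, 4))

    cpt = DirichletCPT()
    for observed in (1, 1, 1):
        cpt.update(observed)
    print(cpt.mean())
```

Under these assumed sizes the flat table needs millions of parameters while the factored model needs a few hundred, which is the kind of reduction the abstract refers to. The point-based step that speeds up convergence is not sketched here because the record gives no details of it.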
This article is indexed in CNKI and other databases.