
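Research on the Combination of Bayesian Learning and Reinforcement Learning
Cite this article: CHEN Fei, WANG Ben-Nian, GAO Yang, CHEN Zhao-Qian, CHEN Shi-Fu. Research on the Combination of Bayesian Learning and Reinforcement Learning [J]. Computer Science (计算机科学), 2006, 33(2): 173-177.
Authors: CHEN Fei  WANG Ben-Nian  GAO Yang  CHEN Zhao-Qian  CHEN Shi-Fu
Affiliation: State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093
Funding: Project funded by the Chinese Academy of Sciences; research project of the Ministry of Science and Technology; Natural Science Foundation of Jiangsu Province
Abstract: One of the key difficulties in reinforcement learning research is balancing exploration of unknown actions against exploitation of the best known action. Bayesian learning is a probabilistic approach that performs inference and makes optimal decisions based on a known probability distribution and observed data. Combining reinforcement learning with Bayesian learning therefore allows an agent to use its existing experience and newly acquired knowledge to choose which strategy to adopt: explore unknown actions or exploit the best known action. This paper introduces single-agent and multi-agent Bayesian reinforcement learning methods. Single-agent Bayesian reinforcement learning includes Bayesian Q-learning, Bayesian model learning, and Bayesian dynamic programming; multi-agent Bayesian reinforcement learning includes Bayesian imitation models, Bayesian coordination methods, and Bayesian learning for coalition formation under uncertainty. Finally, the paper raises problems that remain to be solved in applying Bayesian methods to reinforcement learning.

Keywords: Bayesian learning  Reinforcement learning  Single-agent  Multi-agent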

Research on the Combination of Bayesian Learning and Reinforcement Learning
CHEN Fei, WANG Ben-Nian, GAO Yang, CHEN Zhao-Qian, CHEN Shi-Fu. Research on the Combination of Bayesian Learning and Reinforcement Learning [J]. Computer Science, 2006, 33(2): 173-177.
Authors: CHEN Fei  WANG Ben-Nian  GAO Yang  CHEN Zhao-Qian  CHEN Shi-Fu
Affiliation: State Key Lab. for Novel Software Technology, Department of Science and Technology, Nanjing University, Nanjing 210093
Abstract: A central problem in reinforcement learning is balancing exploration of untested actions against exploitation of actions that are known to be good. Bayesian learning is a probabilistic method that makes optimal decisions based on a known probability distribution and recently observed data. Combining Bayesian learning with reinforcement learning therefore lets the agent choose between exploration and exploitation based on its own experience and newly acquired knowledge. In this paper, we introduce single-agent Bayesian reinforcement learning and multi-agent Bayesian reinforcement learning. Single-agent Bayesian reinforcement learning includes Bayesian Q-learning, model-based Bayesian learning, and Bayesian DP; multi-agent Bayesian reinforcement learning includes Bayesian imitation, Bayesian coordination, and Bayesian reinforcement learning for coalition formation under uncertainty. Finally, some unsolved problems in Bayesian reinforcement learning are discussed.
Keywords:Bayesian learning  Reinforcement learning  Single-agent  Multi-agent
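
The exploration/exploitation trade-off described in the abstract can be illustrated with a small sketch. The example below is not one of the methods surveyed in the paper; it swaps in Thompson sampling on a Bernoulli bandit, a standard Bayesian technique, purely to show how a posterior built from a prior and observed data can drive the choice between trying uncertain actions and repeating good ones. The function name `thompson_sampling`, the uniform Beta(1, 1) prior, and the reward probabilities are all assumptions made for illustration.

```python
import random


def thompson_sampling(true_probs, steps, seed=0):
    """Illustrative Beta-Bernoulli Thompson sampling (hypothetical helper).

    true_probs: hidden success probability of each action (assumed values).
    Returns the pull counts and the Beta posterior parameters per action.
    """
    rng = random.Random(seed)
    n = len(true_probs)
    alpha = [1.0] * n  # prior pseudo-count of successes (uniform prior)
    beta = [1.0] * n   # prior pseudo-count of failures
    pulls = [0] * n

    for _ in range(steps):
        # Draw one plausible success rate per action from its posterior.
        # Uncertain actions have wide posteriors, so they sometimes win
        # the draw (exploration); well-known good actions usually win
        # (exploitation).
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        arm = max(range(n), key=lambda i: samples[i])

        # Observe a Bernoulli reward and update that action's posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, alpha, beta


if __name__ == "__main__":
    pulls, alpha, beta = thompson_sampling([0.2, 0.8], steps=2000)
    print("pulls per action:", pulls)
```

As the posterior for the better action sharpens, it wins the sampling step more often, so exploration fades without a hand-tuned schedule.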
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.