
Value Iteration Algorithm Based on Reinforcement Learning
Cite this article: CUI Jun-xiao, ZHU Meng-ting, WANG Hai-yan, ZHANG Peng, WANG Hui. Value Iteration Algorithm Based on Reinforcement Learning [J]. Digital Community & Smart Home, 2014(11): 7348-7350.
Authors: CUI Jun-xiao  ZHU Meng-ting  WANG Hai-yan  ZHANG Peng  WANG Hui
Affiliation: School of Computer Science and Technology, Soochow University, Suzhou 215006, Jiangsu, China
Abstract: Reinforcement learning is learning a mapping from environment states to actions so as to obtain the maximum reward signal. In reinforcement learning, three kinds of methods can maximize the return: value iteration, policy iteration, and policy search. This paper introduces the principles and algorithms of reinforcement learning, studies discrete-space value iteration algorithms both with an environment model and without one, and applies the algorithm to Gridworld problems with a fixed starting point and with a random starting point. Experimental results show that, compared with policy iteration, the algorithm converges faster and achieves better accuracy.
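For reference, the discrete-space value iteration described here repeatedly applies the Bellman optimality backup to every state until the value table converges (standard form; P is the transition model, R the reward, and \gamma the discount factor — the model-free variant estimates the same backup from sampled transitions instead of a known P):

V_{k+1}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma\, V_k(s') \bigr]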

Keywords: reinforcement learning  value iteration  Gridworld

Value Iteration Algorithm Based on Reinforcement Learning
CUI Jun-xiao, ZHU Meng-ting, WANG Hai-yan, ZHANG Peng, WANG Hui. Value Iteration Algorithm Based on Reinforcement Learning [J]. Digital Community & Smart Home, 2014(11): 7348-7350.
Authors:CUI Jun-xiao  ZHU Meng-ting  WANG Hai-yan  ZHANG Peng  WANG Hui
Affiliation: (School of Computer Science and Technology, Soochow University, Suzhou 215006, China)
Abstract: Reinforcement learning is learning how to map situations to actions so as to maximize a reward signal. In reinforcement learning, there are three methods that can maximize the cumulative reward: value iteration, policy iteration, and policy search. In this paper, we survey the foundations and algorithms of reinforcement learning, study model-based and model-free value iteration, and apply these algorithms to Gridworld problems with fixed and random starting points. Experimental results on Gridworld show that the algorithm has a faster convergence rate and better convergence performance than policy iteration.
Keywords: reinforcement learning  value iteration  Gridworld
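As a concrete illustration of the approach described in the abstract, the sketch below runs tabular value iteration on a small deterministic Gridworld. The grid size, goal position, step reward, and discount factor are illustrative assumptions, not the exact settings used in the paper.

import numpy as np

# Illustrative Gridworld settings (assumed, not the paper's exact setup).
ROWS, COLS = 4, 4                              # grid dimensions
GOAL = (0, 0)                                  # absorbing goal state
GAMMA = 0.9                                    # discount factor
STEP_REWARD = -1.0                             # reward for every non-goal move
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right

def step(state, action):
    """Deterministic transition: move within the grid, stay put at walls."""
    if state == GOAL:
        return state, 0.0                      # goal is absorbing, no further reward
    r = min(max(state[0] + action[0], 0), ROWS - 1)
    c = min(max(state[1] + action[1], 0), COLS - 1)
    return (r, c), STEP_REWARD

def value_iteration(theta=1e-6):
    """Apply the Bellman optimality backup until the value table converges."""
    V = np.zeros((ROWS, COLS))
    while True:
        delta = 0.0
        for r in range(ROWS):
            for c in range(COLS):
                s = (r, c)
                if s == GOAL:
                    continue                   # V(goal) stays 0
                best = max(reward + GAMMA * V[s2]
                           for s2, reward in (step(s, a) for a in ACTIONS))
                delta = max(delta, abs(best - V[s]))
                V[s] = best
        if delta < theta:
            return V

if __name__ == "__main__":
    print(value_iteration())                   # optimal state values for the grid

Because the transitions here are known, this corresponds to the model-based case; the model-free variant discussed in the paper would instead estimate the same backups from sampled transitions.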