Reinforcement learning algorithm based on minimum state method and average reward
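Cite this article: LIU Quan, FU Qi-ming, GONG Sheng-rong, FU Yu-chen, CUI Zhi-ming. Reinforcement learning algorithm based on minimum state method and average reward[J]. Journal on Communications, 2011, 32(1): 66-71.
Authors: LIU Quan, FU Qi-ming, GONG Sheng-rong, FU Yu-chen, CUI Zhi-ming
Affiliations: 1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006; State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu 210093
2. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006
Funding: National Natural Science Foundation of China; Natural Science Foundation of Jiangsu Province; Natural Science Research Foundation of Jiangsu Higher Education Institutions; Foundation of the Jiangsu Engineering Technology Research and Development Center for Supporting Software of Modern Enterprise Informatization Applications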
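Abstract: Q-learning, which uses the discounted reward as its evaluation criterion, cannot reflect the influence of an action on subsequent actions. To address this, the AR-Q-Learning algorithm, which combines average reward with Q-learning, is proposed and its convergence is proved. To address the "curse of dimensionality", in which the number of learning parameters grows geometrically with the number of state-variable dimensions, the idea of minimum state variables is proposed. The minimum-variable idea and average reward are applied to reinforcement learning in the Blocks World. Experimental results show that the method better accounts for the delayed effects of actions, speeds up convergence, and to some extent alleviates the curse of dimensionality in the Blocks World.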

Keywords: reinforcement learning  average reward  Tetris  minimum state
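The abstract does not give the AR-Q-Learning update rule. As a point of reference only, the sketch below implements Schwartz's R-learning, a well-known average-reward variant of Q-learning that drops the discount factor and instead learns an average reward rho: Q(s,a) <- Q(s,a) + alpha * [r - rho + max_a' Q(s',a') - Q(s,a)], with rho updated only after greedy actions. It is an assumption that AR-Q-Learning follows a similar scheme; the two-state environment `step`, the function `r_learning`, and all parameter values are hypothetical and chosen for illustration.

```python
import random
from collections import defaultdict


def step(state, action):
    """Hypothetical two-state MDP, used only to exercise the update rule.

    Action 0 stays in the current state, action 1 switches to the other state.
    Staying in state 0 pays 1, staying in state 1 pays 2, switching pays 0,
    so the best long-run average reward is 2 per step (move to state 1, stay).
    """
    if action == 0:
        return state, (1.0 if state == 0 else 2.0)
    return 1 - state, 0.0


def r_learning(steps=20000, alpha=0.1, beta=0.01, epsilon=0.1, seed=0):
    """Tabular R-learning (average-reward Q-learning) with epsilon-greedy exploration."""
    rng = random.Random(seed)
    actions = (0, 1)
    q = defaultdict(float)  # Q(s, a): relative (differential) action values
    rho = 0.0               # running estimate of the average reward per step
    state = 0
    for _ in range(steps):
        greedy = max(actions, key=lambda a: q[(state, a)])
        action = rng.choice(actions) if rng.random() < epsilon else greedy
        next_state, reward = step(state, action)
        best_next = max(q[(next_state, a)] for a in actions)
        # No discount factor: the reward is measured relative to rho instead.
        q[(state, action)] += alpha * (reward - rho + best_next - q[(state, action)])
        # As in R-learning, rho is only updated when the greedy action was taken.
        if action == greedy:
            best_here = max(q[(state, a)] for a in actions)
            rho += beta * (reward - rho + best_next - best_here)
        state = next_state
    return q, rho
```

Under these assumptions, calling `r_learning()` should yield a greedy policy that switches from state 0 to state 1 and then stays there, with `rho` close to the optimal average reward of 2 per step. Because rewards are compared against rho rather than discounted, a long-run gain is not shrunk by a discount factor, which is the property the abstract attributes to average-reward evaluation.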

Reinforcement learning algorithm based on minimum state method and average reward
LIU Quan,FU Qi-ming,GONG Sheng-rong,FU Yu-chen,CUI Zhi-ming.Reinforcement learning algorithm based on minimum state method and average reward[J].Journal on Communications,2011,32(1):66-71.
Authors:LIU Quan  FU Qi-ming  GONG Sheng-rong  FU Yu-chen  CUI Zhi-ming
Affiliation:LIU Quan1,2,FU Qi-ming1,GONG Sheng-rong1,FU Yu-chen1,CUI Zhi-ming1(1.Institute of Computer Science and Technology,Soochow University,Suzhou 215006,China,2.State Key Laboratory for Novel Software Technology,Nanjing University,Nanjing 210093,China)
Abstract: To address the problem that Q-Learning, which uses the discounted reward as its evaluation criterion, cannot reflect the effect of an action on subsequent situations, AR-Q-Learning was put forward based on the average reward and Q-Learning. To address the curse of dimensionality, in which the computational requirement grows exponentially with the number of state variables, the minimum state method was put forward. AR-Q-Learning and the minimum state method were used in reinforcement learning for Blocks Worl...
Keywords:reinforcement learning  average reward  tetris  minimum state  
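The curse of dimensionality mentioned in the abstract can be made concrete with a little arithmetic: a tabular Q-function over n state variables with domain sizes |D_1|, ..., |D_n| and |A| actions needs |A| * |D_1| * ... * |D_n| entries, so each extra variable multiplies the table size by its domain size. The abstract does not say how the minimum state method selects its variables; the sketch below only assumes that the idea amounts to keeping a subset of the state variables. The helpers `q_table_size` and `project_state`, and the example sizes, are hypothetical.

```python
from math import prod


def q_table_size(domain_sizes, num_actions):
    """Entries in a tabular Q-function: |A| * |D_1| * ... * |D_n|."""
    return num_actions * prod(domain_sizes)


def project_state(state, keep):
    """Hypothetical reduction: keep only the state variables at indices `keep`."""
    return tuple(state[i] for i in keep)


# Hypothetical example: 10 state variables with 4 values each, 6 actions.
full = q_table_size([4] * 10, 6)      # 6 * 4**10 = 6291456 entries
# Keeping only 3 of the variables shrinks the table to 6 * 4**3 = 384 entries.
reduced = q_table_size([4] * 3, 6)
print(full, reduced)                  # prints: 6291456 384
print(project_state((3, 0, 2, 1), (0, 2)))  # prints: (3, 2)
```

The reduced table only helps if the dropped variables do not change which action is best; deciding which variables can be dropped is the part the paper addresses and is not reproduced here.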