Reinforcement learning algorithm based on minimum state method and average reward

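Cite this article:	LIU Quan, FU Qi-ming, GONG Sheng-rong, FU Yu-chen, CUI Zhi-ming. Reinforcement learning algorithm based on minimum state method and average reward[J]. Journal on Communications, 2011, 32(1): 66-71.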

Authors:	LIU Quan, FU Qi-ming, GONG Sheng-rong, FU Yu-chen, CUI Zhi-ming

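Affiliations:	1. Institute of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China; State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu 210093, China 2. Institute of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China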

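Funding:	National Natural Science Foundation of China; Natural Science Foundation of Jiangsu Province; Natural Science Research Foundation of Jiangsu Higher Education Institutions; Foundation of the Jiangsu Provincial Engineering Technology Research and Development Center for Modern Enterprise Informatization Application Support Software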

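Abstract:	Q-learning that uses discounted reward as its evaluation criterion cannot reflect the influence of an action on subsequent actions. To address this problem, the AR-Q-Learning algorithm, which combines average reward with Q-learning, is proposed, and its convergence is proved. To address the "curse of dimensionality", in which the number of learning parameters grows geometrically with the dimension of the state variables, the idea of minimum state variables is proposed. The minimum-variable idea and average reward are applied to reinforcement learning in the Blocks World. Experimental results show that the method better accounts for after-effects, speeds up convergence of the algorithm, and to some extent resolves the "curse of dimensionality" in the Blocks World.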
Keywords:	reinforcement learning; average reward; Tetris; minimum state
Reinforcement learning algorithm based on minimum state method and average reward

LIU Quan, FU Qi-ming, GONG Sheng-rong, FU Yu-chen, CUI Zhi-ming. Reinforcement learning algorithm based on minimum state method and average reward[J]. Journal on Communications, 2011, 32(1): 66-71.

Authors:	LIU Quan, FU Qi-ming, GONG Sheng-rong, FU Yu-chen, CUI Zhi-ming

Affiliation:	1. Institute of Computer Science and Technology, Soochow University, Suzhou 215006, China; 2. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China (LIU Quan: 1, 2; FU Qi-ming, GONG Sheng-rong, FU Yu-chen, CUI Zhi-ming: 1)

Abstract:	Q-Learning that uses discounted reward as its evaluation criterion cannot show the effect of an action on the situations that follow. To address this problem, AR-Q-Learning, which is based on average reward and Q-Learning, was put forward. To address the curse of dimensionality, in which the computational requirement grows exponentially with the number of state variables, the minimum state method was put forward. AR-Q-Learning and the minimum state method were used in reinforcement learning for Blocks Worl...

Keywords:	reinforcement learning; average reward; Tetris; minimum state
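
The abstract contrasts discounted-reward Q-learning with an average-reward variant. This record does not give the AR-Q-Learning update rule itself, so the sketch below is not the authors' algorithm. It shows R-learning, a well-known average-reward form of tabular Q-learning, only to illustrate the general idea of replacing the discount factor with a running estimate of the average reward. The function names, the two-state toy environment `toy_step`, and all hyperparameter values are assumptions made for this illustration.

```python
import random


def r_learning(step, n_states, n_actions, n_steps=100_000,
               alpha=0.1, beta=0.01, epsilon=0.2, seed=0):
    """Tabular R-learning (average-reward Q-learning), illustrative only.

    `step(s, a)` is a hypothetical environment function returning
    (next_state, reward). `rho` estimates the average reward per step.
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    rho = 0.0
    s = 0
    for _ in range(n_steps):
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda i: q[s][i])
        s_next, r = step(s, a)
        # Value update: the reward is measured relative to rho,
        # and no discount factor is applied to the next-state value.
        q[s][a] += alpha * (r - rho + max(q[s_next]) - q[s][a])
        # Update rho only when the action taken is greedy after the update.
        if q[s][a] == max(q[s]):
            rho += beta * (r - rho + max(q[s_next]) - max(q[s]))
        s = s_next
    return q, rho


def toy_step(s, a):
    """Hypothetical 2-state environment (assumption, not from the paper).

    State 0: action 0 stays (reward 1), action 1 moves to state 1 (reward 0).
    State 1: action 0 moves to state 0 (reward 0), action 1 stays (reward 2).
    The best long-run behaviour gives up the immediate reward in state 0
    in order to reach state 1 and stay there.
    """
    if s == 0:
        return (0, 1.0) if a == 0 else (1, 0.0)
    return (0, 0.0) if a == 0 else (1, 2.0)


q, rho = r_learning(toy_step, n_states=2, n_actions=2)
print(f"estimated average reward: {rho:.3f}")
print("greedy action per state:",
      [max(range(2), key=lambda i: q[s][i]) for s in range(2)])
```

In this toy setting the learned greedy policy should move from state 0 to state 1 and remain there, and `rho` should approach the best achievable average reward of 2 per step.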
This article is indexed in databases including CNKI and Wanfang Data.