
Action choice mechanism of reinforcement learning based on adjusted dynamic parameters (一种基于动态参数调整的强化学习动作选择机制)

Cite this article (original-language citation):	胡晓辉. 一种基于动态参数调整的强化学习动作选择机制[J]. 计算机工程与应用, 2008, 44(28): 29-31.

Author:	HU Xiao-hui (胡晓辉)

Affiliation:	School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China

Funding:	National Natural Science Foundation of China; Lanzhou Jiaotong University research and teaching-reform project

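Abstract:	Reinforcement learning is an important unsupervised machine learning technique. It uses rewards obtained in an uncertain environment to discover an optimal sequence of behaviors, enabling online learning in dynamic environments, and it is widely applied in agent systems. One of the difficulties in applying reinforcement learning algorithms is balancing exploration and exploitation, that is, deciding how to select actions. Building on Q-learning, a counter is introduced into the ε-greedy strategy so that the parameter ε used in action selection can be adjusted in stages, which better balances exploration and exploitation. Simulation experiments on a grid world demonstrate the effectiveness of the method.
Keywords:	reinforcement learning; Q-learning; action selection; ε-greedy mechanism
Received:	2008-6-20
Revised:	2008-7-10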
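
For reference, the abstract builds on Q-learning with ε-greedy action selection. The first two expressions below are the standard textbook forms. The third is only a hypothetical way to write a counter-driven, staged schedule for ε: the counter n, the thresholds N_k, and the stage values ε_k are assumptions for illustration, since the abstract does not give the paper's actual rule.

```latex
% Standard one-step Q-learning update (textbook form)
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]

% Standard epsilon-greedy selection
a_t =
\begin{cases}
  \text{a uniformly random action} & \text{with probability } \varepsilon \\
  \arg\max_{a} Q(s_t, a) & \text{with probability } 1 - \varepsilon
\end{cases}

% Hypothetical staged schedule driven by a counter n (not taken from the paper)
\varepsilon(n) = \varepsilon_k \quad \text{for } N_k \le n < N_{k+1},
\qquad \varepsilon_0 > \varepsilon_1 > \dots > \varepsilon_K
```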
Action choice mechanism of reinforcement learning based on adjusted dynamic parameters

HU Xiao-hui. Action choice mechanism of reinforcement learning based on adjusted dynamic parameters[J]. Computer Engineering and Applications, 2008, 44(28): 29-31.

Authors:	HU Xiao-hui

Affiliation:	School of Electronic & Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China

Abstract:	Reinforcement learning (RL) is an unsupervised learning method by which an agent uses reward signals to acquire an optimal behavior sequence and adapt to unknown environments. RL is now widely used in agent systems. One of the difficult problems in RL is action selection, that is, how to balance exploitation and exploration. A counter mechanism built on Q-learning with the ε-greedy strategy is presented, so that the parameter ε can be adjusted in stages when choosing actions. Simulation results on a grid world verify the effectiveness of the method.

Keywords:	reinforcement learning (RL); Q-learning; action choice; ε-greedy mechanism
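
The abstract describes the idea only at a high level, so the following is a minimal sketch of how a counter-driven, staged ε could be combined with tabular Q-learning on a small grid world. It is not the paper's implementation. The per-state visit counter, the `STAGES` thresholds and values, the reward scheme, and every function name here are assumptions chosen for illustration.

```python
import random

# Hypothetical stage table: (visit-count threshold, epsilon). Not from the paper.
# Must be sorted by ascending threshold.
STAGES = [(0, 0.9), (20, 0.5), (100, 0.1), (500, 0.01)]

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def staged_epsilon(count, stages=STAGES):
    """Return the epsilon of the last stage whose threshold is <= count."""
    eps = stages[0][1]
    for threshold, value in stages:
        if count >= threshold:
            eps = value
    return eps


def step(state, action, size, goal):
    """Deterministic grid move; moving into a wall leaves the agent in place."""
    row = min(max(state[0] + action[0], 0), size - 1)
    col = min(max(state[1] + action[1], 0), size - 1)
    nxt = (row, col)
    reward = 1.0 if nxt == goal else -0.01  # assumed reward scheme
    return nxt, reward, nxt == goal


def train(size=5, episodes=500, alpha=0.1, gamma=0.9, max_steps=200, seed=0):
    """Tabular Q-learning where epsilon depends on a per-state visit counter."""
    rng = random.Random(seed)
    goal = (size - 1, size - 1)
    states = [(r, c) for r in range(size) for c in range(size)]
    q = {s: [0.0] * len(ACTIONS) for s in states}
    visits = {s: 0 for s in states}  # the counter (assumed to be per state)

    for _ in range(episodes):
        state = (0, 0)
        for _ in range(max_steps):
            visits[state] += 1
            eps = staged_epsilon(visits[state])
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])  # exploit
            nxt, reward, done = step(state, ACTIONS[a], size, goal)
            # One-step Q-learning target; no bootstrap from a terminal state.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if done:
                break
    return q, visits
```

Calling `q, visits = train()` returns the learned Q-table and the visit counts. With a per-state counter, rarely visited cells keep a high ε while frequently visited cells shift toward exploitation. A single global step or episode counter would be an equally plausible reading of the abstract.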
This article is indexed in Wanfang Data and other databases.