Research on Q-Learning Algorithm Based on Metropolis Criterion
Cite this article: GUO Mao-Zu, WANG Ya-Dong, SUN Hua-Mei, LIU Yang. Research on Q-Learning Algorithm Based on Metropolis Criterion[J]. Journal of Computer Research and Development, 2002, 39(6): 684-688.
Authors: GUO Mao-Zu, WANG Ya-Dong, SUN Hua-Mei, LIU Yang
作者单位:1. 哈尔滨工业大学计算机科学与技术学院,哈尔滨,150001;哈尔滨工业大学管理学院,哈尔滨,150001
2. 哈尔滨工业大学计算机科学与技术学院,哈尔滨,150001
3. 哈尔滨工业大学管理学院,哈尔滨,150001
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (2001AA115550), the National Natural Science Foundation of China (70071008), and the China Postdoctoral Science Foundation.
Abstract: The trade-off between exploration and exploitation is a key problem in action selection for Q-learning. Pure exploitation quickly traps the agent in a local optimum; exploration can escape local optima and accelerate learning, but excessive exploration degrades the algorithm's performance. By formulating the search for an optimal policy in Q-learning as the search for an optimal solution to a combinatorial optimization problem, the Metropolis criterion of the simulated annealing algorithm is applied to the compromise between exploration and exploitation in Q-learning, and SA-Q-learning, a Q-learning algorithm based on the Metropolis criterion, is proposed. Comparison shows that it converges faster than standard Q-learning and avoids the performance degradation caused by excessive exploration.
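
For reference, the Metropolis criterion invoked here is the standard acceptance rule of simulated annealing. How the paper maps Q-value differences onto the energy gap ΔE is not spelled out in the abstract, so the following is only the generic form:

```latex
P(\text{accept candidate}) =
\begin{cases}
1, & \Delta E \le 0, \\
\exp\!\left(-\Delta E / T\right), & \Delta E > 0,
\end{cases}
```

where ΔE is the increase in cost of the candidate over the current solution and T is a temperature parameter annealed toward zero, so worse moves are accepted freely at high temperature and the rule becomes greedy as T → 0.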

Keywords: machine learning, Metropolis criterion, Q-learning algorithm

RESEARCH ON Q-LEARNING ALGORITHM BASED ON METROPOLIS CRITERION
GUO Mao-Zu, WANG Ya-Dong, SUN Hua-Mei, and LIU Yang. RESEARCH ON Q-LEARNING ALGORITHM BASED ON METROPOLIS CRITERION[J]. Journal of Computer Research and Development, 2002, 39(6): 684-688.
Authors: GUO Mao-Zu, WANG Ya-Dong, SUN Hua-Mei, LIU Yang
Abstract: The balance between exploration and exploitation is one of the key problems in action selection in Q-learning. Pure exploitation causes the agent to fall into a local optimum quickly, whereas excessive exploration degrades the performance of the Q-learning algorithm even though it can accelerate the learning process and help escape local optima. In this paper, finding the optimal policy in Q-learning is cast as searching for the optimal solution to a combinatorial optimization problem. The Metropolis criterion of the simulated annealing algorithm is then introduced to balance exploration and exploitation in Q-learning, and the corresponding Q-learning algorithm based on the Metropolis criterion, SA-Q-learning, is presented. Finally, experiments show that SA-Q-learning converges more quickly than standard Q-learning and avoids the performance degradation caused by excessive exploration.
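
The abstract does not reproduce the algorithm itself; the sketch below illustrates one plausible reading of the described action-selection rule. It is a minimal illustration, not the authors' implementation: the environment interface (`reset`/`step`), the geometric cooling schedule, and the tabular `defaultdict` Q-table are all assumptions made here for the sake of a self-contained example.

```python
import math
import random
from collections import defaultdict

def metropolis_action(Q, state, actions, T):
    """Metropolis-style action selection (sketch of the SA-Q-learning idea).

    The greedy action is the exploitation choice; a uniformly random
    candidate is accepted with probability exp(-(Q_greedy - Q_rand) / T),
    so high T favors exploration and T -> 0 recovers greedy selection.
    """
    a_greedy = max(actions, key=lambda a: Q[(state, a)])
    a_rand = random.choice(actions)
    delta = Q[(state, a_greedy)] - Q[(state, a_rand)]  # >= 0 by construction
    if random.random() < math.exp(-delta / T):
        return a_rand    # accept the exploratory candidate
    return a_greedy      # otherwise exploit

def sa_q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.9,
                  T0=1.0, cooling=0.99, T_min=1e-3):
    """Tabular Q-learning loop with Metropolis action selection (sketch).

    `env` is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done).
    """
    Q = defaultdict(float)
    T = T0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = metropolis_action(Q, state, actions, T)
            next_state, reward, done = env.step(action)
            # Standard one-step Q-learning update (Watkins, 1989).
            target = reward + gamma * max(Q[(next_state, b)] for b in actions)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
        T = max(T * cooling, T_min)  # geometric annealing (an assumption)
    return Q
```

Under this reading, exploration is not cut off abruptly: non-greedy actions remain possible at any temperature but become exponentially less likely as T is annealed, which matches the abstract's claim of avoiding the performance loss caused by excessive exploration.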
Keywords: reinforcement learning, Q-learning, Metropolis criterion, exploration, exploitation