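Reinforcement Learning Algorithm Based on Optimal Average Cost per Stage
Cite this article: YIN Chang-ming, CHEN Huan-wen, XIE Li-juan. Reinforcement learning algorithm based on optimal average cost per stage [J]. Journal of Computer Applications (计算机应用), 2002, 22(4): 25-27.
Authors: YIN Chang-ming (殷苌茗)  CHEN Huan-wen (陈焕文)  XIE Li-juan (谢丽娟)
Affiliation: Department of Mathematics and Computer Science, Changsha University of Electric Power (长沙电力学院), Changsha, Hunan 410077, China
Funding: National Natural Science Foundation of China (60075019)
Abstract: Using a method for solving the optimal cost function, this paper presents a new reinforcement learning algorithm, namely one based on the optimal average cost per stage. The algorithm is an effective reinforcement learning method for Markov decision problems with incomplete information. Starting from the solution of the optimal average cost function per stage, the paper analyzes the existence of an optimal solution, the relationship between the optimal average cost per stage and the initial state, and the associated Bellman equation. With this approach in place, many results for dynamic programming (DP) algorithms can be applied directly to the study of reinforcement learning.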
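
For orientation, the Bellman equation usually associated with the average-cost-per-stage problem can be written as below. This is the standard textbook form, not an equation quoted from the paper. The symbols are assumptions introduced here: $\lambda^*$ is the optimal average cost per stage, $h^*(i)$ the differential (relative) cost of state $i$, $g(i,u)$ the one-stage cost, $p_{ij}(u)$ the transition probability, and $U(i)$ the set of admissible controls in state $i$.

```latex
\lambda^* + h^*(i) \;=\; \min_{u \in U(i)} \Big[\, g(i,u) + \sum_{j=1}^{n} p_{ij}(u)\, h^*(j) \Big],
\qquad i = 1, \dots, n .
```

Under the usual unichain-type conditions, $\lambda^*$ takes the same value for every initial state, which is the kind of relationship between the optimal average cost and the initial state that the abstract refers to.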

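Keywords: Q-learning  optimal average cost function  Bellman equation  agent  reinforcement learning algorithm  artificial intelligence
Article ID: 1001-9081(2002)04-0025-03
Revised: October 15, 2001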

REINFORCEMENT LEARNING ALGORITHM BASED ON AVERAGE COST OPTIMIZATION FOR EACH STAGE
YIN Chang-ming, CHEN Huan-wen, XIE Li-juan. REINFORCEMENT LEARNING ALGORITHM BASED ON AVERAGE COST OPTIMIZATION FOR EACH STAGE[J]. Journal of Computer Applications, 2002, 22(4): 25-27.
Authors:YIN Chang-ming  CHEN Huan-wen  XIE Li-juan
Abstract:This paper presents a novel reinforcement learning algorithm based on solving the optimal average cost function. Q-learning is a reinforcement learning method for solving Markov decision problems with incomplete information. Starting from the solution of the optimal average cost function for each stage, the paper studies the existence of an optimal solution, the relationship between the optimal average cost for each stage and the initial state, and the corresponding Bellman equation, and then proposes a relative value iteration Q-learning algorithm. This approach allows many results from dynamic programming to be applied directly to the study of Q-learning.
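
To make the idea of relative value iteration Q-learning concrete, the following is a minimal sketch in a common textbook form. It is not necessarily the authors' exact update. Everything specific in it is an assumption made for illustration: the two-state, two-action MDP with deterministic transitions (`NEXT_STATE`, `COST`), the constant step size `alpha`, the purely random exploration, and the choice of the pair (state 0, action 0) as the reference entry whose Q-value serves as the running estimate of the average cost per stage.

```python
import random

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions.
# NEXT_STATE[s][a] is the successor state, COST[s][a] the one-stage cost.
NEXT_STATE = [[1, 1], [1, 0]]
COST = [[2.0, 1.0], [0.5, 3.0]]


def rvi_q_learning(steps=20000, alpha=0.1, seed=0):
    """Sketch of relative value iteration Q-learning for average cost."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    ref_state, ref_action = 0, 0  # assumed reference pair
    state = 0
    for _ in range(steps):
        # Pure random exploration; the learner never reads NEXT_STATE or
        # COST directly except through the sampled transition below.
        action = rng.randrange(2)
        next_state = NEXT_STATE[state][action]
        cost = COST[state][action]
        # Subtracting the reference Q-value keeps the iterates bounded
        # and lets that entry track the average cost per stage.
        target = cost + min(q[next_state]) - q[ref_state][ref_action]
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    # Greedy (cost-minimizing) action in each state.
    policy = [min(range(2), key=lambda a: q[s][a]) for s in range(2)]
    return q, q[ref_state][ref_action], policy


if __name__ == "__main__":
    q, avg_cost, policy = rvi_q_learning()
    print("estimated average cost per stage:", round(avg_cost, 4))
    print("greedy policy:", policy)
```

In this toy problem the cheapest long-run behavior is to move to state 1 and stay there at a cost of 0.5 per stage, so the reference entry should settle near 0.5 regardless of the starting state.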
Keywords:reinforcement learning  Q-learning  optimal average cost function  Markovian decision process  Bellman equation
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.