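Research on Optimality Criteria in Reinforcement Learning (激励学习的最优判据研究)
Cite this article:CHEN Huan-wen, XIE Jian-ping. Research on Optimality Criteria in Reinforcement Learning[J]. Computer Engineering & Science (计算机工程与科学), 2001, 23(2): 62-65
Authors:CHEN Huan-wen  XIE Jian-ping
Affiliations:1. Department of Mathematics and Computer Science, Changsha University of Electric Power (长沙电力学院)
2. Network Center, Changsha Communications University (长沙交通学院)
Abstract:Reinforcement learning agents solve sequential decision problems by learning and planning optimal policies, so how to define the optimality criterion for a policy is one of the core questions in reinforcement learning research. This paper discusses a series of optimality criteria drawn from dynamic programming, examines through examples the suitability, strengths, and weaknesses of each criterion for reinforcement learning, and analyzes the need to design reinforcement learning algorithms for each criterion.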

Keywords:reinforcement learning  agent  optimality criteria  learning algorithm  artificial intelligence
Article ID:1007-130X(2001)02-0062-04
Revised:2000-05-15

Research on Optimality Criteria in Reinforcement Learning
CHEN Huan-wen,XIE Jian-ping. Research on Optimality Criteria in Reinforcement Learning[J]. Computer Engineering & Science, 2001, 23(2): 62-65
Authors:CHEN Huan-wen  XIE Jian-ping
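Affiliation:1. Department of Mathematics and Computer Science, Changsha University of Electric Power; 2. Network Center, Changsha Communications University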
Abstract:RL agents solve sequential decision problems by learning optimal policies for choosing actions. Thus, at the core of RL is the definition of what it means for a policy to be “optimal”. In this paper, a variety of optimality criteria from the dynamic programming literature are discussed, and their suitability and characteristics for RL are examined through some examples. The necessity of devising RL algorithms for the various criteria is also analyzed.
Keywords:reinforcement learning  Markov decision process  agent
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.