
Research on Optimality Criteria in Reinforcement Learning (激励学习的最优判据研究)

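Cite this article:	CHEN Huan-wen, XIE Jian-ping. Research on Optimality Criteria in Reinforcement Learning[J]. Computer Engineering & Science (计算机工程与科学), 2001, 23(2): 62-65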

Authors:	CHEN Huan-wen (陈焕文), XIE Jian-ping (谢建平)

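Affiliations:	1. Department of Mathematics and Computer Science, Changsha University of Electric Power (长沙电力学院); 2. Network Center, Changsha Communications University (长沙交通学院)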

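Abstract:	Reinforcement learning agents solve sequential decision problems by learning and planning optimal policies, so how to define the optimality criterion for a policy is one of the core questions in reinforcement learning research. This paper discusses a series of optimality criteria drawn from dynamic programming, uses examples to examine the applicability, strengths, and weaknesses of each criterion for reinforcement learning, and analyzes the need to design reinforcement learning algorithms for the various criteria.
Keywords:	reinforcement learning; agent; optimality criteria; learning algorithm; artificial intelligence
Article ID:	1007-130X(2001)02-0062-04
Revised:	2000-05-15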
Research on Optimality Criteria in Reinforcement Learning

CHEN Huan-wen, XIE Jian-ping. Research on Optimality Criteria in Reinforcement Learning[J]. Computer Engineering & Science, 2001, 23(2): 62-65

Authors:	CHEN Huan-wen, XIE Jian-ping

Affiliation:	CHEN Huan-wen 1, XIE Jian-ping 2

Abstract:	RL agents solve sequential decision problems by learning optimal policies for choosing actions. Thus, at the core of RL is the definition of what it means for a policy to be “optimal”. In this paper, a variety of optimality criteria from the dynamic programming literature are discussed, and their suitability and characteristics for RL are examined through some examples. The necessity of devising RL algorithms for the various criteria is also analyzed.

Keywords:	reinforcement learning Markov decision process agent
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.