Uncertainty Estimate with Pseudo-Entropy in Reinforcement Learning

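Cite this article:	ZHANG Ping, Stephane Canu. Uncertainty Estimate with Pseudo-Entropy in Reinforcement Learning[J]. Control Theory & Applications (控制理论与应用), 1998, 15(1): 100-104.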

Authors:	ZHANG Ping (张平), Stephane Canu (斯特凡·卡纽)

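Abstract (translated from the Chinese):	A reinforcement learning system interacts with an unconstrained, unknown environment. Its goal is to obtain as much cumulative reward as possible, where the reward signal comes from the environment over the system's finite, unknown lifetime. One difficulty for a reinforcement learning system is that the reward signal is very sparse, especially for systems that receive only delayed signals. Existing reinforcement learning methods, such as the well-known Q-learning, store the reward signal in the form of a value function. This paper proposes a method based on a model for estimating the uncertainty of states. The algorithm makes effective use of the reward stored in the value function ...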
Keywords:	reinforcement; entropy estimation; Markov process; learning system; pseudo-entropy
Received:	1996/2/26
Revised:	1996/10/30
Uncertainty Estimate with Pseudo-Entropy in Reinforcement Learning

ZHANG Ping and Stephane Canu. Uncertainty Estimate with Pseudo-Entropy in Reinforcement Learning[J]. Control Theory & Applications, 1998, 15(1): 100-104.

Authors:	ZHANG Ping and Stephane Canu

Abstract:	A reinforcement learning (RL) system interacts with an unrestricted, unknown environment. Its goal is to maximize the cumulative reward obtained over its limited, unknown lifetime. One of the difficulties for an RL system is that the reward signal is sparse, especially for RL systems with heavily delayed rewards. In this paper, we describe an algorithm based on a model of the state's uncertainty estimate. It makes efficient use of the reward information stored in the value function. The experiments show that the algorithm performs very well.

Keywords:	reinforcement learning; Q-learning; entropy estimate; uncertainty; Markov decision