A reinforcement learning algorithm for finite-horizon Markov decision processes

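Cite this article:	LI Chun-gui, LIU Yong-xin. A reinforcement learning algorithm for finite-horizon Markov decision processes[J]. Journal of Guangxi University of Technology, 2003, 14(1): 1-4.

Authors:	LI Chun-gui, LIU Yong-xin

Affiliation:	1. Department of Computer Science, Guangxi University of Technology, Liuzhou, Guangxi 545006; 2. Department of Automation, Inner Mongolia University, Hohhot, Inner Mongolia 010021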

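Abstract:	This paper studies reinforcement learning algorithms for finite-horizon non-stationary Markov decision processes. By introducing an artificial absorbing state, the finite-horizon problem is converted into an infinite-horizon problem, so that standard reinforcement learning methods can be used to solve it. Building on the algorithmic idea proposed in reference [3], a new reinforcement learning algorithm for finite-horizon non-stationary Markov decision processes is proposed and tested experimentally on an inventory control problem without a complete model.
Keywords:	reinforcement learning; finite horizon; Markov decision process; incomplete model; inventory control; machine learning; non-stationary
Article ID:	1004-6410(2003)01-0001-04
Revision received:	13 November 2002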
An algorithm of reinforcement learning for finite-horizon Markov decision processes

LI Chun-gui, LIU Yong-xin. An algorithm of reinforcement learning for finite-horizon Markov decision processes[J]. Journal of Guangxi University of Technology, 2003, 14(1): 1-4.

Authors:	LI Chun-gui, LIU Yong-xin

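Affiliation:	LI Chun-gui (1), LIU Yong-xin (2). 1. Department of Computer Science, Guangxi University of Technology, Liuzhou, Guangxi 545006; 2. Department of Automation, Inner Mongolia University, Hohhot, Inner Mongolia 010021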

Abstract:	This paper studies reinforcement learning algorithms for finite-horizon non-stationary Markov decision processes. By introducing an artificial absorbing state, the finite-horizon problem is transformed into an infinite-horizon one, so that standard reinforcement learning methods can be used to solve it. Based on the algorithmic idea presented in reference [3], a new reinforcement learning algorithm for finite-horizon non-stationary Markov decision processes is put forward, and an experiment on an inventory control problem without a complete model is reported.

Keywords:	reinforcement learning; Markov decision process; non-stationary; inventory control
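
The transformation described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details the abstract does not give: it applies plain tabular Q-learning to the transformed problem. The time step is folded into the state, so that a non-stationary process becomes stationary over pairs (t, stock), and an artificial absorbing state with value zero is entered after the last decision epoch, so no discounting is needed. The toy inventory model, its cost coefficients, and all names (`make_inventory_env`, `q_learning`, `greedy_policy`) are assumptions made for illustration only.

```python
import random

ABSORBING = "absorbing"  # artificial absorbing state: zero reward forever


def make_inventory_env(horizon=3, max_stock=2, seed=0):
    """Hypothetical toy inventory problem; the learner only sees samples."""
    rng = random.Random(seed)

    def step(state, order):
        t, stock = state
        # Non-stationary demand: a unit demand becomes more likely as t grows.
        demand = 1 if rng.random() < 0.3 + 0.2 * t else 0
        stock_after_order = min(stock + order, max_stock)
        sold = min(stock_after_order, demand)
        next_stock = stock_after_order - sold
        # Assumed prices: revenue 4 per sale, cost 2 per order, holding 0.5.
        reward = 4.0 * sold - 2.0 * order - 0.5 * next_stock
        # After the last decision epoch, move to the absorbing state.
        next_state = ABSORBING if t + 1 == horizon else (t + 1, next_stock)
        return next_state, reward

    return step


def q_learning(step, horizon=3, max_stock=2, episodes=5000,
               alpha=0.1, epsilon=0.2, seed=1):
    """Tabular Q-learning on the time-augmented, absorbing-state problem."""
    rng = random.Random(seed)
    actions = list(range(max_stock + 1))
    q = {((t, s), a): 0.0
         for t in range(horizon)
         for s in range(max_stock + 1)
         for a in actions}

    def best_value(state):
        # The absorbing state is worth zero, which ends the bootstrapping.
        if state == ABSORBING:
            return 0.0
        return max(q[(state, a)] for a in actions)

    for _ in range(episodes):
        state = (0, rng.randint(0, max_stock))
        while state != ABSORBING:
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            next_state, reward = step(state, action)
            target = reward + best_value(next_state)  # discount factor 1
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q, horizon=3, max_stock=2):
    """Time-dependent greedy policy: maps (t, stock) to an order quantity."""
    actions = range(max_stock + 1)
    return {(t, s): max(actions, key=lambda a: q[((t, s), a)])
            for t in range(horizon)
            for s in range(max_stock + 1)}
```

Calling `greedy_policy(q_learning(make_inventory_env()))` returns one order quantity per (time step, stock level) pair. Because every episode reaches the absorbing state after exactly `horizon` steps, the undiscounted returns stay bounded even though the transformed problem is formally infinite-horizon.
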
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.