Online Hierarchical Reinforcement Learning Based on Path-matching
Cite this article: Shi Chuan, Shi Zhongzhi, Wang Maoguang. Online Hierarchical Reinforcement Learning Based on Path-matching [J]. Journal of Computer Research and Development, 2008, 45(9).
Authors: Shi Chuan  Shi Zhongzhi  Wang Maoguang
Affiliation: 1. Beijing Key Laboratory of Intelligent Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876; Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
Funding: National Natural Science Foundation of China; National Key Technology R&D Program of China
Abstract: Finding correct subgoals online is the key problem in option-based hierarchical reinforcement learning. By analyzing the agent's actions at subgoals, it is found that the effective actions at a subgoal are restricted; the subgoal-discovery problem is thus transformed into finding the action-restricted states that best match the learned paths. For grid learning environments, a unique-direction value method is proposed to represent the action-restricted property of subgoals, together with an automatic option-discovery algorithm based on this method. Experiments show that the options generated by the unique-direction value method significantly speed up the Q-learning algorithm; the effects of the timing and size of option generation on Q-learning performance are also analyzed.
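The abstract does not fully specify the unique-direction value method, but the underlying idea of action-restricted states can be pictured with a hypothetical sketch: given successful paths recorded as (state, action) pairs, a state where the observed actions collapse to a single direction across many paths (such as a doorway in a grid world) is a natural subgoal candidate. The function name, data layout, and `min_support` threshold below are illustrative assumptions, not the paper's algorithm.

```python
from collections import defaultdict

def find_restricted_subgoals(paths, min_support=2):
    """Illustrative sketch (not the paper's method): flag states whose
    observed effective actions are restricted to a single direction.

    paths: list of successful episodes, each a list of (state, action) pairs.
    Returns states visited by at least min_support paths where only one
    action was ever taken, e.g. a doorway that paths pass through one way.
    """
    seen = defaultdict(set)   # state -> set of actions observed there
    count = defaultdict(int)  # state -> number of distinct paths visiting it
    for path in paths:
        visited = set()
        for state, action in path:
            seen[state].add(action)
            if state not in visited:
                count[state] += 1
                visited.add(state)
    return [s for s in seen if len(seen[s]) == 1 and count[s] >= min_support]
```

For example, two paths that reach a goal through the same doorway cell, always exiting it downward, would make that cell the only returned candidate.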

Keywords: reinforcement learning; hierarchical reinforcement learning; subgoal; path matching

Online Hierarchical Reinforcement Learning Based on Path-matching
Shi Chuan,Shi Zhongzhi,Wang Maoguang.Online Hierarchical Reinforcement Learning Based on Path-matching[J].Journal of Computer Research and Development,2008,45(9).
Authors:Shi Chuan  Shi Zhongzhi  Wang Maoguang
Affiliation: Shi Chuan 1,2, Shi Zhongzhi 2, Wang Maoguang 2. 1. Beijing Key Laboratory of Intelligent Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876; 2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
Abstract: Although reinforcement learning (RL) is an effective approach for building autonomous agents that improve their performance with experience, a fundamental problem of standard RL algorithms is that in practice they are often not solvable in reasonable time. Hierarchical reinforcement learning (HRL) is a successful solution that decomposes the learning task into simpler subtasks and learns each of them independently. As a promising HRL framework, the option is introduced as a closed-loop policy for sequences of actions. A key problem in option-based HRL is how to find correct subgoals online. By analyzing the agent's actions at subgoals, it is found that the effective actions at a subgoal are restricted, so the subgoal-discovery problem can be transformed into finding the action-restricted states that best match the learned paths. For grid learning environments, a unique-direction value method is proposed to represent the action-restricted property of subgoals, along with an automatic option-discovery algorithm based on this method. Experiments show that the options generated by the unique-direction value method significantly speed up the Q-learning algorithm; the effects of the timing and size of option generation on Q-learning performance are also analyzed.
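The Q-learning baseline that the paper's options are reported to accelerate can be sketched on a toy grid world. This is a generic tabular Q-learning implementation under assumed parameters (grid size, rewards, learning rate), not the paper's experimental setup.

```python
import random

def q_learning_grid(size=5, goal=(4, 4), episodes=300, alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning on a toy deterministic grid world (illustrative only).

    The agent starts at (0, 0), moves up/down/left/right (clamped at walls),
    pays a small step cost, and receives +1 on reaching the goal.
    Returns the greedy action at the start state after training.
    """
    actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    Q = {((r, c), a): 0.0
         for r in range(size) for c in range(size) for a in range(4)}
    rng = random.Random(0)  # fixed seed for reproducibility
    for _ in range(episodes):
        s = (0, 0)
        while s != goal:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(4)
            else:
                a = max(range(4), key=lambda b: Q[(s, b)])
            dr, dc = actions[a]
            ns = (min(max(s[0] + dr, 0), size - 1),
                  min(max(s[1] + dc, 0), size - 1))
            r = 1.0 if ns == goal else -0.01
            best_next = 0.0 if ns == goal else max(Q[(ns, b)] for b in range(4))
            # standard one-step Q-learning update
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = ns
    return max(range(4), key=lambda b: Q[((0, 0), b)])
```

After training, the greedy policy at the start state should point toward the goal (down or right). Options, as in the paper, would add temporally extended macro-actions on top of this flat learner to speed up convergence.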
Keywords: option; reinforcement learning; hierarchical reinforcement learning; subgoal; path-matching
This article is indexed by CNKI, VIP, Wanfang Data, and other databases.