Online Hierarchical Reinforcement Learning Based on Path-matching
Cite this article: Shi Chuan, Shi Zhongzhi, Wang Maoguang. Online Hierarchical Reinforcement Learning Based on Path-matching [J]. Journal of Computer Research and Development, 2008, 45(9).
Authors: Shi Chuan  Shi Zhongzhi  Wang Maoguang
Affiliation: 1. Beijing Key Laboratory of Intelligent Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876; Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
Funding: National Natural Science Foundation of China; National Key Technology R&D Program of China
Abstract: Finding correct subgoals online is the key problem in option-based hierarchical reinforcement learning. By analyzing the agent's actions at subgoals, it is found that the effective actions at a subgoal are restricted; the subgoal-discovery problem is thus transformed into finding the action-restricted states that best match the learned paths. For grid learning environments, a unique-direction value method is proposed to represent the action-restricted property of subgoals, together with an automatic option-discovery algorithm based on this method. Experiments show that the options generated by the unique-direction value method significantly speed up the Q-learning algorithm; the effects of the timing and size of option generation on Q-learning performance are also analyzed.
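The abstract does not fully specify the unique-direction value method, but the underlying idea of action-restricted states can be pictured with a hypothetical sketch: given successful paths recorded as (state, action) pairs, a state where the observed actions collapse to a single direction across many paths (such as a doorway in a grid world) is a natural subgoal candidate. The function name, data layout, and `min_support` threshold below are illustrative assumptions, not the paper's algorithm.

```python
from collections import defaultdict

def find_restricted_subgoals(paths, min_support=2):
    """Illustrative sketch (not the paper's method): flag states whose
    observed effective actions are restricted to a single direction.

    paths: list of successful episodes, each a list of (state, action) pairs.
    Returns states visited by at least min_support paths where only one
    action was ever taken, e.g. a doorway that paths pass through one way.
    """
    seen = defaultdict(set)   # state -> set of actions observed there
    count = defaultdict(int)  # state -> number of distinct paths visiting it
    for path in paths:
        visited = set()
        for state, action in path:
            seen[state].add(action)
            if state not in visited:
                count[state] += 1
                visited.add(state)
    return [s for s in seen if len(seen[s]) == 1 and count[s] >= min_support]
```

For example, two paths that reach a goal through the same doorway cell, always exiting it downward, would make that cell the only returned candidate.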

Keywords: reinforcement learning; hierarchical reinforcement learning; subgoal; path matching

Online Hierarchical Reinforcement Learning Based on Path-matching
Shi Chuan,Shi Zhongzhi,Wang Maoguang.Online Hierarchical Reinforcement Learning Based on Path-matching[J].Journal of Computer Research and Development,2008,45(9).
Authors:Shi Chuan  Shi Zhongzhi  Wang Maoguang
Affiliation: Shi Chuan 1,2, Shi Zhongzhi 2, Wang Maoguang 2. 1. Beijing Key Laboratory of Intelligent Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876; 2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
Abstract: Although reinforcement learning (RL) is an effective approach for building autonomous agents that improve their performance with experience, a fundamental problem of standard RL algorithms is that in practice they are often not solvable in reasonable time. Hierarchical reinforcement learning (HRL) is a successful solution that decomposes the learning task into simpler subtasks and learns each of them independently. As a promising HRL framework, the option is introduced as a closed-loop policy for sequences of actions. A key problem in option-based HRL is how to find correct subgoals online. By analyzing the agent's actions at subgoals, it is found that the effective actions at a subgoal are restricted, so the subgoal-discovery problem can be transformed into finding the action-restricted states that best match the learned paths. For grid learning environments, a unique-direction value method is proposed to represent the action-restricted property of subgoals, along with an automatic option-discovery algorithm based on this method. Experiments show that the options generated by the unique-direction value method significantly speed up the Q-learning algorithm; the effects of the timing and size of option generation on Q-learning performance are also analyzed.
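The Q-learning baseline that the paper's options are reported to accelerate can be sketched on a toy grid world. This is a generic tabular Q-learning implementation under assumed parameters (grid size, rewards, learning rate), not the paper's experimental setup.

```python
import random

def q_learning_grid(size=5, goal=(4, 4), episodes=300, alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning on a toy deterministic grid world (illustrative only).

    The agent starts at (0, 0), moves up/down/left/right (clamped at walls),
    pays a small step cost, and receives +1 on reaching the goal.
    Returns the greedy action at the start state after training.
    """
    actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    Q = {((r, c), a): 0.0
         for r in range(size) for c in range(size) for a in range(4)}
    rng = random.Random(0)  # fixed seed for reproducibility
    for _ in range(episodes):
        s = (0, 0)
        while s != goal:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(4)
            else:
                a = max(range(4), key=lambda b: Q[(s, b)])
            dr, dc = actions[a]
            ns = (min(max(s[0] + dr, 0), size - 1),
                  min(max(s[1] + dc, 0), size - 1))
            r = 1.0 if ns == goal else -0.01
            best_next = 0.0 if ns == goal else max(Q[(ns, b)] for b in range(4))
            # standard one-step Q-learning update
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = ns
    return max(range(4), key=lambda b: Q[((0, 0), b)])
```

After training, the greedy policy at the start state should point toward the goal (down or right). Options, as in the paper, would add temporally extended macro-actions on top of this flat learner to speed up convergence.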
Keywords: option; reinforcement learning; hierarchical reinforcement learning; subgoal; path-matching
This article is indexed by CNKI, VIP, Wanfang Data, and other databases.