A Class of Hybrid Transfer Algorithms for Reinforcement Learning Based on Spectral Methods
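Cite this article: ZHU Mei-Qiang, CHENG Yu-Hu, LI Ming, WANG Xue-Song, FENG Huan-Ting. A Class of Hybrid Transfer Algorithms for Reinforcement Learning Based on Spectral Methods [J]. Acta Automatica Sinica, 2012, 38(11): 1765-1776.
Authors: ZHU Mei-Qiang  CHENG Yu-Hu  LI Ming  WANG Xue-Song  FENG Huan-Ting
Affiliation: 1. School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou 221116
Funding: Supported by the National Natural Science Foundation of China (60974050, 61072094, 61273143); the Youth Science and Technology Foundation of China University of Mining and Technology (OC080252); the Program for New Century Excellent Talents in University of the Ministry of Education (NCET-08-0836, NCET-10-0765); and the Specialized Research Fund for the Doctoral Program of Higher Education of the Ministry of Education (20110095110016)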
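Abstract: In transfer tasks where the state space is scaled up proportionally, the proto-value function method can effectively transfer only the basis functions corresponding to the smaller eigenvalues; when these are used to approximate the value function of the target task, the value function is wrong for some states. To address this problem, the hierarchical decomposition technique based on spectral graph theory is improved by exploiting the fact that Laplacian eigenmaps preserve the local topological structure of the state space, and a hybrid transfer method that combines basis functions with the optimal policies of subtasks is proposed. First, basis functions are computed in the source task with a spectral method and then extended to basis functions of the target task by linear interpolation. Next, the interpolated second basis function (an approximation of the Fiedler eigenvector of the target task) is used to decompose the task, and the optimal policies of the relevant subtasks are obtained with the improved hierarchical decomposition technique. Finally, the extended basis functions and the obtained subtask policies are used together in learning the target task. The proposed hybrid transfer method directly determines the optimal policy for part of the target task's state space, reduces the minimum number of basis functions needed for value function approximation, and lowers the number of policy iterations. It is suited to transfer tasks in which the state space is scaled up proportionally and has a hierarchical structure. Simulation results in a grid world verify the effectiveness of the new method.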
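The spectral decomposition step mentioned in the abstract can be illustrated with a small sketch. It is not the authors' improved hierarchical decomposition, whose details are not given on this page; it only shows the textbook idea of building Laplacian basis functions on a grid world and splitting the states by the sign of the Fiedler eigenvector. The two-room layout, the function names `grid_laplacian` and `spectral_basis`, and the choice of the combinatorial Laplacian are all assumptions made for illustration.

```python
import numpy as np

def grid_laplacian(walls):
    """Combinatorial Laplacian L = D - A of the 4-neighbour graph over free cells.

    walls: 2D bool array, True marks a blocked cell (hypothetical layout).
    Returns (L, cells), where cells[i] is the (row, col) of state i.
    """
    rows, cols = walls.shape
    cells = [(r, c) for r in range(rows) for c in range(cols) if not walls[r, c]]
    index = {cell: i for i, cell in enumerate(cells)}
    A = np.zeros((len(cells), len(cells)))
    for (r, c), i in index.items():
        # Looking only right and down visits every undirected edge once.
        for dr, dc in ((0, 1), (1, 0)):
            j = index.get((r + dr, c + dc))
            if j is not None:
                A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A, cells

def spectral_basis(L, k):
    """Eigenpairs for the k smallest eigenvalues (eigh sorts them ascending)."""
    eigvals, eigvecs = np.linalg.eigh(L)
    return eigvals[:k], eigvecs[:, :k]

# Assumed example: two 5x5 rooms separated by a wall with a single door.
walls = np.zeros((5, 11), dtype=bool)
walls[:, 5] = True
walls[2, 5] = False  # the door

L, cells = grid_laplacian(walls)
eigvals, basis = spectral_basis(L, 4)

# The second basis function is the Fiedler eigenvector; its sign pattern
# separates the two rooms (the door cell itself sits near zero).
fiedler = basis[:, 1]
labels = fiedler > 0
left = [labels[i] for i, (r, c) in enumerate(cells) if c < 5]
right = [labels[i] for i, (r, c) in enumerate(cells) if c > 5]
```

In this layout every state in the left room receives one label and every state in the right room the other, which is the kind of split a subtask decomposition could start from. The overall sign of an eigenvector is arbitrary, so which room is labelled `True` may differ between runs or library versions.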

Keywords: reinforcement learning; transfer learning; spectral graph theory; proto-value functions; hierarchical decomposition
Received: 2011-12-02

A Hybrid Transfer Algorithm for Reinforcement Learning Based on Spectral Method
ZHU Mei-Qiang, CHENG Yu-Hu, LI Ming, WANG Xue-Song, FENG Huan-Ting. A Hybrid Transfer Algorithm for Reinforcement Learning Based on Spectral Method [J]. Acta Automatica Sinica, 2012, 38(11): 1765-1776.
Authors:ZHU Mei-Qiang  CHENG Yu-Hu  LI Ming  WANG Xue-Song  FENG Huan-Ting
Affiliation:1.School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou 221116
Abstract: In transfer tasks where the state space is scaled up, the proto-value function framework transfers only the basis functions corresponding to smaller eigenvalues effectively, which leads to a wrong approximation of the value function for some states of the target task. To solve this problem, and exploiting the fact that the Laplacian eigenmap preserves the local topological structure of the state space, an improved hierarchical decomposition algorithm based on spectral graph theory is proposed, and a hybrid transfer method that integrates basis function transfer with the transfer of optimal subtask policies is designed. First, the basis functions of the source task are constructed using a spectral method, and the basis functions of the target task are produced by linearly interpolating those of the source task. Second, the interpolated second basis function of the target task (which approximates the Fiedler eigenvector) is used to decompose the target task, and the optimal policies of the subtasks are obtained using the improved hierarchical decomposition algorithm. Finally, the interpolated basis functions and the optimal subtask policies are transferred to the target task. The proposed hybrid transfer method directly yields the optimal policy for some states, and reduces both the number of policy iterations and the minimum number of basis functions needed to approximate the value function. The method is suitable for scaled-up state space transfer tasks with a hierarchical control structure. Simulation results in a grid world verify the validity of the proposed hybrid transfer method.
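The interpolation step in the abstract can be made concrete with a one-dimensional sketch. It assumes a chain of states rather than the paper's grid world, and the helper names (`path_laplacian`, `laplacian_basis`, `interpolate_basis`, `abs_cosine`) as well as the sizes 10 and 20 are hypothetical choices for illustration. Basis functions are computed on the small source chain, stretched onto the larger target chain with `np.interp`, and compared with the target's own Laplacian eigenvectors.

```python
import numpy as np

def path_laplacian(n):
    """Combinatorial Laplacian of a chain of n states."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

def laplacian_basis(n, k):
    """Eigenvectors for the k smallest eigenvalues, one per column."""
    _, vecs = np.linalg.eigh(path_laplacian(n))
    return vecs[:, :k]

def interpolate_basis(source_basis, m):
    """Stretch each source basis function onto m target states linearly."""
    n, k = source_basis.shape
    src_pos = np.arange(n)
    # Assumed alignment: the end states of both chains coincide.
    tgt_pos = np.linspace(0, n - 1, m)
    cols = [np.interp(tgt_pos, src_pos, source_basis[:, j]) for j in range(k)]
    return np.stack(cols, axis=1)

def abs_cosine(u, v):
    """Cosine similarity up to sign, since eigenvector signs are arbitrary."""
    return abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

source = laplacian_basis(10, 4)         # source task: 10 states
approx = interpolate_basis(source, 20)  # extended to a 20-state target
exact = laplacian_basis(20, 4)          # target's own basis, for comparison
similarity = [abs_cosine(approx[:, j], exact[:, j]) for j in range(4)]
```

Inspecting `similarity` shows how closely each interpolated function tracks the corresponding target eigenvector; the constant function and the approximate Fiedler vector are reproduced almost exactly in this setting. The interpolated columns are not renormalized here, which does not affect a cosine comparison but would matter before using them as regression features.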
Keywords:Reinforcement learning  transfer learning  spectral graph theory  proto-value functions  hierarchical decomposition
This article is indexed in CNKI and other databases.