Q-learning System Based on Cooperative Least Squares Support Vector Machine
Cite this article: WANG Xue-Song, TIAN Xi-Lan, CHENG Yu-Hu, YI Jian-Qiang. Q-learning System Based on Cooperative Least Squares Support Vector Machine[J]. Acta Automatica Sinica, 2009, 35(2): 214-219.
Authors: WANG Xue-Song, TIAN Xi-Lan, CHENG Yu-Hu, YI Jian-Qiang
Affiliation: 1. School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou 221116; 2. Institute of Automation, Chinese Academy of Sciences, Beijing 100190
Funding: National Natural Science Foundation of China; Specialized Research Fund for the Doctoral Program of Higher Education; China Postdoctoral Science Foundation; Natural Science Foundation of Jiangsu Province; Jiangsu Province Postdoctoral Research Funding Program
Abstract: To address the slow convergence of reinforcement learning systems, a Q-learning system based on cooperative least squares support vector machines is proposed for problems with a continuous state space and a discrete action space. The system consists of a least squares support vector regression machine (LS-SVRM) and a least squares support vector classification machine (LS-SVCM). The LS-SVRM approximates the mapping from state-action pairs to the value function. The LS-SVCM approximates the mapping from the continuous state space to the discrete action space, and supplies the LS-SVRM with real-time, dynamic knowledge or advice (suggested action values) to accelerate value-function learning. Simulations of minimum-time mountain-car control show that, compared with a Q-learning system based on a single LS-SVRM, the proposed method converges faster and achieves better learning performance.
Keywords: reinforcement learning; Q-learning; cooperative; least squares support vector machine (LS-SVM); mapping
Received: 2007-11-26
Revised: 2008-07-11
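
As a rough illustration of the two approximators named in the abstract, the following Python sketch fits a least squares support vector regressor on state-action pairs and a classifier on state-to-action labels. It is not the authors' implementation. The RBF kernel, the hyperparameters `gamma` and `sigma`, the one-vs-rest construction of the classifier, and the names `LSSVR`, `LSSVC` and `greedy_action` are all assumptions made for this example. The abstract does not say how the LS-SVCM's advice enters the LS-SVRM, so that coupling is not modeled here.

```python
import numpy as np


def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


class LSSVR:
    """Least squares SVM regression: training is a single linear solve."""

    def __init__(self, gamma=100.0, sigma=1.0):
        self.gamma, self.sigma = gamma, sigma  # assumed hyperparameters

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n = len(X)
        # KKT system: [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]
        M = np.zeros((n + 1, n + 1))
        M[0, 1:] = 1.0
        M[1:, 0] = 1.0
        M[1:, 1:] = rbf_kernel(X, X, self.sigma) + np.eye(n) / self.gamma
        sol = np.linalg.solve(M, np.concatenate(([0.0], y)))
        self.b, self.alpha, self.X = sol[0], sol[1:], X
        return self

    def predict(self, X):
        K = rbf_kernel(np.asarray(X, dtype=float), self.X, self.sigma)
        return K @ self.alpha + self.b


class LSSVC:
    """State -> discrete action classifier.

    Assumption: built one-vs-rest from LSSVR fits on +/-1 targets.
    """

    def __init__(self, n_actions, gamma=100.0, sigma=1.0):
        self.models = [LSSVR(gamma, sigma) for _ in range(n_actions)]

    def fit(self, S, actions):
        actions = np.asarray(actions)
        for a, model in enumerate(self.models):
            model.fit(S, np.where(actions == a, 1.0, -1.0))
        return self

    def predict(self, S):
        scores = np.stack([m.predict(S) for m in self.models], axis=1)
        return scores.argmax(axis=1)


def greedy_action(q_model, state, n_actions):
    """Evaluate Q(state, a) for every discrete action and pick the best."""
    pairs = np.array([np.append(state, a) for a in range(n_actions)])
    return int(np.argmax(q_model.predict(pairs)))
```

In this sketch the regressor would be trained on rows of the form (state, action) with Q-value targets, and the classifier on states labeled with the action taken; `LSSVC.predict` then plays the role of the suggested action for a given state.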
This article is indexed in Wanfang Data and other databases.