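A Q-learning Method for Continuous Space Based on Self-organizing Fuzzy RBF Network
Cite this article: CHENG Yu-hu, WANG Xue-song, YI Jian-qiang, SUN Wei. A Q-learning Method for Continuous Space Based on Self-organizing Fuzzy RBF Network[J]. Information and Control (信息与控制), 2008, 37(1): 1-1
Authors: CHENG Yu-hu, WANG Xue-song, YI Jian-qiang, SUN Wei
Affiliations: 1. School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221008
2. Institute of Automation, Chinese Academy of Sciences, Beijing 100080
Funding: Specialized Research Fund for the Doctoral Program of Higher Education; China Postdoctoral Science Foundation; Jiangsu Postdoctoral Science Foundation; Qing Lan Project of the Jiangsu Provincial Department of Education; China University of Mining and Technology research and teaching-reform project
Abstract: To address reinforcement learning control in continuous spaces, a Q-learning method based on a self-organizing fuzzy RBF network is proposed. The network takes the state as input and outputs a continuous action together with its Q-value, thereby realizing a mapping from continuous states to continuous actions. First, the continuous action space is discretized into a fixed number of discrete actions, and a completely greedy policy selects the discrete action with the maximum Q-value as the local winning action of each fuzzy rule. Then a command fusion mechanism weights the winning discrete actions by their utility values to produce the continuous action that is actually applied to the system. In addition, to simplify the network structure and speed up learning, an improved RAN algorithm and gradient descent are used to adjust the network structure and the network parameters, respectively, online and adaptively. Simulation results on inverted pendulum balancing control verify the effectiveness of the proposed Q-learning method.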

Keywords: self-organizing; fuzzy RBF network; continuous space; Q-learning; Q-value
Article ID: 1002-0411(2008)01-0001-08
Received: 2007-01-24
Revised: 2007-01-24

A Q-learning Method for Continuous Space Based on Self-organizing Fuzzy RBF Network
CHENG Yu-hu, WANG Xue-song, YI Jian-qiang, SUN Wei. A Q-learning Method for Continuous Space Based on Self-organizing Fuzzy RBF Network[J]. Information and Control, 2008, 37(1): 1-1
Authors: CHENG Yu-hu, WANG Xue-song, YI Jian-qiang, SUN Wei
Abstract: For reinforcement learning control in continuous spaces, a Q-learning method based on a self-organizing fuzzy RBF (radial basis function) network is proposed. The input of the fuzzy RBF network is the state, and its outputs are a continuous action and the corresponding Q-value, which realizes a mapping from a continuous state space to a continuous action space. First, the continuous action space is discretized into a fixed number of discrete actions, and a completely greedy policy selects the discrete action with the maximum Q-value as the local winning action of each fuzzy rule. Then a command fusion mechanism weights the local winning action of each fuzzy rule according to its utility value, producing the continuous action applied to the actual system. Moreover, to simplify the network structure and improve the learning speed, an improved resource allocating network (RAN) algorithm and a gradient descent algorithm are applied to adjust the structure and the parameters of the fuzzy RBF network, respectively, in an online and adaptive manner. The effectiveness of the proposed Q-learning method is shown through simulation of balancing control for an inverted pendulum system.
Keywords: self-organizing; fuzzy RBF (radial basis function) network; continuous space; Q-learning; Q-value
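
The action-selection step described in the abstract (greedy choice of a discrete action per fuzzy rule, followed by command fusion into one continuous action) might be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not give the membership functions or the exact form of the utility weights, so Gaussian rule activations and normalized firing strengths as fusion weights are assumptions, and all names (`rule_strengths`, `fused_action`, `centers`, `widths`, `q_table`, `actions`) are hypothetical.

```python
import numpy as np


def rule_strengths(state, centers, widths):
    """Normalized firing strength of each fuzzy rule (ASSUMED Gaussian form)."""
    # Squared, width-scaled distance from the state to every rule center.
    d2 = np.sum(((state - centers) / widths) ** 2, axis=1)
    phi = np.exp(-0.5 * d2)
    return phi / np.sum(phi)


def fused_action(state, centers, widths, q_table, actions):
    """Return (continuous action, its Q-value) for one state.

    q_table[i, j] is the Q-value of discrete action j under rule i.
    """
    phi = rule_strengths(state, centers, widths)
    # Completely greedy choice: each rule picks its max-Q discrete action.
    winners = np.argmax(q_table, axis=1)
    local_actions = actions[winners]
    local_q = q_table[np.arange(len(winners)), winners]
    # Command fusion: weight the local winners (ASSUMED weights = phi).
    action = float(np.dot(phi, local_actions))
    q_value = float(np.dot(phi, local_q))
    return action, q_value


# Usage: two rules, three candidate actions, state midway between centers.
centers = np.array([[-1.0], [1.0]])
widths = np.ones((2, 1))
actions = np.array([-10.0, 0.0, 10.0])
q_table = np.array([[0.0, 1.0, 0.0],
                    [0.0, 0.0, 2.0]])
action, q_value = fused_action(np.array([0.0]), centers, widths, q_table, actions)
```

Because both rules fire equally at the midpoint, the fused action lies halfway between the two local winners, which shows how a finite set of discrete actions can still yield a continuous output.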
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.