
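The limit cycle problem of reinforcement learning in inverted pendulum systems
Cite this article: ZHENG Yu, LUO Si-wei, LV Ziang. The limit cycle problem of reinforcement learning in inverted pendulum systems[J]. Computer Engineering and Applications, 2008, 44(10): 16-19. DOI: 10.3778/j.issn.1002-8331.2008.10.005
Authors: ZHENG Yu, LUO Si-wei, LV Ziang
Affiliation: School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044 (all three authors)
Funding: National Natural Science Foundation of China (Grant No. 60373029)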
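Abstract: The inverted pendulum system is an important application domain for reinforcement learning. This paper first shows through analysis that commonly used reinforcement learning algorithms suffer from a limit cycle problem in the inverted pendulum system: the algorithm fails to converge correctly and the control policy is unstable. Because this instability is not yet pronounced in the simple single inverted pendulum system, the limit cycle problem is often overlooked. To address it, the paper proposes a reinforcement learning algorithm based on an action continuity criterion. The algorithm modifies the reinforcement signal and improves the exploration policy to overcome the effect of limit cycles on the inverted pendulum system. The proposed algorithm is applied to the control of a physical double inverted pendulum system, and the experimental results show that it not only controls the inverted pendulum successfully but also keeps the control policy stable.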

Keywords: limit cycle; reinforcement learning; inverted pendulum
Article ID: 1002-8331(2008)10-0016-04
Received: 2007-11-21
Revised: 2007-11-21

Limit cycles in inverted pendulum system by reinforcement learning
ZHENG Yu, LUO Si-wei, LV Ziang. Limit cycles in inverted pendulum system by reinforcement learning[J]. Computer Engineering and Applications, 2008, 44(10): 16-19. DOI: 10.3778/j.issn.1002-8331.2008.10.005
Authors: ZHENG Yu, LUO Si-wei, LV Ziang
Affiliation: School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
Abstract: The inverted pendulum is an important application of reinforcement learning in control systems. This paper points out that common reinforcement learning algorithms fall into limit cycles in the inverted pendulum system, which prevents the algorithm from converging correctly and undermines the stability of the optimal control policy. However, the limit cycle problem is often ignored in the literature, because the goal of the algorithms studied there is only to keep the pendulum upright for a given time. To overcome the limit cycle problem, this paper proposes a new reinforcement learning algorithm based on an action continuity criterion. The algorithm revises the reinforcement signal and improves the exploration policy to overcome the negative effect of limit cycles in the inverted pendulum system. Simulation and actual control results on a double inverted pendulum system show that the algorithm not only controls the inverted pendulum successfully but also keeps the control policy stable.
Keywords:limit cycles  reinforcement learning  inverted pendulum
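
The abstract names two ingredients, a revised reinforcement signal and an improved exploration policy, both tied to an action continuity criterion, but it gives no formulas. The Python sketch below is only one plausible reading of that idea and is not the authors' algorithm: the function names, the linear switching penalty, and the distance-weighted exploration are all assumptions made for illustration.

```python
import random


def shaped_reward(reward, prev_action, action, switch_penalty=0.1):
    """Hypothetical revised reinforcement signal.

    Subtracts a penalty proportional to how far the new action is from
    the previous one, so rapid back-and-forth switching is discouraged.
    The linear form and the default weight are assumptions.
    """
    return reward - switch_penalty * abs(action - prev_action)


def continuity_biased_action(q_values, prev_action, actions,
                             epsilon=0.1, rng=random):
    """Hypothetical exploration rule that favours action continuity.

    With probability epsilon, sample an action with weights that decay
    with distance from the previous action. Otherwise act greedily and
    break ties in favour of the action closest to the previous one.
    """
    if rng.random() < epsilon:
        # Explore: actions near the previous action are more likely.
        weights = [1.0 / (1.0 + abs(a - prev_action)) for a in actions]
        return rng.choices(actions, weights=weights, k=1)[0]
    # Exploit: greedy choice, ties resolved toward the previous action.
    best = max(q_values)
    candidates = [a for a, q in zip(actions, q_values) if q == best]
    return min(candidates, key=lambda a: abs(a - prev_action))
```

In this sketch, `actions` would be a small set of discrete control forces and `q_values` the learner's current value estimates for them in the present state; both functions could be dropped into an ordinary tabular Q-learning loop in place of the raw reward and plain epsilon-greedy selection.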
This article is indexed in CNKI, Wanfang Data, and other databases.