
Parameter self-tuning and optimization algorithm based on reinforcement learning
Cite this article: YAN Jiazheng (严家政), ZHUAN Xiangtao (专祥涛). Parameter self-tuning and optimization algorithm based on reinforcement learning[J]. CAAI Transactions on Intelligent Systems (智能系统学报), 2022, 17(2): 341-347.
Authors: YAN Jiazheng (严家政)  ZHUAN Xiangtao (专祥涛)
Affiliation: 1. School of Electrical Engineering and Automation, Wuhan University, Wuhan 430072, Hubei, China; 2. Shenzhen Research Institute, Wuhan University, Shenzhen 518057, Guangdong, China
Abstract: When applied to nonlinear time-delay systems, the traditional proportional-integral-derivative (PID) control algorithm suffers from a tedious parameter tuning and performance optimization process and from unsatisfactory control performance. To address this problem, a reinforcement-learning-based algorithm for self-tuning and optimizing controller parameters is proposed. The algorithm computes its reward function from dynamic performance indices of the system and learns from the experience data of periodic step responses, so it can tune and optimize the controller parameters online without identifying specific model data of the controlled plant. With a water tank level control system as the experimental plant, comparative experiments apply the algorithm to parameter tuning and optimization for different types of PID controllers. The experimental results show that, compared with traditional parameter tuning methods, the proposed algorithm eliminates the tedious manual tuning process, effectively optimizes the controller parameters, reduces the overshoot of the controlled variable, and improves the dynamic response performance of the controller.

Keywords: reinforcement learning  tuning  optimization  learning algorithm  time delay  controller  level control  dynamic response
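
The idea of scoring a controller by the dynamic performance of its step response, then adjusting the PID gains from that score without a plant model, can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the first-order-plus-dead-time plant and its constants, the actuator limits, the two performance indices (overshoot and settling time), the reward weights, and the function names. The abstract does not specify the reinforcement learning agent, so a plain random-perturbation search is swapped in as a stand-in for it.

```python
import random
from collections import deque


def simulate_step_response(kp, ki, kd, setpoint=1.0, gain=1.0, tau=5.0,
                           delay=1.0, dt=0.05, t_end=40.0):
    """Closed-loop step response of a PID loop around a hypothetical plant:
    tau * dy/dt = -y + gain * u(t - delay). The tuner never sees this model."""
    n_delay = max(1, int(round(delay / dt)))
    u_buffer = deque([0.0] * n_delay, maxlen=n_delay)  # transport-delay line
    y, integral, prev_error = 0.0, 0.0, setpoint
    response = []
    for _ in range(int(round(t_end / dt))):
        error = setpoint - y
        integral += error * dt
        derivative = (error - prev_error) / dt
        prev_error = error
        u = kp * error + ki * integral + kd * derivative
        u = min(max(u, 0.0), 10.0)        # assumed actuator saturation limits
        u_delayed = u_buffer[0]           # control value issued n_delay steps ago
        u_buffer.append(u)                # oldest entry is dropped automatically
        y += dt * (-y + gain * u_delayed) / tau  # forward-Euler plant update
        response.append(y)
    return response


def step_metrics(response, setpoint=1.0, dt=0.05, band=0.02):
    """Return (overshoot as a fraction of the setpoint, settling time in s)."""
    overshoot = max(0.0, (max(response) - setpoint) / setpoint)
    settling_time = 0.0
    # Settling time: end of the last sample that lies outside the +/- band.
    for k in range(len(response) - 1, -1, -1):
        if abs(response[k] - setpoint) > band * setpoint:
            settling_time = (k + 1) * dt
            break
    return overshoot, settling_time


def reward(overshoot, settling_time, w_overshoot=10.0, w_settling=0.1):
    """Higher is better; the weights are arbitrary illustrative choices."""
    return -(w_overshoot * overshoot + w_settling * settling_time)


def tune(initial=(1.0, 0.1, 0.0), episodes=200, sigma=0.1, seed=0):
    """Stand-in for the learning agent: perturb the gains at random and keep
    a candidate only when its step-response reward improves."""
    rng = random.Random(seed)
    best = tuple(initial)
    best_reward = reward(*step_metrics(simulate_step_response(*best)))
    for _ in range(episodes):
        candidate = tuple(max(0.0, p + rng.gauss(0.0, sigma)) for p in best)
        r = reward(*step_metrics(simulate_step_response(*candidate)))
        if r > best_reward:
            best, best_reward = candidate, r
    return best, best_reward


if __name__ == "__main__":
    gains, score = tune()
    print("kp, ki, kd =", gains, "reward =", score)
```

Each call to `simulate_step_response` plays the role of one periodic step test on the plant, and the search uses only the recorded response, which mirrors the model-free character described in the abstract. A real reinforcement learning agent would replace `tune` while keeping the same reward interface.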