Optimal Output Regulation of Partially Linear Discrete-time Systems Using Reinforcement Learning
Pang Wen-Yan, Fan Jia-Lu, Jiang Yi, Lewis Frank Leroy. Optimal output regulation of partially linear discrete-time systems using reinforcement learning. Acta Automatica Sinica, 2022, 48(9): 2242−2253. doi: 10.16383/j.aas.c190853
Authors: PANG Wen-Yan, FAN Jia-Lu, JIANG Yi, LEWIS Frank Leroy
Affiliation: 1. State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, China; 2. University of Texas at Arlington, Fort Worth 76118, USA
Funding: Supported by the National Natural Science Foundation of China (61533015, 61991404, 61991403) and the Liaoning Revitalization Talents Program (XLYC2007135)
Abstract: A reinforcement-learning-based data-driven control method using only online data is proposed for the optimal output regulation problem of discrete-time partially linear systems subject to both a linear external disturbance and nonlinear uncertainties. First, the problem is split into a constrained static optimization problem and a dynamic programming problem: the solution of the static problem corresponds to the solution of the regulator equations, while the dynamic problem determines the optimal feedback gain of the controller. The small-gain theorem is then used to prove the stability of the closed-loop system for discrete-time partially linear systems with nonlinear uncertainties. Whereas traditional control methods require accurate system model parameters to solve these two problems, a data-driven off-policy algorithm is proposed that finds the solution of the dynamic programming problem from measured online data alone; based on that solution, the solution of the static optimization problem is then also obtained using online data. Finally, simulation results verify the effectiveness of the proposed method.
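The abstract names two subproblems: a constrained static optimization whose solution is the regulator-equation pair, and a dynamic programming problem that yields the optimal feedback gain. The sketch below illustrates, in Python, what those two subproblems compute in a standard discrete-time linear output-regulation setup with known matrices A, B, C, D, E, F and cost weights Qc, Rc; all of these values are illustrative placeholders, not taken from the paper. The model-based computation shown here is precisely what the paper's algorithm replaces with online data, and the nonlinear-uncertainty and small-gain analysis is not reproduced. A companion sketch of the data-driven off-policy step follows the keywords below.

```python
# Minimal model-based sketch of the two subproblems described in the abstract,
# assuming a standard discrete-time linear output-regulation setup with KNOWN
# matrices. All numerical values are illustrative placeholders, not from the
# paper; the paper's contribution is obtaining the same quantities from
# online data without knowing these matrices.
import numpy as np
from scipy.linalg import solve_discrete_are

# Plant:     x(k+1) = A x(k) + B u(k) + D w(k)
# Exosystem: w(k+1) = E w(k)      (linear external disturbance / reference)
# Error:     e(k)   = C x(k) + F w(k)
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.1],
              [0.0]])
E = np.array([[1.0]])             # constant exosignal for illustration
F = np.array([[-1.0]])
n, m, q = A.shape[0], B.shape[1], E.shape[0]

# Subproblem 1 (constrained static optimization): regulator equations
#   X E = A X + B U + D,   0 = C X + F
# vectorized with vec(M X N) = (N' kron M) vec(X), solved by least squares.
I_n, I_q = np.eye(n), np.eye(q)
lhs = np.vstack([
    np.hstack([np.kron(E.T, I_n) - np.kron(I_q, A), -np.kron(I_q, B)]),
    np.hstack([np.kron(I_q, C), np.zeros((C.shape[0] * q, m * q))]),
])
rhs = np.concatenate([D.flatten('F'), -F.flatten('F')])
sol = np.linalg.lstsq(lhs, rhs, rcond=None)[0]
X = sol[:n * q].reshape((n, q), order='F')   # state feedforward map
U = sol[n * q:].reshape((m, q), order='F')   # input feedforward map

# Subproblem 2 (dynamic programming): optimal feedback gain via the
# discrete-time algebraic Riccati equation.
Qc, Rc = np.eye(n), np.eye(m)
P = solve_discrete_are(A, B, Qc, Rc)
K = np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)

def control(x, w):
    """Feedback-feedforward law u(k) = -K (x(k) - X w(k)) + U w(k)."""
    return -K @ (x - X @ w) + U @ w

print("X =\n", X, "\nU =\n", U, "\nK =\n", K)
```

The resulting law u(k) = −K(x(k) − Xw(k)) + Uw(k) combines the feedforward solution of the static problem with the optimal feedback from the dynamic one.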
Keywords: output regulation, discrete-time system, reinforcement learning, nonlinear unknown dynamics
Received: 2019-12-16
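As a companion to the model-based sketch above, the following is a hedged illustration of the data-driven idea the abstract describes for the dynamic programming subproblem: an off-policy least-squares policy iteration (Q-learning) that recovers the optimal gain from measured trajectories only. This is a standard construction offered for illustration, not the paper's exact algorithm, which additionally solves the regulator equations from data and addresses the nonlinear uncertainty; the plant matrices below are placeholders used solely to generate data and are never shown to the learner.

```python
# Off-policy least-squares policy iteration (Q-learning) sketch: the learner
# sees only measured transitions (x, u, x+) and the stage cost, never A or B.
# A standard construction for illustration, NOT the paper's exact algorithm.
import numpy as np

A = np.array([[0.9, 0.2], [0.0, 0.8]])   # hidden from the learner
B = np.array([[0.0], [1.0]])             # hidden from the learner
n, m = 2, 1
Qc, Rc = np.eye(n), np.eye(m)            # stage cost x'Qc x + u'Rc u
rng = np.random.default_rng(0)

# One batch of exploratory data (behavior policy = pure exploration noise).
N = 200
X0, U0, X1 = [], [], []
x = np.zeros(n)
for _ in range(N):
    u = rng.normal(size=m)
    x_next = A @ x + B @ u               # "measured" transition
    X0.append(x); U0.append(u); X1.append(x_next)
    x = x_next
X0, U0, X1 = map(np.array, (X0, U0, X1))

def svec(M):
    """Upper triangle of a symmetric matrix, off-diagonals doubled."""
    i, j = np.triu_indices(M.shape[0])
    return np.where(i == j, 1.0, 2.0) * M[i, j]

def smat(v, d):
    """Inverse of svec: rebuild the symmetric matrix from its parameters."""
    M = np.zeros((d, d))
    M[np.triu_indices(d)] = v
    return np.triu(M) + np.triu(M, 1).T

K = np.zeros((m, n))                     # A is stable, so K = 0 is admissible
d = n + m
for _ in range(20):                      # policy iteration on the Q-function
    # Bellman identity for target policy u = -K x, with z = [x; u]:
    #   z' H z - z1' H z1 = x'Qc x + u'Rc u,   z1 = [x+; -K x+]
    Z  = np.hstack([X0, U0])
    Z1 = np.hstack([X1, -(K @ X1.T).T])
    Phi = np.array([svec(np.outer(z, z)) - svec(np.outer(z1, z1))
                    for z, z1 in zip(Z, Z1)])
    c = (np.einsum('ij,jk,ik->i', X0, Qc, X0)
         + np.einsum('ij,jk,ik->i', U0, Rc, U0))
    H = smat(np.linalg.lstsq(Phi, c, rcond=None)[0], d)
    K = np.linalg.solve(H[n:, n:], H[n:, :n])   # greedy improvement

print("learned gain K:", K)              # approaches the model-based DARE gain
```

Because the Bellman identity holds for arbitrary measured state-input pairs, the same batch of exploratory data is reused across all policy-iteration steps while a different (target) policy is evaluated, which is what makes the scheme off-policy.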