Optimal output regulation based on reinforcement learning for systems with dropouts and disturbances

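Cite this article:	CUI Yun-fang, FAN Jia-lu. Optimal output regulation based on reinforcement learning for systems with dropouts and disturbances[J]. Control and Decision, 2023, 38(2): 403-412.

Authors:	CUI Yun-fang, FAN Jia-lu

Affiliation:	State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110004

Funding:	Liaoning Revitalization Talents Program (XLYC2007135).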

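Abstract:	For the tracking control problem of networked control systems subject to linear external disturbances and packet dropouts in the state feedback channel, a data-driven optimal output regulation control method based on off-policy reinforcement learning is proposed, following the idea of output regulation, so that the control policy can be solved using only online data. First, for the case where the system state suffers dropouts during network transmission, a Smith predictor is used to reconstruct the system state. Then, within the output regulation framework, a data-driven optimal control algorithm based on off-policy reinforcement learning is proposed that computes the feedback gain using only online data when state dropouts occur; a connection to the output regulation problem is identified in the course of solving for the feedback gain. Next, based on the parameters obtained while solving for the feedback gain that relate to the regulator equations of the output regulation problem, a model-free solution for the feedforward gain is computed. Finally, simulation results verify the effectiveness of the proposed method.
Keywords:	output regulation; reinforcement learning; dropout; Smith predictor; off-policy; tracking control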
Optimal output regulation based on reinforcement learning for systems with dropouts and disturbances

CUI Yun-fang, FAN Jia-lu. Optimal output regulation based on reinforcement learning for systems with dropouts and disturbances[J]. Control and Decision, 2023, 38(2): 403-412.

Authors:	CUI Yun-fang, FAN Jia-lu

Affiliation:	State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110004, China

Abstract:	This paper proposes a data-driven optimal output regulation method based on off-policy reinforcement learning for the tracking control of discrete-time networked control systems subject to both linear disturbances and state dropouts in the feedback channel. The method computes the control policy using only measured online data. First, for the case where state dropouts occur, the system state is reconstructed using a Smith predictor. Then, under the output regulation framework, a data-driven optimal tracking control algorithm based on off-policy reinforcement learning is developed to calculate the feedback gain using only measured data when dropouts occur; in the course of solving for the feedback gain, a connection to the output regulation problem is identified. Next, using the parameters obtained while solving for the feedback gain that relate to the regulator equations, a model-free solution for the feedforward gain is calculated. Finally, simulation results demonstrate the effectiveness of the proposed approach.
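
The structure described in the abstract (state reconstruction under dropouts, a feedback gain, and a feedforward gain obtained from the regulator equations) can be illustrated with a minimal scalar sketch. Everything below is an assumption made for illustration and is not taken from the paper: the plant x[k+1] = a*x[k] + b*u[k] + d*v[k], the exosystem v[k+1] = e*v[k], the error c*x[k] + f*v[k], all numeric values, and the function names. The sketch is also model-based: it uses a known model where the paper uses off-policy reinforcement learning on online data, and a simple one-step model predictor stands in for the Smith predictor.

```python
import random


def lqr_gain(a, b, q, r, iters=500):
    """Scalar discrete-time LQR feedback gain via Riccati value iteration.

    Model-based stand-in (assumption) for the data-driven gain in the paper.
    """
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def regulator_solution(a, b, c, d, e, f):
    """Solve the scalar regulator equations X*e = a*X + b*U + d and 0 = c*X + f."""
    x_reg = -f / c
    u_reg = (x_reg * e - a * x_reg - d) / b
    return x_reg, u_reg


def simulate(steps=200, drop_prob=0.3, seed=0):
    """Run the closed loop with random state dropouts; return the tracking errors."""
    # Hypothetical plant and exosystem parameters (not from the paper).
    a, b, c, d, e, f = 1.1, 0.5, 1.0, 0.2, 1.0, -1.0
    k_fb = lqr_gain(a, b, q=1.0, r=1.0)
    x_reg, u_reg = regulator_solution(a, b, c, d, e, f)
    l_ff = u_reg + k_fb * x_reg  # feedforward gain L = U + K*X

    rng = random.Random(seed)
    x, v = 0.0, 1.0
    x_hat = x  # the first state sample is assumed to arrive
    errors = []
    for _ in range(steps):
        u = -k_fb * x_hat + l_ff * v      # u = -K*x_hat + L*v
        errors.append(c * x + f * v)      # tracking error at this step
        x_next = a * x + b * u + d * v
        if rng.random() < drop_prob:
            # Packet dropped: propagate the estimate through the model.
            x_hat = a * x_hat + b * u + d * v
        else:
            # Packet received: use the transmitted state.
            x_hat = x_next
        x, v = x_next, e * v
    return errors


if __name__ == "__main__":
    errs = simulate()
    print("initial error:", errs[0], "final error:", errs[-1])
```

Because the predictor in this sketch uses the exact model, the reconstructed state matches the true state and dropouts do not degrade tracking; with model mismatch or a data-driven predictor that would no longer hold automatically.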

Keywords:	output regulation; reinforcement learning; dropout; Smith predictor; off-policy; tracking control
