Off-policy Q-learning: Optimal tracking control for networked control systems

Cite this article: LI Jin-na, YIN Zi-xuan. Off-policy Q-learning: optimal tracking control for networked control systems[J]. Control and Decision, 2019, 34(11): 2343-2349.
Authors: LI Jin-na, YIN Zi-xuan
Affiliation: College of Information Engineering, Shenyang University of Chemical Technology, Shenyang 110142, China; School of Information and Control Engineering, Liaoning Shihua University, Fushun 113001, China; State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110004, China
Funding: National Natural Science Foundation of China (61673280, 61525302, 61590922, 61503257); Program for Innovative Talents in Universities of Liaoning Province (LR2017006); Joint Open Fund for Key Fields of the Liaoning Provincial Natural Science Foundation (2019-KF-03-06); Liaoning Shihua University Research Fund (2018XJJ-005)

Abstract: This paper develops a novel off-policy Q-learning method for solving the linear quadratic tracking (LQT) problem in discrete-time networked control systems with packet dropout. The proposed method can be implemented using only measured data, without requiring the system dynamics to be known a priori, and it tolerates bounded packet loss. First, a model of networked control systems with packet dropout is established, and the optimal tracking problem for linear discrete-time networked control systems is formulated. Then, a Smith predictor is designed to predict the current state from historical data measured over the communication network, and on this basis an optimal tracking problem with packet dropout compensation is formulated. Finally, a novel off-policy Q-learning algorithm is developed by integrating dynamic programming with reinforcement learning. The merit of the proposed algorithm is that the optimal tracking control law, based on the predicted system states, can be learned using only measured data and without knowledge of the system dynamics. Moreover, the off-policy Q-learning approach guarantees an unbiased solution of the Q-function-based Bellman equation. Simulation results show that the proposed method achieves good tracking performance for networked control systems with unknown dynamics and packet dropout.
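For context, the following is a minimal sketch of the discounted LQT Q-function Bellman equation that off-policy Q-learning schemes of this kind typically solve. The notation here is the generic one from the approximate-dynamic-programming literature and is an assumption, not necessarily the paper's own symbols: z_k = [x_k; r_k] is the state augmented with the reference, e_k = C x_k - r_k is the tracking error, H is the quadratic Q-function kernel, and K is the target-policy gain:

    % Generic discounted LQT Bellman equation (assumed standard ADP notation)
    \begin{aligned}
    Q(z_k, u_k) &= e_k^{\top} Q_e\, e_k + u_k^{\top} R\, u_k
                   + \gamma\, Q\bigl(z_{k+1},\, -K z_{k+1}\bigr), \\
    Q(z_k, u_k) &= \begin{bmatrix} z_k \\ u_k \end{bmatrix}^{\top}
                   H
                   \begin{bmatrix} z_k \\ u_k \end{bmatrix},
    \qquad
    K = H_{uu}^{-1} H_{uz}.
    \end{aligned}

Off-policy schemes evaluate this equation along data generated by an exploratory behavior policy while improving a separate target policy, which is what lets the Bellman solution remain unbiased in the presence of probing noise.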
Keywords: networked control; off-policy Q-learning; linear quadratic tracking (LQT); packet dropout
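To make the data-driven mechanism concrete, below is a minimal, self-contained Python sketch of off-policy Q-learning for a discounted LQT problem, following the generic formulation in the reinforcement-learning control literature rather than the paper's exact algorithm. The plant (A, B, C), reference generator F, weights, discount factor, and data sizes are all hypothetical, and the Smith-predictor compensation for packet dropout is omitted for brevity:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical plant x_{k+1} = A x_k + B u_k, y_k = C x_k, and a
    # reference generator r_{k+1} = F r_k (values made up for illustration).
    A = np.array([[0.9, 0.1],
                  [0.0, 0.8]])
    B = np.array([[0.0],
                  [1.0]])
    C = np.array([[1.0, 0.0]])
    F = np.array([[1.0]])                  # constant reference signal
    n, m, p = 2, 1, 1                      # dimensions of x, u, r
    gamma = 0.9                            # discount factor
    Qe = np.eye(p)                         # tracking-error weight
    R = 0.1 * np.eye(m)                    # control weight

    # Augmented dynamics z = [x; r]:  z_{k+1} = T z_k + B1 u_k
    T = np.block([[A, np.zeros((n, p))],
                  [np.zeros((p, n)), F]])
    B1 = np.vstack([B, np.zeros((p, m))])
    M = np.hstack([C, -np.eye(p)])         # tracking error e_k = M z_k
    Q1 = M.T @ Qe @ M

    def utility(z, u):
        """One-step cost e^T Qe e + u^T R u in the augmented state."""
        return float(z @ Q1 @ z + u @ R @ u)

    def phi(z, u):
        """Quadratic (Kronecker) features of [z; u] for the Q-kernel."""
        v = np.concatenate([z, u])
        return np.kron(v, v)

    # Collect data ONCE with an exploratory behavior policy (this is the
    # off-policy part: the target policy is never applied to the plant).
    K = np.zeros((m, n + p))               # initial admissible target gain
    z = np.array([1.0, 0.0, 1.0])
    data = []
    for _ in range(200):
        u = -K @ z + rng.normal(size=m)    # behavior = feedback + noise
        z_next = T @ z + B1 @ u
        data.append((z, u, z_next))
        z = z_next

    # Policy iteration on the Q-function Bellman equation via least squares.
    d = n + p + m
    for _ in range(20):
        Phi = np.array([phi(zk, uk) - gamma * phi(zk1, -K @ zk1)
                        for zk, uk, zk1 in data])
        y = np.array([utility(zk, uk) for zk, uk, _ in data])
        theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        H = theta.reshape(d, d)
        H = 0.5 * (H + H.T)                # symmetrize the learned kernel
        K_new = np.linalg.solve(H[n + p:, n + p:], H[n + p:, :n + p])
        if np.linalg.norm(K_new - K) < 1e-8:
            break
        K = K_new

    print("learned LQT feedback gain K =", K)

The key off-policy ingredient is that the data are collected once with a noisy behavior input, while each least-squares pass evaluates the current target policy u = -Kz only at the successor states. The probing noise thus enters the regression as part of the measured input rather than as an unmodeled disturbance, which is how off-policy Q-learning avoids the solution bias that exploration noise induces in on-policy variants.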