Fund project: Liaoning Province "Xingliao Talents Plan" (XLYC1808009)
Received: 2021-05-07
Revised: 2021-09-27

Industrial process control method based on local policy interaction exploration-based deep deterministic policy gradient
Shaobin DENG,Jun ZHU,Xiaofeng ZHOU,Shuai LI,Shurui LIU.Industrial process control method based on local policy interaction exploration-based deep deterministic policy gradient[J].Journal of Computer Applications,2022,42(5):1642-1648.
Authors:Shaobin DENG  Jun ZHU  Xiaofeng ZHOU  Shuai LI  Shurui LIU
Affiliation:Key Laboratory of Networked Control System,Chinese Academy of Sciences,Shenyang Liaoning 110016,China
Shenyang Institute of Automation,Chinese Academy of Sciences,Shenyang Liaoning 110169,China
Institutes for Robotics and Intelligent Manufacturing Innovation,Chinese Academy of Sciences,Shenyang Liaoning 110169,China
University of Chinese Academy of Sciences,Beijing 100049,China
Abstract:To achieve stable and precise control of industrial processes with nonlinearity, hysteresis, and strong coupling, a control method based on Local Policy Interaction Exploration-based Deep Deterministic Policy Gradient (LPIE-DDPG) was proposed for continuous control with deep reinforcement learning. First, the Deep Deterministic Policy Gradient (DDPG) algorithm was used as the control strategy, greatly reducing overshoot and oscillation in the control process. At the same time, the control policy of the original controller was used as a local policy for searching, and learning followed an interaction-exploration rule, which improved learning efficiency and stability. Finally, a penicillin fermentation process simulation platform was built on the Gym framework and experiments were carried out. Simulation results show that, compared with DDPG, the proposed LPIE-DDPG improves convergence efficiency by 27.3%; compared with Proportion-Integration-Differentiation (PID) control, LPIE-DDPG exhibits less overshoot and oscillation in temperature control and increases the penicillin concentration by 3.8%. The proposed method can therefore effectively improve training efficiency while increasing the stability of industrial process control.
Keywords:industrial process control  deep reinforcement learning  Deep Deterministic Policy Gradient (DDPG)  Local Policy Interaction Exploration (LPIE)  penicillin fermentation process  
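The abstract describes using the original controller's policy as a local policy and exploring via an interaction rule. The paper's exact rule is not given here; the following is a minimal hypothetical sketch, assuming the original controller is a PID loop and that exploration mixes the DDPG actor's output with noisy actions drawn near the local policy. All names, gains, and the mixing probability `epsilon` are illustrative placeholders, not the authors' implementation.

```python
import random

def pid_action(error, integral, derivative, kp=2.0, ki=0.5, kd=0.1):
    """Local policy: the plant's original PID controller (gains are placeholders)."""
    return kp * error + ki * integral + kd * derivative

def lpie_select_action(actor_action, local_action, epsilon, noise_std=0.05):
    """Hypothetical interaction-exploration rule: with probability epsilon,
    explore around the local (original-controller) policy instead of taking
    the DDPG actor's output, so early training stays near a safe baseline."""
    if random.random() < epsilon:
        return local_action + random.gauss(0.0, noise_std)
    return actor_action

# Example: with epsilon = 0 the actor's own action is always used.
action = lpie_select_action(actor_action=0.3,
                            local_action=pid_action(1.0, 0.0, 0.0),
                            epsilon=0.0)
```

In such a scheme, `epsilon` would typically be annealed toward zero as training progresses, so the agent gradually shifts from the known-stable controller to its learned policy.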