Fund project: Liaoning Province "Xingliao Talents Plan" (XLYC1808009)
Received: 2021-05-07
Revised: 2021-09-27

Industrial process control method based on local policy interaction exploration-based deep deterministic policy gradient
Shaobin DENG,Jun ZHU,Xiaofeng ZHOU,Shuai LI,Shurui LIU.Industrial process control method based on local policy interaction exploration-based deep deterministic policy gradient[J].Journal of Computer Applications,2022,42(5):1642-1648.
Authors:Shaobin DENG  Jun ZHU  Xiaofeng ZHOU  Shuai LI  Shurui LIU
Affiliation:Key Laboratory of Networked Control System,Chinese Academy of Sciences,Shenyang Liaoning 110016,China
Shenyang Institute of Automation,Chinese Academy of Sciences,Shenyang Liaoning 110169,China
Institutes for Robotics and Intelligent Manufacturing Innovation,Chinese Academy of Sciences,Shenyang Liaoning 110169,China
University of Chinese Academy of Sciences,Beijing 100049,China
Abstract:To achieve stable and precise control of industrial processes with nonlinearity, hysteresis, and strong coupling, a control method based on Local Policy Interaction Exploration-based Deep Deterministic Policy Gradient (LPIE-DDPG) was proposed for continuous control with deep reinforcement learning. First, the Deep Deterministic Policy Gradient (DDPG) algorithm was used as the control strategy, greatly reducing overshoot and oscillation in the control process. At the same time, the control policy of the original controller was used as a local policy for searching, and learning followed an interaction-exploration rule, which improved learning efficiency and stability. Finally, a penicillin fermentation process simulation platform was built on the Gym framework and experiments were carried out. Simulation results show that, compared with DDPG, the proposed LPIE-DDPG improves convergence efficiency by 27.3%; compared with Proportion-Integration-Differentiation (PID) control, LPIE-DDPG exhibits less overshoot and oscillation in temperature control and increases the penicillin concentration by 3.8%. The proposed method can therefore effectively improve training efficiency while increasing the stability of industrial process control.
Keywords:industrial process control  deep reinforcement learning  Deep Deterministic Policy Gradient (DDPG)  Local Policy Interaction Exploration (LPIE)  penicillin fermentation process  
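The abstract describes using the original controller's policy as a local policy and exploring via an interaction rule. The paper's exact rule is not given here; the following is a minimal hypothetical sketch, assuming the original controller is a PID loop and that exploration mixes the DDPG actor's output with noisy actions drawn near the local policy. All names, gains, and the mixing probability `epsilon` are illustrative placeholders, not the authors' implementation.

```python
import random

def pid_action(error, integral, derivative, kp=2.0, ki=0.5, kd=0.1):
    """Local policy: the plant's original PID controller (gains are placeholders)."""
    return kp * error + ki * integral + kd * derivative

def lpie_select_action(actor_action, local_action, epsilon, noise_std=0.05):
    """Hypothetical interaction-exploration rule: with probability epsilon,
    explore around the local (original-controller) policy instead of taking
    the DDPG actor's output, so early training stays near a safe baseline."""
    if random.random() < epsilon:
        return local_action + random.gauss(0.0, noise_std)
    return actor_action

# Example: with epsilon = 0 the actor's own action is always used.
action = lpie_select_action(actor_action=0.3,
                            local_action=pid_action(1.0, 0.0, 0.0),
                            epsilon=0.0)
```

In such a scheme, `epsilon` would typically be annealed toward zero as training progresses, so the agent gradually shifts from the known-stable controller to its learned policy.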