Optimization of cobalt oxalate synthesis process based on modified proximal policy optimization algorithm
JIA Run-da, NING Wen-bin, HE Da-kuo, CHU Fei, WANG Fu-li. Optimization of cobalt oxalate synthesis process based on modified proximal policy optimization algorithm[J]. Control and Decision, 2023, 38(11): 3075-3082.
Authors: JIA Run-da  NING Wen-bin  HE Da-kuo  CHU Fei  WANG Fu-li
Affiliation: College of Information Science and Engineering, Northeastern University, Shenyang 110819, China; State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, China; College of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China; Engineering Research Center of Ministry of Education for Intelligent Control of Underground Space, China University of Mining and Technology, Xuzhou 221116, China
Foundation: National Natural Science Foundation of China (61873049, 61733003, 62173078, 61973304).
Abstract: Cobalt is widely used in batteries and metal composite materials, and the cobalt oxalate synthesis process is a key step in determining product quality. To optimize the average particle size of cobalt oxalate, this work presents an optimization method for the cobalt oxalate synthesis process based on a modified proximal policy optimization (MPPO) algorithm. First, a reward function is designed from the optimization objective and constraints of the synthesis process, and the optimization problem is cast in the reinforcement learning framework by building a Markov decision model of the process. Second, to counter vanishing gradients during training of the policy network, a residual network is adopted as the policy network of the PPO algorithm. Finally, to keep the PPO algorithm from converging to a locally optimal policy in the continuous state space of the process, the initial policy is improved with interleaved imitation learning. Compared with the traditional PPO algorithm, the modified PPO algorithm achieves better optimization performance and convergence while satisfying the constraints.
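
To make the two algorithmic ingredients named in the abstract concrete, the sketch below (PyTorch assumed; not the paper's code) shows a residual fully connected network serving as a PPO policy network, together with PPO's standard clipped surrogate objective. All layer sizes, the Gaussian action head, and the names ResidualPolicy and ppo_clip_loss are illustrative assumptions, not the authors' implementation.

# Minimal sketch (PyTorch assumed; not the paper's code): a residual network
# as the PPO policy network, plus the standard PPO clipped surrogate loss.
# Sizes and names are illustrative only.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Fully connected block with an identity shortcut to ease gradient flow."""
    def __init__(self, dim: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                  nn.Linear(dim, dim))

    def forward(self, x):
        # The skip connection is what counters vanishing gradients in deep trunks.
        return torch.relu(x + self.body(x))

class ResidualPolicy(nn.Module):
    """Gaussian policy whose trunk is a stack of residual blocks (hypothetical sizes)."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64, blocks: int = 3):
        super().__init__()
        self.inp = nn.Linear(state_dim, hidden)
        self.trunk = nn.Sequential(*[ResidualBlock(hidden) for _ in range(blocks)])
        self.mu = nn.Linear(hidden, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, state):
        h = self.trunk(torch.relu(self.inp(state)))
        return torch.distributions.Normal(self.mu(h), self.log_std.exp())

def ppo_clip_loss(policy, states, actions, old_log_probs, advantages, eps=0.2):
    """PPO clipped surrogate: ratio = pi_new/pi_old, clipped to [1-eps, 1+eps]."""
    log_probs = policy(states).log_prob(actions).sum(-1)
    ratio = torch.exp(log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()

For a state of, say, six measured process variables and two manipulated variables, ResidualPolicy(6, 2) returns an action distribution whose samples are candidate control moves; those dimensions are placeholders, not values from the paper.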
Keywords: reinforcement learning; proximal policy optimization; cobalt oxalate synthesis process; residual network; interleaved imitation learning; batch process
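
The abstract improves the initial policy with interleaved imitation learning, whose details are not given on this page. As a hedged stand-in, the sketch below (reusing the imports and ResidualPolicy above) warm-starts the policy with plain behavior cloning on demonstration pairs; an interleaved scheme would alternate such updates with PPO updates. The name bc_pretrain and the demonstration tensors are hypothetical.

# Hedged stand-in for the paper's interleaved imitation learning: a plain
# behavior-cloning warm start that maximizes the likelihood of demonstrated
# actions under the policy. Names (bc_pretrain, demo_*) are hypothetical.
def bc_pretrain(policy, demo_states, demo_actions, epochs=200, lr=1e-3):
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        # Negative log-likelihood of the demonstrated actions.
        nll = -policy(demo_states).log_prob(demo_actions).sum(-1).mean()
        opt.zero_grad()
        nll.backward()
        opt.step()
    return policy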