Optimization of cobalt oxalate synthesis process based on modified proximal policy optimization algorithm
JIA Run-da, NING Wen-bin, HE Da-kuo, CHU Fei, WANG Fu-li. Optimization of cobalt oxalate synthesis process based on modified proximal policy optimization algorithm[J]. Control and Decision, 2023, 38(11): 3075-3082.
Authors: JIA Run-da  NING Wen-bin  HE Da-kuo  CHU Fei  WANG Fu-li
Affiliation: College of Information Science and Engineering, Northeastern University, Shenyang 110819, China; State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, China; College of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China; Engineering Research Center of Ministry of Education for Intelligent Control of Underground Space, China University of Mining and Technology, Xuzhou 221116, China
Foundation: National Natural Science Foundation of China (61873049, 61733003, 62173078, 61973304).
Abstract: Cobalt is widely used in batteries and metal composite materials, and the cobalt oxalate synthesis process is a key step in determining product quality. To optimize the average particle size of cobalt oxalate, this work presents an optimization method for the cobalt oxalate synthesis process based on a modified proximal policy optimization (MPPO) algorithm. First, a reward function is designed from the optimization objective and constraints of the synthesis process, and the optimization problem is cast in the reinforcement learning framework by building a Markov decision model of the process. Second, to counter vanishing gradients during training of the policy network, a residual network is adopted as the policy network of the PPO algorithm. Finally, to keep the PPO algorithm from converging to a locally optimal policy in the continuous state space of the process, the initial policy is improved with interleaved imitation learning. Compared with the traditional PPO algorithm, the modified PPO algorithm achieves better optimization performance and convergence while satisfying the constraints.
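
To make the two algorithmic ingredients named in the abstract concrete, the sketch below (PyTorch assumed; not the paper's code) shows a residual fully connected network serving as a PPO policy network, together with PPO's standard clipped surrogate objective. All layer sizes, the Gaussian action head, and the names ResidualPolicy and ppo_clip_loss are illustrative assumptions, not the authors' implementation.

# Minimal sketch (PyTorch assumed; not the paper's code): a residual network
# as the PPO policy network, plus the standard PPO clipped surrogate loss.
# Sizes and names are illustrative only.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Fully connected block with an identity shortcut to ease gradient flow."""
    def __init__(self, dim: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                  nn.Linear(dim, dim))

    def forward(self, x):
        # The skip connection is what counters vanishing gradients in deep trunks.
        return torch.relu(x + self.body(x))

class ResidualPolicy(nn.Module):
    """Gaussian policy whose trunk is a stack of residual blocks (hypothetical sizes)."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64, blocks: int = 3):
        super().__init__()
        self.inp = nn.Linear(state_dim, hidden)
        self.trunk = nn.Sequential(*[ResidualBlock(hidden) for _ in range(blocks)])
        self.mu = nn.Linear(hidden, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, state):
        h = self.trunk(torch.relu(self.inp(state)))
        return torch.distributions.Normal(self.mu(h), self.log_std.exp())

def ppo_clip_loss(policy, states, actions, old_log_probs, advantages, eps=0.2):
    """PPO clipped surrogate: ratio = pi_new/pi_old, clipped to [1-eps, 1+eps]."""
    log_probs = policy(states).log_prob(actions).sum(-1)
    ratio = torch.exp(log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()

For a state of, say, six measured process variables and two manipulated variables, ResidualPolicy(6, 2) returns an action distribution whose samples are candidate control moves; those dimensions are placeholders, not values from the paper.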
Keywords: reinforcement learning; proximal policy optimization; cobalt oxalate synthesis process; residual network; interleaved imitation learning; batch process
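
The abstract improves the initial policy with interleaved imitation learning, whose details are not given on this page. As a hedged stand-in, the sketch below (reusing the imports and ResidualPolicy above) warm-starts the policy with plain behavior cloning on demonstration pairs; an interleaved scheme would alternate such updates with PPO updates. The name bc_pretrain and the demonstration tensors are hypothetical.

# Hedged stand-in for the paper's interleaved imitation learning: a plain
# behavior-cloning warm start that maximizes the likelihood of demonstrated
# actions under the policy. Names (bc_pretrain, demo_*) are hypothetical.
def bc_pretrain(policy, demo_states, demo_actions, epochs=200, lr=1e-3):
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        # Negative log-likelihood of the demonstrated actions.
        nll = -policy(demo_states).log_prob(demo_actions).sum(-1).mean()
        opt.zero_grad()
        nll.backward()
        opt.step()
    return policy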