Research on online integral reinforcement learning algorithm based on actor-critic framework

Cite this article:	蔡军, 苟文耀, 刘颜. Research on online integral reinforcement learning algorithm based on actor-critic framework[J]. 电子测量与仪器学报, 2023, 37(3): 194-201.

Authors:	蔡军, 苟文耀, 刘颜

Affiliation:	1. School of Automation, Chongqing University of Posts and Telecommunications

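Funding:	Supported by the Science and Technology Research Program of the Chongqing Municipal Education Commission (KJZD-M202200603) and the Natural Science Foundation of Chongqing (CSTB2022NSCQ-MSX0380)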

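Abstract:	Model-free optimal tracking control is difficult to achieve for the dynamic system of a wheeled mobile robot. To address this, an online integral reinforcement learning control algorithm based on the actor-critic framework is proposed. First, an RBF critic neural network is constructed, and its weight update law is designed from the approximate Bellman error, so that the network fits the quadratic tracking-control performance index function. Second, an RBF actor neural network is constructed, and its weight update law is designed with the goal of minimizing the performance index function, so that the network compensates for the unknown terms in the dynamic system. Finally, Lyapunov theory is used to prove that, under the proposed integral reinforcement learning control algorithm, the value function, the actor network weight error, and the critic network weight error are uniformly ultimately bounded. Simulation and experimental results show that the algorithm can track both constant and time-varying velocities and can also be implemented on an embedded platform.
Keywords:	integral reinforcement learning; RBF neural network; nonlinear affine system; tracking control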
Research on online integral reinforcement learning algorithm based on actor-critic framework

Cai Jun, Gou Wenyao, Liu Yan. Research on online integral reinforcement learning algorithm based on actor-critic framework[J]. Journal of Electronic Measurement and Instrument, 2023, 37(3): 194-201.

Authors:	Cai Jun, Gou Wenyao, Liu Yan

Affiliation:	1. School of Automation, Chongqing University of Posts and Telecommunications

Abstract:	To address the difficulty of achieving model-free optimal tracking control for the dynamic system of a wheeled mobile robot, an online integral reinforcement learning control algorithm based on the actor-critic framework is proposed. First, an RBF critic neural network is constructed to fit the quadratic tracking control performance index function, and its weight update law is designed based on the approximate Bellman error. Second, an RBF actor neural network is constructed to compensate for the unknown terms in the dynamic system, and its weight update law is designed to minimize the performance index function. Finally, Lyapunov theory is used to prove that the proposed integral reinforcement learning control algorithm makes the value function and the weight errors of the actor and critic neural networks uniformly ultimately bounded. Simulation and experimental results show that the algorithm can track both constant and time-varying velocities and can be implemented on an embedded platform.

Keywords:	integral reinforcement learning; RBF neural network; nonlinear affine system; tracking control
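
The critic step summarized in the abstract (an RBF network whose weights are tuned from an approximate Bellman error) can be illustrated with a minimal sketch. This is not the authors' update law, which the abstract does not state. It assumes a value estimate V(e) = w^T phi(e) over the tracking error e, the integral Bellman relation V(e(t)) = (cost accumulated over [t, t+T]) + V(e(t+T)), and a normalized gradient step on the squared residual. All function names, the RBF centers and width, the learning rate, and the sample values are hypothetical.

```python
import numpy as np


def rbf_features(x, centers, width):
    """Gaussian RBF activations phi(x), one per center (rows of `centers`)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    sq_dist = np.sum((centers - x) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * width ** 2))


def quadratic_cost(e, u, Q, R):
    """Instantaneous quadratic tracking cost e^T Q e + u^T R u."""
    return float(e @ Q @ e + u @ R @ u)


def integral_bellman_error(w, phi_now, phi_next, integral_cost):
    """Residual of V(e(t)) = integral_cost + V(e(t+T)) with V = w^T phi."""
    return integral_cost + w @ phi_next - w @ phi_now


def critic_update(w, phi_now, phi_next, integral_cost, lr=0.5):
    """One normalized gradient step on 0.5 * delta**2 with respect to w."""
    d_phi = phi_next - phi_now
    delta = integral_bellman_error(w, phi_now, phi_next, integral_cost)
    return w - lr * delta * d_phi / (1.0 + d_phi @ d_phi) ** 2


# Hypothetical one-dimensional example; every number below is made up.
centers = np.array([[-1.0], [0.0], [1.0]])  # assumed RBF centers
width = 0.5                                 # assumed RBF width
Q, R = np.eye(1), np.eye(1)                 # assumed cost weights
T = 0.05                                    # assumed integration interval

e_now, e_next = np.array([0.8]), np.array([0.5])  # tracking error at t and t+T
u = np.array([-0.4])                              # control applied on [t, t+T]

# Rectangle-rule approximation of the cost integral over [t, t+T].
integral_cost = T * quadratic_cost(e_now, u, Q, R)

phi_now = rbf_features(e_now, centers, width)
phi_next = rbf_features(e_next, centers, width)

w = np.zeros(len(centers))  # critic weights, initialized to zero
before = integral_bellman_error(w, phi_now, phi_next, integral_cost)
w = critic_update(w, phi_now, phi_next, integral_cost)
after = integral_bellman_error(w, phi_now, phi_next, integral_cost)
```

On a single sample, a step of this form shrinks the residual magnitude whenever the two feature vectors differ, so `abs(after)` is smaller than `abs(before)`. An actor network would need its own update law, which this sketch leaves out.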
