Optimal Consensus of Heterogeneous Multi-Agent Systems Based on Q-Learning
Cite this article: Weiran Cheng, Jinna Li. Optimal Consensus of Heterogeneous Multi-Agent Systems Based on Q-Learning[J]. Journal of Liaoning University of Petroleum & Chemical Technology, 2022, 42(4): 59.
Authors: Weiran Cheng, Jinna Li
Affiliation: School of Information and Control Engineering, Liaoning Petrochemical University, Fushun 113001, Liaoning, China
Funding: National Natural Science Foundation of China (62073158); Liaoning Provincial Joint Open Fund for Key Fields (2019-KF-03-06); Basic Scientific Research Project of the Liaoning Provincial Department of Education (LJKZ0401); Research Fund of Liaoning Petrochemical University (2018XJJ-005)
Abstract: For the optimal consensus problem of leader-follower heterogeneous discrete-time multi-agent systems, a model-free control protocol design method based on off-policy reinforcement learning is proposed. Because the state matrices of a heterogeneous multi-agent system differ from agent to agent, the dynamics of the local neighborhood error are complicated. Compared with existing observer-based distributed control schemes for multi-agent systems, the proposed approach of solving a global neighborhood error expression reduces computational complexity. First, the global neighborhood error dynamics of the multi-agent system are constructed from augmented variables. Second, the coupled Bellman equation and the Hamilton-Jacobi-Bellman (HJB) equation are derived from a quadratic value function. Third, the coupled HJB equation is solved to obtain the Nash equilibrium solution of the multi-agent optimal consensus problem, and a proof of the Nash equilibrium is given. Fourth, a model-free off-policy Q-learning algorithm is proposed to learn this Nash equilibrium solution. Finally, the algorithm is implemented with a critic neural network trained by gradient descent, and a simulation example verifies its effectiveness.
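This page does not reproduce the paper's equations, so the following is a minimal sketch of the standard discrete-time graphical-game formulation that such methods typically build on; the notation (local error δ_i, adjacency weights a_{ij}, pinning gain g_i, weighting matrices Q_{ii}, R_{ij}, neighbor set N_i) is assumed from the general literature, not taken from the article.

```latex
% Hedged sketch (assumed notation): local neighborhood error,
% quadratic value function, and the coupled Bellman equation whose
% minimization over u_i yields the coupled HJB equations and the
% Nash policies u_i^*.
\begin{align}
  \delta_i(k) &= \sum_{j \in N_i} a_{ij}\bigl(x_i(k)-x_j(k)\bigr)
                 + g_i\bigl(x_i(k)-x_0(k)\bigr),\\
  V_i\bigl(\delta_i(k)\bigr) &= \sum_{t=k}^{\infty}
      \Bigl(\delta_i^{\mathsf T}(t)Q_{ii}\,\delta_i(t)
            + u_i^{\mathsf T}(t)R_{ii}\,u_i(t)
            + \sum_{j \in N_i} u_j^{\mathsf T}(t)R_{ij}\,u_j(t)\Bigr),\\
  V_i\bigl(\delta_i(k)\bigr) &= \delta_i^{\mathsf T}(k)Q_{ii}\,\delta_i(k)
            + u_i^{\mathsf T}(k)R_{ii}\,u_i(k)
            + \sum_{j \in N_i} u_j^{\mathsf T}(k)R_{ij}\,u_j(k)
            + V_i\bigl(\delta_i(k+1)\bigr).
\end{align}
```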

Keywords: multi-agent systems; neural networks; reinforcement learning; optimal consensus
Received: 2022-06-08
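The abstract describes the learning step only at a high level. The following Python sketch illustrates one plausible form of an off-policy Q-learning update for a single agent, with a quadratic (linear-in-parameters) critic trained by gradient descent; every name and hyper-parameter here (quad_features, greedy_action, lr, gamma, the batch layout) is a hypothetical stand-in, not the article's algorithm.

```python
import numpy as np

def quad_features(z):
    """Quadratic basis: all products z_a * z_b with a <= b."""
    n = len(z)
    return np.array([z[a] * z[b] for a in range(n) for b in range(a, n)])

def greedy_action(w, delta, m):
    """Recover u* = argmin_u Q(delta, u) for the quadratic critic.

    Rebuild the symmetric kernel P from the weights w, so that
    Q(z) = z^T P z with z = [delta; u], then minimize analytically:
    u* = -P_uu^{-1} P_ud delta (P_uu assumed positive definite).
    """
    d = len(delta)
    n = d + m
    P = np.zeros((n, n))
    idx = 0
    for a in range(n):
        for b in range(a, n):
            P[a, b] += w[idx] / 2.0
            P[b, a] += w[idx] / 2.0
            idx += 1
    Puu, Pud = P[d:, d:], P[d:, :d]
    return -np.linalg.solve(Puu + 1e-8 * np.eye(m), Pud @ delta)

def critic_update(w, batch, m, lr=1e-3, gamma=0.95):
    """One gradient-descent sweep over off-policy transitions.

    Each element of `batch` is (delta, u_b, cost, delta_next), where u_b
    was chosen by an arbitrary behaviour policy; lr and gamma are
    illustrative hyper-parameters, not values from the article.
    """
    for delta, u_b, cost, delta_next in batch:
        phi = quad_features(np.concatenate([delta, u_b]))
        q = w @ phi
        u_star = greedy_action(w, delta_next, m)       # target (greedy) policy
        q_next = w @ quad_features(np.concatenate([delta_next, u_star]))
        td_err = cost + gamma * q_next - q             # Bellman residual
        w = w + lr * td_err * phi                      # semi-gradient step
    return w
```

The off-policy character shows up in the target computation: each transition is generated by an arbitrary behaviour policy u_b, while the bootstrapped target uses the greedy action recovered from the current critic.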
