Received: 2023-08-08
Revised: 2023-12-04

Multi-Agent Deep Reinforcement Learning with Clustering and Information Sharing for Traffic Light Cooperative Control
DU Tongchun, WANG Bo, CHENG Haoran, LUO Le, ZENG Nengmin. Multi-Agent Deep Reinforcement Learning with Clustering and Information Sharing for Traffic Light Cooperative Control[J]. Journal of Electronics & Information Technology, 2024, 46(2): 538-545. doi: 10.11999/JEIT230857
Authors:DU Tongchun  WANG Bo  CHENG Haoran  LUO Le  ZENG Nengmin
Affiliation: 1. School of Computer and Information, Anhui Normal University, Wuhu 241008, China; 2. College of Economics and Management, Harbin Engineering University, Harbin 150001, China
Abstract: To improve the joint control performance at multiple intersections, a Multi-Agent Deep Recurrent Q-Network (MADRQN) for real-time control of multi-intersection traffic signals is proposed in this paper. First, traffic light control is modeled as a Markov decision process, with the controller at each intersection treated as an agent. Second, agents are clustered according to their positions and observations, and information sharing and centralized training are then conducted within each cluster. In addition, the value-function network parameters of the agent with the highest critic value are shared with the other agents at the end of each training round. Simulation results under Simulation of Urban MObility (SUMO) show that the proposed method reduces the amount of communicated data, making information sharing among agents and centralized training more feasible and efficient. The average waiting time of vehicles is lower than that of state-of-the-art traffic light control methods based on multi-agent deep reinforcement learning, so the proposed method can effectively alleviate traffic congestion.
Keywords: Traffic light cooperative control; Centralized training with decentralized execution; Reinforcement learning agent clustering; Growing neural gas; Deep recurrent Q-network
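The parameter-sharing step described in the abstract can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the names `Agent` and `share_best_parameters` are hypothetical, and the evaluation score stands in for the critic value computed during centralized training within a cluster.

```python
# Illustrative sketch (hypothetical names, not the paper's code) of the
# cluster-wise parameter-sharing step: at the end of a training round,
# the agent with the highest critic value shares its value-function
# network parameters with the other agents in its cluster.
from dataclasses import dataclass, field
import random

@dataclass
class Agent:
    intersection_id: str
    # Stand-in for the value-function network parameters of one controller.
    params: list[float] = field(default_factory=lambda: [random.random() for _ in range(4)])
    # Stand-in for the critic value, e.g. negative average vehicle waiting time.
    eval_score: float = 0.0

def share_best_parameters(cluster: list[Agent]) -> Agent:
    """Copy the best-scoring agent's parameters to every other agent in the cluster."""
    best = max(cluster, key=lambda a: a.eval_score)
    for agent in cluster:
        if agent is not best:
            agent.params = list(best.params)  # copy, don't alias
    return best

# Toy usage: three intersection controllers in one cluster.
cluster = [Agent("A"), Agent("B"), Agent("C")]
cluster[1].eval_score = 5.0  # agent B performed best this round
best = share_best_parameters(cluster)
assert all(a.params == best.params for a in cluster)
```

In the paper this step happens per cluster, so only agents grouped together (by position and observation) exchange parameters, which is what keeps the communication volume low.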