State prediction based deep reinforcement learning for traffic signal control


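Citation:	TANG Muyao, ZHOU Dake, LI Tao. State prediction based deep reinforcement learning for traffic signal control[J]. Application Research of Computers, 2022, 39(8).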

Authors:	TANG Muyao, ZHOU Dake, LI Tao

Affiliation:	School of Automation Engineering, Nanjing University of Aeronautics and Astronautics (all three authors)

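Funding:	National Natural Science Foundation of China (62073164); Open Fund of the Graduate Innovation Base (Laboratory) of Nanjing University of Aeronautics and Astronautics (kfjj20200313)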

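Abstract:	Deep reinforcement learning (DRL) can be widely applied to urban traffic signal control. However, in the vast majority of existing studies the DRL agent makes decisions using only the current traffic state, so its control performance is limited when traffic flow varies substantially. This paper proposes a DRL signal control algorithm that incorporates state prediction. First, a concise and efficient traffic state is designed using one-hot encoding; next, a long short-term memory (LSTM) network predicts future traffic states; finally, the agent makes optimal decisions based on both the current state and the predicted state. Experimental results on the SUMO (Simulation of Urban MObility) platform show that, compared with three typical signal control algorithms, the proposed algorithm performs best on average waiting time, travel time, fuel consumption and CO2 emissions under a variety of traffic flow conditions, at both a single intersection and multiple intersections.
Keywords:	traffic signal control; state prediction; deep reinforcement learning; deep Q-network; long short-term memory network
Received:	2021/12/26
Revised:	2022/7/27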
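The abstract states that the traffic state is built with one-hot encoding but does not give its layout. The sketch below is a minimal illustration under assumptions: each incoming lane is cut into fixed-length cells marked 1 when occupied (a binary occupancy grid common in DRL signal-control work), and the current signal phase is appended as a true one-hot vector. The lane length, cell size, phase count and all function names are hypothetical, not the paper's confirmed design.

```python
from typing import Dict, List

# Hypothetical discretisation: the abstract does not give the paper's
# lane length, cell size or phase set.
LANE_LENGTH_M = 100.0  # monitored distance upstream of the stop line
CELL_LENGTH_M = 10.0   # roughly one vehicle plus gap per cell
NUM_PHASES = 4         # e.g. NS-through, NS-left, EW-through, EW-left


def encode_lane(positions: List[float],
                lane_length: float = LANE_LENGTH_M,
                cell_length: float = CELL_LENGTH_M) -> List[int]:
    """Binary occupancy vector: cell k is 1 if any vehicle is inside it.

    `positions` are distances (m) from the stop line; vehicles farther
    than `lane_length` are ignored.
    """
    num_cells = int(lane_length // cell_length)
    cells = [0] * num_cells
    for pos in positions:
        if 0.0 <= pos < lane_length:
            # Clamp in case lane_length is not a multiple of cell_length.
            cells[min(int(pos // cell_length), num_cells - 1)] = 1
    return cells


def one_hot(index: int, size: int) -> List[int]:
    """Vector of zeros with a single 1 at `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def encode_state(lanes: Dict[str, List[float]], phase: int) -> List[int]:
    """Concatenate per-lane occupancy (lanes in sorted-ID order) with the
    one-hot encoded current signal phase."""
    state: List[int] = []
    for lane_id in sorted(lanes):
        state.extend(encode_lane(lanes[lane_id]))
    state.extend(one_hot(phase, NUM_PHASES))
    return state
```

For instance, `encode_state({"N_in": [3.0, 12.5], "E_in": [55.0]}, phase=2)` yields a 24-element vector: ten cells for `E_in`, ten for `N_in`, then the phase indicator `[0, 0, 1, 0]`. Sorting lane IDs keeps the vector layout fixed across time steps, which a fixed-size network input requires.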
State prediction based deep reinforcement learning for traffic signal control

TANG Muyao, ZHOU Dake, LI Tao. State prediction based deep reinforcement learning for traffic signal control[J]. Application Research of Computers, 2022, 39(8).

Authors:	TANG Muyao, ZHOU Dake, LI Tao

Affiliation:	School of Automation Engineering, Nanjing University of Aeronautics and Astronautics

Abstract:	Deep reinforcement learning (DRL) can be widely applied to urban traffic signal control. However, in existing research most DRL agents make decisions using only the current traffic state, which limits their control performance when traffic flow changes greatly. To address this problem, this paper proposes a state prediction based deep reinforcement learning algorithm for traffic signal control. The algorithm uses one-hot encoding to design a concise and efficient traffic state, and then uses a long short-term memory (LSTM) network to predict the future state. The agent makes optimal decisions based on both the current state and the predicted state. Experimental results on the SUMO simulation platform show that, compared with three typical signal control algorithms, the proposed algorithm achieves the best performance in terms of average waiting time, travel time, fuel consumption, CO2 emissions and cumulative reward, at both a single intersection and multiple intersections under different traffic flow conditions.
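The prediction step can be sketched as a plain LSTM forward pass over recent encoded states. This is a minimal, untrained illustration, assuming the LSTM reads the last few state vectors and a sigmoid read-out emits per-element occupancy probabilities for the next step; the class name, hidden size, read-out layer and random initialisation are hypothetical. In practice the weights would be trained on state sequences recorded from the simulator, typically with a deep-learning framework rather than pure Python.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMPredictor:
    """Untrained single-layer LSTM plus sigmoid read-out (illustration only).

    Standard LSTM cell, with z = [x_t ; h_{t-1}]:
        i = sig(W_i z + b_i), f = sig(W_f z + b_f), o = sig(W_o z + b_o)
        g = tanh(W_g z + b_g)
        c_t = f * c_{t-1} + i * g,   h_t = o * tanh(c_t)   (element-wise)
    """

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)
        z_size = input_size + hidden_size
        self.hidden_size = hidden_size
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {k: rand_matrix(hidden_size, z_size, rng) for k in "ifog"}
        self.b = {k: [0.0] * hidden_size for k in "ifog"}
        self.b["f"] = [1.0] * hidden_size  # common forget-gate bias heuristic
        # Read-out layer mapping the last hidden state to a predicted state.
        self.W_out = rand_matrix(input_size, hidden_size, rng)
        self.b_out = [0.0] * input_size

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = x + h  # list concatenation [x_t ; h_{t-1}]
        pre = {k: [u + v for u, v in zip(matvec(self.W[k], z), self.b[k])]
               for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def predict_next(self, history: List[Vector]) -> Vector:
        """Run over states s_{t-k+1}..s_t (oldest first); return an estimate
        of s_{t+1} with every element in (0, 1)."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in history:
            h, c = self.step([float(v) for v in x], h, c)
        logits = [u + v for u, v in zip(matvec(self.W_out, h), self.b_out)]
        return [sigmoid(v) for v in logits]
```

`predict_next` takes a list of equally sized state vectors, oldest first, and returns a vector of the same size. The sigmoid read-out is chosen here only because the illustrative state is binary; a different state design would call for a different output layer.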

Keywords:	traffic signal control; state prediction; deep reinforcement learning; deep Q network; long short-term memory
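The decision step combines the current and predicted states, and the keywords name the deep Q-network (DQN). The sketch below assumes the two states are simply concatenated into one input, uses a linear stand-in for the Q-network so that it stays dependency-free, and adds epsilon-greedy action selection and the standard one-step DQN target. The fusion scheme, exploration rule and function names are assumptions, and the paper's reward definition is not given in the abstract.

```python
import random
from typing import List, Sequence

Vector = List[float]


def augmented_input(current: Sequence[float], predicted: Sequence[float]) -> Vector:
    """Agent input [s_t ; s_hat_{t+1}] (assumed fusion: plain concatenation)."""
    return list(current) + list(predicted)


def q_values(weights: List[Vector], biases: Vector, x: Vector) -> Vector:
    """Linear stand-in for the deep Q-network: Q(x, a) = w_a . x + b_a."""
    return [sum(w * xi for w, xi in zip(w_a, x)) + b_a
            for w_a, b_a in zip(weights, biases)]


def select_action(q: Vector, epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy: explore with probability epsilon, else argmax_a Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=lambda a: q[a])


def dqn_target(reward: float, next_q_target: Vector,
               gamma: float, done: bool) -> float:
    """One-step DQN target: y = r + gamma * max_a' Q_target(s', a'),
    or just y = r when the episode has ended."""
    return reward if done else reward + gamma * max(next_q_target)
```

Each action index would correspond to one signal phase; in a full DQN the linear `q_values` would be replaced by a neural network trained to regress `dqn_target`, with a separate, periodically updated target network supplying `next_q_target`.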
