
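A dynamic scheduling method for high-speed trains based on policy gradient reinforcement learning
Cite this article: YU Sheng-ping, HAN Xin-chen, YUAN Zhi-ming, CUI Dong-liang. A dynamic scheduling method for high-speed trains based on policy gradient reinforcement learning[J]. Control and Decision, 2022, 37(9): 2407-2417
Authors: YU Sheng-ping  HAN Xin-chen  YUAN Zhi-ming  CUI Dong-liang
Affiliation: State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110004; Signal & Communication Research Institute, China Academy of Railway Sciences Corporation Limited, Beijing 100081
Funding: National Natural Science Foundation of China (U1834211, 61790574, 61603262, 61773269); Natural Science Foundation of Liaoning Province (2020-MS-093).
Abstract: High-speed railways have developed rapidly thanks to their large transport capacity, high speed, and all-weather operation. However, emergencies such as severe weather cause train delays, and in worse cases the delays propagate through the network, producing a domino effect in which large numbers of trains can no longer follow the planned timetable. Current dynamic scheduling, which relies on manual experience, struggles to meet the practical need for fast optimization and adjustment. To address the dynamic scheduling of high-speed trains delayed by emergencies, this paper takes minimizing the total delay of all trains' arrival and departure times at every station as the optimization objective and builds a mixed-integer nonlinear programming model for the case in which trains can still run. It then proposes a dynamic scheduling method based on policy gradient reinforcement learning, covering construction of the interaction environment, definition of the agent's state and action sets, the policy network structure and action selection method, and the reward function. Two problem-specific improvements are made to the policy gradient reinforcement learning (REINFORCE) algorithm: error amplification and threshold setting. Finally, simulations examine the algorithm's convergence and the performance gain from the improvements and compare the method with Q-learning. The results show that the proposed method schedules high-speed trains dynamically and effectively, minimizes the delay impact of emergencies, and thereby improves train operating efficiency.

Keywords: high-speed train  unexpected disturbance  dynamic scheduling  reinforcement learning  policy gradient  policy gradient reinforcement learning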

A policy gradient reinforcement learning algorithm for high-speed railway dynamic scheduling
YU Sheng-ping,HAN Xin-chen,YUAN Zhi-ming,CUI Dong-liang. A policy gradient reinforcement learning algorithm for high-speed railway dynamic scheduling[J]. Control and Decision, 2022, 37(9): 2407-2417
Authors:YU Sheng-ping  HAN Xin-chen  YUAN Zhi-ming  CUI Dong-liang
Affiliation: State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110004, China; Signal & Communication Research Institute, China Academy of Railway Sciences Co., Ltd, Beijing 100081, China
Abstract: High-speed railways have developed vigorously in recent years owing to their large transport capacity, high speed, and all-weather operation. However, unexpected events such as bad weather cause train delays, and the delays can spread along the railway network. The resulting domino effect prevents large numbers of trains from operating according to the planned timetable. At present, dynamic scheduling that relies on manual experience can hardly meet practical requirements. This paper therefore addresses the dynamic scheduling of high-speed trains delayed by unexpected events, taking the minimum total arrival and departure delay of all trains at each station as the optimization goal. A mixed-integer nonlinear programming (MINLP) model is constructed for the case in which trains can still run, and a policy gradient reinforcement learning method is proposed, comprising the interaction environment, the definition of the state and action sets, the policy network, the action selection method, and the reward function. Two problem-specific improvements, error amplification and threshold setting, are made to the REINFORCE algorithm. Finally, the convergence of the algorithm and the performance gain from the improvements are studied in simulation and compared with the Q-learning algorithm. The results show that the proposed method can effectively reschedule high-speed trains, minimize the impact of delays, and improve the efficiency of train operation.
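For readers unfamiliar with REINFORCE, the following is a minimal sketch of the standard (textbook) update with a tabular softmax policy, with the reward taken as the negative delay. It is not the authors' implementation: the paper uses a policy network, and its two improvements (error amplification and threshold setting) are not specified in the abstract and are not reproduced here. The function names, the tabular parameterization, the learning rate, and the toy episode are all assumptions made for illustration.

```python
import numpy as np


def total_delay(planned, actual):
    """Sum of (actual - planned) over all arrival/departure events.

    Assumption: times share one unit and no event occurs before its planned time.
    """
    planned = np.asarray(planned, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sum(actual - planned))


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def discounted_returns(rewards, gamma=1.0):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_update(theta, episode, lr=0.1, gamma=1.0):
    """One plain REINFORCE step (no baseline) for a tabular softmax policy.

    theta:   array of shape (n_states, n_actions) holding action logits
    episode: list of (state, action, reward) tuples from one rollout
    """
    rewards = [r for _, _, r in episode]
    returns = discounted_returns(rewards, gamma)
    new_theta = theta.copy()
    for (s, a, _), g in zip(episode, returns):
        probs = softmax(theta[s])
        # Gradient of log pi(a|s) w.r.t. the logits: onehot(a) - probs
        grad_log = -probs
        grad_log[a] += 1.0
        new_theta[s] += lr * g * grad_log
    return new_theta


# Toy usage (hypothetical numbers): two decision points, reward = -delay.
theta = np.zeros((2, 2))
episode = [(0, 1, -3.0), (1, 0, -1.0)]
theta = reinforce_update(theta, episode)
```

Because every reward here is a negative delay, the update lowers the probability of the actions taken in a high-delay episode; a baseline is commonly subtracted from the return to reduce variance, which this sketch omits for brevity.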