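Automatic train constant-speed control based on meta-reinforcement learning
Cite this article: 颜罡, 赵斐然, 叶锋, 吴俊博, 游科友. Automatic train constant-speed control based on meta-reinforcement learning[J]. 控制理论与应用 (Control Theory & Applications), 2022, 39(10): 1807-1814.
Authors: 颜罡 (YAN Gang)  赵斐然 (ZHAO Fei-ran)  叶锋 (YE Feng)  吴俊博 (WU Jun-bo)  游科友 (YOU Ke-you)
Affiliations: CRRC Zhuzhou Locomotive Co., Ltd. and The State Key Laboratory of Heavy Duty AC Drive Electric Locomotive Systems Integration; Department of Automation, Tsinghua University; CRRC Zhuzhou Locomotive Co., Ltd.; CRRC Zhuzhou Locomotive Co., Ltd.; Department of Automation, Tsinghua University
Funding: Key Program of the National Natural Science Foundation of China
Abstract: This paper considers the constant-speed control problem for automatic trains under changing track conditions. Because railway conditions are complex and the train dynamics are uncertain, model-based controllers struggle to regulate speed stably, quickly, and accurately. We propose a model-free controller that needs only a small amount of train operation data to adapt to new track conditions. First, we model the train's constant-speed control problem as a sequence of stationary, continuous Markov processes with unknown transition probabilities. Then, we apply meta-reinforcement learning to solve these Markov processes and obtain an adaptive neural-network controller. Simulations show that the model-free controller regulates speed efficiently, adapts quickly to new environments, and satisfies the system constraints.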

Keywords: constant-speed control  Markov process  reinforcement learning  meta-learning  neural network
Received: 2021/7/6
Revised: 2022/5/6

Meta-reinforcement learning based velocity regulation for automatic train operation
YAN Gang, ZHAO Fei-ran, YE Feng, WU Jun-bo and YOU Ke-you. Meta-reinforcement learning based velocity regulation for automatic train operation[J]. Control Theory & Applications, 2022, 39(10): 1807-1814.
Authors:YAN Gang  ZHAO Fei-ran  YE Feng  WU Jun-bo and YOU Ke-you
Affiliation: CRRC Zhuzhou Locomotive Co., Ltd. and The State Key Laboratory of Heavy Duty AC Drive Electric Locomotive Systems Integration; Department of Automation, Tsinghua University; CRRC Zhuzhou Locomotive Co., Ltd.; CRRC Zhuzhou Locomotive Co., Ltd.; Department of Automation, Tsinghua University
Abstract: This paper considers the velocity regulation problem for the automatic train operation system under time-variant railway conditions. Due to the complex environment and uncertainties in the system dynamics, most model-based controllers cannot solve this problem well. To this end, we propose a model-free controller, which requires only a "small" amount of data to adapt to a new railway condition. First, we formulate the velocity regulation problem for the automatic train as a sequence of stationary and continuous Markov decision processes (MDPs) with unknown transition probabilities. Then, we adopt the meta-reinforcement learning framework to solve the MDPs and to train an initial neural-network controller, which is able to adapt to a new environment quickly using observed data. Finally, we illustrate via simulations that our model-free controller can regulate the train to the desired velocity and adapts well to the time-variant railway conditions, while satisfying the constraints of the dynamical system. The experiments also show the robustness of our controller under uncertain dynamics.
Keywords: Velocity regulation  Markov decision process  Reinforcement learning  Meta-learning  Neural network
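
The abstract's central idea, learning an initial controller that adapts to a new railway condition from a small amount of data, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the one-dimensional train dynamics, the resistance coefficients, the linear policy `u = k*(V_REF - v) + b` standing in for the neural-network controller, and the use of a Reptile-style first-order meta-update with finite-difference gradients in place of whatever meta-reinforcement learning algorithm the authors use. Each sampled track slope plays the role of one task in the sequence.

```python
import numpy as np

# All constants are assumed for illustration: time step (s), target speed (m/s),
# and acceleration limit (m/s^2).
DT, V_REF, U_MAX = 0.5, 20.0, 2.0

def rollout_cost(theta, slope, steps=60):
    """Mean squared tracking error of the policy u = k*(V_REF - v) + b on one slope."""
    k, b = theta
    v, cost = 15.0, 0.0
    for _ in range(steps):
        u = np.clip(k * (V_REF - v) + b, -U_MAX, U_MAX)  # actuator constraint
        resist = 0.01 + 0.002 * v + 9.8 * slope           # toy resistance plus grade term
        v = v + DT * (u - resist)
        cost += (v - V_REF) ** 2
    return cost / steps

def fd_grad(theta, slope, h=1e-4):
    """Central finite-difference gradient of the cost; no dynamics model is differentiated."""
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = h
        g[i] = (rollout_cost(theta + e, slope) - rollout_cost(theta - e, slope)) / (2 * h)
    return g

def adapt(theta, slope, lr=0.005, n_steps=5):
    """Inner loop: a few gradient steps on a single railway condition."""
    theta = theta.copy()
    for _ in range(n_steps):
        theta -= lr * fd_grad(theta, slope)
    return theta

def meta_train(theta0, iters=30, meta_lr=0.5, seed=0):
    """Outer loop (Reptile-style): move the initialization toward the adapted parameters."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for _ in range(iters):
        slope = rng.uniform(-0.01, 0.01)  # sample one railway condition (task)
        theta += meta_lr * (adapt(theta, slope) - theta)
    return theta

theta0 = np.array([0.1, 0.0])
theta_meta = meta_train(theta0)
```

Calling `adapt(theta_meta, slope)` on an unseen `slope` then gives a controller tuned to that condition after a handful of rollouts. The clipping in `rollout_cost` is only a stand-in for the system constraints mentioned in the abstract.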