Meta-reinforcement learning based velocity regulation for automatic train operation

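Cite this article:	YAN Gang, ZHAO Fei-ran, YE Feng, WU Jun-bo, YOU Ke-you. Meta-reinforcement learning based velocity regulation for automatic train operation[J]. Control Theory & Applications, 2022, 39(10): 1807-1814.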

Authors:	YAN Gang, ZHAO Fei-ran, YE Feng, WU Jun-bo, YOU Ke-you

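Affiliations:	CRRC Zhuzhou Locomotive Co Ltd and The State Key Laboratory of Heavy Duty AC Drive Electric Locomotive Systems Integration; Department of Automation, Tsinghua University; CRRC Zhuzhou Locomotive Co Ltd; CRRC Zhuzhou Locomotive Co Ltd; Department of Automation, Tsinghua University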

Funding:	Key Program of the National Natural Science Foundation of China

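Abstract:	This paper considers the velocity regulation problem for automatic trains under changing railway conditions. Because railway conditions are complex and the train dynamics are uncertain, model-based controllers struggle to regulate velocity stably, quickly, and accurately. We propose a model-free controller that needs only a small amount of train operation data to adapt to new railway conditions. First, we formulate the velocity regulation problem as a sequence of stationary, continuous Markov decision processes (MDPs) with unknown transition probabilities. Then, we apply meta-reinforcement learning to solve these MDPs and obtain an adaptive neural-network controller. Simulations show that the model-free controller regulates velocity efficiently and adapts quickly to new environments while satisfying the system constraints.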
Keywords:	Velocity regulation; Markov decision process; Reinforcement learning; Meta-learning; Neural network
Received:	2021/7/6
Revised:	2022/5/6
Meta-reinforcement learning based velocity regulation for automatic train operation

YAN Gang, ZHAO Fei-ran, YE Feng, WU Jun-bo and YOU Ke-you. Meta-reinforcement learning based velocity regulation for automatic train operation[J]. Control Theory & Applications, 2022, 39(10): 1807-1814.

Authors:	YAN Gang, ZHAO Fei-ran, YE Feng, WU Jun-bo and YOU Ke-you

Affiliation:	CRRC Zhuzhou Locomotive Co Ltd and The State Key Laboratory of Heavy Duty AC Drive Electric Locomotive Systems Integration; Department of Automation, Tsinghua University; CRRC Zhuzhou Locomotive Co Ltd; CRRC Zhuzhou Locomotive Co Ltd; Department of Automation, Tsinghua University

Abstract:	This paper considers the velocity regulation problem for the automatic train operation system under time-variant railway conditions. Due to the complex environment and uncertainties in the system dynamics, this problem cannot be solved well by most model-based controllers. To this end, we propose a model-free controller, which requires only a "small" amount of data to adapt to a new railway condition. First, we formulate the velocity regulation problem for the automatic train as a sequence of stationary and continuous Markov decision processes (MDPs) with unknown transition probabilities. Then, we adopt the meta-reinforcement learning framework to solve the MDPs and to train an initial neural-network controller, which is able to adapt to a new environment quickly using observed data. Finally, we illustrate via simulations that our model-free controller can regulate the train to the desired velocity and adapt well to time-variant railway conditions, while satisfying the constraints of the dynamical system. Moreover, the experiments also show the robustness of our controller under uncertain dynamics.
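The idea of training an initial controller that adapts to a new railway condition from little data can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy one-dimensional train dynamics, the two-parameter linear policy standing in for the neural network, the finite-difference gradient, and the Reptile-style first-order meta-update. The paper's actual algorithm and model may differ.

```python
import numpy as np

# All constants are hypothetical: step [s], target speed [m/s], steps per episode.
DT, V_REF, HORIZON = 0.1, 20.0, 100

def rollout_cost(theta, slope):
    """Mean squared speed error of the policy u = k*(V_REF - v) + b on one task."""
    k, b = float(theta[0]), float(theta[1])
    v, cost = 15.0, 0.0
    for _ in range(HORIZON):
        u = max(-2.0, min(2.0, k * (V_REF - v) + b))  # bounded traction/braking
        v += DT * (u - 0.01 * v - slope)  # toy dynamics: linear drag plus grade term
        cost += (v - V_REF) ** 2
    return cost / HORIZON

def fd_grad(theta, slope, h=1e-4):
    """Central finite-difference gradient of the rollout cost."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = h
        grad[i] = (rollout_cost(theta + e, slope)
                   - rollout_cost(theta - e, slope)) / (2 * h)
    return grad

def adapt(theta, slope, steps=3, lr=0.5):
    """A few gradient steps on one task; a step is kept only if it lowers the cost."""
    theta = np.array(theta, dtype=float)
    for _ in range(steps):
        base = rollout_cost(theta, slope)
        grad = fd_grad(theta, slope)
        step = lr
        while step > 1e-6:  # backtracking: halve the step until the cost drops
            candidate = theta - step * grad
            if rollout_cost(candidate, slope) < base:
                theta = candidate
                break
            step *= 0.5
    return theta

def meta_train(n_iters=30, meta_lr=0.3, seed=0):
    """Reptile-style meta-training: move the initial parameters toward adapted ones."""
    rng = np.random.default_rng(seed)
    theta = np.array([0.1, 0.0])
    for _ in range(n_iters):
        slope = rng.uniform(-0.3, 0.3)  # sample a railway condition (one task)
        theta = theta + meta_lr * (adapt(theta, slope) - theta)
    return theta
```

A possible usage is `theta0 = meta_train()` followed by `theta_new = adapt(theta0, slope=0.2)` for an unseen grade. Because `adapt` accepts only cost-reducing steps, the adapted policy is never worse than the initial one on that task in this toy setting.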

Keywords:	Velocity regulation; Markov decision process; Reinforcement learning; Meta-learning; Neural network
