Decision optimization of traffic scenario problem based on reinforcement learning
Cite this article: Fei LUO, Mengwei BAI. Decision optimization of traffic scenario problem based on reinforcement learning[J]. Journal of Computer Applications, 2022, 42(8): 2361-2368. DOI: 10.11772/j.issn.1001-9081.2021061012
Authors: Fei LUO, Mengwei BAI
Affiliation: Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
Foundation item: Shanghai 2020 "Science and Technology Innovation Action Plan" Project (20DZ1201400)
Abstract: When solving the taxi path planning decision problem and the traffic signal control problem in complex traffic scenarios, traditional reinforcement learning algorithms are limited in convergence speed and solution accuracy; therefore, an improved reinforcement learning algorithm was proposed to solve this kind of problem. Firstly, by applying an optimized Bellman equation and the Speedy Q-Learning (SQL) mechanism, and by introducing an experience pool and a direct strategy, an improved reinforcement learning algorithm named Generalized Speedy Q-Learning with Direct Strategy and Experience Pool (GSQL-DSEP) was proposed. Then, the GSQL-DSEP algorithm was applied to optimize the path length in the taxi path planning decision problem and the total waiting time of vehicles in the traffic signal control problem. Compared with Q-learning, Speedy Q-Learning (SQL), Generalized Speedy Q-Learning (GSQL), and Dyna-Q, the GSQL-DSEP algorithm reduced the error by at least 18.7% in the performance test, shortened the decision path length by at least 17.4% in the taxi path planning decision problem, and reduced the total waiting time of vehicles by up to 51.5% in the traffic signal control problem. Experimental results show that the GSQL-DSEP algorithm has advantages over the compared algorithms in solving traffic scenario problems.
Keywords: reinforcement learning; traffic scenario; experience pool; Markov Decision Process (MDP); decision optimization
Received: 2021-06-10
Revised: 2021-10-13
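The abstract describes GSQL-DSEP as a combination of an optimized Bellman formula, the Speedy Q-Learning (SQL) update, an experience pool, and a direct strategy. The paper's own formulas and environments are not reproduced in this record, so the snippet below is only a minimal sketch of how a tabular Speedy Q-Learning update can be paired with an experience pool on a toy grid task; the grid environment, the step-size schedule, the replay size, and all hyperparameters are illustrative assumptions, and the optimized Bellman formula and direct strategy specific to GSQL-DSEP are not included.

```python
# Minimal sketch: tabular Speedy Q-Learning combined with an experience pool.
# This is NOT the authors' GSQL-DSEP implementation; the toy grid, hyperparameters,
# and replay scheme are illustrative assumptions.
import random
from collections import deque

GRID = 5                                          # assumed 5x5 grid, goal at bottom-right
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]      # up, down, left, right
GAMMA = 0.95                                      # discount factor (assumed)

def step(state, a):
    """One transition: -1 reward per move, 0 on reaching the goal."""
    r, c = state
    dr, dc = ACTIONS[a]
    nxt = (min(max(r + dr, 0), GRID - 1), min(max(c + dc, 0), GRID - 1))
    done = nxt == (GRID - 1, GRID - 1)
    return nxt, (0.0 if done else -1.0), done

# Speedy Q-Learning keeps the current and the previous Q estimates.
q_now = {((r, c), a): 0.0 for r in range(GRID) for c in range(GRID) for a in range(4)}
q_prev = dict(q_now)
pool = deque(maxlen=2000)                         # experience pool of (s, a, r, s', done)

def bellman(q, s, r, s_next, done):
    """Empirical Bellman backup: r + gamma * max_a' Q(s', a')."""
    target = 0.0 if done else max(q[(s_next, b)] for b in range(4))
    return r + GAMMA * target

def speedy_update(s, a, r, s_next, done, k):
    """SQL update: Q <- Q + alpha*(T Q_prev - Q) + (1 - alpha)*(T Q_now - T Q_prev)."""
    alpha = 1.0 / (k + 1)                         # simplified per-episode step size
    t_prev = bellman(q_prev, s, r, s_next, done)
    t_now = bellman(q_now, s, r, s_next, done)
    old = q_now[(s, a)]
    q_now[(s, a)] = old + alpha * (t_prev - old) + (1.0 - alpha) * (t_now - t_prev)

for episode in range(200):
    s = (0, 0)
    for k in range(200):
        # epsilon-greedy action selection
        if random.random() < 0.1:
            a = random.randrange(4)
        else:
            a = max(range(4), key=lambda b: q_now[(s, b)])
        s_next, r, done = step(s, a)
        pool.append((s, a, r, s_next, done))
        q_prev = dict(q_now)                      # snapshot before updating
        speedy_update(s, a, r, s_next, done, k)
        # replay a few stored transitions from the experience pool
        for ps, pa, pr, pn, pd in random.sample(pool, min(4, len(pool))):
            speedy_update(ps, pa, pr, pn, pd, k)
        s = s_next
        if done:
            break

print("greedy value at start state:", max(q_now[((0, 0), b)] for b in range(4)))
```

The two-estimate update is what gives Speedy Q-Learning its faster convergence relative to plain Q-learning, and replaying transitions from the pool reuses past experience in the same spirit as the experience pool mentioned in the abstract.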
