Path Planning for Mobile Robot Using Improved Reinforcement Learning Algorithm
WANG Keyin, SHI Zhen, YANG Zhengcai, YANG Yahui, WANG Sishan. Path Planning for Mobile Robot Using Improved Reinforcement Learning Algorithm[J]. Computer Engineering and Applications, 2021, 57(18): 270-274. DOI: 10.3778/j.issn.1002-8331.2011-0414
Authors: WANG Keyin, SHI Zhen, YANG Zhengcai, YANG Yahui, WANG Sishan
Affiliation: 1. School of Automotive Engineering, Hubei University of Automotive Technology, Shiyan, Hubei 442002, China; 2. Key Laboratory of Automotive Power Train and Electronics (Hubei University of Automotive Technology), Shiyan, Hubei 442002, China; 3. Institute of Automotive Engineers, Hubei University of Automotive Technology, Shiyan, Hubei 442002, China
Abstract: To address the slow convergence, large number of iterations, and unstable convergence results that traditional reinforcement learning algorithms exhibit when a mobile robot plans a path in an unknown environment, an improved Q-learning algorithm is proposed. The artificial potential field method is introduced during state initialization so that states closer to the target position receive larger values, which guides the agent toward the target and reduces the many invalid iterations caused by environment exploration in the early stage of the algorithm. For action selection, an improved ε-greedy strategy dynamically adjusts the greedy factor ε according to the convergence degree of the algorithm, better balancing exploration and exploitation, accelerating convergence, and improving the stability of the convergence results. Simulation on a grid map built with Python's Tkinter standard library shows that, compared with the traditional Q-learning algorithm, the improved algorithm shortens path-planning time by 85.1% and reduces the number of iterations before convergence by 74.7%, while the stability of the convergence results is also improved.
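The abstract describes the two modifications but not their implementation. Below is a minimal Python sketch of both ideas under stated assumptions: a 10x10 obstacle-free grid world, a Manhattan-distance attractive potential, illustrative reward values, and a returns-variance test standing in for the paper's convergence-degree measure. None of these names or parameters come from the paper itself.

import numpy as np

ROWS, COLS = 10, 10                              # assumed grid size
START, GOAL = (0, 0), (9, 9)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right

def potential(state, k=1.0):
    # Attractive potential: larger for states closer to the target.
    dist = abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1])
    return k / (dist + 1.0)

# Modification 1 (state initialization via artificial potential field):
# seed every Q(s, a) with the potential of s, so the bootstrapped target
# r + gamma * max Q(s', .) is larger for moves toward the target and the
# agent is guided there from the first episodes.
Q = {(r, c): [potential((r, c))] * len(ACTIONS)
     for r in range(ROWS) for c in range(COLS)}

def epsilon(episode, returns, eps_max=0.9, eps_min=0.05, window=20, tol=1e-3):
    # Modification 2 (dynamic greedy factor): exploration decays as training
    # progresses and drops to its floor once recent episode returns have
    # stabilized -- a hypothetical stand-in for the paper's convergence degree.
    if len(returns) >= window and np.std(returns[-window:]) < tol:
        return eps_min
    return max(eps_min, eps_max - 0.01 * episode)

def step(state, a):
    # Deterministic grid move; stepping off the grid leaves the state unchanged.
    nr, nc = state[0] + ACTIONS[a][0], state[1] + ACTIONS[a][1]
    if 0 <= nr < ROWS and 0 <= nc < COLS:
        state = (nr, nc)
    reward = 1.0 if state == GOAL else -0.01     # illustrative rewards
    return state, reward, state == GOAL

def train(episodes=500, alpha=0.1, gamma=0.9, max_steps=2000):
    returns = []
    for ep in range(episodes):
        s, total = START, 0.0
        eps = epsilon(ep, returns)
        for _ in range(max_steps):
            a = (np.random.randint(len(ACTIONS)) if np.random.rand() < eps
                 else int(np.argmax(Q[s])))
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])  # standard Q-learning update
            s, total = s2, total + r
            if done:
                break
        returns.append(total)
    return returns

Note that the potential field only biases the initial value estimates; the Q-learning update itself is unchanged, so the usual convergence behavior is preserved while early exploration is steered toward the target.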
Keywords: reinforcement learning; artificial potential field; greedy strategy; mobile robots; path planning