Path Planning for Mobile Robot Using Improved Reinforcement Learning Algorithm
WANG Keyin, SHI Zhen, YANG Zhengcai, YANG Yahui, WANG Sishan. Path Planning for Mobile Robot Using Improved Reinforcement Learning Algorithm[J]. Computer Engineering and Applications, 2021, 57(18): 270-274. DOI: 10.3778/j.issn.1002-8331.2011-0414
Authors: WANG Keyin, SHI Zhen, YANG Zhengcai, YANG Yahui, WANG Sishan
Affiliation: 1. School of Automotive Engineering, Hubei University of Automotive Technology, Shiyan, Hubei 442002, China; 2. Key Laboratory of Automotive Power Train and Electronics (Hubei University of Automotive Technology), Shiyan, Hubei 442002, China; 3. Institute of Automotive Engineers, Hubei University of Automotive Technology, Shiyan, Hubei 442002, China
Abstract: To address the slow convergence, large number of iterations, and unstable convergence results that traditional reinforcement learning algorithms exhibit when a mobile robot plans a path in an unknown environment, an improved Q-learning algorithm is proposed. The artificial potential field method is introduced during state initialization so that states closer to the target position receive larger values, which guides the agent toward the target and reduces the many invalid iterations caused by environment exploration in the early stage of the algorithm. For action selection, an improved ε-greedy strategy dynamically adjusts the greedy factor ε according to the convergence degree of the algorithm, better balancing exploration and exploitation, accelerating convergence, and improving the stability of the convergence results. Simulation on a grid map built with Python's Tkinter standard library shows that, compared with the traditional Q-learning algorithm, the improved algorithm shortens path-planning time by 85.1% and reduces the number of iterations before convergence by 74.7%, while the stability of the convergence results is also improved.
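The abstract describes the two modifications but not their implementation. Below is a minimal Python sketch of both ideas under stated assumptions: a 10x10 obstacle-free grid world, a Manhattan-distance attractive potential, illustrative reward values, and a returns-variance test standing in for the paper's convergence-degree measure. None of these names or parameters come from the paper itself.

import numpy as np

ROWS, COLS = 10, 10                              # assumed grid size
START, GOAL = (0, 0), (9, 9)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right

def potential(state, k=1.0):
    # Attractive potential: larger for states closer to the target.
    dist = abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1])
    return k / (dist + 1.0)

# Modification 1 (state initialization via artificial potential field):
# seed every Q(s, a) with the potential of s, so the bootstrapped target
# r + gamma * max Q(s', .) is larger for moves toward the target and the
# agent is guided there from the first episodes.
Q = {(r, c): [potential((r, c))] * len(ACTIONS)
     for r in range(ROWS) for c in range(COLS)}

def epsilon(episode, returns, eps_max=0.9, eps_min=0.05, window=20, tol=1e-3):
    # Modification 2 (dynamic greedy factor): exploration decays as training
    # progresses and drops to its floor once recent episode returns have
    # stabilized -- a hypothetical stand-in for the paper's convergence degree.
    if len(returns) >= window and np.std(returns[-window:]) < tol:
        return eps_min
    return max(eps_min, eps_max - 0.01 * episode)

def step(state, a):
    # Deterministic grid move; stepping off the grid leaves the state unchanged.
    nr, nc = state[0] + ACTIONS[a][0], state[1] + ACTIONS[a][1]
    if 0 <= nr < ROWS and 0 <= nc < COLS:
        state = (nr, nc)
    reward = 1.0 if state == GOAL else -0.01     # illustrative rewards
    return state, reward, state == GOAL

def train(episodes=500, alpha=0.1, gamma=0.9, max_steps=2000):
    returns = []
    for ep in range(episodes):
        s, total = START, 0.0
        eps = epsilon(ep, returns)
        for _ in range(max_steps):
            a = (np.random.randint(len(ACTIONS)) if np.random.rand() < eps
                 else int(np.argmax(Q[s])))
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])  # standard Q-learning update
            s, total = s2, total + r
            if done:
                break
        returns.append(total)
    return returns

Note that the potential field only biases the initial value estimates; the Q-learning update itself is unchanged, so the usual convergence behavior is preserved while early exploration is steered toward the target.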
Keywords: reinforcement learning; artificial potential field; greedy strategy; mobile robots; path planning