Navigation method for mobile robot based on hierarchical deep reinforcement learning

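Cite this article:	WANG Tong, LI Ao, SONG Hai-luo, LIU Wei, WANG Ming-hui. Navigation method for mobile robot based on hierarchical deep reinforcement learning[J]. Control and Decision, 2022, 37(11): 2799-2807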

Authors:	WANG Tong, LI Ao, SONG Hai-luo, LIU Wei, WANG Ming-hui

Affiliation:	School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China

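Funding:	University of Science and Technology of China Fund for Outstanding Introduced Talents (KY2100000021); National Natural Science Foundation of China (61971393, 61871361).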

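Abstract:	Existing hierarchical navigation methods based on deep reinforcement learning (DRL) perform poorly in complex environments containing structures such as long corridors and dead corners. To address this problem, a mobile robot navigation method based on option-based hierarchical deep reinforcement learning (HDRL) is proposed. The model framework is divided into a high level and a low level. At the low level, an obstacle avoidance control model and a goal-driven control model implement the obstacle avoidance and goal-approaching behavior policies, respectively. At the high level, a behavior selection model automatically learns a stable and reliable behavior selection policy, which effectively avoids reliance on manually designed regulation rules. In addition, the obstacle avoidance control model is trained with an optimized procedure so that the learned avoidance policy is better suited to navigation tasks in complex environments. In comparative experiments against existing DRL methods, the proposed method achieves the highest navigation success rate in all simulated test environments and shows an overall advantage on the other metrics, indicating that it effectively addresses poor navigation performance in complex environments and generalizes well. Tests in a real-world environment further verify the potential application value of the proposed method.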
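Keywords:	deep reinforcement learning; hierarchical deep reinforcement learning; mobile robot; navigation; obstacle avoidance; policy learning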
Navigation method for mobile robot based on hierarchical deep reinforcement learning

WANG Tong, LI Ao, SONG Hai-luo, LIU Wei, WANG Ming-hui. Navigation method for mobile robot based on hierarchical deep reinforcement learning[J]. Control and Decision, 2022, 37(11): 2799-2807

Authors:	WANG Tong, LI Ao, SONG Hai-luo, LIU Wei, WANG Ming-hui

Affiliation:	School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China

Abstract:	Existing hierarchical navigation methods based on deep reinforcement learning (DRL) perform poorly in complex environments containing structures such as long corridors and dead corners. To solve this problem, we propose a navigation method for mobile robots based on option-based hierarchical deep reinforcement learning (HDRL). The framework of the proposed method consists of two levels of control models: low-level models that learn policies for avoiding obstacles and for reaching the goal, respectively, and a high-level behavior selection model that automatically learns a stable and reliable behavior selection policy without relying on manually designed control rules. In addition, a training method for optimizing the obstacle avoidance control model is proposed, which makes the learned obstacle avoidance policy more suitable for navigation tasks in complex environments. In comparison with existing DRL-based navigation methods, the proposed method achieves the highest navigation success rate in all simulated test environments used in this paper and shows better overall performance on the other metrics, which demonstrates that the proposed method can effectively solve the problem of poor navigation performance in complex environments and has strong generalization ability. Moreover, experiments in a real-world environment verify the potential application value of the proposed method.

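Keywords:	deep reinforcement learning; hierarchical deep reinforcement learning; mobile robot; navigation; obstacle avoidance; policy learning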
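
The two-level structure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: in the paper all three models are learned with DRL, whereas here each one is replaced by a simple hand-written stand-in so that the control flow is visible. The names `Observation`, `goal_driven_policy`, `obstacle_avoidance_policy`, `behavior_values`, and `hierarchical_step`, as well as all numeric constants, are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Action = Tuple[float, float]  # (linear velocity in m/s, angular velocity in rad/s)


@dataclass
class Observation:
    """Hypothetical robot observation; the paper's actual state encoding may differ."""
    ranges: List[float]  # range readings in metres, ordered left to right
    goal_angle: float    # bearing to the goal in radians, positive = goal is to the left


def goal_driven_policy(obs: Observation) -> Action:
    """Stand-in for the learned goal-driven control model: turn toward the goal."""
    angular = max(-1.0, min(1.0, obs.goal_angle))
    linear = 0.5 * max(0.0, math.cos(obs.goal_angle))
    return (linear, angular)


def obstacle_avoidance_policy(obs: Observation) -> Action:
    """Stand-in for the learned obstacle avoidance model: steer toward the freer side."""
    half = len(obs.ranges) // 2
    left_free = sum(obs.ranges[:half])
    right_free = sum(obs.ranges[half:])
    angular = 0.8 if left_free > right_free else -0.8
    return (0.1, angular)


def behavior_values(obs: Observation) -> Dict[str, float]:
    """Placeholder for the high-level behavior selection model.

    A trained network would output one score per low-level behavior;
    this assumed heuristic only marks where that network would sit.
    """
    nearest = min(obs.ranges)
    return {"avoid": 1.0 / max(nearest, 1e-6), "goal": 2.0}


# The low-level behaviors (options) that the high level chooses between.
OPTIONS: Dict[str, Callable[[Observation], Action]] = {
    "avoid": obstacle_avoidance_policy,
    "goal": goal_driven_policy,
}


def hierarchical_step(obs: Observation) -> Tuple[str, Action]:
    """Pick the highest-scoring behavior, then let that behavior produce the action."""
    values = behavior_values(obs)
    option = max(values, key=values.get)
    return option, OPTIONS[option](obs)


clear = Observation(ranges=[3.0, 3.0, 3.0, 3.0], goal_angle=0.0)
blocked = Observation(ranges=[2.0, 0.3, 0.3, 0.4], goal_angle=0.0)
print(hierarchical_step(clear))
print(hierarchical_step(blocked))
```

In this sketch the high level only decides which behavior runs at each step, and the selected low-level policy alone determines the velocity command. Replacing `behavior_values` with a learned model is what would remove the hand-tuned switching rule, which is the dependence the abstract says the method avoids.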
This article is indexed in Wanfang Data and other databases.