Visual navigation of mobile robots based on LSTM and PPO algorithms

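Cite this article:	Zhang Yi, Feng Wei, Wang Weijun, Yang Zhile, Zhang Yanhui, Zhu Zihan, Tan Yong. Visual navigation of mobile robots based on LSTM and PPO algorithms[J]. Journal of Electronic Measurement and Instrument, 2022, 36(8): 132-140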

Author names:	Zhang Yi, Feng Wei, Wang Weijun, Yang Zhile, Zhang Yanhui, Zhu Zihan, Tan Yong

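Author affiliations:	Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055; University of Chinese Academy of Sciences, Beijing 100049; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055; University of Chinese Academy of Sciences, Beijing 100049; Guangdong Provincial Key Lab of Robotics and Intelligent System, Shenzhen 518055; Shanghai Nozoli Machine Tools Technology Co., Ltd., Shanghai 200000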

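Funding:	Supported by the Joint Funds of the National Natural Science Foundation of China (U20A20283), the Ministry of Industry and Information Technology project on complete processing equipment for ice and snow sports gear (TC190H47P), the Joint Funds of the National Natural Science Foundation of China (U1813222), the Shenzhen International Cooperation Research Project (GJHZ20200731095009029), and the Guangdong Special Support Program for Young Top-notch Talents in Science and Technology Innovation (2019TQ05Z654)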

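Abstract:	To improve the map-free visual navigation ability of mobile robots and raise the navigation success rate, a visual navigation model for mobile robots is proposed that combines a long short-term memory (LSTM) neural network with the proximal policy optimization (PPO) algorithm. First, the model uses the combined LSTM and PPO as the network model for visual navigation. Second, a reward function is designed from factors such as the robot's actions, its distance to the target, and its running time, and serves as the training objective. Finally, the RGB-D image captured from the robot's first-person view and the polar coordinates of the target point are taken as the input, and the robot's continuous action values as the output, which realizes end-to-end visual navigation without a map and lets the robot reach, by inference, new targets it was never trained on. Compared with the preceding algorithm, the model converges faster in simulated environments and raises the navigation success rate by 17.7% on average for old targets and by 23.3% for new targets, showing good navigation performance.
Keywords:	proximal policy optimization algorithm; long short-term memory neural network; visual navigation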
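
The abstract names two ingredients without giving formulas: the target's polar coordinates as an input, and a reward built from the robot's action, its distance to the target, and elapsed time. The following is a minimal sketch of how such quantities are commonly computed, not the authors' implementation. The function names, the pose representation, and every weight (`w_progress`, `w_turn`, `time_penalty`, `goal_bonus`, `collision_penalty`) are hypothetical; in particular, the collision term is an added assumption, since the abstract does not mention collisions.

```python
import math


def target_polar(robot_x, robot_y, robot_yaw, target_x, target_y):
    """Return (distance, bearing) of the target relative to the robot.

    The bearing is measured from the robot's heading and wrapped
    to the interval [-pi, pi). All names here are illustrative.
    """
    dx = target_x - robot_x
    dy = target_y - robot_y
    distance = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx) - robot_yaw
    bearing = (bearing + math.pi) % (2 * math.pi) - math.pi
    return distance, bearing


def step_reward(prev_dist, dist, angular_speed, reached, collided,
                w_progress=1.0, w_turn=0.05, time_penalty=0.01,
                goal_bonus=10.0, collision_penalty=10.0):
    """Hypothetical per-step reward; all weights are assumed values."""
    if reached:
        return goal_bonus
    if collided:
        return -collision_penalty
    progress = prev_dist - dist          # positive when moving closer
    return (w_progress * progress        # distance term
            - w_turn * abs(angular_speed)  # action term: discourage spinning
            - time_penalty)              # time term: discourage dawdling
```

For example, `target_polar(0, 0, 0, 1, 1)` gives a distance of about 1.414 and a bearing of π/4, and a step that closes 0.5 m without turning earns `step_reward(2.0, 1.5, 0.0, False, False)`, which is 0.49 under the assumed weights.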
Visual navigation of mobile robots based on LSTM and PPO algorithms

Zhang Yi, Feng Wei, Wang Weijun, Yang Zhile, Zhang Yanhui, Zhu Zihan, Tan Yong. Visual navigation of mobile robots based on LSTM and PPO algorithms[J]. Journal of Electronic Measurement and Instrument, 2022, 36(8): 132-140

Authors:	Zhang Yi, Feng Wei, Wang Weijun, Yang Zhile, Zhang Yanhui, Zhu Zihan, Tan Yong

Affiliation:	1. Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 2. University of Chinese Academy of Sciences; 1. Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 2. University of Chinese Academy of Sciences, 3. Guangdong Provincial Key Lab of Robotics and Intelligent System, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; 4. Shanghai Nozoli Machine Tools Technology Co., Ltd.

Abstract:	To improve the visual navigation ability of mobile robots without maps and raise the success rate of visual navigation, a visual navigation model for mobile robots is proposed that integrates long short-term memory (LSTM) and proximal policy optimization (PPO) algorithms. Firstly, the model integrates LSTM and PPO as the network model for visual navigation. Secondly, a new reward function is designed as the training objective from factors such as the action of the mobile robot, the distance between the robot and the target, and the running time of the robot. Finally, the RGB-D image obtained from the first-person view of the mobile robot and the polar coordinates of the target in the robot's coordinate system are used as the model input, and the continuous action of the mobile robot is used as the model output, which realizes end-to-end visual navigation without maps and lets the robot reach, through model inference, new targets that it has not been trained on. Compared with the previous algorithms, the model improves the navigation success rate in simulated environments by an average of 17.7% for old targets and by 23.3% for new targets, showing better navigation performance.

Keywords:	proximal policy optimization algorithms; long short-term memory; visual navigation
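
For readers unfamiliar with PPO, the sketch below shows the widely used clipped surrogate objective in plain Python. The abstract does not state which PPO variant or clipping range the authors used, so the clipped form, the function name `ppo_clip_objective`, and the value `clip_eps=0.2` are assumptions for illustration only.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under
    the current and the data-collecting policy; advantages: advantage
    estimates for the same steps. clip_eps is an assumed setting.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # probability ratio r_t
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the policy has not changed (ratio 1), the objective equals the mean advantage. When the ratio grows beyond `1 + clip_eps` on a positive advantage, the clipped term caps the gain, which is what keeps each policy update small.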