A Visual Navigation Method Based on Goal-Driven Behavior and Space Topological Memory
Citation: RUAN Xiao-Gang, LI Peng, ZHU Xiao-Qing, LIU Peng-Fei. A Visual Navigation Method Based on Goal-Driven Behavior and Space Topological Memory[J]. Chinese Journal of Computers, 2021, 44(3): 594-608.
Authors: RUAN Xiao-Gang  LI Peng  ZHU Xiao-Qing  LIU Peng-Fei
Affiliation: Faculty of Information Technology, Beijing University of Technology, Beijing 100124; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing 100124
Funding: Beijing Natural Science Foundation; General Program of the Science and Technology Plan of Beijing Municipal Education Commission; National Natural Science Foundation of China
Abstract: To address navigation in visually rich environments with dynamic factors, and inspired by the landmark-based mechanism of spatial memory, this paper proposes a visual navigation method that simultaneously learns goal-driven behavior and memorizes spatial structure. First, to learn a control policy directly from raw inputs, deep reinforcement learning is adopted as the basic navigation framework, with collision prediction added as an auxiliary task of the model. Then, while the agent learns to navigate, a temporal correlation network is used to remove redundant observations and discover waypoints, so that the structure of the environment is described incrementally through episodic memory. Finally, a space topological map is integrated into the model as a path-planning module and combined with a locomotion network to obtain a more general navigation method. Experiments were conducted in the 3D simulation environment DMlab. The results show that the proposed method can learn goal-driven behavior from visual inputs, exhibits more efficient learning and navigation policies in all test environments, and reduces the amount of data required to build the map; in environments with dynamic blockage, the model can use the topological map to replan paths dynamically, guiding detour behavior to complete the navigation task and showing good environmental adaptability.
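The combination of a policy-gradient objective with a collision-prediction auxiliary task, as described in the abstract, can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function name, the `beta` weight, and the scalar interface are all our own assumptions.

```python
import math

def joint_loss(logits, action, advantage, collision_logit, collided, beta=0.5):
    """Policy-gradient loss plus a weighted collision-prediction term.

    logits          -- unnormalized action scores from the policy head
    action          -- index of the action actually taken
    advantage       -- advantage estimate for that action
    collision_logit -- score from the auxiliary collision-prediction head
    collided        -- 1 if the agent collided on the next step, else 0
    beta            -- auxiliary-loss weight (assumed hyperparameter)
    """
    # Policy term: -log pi(action | state) * advantage, softmax over logits.
    m = max(logits)
    log_prob = (logits[action] - m) - math.log(sum(math.exp(l - m) for l in logits))
    policy_loss = -log_prob * advantage
    # Auxiliary term: binary cross-entropy on "will the agent collide?".
    p = 1.0 / (1.0 + math.exp(-collision_logit))
    aux_loss = -(collided * math.log(p) + (1 - collided) * math.log(1.0 - p))
    return policy_loss + beta * aux_loss
```

Because both heads share the same visual encoder in such designs, the auxiliary gradient gives the encoder a denser training signal than the sparse navigation reward alone.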

Keywords: goal-driven behavior  deep reinforcement learning  collision prediction  temporal correlation network  space topological map  locomotion network

A Visual Navigation Method Based on Goal-Driven Behavior and Space Topological Memory
RUAN Xiao-Gang, LI Peng, ZHU Xiao-Qing, LIU Peng-Fei. A Visual Navigation Method Based on Goal-Driven Behavior and Space Topological Memory[J]. Chinese Journal of Computers, 2021, 44(3): 594-608.
Authors:RUAN Xiao-Gang  LI Peng  ZHU Xiao-Qing  LIU Peng-Fei
Affiliation: (Faculty of Information Technology, Beijing University of Technology, Beijing 100124; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing 100124)
Abstract: An agent cannot reach a goal efficiently until it has sufficiently explored the environment or constructed a cognitive model of the world, but the essential question is how to generate goal-driven behavior. Organisms can spontaneously explore an environment with rare or deceptive rewards and build map-like representations to support subsequent actions such as finding food, shelter, or mates. What we want to know is whether a robot can imitate such cognitive mechanisms to complete navigational tasks. Since relying on high-precision sensors to recall the structure of the environment is impractical in the real world, we perceive the state space and learn a control policy from visual inputs, and we use deep learning to handle the problems that stem from the curse of dimensionality. Navigation systems developed in robotics can typically be divided into two classes. The first reaches the goal by encoding the structure of the environment; it can use multiple sources of sensor information as input and provides high-quality environment maps. The second is the map-less approach, which maintains a control policy during learning and uses it to finish goal-reaching tasks. Each has its pros and cons. In this paper, we propose a visual navigation method that learns goal-driven behavior and encodes the space structure synchronously. First, to learn a control policy from raw visual information, we take deep reinforcement learning as the basic navigation framework; it provides an end-to-end formulation and allows our approach to predict control signals directly from high-dimensional sensory inputs. Meanwhile, because the environment contains a much wider variety of possible training signals, an auxiliary task named collision prediction is added to the model. Then, during exploration, the agent traverses the environment numerous times and observes many states, much of which are repetitive; a temporal correlation network is used to remove these redundant observations and search for waypoints. Because of the agent's varying perspectives, instead of using hand-designed features we use temporal distance, which depends only on environment steps, to compute the similarity between states. Inspired by research on the cognitive mechanisms of animals, we note that many mammals are able to use an observation, especially one that includes landmarks, to represent a neighboring region of state space, thereby encoding the environment in a simpler and more efficient way. We therefore use waypoints, which are discovered in exploration sequences and each represent an adjacent region of state space within a certain temporal distance, to describe the structure of the environment gradually. Finally, the space topological map is integrated into the model as a path-planning module and combined with the locomotion network to obtain a more general navigation method. The experiments were conducted in the 3D simulation environment DMlab. The results show that this navigation method can learn goal-driven behavior from visual inputs, exhibits more efficient learning and navigation policies in all test environments, and reduces the amount of data required to build the map. Furthermore, when the agent is placed in a dynamically blocked environment, the model can take advantage of the topological map to guide detour behavior and complete navigational tasks, showing better environmental adaptability.
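The waypoint selection and path planning described above can be sketched as follows. This is an illustrative skeleton, not the paper's implementation: the temporal correlation network is replaced by a caller-supplied `distance_fn`, and the class and method names are our own.

```python
from collections import deque

class TopologicalMemory:
    """Sketch of a space topological map built from waypoints.

    An observation becomes a new waypoint only when its estimated temporal
    distance to every stored waypoint exceeds `threshold`; temporally
    adjacent waypoints are linked, and navigation plans a shortest
    waypoint path over the resulting graph.
    """

    def __init__(self, threshold, distance_fn):
        self.threshold = threshold      # min temporal distance between waypoints
        self.distance_fn = distance_fn  # stand-in for the temporal correlation network
        self.waypoints = []             # stored observations (e.g. embeddings)
        self.edges = {}                 # adjacency: waypoint index -> set of indices
        self._last = None               # index of the previously visited waypoint

    def observe(self, obs):
        """Relocalize to a nearby waypoint, or create one if obs is novel."""
        near = [i for i, w in enumerate(self.waypoints)
                if self.distance_fn(obs, w) < self.threshold]
        if near:  # redundant observation: snap to the closest known waypoint
            idx = min(near, key=lambda i: self.distance_fn(obs, self.waypoints[i]))
        else:     # novel region: store a new waypoint
            idx = len(self.waypoints)
            self.waypoints.append(obs)
            self.edges[idx] = set()
        if self._last is not None and self._last != idx:
            self.edges[self._last].add(idx)  # link temporally adjacent waypoints
            self.edges[idx].add(self._last)
        self._last = idx
        return idx

    def plan(self, start, goal):
        """Breadth-first shortest waypoint path; None if goal is unreachable."""
        prev, frontier = {start: None}, deque([start])
        while frontier:
            node = frontier.popleft()
            if node == goal:
                path = []
                while node is not None:
                    path.append(node)
                    node = prev[node]
                return path[::-1]
            for nxt in self.edges[node]:
                if nxt not in prev:
                    prev[nxt] = node
                    frontier.append(nxt)
        return None
```

For example, walking a 1-D corridor with integer observations and `distance_fn = lambda a, b: abs(a - b)` at `threshold=3` keeps only every third state as a waypoint; when an edge later becomes blocked, removing it from `edges` and replanning yields the detour behavior the abstract describes.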
Keywords:goal-driven behavior  deep reinforcement learning  collision prediction  temporal correlation network  space topological map  locomotion network
This article is indexed in databases including VIP and Wanfang Data.