Obstacle Avoidance Algorithm for Mobile Robot Based on Deep Reinforcement Learning in Crowd Environment
Cite this article: SUN Lixiang, SUN Xiaoxian, LIU Chengju, JING Wen. Obstacle Avoidance Algorithm for Mobile Robot Based on Deep Reinforcement Learning in Crowd Environment[J]. Information and Control, 2022, 51(1): 107-118.
Authors: SUN Lixiang  SUN Xiaoxian  LIU Chengju  JING Wen
Affiliation: 1. Institute of Intelligent Manufacturing, Yancheng Polytechnic College, Yancheng 224005, Jiangsu, China; 2. Tongji Artificial Intelligence (Suzhou) Research Institute, Suzhou 215131, Jiangsu, China; 3. School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
Funding: National Key Research and Development Program of China (2016YFD0700905); 2020 Jiangsu Province Industry-University-Research Cooperation Project (BY2020338); 2020 Jiangsu Province College Students' Innovation and Entrepreneurship Training Program (202013752028Y)
Abstract: To enable a mobile robot to complete obstacle avoidance tasks efficiently and in a socially friendly manner in densely crowded, complex environments, this paper proposes an obstacle avoidance algorithm for mobile robots based on deep reinforcement learning in crowd environments. First, to address the insufficient learning capability of the value-function network in deep reinforcement learning algorithms, the value-function network is improved based on crowd interaction: interaction information between pedestrians is extracted through an angle pedestrian grid, and the temporal features of each individual pedestrian are extracted through an attention mechanism, which learns the relative importance of the current state and the historical trajectory states as well as their joint influence on the robot's obstacle avoidance policy, providing prior knowledge for the subsequent learning of the multilayer perceptron. Second, the reward function for reinforcement learning is designed according to human spatial behavior, and states in which the robot's heading angle changes too sharply are penalized, satisfying the requirement of comfortable obstacle avoidance. Finally, simulation experiments verify the feasibility and effectiveness of the proposed algorithm in densely crowded, complex environments.
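To make the value-network design concrete, the following is a minimal PyTorch sketch of an attention-weighted crowd feature feeding a value multilayer perceptron, in the spirit of the abstract. It is not the authors' implementation; the module name AttentionValueNet, the tensor shapes, and the dimensions robot_dim, ped_dim, and hidden_dim are illustrative assumptions, and the angle pedestrian grid preprocessing is omitted.

```python
# Minimal sketch (not the paper's code): attention over per-pedestrian temporal
# features, followed by a value MLP over the robot state and the crowd feature.
import torch
import torch.nn as nn

class AttentionValueNet(nn.Module):
    def __init__(self, robot_dim=9, ped_dim=5, hidden_dim=64):
        super().__init__()
        # Temporal encoder: summarizes each pedestrian's trajectory history.
        self.temporal_encoder = nn.GRU(ped_dim, hidden_dim, batch_first=True)
        # Attention scorer: one scalar importance score per pedestrian.
        self.attention = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 1))
        # Value head: MLP over the concatenated robot state and crowd feature.
        self.value_mlp = nn.Sequential(
            nn.Linear(robot_dim + hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1))

    def forward(self, robot_state, ped_histories):
        # robot_state: (batch, robot_dim); ped_histories: (batch, n_peds, seq_len, ped_dim)
        b, n, t, d = ped_histories.shape
        # Encode every pedestrian's history; keep the final GRU hidden state.
        _, h = self.temporal_encoder(ped_histories.reshape(b * n, t, d))
        ped_feat = h[-1].reshape(b, n, -1)                        # (b, n, hidden)
        # Softmax weights express each pedestrian's relative importance.
        weights = torch.softmax(self.attention(ped_feat), dim=1)  # (b, n, 1)
        crowd_feat = (weights * ped_feat).sum(dim=1)              # (b, hidden)
        return self.value_mlp(torch.cat([robot_state, crowd_feat], dim=1))
```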

Keywords: deep reinforcement learning  human-robot integration  human spatial behavior  obstacle avoidance for mobile robot
Received: 2021-04-06

Obstacle Avoidance Algorithm for Mobile Robot Based on Deep Reinforcement Learning in Crowd Environment
SUN Lixiang, SUN Xiaoxian, LIU Chengju, JING Wen. Obstacle Avoidance Algorithm for Mobile Robot Based on Deep Reinforcement Learning in Crowd Environment[J]. Information and Control, 2022, 51(1): 107-118.
Authors:SUN Lixiang  SUN Xiaoxian  LIU Chengju  JING Wen
Affiliation: 1. Institute of Intelligent Manufacturing, Yancheng Polytechnic College, Yancheng 224005, China; 2. Tongji Artificial Intelligence (Suzhou) Research Institute, Suzhou 215131, China; 3. School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
Abstract: To control mobile robots so that they perform obstacle avoidance efficiently and in a friendly manner in crowded and complex environments, a mobile robot obstacle avoidance algorithm based on deep reinforcement learning in crowd environments is proposed. First, in response to the insufficient learning capability of the value-function network in deep reinforcement learning algorithms, the value-function network is improved based on crowd interaction. Interaction information between pedestrians is extracted through an angle pedestrian grid, and the temporal characteristics of each pedestrian are extracted through an attention mechanism, which learns the relative importance of the current state and the historical trajectory states as well as their joint impact on the obstacle avoidance strategy of the robot, providing prior knowledge for the subsequent learning of the multilayer perceptron. Next, a reward function for reinforcement learning is designed based on human spatial behavior, and states in which the robot's heading angle changes significantly are penalized to meet the requirement of comfortable obstacle avoidance. Finally, the feasibility and effectiveness of the proposed algorithm in crowded and complex environments are verified through simulation experiments.
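As an illustration of the reward design described above, here is a minimal Python sketch of a shaped step reward that discourages intruding on pedestrians' personal space and penalizes large heading-angle changes. It is not the paper's exact formulation; all constants (COMFORT_DIST, MAX_TURN, the goal and collision rewards, and the penalty weights) are placeholder assumptions.

```python
# Minimal sketch (not the paper's reward): sparse goal/collision terms plus
# comfort-distance and heading-change penalties for socially friendly motion.
import math

COMFORT_DIST = 0.5       # assumed comfortable separation from pedestrians (m)
MAX_TURN = math.pi / 6   # assumed per-step heading change treated as comfortable (rad)

def step_reward(min_ped_dist, heading_change, reached_goal, collided):
    if collided:
        return -0.25                       # collision penalty
    if reached_goal:
        return 1.0                         # success reward
    reward = 0.0
    if min_ped_dist < COMFORT_DIST:        # discomfort: intruding on personal space
        reward += -0.1 * (COMFORT_DIST - min_ped_dist)
    if abs(heading_change) > MAX_TURN:     # penalize abrupt turns
        reward += -0.05 * (abs(heading_change) - MAX_TURN)
    return reward
```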
Keywords:deep reinforcement learning  human-robot integration  human spatial behavior  obstacle avoidance for mobile robot  