Funding: National Natural Science Foundation of China (U1908214, 62076259); Tencent Rhino-Bird Fund (JR202063)

Received: 2021-09-30
Revised: 2022-03-30

Structure-motivated Interactive Deep Reinforcement Learning for Robotic Control
YU Chao, DONG Yin-Zhao, GUO Xian, FENG Yang-He, ZHUO Han-Kui, ZHANG Qiang. Structure-motivated Interactive Deep Reinforcement Learning for Robotic Control[J]. Journal of Software, 2023, 34(4): 1749-1764.
Authors:YU Chao  DONG Yin-Zhao  GUO Xian  FENG Yang-He  ZHUO Han-Kui  ZHANG Qiang
Affiliation:School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou 510275, China;School of Computer Science and Technology, Dalian University of Technology, Dalian 116081, China;School of Artificial Intelligence, Nankai University, Tianjin 300354, China;School of Systems Engineering, National University of Defense Technology, Changsha 410073, China
Abstract:This study proposes a structure-motivated interactive deep reinforcement learning (SMILE) method to address the low training efficiency and poor policy interpretability of deep reinforcement learning (DRL) in high-dimensional robot behavior control. First, the high-dimensional single-robot control problem is transformed into a low-dimensional coordinated learning problem over multiple joint controllers by means of structural decomposition, which alleviates the curse of dimensionality in continuous motion control. Second, SMILE dynamically infers the dependencies among the controllers through two coordination graph (CG) models, ATTENTION and PODT, so as to realize information exchange and coordinated learning among the internal joints of the robot. Finally, to balance the computational complexity and information redundancy of these two CG models, two further update methods, APODT and PATTENTION, are proposed, enabling dynamic adaptation between the short-term and long-term dependencies among the controllers. Experimental results show that this structurally decomposed learning substantially improves learning efficiency, and that the relational inference and coordination mechanism of the CG models provides more intuitive and effective explanations of the final learned policy.
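The decomposition-plus-coordination-graph idea described in the abstract can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: all dimensions, parameter names, the linear per-joint policies, and the softmax-attention form of the coordination graph are assumptions made for the sketch.

```python
# Hypothetical sketch of structural decomposition with an attention-style
# coordination graph: the robot is split into per-joint controllers, and
# attention weights determine which other joints each controller listens to.
import numpy as np

rng = np.random.default_rng(0)

N_JOINTS, OBS_DIM, MSG_DIM = 4, 6, 8   # assumed sizes for illustration

# One small linear policy per joint instead of one monolithic network.
W_policy = rng.normal(size=(N_JOINTS, OBS_DIM + MSG_DIM))

# Attention parameters for the coordination graph (query/key/value maps).
W_q = rng.normal(size=(OBS_DIM, MSG_DIM))
W_k = rng.normal(size=(OBS_DIM, MSG_DIM))
W_v = rng.normal(size=(OBS_DIM, MSG_DIM))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def coordination_graph(obs):
    """Infer pairwise dependency weights among the joint controllers."""
    q, k = obs @ W_q, obs @ W_k           # (N, MSG_DIM) each
    scores = q @ k.T / np.sqrt(MSG_DIM)   # (N, N) pairwise relevance
    np.fill_diagonal(scores, -np.inf)     # no self-messages
    return softmax(scores, axis=1)        # row i: who joint i attends to

def act(obs):
    """Each controller acts on its own observation plus attended messages."""
    A = coordination_graph(obs)           # (N, N) dynamic dependency graph
    messages = A @ (obs @ W_v)            # (N, MSG_DIM) aggregated messages
    inp = np.concatenate([obs, messages], axis=1)
    return np.tanh((W_policy * inp).sum(axis=1))  # one scalar torque per joint

obs = rng.normal(size=(N_JOINTS, OBS_DIM))
torques = act(obs)
print(torques.shape)  # (4,)
```

Because the attention weights are recomputed at every step from the current observations, the dependency graph is dynamic, which loosely mirrors the abstract's point about adaptively adjusting the relationships among controllers.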
Keywords:robotic control  deep reinforcement learning  structural decomposition  interpretability