Funding: National Natural Science Foundation of China (U1908214, 62076259); Tencent Rhino-Bird Fund (JR202063)

Received: 2021-09-30
Revised: 2022-03-30

Structure-motivated Interactive Deep Reinforcement Learning for Robotic Control
YU Chao, DONG Yin-Zhao, GUO Xian, FENG Yang-He, ZHUO Han-Kui, ZHANG Qiang. Structure-motivated Interactive Deep Reinforcement Learning for Robotic Control[J]. Journal of Software, 2023, 34(4): 1749-1764.
Authors:YU Chao  DONG Yin-Zhao  GUO Xian  FENG Yang-He  ZHUO Han-Kui  ZHANG Qiang
Affiliation:School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou 510275, China;School of Computer Science and Technology, Dalian University of Technology, Dalian 116081, China;School of Artificial Intelligence, Nankai University, Tianjin 300354, China;School of Systems Engineering, National University of Defense Technology, Changsha 410073, China
Abstract:This study proposes a structure-motivated interactive deep reinforcement learning (SMILE) method to address the low training efficiency and poor policy interpretability of deep reinforcement learning (DRL) in high-dimensional robot behavior control. First, the high-dimensional single-robot control problem is transformed into a low-dimensional coordinated learning problem over multiple joint controllers by means of structural decomposition, which alleviates the curse of dimensionality in continuous motion control. Second, SMILE dynamically infers the dependencies among the controllers through two coordination graph (CG) models, ATTENTION and PODT, so as to realize information exchange and coordinated learning among the internal joints of the robot. Finally, to balance the computational complexity and information redundancy of these two CG models, two further update methods, APODT and PATTENTION, are proposed, enabling dynamic adaptation between the short-term and long-term dependencies among the controllers. Experimental results show that this structurally decomposed learning substantially improves learning efficiency, and that the relational inference and coordination mechanism of the CG models provides more intuitive and effective explanations of the final learned policy.
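The decomposition-plus-coordination-graph idea described in the abstract can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: all dimensions, parameter names, the linear per-joint policies, and the softmax-attention form of the coordination graph are assumptions made for the sketch.

```python
# Hypothetical sketch of structural decomposition with an attention-style
# coordination graph: the robot is split into per-joint controllers, and
# attention weights determine which other joints each controller listens to.
import numpy as np

rng = np.random.default_rng(0)

N_JOINTS, OBS_DIM, MSG_DIM = 4, 6, 8   # assumed sizes for illustration

# One small linear policy per joint instead of one monolithic network.
W_policy = rng.normal(size=(N_JOINTS, OBS_DIM + MSG_DIM))

# Attention parameters for the coordination graph (query/key/value maps).
W_q = rng.normal(size=(OBS_DIM, MSG_DIM))
W_k = rng.normal(size=(OBS_DIM, MSG_DIM))
W_v = rng.normal(size=(OBS_DIM, MSG_DIM))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def coordination_graph(obs):
    """Infer pairwise dependency weights among the joint controllers."""
    q, k = obs @ W_q, obs @ W_k           # (N, MSG_DIM) each
    scores = q @ k.T / np.sqrt(MSG_DIM)   # (N, N) pairwise relevance
    np.fill_diagonal(scores, -np.inf)     # no self-messages
    return softmax(scores, axis=1)        # row i: who joint i attends to

def act(obs):
    """Each controller acts on its own observation plus attended messages."""
    A = coordination_graph(obs)           # (N, N) dynamic dependency graph
    messages = A @ (obs @ W_v)            # (N, MSG_DIM) aggregated messages
    inp = np.concatenate([obs, messages], axis=1)
    return np.tanh((W_policy * inp).sum(axis=1))  # one scalar torque per joint

obs = rng.normal(size=(N_JOINTS, OBS_DIM))
torques = act(obs)
print(torques.shape)  # (4,)
```

Because the attention weights are recomputed at every step from the current observations, the dependency graph is dynamic, which loosely mirrors the abstract's point about adaptively adjusting the relationships among controllers.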
Keywords:robotic control  deep reinforcement learning  structural decomposition  interpretability