Survey on Sim-to-real Transfer Reinforcement Learning in Robot Systems
Citation: LIN Qian, YU Chao, WU Xia-Wei, DONG Yin-Zhao, XU Xin, ZHANG Qiang, GUO Xian. Survey on Sim-to-real Transfer Reinforcement Learning in Robot Systems[J]. Journal of Software, 2024, 35(2): 711-738.
Authors: LIN Qian  YU Chao  WU Xia-Wei  DONG Yin-Zhao  XU Xin  ZHANG Qiang  GUO Xian
Affiliation: School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China; Department of Mechanical Engineering, The University of Hong Kong, Hong Kong 999077, China; College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China; School of Computer Science and Technology, Dalian University of Technology, Dalian 116081, China; College of Artificial Intelligence, Nankai University, Tianjin 300350, China
Fund program: General Program of the National Natural Science Foundation of China (62076259, 62073176); Key Program of the Joint Funds of the National Natural Science Foundation of China (U1908214); Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2021ZD0112400)
Abstract: In recent years, reinforcement learning methods based on environment interaction have achieved great success in robotic applications, offering a practical and feasible solution for optimizing robot behavior control policies. However, collecting interaction samples in the real world suffers from high cost and low efficiency, so simulation environments are widely used in the reinforcement learning training process for robots. By collecting large numbers of training samples at low cost in a virtual simulation environment and then transferring the learned policy to the real environment, the safety, reliability, and real-time issues of training on real robots can be effectively alleviated. However, because of the discrepancy between the simulation environment and the real environment, a policy trained in simulation and transferred directly to a real robot often fails to achieve the desired performance. To address this problem, sim-to-real transfer reinforcement learning methods have been proposed to narrow the environmental gap and thus achieve effective policy transfer. According to the direction of information flow during transfer reinforcement learning and the objects on which the intelligent methods act, this survey first proposes a process framework for sim-to-real transfer reinforcement learning systems, and on this basis divides the existing work into three categories: model optimization methods based on the real environment, knowledge transfer methods based on the simulation environment, and iterative policy improvement methods based on both the simulation and real environments. The representative techniques and related work in each category are then described. Finally, the opportunities and challenges facing the field of sim-to-real transfer reinforcement learning are discussed.

Keywords: reinforcement learning (RL)  transfer learning  sim-to-real transfer  reality gap  robotic control
Received: January 13, 2023
Revised: June 22, 2023
