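Dynamic assembly algorithm based on deep reinforcement learning
Cite this article: WANG Junhe, JIANG Yong. Dynamic assembly algorithm based on deep reinforcement learning[J]. CAAI Transactions on Intelligent Systems (智能系统学报), 2023, 18(1): 2-11.
Authors: WANG Junhe (王竣禾), JIANG Yong (姜勇)
Affiliations: 1. State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China; 2. Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China; 3. University of Chinese Academy of Sciences, Beijing 100049, China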
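Abstract: To address the complex, dynamic noise disturbances present in dynamic assembly environments, a dynamic assembly algorithm based on deep reinforcement learning is proposed. The contact forces over a period of time are taken as the state, and motion features are extracted from them by a long short-term memory (LSTM) network. A sequential discount factor is defined, and the reward at the current time step is obtained by weighting the sub-rewards of the preceding time steps. The action output by the model is a displacement in Cartesian space, and inverse kinematics is used to drive the robot to the desired position. In addition, a neural network parameter update method that improves on the temporal difference algorithm with eligibility traces, TD(λ), is proposed to shorten model training time. In the experiments, the model is first pre-trained in a simple round peg-in-hole environment and then further trained in a real scene. The experiments show that the proposed method adapts well to the flexible, dynamic environment of dynamic assembly tasks.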
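As a rough illustration of the state representation described in the abstract (contact forces over a time window, encoded by an LSTM), the sketch below keeps a fixed-length window of force/torque readings and runs it through a minimal LSTM forward pass in plain Python. The abstract gives neither the window length, the sensor layout, nor the network size, so the 6-axis wrench input, the window and hidden sizes, and the names `ForceWindow` and `TinyLSTMEncoder` are all hypothetical. The weights are random and untrained, so the output only demonstrates the data flow, not learned features; a practical implementation would use a framework layer such as PyTorch's `torch.nn.LSTM`.

```python
import math
import random
from collections import deque


def sigmoid(x):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(matrix, vector):
    # Dense matrix-vector product on nested lists.
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class ForceWindow:
    """Hypothetical sliding window of the most recent force/torque readings."""

    def __init__(self, length):
        self.buffer = deque(maxlen=length)  # oldest reading drops out automatically

    def push(self, wrench):
        self.buffer.append(tuple(wrench))  # assumed layout: (Fx, Fy, Fz, Mx, My, Mz)

    def is_full(self):
        return len(self.buffer) == self.buffer.maxlen

    def as_sequence(self):
        return list(self.buffer)  # ordered oldest -> newest


class TinyLSTMEncoder:
    """Minimal LSTM forward pass with random, untrained weights (illustration only)."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        cols = input_size + hidden_size  # each gate sees [x_t, h_{t-1}]
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.weights = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(hidden_size)]
            for gate in self.GATES
        }
        self.biases = {gate: [0.0] * hidden_size for gate in self.GATES}

    def encode(self, sequence):
        h = [0.0] * self.hidden_size  # hidden state
        c = [0.0] * self.hidden_size  # cell state
        for x in sequence:
            if len(x) != self.input_size:
                raise ValueError("unexpected input dimension")
            z = list(x) + h  # concatenate current input with previous hidden state
            pre = {
                gate: [a + b for a, b in zip(matvec(self.weights[gate], z), self.biases[gate])]
                for gate in self.GATES
            }
            i = [sigmoid(v) for v in pre["input"]]
            f = [sigmoid(v) for v in pre["forget"]]
            o = [sigmoid(v) for v in pre["output"]]
            g = [math.tanh(v) for v in pre["candidate"]]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h  # final hidden state, used here as the motion-feature vector
```

Encoding only a full window (checked with `is_full()`) keeps the state length constant from one control step to the next.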

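The abstract defines a sequential discount factor that weights the sub-rewards of earlier time steps to form the current reward, but it does not give the formula. A minimal sketch, assuming geometric weights over the K most recent sub-rewards, R_t = sum over k = 0..K-1 of beta^k * r_(t-k), where k = 0 is the current step. The class name `SequentialReward` and the parameters `beta` and `window` are hypothetical, and the paper's actual weighting (and any normalization) may differ.

```python
from collections import deque


class SequentialReward:
    """Hypothetical reward shaper: weights recent sub-rewards by beta**k.

    k = 0 is the current step, k = 1 the previous step, and so on, over the
    `window` most recent steps. This weighting is an assumption, not the
    paper's published formula.
    """

    def __init__(self, beta, window):
        if not 0.0 < beta <= 1.0:
            raise ValueError("beta must lie in (0, 1]")
        if window < 1:
            raise ValueError("window must be at least 1")
        self.beta = beta
        self.history = deque(maxlen=window)  # oldest sub-reward drops out

    def reset(self):
        self.history.clear()  # call at the start of each episode

    def __call__(self, sub_reward):
        self.history.append(sub_reward)
        # reversed(): newest sub-reward first, so it gets weight beta**0 = 1
        return sum(self.beta ** k * r for k, r in enumerate(reversed(self.history)))


if __name__ == "__main__":
    shaper = SequentialReward(beta=0.5, window=3)
    print([shaper(r) for r in (1.0, 1.0, 1.0, 0.0)])  # [1.0, 1.5, 1.75, 0.75]
```

The last value shows the effect of the window: a zero sub-reward at the current step still yields a positive reward because the two preceding successes are carried forward with weights 0.5 and 0.25.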
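Keywords: flexible cable model; dynamic noise; dynamic assembly; deep reinforcement learning; long short-term memory network; sequential discount factor; temporal difference algorithm with eligibility traces (TD(λ)); pre-training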
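The abstract states that the model's action is a Cartesian-space displacement and that inverse kinematics then moves the robot to the desired position. The abstract does not describe the robot or its IK solver, so the sketch below uses a hypothetical two-link planar arm with a closed-form IK solution as a stand-in; the link lengths `l1`, `l2` and all function names are illustrative. It reads the current end-effector position by forward kinematics, adds the commanded displacement, and solves for the new joint angles.

```python
import math


def forward_kinematics(q1, q2, l1, l2):
    # End-effector position of a planar two-link arm (hypothetical stand-in robot).
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return x, y


def inverse_kinematics(x, y, l1, l2):
    # Closed-form IK via the law of cosines; returns the solution with q2 >= 0.
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if not -1.0 <= c2 <= 1.0:
        raise ValueError("target is outside the reachable workspace")
    s2 = math.sqrt(1.0 - c2 * c2)
    q2 = math.atan2(s2, c2)
    q1 = math.atan2(y, x) - math.atan2(l2 * s2, l1 + l2 * c2)
    return q1, q2


def apply_cartesian_action(q1, q2, dx, dy, l1, l2):
    # Policy action = Cartesian displacement (dx, dy); IK converts the
    # displaced target position into joint angles for the controller.
    x, y = forward_kinematics(q1, q2, l1, l2)
    return inverse_kinematics(x + dx, y + dy, l1, l2)
```

A real multi-joint manipulator would use its own kinematic model and IK solver, but the data flow (action, then target position, then joint command) is the same.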

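The abstract proposes a parameter update method that improves on TD(λ), the temporal difference algorithm with eligibility traces, but does not describe the modification. For reference only, the sketch below shows the standard semi-gradient TD(λ) with accumulating traces (as presented in Sutton and Barto's reinforcement learning textbook) for a linear value function; it is not the authors' improved method, and the function and argument names are hypothetical. With a neural network, the trace `z` would instead hold one entry per network parameter and accumulate the gradient of the value estimate.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def td_lambda_episode(features, rewards, w, alpha, gamma, lam):
    """One episode of semi-gradient TD(lambda) with accumulating traces.

    features[t] is the feature vector of state S_t and rewards[t] is R_{t+1},
    the reward received after leaving S_t. The episode ends after the last
    reward, so the terminal state's value is taken as 0.
    Returns the updated weight vector as a new list.
    """
    if len(features) != len(rewards):
        raise ValueError("need one reward per visited state")
    w = list(w)
    z = [0.0] * len(w)  # eligibility trace, same shape as w
    for t, (x, r) in enumerate(zip(features, rewards)):
        # Linear value function v(S) = w . x(S), so its gradient is x(S).
        z = [gamma * lam * zi + xi for zi, xi in zip(z, x)]
        v = dot(w, x)
        v_next = dot(w, features[t + 1]) if t + 1 < len(features) else 0.0
        delta = r + gamma * v_next - v  # TD error
        w = [wi + alpha * delta * zi for wi, zi in zip(w, z)]
    return w
```

Setting `lam=0` reduces the update to one-step TD(0), where only the current state's weights change; with `lam=1`, the reward at the end of the episode is also credited back to earlier states through the trace.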