     

Funding: Talent Fund of China West Normal University (17YC046); Doctoral Scientific Research Start-up Project of China West Normal University, "QoE Optimization of Streaming Media Transmission over Heterogeneous Wireless Networks" (13E003).

Received: 2021-05-08
Revised: 2021-08-16

Fast Training Method for Manipulator Control Based on Deep Reinforcement Learning
ZHAO Yinfu,FENG Zhengyong. Fast Training Method for Manipulator Control Based on Deep Reinforcement Learning[J]. Computer Engineering, 2022, 48(8): 113-120. DOI: 10.19678/j.issn.1000-3428.0061575
Authors:ZHAO Yinfu  FENG Zhengyong
Affiliation:School of Electronic Information Engineering, China West Normal University, Nanchong, Sichuan 637009, China
Abstract:Artificial Intelligence (AI) is widely used in robot control, and robot control algorithms are gradually shifting from model-driven to data-driven. Deep reinforcement learning can perceive and make decisions in complex environments and can solve manipulator control problems in high-dimensional, continuous state spaces. However, the data-driven training process in deep reinforcement learning relies heavily on GPU computing power and requires a significant amount of training time. To address this problem, this study proposes a fast training method for manipulator control based on deep reinforcement learning, in which a simplified model (2D model) is trained first, followed by a complex model (3D model). A Deep Deterministic Policy Gradient (DDPG) algorithm replaces the inverse-kinematics solver of traditional manipulator control and drives the end of the manipulator to the target position directly through data-driven training, thereby reducing training time. Meanwhile, different settings are used for the state vector and the form of the reward function. The final trained algorithm model is implemented and verified on a real manipulator. The results show that its control performance meets the application requirements of sorting items, and the method shortens the average training time by nearly 52% compared with training directly in the 3D model.
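The replacement of inverse kinematics described in the abstract can be pictured with a planar two-link arm: the agent outputs joint angles, forward kinematics gives the end-effector position, and the reward is the negative distance to the target. A minimal sketch follows (not the authors' code; the link lengths and the exact reward form are assumptions):

```python
import math

# Planar 2-link arm: forward kinematics plus a distance-based reward,
# the kind of signal a DDPG agent can maximize instead of relying on an
# inverse-kinematics solver. Link lengths are assumed, not from the paper.

def end_effector(a1, a2, l1=1.0, l2=1.0):
    """End-effector (x, y) of a 2-link planar arm with joint angles a1, a2."""
    x = l1 * math.cos(a1) + l2 * math.cos(a1 + a2)
    y = l1 * math.sin(a1) + l2 * math.sin(a1 + a2)
    return x, y

def reward(a1, a2, tx, ty):
    """Negative Euclidean distance to the target (tx, ty): 0 is best."""
    x, y = end_effector(a1, a2)
    return -math.hypot(x - tx, y - ty)
```

With both angles at zero the arm is fully extended and the end-effector sits at (2, 0), so a target there yields the maximum reward of 0; any other pose is penalized in proportion to its distance from the target.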
Keywords:manipulator  position control  Artificial Intelligence(AI)  deep reinforcement learning  Deep Deterministic Policy Gradient (DDPG) algorithm  
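DDPG, named in the keywords, stabilizes learning with target networks that trail the online networks via Polyak averaging. A minimal sketch of that soft update, with the flat parameter representation and the value of tau assumed for illustration:

```python
# Soft (Polyak) update used by DDPG: target parameters drift slowly
# toward the online parameters. Parameters are plain lists of floats
# here for illustration; tau = 0.005 is a typical, assumed value.

def soft_update(target, online, tau=0.005):
    """Return tau * online + (1 - tau) * target, element-wise."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]
```

A small tau makes the target networks change slowly, which keeps the bootstrapped critic targets stable during training.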
This article is indexed by Wanfang Data and other databases.