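Research on intelligent control algorithm of coal gangue sorting robot arm based on reinforcement learning
Cite this article: ZHANG Yongchao, YU Zhiwei, DING Lilin. Research on intelligent control algorithm of coal gangue sorting robot arm based on reinforcement learning[J]. 工矿自动化 (Industry and Mine Automation), 2021, 47(1): 36-42.
Authors: ZHANG Yongchao, YU Zhiwei, DING Lilin
Affiliation: College of Mechanical and Electronic Engineering, Shandong University of Science and Technology, Qingdao 266590, Shandong, China (all three authors)
Funding: Natural Science Foundation of Shandong Province (ZR2018MEE036).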
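Abstract: Traditional control algorithms for coal gangue sorting robot arms, such as the grasping function method and the dynamic target grasping algorithm based on Ferrari's method, depend on an accurate environment model and lack adaptivity during control. Intelligent control algorithms such as the traditional deep deterministic policy gradient (DDPG) produce excessively large output actions and let sparse rewards be easily drowned out. To address these problems, the neural network structure and the reward function of the traditional DDPG algorithm are improved, and an improved reinforcement-learning-based DDPG algorithm suited to a six-degree-of-freedom coal gangue sorting robot arm is proposed. Once gangue enters the robot arm's workspace, the improved DDPG algorithm makes a decision from the gangue position and robot arm state returned by the corresponding sensors and outputs a set of joint angle state control quantities to the corresponding motion controller. The robot arm is then driven according to the gangue position and these control quantities so that it moves to the vicinity of the gangue and sorts it. Simulation results show that, compared with the traditional DDPG algorithm, the improved DDPG algorithm has the advantages of being model-free and highly general and of adaptively learning grasping poses through interaction with the environment. It converges first to the maximum reward value encountered during exploration, and the robot arm it controls learns a policy that generalizes better, outputs smaller joint angle state control quantities, and sorts coal gangue more efficiently.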

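Keywords: coal preparation; coal gangue sorting; sorting robot; robot arm; joint angle state control; reinforcement learning; reward function; DDPG algorithm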

Research on intelligent control algorithm of coal gangue sorting robot arm based on reinforcement learning
ZHANG Yongchao, YU Zhiwei, DING Lilin. Research on intelligent control algorithm of coal gangue sorting robot arm based on reinforcement learning[J]. Industry and Mine Automation, 2021, 47(1): 36-42.
Authors: ZHANG Yongchao, YU Zhiwei, DING Lilin
Affiliation: College of Mechanical and Electronic Engineering, Shandong University of Science and Technology, Qingdao 266590, China
Abstract: Traditional control algorithms for gangue sorting robot arms, such as the grasping function method and the dynamic target grasping algorithm based on Ferrari's method, rely on an accurate environment model and lack adaptivity in the control process. Traditional intelligent control algorithms such as deep deterministic policy gradient (DDPG) suffer from excessively large output actions and from sparse rewards that are easily drowned out. To solve these problems, this study improves the neural network structure and the reward function of the traditional DDPG algorithm and proposes an improved DDPG algorithm based on reinforcement learning that is suitable for six-degree-of-freedom gangue sorting robot arms. After gangue enters the working space of the robot arm, the improved DDPG algorithm makes decisions according to the gangue position and robot arm state returned by the corresponding sensors and outputs a set of joint angle state control quantities to the corresponding motion controller. The robot arm is then controlled according to the gangue position and the joint angle state control quantities, so that it moves close to the gangue and sorts it. The simulation results show that, compared with the traditional DDPG algorithm, the improved DDPG algorithm has the advantages of model-free generality and adaptive learning of the grasping pose through interaction with the environment. Moreover, the improved algorithm converges first to the maximum reward value encountered during exploration. The robot arm controlled by the improved DDPG algorithm learns a policy that generalizes better, outputs smaller joint angle state control quantities, and achieves higher gangue sorting efficiency.
Keywords: coal preparation; coal gangue sorting; sorting robot; robot arm; joint angle state control; reinforcement learning; reward function; DDPG algorithm
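Illustrative sketch (not from the paper): the abstract names two weaknesses of plain DDPG, excessively large output actions and sparse rewards that are easily drowned out, but does not give the improved network structure or reward function. The Python fragment below is only one common way such issues are handled, under stated assumptions: actor outputs are squashed with tanh into a per-step joint angle bound, and the reward combines a dense distance term, an action-size penalty, and a sparse success bonus. All names and constants (`MAX_JOINT_STEP`, `SUCCESS_RADIUS`, `SUCCESS_BONUS`, `ACTION_PENALTY`, `bound_action`, `shaped_reward`) are hypothetical and should not be read as the authors' design.

```python
import math

# All constants are hypothetical values chosen for illustration only.
MAX_JOINT_STEP = 0.05   # rad: largest joint angle change allowed per control step
SUCCESS_RADIUS = 0.02   # m: gripper-to-gangue distance counted as "reached"
SUCCESS_BONUS = 10.0    # sparse bonus added once the gangue is reached
ACTION_PENALTY = 0.1    # weight of the penalty on large joint commands


def bound_action(raw_action, max_step=MAX_JOINT_STEP):
    """Squash unbounded actor outputs into [-max_step, max_step] per joint."""
    return [max_step * math.tanh(a) for a in raw_action]


def shaped_reward(gripper_pos, gangue_pos, action):
    """Dense distance term, action-size penalty, and sparse success bonus."""
    distance = math.dist(gripper_pos, gangue_pos)
    reward = -distance - ACTION_PENALTY * sum(a * a for a in action)
    if distance < SUCCESS_RADIUS:
        reward += SUCCESS_BONUS  # the dense terms keep this signal reachable
    return reward


# Usage: six raw actor outputs for a six-degree-of-freedom arm.
action = bound_action([100.0, -100.0, 0.0, 0.5, -0.5, 2.0])
far = shaped_reward((0.5, 0.0, 0.3), (0.0, 0.0, 0.0), action)
near = shaped_reward((0.01, 0.0, 0.0), (0.0, 0.0, 0.0), action)
```

In this sketch the dense distance term gives the agent a gradient toward the gangue even before it ever collects the sparse bonus, and the tanh bound keeps each joint command small regardless of how large the raw network output grows.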
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.