Research on Robot Constant-Force Tracking of Curved Surfaces Based on Reinforcement Learning
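Citation: ZHANG Tie, XIAO Meng, ZOU Yan-biao, XIAO Jia-dong. Research on robot constant-force tracking of curved surfaces based on reinforcement learning[J]. Journal of Zhejiang University (Engineering Science), 2019, 53(10): 1865-1873. DOI: 10.3785/j.issn.1008-973X.2019.10.003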
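Authors: ZHANG Tie  XIAO Meng  ZOU Yan-biao  XIAO Jia-dong
Affiliation: School of Mechanical and Automotive Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China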
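Abstract: To address the difficulty of maintaining a constant contact force when a robot end-effector contacts a curved workpiece, a contact model between the end-effector and the curved workpiece was established. The relationship between the contact-force coordinate system of the curved surface and the measurement coordinate system of the robot's force sensor was constructed. The probabilistic inference and learning for control (PILCO) algorithm, a reinforcement learning method based on a probabilistic dynamics model, was used to learn the relationship between the model's output parameters and the contact state and to predict part of the contact state. Based on the predicted states, the reinforcement learning algorithm optimized the robot's displacement input parameters to obtain the desired tracking-force signal. In the experiments, the input state of the reinforcement learning was replaced by the average state over a period of time to reduce signal interference during contact. The experimental results show that PILCO obtained a relatively stable force after 8 iterations, converged faster than the fuzzy iterative algorithm, and reduced the mean absolute force error by 29%.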
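The abstract names two preprocessing steps ahead of learning: expressing the measured force in the contact-force frame of the curved surface, and replacing the instantaneous state with its average over a time window. The Python sketch below illustrates both under stated assumptions: the frame relationship is reduced to a rotation from the sensor frame to the robot base frame plus a known surface normal in the base frame, and the state is taken to be the normal contact force. The function and class names, the window length and the sign convention are hypothetical and are not taken from the paper.

```python
from collections import deque

import numpy as np


def normal_force_in_surface_frame(f_sensor, R_sensor_to_base, n_surface_base):
    """Project a sensor-frame force onto the surface normal (assumed frame model).

    f_sensor: (3,) force measured in the sensor frame [N]
    R_sensor_to_base: (3, 3) rotation matrix from sensor frame to base frame
    n_surface_base: (3,) surface normal in the base frame (any nonzero length)
    """
    # Rotate the measured force into the base frame
    f_base = R_sensor_to_base @ np.asarray(f_sensor, dtype=float)
    # Normalize the surface normal so the dot product is the normal component
    n = np.asarray(n_surface_base, dtype=float)
    n = n / np.linalg.norm(n)
    return float(f_base @ n)


class AveragedState:
    """Sliding-window mean of the normal force, used as the RL input state.

    The window length is a hypothetical tuning parameter.
    """

    def __init__(self, window=10):
        # deque with maxlen drops the oldest sample once the window is full
        self.buf = deque(maxlen=window)

    def update(self, f_normal):
        self.buf.append(float(f_normal))
        return sum(self.buf) / len(self.buf)


if __name__ == "__main__":
    # Sensor frame rotated 90 degrees about x relative to the base frame
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
    state = AveragedState(window=3)
    for fz in [4.8, 5.3, 5.1, 4.9]:
        fn = normal_force_in_surface_frame([0.0, 0.0, fz], Rx, [0.0, -1.0, 0.0])
        print(f"normal force {fn:.2f} N, averaged state {state.update(fn):.2f} N")
```

Averaging trades responsiveness for noise rejection: a longer window smooths sensor noise more but delays the state the learner sees.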

Keywords: robot  curved-surface tracking  force control  probabilistic inference and learning for control (PILCO)  reinforcement learning

Research on robot constant force control of surface tracking based on reinforcement learning
Tie ZHANG,Meng XIAO,Yan-biao ZOU,Jia-dong XIAO. Research on robot constant force control of surface tracking based on reinforcement learning[J]. Journal of Zhejiang University(Engineering Science), 2019, 53(10): 1865-1873. DOI: 10.3785/j.issn.1008-973X.2019.10.003
Authors:Tie ZHANG  Meng XIAO  Yan-biao ZOU  Jia-dong XIAO
Abstract: A contact model between the robot end-effector and a curved surface was established to address the difficulty of obtaining a constant contact force when a robot end-effector contacts a curved workpiece. The relationship between the contact-force coordinate system of the curved surface and the measurement coordinate system of the robot's force sensor was constructed. The relationship between the output parameters of the model and the contact state was learned using probabilistic inference and learning for control (PILCO), a reinforcement learning algorithm based on a probabilistic dynamics model. Part of the contact state was predicted, and the reinforcement learning algorithm optimized the displacement input parameters of the robot according to the predicted states to achieve a constant force. The input state of the reinforcement learning was replaced by the average state over a period of time to reduce signal interference in the contact state during the experiments. The experimental results showed that the algorithm obtained a relatively stable force after 8 iterations, converged faster than the fuzzy iterative algorithm, and reduced the mean absolute force error by 29%.
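To make the model-based idea concrete, here is a deliberately simplified Python sketch: a Gaussian-process (GP) model predicts the change in contact force from the current force and a candidate displacement, and the displacement whose predicted next force is closest to the target is chosen. This is not PILCO. PILCO propagates uncertainty through multi-step rollouts by moment matching and optimizes a parametric policy with analytic gradients; the sketch keeps only the GP dynamics model and replaces policy optimization with a greedy one-step search. The state/action choice (force, displacement), the names `GPDynamics` and `choose_displacement`, the kernel hyperparameters and the linear 2 N/mm plant in the usage section are all hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, lengthscales, signal_var):
    """Squared-exponential kernel with one lengthscale per input dimension."""
    A = A / lengthscales
    B = B / lengthscales
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * np.maximum(sq, 0.0))


class GPDynamics:
    """GP regression of the force change df from inputs x = (f, u)."""

    def __init__(self, lengthscales, signal_var=1.0, noise_var=1e-2):
        self.ls = np.asarray(lengthscales, dtype=float)
        self.sv = signal_var
        self.nv = noise_var

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        K = rbf_kernel(self.X, self.X, self.ls, self.sv) + self.nv * np.eye(len(self.X))
        # Cholesky factor K = L L^T, then alpha = K^{-1} y via two solves
        self.L = np.linalg.cholesky(K)
        y = np.asarray(y, dtype=float)
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))
        return self

    def predict(self, Xs):
        """Posterior mean and variance of df at the query inputs."""
        Xs = np.atleast_2d(np.asarray(Xs, dtype=float))
        Ks = rbf_kernel(Xs, self.X, self.ls, self.sv)
        mean = Ks @ self.alpha
        v = np.linalg.solve(self.L, Ks.T)
        var = self.sv - np.sum(v**2, axis=0)
        return mean, np.maximum(var, 0.0)


def choose_displacement(gp, f_now, f_target, candidates, var_weight=0.0):
    """Greedy one-step choice of displacement (not PILCO's policy search).

    var_weight optionally penalizes uncertain predictions (a heuristic).
    """
    candidates = np.asarray(candidates, dtype=float)
    X = np.column_stack([np.full(len(candidates), f_now), candidates])
    delta_mean, delta_var = gp.predict(X)
    cost = (f_now + delta_mean - f_target) ** 2 + var_weight * delta_var
    return float(candidates[np.argmin(cost)])


if __name__ == "__main__":
    # Hypothetical plant for illustration: force change = 2 N/mm * displacement
    fs, us = np.meshgrid(np.linspace(0, 20, 5), np.linspace(-1, 1, 11))
    X = np.column_stack([fs.ravel(), us.ravel()])
    y = 2.0 * X[:, 1]
    gp = GPDynamics(lengthscales=[10.0, 1.0], signal_var=4.0, noise_var=1e-4).fit(X, y)
    u = choose_displacement(gp, f_now=5.0, f_target=6.0, candidates=np.linspace(-1, 1, 201))
    print(f"chosen displacement: {u:.2f} mm")
```

In an iterative setting, each executed displacement and the measured force change would be appended to the training set and the GP refit, which is the data-efficiency argument behind model-based methods such as PILCO.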
Keywords:robot  contour tracking  force control  probabilistic inference and learning for control (PILCO)  reinforcement learning  
This article is indexed in CNKI and other databases.