
Self-Learning Gait Planning Method for Biped Robot Using DDPG
Cite this article: ZHOU Youhang, ZHAO Hanyun, LIU Hanjiang, LI Yuze, XIAO Yuqin. Self-Learning Gait Planning Method for Biped Robot Using DDPG[J]. Computer Engineering and Applications (计算机工程与应用), 2021, 57(6): 254-259.
Authors: ZHOU Youhang  ZHAO Hanyun  LIU Hanjiang  LI Yuze  XIAO Yuqin
Affiliation: School of Mechanical Engineering, Xiangtan University, Xiangtan, Hunan 411105, China
Abstract: To address the high-dimensional nonlinear planning problem in walking control of multi-degree-of-freedom biped robots, and to exploit the autonomous motion potential of biped robots in uncertain environments, an improved gait planning scheme for biped robots based on the Deep Deterministic Policy Gradient (DDPG) algorithm is proposed. The multi-joint degree-of-freedom control problem of the biped robot is transformed into a multi-objective optimization problem over nonlinear functions, which is solved with the DDPG algorithm. To overcome the slow convergence of global approximation networks, a Radial Basis Function (RBF) neural network is used to compute the nonlinear function values, a gradient descent algorithm is used to update the network weights, and a SumTree is used to select high-quality samples. The biped robot is trained in simulation on a joint platform combining ROS, Gazebo, and Tensorflow. Simulation results show that the improved DDPG algorithm reaches the maximum cumulative reward 45.7% earlier on average, the success rate increases by 8.9%, and the joint posture angles after training are smoother.

Keywords: biped robot  gait planning  Deep Deterministic Policy Gradient (DDPG)  Radial Basis Function (RBF) neural network  SumTree  Gazebo
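
The abstract says only that a SumTree is used to select samples; it gives no implementation details. As an illustration, the following is a minimal sketch of the SumTree commonly used for prioritized experience replay, not the authors' code. The names `SumTree` and `sample_batch` are hypothetical, and the assumption that each priority is a positive number (for example, the magnitude of a TD error) is ours.

```python
import random


class SumTree:
    """Binary tree: leaves store sample priorities, internal nodes store sums."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Indices 0 .. capacity-2 are internal nodes, the rest are leaves.
        self.tree = [0.0] * (2 * capacity - 1)
        self.data = [None] * capacity
        self.write = 0  # next data slot to overwrite (ring buffer)

    def total(self):
        """Sum of all priorities, stored at the root."""
        return self.tree[0]

    def add(self, priority, sample):
        """Store a sample, overwriting the oldest one when full."""
        leaf = self.write + self.capacity - 1
        self.data[self.write] = sample
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, leaf, priority):
        """Set a leaf priority and propagate the change up to the root."""
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf != 0:
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def get(self, value):
        """Find the leaf whose cumulative priority interval contains value."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):  # stop once idx is a leaf
            left = 2 * idx + 1
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]


def sample_batch(tree, n, rng=random):
    """Stratified sampling: one draw from each of n equal priority segments."""
    segment = tree.total() / n
    return [tree.get(rng.uniform(segment * i, segment * (i + 1)))
            for i in range(n)]


# Usage: transitions with higher priority are drawn more often.
tree = SumTree(capacity=4)
for priority, transition in zip([1.0, 2.0, 3.0, 4.0], "abcd"):
    tree.add(priority, transition)
batch = sample_batch(tree, n=2)
```

Both updating a priority and drawing a sample cost O(log N) in the buffer size, which is the usual reason for choosing this structure over sorting the replay buffer.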
This article is indexed in Wanfang Data (万方数据) and other databases.