Self-Learning Gait Planning Method for Biped Robot Using DDPG

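Cite this article:	ZHOU Youhang, ZHAO Hanyun, LIU Hanjiang, LI Yuze, XIAO Yuqin. Self-Learning Gait Planning Method for Biped Robot Using DDPG[J]. Computer Engineering and Applications, 2021, 57(6): 254-259.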

Authors:	ZHOU Youhang, ZHAO Hanyun, LIU Hanjiang, LI Yuze, XIAO Yuqin

Affiliation:	School of Mechanical Engineering, Xiangtan University, Xiangtan, Hunan 411105, China

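Abstract:	To address the high-dimensional nonlinear planning problem in walking control of multi-degree-of-freedom (DOF) biped robots, and to tap the potential of biped robots for autonomous locomotion in uncertain environments, an improved gait planning scheme for biped robots based on the Deep Deterministic Policy Gradient (DDPG) algorithm is proposed. The multi-joint DOF control problem of the biped robot is transformed into a multi-objective optimization problem over nonlinear functions, which is then solved with the DDPG algorithm. To overcome the slow convergence of globally approximating networks, a Radial Basis Function (RBF) neural network is used to compute the nonlinear function values, its weights are updated by gradient descent, and a SumTree is used to select high-quality samples. The biped robot was trained in simulation on a co-simulation platform built from ROS, Gazebo, and TensorFlow. Simulation results show that the improved DDPG algorithm reaches the maximum cumulative reward 45.7% earlier on average, raises the success rate by 8.9%, and yields smoother joint posture angles after training.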
Keywords:	biped robot; gait planning; Deep Deterministic Policy Gradient (DDPG); Radial Basis Function (RBF) neural network; SumTree; Gazebo
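The abstract states only that a SumTree is used to select high-quality samples. A common way to do this, assumed here rather than taken from the paper, is proportional prioritized experience replay: each stored transition gets a priority derived from its TD error, and a SumTree makes both priority updates and priority-proportional sampling O(log n). The minimal sketch below is not the authors' code; the class layout, the hypothetical helper `priority_from_td`, and its `alpha`/`eps` defaults are illustrative assumptions.

```python
import random


def priority_from_td(td_error, alpha=0.6, eps=1e-6):
    """Assumed proportional priority (|delta| + eps)^alpha; alpha/eps are illustrative."""
    return (abs(td_error) + eps) ** alpha


class SumTree:
    """Binary tree whose internal nodes hold the sum of their children's priorities.

    Leaves (indices capacity-1 .. 2*capacity-2) hold per-transition priorities,
    and the root holds the total, so update and sample are both O(log n).
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity - 1)  # internal nodes first, then leaves
        self.data = [None] * capacity            # transitions, parallel to the leaves
        self.write = 0                           # next slot to overwrite (ring buffer)
        self.size = 0

    def total(self):
        return self.tree[0]

    def add(self, priority, sample):
        """Store a transition, overwriting the oldest one once the buffer is full."""
        leaf = self.write + self.capacity - 1
        self.data[self.write] = sample
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def update(self, leaf, priority):
        """Set a leaf's priority and propagate the change up to the root."""
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf != 0:
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def get(self, value):
        """Return (leaf index, priority, transition) for a cumulative value in [0, total]."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):      # descend until idx is a leaf
            left, right = 2 * idx + 1, 2 * idx + 2
            # Go left if value falls in the left mass, or if the right subtree is
            # empty (guards against float round-off at the upper end).
            if value < self.tree[left] or self.tree[right] <= 0:
                idx = left
            else:
                value -= self.tree[left]
                idx = right
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]

    def sample(self, batch_size, rng=random):
        """Stratified sampling: one draw from each of batch_size equal segments."""
        segment = self.total() / batch_size
        return [self.get(segment * (i + rng.random())) for i in range(batch_size)]
```

In a DDPG training loop, each sampled transition's priority would typically be refreshed after the critic step with `tree.update(leaf, priority_from_td(new_td_error))`, so transitions with large errors are replayed more often. Prioritized replay usually also applies importance-sampling weights to the loss; that correction is omitted here.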
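The abstract replaces a globally approximating network with an RBF network whose weights are updated by gradient descent. The sketch below shows that idea in isolation, under assumptions the abstract does not state: Gaussian hidden units with fixed, evenly spaced centers and a shared width `sigma`, only the output weights and bias trained, and a toy target y = sin(x) standing in for the values the network would learn inside DDPG. The names `RBFNet`, `mse`, and `fit_sine` are hypothetical. Because each Gaussian unit responds mainly near its center, a gradient step changes the output mostly locally, which is the usual argument for why such local approximators can converge faster than a global one.

```python
import math


class RBFNet:
    """Gaussian RBF network f(x) = sum_j w_j * exp(-||x - c_j||^2 / (2 * sigma^2)) + b.

    Centers c_j and width sigma are fixed (an assumption of this sketch); only the
    linear output weights w_j and the bias b are trained by gradient descent.
    """

    def __init__(self, centers, sigma):
        self.centers = [list(c) for c in centers]
        self.sigma = sigma
        self.weights = [0.0] * len(self.centers)
        self.bias = 0.0

    def features(self, x):
        """Gaussian activation of each hidden unit for input vector x."""
        return [
            math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * self.sigma ** 2))
            for c in self.centers
        ]

    def predict(self, x):
        phi = self.features(x)
        return sum(w * p for w, p in zip(self.weights, phi)) + self.bias

    def sgd_step(self, x, target, lr):
        """One gradient-descent step on L = 0.5 * (f(x) - target)^2; returns L before the step."""
        phi = self.features(x)
        error = sum(w * p for w, p in zip(self.weights, phi)) + self.bias - target
        # dL/dw_j = error * phi_j(x) and dL/db = error.
        self.weights = [w - lr * error * p for w, p in zip(self.weights, phi)]
        self.bias -= lr * error
        return 0.5 * error ** 2


def mse(net, xs, ys):
    return sum((net.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_sine(epochs=300, lr=0.1):
    """Toy stand-in for a learning target: fit y = sin(x) on [-pi, pi]."""
    xs = [[-math.pi + 2 * math.pi * i / 49] for i in range(50)]
    ys = [math.sin(x[0]) for x in xs]
    centers = [[-math.pi + 2 * math.pi * j / 8] for j in range(9)]  # 9 evenly spaced units
    net = RBFNet(centers, sigma=0.8)
    before = mse(net, xs, ys)
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            net.sgd_step(x, y, lr)
    return before, mse(net, xs, ys)


if __name__ == "__main__":
    before, after = fit_sine()
    print(f"MSE before: {before:.4f}, after: {after:.6f}")
```

If this were used as a DDPG critic, `x` would be the concatenated state and action and `target` would be the TD target r + γ·Q'(s', μ'(s')) computed from the target networks. Whether the paper also adapts the centers and widths is not stated in the abstract.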