
Funding: Supported by the National Natural Science Foundation of China (62103009) and the Beijing Natural Science Foundation (4202005).
Received: September 27, 2022. Revised: April 7, 2023.

A quadruped robot kinematic skill learning method integrating meta-learning and PPO algorithms
ZHU Xiao-qing, LIU Xin-yuan, RUAN Xiao-gang, ZHANG Si-yuan, LI Chun-yang, LI Peng. A quadruped robot kinematic skill learning method integrating meta-learning and PPO algorithms [J]. Control Theory & Applications, 2024, 41(1): 155-162.
Authors: ZHU Xiao-qing, LIU Xin-yuan, RUAN Xiao-gang, ZHANG Si-yuan, LI Chun-yang, LI Peng
Affiliation: Beijing University of Technology
Abstract: Learning ability is a typical characteristic of higher animal intelligence. To explore the learning mechanism of quadruped motor skills, this paper studies the gait learning task of quadruped robots and reproduces the rhythmic gait learning process of quadruped animals from scratch. In recent years, the proximal policy optimization (PPO) algorithm, a typical representative of deep reinforcement learning, has been widely used in gait learning tasks for quadruped robots; it yields good experimental results and requires few hyperparameters. However, with multidimensional inputs and outputs it tends to converge to a local optimum: in the experimental environment of this study, the gait rhythm signals of the trained quadruped robot were irregular and its center of gravity oscillated severely. To solve these problems, inspired by meta-learning and its advantage in characterizing high-dimensional abstract representations of the learning process, this paper proposes a meta proximal policy optimization (MPPO) algorithm that combines meta-learning with the PPO algorithm, enabling quadruped robots to learn better gaits. Simulation results on the PyBullet platform show that the proposed algorithm enables quadruped robots to learn walking skills. Compared with the soft actor-critic (SAC) and PPO algorithms, the proposed MPPO algorithm produces more regular gait rhythm signals and faster walking, among other advantages.
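The two ingredients named in the abstract can be illustrated with a minimal sketch. The first function is the standard PPO clipped surrogate objective; the second is a Reptile-style meta update used here only as an illustrative stand-in for the meta-learning component, since the exact form of the paper's MPPO update is not given on this page.

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized).

    ratio:     pi_new(a|s) / pi_old(a|s), one entry per sample
    advantage: estimated advantage, one entry per sample
    eps:       clipping range, typically 0.1-0.3
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the elementwise minimum makes the objective pessimistic,
    # which discourages overly large policy updates.
    return np.mean(np.minimum(unclipped, clipped))

def meta_outer_update(meta_params, task_params_list, outer_lr=0.1):
    """Illustrative Reptile-style outer update: move the meta parameters
    toward the average of the task-adapted parameters produced by inner
    (e.g. PPO) training runs. A stand-in, not the paper's exact MPPO rule.
    """
    task_mean = np.mean(task_params_list, axis=0)
    return meta_params + outer_lr * (task_mean - meta_params)
```

In an MPPO-like scheme, the inner loop would run PPO updates on sampled locomotion tasks, and the outer loop would aggregate the adapted policy parameters with something like `meta_outer_update`, so that the meta policy encodes a good initialization for gait learning.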
Keywords: quadruped robot, gait learning, reinforcement learning, meta-learning
