Research on Intelligent Control of Biped Robot Based on D-DQN Reinforcement Learning Algorithm


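Cite this article:	Li Lixia, Chen Yan. Research on Intelligent Control of Biped Robot Based on D-DQN Reinforcement Learning Algorithm[J]. Computer Measurement & Control, 2024, 32(3): 181-187.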

Authors:	Li Lixia, Chen Yan

Affiliation:	Guangzhou Huashang College

Funding:	2022 Higher Education Teaching Reform Project of Guangzhou Huashang College (HS2022ZLGC71)

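Abstract:	To address the large trajectory deviation and low efficiency of existing intelligent control algorithms for biped robots, a control algorithm based on D-DQN reinforcement learning is proposed. The coordinate transformations and the joint-link compensation process in biped robot motion are analyzed first. A Q-value network is then used to reduce the dimensionality of the complex nonlinear motion process. A dual-network weight design, combining Q-value network weights with auxiliary weights, further strengthens the DQN network, and the Tanh function serves as the activation function of the neural network to improve the numerical training capability of the DQN network. The experience replay pool plays a key supporting role in data training and interaction, and feeding the reward value into the objective function further improves the control accuracy of the biped robot. Finally, virtual constraint control is used to improve the stability of the robot during motion. Experimental results show that, under the D-DQN reinforcement learning control algorithm, the robot completes the first-stage test in only 115 s with a comprehensive trajectory deviation of 0.02 m, and shows good stability in the gait-switching limit cycle test.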
Keywords:	D-DQN; reinforcement learning; biped robot; intelligent control; experience replay pool; virtual constraint control
Received:	2023/8/22
Revised:	2023/9/8
Research on Intelligent Control of Biped Robot Based on D-DQN Reinforcement Learning Algorithm

Abstract:	Aiming at the large trajectory deviation and low efficiency of existing intelligent control algorithms for biped robots, a control algorithm based on D-DQN reinforcement learning is proposed. Firstly, the coordinate transformation relationships in biped robot motion and the joint-link compensation process are analyzed, and then dimensionality reduction of the complex nonlinear motion process is realized with a Q-value network. A dual-network weight design, using Q-value network weights and auxiliary weights, is adopted to strengthen the performance of the DQN network, and the Tanh function is used as the activation function of the neural network to improve the numerical training ability of the DQN network. The experience replay pool plays a key auxiliary role in data training and interaction. By inputting the reward value into the objective function, the control accuracy of the biped robot is further improved. Finally, the stability of the biped robot during motion is improved by virtual constraint control. The experimental results show that, under the D-DQN reinforcement learning control algorithm, the robot completes the first-stage test in only 115 s, the comprehensive trajectory deviation is 0.02 m, and stability in the gait-switching limit cycle test is good.

Keywords:	D-DQN; reinforcement learning; biped robot; intelligent control; experience replay pool; virtual constraint control
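
The abstract names two mechanisms without giving formulas: a pair of weight sets (Q-value network weights and auxiliary weights) and an experience replay pool. The sketch below illustrates one common reading of those two ideas. It assumes that "D-DQN" corresponds to the standard Double DQN target, in which the online network selects the next action and the auxiliary (target) network evaluates it; the abstract does not confirm this, and the paper's actual update rule may differ. The names `ReplayPool` and `double_dqn_target`, the discount factor `gamma`, and all numeric values are hypothetical and chosen only for illustration.

```python
import random
from collections import deque


class ReplayPool:
    """Fixed-capacity experience replay pool (illustrative, not the paper's code).

    When the pool is full, the oldest transition is evicted first.
    """

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement, which weakens the temporal
        # correlation between consecutive transitions.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Assumed Double DQN target for a single transition.

    next_q_online: Q-values of the next state from the online (Q-value) network.
    next_q_target: Q-values of the next state from the auxiliary (target) network.
    """
    if done:
        # No bootstrapping past a terminal state.
        return reward
    # The online network chooses the action ...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ... and the auxiliary network supplies that action's value.
    return reward + gamma * next_q_target[best_action]


# Usage with made-up numbers.
pool = ReplayPool(capacity=3)
for t in range(5):
    pool.push(state=t, action=0, reward=1.0, next_state=t + 1, done=False)

target = double_dqn_target(
    reward=1.0,
    next_q_online=[0.2, 0.9, 0.5],
    next_q_target=[1.0, 0.4, 2.0],
    done=False,
    gamma=0.5,
)
```

In this example the online network prefers action 1, so the target uses the auxiliary network's value for action 1 (0.4) rather than the auxiliary network's own maximum (2.0). Decoupling selection from evaluation in this way is the usual motivation for keeping two weight sets, since it reduces the overestimation that a single-network maximum tends to produce.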
