期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

基于深度强化学习的双足机器人斜坡步态控制方法 总被引：1，自引：0，他引：1

吴晓光刘绍维杨磊邓文强贾哲恒《自动化学报》2021,47(8):1976-1987

为提高准被动双足机器人斜坡步行稳定性, 本文提出了一种基于深度强化学习的准被动双足机器人步态控制方法. 通过分析准被动双足机器人的混合动力学模型与稳定行走过程, 建立了状态空间、动作空间、episode过程与奖励函数. 在利用基于DDPG改进的Ape-X DPG算法持续学习后, 准被动双足机器人能在较大斜坡范围内实现稳定行走. 仿真实验表明, Ape-X DPG无论是学习能力还是收敛速度均优于基于PER的DDPG. 同时, 相较于能量成型控制, 使用Ape-X DPG的准被动双足机器人步态收敛更迅速、步态收敛域更大, 证明Ape-X DPG可有效提高准被动双足机器人的步行稳定性. 相似文献

2.

基于D-DQN强化学习算法的双足机器人智能控制研究

下载免费PDF全文

李丽霞陈艳《计算机测量与控制》2024,32(3):181-187

针对现有双足机器人智能控制算法存在的轨迹偏差大、效率低等问题,提出了一种基于D-DQN强化学习的控制算法;先分析双足机器人运动中的坐标变换关系和关节连杆补偿过程,然后基于Q值网络实现对复杂运动非线性过程降维处理,采用了Q值网络权值和辅助权值的双网络权值设计方式,进一步强化DQN网络性能,并以Tanh函数作为神经网络的激活函数,提升DQN网络的数值训练能力;在数据训练和交互中经验回放池发挥出关键的辅助作用,通过将奖励值输入到目标函数中,进一步提升对双足机器人的控制精度,最后通过虚拟约束控制的方式提高双足机器人运动中的稳定性;实验结果显示：在D-DQN强化学习的控制算法,机器人完成第一阶段测试的时间仅为115 s,综合轨迹偏差0.02 m,而且步态切换极限环测试的稳定性良好。相似文献

3.

基于环境分层方法的双足机器人路径规划研究

闵伟伟刘国栋莫栋成《计算机应用研究》2013,30(2):389-391

为了实现双足机器人在障碍环境中的路径规划,提出一种将三维环境分层的方法,用两个截面将环境分为高于机器人身高障碍层、低于机器人抬脚高度障碍层和中间障碍层.首先在中间障碍层进行机器人轨迹规划,再根据机器人各种步态的不同损耗构建代价函数,把规划好的轨迹放到最底层进行规划修改,最终得到双足机器人在规划路径上代价最小的一系列连续的动作,通过计算机仿真实验验证了方法的有效性. 相似文献

4.

基于强化学习的类人机器人步行参数训练算法

下载免费PDF全文

梁志伟朱松豪《计算机工程》2012,38(8):13-15

基于轨迹规划的类人机器人在合理的参数组合下可实现快速稳定的行走。为优化步行参数,提出一种基于强化学习的步行参数训练算法。对步行参数进行降阶处理,利用强化学习算法优化参数,并设置奖惩机制。在Robocup3D仿真平台上进行实验,结果证明了该算法的有效性。相似文献

5.

Neuro-fuzzy gait synthesis with reinforcement learning for a biped walking robot

C. Zhou 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2000,4(4):238-250

A reinforcement learning-based neuro-fuzzy gait synthesizer, which is based on the GARIC (Generalized Approximate Reasoning for Intelligent Control) architecture, is proposed for the problem of biped dynamic balance. We modify the GARIC architecture to enable it to generate the trunk trajectory in both sagittal and frontal plane. The proposed gait synthesizer is trained by reinforcement learning that uses a multi-valued scalar signal to evaluate the degrees of failure or success for the biped locomotion by means of the ZMP (Zero Moment Point). It can form the initial dynamic balancing gait from linguistic rules, which are obtained from human intuitive balancing knowledge and biomechanics studies, and accumulate dynamic balancing knowledge through reinforcement learning, and thus constantly improve its gait during walking. The feasibility of the proposed method is verified through a 5-link biped robot simulation. 相似文献

6.

基于Adams和Matlab联合仿真的双足机器人越障研究

于薇薇 C. SABOURIN K. MADANI 闰杰《计算机测量与控制》2008,16(11):1741-1743,1769

与轮行机器人相比,双足机器人具有更灵活的机械结构,具有跨越静态或动态障碍物的能力,使其可以在更复杂的环境中工作;以往的双足机器人路径规划控制策略只能解决静止或以可预测速度运动的障碍物的越障问题,提出了一种基于模糊Q学习算法的路径规划策略,在Adams软件中建立机器人的三维虚拟样机模型,在Matlab软件中设计控制器,进行联合仿真;仿真结果表明所设计的控制策略可以有效地克服机器人在线学习时间长的问题,并且可以成功跨越速度不可预测的运动障碍物,有很好的鲁棒性。相似文献

7.

双足机器人自然ZMP轨迹生成方法研究 总被引：1，自引：0，他引：1

李春光刘国栋《计算机工程与应用》2014,50(19):53-57

为了实现双足机器人类人行走,提出了一种基于自然ZMP轨迹的双足机器人步行模式生成方法。在单腿支撑相,根据基于三维线性倒立摆模型,在设定从脚跟到脚趾移动的自然ZMP轨迹后,得到质心轨迹方程;在双腿支撑相采用线性摆模型生成质心轨迹方程。同时给出了在统一坐标系中的多步规划质心轨迹方程。在RoboCup 3D仿真平台实现了采用自然ZMP轨迹的双足机器人类人稳定步行,实验和竞赛结果都验证了该方法的有效性。相似文献

8.

双足机器人跨越动态障碍物在线控制系统设计

于薇薇 Sabourin C Madani K 闫杰《计算机测量与控制》2008,16(10):1441-1443,1447

在双足机器人跨越动态障碍物的在线控制问题中,脚步规划和步态控制的学习时间是关键问题;提出了一种将机器人的步态控制和脚步规划分别独立设计的控制策略;步态控制目的是产生关节点轨迹并控制对理想轨迹的跟踪,考虑到双足机器人关节点轨迹的不连续性,应用小脑模型连接控制CMAC记忆特征步态的关节点轨迹;脚步规划的控制目标是通过对环境的视觉感知预测机器人的运动路径,算法是基于无需对动态环境精确建模的模糊Q学习算法;仿真结果表明该控制策略的可行性,并且可以有效缩短在线学习时间。相似文献

9.

基于deep Q-network双足机器人非平整地面行走稳定性控制方法

赵玉婷韩宝玲罗庆生《计算机应用》2018,38(9):2459-2463

针对双足机器人在非平整地面行走时容易失去运动稳定性的问题,提出一种基于一种基于价值的深度强化学习算法DQN（Deep Q-Network）的步态控制方法。首先通过机器人步态规划得到针对平整地面环境的离线步态,然后将双足机器人视为一个智能体,建立机器人环境空间、状态空间、动作空间及奖惩机制,该过程与传统控制方法相比无需复杂的动力学建模过程,最后经过多回合训练使双足机器人学会在不平整地面进行姿态调整,保证行走稳定性。在V-Rep仿真环境中进行了算法验证,双足机器人在非平整地面行走过程中,通过DQN步态调整学习算法,姿态角度波动范围在3°以内,结果表明双足机器人行走稳定性得到明显改善,实现了机器人的姿态调整行为学习,证明了该方法的有效性。相似文献

10.

Ascending and descending stairs for a biped robot

Ching-Long Shih 《IEEE transactions on systems, man, and cybernetics. Part A, Systems and humans : a publication of the IEEE Systems, Man, and Cybernetics Society》1999,29(3):255-268

This paper synthesizes an efficient walking pattern for a practical biped robot when ascending and descending stairs. The main features of the biped robot include variable length legs and a translatable balance weight in the body. The biped robot's walk is a mixture of both statically stable and dynamically stable modes and relies on some degrees of static stability provided by large feet and by carefully controlling the center of gravity's position. The paper describes the design and experiment of a 7-DOF practical biped which is capable of ascending and descending stairs, and the synthesis of an efficient walking gait for ascending and descending stairs. Our biped robot is one of the few biped walking machines capable of ascending and descending stairs 相似文献

11.

基于T-S模糊再励学习的稳定双足步态生成算法 总被引：2，自引：0，他引：2

胡凌云孙增圻《机器人》2004,26(5):461-466

提出了一种基于T S模糊再励学习的稳定双足步态生成算法 .将再励学习引入T S模糊神经网学习增益参数 ,从而采用较少的模糊规则充分逼近了由ZMP曲线到髋关节轨迹的非线性变化关系 ,并将连续空间的多变量变化转换为一维独立动作增益的并行搜索 .仿真结果和双足机器人Luna的实验数据都验证了算法的可行性 . 相似文献

12.

深度强化学习及在路径规划中的研究进展

下载免费PDF全文

张荣霞武长旭孙同超赵增顺《计算机工程与应用》2021,57(19):44-56

路径规划的目的是让机器人在移动过程中既能避开障碍物,又能快速规划出最短路径。在分析基于强化学习的路径规划算法优缺点的基础上,引出能够在复杂动态环境下进行良好路径规划的典型深度强化学习DQN（Deep Q-learning Network）算法。深入分析了DQN算法的基本原理和局限性,对比了各种DQN变种算法的优势和不足,进而从训练算法、神经网络结构、学习机制、AC（Actor-Critic）框架的多种变形四方面进行了分类归纳。提出了目前基于深度强化学习的路径规划方法所面临的挑战和亟待解决的问题,并展望了未来的发展方向,可为机器人智能路径规划及自动驾驶等方向的发展提供参考。相似文献

13.

一种小样本支持向量机控制器在两足机器人步态控制的研究

王丽杨刘治赵之光章云《控制理论与应用》2011,28(8):1133-1139

神经网络等传统的机器学习方法是基于样本数目无穷大的经验风险最小化原则,这对非确定环境下有限样本的步态学习控制非常不利．针对两足机器人面临的非确定环境适应性难题,提出了一种基于支持向量机（SVM）的两足机器人步态控制方法,解决了小样本条件下的步态学习控制问题．提出了一种基于混合核的步态回归方法,仿真研究表明了这种方法比全局核和局部核分别单独用于步态学习时有优越性．SVM以踝关节及髋关节的轨迹作为输入,相应的满足ZMP判据的上体轨迹作为输出,利用有限的理想步态样本对机器人上体轨迹与腿部轨迹之间的动态运动关系进行学习,然后将训练好的SVM置入机器人控制系统,从而增强了步态控制的鲁棒性,有利于实现两足机器人在非结构环境下的稳定步行．仿真结果表明了所提方法的优越性．相似文献

14.

The motion control of a statically stable biped robot on an unevenfloor

Ching-Long Shih Chien-Jung Chiou 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》1998,28(2):244-249

This work studies the motion control of a statically stable biped robot having seven degrees of freedom. Statically stable walking of the biped robot is realized by maintaining the center-of-gravity inside the convex region of the supporting foot and/or feet during both single-support and double-support phases. The main points of this work are framing the stability in an easy and correct way, the design of a bipedal statically stable walker, and walking on sloping surfaces and stairs. 相似文献

15.

Dynamically-Stable Motion Planning for Humanoid Robots 总被引：9，自引：0，他引：9

James J. Kuffner Jr. Satoshi Kagami Koichi Nishiwaki Masayuki Inaba Hirochika Inoue 《Autonomous Robots》2002,12(1):105-118

We present an approach to path planning for humanoid robots that computes dynamically-stable, collision-free trajectories from full-body posture goals. Given a geometric model of the environment and a statically-stable desired posture, we search the configuration space of the robot for a collision-free path that simultaneously satisfies dynamic balance constraints. We adapt existing randomized path planning techniques by imposing balance constraints on incremental search motions in order to maintain the overall dynamic stability of the final path. A dynamics filtering function that constrains the ZMP (zero moment point) trajectory is used as a post-processing step to transform statically-stable, collision-free paths into dynamically-stable, collision-free trajectories for the entire body. Although we have focused our experiments on biped robots with a humanoid shape, the method generally applies to any robot subject to balance constraints (legged or not). The algorithm is presented along with computed examples using both simulated and real humanoid robots. 相似文献

16.

面向轨迹规划的深度强化学习奖励函数设计

下载免费PDF全文

李跃邵振洲赵振东施智平关永《计算机工程与应用》2020,56(2):226-232

现有基于深度强化学习的机械臂轨迹规划方法在未知环境中学习效率偏低,规划策略鲁棒性差。为了解决上述问题,提出了一种基于新型方位奖励函数的机械臂轨迹规划方法A-DPPO,基于相对方向和相对位置设计了一种新型方位奖励函数,通过降低无效探索,提高学习效率。将分布式近似策略优化（DPPO）首次用于机械臂轨迹规划,提高了规划策略的鲁棒性。实验证明相比现有方法,A-DPPO有效地提升了学习效率和规划策略的鲁棒性。相似文献

17.

双足机器人的鲁棒控制 总被引：4，自引：0，他引：4

周云龙徐心和《机器人》2004,26(4):357-361

利用拉格朗日法建立了双足机器人的动力学模型．在双脚支撑地时，运动学方程的约束造成双足机器人自由度的冗余，本文引入拉格朗日因子消除了双足机器人的冗余自由度．采用鲁棒控制法对双足机器人的轨迹跟踪进行控制，仿真实验结果证明, 鲁棒控制法对模型不精确或外部干扰对双足机器人产生的影响有很好的抑制作用，对双足机器人轨迹跟踪控制是有效的．相似文献

18.

Specifying and optimizing robotic motion for visual quality inspection

《Robotics and Computer》2021

Installation or even just modification of robot-supported production and quality inspection is a tedious process that usually requires full-time human expert engagement. The resulting parameters, e.g. robot velocities specified by an expert, are often subjective and produce suboptimal results. In this paper, we propose a new approach for specifying visual inspection trajectories based on CAD models of workpieces to be inspected. The expert involvement is required only to select – in a CAD system – the desired points on the inspection path along which the robot should move the camera. The rest of the approach is fully automatic. From the selected path data, the system computes temporal parametrization of the path, which ensures smoothness of the resulting robot trajectory for visual inspection. We then apply a new learning method for the optimization of robot speed along the specified path. The proposed approach combines iterative learning control and reinforcement learning. It takes a numerical estimate of image quality as input and produces the fastest possible motion that does not result in the degradation of image quality as output. In our experiments, the algorithm achieved up to 53% cycle time reduction from an initial, manually specified motion, without degrading the image quality. We show experimentally that the proposed algorithm achieves better results compared to some other policy learning approaches. The described approach is general and can be used with different types of learning and feedback signals. 相似文献

19.

Biped dynamic walking using reinforcement learning

Hamid Benbrahim Judy A. Franklin 《Robotics and Autonomous Systems》1997,22(3-4):283-302

This paper presents some results from a study of biped dynamic walking using reinforcement learning. During this study a hardware biped robot was built, a new reinforcement learning algorithm as well as a new learning architecture were developed. The biped learned dynamic walking without any previous knowledge about its dynamic model. The self scaling reinforcement (SSR) learning algorithm was developed in order to deal with the problem of reinforcement learning in continuous action domains. The learning architecture was developed in order to solve complex control problems. It uses different modules that consist of simple controllers and small neural networks. The architecture allows for easy incorporation of new modules that represent new knowledge, or new requirements for the desired task. 相似文献

20.

基于改进深度强化学习的三维环境路径规划

封硕舒红谢步庆《计算机应用与软件》2021,38(1):250-255

提出一种改进深度强化学习算法(NDQN),解决传统Q-learning算法处理复杂地形中移动机器人路径规划时面临的维数灾难.提出一种将深度学习融于Q-learning框架中,以网络输出代替Q值表的深度强化学习方法.针对深度Q网络存在严重的过估计问题,利用更正函数对深度Q网络中的评价函数进行改进.将改进深度强化学习算法与... 相似文献