Similar Documents
20 similar documents found.
1.
王斐, 齐欢, 周星群, 王建辉. 《机器人》2018, 40(4): 551-559
To address the complexity of existing robot assembly-learning procedures and their high demands on programming skill, an implicit interaction approach based on fusing forearm surface electromyography (sEMG) signals with inertial measurements is proposed to realize robot programming by demonstration. Building on the assembly experience acquired from the demonstrator, and to improve adaptability to changes in the assembly object and environment, a multi-task deep deterministic policy gradient algorithm (M-DDPG) is proposed to correct the assembly parameters; reinforcement learning on top of the demonstration programming ensures that the robot executes the task stably. In the demonstration-programming experiments, an improved parallel convolutional neural network, termed 1-D PCNN (1D-PCNN), is proposed: 1-D convolution and pooling automatically extract features from the inertial and sEMG signals, improving the generalization and accuracy of gesture recognition. In the demonstration-reproduction experiments, a Gaussian mixture model (GMM) statistically encodes the demonstration data, and Gaussian mixture regression (GMR) reproduces the robot trajectory while suppressing noise points. Finally, with a Primesense Carmine camera, a tracking algorithm fusing the frame-difference method with a multi-feature-map kernelized correlation filter (MKCF) captures environmental changes along the X and Y axes, and two identical networks are trained in parallel by deep reinforcement learning over the continuous process. When the relative position of peg and hole changes, the manipulator automatically adjusts its end-effector position according to the generalized policy model obtained by reinforcement learning, achieving learning from demonstration for peg-in-hole assembly.
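The GMM/GMR step in this abstract is a standard statistical trajectory-reproduction pipeline. A minimal sketch, assuming a 1-D time input and scikit-learn's GaussianMixture; the function names, component count, and toy data are illustrative, not from the paper:

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from scipy.stats import norm

def fit_gmm(time, traj, n_components=5):
    """Fit a joint GMM over (t, x) pairs pooled from all demonstrations."""
    data = np.column_stack([time, traj])          # shape (N, 1 + dim_x)
    return GaussianMixture(n_components, covariance_type="full").fit(data)

def gmr(gmm, t_query):
    """Gaussian mixture regression: E[x | t] for each query time."""
    out = []
    for t in t_query:
        mu_t  = gmm.means_[:, 0]                  # temporal means per component
        var_t = gmm.covariances_[:, 0, 0]         # temporal variances
        # responsibility of each component for this time step
        h = gmm.weights_ * norm.pdf(t, mu_t, np.sqrt(var_t))
        h /= h.sum()
        # conditional mean per component: mu_x + S_xt / S_tt * (t - mu_t)
        cond = (gmm.means_[:, 1:] +
                gmm.covariances_[:, 1:, 0] / var_t[:, None] * (t - mu_t)[:, None])
        out.append(h @ cond)                      # responsibility-weighted average
    return np.array(out)

# toy usage: recover a smooth mean trajectory from noisy demonstrations
t = np.tile(np.linspace(0, 1, 100), 5)
x = np.sin(2 * np.pi * t) + 0.05 * np.random.randn(t.size)
model = fit_gmm(t, x)
smooth = gmr(model, np.linspace(0, 1, 100))
```

Conditioning the joint (t, x) Gaussians on t yields a smooth mean trajectory, which is how GMR suppresses noise points in the demonstrations.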

2.
Traditional control algorithms for coal-gangue-sorting manipulators, such as the grasping-function method and the Ferrari-method-based dynamic target grasping algorithm, depend on an accurate environment model and lack adaptivity in the control process, while conventional intelligent control algorithms such as the deep deterministic policy gradient (DDPG) suffer from overly large output actions and sparse rewards that are easily drowned out. This paper improves the neural-network structure and reward function of the standard DDPG algorithm and proposes an improved reinforcement-learning DDPG algorithm suited to a six-degree-of-freedom gangue-sorting manipulator. Once gangue enters the manipulator's workspace, the improved DDPG algorithm makes decisions from the gangue position returned by the sensors and the manipulator state, and outputs a set of joint-angle control commands to the motion controllers; the manipulator is driven toward the gangue according to the gangue position and the joint-angle commands, thereby accomplishing sorting. Simulation results show that, compared with standard DDPG, the improved algorithm is model-free and more general, adapts its grasping posture through interaction with the environment, is the first to converge to the maximum reward encountered during exploration, and yields a learned policy with better generalization, smaller joint-angle control outputs, and higher gangue-sorting efficiency.
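The two fixes this abstract names, bounding the output action and densifying the sparse reward, can be illustrated with generic DDPG ingredients. A hedged PyTorch sketch; the layer sizes, the max_delta bound, and the reward constants are assumptions, not the paper's design:

```python
import torch
import torch.nn as nn

class BoundedActor(nn.Module):
    """DDPG actor whose output is squashed so joint-angle commands stay small.

    max_delta bounds each joint increment (rad); the value is illustrative.
    """
    def __init__(self, state_dim, action_dim, max_delta=0.05):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),   # output in (-1, 1)
        )
        self.max_delta = max_delta

    def forward(self, state):
        return self.max_delta * self.net(state)      # bounded joint increment

def shaped_reward(dist_to_target, reached, step_penalty=0.01):
    """Dense shaping so a sparse success bonus is not drowned out:
    negative distance each step plus a terminal bonus on success."""
    return (100.0 if reached else 0.0) - dist_to_target - step_penalty
```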

3.
A deep-reinforcement-learning gait control method for biped robots on slopes
To improve the walking stability of a quasi-passive biped robot on slopes, this paper proposes a gait control method based on deep reinforcement learning. By analyzing the robot's hybrid dynamics model and its stable walking process, the state space, action space, episode procedure, and reward function are constructed. After continual learning with Ape-X DPG, an algorithm improved from DDPG, the quasi-passive biped robot achieves stable walking over a wide range of slopes. Simulations show that Ape-X DPG outperforms PER-based DDPG in both learning ability and convergence speed. Moreover, compared with energy-shaping control, the gait of the quasi-passive biped under Ape-X DPG converges faster and has a larger basin of convergence, demonstrating that Ape-X DPG effectively improves the walking stability of quasi-passive biped robots.
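Both baselines in this comparison build on prioritized experience replay. A minimal proportional-priority buffer as a generic sketch; the alpha/beta values are common defaults, not the paper's settings:

```python
import numpy as np

class ProportionalReplay:
    """Minimal proportional prioritized replay, the mechanism shared by
    PER-DDPG and the Ape-X family of learners."""
    def __init__(self, capacity, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.prio = [], []

    def add(self, transition, td_error=1.0):
        if len(self.data) >= self.capacity:        # drop oldest when full
            self.data.pop(0); self.prio.pop(0)
        self.data.append(transition)
        self.prio.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size, beta=0.4):
        p = np.asarray(self.prio); p = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=p)
        # importance-sampling weights correct the non-uniform sampling bias
        w = (len(self.data) * p[idx]) ** (-beta)
        w /= w.max()
        return [self.data[i] for i in idx], idx, w

    def update(self, idx, td_errors):
        for i, e in zip(idx, td_errors):
            self.prio[i] = (abs(e) + 1e-6) ** self.alpha
```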

4.
To address the exploration difficulty, and hence the low learning efficiency, faced by mobile robots that use deep reinforcement learning for autonomous navigation in cluttered, obstacle-dense environments, a trajectory-guided navigation policy optimization (TGNPO) algorithm is proposed. First, imitation learning trains an expert policy that simultaneously provides expert demonstration behavior and navigation-trajectory prediction, giving comprehensive guidance to the deep-reinforcement-learning training. Second, the navigation trajectory predicted by the expert policy is fused with the real-time image currently perceived by the robot, and a coordinate attention mechanism extracts the feature regions that guide the robot's future navigation, improving the learning performance of the navigation model. Finally, the expert-predicted trajectory constrains the robot's policy trajectory, reducing ineffective exploration and wrong decisions during navigation. Deploying the algorithm on simulation and physical platforms shows that, compared with existing state-of-the-art methods, it achieves significant advantages in learning efficiency and trajectory smoothness, demonstrating that it performs robot navigation tasks efficiently and safely.

5.
To enable a mobile robot to complete obstacle avoidance efficiently and politely in dense crowds, this paper proposes a deep-reinforcement-learning obstacle-avoidance algorithm for crowd environments. First, to remedy the limited learning capacity of the value network in deep reinforcement learning, the network is improved based on crowd interaction: an angular pedestrian grid extracts the interaction information among pedestrians, and an attention mechanism extracts each pedestrian's temporal features, learning the relative importance of the current state versus historical trajectory states and their joint influence on the robot's avoidance policy, which provides prior knowledge for the subsequent multilayer perceptron. Second, the reinforcement-learning reward function is designed according to human spatial behavior, and states with excessive changes in robot heading are penalized, meeting the requirement of comfortable avoidance. Finally, simulations verify the feasibility and effectiveness of the proposed algorithm in dense, complex crowd environments.
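The attention step described here, weighting each pedestrian's features by learned importance before pooling, can be sketched generically. The network below is an assumption-level illustration in PyTorch, not the paper's architecture:

```python
import torch
import torch.nn as nn

class PedestrianAttention(nn.Module):
    """Attention pooling over a variable number of pedestrian feature vectors,
    producing a single crowd feature for the value network."""
    def __init__(self, feat_dim, hidden=64):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, ped_feats):                 # (batch, n_peds, feat_dim)
        logits = self.score(ped_feats)            # (batch, n_peds, 1)
        weights = torch.softmax(logits, dim=1)    # relative importance of each pedestrian
        return (weights * ped_feats).sum(dim=1)   # (batch, feat_dim)
```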

6.
Research on car-following decision making is essential to the development of car-following driving technology. This paper studies low-speed car following with deep reinforcement learning and proposes an improved DDPG decision algorithm that augments DDPG with a control barrier function (CBF) controller for safety-compensating control and guided policy exploration; a reward function matching the goals of low-speed following is also designed. In the comparative experiments, a Gaussian-process model simulates the following platoon, and the DDPG algorithm and the improved DDPG-CBF algorithm each control the following behavior of one vehicle. The results show that, compared with DDPG, the improved DDPG-CBF algorithm guarantees the safety of following decisions more effectively, learns more efficiently, and is applicable to low-speed car-following scenarios.
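A CBF safety layer of the kind described typically clips the RL acceleration so that a barrier condition dh/dt >= -alpha*h holds. A closed-form 1-D car-following sketch, with an assumed time-headway barrier and illustrative parameter values:

```python
def cbf_safe_accel(a_rl, gap, v_ego, v_lead, d_min=2.0, tau=1.5, alpha=1.0):
    """Closed-form CBF filter for 1-D car following.

    Barrier h = gap - d_min - tau * v_ego (time-headway safety margin);
    enforcing dh/dt >= -alpha * h yields an upper bound on ego acceleration,
    so the RL action is clipped rather than trusted blindly.
    Parameter values are illustrative, not taken from the paper.
    """
    h = gap - d_min - tau * v_ego
    a_max = ((v_lead - v_ego) + alpha * h) / tau   # from dh/dt + alpha*h >= 0
    return min(a_rl, a_max)

# usage: filter the DDPG action before sending it to the vehicle
safe_a = cbf_safe_accel(a_rl=1.2, gap=8.0, v_ego=5.0, v_lead=4.5)
```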

7.
This paper is mainly concerned with the application of connectionist architectures for fast on-line learning of robot dynamic uncertainties at the executive hierarchical control level in robot contact tasks. The connectionist structures are integrated into non-learning control laws for contact tasks that enable stabilization and good tracking performance in position and force. It is shown that the problem of tracking a specified reference trajectory and a specified force profile with a preset quality of the transient response can be solved efficiently by means of a four-layer perceptron. The four-layer perceptron is part of a hybrid learning control algorithm: through synchronous training that uses fast learning rules and the available sensor information, it improves robot performance progressively in the minimum possible number of learning epochs. Simulation results of a deburring process with the MANUTEC r3 robot verify the effectiveness of the proposed learning control algorithms.

8.
王竣禾, 姜勇. 《智能系统学报》2023, 18(1): 2-11
To handle the complex, dynamic noise disturbances present in dynamic assembly environments, a deep-reinforcement-learning-based dynamic assembly algorithm is proposed. The contact forces over a time window serve as the state, with motion features extracted by a long short-term memory (LSTM) network; a sequence discount factor is defined that weights the sub-rewards of previous time steps to obtain the reward at the current step. The model outputs Cartesian-space displacements, and inverse kinematics moves the robot to the desired position. In addition, an improved neural-network parameter-update method based on temporal-difference learning with eligibility traces is proposed, which shortens model-training time. In the experiments, the model is first pretrained in a simple round-hole/peg environment and then trained further in the real scenario. The experiments show that the proposed method adapts well to the compliant, dynamic environments of dynamic assembly tasks.
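Two of the ingredients named here, LSTM feature extraction from a force window and a sequence-discounted reward, are easy to sketch. The dimensions and the exact weighting below are assumptions rather than the paper's specification:

```python
import torch
import torch.nn as nn

class ForceEncoder(nn.Module):
    """Encode a window of 6-D contact wrenches with an LSTM; the final hidden
    state serves as the assembly state feature (dimensions are assumptions)."""
    def __init__(self, wrench_dim=6, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(wrench_dim, hidden, batch_first=True)

    def forward(self, wrench_seq):                # (batch, T, 6)
        _, (h_n, _) = self.lstm(wrench_seq)
        return h_n[-1]                            # (batch, hidden)

def sequence_discounted_reward(sub_rewards, gamma_seq=0.9):
    """Weight earlier sub-rewards in the window by a sequence discount factor,
    mirroring the weighted-sum reward construction in the abstract."""
    T = len(sub_rewards)
    return sum(gamma_seq ** (T - 1 - t) * r for t, r in enumerate(sub_rewards))
```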

9.
The complexity in planning and control of robot compliance tasks mainly results from simultaneous control of both position and force and inevitable contact with environments. It is quite difficult to achieve accurate modeling of the interaction between the robot and the environment during contact. In addition, the interaction with the environment varies even for compliance tasks of the same kind. To deal with these phenomena, in this paper, we propose a reinforcement learning and robust control scheme for robot compliance tasks. A reinforcement learning mechanism is used to tackle variations among compliance tasks of the same kind. A robust compliance controller that guarantees system stability in the presence of modeling uncertainties and external disturbances is used to execute control commands sent from the reinforcement learning mechanism. Simulations based on deburring compliance tasks demonstrate the effectiveness of the proposed scheme.

10.
Path planning is a classic problem in artificial intelligence with wide applications in defense, road traffic, robot simulation, and many other fields. However, most existing path-planning algorithms are limited to a single environment type or a discrete action space, or require manually built models. Reinforcement learning is a machine-learning method that interacts with the environment on its own, without manually supplied training data, and the development of deep reinforcement learning has further strengthened its ability to solve real-world problems. This paper applies the deep deterministic policy gradient (DDPG) algorithm of deep reinforcement learning to path planning, achieving path planning in continuous spaces and complex environments.

11.

Deep learning techniques have shown success in learning from raw high-dimensional data in various applications. While deep reinforcement learning has recently been gaining popularity as a method to train intelligent agents, utilizing deep learning in imitation learning has been scarcely explored. Imitation learning can be an efficient method to teach intelligent agents by providing a set of demonstrations to learn from. However, generalizing to situations that are not represented in the demonstrations can be challenging, especially in 3D environments. In this paper, we propose a deep imitation learning method to learn navigation tasks from demonstrations in a 3D environment. The supervised policy is refined using active learning in order to generalize to unseen situations. This approach is compared to two popular deep reinforcement learning techniques: deep Q-networks (DQN) and asynchronous advantage actor-critic (A3C). The proposed method as well as the reinforcement learning methods employ deep convolutional neural networks and learn directly from raw visual input. Methods for combining learning from demonstrations and experience are also investigated. This combination aims to join the generalization ability of learning by experience with the efficiency of learning by imitation. The proposed methods are evaluated on 4 navigation tasks in a 3D simulated environment. Navigation tasks are a typical problem that is relevant to many real applications. They pose the challenge of requiring demonstrations of long trajectories to reach the target and only providing delayed rewards (usually terminal) to the agent. The experiments show that the proposed method can successfully learn navigation tasks from raw visual input, while the learning-from-experience methods fail to learn an effective policy. Moreover, it is shown that active learning can significantly improve the performance of the initially learned policy using a small number of active samples.


12.
Skill learning for robot polishing is attracting attention and becoming a hot topic. Current studies on skill learning in robot polishing mainly address trajectory skills, while models for learning force-relevant skills are less studied. A skill-learning method with good generalization and robustness is one of the elements worth investigating. In this study, a force-relevant skill-learning method called arc-length probabilistic movement primitives (AL-ProMP) is proposed to improve the efficiency of robot polishing-force planning. AL-ProMP learns the mapping between the contact force and the polishing trajectory, and its temporal scaling factor and force scaling factor give force planning better robustness in speed-scaling tasks and in polishing tasks across different scenarios. Speed scaling is an important property for adapting the polishing policy. To generalize polishing skills to different polishing tools in robotic disc-polishing of workpieces with unknown geometric models, a novel force scaling factor for different polishing discs is derived from the contact-force model. In addition, learning the polishing contact position provides the basis for polishing-trajectory generalization. Finally, experiments verify that the proposed method is effective in learning and generalizing the demonstrated skills and improves the polished surface quality of workpieces with unknown geometric models.
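Setting the arc-length and force-scaling specifics aside, the ProMP core that AL-ProMP builds on can be sketched: fit basis weights per demonstration, then model a Gaussian over the weights. The basis count, regularization, and 1-D output below are assumptions, not the paper's full AL-ProMP:

```python
import numpy as np

def rbf_basis(z, n_basis=15, width=0.02):
    """Normalized RBF basis over a phase variable z in [0, 1]; replacing time
    with normalized arc length as the phase is the ProMP-style idea here."""
    centers = np.linspace(0, 1, n_basis)
    phi = np.exp(-(z[:, None] - centers[None, :]) ** 2 / (2 * width))
    return phi / phi.sum(axis=1, keepdims=True)

def fit_promp(demos, n_basis=15, lam=1e-6):
    """Fit per-demo basis weights by ridge regression, then a Gaussian
    over the weights."""
    W = []
    for traj in demos:                            # each traj: (T,) positions
        z = np.linspace(0, 1, len(traj))
        Phi = rbf_basis(z, n_basis)
        w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(n_basis), Phi.T @ traj)
        W.append(w)
    W = np.array(W)
    return W.mean(axis=0), np.cov(W.T)            # weight mean and covariance

def mean_trajectory(w_mean, n_steps=200):
    """Evaluating on any phase grid gives temporal scaling for free."""
    z = np.linspace(0, 1, n_steps)
    return rbf_basis(z, len(w_mean)) @ w_mean
```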

13.
Artificial intelligence is widely used in robot control, and robot control algorithms are gradually shifting from model-driven to data-driven. Deep reinforcement learning can perceive and decide in complex environments and can solve manipulator control problems with high-dimensional, continuous state spaces. However, the data-driven training process of current deep reinforcement learning depends heavily on GPU computing power, and the training time cost is high. This paper proposes a fast training method for manipulator control based on deep reinforcement learning that trains first on a simplified (2D) model and then on the full (3D) model. The deep deterministic policy gradient algorithm replaces the inverse-kinematics solver of traditional manipulator control, driving the end effector to the target position directly through data-driven training and thereby reducing training time. Different configurations of the state vector and the reward function are also examined. The final trained model is implemented and verified on a real manipulator; the results show that its control performance meets the requirements of item-sorting applications and that, compared with training directly on the 3D model, the average training time is shortened by nearly 52%.

14.
This paper suggests a solution for peg-in-hole problems involving complex geometry. Successful completion of peg-in-hole assembly tasks depends on a geometry-based approach for determining the guiding direction, on fine contact-motion control, and on a reference force for the alignment/insertion process. Therefore, in this study we propose a peg-in-hole strategy for complex-shaped parts based on a guidance algorithm. The guidance algorithm is inspired by the study of human motion patterns; that is, the assembly-direction selection process and the maximum force threshold are determined by observing humans performing similar actions. To carry out assembly tasks, an assembly direction is chosen using the spatial arrangement and geometric information of the complex-shaped parts, and the required force is determined by kinesthetic teaching with a Gaussian mixture model. In addition, an impedance controller using an admittance filter is implemented to achieve stable contact motion on a position-control-based industrial robot. The performance of the proposed assembly strategy was evaluated in experiments using arbitrarily complex-shaped parts with different initial configurations.
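An admittance filter of the kind used here turns a measured wrench into a position offset that a position-controlled industrial robot can track. A minimal discrete sketch with illustrative gains:

```python
import numpy as np

class AdmittanceFilter:
    """Discrete admittance filter M*x'' + D*x' + K*x = F_ext that converts a
    measured contact force into a compliant position offset for a
    position-controlled robot. Gains are illustrative; tune per axis."""
    def __init__(self, m=1.0, d=50.0, k=200.0, dt=0.002, dim=3):
        self.m, self.d, self.k, self.dt = m, d, k, dt
        self.x = np.zeros(dim)       # compliant position offset
        self.v = np.zeros(dim)       # offset velocity

    def step(self, f_ext):
        acc = (f_ext - self.d * self.v - self.k * self.x) / self.m
        self.v += acc * self.dt      # semi-implicit Euler integration
        self.x += self.v * self.dt
        return self.x                # add to the nominal reference pose

# usage: offset = filt.step(measured_force); cmd = ref_pose + offset
```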

15.
This paper presents a discrete learning controller for vision-guided robot trajectory imitation with no prior knowledge of the camera-robot model. A teacher demonstrates a desired movement in front of a camera, and then, the robot is tasked to replay it by repetitive tracking. The imitation procedure is considered as a discrete tracking control problem in the image plane, with an unknown and time-varying image Jacobian matrix. Instead of updating the control signal directly, as is usually done in iterative learning control (ILC), a series of neural networks are used to approximate the unknown Jacobian matrix around every sample point in the demonstrated trajectory, and the time-varying weights of local neural networks are identified through repetitive tracking, i.e., indirect ILC. This makes repetitive segmented training possible, and a segmented training strategy is presented to retain the training trajectories solely within the effective region for neural network approximation. However, a singularity problem may occur if an unmodified neural-network-based Jacobian estimation is used to calculate the robot end-effector velocity. A new weight modification algorithm is proposed which ensures invertibility of the estimation, thus circumventing the problem. Stability is further discussed, and the relationship between the approximation capability of the neural network and the tracking accuracy is obtained. Simulations and experiments are carried out to illustrate the validity of the proposed controller for trajectory imitation of robot manipulators with unknown time-varying Jacobian matrices.

16.
Singularity-free indirect iterative learning control with application to robot motion imitation
For a fairly broad class of nonlinear systems, an indirect iterative learning scheme is proposed for finite-time trajectory tracking. A least-squares algorithm identifies a linearized model of the nonlinear system from the history of repeated tracking. A piecewise learning scheme guarantees that learning control always operates within the region where the linear approximation is valid. The paper also addresses how to avoid control singularity during learning and proposes an efficient parameter-correction method that keeps the determinant of the estimated input-coupling matrix away from zero. The scheme is applied to robot motion imitation with unknown robot and camera models, without encountering any singularity problem. This constitutes a new robot programming method in which a camera replaces conventional hand-written programs.

17.
Robot position/force control provides an interaction scheme between the robot and the environment. When the environment is unknown, learning algorithms are needed, but the learning space and learning time are large. To balance learning accuracy against learning time, we propose a hybrid reinforcement learning method that operates in both discrete and continuous domains. Discrete-time learning has poorer accuracy but needs less learning time; continuous-time learning is slow but more precise. The hybrid method learns the optimal contact force while minimizing the position error in the unknown environment. Convergence of the proposed learning algorithm is proven. Real-time experiments are carried out using a pan-and-tilt robot and a force/torque sensor.

18.
Many motor skills in humanoid robotics can be learned using parametrized motor primitives. While successful applications to date have been achieved with imitation learning, most of the interesting motor learning problems are high-dimensional reinforcement learning problems. These problems are often beyond the reach of current reinforcement learning methods. In this paper, we study parametrized policy search methods and apply these to benchmark problems of motor primitive learning in robotics. We show that many well-known parametrized policy search methods can be derived from a general, common framework. This framework yields both policy gradient methods and expectation-maximization (EM) inspired algorithms. We introduce a novel EM-inspired algorithm for policy learning that is particularly well-suited for dynamical system motor primitives. We compare this algorithm, both in simulation and on a real robot, to several well-known parametrized policy search methods such as episodic REINFORCE, "Vanilla" Policy Gradients with optimal baselines, episodic Natural Actor Critic, and episodic Reward-Weighted Regression. We show that the proposed method outperforms them on an empirical benchmark of learning dynamical system motor primitives both in simulation and on a real robot. We apply it in the context of motor learning and show that it can learn a complex Ball-in-a-Cup task on a real Barrett WAM robot arm.
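The EM-inspired family discussed here includes episodic Reward-Weighted Regression, whose update is compact enough to sketch: returns are exponentiated into pseudo-weights and the Gaussian search distribution is refit. The temperature beta and the toy objective are illustrative:

```python
import numpy as np

def rwr_update(theta_mean, theta_std, reward_fn, n_samples=50, beta=5.0):
    """One episodic Reward-Weighted Regression step: sample parameters,
    turn returns into pseudo-probabilities, refit the Gaussian search
    distribution (a textbook EM-style update; beta is a temperature)."""
    thetas = theta_mean + theta_std * np.random.randn(n_samples, theta_mean.size)
    R = np.array([reward_fn(th) for th in thetas])
    w = np.exp(beta * (R - R.max()))             # shift for numerical stability
    w /= w.sum()
    new_mean = w @ thetas                        # reward-weighted mean
    new_var = w @ (thetas - new_mean) ** 2       # reward-weighted variance
    return new_mean, np.sqrt(new_var + 1e-8)

# toy usage: maximize -||theta - target||^2
target = np.array([1.0, -2.0])
mean, std = np.zeros(2), np.ones(2)
for _ in range(100):
    mean, std = rwr_update(mean, std, lambda th: -np.sum((th - target) ** 2))
```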

19.
To reduce the total user cost in a multi-edge-server, multi-user system, this paper combines deep deterministic policy gradient (DDPG), long short-term memory (LSTM) networks, and an attention mechanism into a DDPG-based deep-reinforcement-learning offloading algorithm (A-DDPG). The algorithm adopts a binary offloading policy and takes into account the latency sensitivity of tasks, the limited server load, and task migration, offloading tasks adaptively to minimize the total loss caused by timeouts of latency-sensitive tasks. Both latency and energy consumption are considered, with different weights assigned to resolve the unfairness caused by heterogeneous user types; the task-offloading problem is formulated to minimize the total cost of completion latency and energy consumption over all tasks, with target-server selection and the amount of offloaded data as the learning objectives. Experimental results show that A-DDPG has good stability and convergence: compared with the DDPG and twin delayed deep deterministic policy gradient (TD3) algorithms, A-DDPG reduces total user cost by 27% and 26.66%, respectively, and reaches the optimal task-failure rate 57.14% and 40% earlier, achieving better results in reward, total cost, and task-failure rate.

20.
Robot manipulation-skill learning based on deep reinforcement learning has become a research hotspot, but the sparse rewards of such tasks make learning inefficient. This paper proposes a meta-learning-based hindsight experience replay method with dual experience pools and adaptive soft updates, and applies it to manipulation-skill learning under sparse rewards. First, building on soft-update hindsight experience replay, a simplified value function that raises algorithm efficiency is derived, and an adaptive temperature-adjustment strategy is added that dynamically tunes the temperature parameter to different task environments. Second, following the idea of meta-learning, experience replay is partitioned and the proportion of real sampled data to constructed virtual data is adjusted dynamically during training, yielding the DAS-HER method. Then, DAS-HER is applied to robot manipulation-skill learning, building a general framework for skill learning in sparse-reward environments. Finally, comparative experiments on eight tasks in the Mujoco Fetch and Hand environments show that the proposed algorithm outperforms the other algorithms in both training efficiency and success rate.
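The hindsight-relabeling core that DAS-HER extends can be sketched with the standard "future" strategy; the episode tuple layout and the sparse reward below are assumptions, and the paper's adaptive-temperature and dual-pool extensions are not shown:

```python
import numpy as np

def her_relabel(episode, reward_fn, k=4):
    """Hindsight relabeling ('future' strategy): for each transition, resample
    k goals from goals achieved later in the episode and recompute the reward,
    turning sparse failures into useful training signal."""
    relabeled = []
    T = len(episode)
    # each entry: (state, action, next_state, goal, achieved_goal_after_step)
    for t, (s, a, s_next, goal, achieved) in enumerate(episode):
        relabeled.append((s, a, s_next, goal, reward_fn(achieved, goal)))
        future = np.random.randint(t, T, size=min(k, T - t))
        for j in future:
            new_goal = episode[j][4]             # a goal actually achieved later
            relabeled.append((s, a, s_next, new_goal,
                              reward_fn(achieved, new_goal)))
    return relabeled

# sparse reward: 0 on success, -1 otherwise
reward_fn = lambda achieved, goal: 0.0 if np.allclose(achieved, goal, atol=0.05) else -1.0
```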

