Similar literature
 20 similar documents found
1.
Many motor skills in humanoid robotics can be learned using parametrized motor primitives. While successful applications to date have been achieved with imitation learning, most of the interesting motor learning problems are high-dimensional reinforcement learning problems. These problems are often beyond the reach of current reinforcement learning methods. In this paper, we study parametrized policy search methods and apply these to benchmark problems of motor primitive learning in robotics. We show that many well-known parametrized policy search methods can be derived from a general, common framework. This framework yields both policy gradient methods and expectation-maximization (EM) inspired algorithms. We introduce a novel EM-inspired algorithm for policy learning that is particularly well-suited for dynamical system motor primitives. We compare this algorithm, both in simulation and on a real robot, to several well-known parametrized policy search methods such as episodic REINFORCE, 'Vanilla' Policy Gradients with optimal baselines, episodic Natural Actor Critic, and episodic Reward-Weighted Regression. We show that the proposed method outperforms them on an empirical benchmark of learning dynamical system motor primitives both in simulation and on a real robot. We apply it in the context of motor learning and show that it can learn a complex Ball-in-a-Cup task on a real Barrett WAM™ robot arm.
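
The EM-inspired family mentioned in this abstract can be illustrated with a reward-weighted parameter update. The following is a minimal sketch only, not the algorithm proposed in the paper: the function names, the toy return function, the target vector, the exploration noise `sigma`, and the rollout count are all assumptions chosen for illustration. Each iteration perturbs the policy parameters, scores every perturbed rollout with a non-negative return, and moves the parameters by the return-weighted average of the perturbations.

```python
import numpy as np


def reward_weighted_update(theta, reward_fn, rng, n_rollouts=100, sigma=0.2):
    """One EM-style step: average the exploration noise, weighted by return."""
    # Sample one Gaussian perturbation of the parameters per rollout.
    eps = rng.normal(0.0, sigma, size=(n_rollouts, theta.size))
    # Returns must be non-negative so they can serve as weights.
    returns = np.array([reward_fn(theta + e) for e in eps])
    weights = returns / (returns.sum() + 1e-12)
    return theta + weights @ eps


def toy_return(theta, target=np.array([1.0, -0.5, 0.25])):
    """Hypothetical episodic return: largest when theta equals the target."""
    return float(np.exp(-np.sum((theta - target) ** 2)))


rng = np.random.default_rng(0)
theta = np.zeros(3)
for _ in range(200):
    theta = reward_weighted_update(theta, toy_return, rng)
```

No learning rate appears in this update, which is the usual practical argument for EM-style methods over gradient steps; the exploration noise `sigma` takes over the role of controlling the step size.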

2.
We contribute a method for improving the skill execution performance of a robot by complementing an existing algorithmic solution with corrective human demonstration. We apply the proposed method to the biped walking problem, which is a good example of a complex low-level skill due to the complicated dynamics of the walk process in a high-dimensional state and action space. We introduce an incremental learning approach to improve the Nao humanoid robot’s stability during walking. First, we identify, extract, and record a complete walk cycle from the motion of the robot as it executes a given walk algorithm as a black box. Second, we apply offline advice operators for improving the stability of the learned open-loop walk cycle. Finally, we present an algorithm to directly modify the recorded walk cycle using real-time corrective human demonstration. The demonstrator delivers the corrective feedback using a commercially available wireless game controller without touching the robot. Through the proposed algorithm, the robot learns a closed-loop correction policy for the open-loop walk by mapping the corrective demonstrations to the sensory readings received while walking. Experimental results demonstrate a significant improvement in the walk stability.

3.
4.
In this paper we propose a novel approach for intuitive and natural physical human–robot interaction in cooperative tasks. Through initial learning by demonstration, robot behavior naturally evolves into a cooperative task, where the human co-worker is allowed to modify both the spatial course of motion and the speed of execution at any stage. The main feature of the proposed adaptation scheme is that the robot adjusts its stiffness in path operational space, defined with a Frenet–Serret frame. Furthermore, the required dynamic capabilities of the robot are obtained by decoupling the robot dynamics in operational space, which is attached to the desired trajectory. Speed-scaled dynamic motion primitives are applied for the underlying task representation. The combination allows a human co-worker in a cooperative task to be less precise in parts of the task that require high precision, as the precision aspect is learned and provided by the robot. The user can also freely change the speed and/or the trajectory by simply applying force to the robot. The proposed scheme was experimentally validated on three illustrative tasks. The first task demonstrates novel two-stage learning by demonstration, where the spatial part of the trajectory is demonstrated independently from the velocity part. The second task shows how parts of the trajectory can be rapidly and significantly changed in one execution. The final experiment shows two Kuka LWR-4 robots in a bi-manual setting cooperating with a human while carrying an object.

5.
Integrated motion planning and control for maneuvering mobile robots under state and input constraints is a problem of vital practical importance in applications of mobile robots such as autonomous transportation. Those constraints arise naturally in practice due to the specifics of the robot's mechanical construction and the presence of obstacles in the motion environment. In contrast to approaches that focus on feedback control design under the assumption of a given reference motion, or on motion planning that neglects the subsequent feedback motion execution, we adopt a controller-driven motion planning paradigm, which has recently gained the attention of many researchers. It postulates the design of motion planning algorithms dedicated to specific feedback control policies, which compute a sequence of feedback control subtasks instead of classically planned open-loop controls or parametric paths. In this spirit, we propose a motion planning algorithm driven by the VFO (Vector Field Orientation) control law for the waypoint-following task. The presented analysis of the VFO control law reveals its beneficial properties, which are subsequently utilized to solve a generally nonlinear and non-convex optimal motion planning problem by formulating it as a mixed-integer linear program (MILP). The solution proposed in this paper yields a waypoint sequence, which is designed for execution by application of the VFO control law to drive a robot to a prescribed final configuration under an input constraint imposed by bounded curvature of robot motion and state constraints resulting from a convex decomposition of task space. Satisfaction of these constraints is guaranteed analytically and exactly, i.e., without utilization of numerical approximations. Moreover, for a given discrete set of possible waypoint orientations, the proposed algorithm computes plans optimal w.r.t. a given cost functional, which can be any convex linear combination of quantities such as robot path length, curvature of robot motion, distance to imposed state constraints, etc. Furthermore, the planning algorithm exploits the possibility of both forward and backward movement of the robot to allow maneuvering in demanding environments. Generated waypoint sequences are a compact representation of a motion plan, which can be immediately executed with the VFO controller without any additional post-processing. Validity of the proposed approach has been confirmed by simulation studies and experimental motion execution with a laboratory-scale mobile robot.

6.
In this article, we present a novel approach to learning efficient navigation policies for mobile robots that use visual features for localization. As fast movements of a mobile robot typically introduce inherent motion blur in the acquired images, the uncertainty of the robot about its pose increases in such situations. As a result, it cannot be ensured anymore that a navigation task can be executed efficiently since the robot’s pose estimate might not correspond to its true location. We present a reinforcement learning approach to determine a navigation policy to reach the destination reliably and, at the same time, as fast as possible. Using our technique, the robot learns to trade off velocity against localization accuracy and implicitly takes the impact of motion blur on observations into account. We furthermore developed a method to compress the learned policy via a clustering approach. In this way, the size of the policy representation is significantly reduced, which is especially desirable in the context of memory-constrained systems. Extensive simulated and real-world experiments carried out with two different robots demonstrate that our learned policy significantly outperforms policies using a constant velocity and more advanced heuristics. We furthermore show that the policy is generally applicable to different indoor and outdoor scenarios with varying landmark densities as well as to navigation tasks of different complexity.

7.
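To assist surgeons in performing delicate manipulations in retinal microsurgery by filtering tremor and improving precision and stability, a method for generating spatial motion constraints for a surgical robot, the virtual fixture (VF), is proposed. First, by introducing surgical-environment constraints and task constraints, adopting weighted, linearized multi-objective constraint conditions, and setting the objective function according to the user's input, the six virtual fixture primitives required in retinal microsurgery are constructed. On this basis, taking the generation of the remote center of motion virtual fixture (RCM VF) as an example, a method for realizing complex constrained motion is derived by combining constrained-motion primitives. Simulation results for each constrained-motion primitive algorithm and for the complex constrained-motion algorithm show that the surgical instrument can perform the specified constrained motion as defined by the virtual fixtures. Finally, with constrained-motion primitives introduced into each surgical step, surgical experiments were carried out on a ping-pong ball and on ex vivo porcine eyeballs. They show that, guided by the virtual fixtures, the retinal surgery robot can complete highly demanding surgical operations, which verifies the soundness and effectiveness of the proposed algorithm.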

8.
When describing robot motion with dynamic movement primitives (DMPs), goal (trajectory endpoint), shape and temporal scaling parameters are used. In reinforcement learning with DMPs, usually goals and temporal scaling parameters are predefined and only the weights for shaping a DMP are learned. Many tasks, however, exist where the best goal position is not known a priori and therefore has to be learned. Thus, here we specifically address the question of how to simultaneously combine goal and shape parameter learning. This is a difficult problem because learning of both parameters could easily interfere in a destructive way. We apply value function approximation techniques for goal learning and direct policy search methods for shape learning. Specifically, we use “policy improvement with path integrals” and “natural actor critic” for the policy search. We solve a learning-to-pour-liquid task in simulations as well as using a Pa10 robot arm. Results for learning from scratch, learning initialized by human demonstration, as well as for modifying the tool for the learned DMPs are presented. We observe that the combination of goal and shape learning is stable and robust within large parameter regimes. Learning converges quickly even in the presence of disturbances, which makes this combined method suitable for robotic applications.
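
The three parameter groups named in this abstract (goal, shape weights, temporal scaling) can be made concrete with a one-dimensional discrete DMP. The sketch below is an illustration under assumptions, not the authors' implementation: the gains `alpha`, `beta` and `alpha_x`, the basis-width heuristic, and the explicit Euler integration are common choices picked here for brevity, and `rollout_dmp` is a hypothetical name.

```python
import numpy as np


def rollout_dmp(y0, goal, weights, tau=1.0, dt=0.001,
                alpha=25.0, beta=6.25, alpha_x=3.0):
    """Integrate a 1-D discrete DMP with explicit Euler steps.

    goal    -- trajectory endpoint
    weights -- shape parameters of the forcing term
    tau     -- temporal scaling (movement duration in seconds)
    """
    n_basis = len(weights)
    # Gaussian basis centres placed along the decaying phase x in (0, 1].
    centres = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
    widths = n_basis ** 1.5 / centres / alpha_x  # heuristic (assumption)
    y, z, x = float(y0), 0.0, 1.0
    path = [y]
    for _ in range(int(round(tau / dt))):
        psi = np.exp(-widths * (x - centres) ** 2)
        # Forcing term: weighted basis mix, gated by the phase variable.
        forcing = x * (goal - y0) * np.dot(psi, weights) / (psi.sum() + 1e-10)
        dz = (alpha * (beta * (goal - y) - z) + forcing) / tau
        dy = z / tau
        dx = -alpha_x * x / tau
        z, y, x = z + dz * dt, y + dy * dt, x + dx * dt
        path.append(y)
    return np.array(path)


# With zero weights the DMP reduces to a critically damped reach to the goal.
plain = rollout_dmp(y0=0.0, goal=1.0, weights=np.zeros(10))
```

Changing `goal` moves the endpoint, changing `weights` reshapes the path between start and endpoint, and changing `tau` stretches the same motion in time, which is why the three groups can be learned by separate mechanisms.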

9.
Precise programming of robots for industrial tasks is time-consuming and inflexible to variations. To address this issue, teaching a kinematic behavior by demonstration and encoding it with dynamical systems that are robust with respect to perturbations is proposed. Given a kinematic behavior encoded by Dynamic Movement Primitives (DMP), this work proposes a passive control scheme for assisting kinesthetic modifications of the learned behavior in task variations. It uses penetrable spherical Virtual Fixtures (VFs) around the DMP’s virtual evolution that follows the teacher’s motion. The controller enables the user to haptically ‘inspect’ the spatial properties of the learned behavior in SE(3) and significantly modify it at any required segment, while facilitating the following of already learned segments. A demonstration within the VFs could signify that the kinematic behavior is taught correctly and could lead to autonomous execution, with the DMP generating the newly learned reference commands. The proposed control scheme is theoretically proved to be passive and experimentally validated with a KUKA LWR4+ robot. Results are compared with the case of using a gravity compensated robot agnostic of the previously learned task. It is shown that the time duration of teaching and the user’s cognitive load are reduced.

10.
This article describes a system, called Robel, for defining a robot controller that learns from experience very robust ways of performing a high-level task such as “navigate to”. The designer specifies a collection of skills, represented as hierarchical task networks, whose primitives are sensory-motor functions. The skills provide different ways of combining these sensory-motor functions to achieve the desired task. The specified skills are assumed to be complementary and to cover different situations. The relationship between control states, defined through a set of task-dependent features, and the appropriate skills for pursuing the task is learned as a finite observable Markov decision process (MDP). This MDP provides a general policy for the task; it is independent of the environment and characterizes the abilities of the robot for the task.

11.
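To address path planning for mobile robots in two-dimensional dynamic scenes, a novel path planning method, continuous dynamic movement primitives (CDMPs), is proposed. The method extends the conventional single dynamic movement primitive to continuous dynamic movement primitives: the weight sequence of each movement primitive is obtained by learning from demonstrated trajectories, and an unknown moving target is tracked by updating the phase variable. The method removes the mobile robot's dependence on an environment model and solves the path planning problem of tracking a moving target and avoiding dynamic obstacles in dynamic scenes. Finally, a series of simulation experiments verifies the feasibility of the algorithm. The simulation results show that, for mobile robot path planning in dynamic scenes, the CDMPs algorithm outperforms the conventional DMPs method in continuity and planning efficiency.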

12.
Recently, robot learning through deep reinforcement learning has incorporated various robot tasks through deep neural networks, without using specific control or recognition algorithms. However, this learning method is difficult to apply to the contact tasks of a robot, due to the exertion of excessive force from the random search process of reinforcement learning. Therefore, when applying reinforcement learning to contact tasks, solving the contact problem using an existing force controller is necessary. In this study, we propose a neural-network-based movement primitive (NNMP) that generates a continuous trajectory which can be transmitted to the force controller and that is learned through a deep deterministic policy gradient (DDPG) algorithm. In addition, an imitation learning algorithm suitable for NNMP is proposed such that trajectories similar to the demonstration trajectory are stably generated. The performance of the proposed algorithms was verified using a square peg-in-hole assembly task with a tolerance of 0.1 mm. The results confirm that the complicated assembly trajectory can be learned stably through NNMP by the proposed imitation learning algorithm, and that the assembly trajectory is improved by learning the proposed NNMP through the DDPG algorithm.

13.
14.
There is a growing need for adaptive robotic assembly systems that are fast to set up and reprogram when new products are introduced. The World Robot Challenge at World Robot Summit 2018 was centered around the challenge of setting up a flexible robotic assembly system aiming at changeover times below 1 day. This paper presents a method for programming robotic assembly tasks that was initiated in connection with the World Robot Challenge and that enables fast and easy setup of robotic insertion tasks.

We propose to program assembly tasks by demonstration, but instead of using the taught behavior directly, the demonstration is merged with assembly primitives to increase robustness. In contrast to other programming by demonstration approaches, we perform not only one demonstration but a sequence of four sub-demonstrations that are used to extract the desired robot trajectory in addition to parameters for the assembly primitive.

The proposed assembly strategy is compared to a standard dynamic movement primitive and experiments show that the proposed assembly strategy increases the robustness towards pose uncertainties and significantly reduces the applied forces during the execution of the assembly task.

16.
Fuentes, Olac; Nelson, Randal C. Machine Learning, 1998, 31(1-3): 223-237
We present a method for autonomous learning of dextrous manipulation skills with multifingered robot hands. We use heuristics derived from observations made on human hands to reduce the degrees of freedom of the task and make learning tractable. Our approach consists of learning and storing a few basic manipulation primitives for a few prototypical objects and then using an associative memory to obtain the required parameters for new objects and/or manipulations. The parameter space of the robot is searched using a modified version of the evolution strategy, which is robust to the noise normally present in real-world complex robotic tasks. Given the difficulty of modeling and simulating accurately the interactions of multiple fingers and an object, and to ensure that the learned skills are applicable in the real world, our system does not rely on simulation; all the experimentation is performed by a physical robot, in this case the 16-degree-of-freedom Utah/MIT hand. Experimental results show that accurate dextrous manipulation skills can be learned by the robot in a short period of time. We also show the application of the learned primitives to perform an assembly task and how the primitives generalize to objects that are different from those used during the learning phase.

17.
This paper describes a novel approach for incremental learning of human motion pattern primitives through online observation of human motion. The observed time series data stream is first stochastically segmented into potential motion primitive segments, based on the assumption that data belonging to the same motion primitive will have the same underlying distribution. The motion segments are then abstracted into a stochastic model representation and automatically clustered and organized. As new motion patterns are observed, they are incrementally grouped together into a tree structure, based on their relative distance in the model space. The tree leaves, which represent the most specialized learned motion primitives, are then passed back to the segmentation algorithm so that as the number of known motion primitives increases, the accuracy of the segmentation can also be improved. The combined algorithm is tested on a sequence of continuous human motion data that are obtained through motion capture, and demonstrates the performance of the proposed approach.

18.
In this paper we describe a machine learning approach for acquiring a model of a robot behaviour from raw sensor data. We are interested in automating the acquisition of behavioural models to provide a robot with an introspective capability. We assume that the behaviour of a robot in achieving a task can be modelled as a finite stochastic state transition system. Beginning with data recorded by a robot in the execution of a task, we use unsupervised learning techniques to estimate a hidden Markov model (HMM) that can be used both for predicting and explaining the behaviour of the robot in subsequent executions of the task. We demonstrate that it is feasible to automate the entire process of learning a high quality HMM from the data recorded by the robot during execution of its task. The learned HMM can be used both for monitoring and controlling the behaviour of the robot. The ultimate purpose of our work is to learn models for the full set of tasks associated with a given problem domain, and to integrate these models with a generative task planner. We want to show that these models can be used successfully in controlling the execution of a plan. However, this paper does not develop the planning and control aspects of our work, focussing instead on the learning methodology and the evaluation of a learned model. The essential property of the models we seek to construct is that the most probable trajectory through a model, given the observations made by the robot, accurately diagnoses, or explains, the behaviour that the robot actually performed when making these observations. In the work reported here we consider a navigation task. We explain the learning process, the experimental setup and the structure of the resulting learned behavioural models. We then evaluate the extent to which explanations proposed by the learned models accord with a human observer's interpretation of the behaviour exhibited by the robot in its execution of the task.
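
The "most probable trajectory through a model, given the observations" is what the Viterbi algorithm computes for an HMM. The abstract does not say which decoder the authors use, so the following is a generic sketch; the two-state model and its probabilities are a made-up toy example, not data from the paper.

```python
import numpy as np


def viterbi(start, trans, emit, observations):
    """Return the most probable hidden-state path of a discrete HMM.

    start[i]    -- probability of starting in state i
    trans[i][j] -- probability of moving from state i to state j
    emit[i][k]  -- probability of observing symbol k in state i
    All probabilities are assumed non-zero (log of 0 would give -inf).
    """
    log_start, log_trans, log_emit = np.log(start), np.log(trans), np.log(emit)
    n_steps, n_states = len(observations), len(start)
    score = np.empty((n_steps, n_states))        # best log-prob ending in each state
    back = np.zeros((n_steps, n_states), dtype=int)  # best predecessor state
    score[0] = log_start + log_emit[:, observations[0]]
    for t in range(1, n_steps):
        # cand[i, j]: best path ending in state i at t-1, then moving to j.
        cand = score[t - 1][:, None] + log_trans
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[:, observations[t]]
    # Backtrack from the best final state.
    path = [int(score[-1].argmax())]
    for t in range(n_steps - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


# Toy model (assumption): two hidden states, three observation symbols.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
decoded = viterbi(start, trans, emit, [0, 1, 2])  # -> [0, 0, 1]
```

Working in log space avoids numerical underflow on long observation sequences such as the sensor logs of a complete task execution.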

19.
We introduce the Self-Adaptive Goal Generation Robust Intelligent Adaptive Curiosity (SAGG-RIAC) architecture as an intrinsically motivated goal exploration mechanism which allows active learning of inverse models in high-dimensional redundant robots. This allows a robot to efficiently and actively learn distributions of parameterized motor skills/policies that solve a corresponding distribution of parameterized tasks/goals. The architecture makes the robot actively sample novel parameterized tasks in the task space, based on a measure of competence progress, each of which triggers low-level goal-directed learning of the motor policy parameters that solve it. For both learning and generalization, the system leverages regression techniques that infer the motor policy parameters corresponding to a given novel parameterized task, based on the previously learnt correspondences between policy and task parameters. We present experiments with high-dimensional continuous sensorimotor spaces in three different robotic setups: (1) learning the inverse kinematics in a highly-redundant robotic arm, (2) learning omnidirectional locomotion with motor primitives in a quadruped robot, and (3) an arm learning to control a fishing rod with a flexible wire. We show that (1) exploration in the task space can be a lot faster than exploration in the actuator space for learning inverse models in redundant robots; (2) selecting goals maximizing competence progress creates developmental trajectories driving the robot to progressively focus on tasks of increasing complexity and is statistically significantly more efficient than selecting tasks randomly, as well as more efficient than different standard active motor babbling methods; (3) this architecture allows the robot to actively discover which parts of its task space it can learn to reach and which part it cannot.

20.
Expressing security policies to govern distributed systems is a complex and error-prone task. Policies are hard to understand, often expressed with unfriendly syntax, making it difficult for security administrators and for business analysts to create intelligible specifications. We introduce the Hierarchical Policy Language for Distributed Systems (HiPoLDS), which has been designed to enable the specification of security policies in distributed systems in a concise, readable, and extensible way. HiPoLDS design focuses on decentralized execution environments under the control of multiple stakeholders. It represents policy enforcement through the use of distributed reference monitors, which control the flow of information between services. HiPoLDS allows the definition of both abstract and concrete policies, expressing respectively high-level properties required and concrete implementation details to be ultimately introduced into the service implementation.
