Similar articles
20 similar articles found (search time: 46 ms)
1.
Task demonstration is an effective technique for developing robot motion control policies. As tasks become more complex, however, demonstration becomes more difficult. In this work, we introduce an algorithm that uses corrective human feedback to build a policy able to perform a novel task by combining simpler policies learned from demonstration. While some demonstration-based learning approaches do adapt policies with execution experience, few provide corrections within low-level motion control domains or enable the linking of multiple demonstrated policies. Here we introduce Feedback for Policy Scaffolding (FPS), an algorithm that first evaluates and corrects the execution of motion primitive policies learned from demonstration. The algorithm then corrects and enables the execution of a more complex task constructed from these primitives. Key advantages of building a policy from demonstrated primitives are the potential for primitive policy reuse within multiple complex policies, the faster development of these policies, and the development of complex policies for which full demonstration is difficult. Policy reuse under our algorithm is assisted by human teacher feedback, which also contributes to improved policy performance. Within a simulated robot motion control domain we validate that, using FPS, a policy for a novel task is successfully built from motion primitives learned from demonstration. We show that feedback both aids and enables policy development, improving policy performance in success, speed and efficiency.

2.
Learning and interacting in human-robot domains   (total citations: 1; self-citations: 0; citations by others: 1)
We focus on a robotic domain in which a human acts both as a teacher and as a collaborator to a mobile robot. First, we present an approach that allows a robot to learn task representations from its own experiences of interacting with a human. While most approaches to learning from demonstration have focused on acquiring policies (i.e., collections of reactive rules), we demonstrate a mechanism that constructs high-level task representations based on the robot's underlying capabilities. Next, we describe a generalization of the framework that allows a robot to interact with humans in order to handle unexpected situations that can occur during task execution. Without using explicit communication, the robot is able to engage a human to aid it during certain parts of task execution. We demonstrate our concepts with a mobile robot learning various tasks from a human and, when needed, interacting with a human to get help performing them.

3.
In this paper a unified motion control strategy for the waypoint-following task realized by a differentially driven robot is presented. It is assumed that the vehicle moves with limited velocities and accelerations in order to reduce excessive slip and skid effects. To include these operational constraints, a motion planner is combined with a universal stabilizer that takes advantage of transverse functions. To improve tracking precision, translated transverse functions are deployed and a new adaptive technique for controller tuning is proposed. During the motion planning stage, an auxiliary trajectory connecting points in the configuration space and satisfying the assumed phase constraints is generated. The resulting motion execution system has been implemented on a laboratory-scale skid-steering mobile robot, which served as a platform for experimental validation of the presented algorithms.

4.
5.
Advanced Robotics, 2013, 27(2): 137-163
This paper focuses on dexterity and versatility in pinching a rectangular object with a pair of robot fingers based on sensory feedback. In human pinching motion, it is possible to execute concurrent pinching and orientation control quickly and precisely using only the thumb and index finger. However, it is not easy for robot fingers to perform such imposed tasks agilely and simultaneously. In robotic grasping, performing such multiple tasks concurrently retards the speed of convergence in the execution of the overall task. This means that increasing versatility by imposing additional tasks may degrade dexterity in the execution of each task. In this paper it is shown that both dexterity and versatility in the execution of such imposed tasks can be enhanced remarkably, without any deterioration of dexterity in each individual task, by using a sensory feedback method based on the idea of role-sharing joint control, which comes from observation of the functional role of each human finger joint.

6.
We propose an approach to efficiently teach robots how to perform dynamic manipulation tasks in cooperation with a human partner. The approach utilises human sensorimotor learning ability where the human tutor controls the robot through a multi-modal interface to make it perform the desired task. During the tutoring, the robot simultaneously learns the action policy of the tutor and through time gains full autonomy. We demonstrate our approach by an experiment where we taught a robot how to perform a wood sawing task with a human partner using a two-person cross-cut saw. The challenge of this experiment is that it requires precise coordination of the robot's motion and compliance according to the partner's actions. To transfer the sawing skill from the tutor to the robot we used Locally Weighted Regression for trajectory generalisation, and adaptive oscillators for adaptation of the robot to the partner's motion.
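The trajectory-generalisation step mentioned in this abstract can be sketched with a minimal locally weighted regression: each training point is weighted by a Gaussian kernel centred on the query, and a weighted least-squares line is fit locally. The data, bandwidth and stroke profile below are illustrative assumptions, not the paper's actual sawing data.

```python
import numpy as np

def lwr_predict(x_query, X, Y, bandwidth=0.05):
    """Locally weighted linear regression at a single query point."""
    w = np.exp(-0.5 * ((X - x_query) / bandwidth) ** 2)  # Gaussian kernel weights
    A = np.stack([np.ones_like(X), X], axis=1)           # design matrix [1, x]
    W = np.diag(w)
    # Solve the weighted normal equations A^T W A beta = A^T W y
    beta = np.linalg.solve(A.T @ W @ A, A.T @ W @ Y)
    return beta[0] + beta[1] * x_query

# Illustrative "demonstration" data: one period of a sawing-like stroke
X = np.linspace(0.0, 1.0, 50)
Y = np.sin(2 * np.pi * X)

print(float(lwr_predict(0.25, X, Y)))  # close to the true peak sin(pi/2) = 1
```

Because each prediction reuses all demonstration points with local weights, the fit smooths noise while still following the local shape of the stroke.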

7.
In this paper we describe a machine learning approach for acquiring a model of a robot behaviour from raw sensor data. We are interested in automating the acquisition of behavioural models to provide a robot with an introspective capability. We assume that the behaviour of a robot in achieving a task can be modelled as a finite stochastic state transition system. Beginning with data recorded by a robot in the execution of a task, we use unsupervised learning techniques to estimate a hidden Markov model (HMM) that can be used both for predicting and explaining the behaviour of the robot in subsequent executions of the task. We demonstrate that it is feasible to automate the entire process of learning a high quality HMM from the data recorded by the robot during execution of its task. The learned HMM can be used both for monitoring and controlling the behaviour of the robot. The ultimate purpose of our work is to learn models for the full set of tasks associated with a given problem domain, and to integrate these models with a generative task planner. We want to show that these models can be used successfully in controlling the execution of a plan. However, this paper does not develop the planning and control aspects of our work, focussing instead on the learning methodology and the evaluation of a learned model. The essential property of the models we seek to construct is that the most probable trajectory through a model, given the observations made by the robot, accurately diagnoses, or explains, the behaviour that the robot actually performed when making these observations. In the work reported here we consider a navigation task. We explain the learning process, the experimental setup and the structure of the resulting learned behavioural models. We then evaluate the extent to which explanations proposed by the learned models accord with a human observer's interpretation of the behaviour exhibited by the robot in its execution of the task.
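The core use of the learned HMM described above, finding the most probable hidden-state trajectory that explains a sequence of observations, can be illustrated with a small Viterbi decoder. The two behaviour states and all probabilities below are invented toy numbers for illustration, not the paper's learned model.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state trajectory for an observation sequence."""
    # Each layer maps state -> (probability of best path ending here, that path)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for o in obs[1:]:
        layer = {}
        for s in states:
            prob, path = max(
                (V[-1][prev][0] * trans_p[prev][s] * emit_p[s][o], V[-1][prev][1])
                for prev in states
            )
            layer[s] = (prob, path + [s])
        V.append(layer)
    return max(V[-1].values())[1]

# Toy navigation model: hidden behaviour states, discretised sonar readings
states = ("corridor", "doorway")
start_p = {"corridor": 0.8, "doorway": 0.2}
trans_p = {"corridor": {"corridor": 0.7, "doorway": 0.3},
           "doorway": {"corridor": 0.4, "doorway": 0.6}}
emit_p = {"corridor": {"near": 0.1, "far": 0.9},
          "doorway": {"near": 0.8, "far": 0.2}}

print(viterbi(["far", "far", "near"], states, start_p, trans_p, emit_p))
# -> ['corridor', 'corridor', 'doorway']
```

The decoded state sequence is exactly the kind of "explanation" the abstract refers to: a diagnosis of which behaviour the robot was in at each step, given what it sensed.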

8.
Recently, robot learning through deep reinforcement learning has incorporated various robot tasks through deep neural networks, without using specific control or recognition algorithms. However, this learning method is difficult to apply to the contact tasks of a robot, due to the exertion of excessive force during the random search process of reinforcement learning. Therefore, when applying reinforcement learning to contact tasks, solving the contact problem with an existing force controller is necessary. In this study we propose a neural-network-based movement primitive (NNMP) that generates a continuous trajectory which can be transmitted to the force controller and learned through a deep deterministic policy gradient (DDPG) algorithm. In addition, an imitation learning algorithm suitable for the NNMP is proposed such that trajectories similar to the demonstration trajectory are stably generated. The performance of the proposed algorithms was verified using a square peg-in-hole assembly task with a tolerance of 0.1 mm. The results confirm that the complicated assembly trajectory can be learned stably through the NNMP by the proposed imitation learning algorithm, and that the assembly trajectory is improved by learning the proposed NNMP through the DDPG algorithm.

9.
The recent demographic trend across developed nations shows a dramatic increase in the aging population, falling fertility rates and a shortage of caregivers. Hence, the demand for service robots to assist with dressing, an essential Activity of Daily Living (ADL), is increasing rapidly. Robotic clothing assistance is a challenging task since the robot has to deal with two demanding tasks simultaneously: (a) non-rigid and highly flexible cloth manipulation and (b) safe human-robot interaction while assisting humans whose posture may vary during the task. Humans, on the other hand, can deal with these tasks rather easily. In this paper, we propose a framework for robotic clothing assistance by imitation learning from a human demonstration to a compliant dual-arm robot. In this framework, we divide the dressing task into three phases: a reaching phase, an arm dressing phase, and a body dressing phase. We model the arm dressing phase as a global trajectory modification using Dynamic Movement Primitives (DMP), while we model the body dressing phase as a local trajectory modification applying a Bayesian Gaussian Process Latent Variable Model (BGPLVM). We show that the proposed framework, developed towards assisting the elderly, generalizes to various people and successfully performs a sleeveless shirt dressing task. We also present participants' feedback from a public demonstration at the International Robot Exhibition (iREX) 2017. To our knowledge, this is the first work performing a full dressing of a sleeveless shirt on a human subject with a humanoid robot.

10.
In this article, a learning framework that enables robotic arms to replicate new skills from human demonstration is proposed. The learning framework uses online human motion data acquired with wearable devices as an interactive interface for conveying the intended motion to the robot in an efficient and user-friendly way. This approach offers human tutors the ability to control all joints of the robotic manipulator in real time and to achieve complex manipulation. The robotic manipulator is controlled remotely with our low-cost wearable devices, allowing easy calibration and continuous motion mapping. We believe that our approach can improve human-robot skill learning, adaptability, and the sensitivity of the proposed human-robot interaction for flexible task execution, thereby enabling skill transfer and repeatability without complex coding skills.

11.
Advanced Robotics, 2013, 27(2): 229-244
In this paper a learning method is described which enables a conventional industrial robot to accurately execute the teach-in path in the presence of dynamical effects and high speed. After training, the system is capable of generating positional commands that, in combination with the standard robot controller, lead the robot along the desired trajectory. The mean path deviations are reduced by a factor of 20 for our test configuration. For low-speed motion the learned controller's accuracy is in the range of the resolution of the positional encoders. The learned controller does not depend on specific trajectories. It acts as a general controller that can be used for non-recurring tasks as well as for sensor-based planned paths. For repetitive control tasks, accuracy can be increased even further. These improvements are achieved by a three-level structure estimating a simple process model, optimal a posteriori commands, and a suitable feedforward controller, the latter including neural networks for the representation of nonlinear behaviour. The learning system is demonstrated in experiments with a Manutec R2 industrial robot. After training with only two sample trajectories, the learned control system is applied to other, totally different paths, which are executed with high precision as well.
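The idea of estimating a simple process model from recorded data and then deriving feedforward commands from it can be sketched on a toy linear first-order plant. The plant, signals and dimensions below are assumptions for illustration only; the paper additionally uses neural networks to capture nonlinear behaviour.

```python
import numpy as np

# True plant, unknown to the learner: y[t+1] = a*y[t] + b*u[t]
a_true, b_true = 0.9, 0.5

def plant(u, y0=0.0):
    y = [y0]
    for ut in u:
        y.append(a_true * y[-1] + b_true * ut)
    return np.array(y)

# 1) Record a training run and estimate a simple process model (least squares)
rng = np.random.default_rng(1)
u_train = rng.uniform(-1.0, 1.0, 200)
y_train = plant(u_train)
A = np.stack([y_train[:-1], u_train], axis=1)
a_hat, b_hat = np.linalg.lstsq(A, y_train[1:], rcond=None)[0]

# 2) Compute feedforward commands that track a reference under the learned model
r = np.sin(np.linspace(0.0, 2.0 * np.pi, 100))
y_pred, u_ff = 0.0, []
for rt in r:
    u_ff.append((rt - a_hat * y_pred) / b_hat)   # invert the learned model
    y_pred = a_hat * y_pred + b_hat * u_ff[-1]

# 3) Execute the commands on the true plant
y_exec = plant(u_ff)[1:]
print(float(np.max(np.abs(r - y_exec))))  # essentially zero tracking error
```

On this noiseless linear toy, the least-squares model is exact, so the inverted feedforward commands track the reference to machine precision; with noise or nonlinearity, a learned (e.g. neural) correction term plays the role the abstract describes.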

12.
In minimally invasive surgery, tools go through narrow openings and manipulate soft organs to perform surgical tasks. There are limitations in current robot-assisted surgical systems due to the rigidity of robot tools. The aim of the STIFF-FLOP European project is to develop a soft robotic arm to perform surgical tasks. The flexibility of the robot allows the surgeon to move within organs to reach remote areas inside the body and perform challenging procedures in laparoscopy. This article addresses the problem of designing learning interfaces enabling the transfer of skills from human demonstration. Robot programming by demonstration encompasses a wide range of learning strategies, from simple mimicking of the demonstrator's actions to the higher level imitation of the underlying intent extracted from the demonstrations. By focusing on this last form, we study the problem of extracting an objective function explaining the demonstrations from an over-specified set of candidate reward functions, and using this information for self-refinement of the skill. In contrast to inverse reinforcement learning strategies that attempt to explain the observations with reward functions defined for the entire task (or a set of pre-defined reward profiles active for different parts of the task), the proposed approach is based on context-dependent reward-weighted learning, where the robot can learn the relevance of candidate objective functions with respect to the current phase of the task or encountered situation. The robot then exploits this information for skills refinement in the policy parameters space. The proposed approach is tested in simulation with a cutting task performed by the STIFF-FLOP flexible robot, using kinesthetic demonstrations from a Barrett WAM manipulator.

13.
We present a novel method for a robot to interactively learn, while executing, a joint human-robot task. We consider collaborative tasks realized by a team of a human operator and a robot helper that adapts to the human's task execution preferences. Different human operators can have different abilities, experiences, and personal preferences, so that a particular allocation of activities in the team is preferred over another. Our main goal is to have the robot learn the task and the preferences of the user to provide a more efficient and acceptable joint task execution. We cast concurrent multi-agent collaboration as a semi-Markov decision process and show how to model the team behavior and learn the expected robot behavior. We further propose an interactive learning framework, and we evaluate it both in simulation and on a real robotic setup to show that the system can effectively learn and adapt to human expectations.

14.
15.
A knowledge-based framework to support task-level programming and operational control of robots is described. Our basic intention is to enhance the intelligence of a robot control system so that it may carefully coordinate the interactions among discrete, asynchronous and concurrent events under the constraints of action precedence and resource allocation. We do this by integrating both off-line and on-line planning capabilities in a single framework. The off-line phase is equipped with proper languages for describing workbenches, specifying tasks, and soliciting knowledge from the user to support the execution of robot tasks. A static planner is included in this phase to conduct static planning, which develops local plans for various specific tasks. The on-line phase is designed as a dynamic control loop for the robot system. It employs a dynamic planner to tackle any contingent situations during robot operation. It is responsible for developing proper working paths and motion plans to achieve the task goals within designated temporal and resource constraints. It is implemented in a distributed and cooperative blackboard system, which facilitates the integration of various types of knowledge. Finally, any failures from the on-line phase are fed back to the off-line phase. This forms the interaction between the off-line and on-line phases and introduces an extra closed loop that opportunistically tunes the dynamic planner to adapt to variations in the working environment in the long term.

16.
We propose a self-generating algorithm of behavioral evaluation that is important for a learning function in order to develop appropriate cooperative behavior among robots depending on the situation. The behavioral evaluation is composed of rewards and a consumption of energy. Rewards are provided by an operator when the robots share tasks appropriately, and the consumption of energy is measured during the execution of the tasks. Each robot estimates rules of behavior selection by using the generated evaluation, and learns to select an appropriate behavior when it meets the same situation again. As a result, the robots may be able to share tasks efficiently even if the purpose of their task is changed by the operator in the middle of execution, because the evaluation is modified depending on the situation. We performed simulations to study the effectiveness of the proposed algorithm. In the simulations, we applied the algorithm to three robots, each with three behaviors. We confirmed that each robot can generate an appropriate behavioral evaluation based on rewards from an operator, and that they therefore develop cooperative behaviors such as task sharing. This work was presented, in part, at the Second International Symposium on Artificial Life and Robotics, Oita, Japan, February 18-20, 1997.

17.
When describing robot motion with dynamic movement primitives (DMPs), goal (trajectory endpoint), shape and temporal scaling parameters are used. In reinforcement learning with DMPs, usually goals and temporal scaling parameters are predefined and only the weights for shaping a DMP are learned. Many tasks, however, exist where the best goal position is not a priori known, requiring to learn it. Thus, here we specifically address the question of how to simultaneously combine goal and shape parameter learning. This is a difficult problem because learning of both parameters could easily interfere in a destructive way. We apply value function approximation techniques for goal learning and direct policy search methods for shape learning. Specifically, we use "policy improvement with path integrals" and "natural actor critic" for the policy search. We solve a learning-to-pour-liquid task in simulations as well as using a Pa10 robot arm. Results for learning from scratch, learning initialized by human demonstration, as well as for modifying the tool for the learned DMPs are presented. We observe that the combination of goal and shape learning is stable and robust within large parameter regimes. Learning converges quickly even in the presence of disturbances, which makes this combined method suitable for robotic applications.
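A minimal one-dimensional discrete DMP makes the two parameter sets in this abstract concrete: the goal fixes where the motion ends, while the basis-function weights shape the path toward it. The gains, basis parameters and weights below are common textbook-style choices for illustration, not the paper's learned values.

```python
import numpy as np

def dmp_rollout(y0, goal, weights, centers, widths, tau=1.0,
                alpha=25.0, beta=6.25, alpha_x=3.0, dt=0.001, T=1.5):
    """Euler-integrate a 1-D discrete DMP and return the position trajectory."""
    y, z, x = y0, 0.0, 1.0
    traj = [y]
    for _ in range(int(T / dt)):
        psi = np.exp(-widths * (x - centers) ** 2)              # Gaussian basis
        forcing = (psi @ weights) / (psi.sum() + 1e-10) * x * (goal - y0)
        z += dt / tau * (alpha * (beta * (goal - y) - z) + forcing)
        y += dt / tau * z
        x += dt / tau * (-alpha_x * x)                          # canonical system
        traj.append(y)
    return np.array(traj)

centers = np.linspace(0.0, 1.0, 10)
widths = np.full(10, 50.0)

# Zero weights give a plain point attractor; nonzero weights deform the path,
# but the forcing term vanishes with the canonical state x, so both rollouts
# still converge to the same goal.
plain = dmp_rollout(0.0, 1.0, np.zeros(10), centers, widths)
bumpy = dmp_rollout(0.0, 1.0, 40.0 * np.ones(10), centers, widths)
print(float(plain[-1]), float(bumpy[-1]))  # both end near the goal 1.0
```

This separation is what makes simultaneous goal and shape learning delicate: changing the goal rescales the forcing term, so the two parameter sets are coupled rather than independent.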

18.
Owing to the inter-wheel coupling and system nonlinearities of a four-wheel-drive robot, even if each individual drive motor is controlled with optimal precision, the robot's overall motion control performance is not necessarily ideal. To address this problem, a robot speed-compensation control method based on brain emotional learning is proposed. Building on a computational model of brain emotional learning, a compensating controller is designed that fuses the robot's overall speed-tracking error with its integral and derivative. Through online learning of the weights of the nodes inside the computational model, the controller parameters are adjusted in a timely manner, achieving adaptive compensation of the four wheel speeds. Simulation experiments show that the method effectively reduces the influence of nonlinear disturbances on the system, attains high steady-state control precision and fast response, and greatly improves the robot's overall speed- and trajectory-tracking accuracy.

19.
Precise programming of robots for industrial tasks is inflexible to variations and time-consuming. Teaching a kinematic behavior by demonstration and encoding it with dynamical systems that are robust with respect to perturbations, is proposed in order to address this issue. Given a kinematic behavior encoded by Dynamic Movement Primitives (DMP), this work proposes a passive control scheme for assisting kinesthetic modifications of the learned behavior in task variations. It employs the utilization of penetrable spherical Virtual Fixtures (VFs) around the DMP's virtual evolution that follows the teacher's motion. The controller enables the user to haptically 'inspect' the spatial properties of the learned behavior in SE(3) and significantly modify it at any required segment, while facilitating the following of already learned segments. A demonstration within the VFs could signify that the kinematic behavior is taught correctly and could lead to autonomous execution, with the DMP generating the newly learned reference commands. The proposed control scheme is theoretically proved to be passive and experimentally validated with a KUKA LWR4+ robot. Results are compared with the case of using a gravity compensated robot agnostic of the previously learned task. It is shown that the time duration of teaching and the user's cognitive load are reduced.

20.
Trajectory learning is a fundamental component in a robot Programming by Demonstration (PbD) system, where often the very purpose of the demonstration is to teach complex manipulation patterns. However, human demonstrations are inevitably noisy and inconsistent. This paper highlights the trajectory learning component of a PbD system for manipulation tasks encompassing the ability to cluster, select, and approximate human demonstrated trajectories. The proposed technique provides some advantages with respect to alternative approaches and is suitable for learning from both individual and multiple user demonstrations.
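The approximation step this abstract mentions, distilling several noisy, variable-length demonstrations into one representative trajectory, can be sketched as time-normalisation followed by pointwise averaging; the paper's actual technique is more elaborate, and the sine-shaped demonstrations below are purely illustrative.

```python
import numpy as np

def average_demonstrations(demos, n_samples=50):
    """Resample variable-length demonstrations to a common time axis and average."""
    t_new = np.linspace(0.0, 1.0, n_samples)
    resampled = [np.interp(t_new, np.linspace(0.0, 1.0, len(d)), d)
                 for d in demos]
    return np.mean(resampled, axis=0)

rng = np.random.default_rng(0)
# Three noisy demonstrations of the same reach, recorded at different lengths
demos = [np.sin(np.linspace(0.0, np.pi, n)) + rng.normal(0.0, 0.03, n)
         for n in (40, 55, 70)]
mean_traj = average_demonstrations(demos)

# The averaged trajectory is much closer to the underlying motion than any
# single noisy demonstration.
print(float(np.max(np.abs(mean_traj - np.sin(np.linspace(0.0, np.pi, 50))))))
```

Linear time-normalisation assumes the demonstrations are roughly aligned in phase; inconsistent timing is exactly why real PbD systems add clustering and selection before averaging.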


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号