Similar Documents
20 similar documents found (search time: 31 ms)
1.
Reinforcement learning is an appealing approach for allowing robots to learn new tasks. The relevant literature reveals a plethora of methods, but at the same time makes clear the lack of implementations that deal with real-life challenges. Current expectations raise the demand for adaptable robots. We argue that, by employing model-based reinforcement learning, the currently limited adaptability of robotic systems can be expanded. Moreover, model-based reinforcement learning exhibits advantages that make it more applicable to real-life use cases than model-free methods. Thus, in this survey, we cover model-based methods that have been applied in robotics. We categorize them based on the derivation of the optimal policy, the definition of the returns function, the type of the transition model, and the learned task. Finally, we discuss the applicability of model-based reinforcement learning approaches to new applications, taking into consideration the state of the art in both algorithms and hardware.
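Below is a hedged, minimal Python sketch of the model-based loop such surveys cover (act, learn a transition model, plan with simulated experience), in the tabular Dyna-Q style; the environment interface, parameter values, and all identifiers are illustrative assumptions rather than any surveyed implementation.

```python
import random
from collections import defaultdict

# Illustrative Dyna-Q sketch: a tabular stand-in for the model-based RL
# loop (act, learn a transition model, plan with simulated rollouts).
# All parameters and the environment interface are assumptions.
alpha, gamma, epsilon, planning_steps = 0.1, 0.95, 0.1, 20
Q = defaultdict(float)   # Q[(state, action)] -> value estimate
model = {}               # model[(state, action)] -> (reward, next_state)

def act(state, actions):
    """Epsilon-greedy action selection over the current Q estimates."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def dyna_q_step(state, action, reward, next_state, actions):
    # Direct RL update from the real transition.
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    # Model learning: remember the observed transition.
    model[(state, action)] = (reward, next_state)
    # Planning: replay simulated transitions drawn from the learned model.
    for _ in range(planning_steps):
        (s, a), (r, s2) = random.choice(list(model.items()))
        best = max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
```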

2.
In order to accomplish diverse tasks successfully in a dynamic (i.e., changing over time) construction environment, robots should be able to prioritize assigned tasks to optimize their performance in a given state. Recently, a deep reinforcement learning (DRL) approach has shown potential for addressing such adaptive task allocation. It remains unanswered, however, whether DRL can address adaptive task allocation problems in dynamic robotic construction environments. In this paper, we developed and tested a digital twin-driven DRL method to explore the potential of DRL for adaptive task allocation in robotic construction environments. Specifically, the digital twin synthesizes sensory data from physical assets and is used to simulate a variety of dynamic robotic construction site conditions within which a DRL agent can interact. As a result, the agent can learn an adaptive task allocation strategy that increases project performance. We tested this method with a case project in which a virtual robotic construction project (i.e., interlocking concrete bricks are delivered and assembled by robots) was digitally twinned for DRL training and testing. Results indicated that the DRL model's task allocation approach reduced construction time by 36% in three dynamic testing environments when compared to a rule-based imperative model. The proposed method promises to be an effective tool for adaptive task allocation in dynamic robotic construction environments. Such an adaptive task allocation method can help construction robots cope with uncertainties and ultimately improve construction project performance by efficiently prioritizing assigned tasks.

3.
Human–Robot Collaboration (HRC) is a term used to describe tasks in which robots and humans work together to achieve a goal. Unlike traditional industrial robots, collaborative robots need to be adaptive: able to alter their approach to better suit the situation and the needs of the human partner. As traditional programming techniques can struggle with the complexity required, an emerging approach is to learn a skill by observing human demonstration and imitating the motions, commonly known as Learning from Demonstration (LfD). In this work, we present an LfD methodology that combines an ensemble machine learning algorithm (i.e. Random Forest (RF)) with stochastic regression, using haptic information captured from human demonstration. The capabilities of the proposed method are evaluated using two collaborative tasks: co-manipulation of an object (where the human provides the guidance but the robot handles the object's weight) and collaborative assembly of simple interlocking parts. The proposed method is shown to be capable of imitation learning, interpreting human actions and producing equivalent robot motion across a diverse range of initial and final conditions. After verifying that ensemble machine learning can be utilised for real robotics problems, we propose a further extension utilising a Weighted Random Forest (WRF) that attaches a weight to each tree based on its performance. It is then shown that the WRF approach outperforms RF in HRC tasks.
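The WRF idea of weighting each tree by its performance can be sketched as follows; weighting trees by inverse validation error is one plausible reading of the abstract, and all identifiers and data shapes are assumptions, not the authors' code.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Illustrative sketch of a Weighted Random Forest regressor: each tree's
# prediction is weighted by its accuracy on held-out validation data.
# This is one plausible reading of the abstract, not the authors' code.
def wrf_predict(forest: RandomForestRegressor, X_val, y_val, X_query):
    weights = []
    for tree in forest.estimators_:
        mse = np.mean((tree.predict(X_val) - y_val) ** 2)
        weights.append(1.0 / (mse + 1e-9))   # better trees get larger weights
    weights = np.array(weights) / np.sum(weights)
    preds = np.stack([tree.predict(X_query) for tree in forest.estimators_])
    return weights @ preds                   # weighted average over trees
```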

4.
Advanced Robotics, 2013, 27(10): 1215–1229
Reinforcement learning is a scheme for unsupervised learning in which robots are expected to acquire behavioral skills through self-exploration based on reward signals. There are some difficulties, however, in applying conventional reinforcement learning algorithms to motion control tasks of a robot, because most algorithms are concerned with discrete state spaces and are based on the assumption of complete observability of the state. Real-world environments often have partial observability; therefore, robots have to estimate the unobservable hidden states. This paper proposes a method to solve these two problems by combining a reinforcement learning algorithm with a learning algorithm for a continuous-time recurrent neural network (CTRNN). The CTRNN can learn spatio-temporal structures in a continuous time and space domain, and can preserve the contextual flow by self-organizing an appropriate internal memory structure. This enables the robot to deal with the hidden-state problem. We carried out an experiment on the pendulum swing-up task without rotational speed information. As a result, this task is accomplished in several hundred trials using the proposed algorithm. In addition, it is shown that the information about the rotational speed of the pendulum, which is considered a hidden state, is estimated and encoded in the activation of a context neuron.
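For reference, the standard CTRNN dynamics can be integrated as in this minimal sketch; the network size, weights, and step size are illustrative assumptions.

```python
import numpy as np

# Illustrative Euler integration of standard CTRNN dynamics:
#   tau_i * dx_i/dt = -x_i + sum_j w_ij * sigmoid(x_j + theta_j) + I_i
# Sizes, gains, and random weights are assumptions for demonstration.
rng = np.random.default_rng(0)
n = 10                                # number of neurons (incl. context units)
tau = np.full(n, 0.5)                 # neuron time constants
W = rng.normal(scale=0.5, size=(n, n))
theta = np.zeros(n)                   # biases

def ctrnn_step(x, I, dt=0.01):
    """One Euler step of the CTRNN state x under external input I."""
    activation = 1.0 / (1.0 + np.exp(-(x + theta)))   # sigmoid firing rates
    dx = (-x + W @ activation + I) / tau
    return x + dt * dx
```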

5.
Bionic robots are a typical class of multi-joint, nonlinear, underactuated systems, and their gait control is a highly challenging problem. Traditional control and planning methods require dedicated designs for each specific locomotion task, which costs considerable time and effort, and the resulting controllers usually do not generalize. Data-driven reinforcement learning methods, by contrast, can learn different tasks autonomously and generalize well across robots and locomotion tasks. Consequently, reinforcement-learning-based methods have seen growing application to gait control of bionic robots in recent years. This paper analyzes and summarizes the existing research from three perspectives: problem formalization, policy representation, and policy learning methods. It then summarizes the open problems in applying reinforcement learning to bionic robot gait control and points out directions for future work.

6.
In this article we present a self-organized method for allocating the individuals of a robot swarm to tasks that are sequentially interdependent. Tasks that are sequentially interdependent are common in natural and artificial systems. The proposed method relies neither on global knowledge nor on centralized components. Moreover, it does not require the robots to communicate. The method is based on the delay experienced by the robots working on one subtask when waiting for input from another subtask. We explore the capabilities of the method in different simulated environments. Additionally, we evaluate the method in a proof-of-concept experiment using real robots. We show that the method allows a swarm to reach a near-optimal allocation in the studied environments, can easily be transferred to a real robot setting, and is adaptive to changes in the properties of the tasks, such as their duration. Finally, we show that the ideal setting of the parameters of the method does not depend on the properties of the environment.
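A minimal sketch of a delay-based switching rule in this spirit follows; the specific switching probability and the two-subtask setting are assumptions, not the authors' formulation.

```python
import random

# Illustrative sketch of a delay-based task-switching rule: a robot that
# waits too long for input from the other subtask becomes increasingly
# likely to switch to it. The switching function below is an assumption.
def maybe_switch(current_task, wait_time, scale=10.0):
    """Return the (possibly new) task given the delay experienced."""
    p_switch = wait_time / (wait_time + scale)   # grows smoothly with delay
    if random.random() < p_switch:
        return "upstream" if current_task == "downstream" else "downstream"
    return current_task
```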

7.
Advanced Robotics, 2013, 27(1): 83–99
Reinforcement learning can be an adaptive and flexible control method for autonomous systems. It does not need a priori knowledge; behaviors to accomplish given tasks are obtained automatically through repeated trial and error. However, as system complexity increases, learning costs grow exponentially. Thus, application to complex systems, such as robots with many redundant degrees of freedom or multi-agent systems, is very difficult. In previous works in this field, applications were restricted to simple robots and small multi-agent systems; because such simple systems have little redundancy and limited functionality, the effectiveness of reinforcement learning was also limited. In our previous work, we took these problems into consideration and proposed a new reinforcement learning algorithm, 'Q-learning with dynamic structuring of exploration space based on GA' (QDSEGA). The effectiveness of QDSEGA for redundant robots was demonstrated using a 12-legged robot and a 50-link manipulator. However, previous work on QDSEGA was restricted to redundant robots, and it could not be applied to multiple mobile robots. In this paper, we extend our previous work on QDSEGA by combining it with rule-based distributed control, and propose a hybrid autonomous control method for multiple mobile robots. To demonstrate the effectiveness of the proposed method, simulations of a transportation task performed by 10 mobile robots are carried out, and effective behaviors are obtained.

8.
Kinematically redundant robots allow simultaneous execution of several tasks with different priorities. Besides the main task, obstacle avoidance is one commonly used subtask. The ability to avoid obstacles is especially important when the robot is working in a human environment. In this paper, we propose a novel control method for kinematically redundant robots, where we focus on a smooth, continuous transition between different tasks. The method is based on a new and very simple null-space formulation. Sufficient conditions for task design are given using a Lyapunov-based stability analysis. The effectiveness of the proposed control method is demonstrated in simulation and on a real robot. The pros and cons of the proposed method and a comparison with other control methods are also discussed.
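For context, the classic textbook null-space projection (not the paper's novel formulation) looks like this in Python:

```python
import numpy as np

# Classic task-priority redundancy resolution (textbook form, not the
# paper's new null-space formulation):
#   qdot = J_pinv @ xdot_task + (I - J_pinv @ J) @ qdot_secondary
# The secondary velocity (e.g., an obstacle-avoidance gradient) is
# executed only in the null space, so it cannot disturb the main task.
def redundancy_resolution(J, xdot_task, qdot_secondary):
    J_pinv = np.linalg.pinv(J)
    null_projector = np.eye(J.shape[1]) - J_pinv @ J
    return J_pinv @ xdot_task + null_projector @ qdot_secondary
```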

9.
Generating teams of robots that are able to perform their tasks over long periods of time requires the robots to be responsive to continual changes in robot team member capabilities and to changes in the state of the environment and mission. In this article, we describe the L-ALLIANCE architecture, which enables teams of heterogeneous robots to dynamically adapt their actions over time. This architecture, an extension of our earlier work on ALLIANCE, is a distributed, behavior-based architecture intended for applications consisting of a collection of independent tasks. The key issue addressed in L-ALLIANCE is determining which tasks robots should select to perform during their mission, even when multiple robots with heterogeneous, continually changing capabilities are present on the team. In this approach, robots monitor the performance of teammates performing common tasks and evaluate that performance based upon task completion time. Robots then use this information throughout the lifetime of their mission to automatically update their control parameters. After describing the L-ALLIANCE architecture, we discuss the results of implementing this approach on a physical team of heterogeneous robots performing proof-of-concept box pushing experiments. The results illustrate the ability of L-ALLIANCE to enable lifelong adaptation of heterogeneous robot teams to continuing changes in robot team member capabilities and in the environment.

10.
Dorigo, Marco. Machine Learning, 1995, 19(3): 209–240
In this article we investigate the feasibility of using learning classifier systems as a tool for building adaptive control systems for real robots. Their use on real robots imposes efficiency constraints which are addressed by three main tools: parallelism, distributed architecture, and training. Parallelism is useful to speed up computation and to increase the flexibility of the learning system design. The distributed architecture makes it possible to decompose the overall task into a set of simpler learning tasks. Finally, training provides guidance to the system while learning, reducing the number of cycles required to learn. These tools and the issues they raise are first studied in simulation, and the experience gained with simulations is then used to implement the learning system on the real robot. Results have shown that with this approach it is possible to let the AutonoMouse, a small real robot, learn to approach a light source under a number of different noise and lesion conditions. (This work was partially written while the author was at the International Computer Science Institute, 1947 Center Street, Suite 600, Berkeley, CA 94704-1198, USA.)

11.
For the last decade, we have been developing a vision-based architecture for mobile robot navigation. Using our bio-inspired model of navigation, robots can perform sensory-motor tasks in real time in unknown indoor as well as outdoor environments. We address here the problem of autonomous incremental learning of a sensory-motor task demonstrated by an operator guiding a robot. The proposed system allows for semi-supervision of task learning and is able to adapt the environmental partitioning to the complexity of the desired behavior. A real dialogue based on actions emerges from the interactive teaching. The interaction leads the robot to autonomously build a precise sensory-motor dynamics that approximates the behavior of the teacher. The usability of the system is highlighted by experiments on real robots, in both indoor and outdoor environments. Accuracy measures are also proposed in order to evaluate the learned behavior as compared to the expected behavioral attractor. These measures, used first in a real experiment and then in a simulated experiment, demonstrate how a real interaction between the teacher and the robot influences the learning process.

12.
The use of robots in society could be expanded by using reinforcement learning (RL) to allow robots to learn and adapt to new situations online. RL is a paradigm for learning sequential decision making tasks, usually formulated as a Markov Decision Process (MDP). For an RL algorithm to be practical for robotic control tasks, it must learn in very few samples, while continually taking actions in real-time. In addition, the algorithm must learn efficiently in the face of noise, sensor/actuator delays, and continuous state features. In this article, we present TEXPLORE, the first algorithm to address all of these challenges together. TEXPLORE is a model-based RL method that learns a random forest model of the domain that generalizes dynamics to unseen states. The agent explores states that are promising for the final policy, while ignoring states that do not appear promising. With sample-based planning and a novel parallel architecture, TEXPLORE can select actions continually in real-time whenever necessary. We empirically evaluate the importance of each component of TEXPLORE in isolation and then demonstrate the complete algorithm learning to control the velocity of an autonomous vehicle in real-time.
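A hedged sketch of the model-learning idea (one random forest per state feature, predicting the change in that feature so dynamics generalize to unseen states) follows; the hyperparameters and interface are assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Illustrative sketch of a random-forest dynamics model in the spirit of
# TEXPLORE: one forest per state feature predicts the change in that
# feature. Hyperparameters and this interface are assumptions.
class ForestDynamicsModel:
    def __init__(self, n_state_features, n_trees=5):
        self.models = [RandomForestRegressor(n_estimators=n_trees)
                       for _ in range(n_state_features)]

    def fit(self, states, actions, next_states):
        X = np.hstack([states, actions])      # inputs: (state, action) pairs
        deltas = next_states - states         # targets: per-feature change
        for i, m in enumerate(self.models):
            m.fit(X, deltas[:, i])

    def predict(self, state, action):
        x = np.concatenate([state, action])[None, :]
        delta = np.array([m.predict(x)[0] for m in self.models])
        return state + delta                  # predicted next state
```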

13.
14.
Unmeasurable object deformation and local communication time delays between the slave robots degrade manipulation performance in multi-robot, multi-operator teleoperation. In this article, a distributed control method based on a high-gain nonlinear observer, interactive identification, and impedance control is proposed for this problem. First, we use the Hunt-Crossley contact model and derive the desired synchronized object state in cooperative teleoperation. Second, an impedance term expressed in terms of the internal position errors is introduced to decrease object position tracking errors. For the unmeasurable object deformation, an interactive identification method is proposed for estimating the unknown variables. Third, we consider both varying communication time delays and local time delays on the slave side. Two mirror high-gain nonlinear observers are designed for estimating the other slave robots' real-time state. Finally, we build the system controllers and prove the stability of the closed-loop system and the boundedness of the estimation errors using Lyapunov functions. Comparative simulations on the physical system show that the object's position and internal force tracking errors decrease in the designated cooperative tasks.
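For reference, the standard Hunt-Crossley contact model named in the abstract has this form; the parameter values below are placeholders, not from the article.

```python
# Illustrative Hunt-Crossley contact model relating object deformation x
# and its rate xdot to the contact force:
#   f = k * x**n + lam * x**n * xdot   (for x >= 0, else no contact)
# Parameter values are placeholders, not from the article.
def hunt_crossley_force(x, xdot, k=1000.0, lam=50.0, n=1.5):
    if x <= 0.0:
        return 0.0        # no penetration, no contact force
    return k * x**n + lam * x**n * xdot
```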

15.
Robots are increasingly expected to support our daily lives in real environments. In such environments, however, there are many obstacles that affect the robot's motion. In this research, we develop a musculoskeletal robotic arm and a system identification method for coping with external forces while learning the dynamics of complicated situations, based on Gaussian process regression (GPR). The musculoskeletal robot has the ability to cope with external forces by utilizing a bio-inspired mechanism. GPR is easy to implement, yet can handle complicated prediction tasks. The experimental results show that the behavior of the robot while interacting with its surroundings can be predicted by our method.
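A minimal textbook GPR predictor of the kind the abstract describes can be written as follows; the RBF kernel choice and hyperparameters are illustrative assumptions.

```python
import numpy as np

# Minimal Gaussian process regression sketch (standard textbook form) of
# the kind used to learn interaction dynamics. Kernel choice and
# hyperparameters are illustrative assumptions.
def rbf_kernel(A, B, length=1.0, variance=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * d2 / length**2)

def gp_predict(X_train, y_train, X_test, noise=1e-2):
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_test, X_train)
    alpha = np.linalg.solve(K, y_train)
    mean = K_s @ alpha                           # predictive mean
    v = np.linalg.solve(K, K_s.T)
    cov = rbf_kernel(X_test, X_test) - K_s @ v   # predictive covariance
    return mean, np.diag(cov)
```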

16.
This paper focuses on the general problem of coordinating multiple robots. More specifically, it addresses the self-selection of heterogeneous specialized tasks by autonomous robots. We focus on a decentralized approach, in which the robots themselves, autonomously and individually, are responsible for selecting a particular task so that all existing tasks are optimally distributed and executed. In this regard, we have established an experimental scenario to solve the corresponding multi-task distribution problem, and we propose solutions based on two different approaches: Response Threshold Models and Learning Automata-based probabilistic algorithms. We have evaluated the robustness of the algorithms by perturbing the number of pending loads, to simulate each robot's error in estimating the real number of pending tasks, as well as the dynamic generation of loads over time. The paper ends with a critical discussion of the experimental results.
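A minimal sketch of the classic response-threshold rule follows; the quadratic threshold function is the common choice in the literature, and the stimuli, thresholds, and task names here are assumptions.

```python
import random

# Illustrative response-threshold rule: a robot engages task j with
# probability s_j**2 / (s_j**2 + theta_j**2), where s_j is the task's
# stimulus (e.g., pending loads) and theta_j is the robot's own threshold.
# The exponent 2 is the common choice; all values here are assumed.
def select_task(stimuli, thresholds):
    for task, s in stimuli.items():
        theta = thresholds[task]
        if random.random() < s**2 / (s**2 + theta**2):
            return task
    return None   # stay idle this cycle

# Example: a robot with a low threshold for 'transport' engages it readily.
task = select_task({"transport": 4.0, "assembly": 1.0},
                   {"transport": 2.0, "assembly": 5.0})
```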

17.
Expert and intelligent systems are being developed to control many technological systems, including mobile robots. However, the PID (Proportional-Integral-Derivative) controller is a fast, low-level control strategy widely used in many control engineering tasks. Classic control theory has contributed different tuning methods to obtain the gains of PID controllers for specific operating conditions. Nevertheless, when the system is not fully known and the operating conditions are variable and not known in advance, classical techniques are not entirely suitable for PID tuning. To overcome these drawbacks, many adaptive approaches have arisen, mainly from the field of artificial intelligence. In this work, we propose an incremental Q-learning strategy for adaptive PID control. In order to improve the learning efficiency, we introduce a temporal memory into the learning process. While the memory remains invariant, a non-uniform specialization process is carried out, generating new limited learning subspaces. An implementation on a real mobile robot demonstrates the applicability of the proposed approach for real-time simultaneous tuning of multiple adaptive PID controllers on a real system operating under variable conditions in a real environment.
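A hedged sketch of Q-learning-driven PID gain adaptation follows; the discretization, action set, and update shown are generic assumptions, not the paper's incremental scheme with temporal memory.

```python
import random
from collections import defaultdict

# Illustrative sketch of Q-learning for adaptive PID tuning: actions nudge
# the PID gains up or down, and the reward penalizes tracking error.
# Discretization, actions, and reward are assumptions, not the paper's
# incremental scheme with temporal memory.
actions = [(dkp, dki, dkd) for dkp in (-0.1, 0, 0.1)
                           for dki in (-0.01, 0, 0.01)
                           for dkd in (-0.05, 0, 0.05)]
Q = defaultdict(float)
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def choose_action(state):
    """Epsilon-greedy choice of a gain adjustment for the current state."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

def apply_gains(gains, action):
    """Apply the chosen adjustment, keeping all gains non-negative."""
    kp, ki, kd = gains
    dkp, dki, dkd = action
    return (max(0.0, kp + dkp), max(0.0, ki + dki), max(0.0, kd + dkd))
```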

18.
Underwater bionic robots offer high efficiency, high maneuverability, and low noise. To address the design complexity and control difficulty of bionic robotic eels, this paper proposes a control method for a novel underactuated robotic eel. First, based on an active-plus-passive bionic propulsion principle, a bionic mechanism combining two active body segments with two passive compliant segments is designed. Next, the robot is modeled in a simulation environment, where a deep reinforcement learning algorithm is used for data collection and training; the best-performing neural network is then tested in simulation to obtain the robotic eel's control function. Finally, comparative experiments verify the feasibility of the proposed design method and the effectiveness of the control function, achieving control of the robotic eel.

19.
Reinforcement learning (RL) has been applied to constructing controllers for nonlinear systems in recent years. Since RL methods do not require an exact dynamics model of the controlled object, they offer greater flexibility and potential for adaptation to uncertain or nonstationary environments than methods based on traditional control theory. If the target system has a continuous state space with nonlinear dynamics, however, RL methods often suffer from unstable learning processes, which makes it difficult to apply them to control tasks in the real world. In order to overcome this disadvantage of RL methods, we propose an RL scheme that combines multiple controllers, each of which is constructed based on traditional control theory. We then apply it to the swing-up and stabilization task of an acrobot with limited torque, a typical but difficult task in the field of nonlinear control theory. Our simulation results showed that our method was able to realize stable learning and to achieve fairly good control. (This work was presented, in part, at the 9th International Symposium on Artificial Life and Robotics, Oita, Japan, January 28–30, 2004.)

20.
We propose an integrated technique of genetic programming (GP) and reinforcement learning (RL) to enable a real robot to adapt its actions to a real environment. Our technique does not require a precise simulator because learning is achieved through the real robot. In addition, our technique makes it possible for real robots to learn effective actions. Based on this proposed technique, we acquire common programs, using GP, which are applicable to various types of robots. Through these acquired programs, we execute RL on a real robot. With our method, the robot can adapt to its own operational characteristics and learn effective actions. In this paper, we show experimental results from two different robots: a four-legged robot "AIBO" and a humanoid robot "HOAP-1." We present results showing that both effectively solved the box-moving task; the end result demonstrates that our proposed technique performs better than the traditional Q-learning method.
