Similar Literature
20 similar records found (search time: 31 ms)
1.
Advanced Robotics, 2013, 27(1): 21-39
This paper explores a fail-safe design for multiple space robots that enables the robots to complete given tasks even when they can no longer be controlled due to a communication failure or negotiation problem. As a first step towards this goal, we propose new reinforcement learning methods that help robots avoid deadlock situations while improving the degree of task completion without communication via ground stations or negotiation with other robots. Through intensive simulations on a truss construction task, we found that our reinforcement learning methods have great potential to contribute to a fail-safe design for multiple space robots in the above case. Furthermore, the simulations revealed the following detailed implications: (i) the first several planned behaviors must not be reinforced with negative rewards, even in deadlock situations, in order to derive cooperation among multiple robots; (ii) a certain amount of positive reward added to the negative rewards in deadlock situations helps reduce the computational cost of finding behavior plans for task completion; and (iii) an appropriate balance between positive and negative rewards in deadlock situations is indispensable for finding good behavior plans at a small computational cost.
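To make implications (i) through (iii) concrete, here is a minimal Q-learning sketch of such a deadlock reward scheme; the table sizes, constants, and the exemption window for early behaviors are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np

# Hypothetical sketch: deadlock transitions receive a mixed reward (a negative
# penalty softened by a positive offset), and the first few planned behaviors
# in an episode are exempt from negative reinforcement. All names and
# constants are invented for illustration.

N_STATES, N_ACTIONS = 100, 6
ALPHA, GAMMA = 0.1, 0.95
DEADLOCK_PENALTY = -1.0   # base negative reward in a deadlock (implication iii)
POSITIVE_OFFSET = 0.3     # positive reward mixed in (implication ii)
PROTECTED_STEPS = 3       # early behaviors never punished (implication i)

Q = np.zeros((N_STATES, N_ACTIONS))

def update(s, a, s_next, step, deadlock):
    """One Q-learning backup with the deadlock reward scheme."""
    if deadlock:
        r = 0.0 if step < PROTECTED_STEPS else DEADLOCK_PENALTY + POSITIVE_OFFSET
    else:
        r = 0.0  # task-completion rewards would be added here
    Q[s, a] += ALPHA * (r + GAMMA * Q[s_next].max() - Q[s, a])

# Example: a deadlock at step 5 is punished, softened by the positive offset.
update(s=3, a=1, s_next=3, step=5, deadlock=True)
```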

2.
This paper proposes a learning strategy for robots with flexible joints and multiple degrees of freedom to achieve dynamic motion tasks. Despite the potential benefits of flexible-joint robots, such as exploitation of intrinsic dynamics and passive adaptation to environmental changes through mechanical compliance, controlling such robots is challenging because of the increased complexity of their dynamics. To achieve dynamic movements, we introduce a two-phase framework for learning the robot's body dynamics using a recurrent neural network, motivated by deep learning strategies. The proposed methodology comprises a pre-training phase with motor babbling and a fine-tuning phase with additional learning of the target tasks. In the pre-training phase, we consider active and passive exploratory motions for efficient acquisition of body dynamics. In the fine-tuning phase, the learned body dynamics are adjusted for specific tasks. We demonstrate the effectiveness of the proposed methodology on dynamic tasks involving constrained movements that require interaction with the environment, on both a simulated robot model and an actual PR2 robot, each with a compliantly actuated seven-degree-of-freedom arm. The results show a reduction in the number of training iterations required for task learning, as well as generalization to untrained situations.
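As a rough illustration of the two-phase idea (not the paper's actual architecture), the sketch below pre-trains a small recurrent body-dynamics model on stand-in motor-babbling data and then fine-tunes it with a lower learning rate; all dimensions, data, and hyperparameters are invented.

```python
import torch
import torch.nn as nn

# Minimal sketch: phase 1 pre-trains on motor-babbling sequences, phase 2
# fine-tunes on task sequences. Inputs are joint angles+velocities (2 * 7),
# targets are next joint angles; the random tensors are placeholders.

class BodyDynamicsRNN(nn.Module):
    def __init__(self, n_joint=7, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(input_size=2 * n_joint, hidden_size=hidden,
                           batch_first=True)
        self.head = nn.Linear(hidden, n_joint)  # predicts next joint angles

    def forward(self, x):
        h, _ = self.rnn(x)
        return self.head(h)

def train(model, seqs, targets, epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = ((model(seqs) - targets) ** 2).mean()
        loss.backward()
        opt.step()

model = BodyDynamicsRNN()
# Phase 1: pre-training on (random stand-in) motor-babbling sequences.
babble_x, babble_y = torch.randn(32, 50, 14), torch.randn(32, 50, 7)
train(model, babble_x, babble_y, epochs=100, lr=1e-3)
# Phase 2: fine-tuning on target-task sequences with a smaller learning rate.
task_x, task_y = torch.randn(8, 50, 14), torch.randn(8, 50, 7)
train(model, task_x, task_y, epochs=50, lr=1e-4)
```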

3.
In this article, we present a novel approach to learning efficient navigation policies for mobile robots that use visual features for localization. As fast movements of a mobile robot typically introduce motion blur in the acquired images, the robot's uncertainty about its pose increases in such situations. As a result, it can no longer be ensured that a navigation task will be executed efficiently, since the robot's pose estimate might not correspond to its true location. We present a reinforcement learning approach to determine a navigation policy that reaches the destination reliably and, at the same time, as fast as possible. Using our technique, the robot learns to trade off velocity against localization accuracy and implicitly takes the impact of motion blur on observations into account. We furthermore developed a method to compress the learned policy via a clustering approach. In this way, the size of the policy representation is significantly reduced, which is especially desirable for memory-constrained systems. Extensive simulated and real-world experiments carried out with two different robots demonstrate that our learned policy significantly outperforms policies using a constant velocity and more advanced heuristics. We furthermore show that the policy generalizes to different indoor and outdoor scenarios with varying landmark densities, as well as to navigation tasks of different complexity.
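One way such clustering-based policy compression could look, as a sketch under invented features and policy outputs (the paper's state representation and clustering details may differ): states are grouped with k-means and one representative velocity command is stored per cluster.

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative compression of a learned velocity policy: cluster the state
# features, then keep only one velocity per cluster. The 3-D feature layout
# and the stand-in policy below are assumptions.

rng = np.random.default_rng(0)
states = rng.uniform(size=(5000, 3))    # e.g. (pose uncertainty, goal dist, landmark density)
velocities = 1.0 - 0.8 * states[:, 0]   # stand-in for the learned policy output

kmeans = KMeans(n_clusters=32, n_init=10, random_state=0).fit(states)
# Representative action per cluster: mean velocity of its member states.
cluster_vel = np.array([velocities[kmeans.labels_ == k].mean() for k in range(32)])

def compressed_policy(state):
    """Look up the stored velocity for the nearest cluster centroid."""
    return cluster_vel[kmeans.predict(state.reshape(1, -1))[0]]

v = compressed_policy(np.array([0.4, 0.2, 0.9]))
```

Only 32 centroids and 32 velocities need to be stored, instead of the full state-to-velocity mapping.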

4.
In order to accomplish diverse tasks successfully in a dynamic (i.e., changing over time) construction environment, robots should be able to prioritize assigned tasks to optimize their performance in a given state. Recently, deep reinforcement learning (DRL) has shown potential for addressing such adaptive task allocation. It remains unanswered, however, whether DRL can address adaptive task allocation problems in dynamic robotic construction environments. In this paper, we developed and tested a digital twin-driven DRL method to explore the potential of DRL for adaptive task allocation in robotic construction environments. Specifically, the digital twin synthesizes sensory data from physical assets and is used to simulate a variety of dynamic robotic construction site conditions within which a DRL agent can interact. As a result, the agent can learn an adaptive task allocation strategy that increases project performance. We tested this method on a case project in which a virtual robotic construction project (i.e., interlocking concrete bricks delivered and assembled by robots) was digitally twinned for DRL training and testing. Results indicated that the DRL model's task allocation approach reduced construction time by 36% in three dynamic testing environments compared to a rule-based imperative model. The proposed method promises to be an effective tool for adaptive task allocation in dynamic robotic construction environments. Such adaptive task allocation can help construction robots cope with uncertainty and ultimately improve construction project performance by efficiently prioritizing assigned tasks.
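The following toy sketch conveys the learning loop, with a tabular agent standing in for the DRL model and a hand-written simulator standing in for the digital twin; the task types, dynamics, and rewards are all invented, and the actual paper uses a far richer twin and a deep network.

```python
import random

# Toy adaptive task allocation: the agent picks which pending task type to
# schedule next and learns from simulated time costs. Assembling bricks that
# have not been delivered yet wastes time, so prioritization matters.

TASK_TYPES = ["deliver", "assemble"]
ALPHA, GAMMA, EPS = 0.2, 0.9, 0.1
Q = {}  # (state, action) -> value; state = (deliveries left, assemblies left)

def choose(state):
    if random.random() < EPS:
        return random.choice(TASK_TYPES)
    return max(TASK_TYPES, key=lambda a: Q.get((state, a), 0.0))

def simulate(state, action):
    """Digital-twin stand-in with invented dynamics."""
    d, a = state
    if action == "deliver" and d > 0:
        return (d - 1, a), -1.0   # one time step
    if action == "assemble" and a > 0 and d < a:
        return (d, a - 1), -1.0   # bricks already delivered: proceed
    return state, -3.0            # blocked action: extra time penalty

for _ in range(5000):
    state = (5, 5)
    while state != (0, 0):
        a = choose(state)
        nxt, r = simulate(state, a)
        best = max(Q.get((nxt, b), 0.0) for b in TASK_TYPES)
        Q[(state, a)] = Q.get((state, a), 0.0) + ALPHA * (r + GAMMA * best - Q.get((state, a), 0.0))
        state = nxt
```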

5.
Reinforcement learning is an appealing approach for allowing robots to learn new tasks. The relevant literature reveals a plethora of methods, but at the same time makes clear the lack of implementations that deal with real-life challenges. Current expectations raise the demand for adaptable robots. We argue that, by employing model-based reinforcement learning, the currently limited adaptability of robotic systems can be expanded. Model-based reinforcement learning also exhibits advantages that make it more applicable to real-life use cases than model-free methods. Thus, in this survey, we cover model-based methods that have been applied in robotics. We categorize them based on the derivation of the optimal policy, the definition of the return function, the type of transition model, and the learned task. Finally, we discuss the applicability of model-based reinforcement learning approaches to new applications, taking into consideration the state of the art in both algorithms and hardware.

6.
Gu Guochang, Zhong Yu, Zhang Rubo. Robot, 2003, 25(4): 344-348
In multi-robot systems, the evaluation of one robot's behavior often depends on the behaviors of the other robots, so combined (joint) actions must be used to achieve multi-robot cooperation. However, reinforcement learning algorithms based on joint actions converge extremely slowly because the learning space is enormous. The new method proposed in this paper reduces the dimensionality of the learning space by predicting the probability of each robot executing its actions, and is applied to multi-robot cooperation tasks. Experimental results show that the prediction-based accelerated reinforcement learning algorithm obtains multi-robot cooperation policies faster than the original algorithm.
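A minimal sketch of the prediction idea, under assumed sizes and update rules: instead of learning a Q-table over the full joint action space, each robot predicts the other robot's action distribution from observed frequencies and evaluates only its own actions in expectation.

```python
import numpy as np

# For one state: Q[my_action, other_action]. The expected value under the
# predicted behavior of the other robot is an (N,) vector, so the learner
# searches N actions instead of N*N joint actions. All details are invented.

N_ACTIONS = 4
counts = np.ones(N_ACTIONS)            # observed action counts of the other robot
Q = np.zeros((N_ACTIONS, N_ACTIONS))

def predicted_other():
    return counts / counts.sum()       # empirical action distribution

def best_action():
    # Expected Q-value of each of my actions under the prediction.
    return int(np.argmax(Q @ predicted_other()))

def observe(other_action):
    counts[other_action] += 1          # refine the prediction over time

observe(2)
a = best_action()
```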

7.
Advanced Robotics, 2013, 27(10): 1125-1142
This paper presents a novel approach for acquiring dynamic whole-body movements on humanoid robots, focused on learning a control policy for the center of mass (CoM). In our approach, we combine a model-based CoM controller with a model-free reinforcement learning (RL) method to acquire dynamic whole-body movements in humanoid robots. (i) To cope with high dimensionality, we use the model-based CoM controller as a basic controller that derives joint angular velocities from the desired CoM velocity; balancing can also be handled within this controller. (ii) The RL method is used to acquire a controller that generates the desired CoM velocity based on the current state. To demonstrate the effectiveness of our approach, we apply it to a ball-punching task on a simulated humanoid robot model. The acquired whole-body punching movement was also demonstrated on Fujitsu's HOAP-2 humanoid robot.
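The two-layer structure could be sketched as follows, with a placeholder Jacobian and policy (the real controller computes the CoM Jacobian from the robot's kinematics and mass distribution): the learned policy outputs a desired CoM velocity, and the model-based layer maps it to joint velocities via a pseudoinverse.

```python
import numpy as np

# Sketch of the hierarchy: RL policy -> desired CoM velocity -> joint
# velocities through the CoM Jacobian pseudoinverse. Everything below is a
# stand-in; N_JOINTS, the Jacobian, and the policy output are assumptions.

N_JOINTS = 20

def com_jacobian(q):
    """Placeholder for the robot-specific CoM Jacobian (3 x N_JOINTS)."""
    rng = np.random.default_rng(42)            # a real robot derives this
    return rng.standard_normal((3, N_JOINTS))  # from kinematics and mass

def policy(state):
    """Stand-in for the learned RL policy: desired CoM velocity."""
    return np.array([0.1, 0.0, -0.05])

def joint_velocities(q, state):
    # Model-based layer: map the desired CoM velocity to joint space.
    J = com_jacobian(q)
    return np.linalg.pinv(J) @ policy(state)

q = np.zeros(N_JOINTS)
dq = joint_velocities(q, state=None)
```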

8.
Advanced Robotics, 2013, 27(10): 1215-1229
Reinforcement learning is a scheme for unsupervised learning in which robots are expected to acquire behavior skills through self-exploration based on reward signals. There are difficulties, however, in applying conventional reinforcement learning algorithms to the motion control tasks of a robot, because most algorithms are concerned with discrete state spaces and assume complete observability of the state. Real-world environments are often only partially observable, so robots have to estimate unobservable hidden states. This paper proposes a method that solves these two problems by combining a reinforcement learning algorithm with a learning algorithm for a continuous-time recurrent neural network (CTRNN). The CTRNN can learn spatio-temporal structures in a continuous time and space domain, and can preserve the contextual flow by self-organizing an appropriate internal memory structure. This enables the robot to deal with the hidden-state problem. We carried out an experiment on the pendulum swing-up task without rotational speed information. Using the proposed algorithm, the task is accomplished in several hundred trials. In addition, we show that information about the rotational speed of the pendulum, which is a hidden state, is estimated and encoded in the activation of a context neuron.
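For readers unfamiliar with CTRNNs, here is a minimal forward step of the kind of continuous-time recurrent unit involved; the sizes, time constants, and weights are illustrative, and the paper's network is trained jointly with the RL algorithm rather than left random.

```python
import numpy as np

# Minimal CTRNN Euler step. Context neurons integrate past observations, so
# an unobserved quantity (e.g. pendulum rotational speed) can be encoded in
# their activations. All parameters below are invented for illustration.

N = 10                       # neurons, including context units
TAU = np.full(N, 2.0)        # per-neuron time constants
W = np.random.default_rng(0).standard_normal((N, N)) * 0.3
DT = 0.05

def ctrnn_step(u, external_input):
    """Integrate membrane potentials u for one time step."""
    y = np.tanh(u)                            # firing rates
    du = (-u + W @ y + external_input) / TAU  # leaky continuous-time dynamics
    return u + DT * du

u = np.zeros(N)
for t in range(100):
    inp = np.zeros(N)
    inp[0] = np.sin(0.1 * t)   # observed pendulum angle as the only input
    u = ctrnn_step(u, inp)     # context units accumulate speed information
```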

9.
The use of robots in society could be expanded by using reinforcement learning (RL) to allow robots to learn and adapt to new situations online. RL is a paradigm for learning sequential decision-making tasks, usually formulated as a Markov Decision Process (MDP). For an RL algorithm to be practical for robotic control tasks, it must learn from very few samples while continually taking actions in real time. In addition, the algorithm must learn efficiently in the face of noise, sensor/actuator delays, and continuous state features. In this article, we present TEXPLORE, the first algorithm to address all of these challenges together. TEXPLORE is a model-based RL method that learns a random forest model of the domain, which generalizes dynamics to unseen states. The agent explores states that are promising for the final policy, while ignoring states that do not appear promising. With sample-based planning and a novel parallel architecture, TEXPLORE can select actions continually in real time whenever necessary. We empirically evaluate the importance of each component of TEXPLORE in isolation and then demonstrate the complete algorithm learning to control the velocity of an autonomous vehicle in real time.
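The model-based core can be sketched like this, assuming invented features, actions, and toy dynamics (TEXPLORE's actual targets, per-feature forests, and UCT-based planner are richer): fit a random forest to predict state changes from (state, action), then use it for one-step rollouts.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Learn a forest dynamics model on synthetic transitions, then query it as a
# planner would. Features, actions, and dynamics are stand-ins.

rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, size=(2000, 2))      # e.g. (velocity, slope)
actions = rng.integers(0, 3, size=(2000, 1))     # brake / coast / accelerate
next_states = states + 0.1 * (actions - 1)       # toy dynamics
X = np.hstack([states, actions])

model = RandomForestRegressor(n_estimators=30, random_state=0)
model.fit(X, next_states - states)               # predict state deltas

def simulate(state, action):
    """One-step prediction, as used inside a sample-based planner."""
    delta = model.predict(np.hstack([state, [action]]).reshape(1, -1))[0]
    return state + delta

s_next = simulate(np.array([0.2, -0.1]), action=2)
```

Predicting state deltas rather than absolute next states is what lets the tree model generalize its learned dynamics to unseen states.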

10.
Understanding how images of objects and scenes behave in response to specific egomotions is a crucial aspect of proper visual development, yet existing visual learning methods are conspicuously disconnected from the physical source of their images. We propose a new "embodied" visual learning paradigm, exploiting proprioceptive motor signals to train visual representations from egocentric video with no manual supervision. Specifically, we enforce that our learned features exhibit equivariance, i.e., they respond predictably to transformations associated with distinct egomotions. On three datasets, we show that our unsupervised feature learning approach significantly outperforms previous approaches on visual recognition and next-best-view prediction tasks. In the most challenging test, we show that features learned from video captured on an autonomous driving platform improve large-scale scene recognition in static images from a disjoint domain.
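One common way to impose such an equivariance objective, sketched under assumed shapes, motion labels, and dummy data (not the paper's actual model): learn one linear map per egomotion so that features of the post-motion frame match the mapped features of the original frame.

```python
import torch
import torch.nn as nn

# Equivariance sketch: for each egomotion g, a map M_g should satisfy
# M_g(f(frame)) ~= f(next_frame). The tiny encoder and random 64x64 frames
# below are placeholders for a convolutional network on egocentric video.

feat_dim = 32
encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, feat_dim))
motion_maps = nn.ModuleDict({"turn_left": nn.Linear(feat_dim, feat_dim),
                             "turn_right": nn.Linear(feat_dim, feat_dim)})
opt = torch.optim.Adam(list(encoder.parameters())
                       + list(motion_maps.parameters()), lr=1e-3)

def equivariance_loss(frame, next_frame, motion):
    z, z_next = encoder(frame), encoder(next_frame)
    return ((motion_maps[motion](z) - z_next) ** 2).mean()

# One illustrative step on dummy grayscale frame pairs labeled by egomotion.
frame, next_frame = torch.rand(8, 64, 64), torch.rand(8, 64, 64)
loss = equivariance_loss(frame, next_frame, "turn_left")
opt.zero_grad(); loss.backward(); opt.step()
```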

11.
Advanced Robotics, 2013, 27(13): 1565-1582
Autonomous agents that act in the real world using sensory input rely greatly on the ability to plan their actions and to transfer these skills across tasks. The majority of path-planning approaches for mobile robots, however, solve the current navigation problem from scratch, given the current and goal configurations of the robot. Consequently, these approaches yield highly efficient plans for the specific situation, but the computed policies typically do not transfer to other, similar tasks. In this paper, we propose to apply techniques from statistical relational learning to the path-planning problem. More precisely, we propose to learn relational decision trees as abstract navigation strategies from example paths. Relational abstraction has several interesting and important properties. First, it allows a mobile robot to imitate navigation behavior shown by users or by optimal policies. Second, it yields comprehensible models of behavior. Finally, a navigation policy learned in one environment naturally transfers to unknown environments. In several experiments with real robots and in simulated runs, we demonstrate that our approach yields efficient navigation plans. We show that our system is robust against observation noise and can outperform hand-crafted policies.

12.
We propose an extended version of adaptive fuzzy behavior hierarchies, termed Multiple Composite Levels (MCL), that allows for the proper modulation of composite behaviors across multiple levels of a behavior hierarchy, and we demonstrate its effectiveness in a hybrid learning/reactive control system. Controllers using adaptive fuzzy behavior hierarchies have previously been shown to provide effective control for robots tasked with multiple concurrent tasks. However, when more complex hierarchies are used to control tasks of increasing complexity, low-level reactive behaviors may not be properly weighted, resulting in sub-optimal control. Through an experimental evaluation in which composite behaviors that coordinate lower behaviors are learned with reinforcement learning, we demonstrate that MCL provides effective control in a complex multi-agent task, whereas the original implementation of adaptive fuzzy behavior hierarchies does not.
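A loose interpretation of the weighting problem, as a toy sketch only (the behaviors, weights, and two-level layout are invented, and the actual fuzzy inference is richer): a composite behavior blends lower behaviors by weight, and the higher level's weight must propagate down rather than override, so reactive outputs stay correctly scaled.

```python
import numpy as np

# Two reactive behaviors blended by a composite behavior, modulated again by
# a top-level composite. Multiplying weights down the hierarchy is the rough
# intuition behind proper multi-level modulation sketched here.

def avoid(obs):  return np.array([-1.0, 0.5])   # reactive: steer away
def seek(obs):   return np.array([1.0, 0.0])    # reactive: head to goal

def composite_low(obs, w_avoid, w_seek):
    # Weighted blend of reactive outputs (weighted-mean defuzzification).
    return (w_avoid * avoid(obs) + w_seek * seek(obs)) / (w_avoid + w_seek)

def composite_top(obs, w_task):
    # The top level scales the lower composite instead of replacing it, so
    # low-level reactive behaviors keep their relative weighting.
    return w_task * composite_low(obs, w_avoid=0.7, w_seek=0.3)

command = composite_top(obs=None, w_task=0.9)
```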

13.
Task-Parameterised Learning from Demonstration (TP-LfD) aims to automatically adapt the movements of collaborative robots (cobots) to new settings using knowledge learnt from demonstrated paths. The approach is suitable for encoding complex relations between a cobot and its surroundings, i.e., task-relevant objects. However, further efforts are still required to enhance the intelligence and adaptability of TP-LfD for dynamic tasks. With this aim, this paper presents an improved TP-LfD (iTP-LfD) approach to program cobots adaptively for a variety of industrial tasks. iTP-LfD comprises three main improvements over other TP-LfD approaches: 1) detecting generic visual features for frames of reference (frames) in demonstrations for path reproduction in new settings, without using complex computer vision algorithms; 2) minimising redundant frames that belong to the same object in demonstrations using a statistical algorithm; and 3) designing a reinforcement learning algorithm to eliminate irrelevant frames. The distinguishing characteristic of the iTP-LfD approach is that optimal frames are identified from demonstrations, simplifying computational complexity, overcoming occlusions in new settings, and boosting overall performance. Case studies on a variety of industrial tasks involving different objects and scenarios highlight the adaptability and robustness of the iTP-LfD approach.

14.
15.
Using deterministic learning, a learning control strategy for mobile robots is proposed. During closed-loop control, the controller learns the dynamics of the unknown controlled system and stores the learned dynamics as experiential knowledge in the form of constant neural-network weights. When the same control task is repeated, the controller can recall the previously learned dynamics and achieve better control performance. This strategy avoids time-consuming retraining of the neural network, giving mobile robots a genuine intelligent-control capability to acquire knowledge from experience, store it, and reuse it.
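A rough sketch of that mechanism, under invented dynamics, RBF centers, and gains: the network's weights are adapted online during the first execution, then frozen as constants that stand in for the learned dynamics on repeated runs.

```python
import numpy as np

# Deterministic-learning-style sketch: an RBF network approximates an unknown
# nonlinearity along the closed-loop trajectory; after convergence the
# near-constant weights are stored and reused. The plant, centers, adaptation
# gain, and trajectory below are all illustrative assumptions.

centers = np.linspace(-1, 1, 25)

def rbf(x):
    return np.exp(-((x - centers) ** 2) / 0.1)

def unknown_dynamics(x):
    return np.sin(3 * x)            # nonlinearity the controller must learn

W = np.zeros_like(centers)
GAMMA, DT = 5.0, 0.01
x = 0.5
for _ in range(20000):              # first execution: adapt W online
    phi = rbf(x)
    err = unknown_dynamics(x) - W @ phi
    W += DT * GAMMA * err * phi     # gradient-like weight adaptation law
    x = np.clip(x + DT * np.sin(x), -1, 1)  # state moving along a trajectory

W_stored = W.copy()                 # constant weights kept as experience
# On a repeat of the same task, W_stored @ rbf(x) supplies the learned
# dynamics immediately, with no retraining.
```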

16.
Recently, robot learning through deep reinforcement learning has been applied to various robot tasks using deep neural networks, without task-specific control or recognition algorithms. However, this learning method is difficult to apply to the contact tasks of a robot, owing to the excessive forces exerted during the random search process of reinforcement learning. Therefore, when applying reinforcement learning to contact tasks, the contact problem must be handled by an existing force controller. For this study, we propose a neural-network-based movement primitive (NNMP) that generates a continuous trajectory which can be transmitted to the force controller, and which is learned through a deep deterministic policy gradient (DDPG) algorithm. In addition, we propose an imitation learning algorithm suitable for the NNMP, such that trajectories similar to the demonstration trajectory are generated stably. The performance of the proposed algorithms was verified on a square peg-in-hole assembly task with a tolerance of 0.1 mm. The results confirm that the complicated assembly trajectory can be learned stably through the NNMP by the proposed imitation learning algorithm, and that the assembly trajectory is improved by learning the proposed NNMP through the DDPG algorithm.
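As a sketch of what a neural movement primitive could look like in its imitation phase (the network shape, phase parameterization, and demonstration below are assumptions, and the paper's NNMP is subsequently refined with DDPG): a small network maps a phase variable to trajectory points, fitted to a demonstration by regression, so the whole output is a continuous path suitable for a force controller.

```python
import torch
import torch.nn as nn

# Toy NNMP: phase s in [0, 1] -> 3-D trajectory point. Fitting it to a
# demonstration is a stand-in for the paper's imitation learning algorithm.

nnmp = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 3))
opt = torch.optim.Adam(nnmp.parameters(), lr=1e-2)

phase = torch.linspace(0, 1, 100).unsqueeze(1)
s = phase.squeeze(1)
demo = torch.stack([0.05 * s,                      # x: 5 cm approach (invented)
                    torch.zeros(100),              # y: straight line
                    -0.02 * torch.sin(3.14 * s)],  # z: dip and return
                   dim=1)

for _ in range(500):               # imitation phase: match the demonstration
    opt.zero_grad()
    loss = ((nnmp(phase) - demo) ** 2).mean()
    loss.backward()
    opt.step()

trajectory = nnmp(phase).detach()  # continuous path for the force controller
```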

17.
In this work, we study behavioral specialization in a swarm of autonomous robots. In the studied swarm, robots have to carry out tasks of different types that appear stochastically in time and space in a given environment. We consider a setting in which a robot that works repeatedly on tasks of the same type improves its performance on them through learning. Robots can exploit this learning by adapting their task selection behavior, that is, by selecting with higher probability tasks of the type on which their performance has improved. This adaptation of behavior is called behavioral specialization. We employ a simple task allocation strategy that allows a swarm of robots to specialize behaviorally. We study the influence of different environmental parameters on the performance of the swarm and show that the swarm can exploit learning successfully. However, there is a trade-off between the benefits and the costs of specialization. We study this trade-off in multiple experiments using different swarm sizes. Our experimental results indicate that spatiality has a major influence on the costs and benefits of specialization.
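The feedback loop that produces specialization can be sketched in a few lines; the learning curve and selection rule below are invented stand-ins for the paper's task allocation strategy.

```python
import random

# Toy specialization loop: efficiency on a task type improves with practice,
# and selection probability is proportional to efficiency, so a robot drifts
# toward the types it has practiced. All curves and constants are assumptions.

TASK_TYPES = ["A", "B"]
experience = {t: 0 for t in TASK_TYPES}

def efficiency(task):
    """Performance improves with repetition, saturating at 2x."""
    return 1.0 + min(experience[task] / 20.0, 1.0)

def select_task(available):
    # Practiced types are both faster and more likely to be chosen:
    # this positive feedback is what drives behavioral specialization.
    weights = [efficiency(t) for t in available]
    return random.choices(available, weights=weights)[0]

for _ in range(100):
    t = select_task(TASK_TYPES)
    experience[t] += 1
```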

18.
Advanced Robotics, 2013, 27(1): 83-99
Reinforcement learning can be an adaptive and flexible control method for autonomous systems. It does not need a priori knowledge; behaviors to accomplish given tasks are obtained automatically through repeated trial and error. However, as the complexity of the system increases, the learning cost grows exponentially. Thus, application to complex systems, such as robots with many redundant degrees of freedom and multi-agent systems, is very difficult. Previous works in this field restricted their applications to simple robots and small multi-agent systems, and because of the restricted functions of such systems, which have little redundancy, the effectiveness of reinforcement learning was limited. In our previous work, we took these problems into consideration and proposed a new reinforcement learning algorithm, 'Q-learning with dynamic structuring of exploration space based on GA' (QDSEGA). The effectiveness of QDSEGA for redundant robots has been demonstrated on a 12-legged robot and a 50-link manipulator. However, previous work on QDSEGA was restricted to redundant robots, and it could not be applied to multiple mobile robots. In this paper, we extend our previous work on QDSEGA by combining it with rule-based distributed control, and we propose a hybrid autonomous control method for multiple mobile robots. To demonstrate the effectiveness of the proposed method, simulations of a transportation task with 10 mobile robots are carried out. As a result, effective behaviors have been obtained.

19.
Policy Reuse is a reinforcement learning technique that efficiently learns a new policy by reusing past, similar learned policies. The Policy Reuse learner improves its exploration by probabilistically including the exploitation of those past policies. Policy Reuse was introduced, and its effectiveness demonstrated, in problems with different reward functions in the same state and action spaces. In this article, we contribute Policy Reuse as a transfer learning method across different domains. We introduce extended Markov Decision Processes (MDPs) that include domains and tasks, where domains have different state and action spaces, and tasks are problems with different rewards within a domain. We show how Policy Reuse can be applied across domains by defining and using a mapping between their state and action spaces. We use several domains, as versions of a simulated RoboCup Keepaway problem, and show that Policy Reuse can serve as a transfer learning mechanism that significantly outperforms a basic policy learner.
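The exploration strategy at the heart of Policy Reuse can be sketched as follows, with the decay schedule and action set as assumptions: with some probability the agent follows the past policy (mapped into the new domain), otherwise it acts epsilon-greedily on the new Q-table, and trust in the old policy decays within an episode.

```python
import random

# Sketch of reuse-based exploration: psi controls how often the transferred
# past policy is followed instead of the current greedy/random choice.
# past_policy is assumed to already map new-domain states to actions.

ACTIONS = range(4)

def reuse_action(state, Q, past_policy, step, psi0=0.9, decay=0.95, eps=0.1):
    psi = psi0 * (decay ** step)       # trust the old policy less over time
    if random.random() < psi:
        return past_policy(state)      # exploit transferred knowledge
    if random.random() < eps:
        return random.choice(list(ACTIONS))
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))
```

Early in each episode the transferred policy biases exploration toward promising regions; as psi decays, the learner relies increasingly on what it has learned about the new task itself.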

20.
CONRO: Towards Deployable Robots with Inter-Robots Metamorphic Capabilities
Metamorphic robots are modular robots that can reconfigure their shape. Such a capability is desirable in tasks such as earthquake search and rescue, and battlefield surveillance and scouting, where robots must handle unexpected situations and obstacles and perform tasks that are difficult for fixed-shape robots. The capabilities of the robots are determined by the design specification of their modules. In this paper, we present the design specification of a CONRO module: a small, self-sufficient, and relatively homogeneous module that can be connected to other modules to form complex robots. These robots not only can change their shape (intra-robot metamorphing) but also can split into smaller robots or merge with other robots to create a single larger robot (inter-robot metamorphing); i.e., CONRO robots can alter both their shape and their size. Thus, heterogeneous robot teams can be built from homogeneous components. Furthermore, CONRO robots can separate the reconfiguration stage from the locomotion stage, allowing the selection of configuration-dependent gaits. The locomotion and automatic inter-module docking capabilities of such robots were tested using tethered prototypes that can be reconfigured manually. We conclude the paper by discussing the future work needed to fully realize the construction of these robots.
