Similar literature
10 similar documents found
1.
In this paper, we investigate the use of hierarchical reinforcement learning (HRL) to speed up the acquisition of cooperative multi-agent tasks. We introduce a hierarchical multi-agent reinforcement learning (RL) framework, and propose a hierarchical multi-agent RL algorithm called Cooperative HRL. In this framework, agents are cooperative and homogeneous (use the same task decomposition). Learning is decentralized, with each agent learning three interrelated skills: how to perform each individual subtask, the order in which to carry them out, and how to coordinate with other agents. We define cooperative subtasks to be those subtasks in which coordination among agents significantly improves the performance of the overall task. Those levels of the hierarchy which include cooperative subtasks are called cooperation levels. A fundamental property of the proposed approach is that it allows agents to learn coordination faster by sharing information at the level of cooperative subtasks, rather than attempting to learn coordination at the level of primitive actions. We study the empirical performance of the Cooperative HRL algorithm using two testbeds: a simulated two-robot trash collection task, and a larger four-agent automated guided vehicle (AGV) scheduling problem. We compare the performance and speed of Cooperative HRL with other learning algorithms, as well as several well-known industrial AGV heuristics. We also address the issue of rational communication behavior among autonomous agents in this paper. The goal is for agents to learn both action and communication policies that together optimize the task given a communication cost. We extend the multi-agent HRL framework to include communication decisions and propose a cooperative multi-agent HRL algorithm called COM-Cooperative HRL. In this algorithm, we add a communication level to the hierarchical decomposition of the problem below each cooperation level. 
Before an agent makes a decision at a cooperative subtask, it decides if it is worthwhile to perform a communication action. A communication action has a certain cost and provides the agent with the actions selected by the other agents at a cooperation level. We demonstrate the efficiency of the COM-Cooperative HRL algorithm as well as the relation between the communication cost and the learned communication policy using a multi-agent taxi problem.

2.
This paper studies semiglobal and global state synchronization of homogeneous multiagent systems with partial‐state coupling (i.e., agents are coupled through part of their states) via a static protocol. We consider two classes of agents, namely G‐passive and G‐passifiable via input feedforward, which are subject to input saturation. The proposed static protocol is purely decentralized, i.e., it requires no additional channel for the exchange of controller states. For semiglobal synchronization, a static protocol is designed for an a priori given set of network graphs with a directed spanning tree. In other words, the static protocol needs only rough information on the network graph, i.e., a lower bound for the real part and an upper bound for the modulus of the nonzero eigenvalues of the corresponding Laplacian matrix. For global synchronization, by contrast, only strongly connected and detailed balanced network graphs are considered. In this case, for G‐passive agents, the static protocol does not need any network information, whereas for agents that are G‐passifiable via input feedforward, the static protocol needs only an upper bound for the modulus of the eigenvalues of the corresponding Laplacian matrix.
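The abstract above says the protocol may need only an upper bound on the modulus of the Laplacian eigenvalues. One standard way to obtain such a bound without computing any eigenvalue is the Gershgorin circle theorem. The following is a minimal sketch under that assumption; the function names, the edge format, and the example graph are illustrative and do not come from the paper.

```python
def laplacian(n, edges):
    """Laplacian L = D - A of a weighted directed graph on nodes 0..n-1.

    edges: iterable of (i, j, w), meaning agent i receives information
    from agent j with weight w > 0 (a hypothetical input format).
    """
    L = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        L[i][j] -= w   # off-diagonal entry: minus the edge weight
        L[i][i] += w   # diagonal entry: weighted in-degree
    return L


def gershgorin_modulus_bound(L):
    """Upper bound on |lambda| over all eigenvalues of L.

    Every eigenvalue lies in some disc centred at L[i][i] whose radius is
    the sum of the absolute off-diagonal entries of row i, hence
    |lambda| <= max_i (|L[i][i]| + radius_i).
    """
    return max(
        abs(row[i]) + sum(abs(x) for k, x in enumerate(row) if k != i)
        for i, row in enumerate(L)
    )


# Example: directed 3-cycle with unit weights (an assumed test graph).
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0)]
L = laplacian(3, edges)
bound = gershgorin_modulus_bound(L)
```

For a Laplacian the bound reduces to twice the largest weighted in-degree. It is conservative: the nonzero eigenvalues of the 3-cycle above have modulus √3, while the bound is 2.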

3.
This paper investigates the decentralized output feedback control problem for Markovian jump interconnected systems with unknown interconnections and measurement errors. Unlike some existing results, the global operation modes of all subsystems are not required to be completely accessible to the decentralized control system. A decentralized dynamic output feedback controller is constructed using neighboring mode information and local outputs, where the measurement errors between actual and measured outputs are considered. Subsequently, a new design method is developed such that the resultant closed‐loop system is stochastically stable and satisfies an L‐norm constraint. Sufficient conditions are formulated as linear matrix inequalities, and the controller gains are characterized in terms of the solution of a convex optimization problem. Finally, an example is given to illustrate the effectiveness of the proposed theoretical results.

4.
In this paper, we address the problem of agent loss in vehicle formations and sensor networks via two separate approaches: (1) perform a ‘self‐repair’ operation in the event of agent loss to recover desirable information architecture properties or (2) introduce robustness into the information architecture a priori such that agent loss does not destroy desirable properties. We model the information architecture as a graph G(V, E), where V is a set of vertices representing the agents and E is a set of edges representing information flow amongst the agents. We focus on two properties of the graph called rigidity and global rigidity, which are required for formation shape maintenance and sensor network self‐localization, respectively. For the self‐repair approach, we show that while previous results permit local repair involving only neighbours of the lost agent, the repair cannot always be implemented using only local information. We present new results that can be applied to make the local repair using only local information. We describe the implementation and illustrate it with algorithms and examples. For the robustness approach, we investigate the structure of graphs with the property that rigidity or global rigidity is preserved after removing any single vertex (we call this property 2‐vertex‐rigidity or 2‐vertex‐global‐rigidity, respectively). Information architectures with such properties would allow formation shape maintenance or self‐localization to be performed even in the event of agent failure. We review a characterization of a class of 2‐vertex‐rigidity and develop a separate class, making significant strides towards a complete characterization. We also present a characterization of a class of 2‐vertex‐global‐rigidity. Copyright © 2008 John Wiley & Sons, Ltd.
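The definition of 2‐vertex‐rigidity above (rigidity survives the loss of any single vertex) can be checked by brute force on small graphs. The sketch below assumes planar formations and uses the standard rank test: a graph on n ≥ 2 vertices is generically rigid in the plane when its rigidity matrix has rank 2n − 3 at generic positions. Random integer positions stand in for generic ones, so the test is correct only with high probability. All function names are hypothetical, and this is not the characterization developed in the paper.

```python
import random
from fractions import Fraction


def rank(rows):
    """Matrix rank by Gaussian elimination in exact rational arithmetic."""
    rows = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][c] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            if rows[r][c] != 0:
                f = rows[r][c] / rows[rk][c]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def is_rigid_2d(vertices, edges, rng):
    """Rank test for generic rigidity in the plane at random positions."""
    vertices = list(vertices)
    n = len(vertices)
    if n <= 1:
        return True
    idx = {v: k for k, v in enumerate(vertices)}
    pos = {v: (rng.randrange(1, 10**6), rng.randrange(1, 10**6)) for v in vertices}
    rows = []
    for u, v in edges:
        # One row per edge: (p_u - p_v) in u's columns, (p_v - p_u) in v's.
        dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
        row = [0] * (2 * n)
        row[2 * idx[u]], row[2 * idx[u] + 1] = dx, dy
        row[2 * idx[v]], row[2 * idx[v] + 1] = -dx, -dy
        rows.append(row)
    return rank(rows) == 2 * n - 3


def is_2_vertex_rigid(vertices, edges, seed=0):
    """True if the graph is rigid and stays rigid after losing any one vertex."""
    rng = random.Random(seed)
    vertices = list(vertices)
    if not is_rigid_2d(vertices, edges, rng):
        return False
    for lost in vertices:
        rest = [v for v in vertices if v != lost]
        kept = [(u, v) for u, v in edges if lost not in (u, v)]
        if not is_rigid_2d(rest, kept, rng):
            return False
    return True


# Complete graph on four agents: every three-agent remainder is a triangle.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
# Four-cycle with one diagonal: rigid, but losing agent 0 leaves a path.
braced_square = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
k4_ok = is_2_vertex_rigid(range(4), k4)
square_ok = is_2_vertex_rigid(range(4), braced_square)
```

The braced square illustrates why rigidity alone is not enough: it is rigid as a whole, yet removing either endpoint of the diagonal leaves a two-edge path that can flex.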

5.
This paper presents a Lie group setting for the problem of control of formations, as a natural outcome of the analysis of a planar two-vehicle formation control law. The vehicle trajectories are described using the planar Frenet–Serret equations of motion, which capture the evolution of both the vehicle position and orientation for unit-speed motion subject to curvature (steering) control. The set of all possible (relative) equilibria for arbitrary G-invariant curvature controls is described (where G = SE(2) is a symmetry group for the control law), and a global convergence result for the two-vehicle control law is proved. An n-vehicle generalization of the two-vehicle control law is also presented, and the corresponding (relative) equilibria for the n-vehicle problem are characterized. Work is ongoing to establish stability and convergence results for the n-vehicle problem.
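Unit-speed planar motion under curvature control can be written in heading-angle form as x' = cos θ, y' = sin θ, θ' = u, which is equivalent to the planar Frenet–Serret equations. The sketch below integrates a single vehicle with forward Euler steps. The function name, the step size, and the constant-curvature input are assumptions for illustration; the paper's two-vehicle control law is not reproduced here.

```python
import math


def simulate_unit_speed(x, y, theta, curvature, dt, steps):
    """Forward-Euler integration of x' = cos(theta), y' = sin(theta), theta' = u.

    curvature: function of time returning the steering input u(t).
    Returns the list of visited positions and the final heading.
    """
    path = [(x, y)]
    for k in range(steps):
        u = curvature(k * dt)
        x += math.cos(theta) * dt   # the speed is fixed at 1
        y += math.sin(theta) * dt
        theta += u * dt             # steering changes only the heading
        path.append((x, y))
    return path, theta


# Constant curvature u = 1 traces a closed loop of radius about 1
# over one period of length 2*pi.
n = 1000
path, theta = simulate_unit_speed(0.0, 0.0, 0.0, lambda t: 1.0, 2 * math.pi / n, n)
```

With constant curvature the vehicle returns to its starting point after one period, which gives a quick sanity check on the integrator.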

6.
In this paper, we consider decentralized state feedback control of discrete event systems with a (global) control specification given by a predicate. In this framework, instead of a global state feedback, each local state feedback controls a part of the system according to local information so that global behaviors satisfy the global control specification. We introduce the notion of n-observability of predicates, and present necessary and sufficient conditions for the existence of a decentralized state feedback which achieves the global control specification.

7.
We consider the problem of exploring an anonymous undirected graph using an oblivious robot. The studied exploration strategies are designed so that the next edge in the robot’s walk is chosen using only local information, and so that some local equity (fairness) criterion is satisfied for the adjacent undirected edges. Such strategies can be seen as an attempt to derandomize random walks, and are natural counterparts for undirected graphs of the rotor-router model for symmetric directed graphs. The first of the studied strategies, known as Oldest-First, always chooses the neighboring edge for which the most time has elapsed since its last traversal. Unlike in the case of symmetric directed graphs, we show that such a strategy in some cases leads to exponential cover time. We then consider another strategy called Least-Used-First which always uses adjacent edges which have been traversed the smallest number of times. We show that any Least-Used-First exploration covers a graph G = (V, E) of diameter D within time O(D|E|), and in the long run traverses all edges of G with the same frequency.
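The Least-Used-First rule described above is simple enough to simulate directly. The sketch below is one possible rendering: the adjacency-dictionary format, the function name, and the tie-breaking rule (first neighbour in list order) are assumptions, and the traversal counters are kept in a dictionary purely for simulation purposes.

```python
def least_used_first_walk(adj, start, steps):
    """Simulate a Least-Used-First walk on a simple undirected graph.

    adj: dict mapping each node to a list of its neighbours.
    Returns the sequence of visited nodes and the per-edge traversal counts.
    """
    # One counter per undirected edge, keyed by the unordered pair of endpoints.
    used = {frozenset((u, v)): 0 for u, nbrs in adj.items() for v in nbrs}
    walk = [start]
    node = start
    for _ in range(steps):
        # Choose an adjacent edge with the fewest traversals so far;
        # ties go to the first neighbour in list order.
        nxt = min(adj[node], key=lambda v: used[frozenset((node, v))])
        used[frozenset((node, nxt))] += 1
        node = nxt
        walk.append(node)
    return walk, used


# Example: a cycle on four nodes.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
walk, used = least_used_first_walk(cycle, start=0, steps=4)
```

On this cycle the walk visits 0, 1, 2, 3, 0 and traverses every edge exactly once in four steps. Swapping the counter for a last-traversal timestamp would turn the same loop into a simulation of Oldest-First.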

8.
Information flow and cooperative control of vehicle formations   Total citations: 25 (self-citations: 0, citations by others: 25)
We consider the problem of cooperation among a collection of vehicles performing a shared task using intervehicle communication to coordinate their actions. Tools from algebraic graph theory prove useful in modeling the communication network and relating its topology to formation stability. We prove a Nyquist criterion that uses the eigenvalues of the graph Laplacian matrix to determine the effect of the communication topology on formation stability. We also propose a method for decentralized information exchange between vehicles. This approach realizes a dynamical system that supplies each vehicle with a common reference to be used for cooperative motion. We prove a separation principle that decomposes formation stability into two components: stability of the information flow for the given graph and stability of an individual vehicle for the given controller. The information flow can thus be rendered highly robust to changes in the graph, enabling tight formation control despite limitations in intervehicle communication capability.
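As an illustration of how decentralized information exchange can give every vehicle the same reference, the sketch below runs a generic discrete-time Laplacian averaging iteration, x ← x − εLx, on an undirected graph. This is a textbook consensus scheme chosen for illustration, not necessarily the information-flow filter proposed in the paper; the function names, the step size `eps`, and the example graph are assumptions.

```python
def consensus_step(x, neighbours, eps):
    """One round: each vehicle moves toward the values held by its neighbours."""
    return [
        xi + eps * sum(x[j] - xi for j in neighbours[i])
        for i, xi in enumerate(x)
    ]


def run_consensus(x0, neighbours, eps, iters):
    """Repeat the local update; only neighbour-to-neighbour exchange is used."""
    x = list(x0)
    for _ in range(iters):
        x = consensus_step(x, neighbours, eps)
    return x


# Three vehicles on a path graph 0 - 1 - 2, each starting with its own estimate.
neighbours = {0: [1], 1: [0, 2], 2: [1]}
x = run_consensus([0.0, 3.0, 6.0], neighbours, eps=0.25, iters=200)
```

On a connected undirected graph with `eps` smaller than the reciprocal of the largest degree, the estimates converge to the average of the initial values, here 3.0, so all vehicles end up with a common reference without any central coordinator.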

9.
In this paper, the past and current issues involved in the design of decentralized networked control systems are reviewed. The basic models of interconnected systems described as continuous-time linear time-invariant systems in the time domain serve as a framework for the inclusion of communication channels in the decentralized feedback loop. The I/O-oriented models and the interaction-oriented models with disjoint subsystems and interactions are distinguished. The overview is focused on packet dropouts, transmission delays, and quantization effects which are included in the time-driven design of feedback loop components. Single- and multiple-packet transmissions are considered in this context. The design of decentralized state feedback gain matrices with delayed feedback uses the methodology of sampled-data feedback design for continuous-time systems, while the decentralized H quantizer design is based on the static output controller. The Liapunov stability approach results in computationally efficient decentralized control design strategies described by using linear matrix inequalities.

10.
This article presents a novel control strategy based on predictor‐feedback delay compensation for multiagent systems to reach a prescribed target formation under unknown but bounded communication delays and switching communication topology. Both communication delays and network topology can be subject to arbitrarily fast time variations. The key idea is to implement predictor‐feedback strategies using only relative measurements between agents expressed in each local agent's frame, with the aim of counteracting the negative effect of time delays. Nevertheless, due to the decentralized nature of the control, the presence of time‐varying delays, and the switching communication topology, only partial delay compensation is possible. Despite this, we show that our proposal achieves better performance than nonpredictor control schemes by introducing a weighting factor for predictor‐feedback terms in the control law. Sufficient conditions for robust stability based on linear matrix inequalities are also provided, which allow the controller parameters to be designed easily in order to maximize the speed of convergence. Finally, simulation results are provided to show the effectiveness of the proposed approach.
