Similar Documents
20 similar documents found.
1.
This paper studies the problem of finite-time optimal formation control for second-order multiagent systems in situations where the formation time and/or the cost function must be considered. Finite-time optimal formation control laws are proposed for the cases with and without a leader, respectively. For the case of constrained control, the time-optimal formation problem is considered and an algorithm is designed to derive a feasible solution. Although the feasible solution may not be optimal, it provides a lower bound on the formation time under control constraints: if the given formation time is below this bound, the control constraints cannot be satisfied. Finally, some numerical examples are given to illustrate the effectiveness of the theoretical results.

2.
Ho, F. & Kamel, M. Machine Learning (1998) 33(2-3): 155-177
A central issue in the design of cooperative multiagent systems is how to coordinate the behavior of the agents to meet the goals of the designer. Traditionally, this has been accomplished by hand-coding the coordination strategies. However, this task is complex due to the interactions that can take place among agents. Recent work in the area has focused on how strategies can be learned. Yet many of these systems suffer from convergence, complexity, and performance problems. This paper presents a new approach for learning multiagent coordination strategies that addresses these issues. The effectiveness of the technique is demonstrated using a synthetic domain and the predator-prey pursuit problem.

3.
In this article, a novel off-policy cooperative-game Q-learning algorithm is proposed for achieving optimal tracking control of linear discrete-time multiplayer systems subject to exogenous dynamic disturbance. The key strategy, for the first time, is to integrate reinforcement learning and cooperative games with output regulation under the discrete-time sampling framework to achieve data-driven optimal tracking control and disturbance rejection. Without knowledge of the state and input matrices of the multiplayer system, or of the dynamics of the exogenous disturbance and the command generator, the coordination equilibrium solution and the steady-state control laws are learned from data by a novel off-policy Q-learning approach, so that the multiplayer system can tolerate the disturbance and follow the reference signal optimally. Moreover, rigorous theoretical proofs of the unbiasedness of the coordination equilibrium solution and the convergence of the proposed algorithm are presented. Simulation results show the efficacy of the developed approach.

4.
Multiagent Systems: A Survey from a Machine Learning Perspective   (cited by 27; self-citations: 0, citations by others: 27)
Distributed Artificial Intelligence (DAI) has existed as a subfield of AI for less than two decades. DAI is concerned with systems that consist of multiple independent entities that interact in a domain. Traditionally, DAI has been divided into two sub-disciplines: Distributed Problem Solving (DPS) focuses on the information management aspects of systems with several components working together towards a common goal; Multiagent Systems (MAS) deals with behavior management in collections of several independent entities, or agents. This survey of MAS is intended to serve as an introduction to the field and as an organizational framework. A series of general multiagent scenarios are presented. For each scenario, the issues that arise are described along with a sampling of the techniques that exist to deal with them. The presented techniques are not exhaustive, but they highlight how multiagent systems can be and have been used to build complex systems. When options exist, the techniques presented are biased towards machine learning approaches. Additional opportunities for applying machine learning to MAS are highlighted and robotic soccer is presented as an appropriate test bed for MAS. This survey does not focus exclusively on robotic systems. However, we believe that much of the prior research in non-robotic MAS is relevant to robotic MAS, and we explicitly discuss several robotic MAS, including all of those presented in this issue.

5.
Attack optimization is an important issue in securing cyber-physical systems. This paper investigates how an attacker should schedule its denial-of-service attacks to degrade the robust performance of a closed-loop system. The measurements of the system states are transmitted to a remote controller over a multichannel network. With limited resources, the attacker only has the capacity to jam a sparse subset of the channels and must decide which channels to attack. Within this framework, a data-based optimal attack strategy using Q-learning is proposed to maximize the effect on the closed-loop system. The Q-learning algorithm can adaptively learn the optimal attack using data sniffed over the wireless network, without requiring a priori knowledge of the system parameters. Simulation results demonstrate the performance of the proposed attack strategy.
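The attack-scheduling idea above can be illustrated with a minimal tabular Q-learning sketch. Everything here is invented for illustration: a three-channel network, a scalar "degradation" reward standing in for the paper's robust-performance objective, and the convention that the state is the last jammed channel.

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels = 3                              # hypothetical multichannel network
Q = np.zeros((n_channels, n_channels))      # state: last jammed channel

def degradation(action):
    # Hypothetical reward: channel 1 carries the critical measurement, so
    # jamming it degrades the closed loop the most (values are invented).
    return (0.1, 1.0, 0.2)[action] + 0.05 * rng.standard_normal()

alpha, gamma, eps = 0.1, 0.9, 0.2
state = 0
for _ in range(20000):
    if rng.random() < eps:                  # epsilon-greedy exploration
        action = int(rng.integers(n_channels))
    else:
        action = int(np.argmax(Q[state]))
    r = degradation(action)                 # observed from sniffed data
    Q[state, action] += alpha * (r + gamma * Q[action].max() - Q[state, action])
    state = action

best = int(np.argmax(Q[0]))                 # learned channel to jam
```

With these toy rewards the greedy policy settles on the channel whose loss hurts the closed loop most; in the paper's actual formulation the reward comes from the measured closed-loop performance rather than a hand-set table.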

6.
We discuss the solution of complex multistage decision problems using methods based on the idea of policy iteration (PI), i.e., start from some base policy and generate an improved policy. Rollout is the simplest method of this type, where just one improved policy is generated. We can view PI as repeated application of rollout, where the rollout policy at each iteration serves as the base policy for the next iteration. In contrast with PI, rollout has a robustness property: it can be applied on-line and is suitable for on-line replanning. Moreover, rollout can use as base policy one of the policies produced by PI, thereby improving on that policy. This is the type of scheme underlying the prominently successful AlphaZero chess program. In this paper we focus on rollout and PI-like methods for problems where the control consists of multiple components, each selected (conceptually) by a separate agent. This is the class of multiagent problems where the agents have a shared objective function and shared, perfect state information. Based on a problem reformulation that trades off control space complexity with state space complexity, we develop an approach whereby, at every stage, the agents sequentially (one at a time) execute a local rollout algorithm that uses a base policy together with some coordinating information from the other agents. The amount of total computation required at every stage grows linearly with the number of agents. By contrast, in the standard rollout algorithm, the amount of total computation grows exponentially with the number of agents. Despite the dramatic reduction in required computation, we show that our multiagent rollout algorithm has the fundamental cost improvement property of standard rollout: it guarantees an improved performance relative to the base policy. We also discuss autonomous multiagent rollout schemes that allow the agents to make decisions autonomously through the use of precomputed signaling information, which is sufficient to maintain the cost improvement property without any on-line coordination of control selection between the agents. For discounted and other infinite horizon problems, we also consider exact and approximate PI algorithms involving a new type of one-agent-at-a-time policy improvement operation. For one of our PI algorithms, we prove convergence to an agent-by-agent optimal policy, thus establishing a connection with the theory of teams. For another PI algorithm, which is executed over a more complex state space, we prove convergence to an optimal policy. Approximate forms of these algorithms are also given, based on the use of policy and value neural networks. These PI algorithms, in both their exact and approximate forms, are strictly off-line methods, but they can be used to provide a base policy for use in an on-line multiagent rollout scheme.
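The one-agent-at-a-time idea and its linear-versus-exponential computation count can be sketched on a toy one-shot problem. The cost function, agent count, and base policy below are invented; in real rollout, each evaluation would be a Q-factor estimated by simulating the base policy forward from the current state.

```python
import numpy as np

rng = np.random.default_rng(1)
m, A = 6, 4                         # hypothetical: 6 agents, 4 actions each
W = rng.standard_normal((m, A))     # per-agent action costs (toy stand-in)
C = rng.standard_normal((m, m))     # pairwise coupling costs

def cost(u):
    """Shared objective over the joint action u = (u_1, ..., u_m)."""
    pair = sum(C[i, j] for i in range(m) for j in range(i + 1, m) if u[i] == u[j])
    return float(sum(W[i, u[i]] for i in range(m)) + pair)

base = [0] * m                      # base policy: every agent plays action 0
u, evals = list(base), 0
for i in range(m):                  # agents optimize one at a time:
    best_a, best_c = u[i], cost(u)  # earlier agents keep improved choices,
    for a in range(A):              # later agents are held at the base policy
        evals += 1
        u[i] = a
        c = cost(u)
        if c < best_c:
            best_a, best_c = a, c
    u[i] = best_a

assert cost(u) <= cost(base)        # cost improvement property of rollout
assert evals == m * A               # 24 evaluations, vs A**m == 4096 jointly
```

Each agent's sweep can only lower (or keep) the running cost, which is exactly the cost improvement guarantee; the joint-action alternative would enumerate all A**m combinations.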

7.
The emergence of networked control systems requires digital control design to integrate communication constraints efficiently. To accommodate this requirement, this paper investigates the joint design of the tracking problem for multi-agent systems (MAS) in the presence of a resource-limited communication channel and quantization. An event-triggered robust learning control with quantization is first proposed and employed for MAS. With the introduction of logarithmic quantization, the new event-triggered distributed robust learning control system guarantees the asymptotic tracking property on a finite interval. Convergence analysis is given based on the Lyapunov direct method. Finally, numerical simulations illustrate the efficacy of the event-triggered approach compared with time-triggered controllers.

8.
In this paper, we propose a model-free algorithm for global stabilization of linear systems subject to actuator saturation. The idea of gain-scheduled low gain feedback is applied to develop control laws that avoid saturation and achieve global stabilization. To design these control laws, we employ the framework of parameterized algebraic Riccati equations (AREs). Reinforcement learning techniques are developed to find the solution of the parameterized ARE without requiring any knowledge of the system dynamics. In particular, we present an iterative Q-learning scheme that searches for a low gain parameter and iteratively solves the parameterized ARE using the Bellman equation. Both state feedback and output feedback algorithms are developed. It is shown that the proposed scheme achieves model-free global stabilization under bounded controls and converges to the optimal solution of the ARE. Simulation results confirm the effectiveness of the proposed method.
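A scalar sketch of gain-scheduled low gain feedback via a parameterized ARE. For brevity the ARE is solved here by direct Riccati iteration with known plant parameters (a, b); the paper's contribution is obtaining this solution model-free through Q-learning. The plant, saturation level, and low-gain parameter below are all invented.

```python
# Hypothetical scalar plant x+ = a*x + b*sat(u); a marginally stable pole
# (an integrator) is the class where low-gain feedback gives global results.
a, b, r_weight = 1.0, 1.0, 1.0
sat = lambda u: max(-1.0, min(1.0, u))   # unit actuator saturation (assumed)

def solve_parameterized_are(gamma, iters=500):
    # Fixed-point iteration for the parameterized discrete-time ARE
    #   P = a^2 P - (a b P)^2 / (r + b^2 P) + gamma,
    # where gamma is the low-gain parameter (state weight gamma * I).
    P = gamma
    for _ in range(iters):
        P = a * a * P - (a * b * P) ** 2 / (r_weight + b * b * P) + gamma
    return P

gamma = 0.01                             # small gamma -> small feedback gain
P = solve_parameterized_are(gamma)
K = a * b * P / (r_weight + b * b * P)   # low gain feedback u = -K x

x = 50.0                                 # far from origin: control saturates early on
for _ in range(400):
    x = a * x + b * sat(-K * x)          # saturated closed loop still converges
```

Because the gain shrinks with gamma, the control eventually leaves saturation and the linear closed loop takes over, which is the mechanism behind global stabilization under bounded controls.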

9.
Conventional Q-learning requires pre-defined quantized state and action spaces. This is impractical for real robot applications, since a discrete, finite set of actions cannot precisely capture the variation among the different positions within the same state element at which the robot is located. In this paper, a continuous action generator built on Q-learning, called the fuzzy cerebellar model articulation controller (FCMAC) method, is presented to solve this problem. The FCMAC generates continuous actions by a linear combination of the weighting distribution of the state space, where the optimal policy of each state is derived from Q-learning. This provides better resolution of the weighting distribution for the state space in which the robot is located. The algorithm solves not only the single-agent problem but, by extension, also the multi-agent problem. An experiment is implemented in a task where two robots act independently while connected by a straight bar; their goal is to cooperate to pass through a gate in the middle of a grid environment.

10.
Design and Implementation of a Multiagent Control System   (cited by 18; self-citations: 0, citations by others: 18)
This paper proposes the idea of implementing a control system using multiagent technology and analyzes the structure of such a system. A multiagent control system is designed using a blackboard architecture and a production-rule representation, and the system design is discussed further.

11.
Iterative learning control (ILC) is considered for the Hammerstein-Wiener (HW) system, a cascade consisting of a static nonlinearity followed by a linear stochastic system and then another static nonlinearity. Apart from this structure, the system is unknown, but the system output is observed with additive noise. Both the linear and nonlinear parts of the system may be time-varying. The optimal control sequence under the tracking performance criterion is first characterized; it is, however, unavailable since the system is unknown. Using observations of the system output, the ILC is generated by a Kiefer-Wolfowitz (KW) algorithm with randomized differences, which aims to minimize the tracking error. It is proved that the ILC converges to the optimal one with probability one and that the resulting tracking error tends to its minimal value. Copyright © 2011 John Wiley and Sons Asia Pte Ltd and Chinese Automatic Control Society
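A sketch of the Kiefer-Wolfowitz idea with randomized (simultaneous-perturbation) differences, applied to an invented Hammerstein-Wiener toy system: the control sequence is tuned from noisy cost observations alone, with no model of either nonlinearity or of the linear block. All system blocks, gain schedules, and the reference are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 5                                       # trial length (hypothetical)
idx = np.arange(T)
G = np.tril(0.5 ** np.subtract.outer(idx, idx))   # causal linear block (assumed)

def system(u):
    # Hammerstein-Wiener structure with invented mild static nonlinearities.
    v = u + 0.2 * np.tanh(u)                # input nonlinearity
    x = G @ v                               # linear dynamic block
    return x + 0.1 * np.sin(x)              # output nonlinearity

r = system(0.5 * np.ones(T))                # reachable reference

def J(u):
    # Tracking cost computed from a noisy output observation only.
    y = system(u) + 0.01 * rng.standard_normal(T)
    return float(np.sum((y - r) ** 2))

u = np.zeros(T)
for k in range(5000):
    a_k = 0.05 / (k + 1) ** 0.602           # decreasing KW step sizes
    c_k = 0.1 / (k + 1) ** 0.101            # decreasing perturbation sizes
    delta = rng.choice([-1.0, 1.0], size=T) # randomized difference direction
    g_hat = (J(u + c_k * delta) - J(u - c_k * delta)) / (2 * c_k) * delta
    u -= a_k * g_hat                        # stochastic-gradient ILC update

err0 = float(np.linalg.norm(system(np.zeros(T)) - r))
err = float(np.linalg.norm(system(u) - r))
```

Two noisy cost measurements per iteration estimate a descent direction regardless of the input dimension, which is what makes the scheme usable when the cascade is unknown.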

12.
This paper proposes an intermittent model-free learning algorithm for linear time-invariant systems, where the control policy and transmission decisions are co-designed simultaneously while being subject to worst-case disturbances. The control policy is designed by introducing an internal dynamical system to further reduce the transmission rate and provide bandwidth flexibility in cyber-physical systems. Moreover, a Q-learning algorithm with two actors and a single critic is developed to learn the optimal parameters of a Q-function. It is shown, using an impulsive system approach, that the closed-loop system has an asymptotically stable equilibrium and that no Zeno behavior occurs. Furthermore, a qualitative performance analysis of the model-free dynamic intermittent framework is given, showing its degree of suboptimality with respect to the optimal continuously updated controller. Finally, a numerical simulation of an unknown system is carried out to highlight the efficacy of the proposed framework.

13.
This paper presents a nonlinear iterative learning control (NILC) scheme for nonlinear time-varying systems. An algorithm implementing a new strategy for the NILC is proposed; it ensures that the trajectory-tracking errors are bounded by a given error-norm bound. A special feature of the algorithm is that the trial-time interval is finite but not fixed, unlike in other iterative learning algorithms. A sufficient condition for convergence and robustness of the bounded-error learning procedure is derived. Simulation results for the bounded-error and standard learning processes applied to a virtual robot are presented to verify the maximal tracking errors, convergence, and applicability of the proposed learning control.
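For contrast with the bounded-error scheme, a plain fixed-trial-interval P-type ILC baseline can be sketched in a few lines. The first-order plant, reference, and learning gain are invented; the NILC above additionally adapts the trial-time interval rather than keeping it fixed as here.

```python
import numpy as np

a, T, gamma = 0.9, 10, 0.5                 # plant pole, trial length, learning gain
r = np.sin(0.5 * np.arange(1, T + 1))      # reference trajectory (invented)

def run_trial(u):
    # First-order plant x_{t+1} = a x_t + u_t, y_t = x_{t+1}, x_0 = 0 (assumed).
    x, y = 0.0, []
    for ut in u:
        x = a * x + ut
        y.append(x)
    return np.array(y)

u = np.zeros(T)
for _ in range(200):                       # repeat the same finite-time trial
    e = r - run_trial(u)
    u = u + gamma * e                      # P-type update u_{k+1} = u_k + gamma * e_k

final_error = float(np.max(np.abs(r - run_trial(u))))
```

Since the plant's first Markov parameter is 1 and |1 - gamma| < 1, the trial-to-trial error contracts and the tracking error is driven essentially to zero over the iterations.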

14.
This work focuses on iterative learning control (ILC) for linear discrete-time systems with unknown initial state and disturbances. First, multiple high-order internal models (HOIMs) are introduced for the reference, initial state, and disturbances. Both the initial state and the disturbance consist of two components: one strictly satisfies an HOIM and the other is random and bounded. Then, an ILC scheme is constructed according to an augmented HOIM that is the aggregation of all HOIMs. For known HOIMs, an ILC design criterion is introduced to achieve satisfactory tracking performance based on 2-D theory. Next, the case of unknown HOIMs is discussed, for which a time-frequency-analysis (TFA)-based ILC algorithm is proposed. In this situation, it is shown that the tracking error inherits the unknown augmented HOIM, which is an aggregation of all unknown HOIMs. A TFA-based method, e.g., the short-time Fourier transform (STFT), is then employed to identify the unknown augmented HOIM, where the STFT can ignore the effect of the random bounded initial state and disturbances. A new ILC law is designed for the identified augmented HOIM, which is able to reject the unknown initial state and disturbances that strictly satisfy HOIMs. Finally, a gantry robot system with iteration-invariant or slowly varying frequencies is given to illustrate the efficiency of the proposed TFA-based ILC algorithm.

15.
In this paper, we consider the problem of leader synchronization in systems with interacting agents in large networks while simultaneously satisfying energy-related, user-defined distributed optimization criteria. Since modeling large networks is very difficult, we derive a model-free formulation based on a separate distributed Q-learning function for every agent. Every Q-function is a parametrization of each agent's control, of the neighborhood controls, and of the neighborhood tracking error. None of the agents has any information on where the leader is connected or from where it spreads the desired information. The proposed algorithm uses an integral reinforcement learning approach with a separate distributed actor/critic network for each agent: a critic approximator to approximate each value function and an actor approximator to approximate each optimal control law. The tuning laws for the actor and critic approximators are designed appropriately using gradient descent. We provide rigorous stability and convergence proofs to show that the closed-loop system has an asymptotically stable equilibrium point and that the control policies form a graphical Nash equilibrium. We demonstrate the effectiveness of the proposed method on a network consisting of 10 agents. Copyright © 2016 John Wiley & Sons, Ltd.

16.
This paper investigates limited-budget consensus design and analysis problems for general high-order multiagent systems with intermittent communications and switching topologies. The main contribution is that the trade-off between energy consumption and consensus performance can be designed while achieving leaderless or leader-following consensus under the constraints of limited budgets and intermittent communications. First, a new intermittent limited-budget consensus control protocol with a practical trade-off design index is proposed, where the total budget of the whole multiagent system is limited. Then, leaderless limited-budget consensus design and analysis criteria are derived, in which the matrix variables of the linear matrix inequalities are determined according to the total budget and the practical trade-off design parameters. Meanwhile, an explicit formulation of the consensus function is derived to describe the consensus state trajectory of the whole system. Moreover, a new two-stage transformation strategy is utilized for leader-following cases, by which the dynamics decomposition of the leaderless and leader-following cases can be converted into a unified framework, and sufficient conditions for leader-following limited-budget consensus design and analysis are determined from those of the leaderless cases. Finally, numerical simulations are given to illustrate the theoretical results.

17.
This paper discusses first- and second-order fractional-order PID-type iterative learning control strategies for a class of Caputo-type fractional-order linear time-invariant systems. First, the additivity of the fractional-order derivative operators is exploited via the Laplace transform of the convolution integral, the absolute convergence of the Mittag-Leffler function on the infinite time interval is established, and some properties of the state transition function of the fractional-order system are derived from characteristics of the Gamma and Beta functions. Second, using these properties and the generalized Young inequality for the convolution integral, the monotone convergence of the first-order learning strategy is analyzed, and the monotone convergence of the second-order scheme after finitely many iterations is derived, with the tracking errors assessed in the Lebesgue-p norm. The resulting conditions show that convergence is governed not only by the system input and output matrices and the fractional-order derivative learning gain, but also by the system state matrix, the proportional learning gain, and the fractional-order integral learning gain. Numerical simulations illustrate the validity and effectiveness of the results.

18.
In this paper, the finite-time consensus tracking problem is investigated for second-order multi-agent systems. A novel distributed consensus algorithm based on sliding mode control (SMC) is designed, and the tracking time is estimated analytically.

19.
A new adaptive distributed controller is developed for the leader-following consensus problem of multiple uncertain Euler-Lagrange systems. A distinct feature of the proposed approach, as opposed to existing ones, is that it does not require the exchange of controller states over the communication network. As a consequence, it not only makes the implementation of the controller much easier but also reduces the communication cost. The effectiveness of the main result is demonstrated by exemplary applications to the cooperative control of multiple two-link robot arms.

20.
Iterative Learning Control Using Adjoint Systems and Stable Inversion   (cited by 1; self-citations: 0, citations by others: 1)
In this paper, we investigate iterative learning control (ILC) for non-minimum phase systems from a novel viewpoint. For non-minimum phase systems, the magnitude of a desired input obtained by ILC using forward-time updating and Silverman's inversion is too large because of the influence of the unstable zeros. Stable inversion, on the other hand, constructs a bounded desired input for non-minimum phase systems by using a non-causal inverse. In this paper, we first clarify that ILC using an adjoint system achieves the desired input defined by stable inversion. Hence, ILC using an adjoint system is an effective method for the control of non-minimum phase systems with uncertainty. However, a useful convergence condition for ILC using an adjoint system had not been available. We therefore develop a simple convergence condition in the frequency domain.
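The adjoint-system update can be sketched on a lifted (matrix) model of a toy non-minimum phase plant: the update u <- u + beta * G^T e is gradient descent on the squared tracking error, so the input stays bounded even though the forward inverse G^{-1} has entries that grow geometrically. The impulse response, horizon, and step size below are invented.

```python
import numpy as np

T = 8
h = np.array([1.0, -2.0])           # impulse response with an unstable zero at z = 2
G = np.zeros((T, T))                # lifted (matrix) model of the trial dynamics
for i in range(T):
    for j in range(max(0, i - 1), i + 1):
        G[i, j] = h[i - j]

u_star = 0.5 * np.ones(T)
r = G @ u_star                      # reachable reference (assumed known)

u = np.zeros(T)
beta = 0.05                         # step size below 2 / sigma_max(G)^2
for _ in range(5000):
    e = r - G @ u
    u = u + beta * (G.T @ e)        # adjoint-based update: gradient descent on ||e||^2

# Forward (Silverman-type) inversion would apply G^{-1}, whose entries grow
# like 2^t; the adjoint iteration instead approaches the bounded input.
```

Running the filter through the adjoint (transpose) flips the unstable zero inside the unit circle, which is why the iteration converges toward the bounded input that stable inversion defines.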

