Similar documents (20 results)
1.
In this article, an optimal bipartite consensus control (OBCC) scheme is proposed for heterogeneous multiagent systems (MASs) with input delay via a reinforcement learning (RL) algorithm. A directed signed graph is established to construct MASs with competitive and cooperative relationships, and a model reduction method is developed to tackle the input delay problem. Then, based on the Hamilton–Jacobi–Bellman (HJB) equation, a policy iteration method is utilized to design the bipartite consensus controller, which consists of a value function and an optimal controller. Further, a distributed event-triggered function is proposed to increase control efficiency, which requires information only from the agent itself and its neighboring agents. Based on the input-to-state stability (ISS) function and a Lyapunov function, sufficient conditions for the stability of the MASs are derived. In addition, an RL algorithm is employed to solve the event-triggered OBCC problem in MASs, where critic neural networks (NNs) and actor NNs estimate the value function and the control policy, respectively. Finally, simulation results are given to validate the feasibility and efficiency of the proposed algorithm.
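As a small illustration of the signed-graph setting above, the sketch below computes local bipartite consensus errors on a directed signed graph. The error definition and the example adjacency matrix are generic assumptions for illustration, not taken from the cited paper.

```python
import numpy as np

def bipartite_consensus_errors(A, x):
    """Local bipartite consensus errors on a directed signed graph.

    A : (N, N) signed adjacency matrix, A[i, j] != 0 if agent i receives
        information from agent j (positive = cooperative, negative = competitive).
    x : (N,) agent states.
    Returns e with e[i] = sum_j |A[i, j]| * (x[i] - sign(A[i, j]) * x[j]),
    which vanishes when the two sub-groups agree on values of opposite sign.
    """
    N = len(x)
    e = np.zeros(N)
    for i in range(N):
        for j in range(N):
            if A[i, j] != 0.0:
                e[i] += abs(A[i, j]) * (x[i] - np.sign(A[i, j]) * x[j])
    return e

# Two cooperative pairs {0, 1} and {2, 3} competing with each other:
A = np.array([[0.0,  1.0, -1.0,  0.0],
              [1.0,  0.0,  0.0, -1.0],
              [-1.0, 0.0,  0.0,  1.0],
              [0.0, -1.0,  1.0,  0.0]])
x = np.array([1.0, 1.0, -1.0, -1.0])     # bipartite consensus reached
print(bipartite_consensus_errors(A, x))  # -> all zeros
```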

2.
In this paper, we introduce an online algorithm that uses integral reinforcement knowledge for learning the continuous-time optimal control solution for nonlinear systems with infinite horizon costs and partial knowledge of the system dynamics. This algorithm is a data-based approach to the solution of the Hamilton–Jacobi–Bellman equation, and it does not require explicit knowledge of the system's drift dynamics. A novel adaptive control algorithm is given that is based on policy iteration and implemented using an actor/critic structure having two adaptive approximator structures. Both actor and critic approximation networks are adapted simultaneously. A persistence of excitation condition is required to guarantee convergence of the critic to the actual optimal value function. Novel adaptive control tuning algorithms are given for both critic and actor networks, with extra terms in the actor tuning law being required to guarantee closed-loop dynamical stability. The approximate convergence to the optimal controller is proven, and stability of the system is also guaranteed. Simulation examples support the theoretical result. Copyright © 2013 John Wiley & Sons, Ltd.
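The key relation behind integral reinforcement learning is the integral form of the Bellman equation, shown below in a standard form assuming control-affine dynamics and a running cost Q(x) + uᵀRu over a reinforcement interval T; this is a textbook statement of the idea, not a quotation from the paper.

$$
V\bigl(x(t)\bigr)=\int_{t}^{t+T}\Bigl[Q\bigl(x(\tau)\bigr)+u^{\top}(\tau)\,R\,u(\tau)\Bigr]\,d\tau+V\bigl(x(t+T)\bigr)
$$

Because the critic can enforce this identity using only measured state and input trajectories over [t, t+T], the drift dynamics f(x) never has to be evaluated explicitly.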

3.
In this study, a finite-time online optimal controller was designed for a nonlinear wheeled mobile robotic system (WMRS) with inequality constraints, based on reinforcement learning (RL) neural networks. In addition, an extended cost function, obtained by introducing a penalty function into the original long-time cost function, was proposed to deal with the optimal control problem of the system with inequality constraints. A novel Hamilton–Jacobi–Bellman (HJB) equation containing the constraint conditions was defined to determine the optimal control input. Furthermore, two neural networks (NNs), a critic and an actor NN, were established to approximate the extended cost function and the optimal control input, respectively. The adaptation laws of the critic and actor NNs were obtained with the gradient descent method. Semi-global practical finite-time stability (SGPFS) was proved using Lyapunov stability theory. The tracking error converges to a small region near zero within the constraints in finite time. Finally, the effectiveness of the proposed optimal controller was verified by a simulation based on a practical wheeled mobile robot model.
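One common way to fold inequality constraints into the cost, as the abstract describes, is to add a penalty term to the running cost. The sketch below shows a quadratic exterior-penalty version; the penalty weight rho and the example constraint are illustrative assumptions rather than the paper's exact penalty function.

```python
import numpy as np

def extended_running_cost(x, u, Q, R, constraints, rho=10.0):
    """Running cost augmented with a penalty for inequality constraints.

    constraints : list of callables g_k with the convention g_k(x) <= 0;
    rho         : penalty weight (hypothetical choice).
    The quadratic term is the usual x'Qx + u'Ru; the penalty only activates
    when a constraint is violated, which mimics the extended cost used to
    fold inequality constraints into the HJB formulation.
    """
    base = float(x @ Q @ x + u @ R @ u)
    penalty = sum(rho * max(0.0, g(x)) ** 2 for g in constraints)
    return base + penalty

# Example: keep the first state inside |x1| <= 1
g1 = lambda x: abs(x[0]) - 1.0
Q = np.eye(2); R = np.eye(1)
print(extended_running_cost(np.array([1.5, 0.0]), np.array([0.2]), Q, R, [g1]))
```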

4.
The two-player zero-sum (ZS) game problem provides the solution to the bounded L2-gain problem and so is important for robust control. However, its solution depends on solving a design Hamilton–Jacobi–Isaacs (HJI) equation, which is generally intractable for nonlinear systems. In this paper, we present an online adaptive learning algorithm based on policy iteration to solve the continuous-time two-player ZS game with infinite horizon cost for nonlinear systems with known dynamics. That is, the algorithm learns online in real time an approximate local solution to the game HJI equation. This method finds, in real time, suitable approximations of the optimal value and the saddle point feedback control policy and disturbance policy, while also guaranteeing closed-loop stability. The adaptive algorithm is implemented as an actor/critic/disturbance structure that involves simultaneous continuous-time adaptation of critic, actor, and disturbance neural networks. We call this online gaming algorithm 'synchronous' ZS game policy iteration. A persistence of excitation condition is shown to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for critic, actor, and disturbance networks. The convergence to the optimal saddle point solution is proven, and stability of the system is also guaranteed. Simulation examples show the effectiveness of the new algorithm in solving the HJI equation online for a linear system and a complex nonlinear system. Copyright © 2011 John Wiley & Sons, Ltd.
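For reference, the HJI equation mentioned above takes the following standard form for control-affine dynamics ẋ = f(x) + g(x)u + k(x)d with running cost Q(x) + uᵀRu − γ²dᵀd; the notation is a common textbook convention assumed here, not copied from the paper.

$$
0=Q(x)+\nabla V^{\top} f(x)-\tfrac{1}{4}\nabla V^{\top} g(x)R^{-1}g^{\top}(x)\nabla V+\tfrac{1}{4\gamma^{2}}\nabla V^{\top} k(x)k^{\top}(x)\nabla V
$$

with saddle-point policies u* = −½R⁻¹gᵀ∇V and d* = (1/(2γ²))kᵀ∇V; the critic, actor, and disturbance networks approximate V, u*, and d*, respectively.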

5.
This paper develops a simplified optimized tracking control using a reinforcement learning (RL) strategy for a class of nonlinear systems. Since a nonlinear control gain function is considered in the system modeling, it is challenging to extend the existing RL-based optimal methods to the tracking control. The main reasons are that these methods' algorithms are very complex and they require some strict conditions to be met. Unlike these existing RL-based optimal methods, which derive the actor and critic training laws from the square of the Bellman residual error, a complex function consisting of multiple nonlinear terms, the proposed optimized scheme derives the two RL training laws from the negative gradient of a simple positive function, so that the algorithm can be significantly simplified. Moreover, the actor and critic in RL are constructed by employing neural networks (NNs) to approximate the solution of the Hamilton–Jacobi–Bellman (HJB) equation. Finally, the feasibility of the proposed method is demonstrated in accordance with both Lyapunov stability theory and a simulation example.

6.
In this paper we discuss an online algorithm based on policy iteration for learning the continuous-time (CT) optimal control solution with infinite horizon cost for nonlinear systems with known dynamics. That is, the algorithm learns online in real time the solution to the optimal control design HJ equation. This method finds, in real time, suitable approximations of both the optimal cost and the optimal control policy, while also guaranteeing closed-loop stability. We present an online adaptive algorithm implemented as an actor/critic structure which involves simultaneous continuous-time adaptation of both actor and critic neural networks. We call this 'synchronous' policy iteration. A persistence of excitation condition is shown to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for both critic and actor networks, with extra nonstandard terms in the actor tuning law being required to guarantee closed-loop dynamical stability. The convergence to the optimal controller is proven, and the stability of the system is also guaranteed. Simulation examples show the effectiveness of the new algorithm.

7.
This paper investigates the cluster synchronisation problem for a multi-agent non-zero-sum differential game with partially unknown dynamics. The objective is to design a controller to achieve cluster synchronisation and to ensure local optimality of the performance index. With the definition of the cluster tracking error and the concept of Nash equilibrium in the multi-agent system (MAS), this problem can be transformed into the problem of solving the coupled Hamilton–Jacobi–Bellman (HJB) equations. To solve these HJB equations, a data-based policy iteration algorithm is proposed with an actor–critic neural network (NN) structure for the case of an MAS with partially unknown dynamics; the weights of the NNs are updated with system data rather than complete knowledge of the system dynamics, and the residual errors are minimised using the least-squares approach. A simulation example is provided to verify the effectiveness of the proposed approach.
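The least-squares residual-minimisation step mentioned above typically reduces to fitting the critic's last-layer weights from data. The sketch below shows that generic fit; the feature matrix, targets, and regularisation are all illustrative assumptions, not the paper's specific construction.

```python
import numpy as np

def fit_critic_weights(Phi, targets, reg=1e-6):
    """Least-squares fit of critic NN output weights from collected data.

    Phi     : (M, p) matrix whose rows are critic basis/feature vectors
              evaluated along measured trajectories (assumption: the critic
              is linear in its last-layer weights, V_hat(x) = w' phi(x)).
    targets : (M,) Bellman-equation targets built from the same data.
    Solves min_w ||Phi w - targets||^2 with light Tikhonov regularisation,
    i.e. the residual-minimisation step used in data-based policy evaluation.
    """
    p = Phi.shape[1]
    w = np.linalg.solve(Phi.T @ Phi + reg * np.eye(p), Phi.T @ targets)
    return w

# Toy usage with random data, just to show the shapes involved
rng = np.random.default_rng(0)
Phi = rng.normal(size=(200, 6))
targets = Phi @ np.array([1.0, -0.5, 0.2, 0.0, 0.3, -0.1]) + 0.01 * rng.normal(size=200)
print(fit_critic_weights(Phi, targets).round(2))
```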

8.
In this paper, an adaptive output feedback event-triggered optimal control algorithm is proposed for partially unknown constrained-input continuous-time nonlinear systems. First, a neural network observer is constructed to estimate the unmeasurable states. Next, an event-triggered condition is established, and only when the event-triggered condition is violated will the event be triggered and the state be sampled. Then, an event-triggered-based synchronous integral reinforcement learning (ET-SIRL) control algorithm with a critic–actor neural network (NN) architecture is proposed to solve the event-triggered Hamilton–Jacobi–Bellman equation under the established event-triggered condition. The critic and actor NNs are used to approximate the cost function and the event-triggered optimal control law, respectively. Meanwhile, the event-triggered closed-loop system state and all the neural network weight estimation errors are proved to be uniformly ultimately bounded using Lyapunov stability theory, and there is no Zeno behavior. Finally, two numerical examples are presented to show the effectiveness of the proposed ET-SIRL control algorithm.
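To make the sampling logic concrete, the sketch below implements a generic static event-triggering rule that holds the last sampled state and re-samples when the measurement gap exceeds a state-dependent threshold. The threshold form and its constants are illustrative assumptions, not the condition derived in the cited paper.

```python
import numpy as np

class EventTrigger:
    """Generic event-triggering rule: hold the last sampled state and
    re-sample only when the gap e = x_hat - x grows beyond a state-dependent
    threshold. The threshold form (alpha * ||x||^2 + eps) is an illustrative
    choice, not the specific condition derived in the cited paper.
    """
    def __init__(self, alpha=0.05, eps=1e-3):
        self.alpha, self.eps = alpha, eps
        self.x_hat = None          # last sampled (held) state

    def update(self, x):
        if self.x_hat is None:
            self.x_hat = x.copy()
            return True            # first sample
        gap = np.sum((self.x_hat - x) ** 2)
        if gap >= self.alpha * np.sum(x ** 2) + self.eps:
            self.x_hat = x.copy()  # event: re-sample the state
            return True
        return False               # no event: controller keeps using x_hat

trig = EventTrigger()
for x in [np.array([1.0, 0.0]), np.array([0.98, 0.02]), np.array([0.5, 0.4])]:
    print(trig.update(x))          # -> True, False, True
```

Between events the controller keeps acting on the held state x_hat, which is what saves communication and computation.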

9.
In this paper, we propose an actor-critic neuro-control for a class of continuous-time nonlinear systems under nonlinear abrupt faults, which is combined with an adaptive fault diagnosis observer (AFDO). Together with its estimation laws, an AFDO scheme, which estimates the faults in real time, is designed based on Lyapunov analysis. Then, based on the designed AFDO, a fault-tolerant actor-critic control scheme is proposed where the critic neural network (NN) is used to approximate the value function and the actor NN updates the fault-tolerant policy based on the value function approximated by the critic NN. The weight update laws for the critic NN and actor NN are designed using the gradient descent method. By Lyapunov analysis, we prove the uniform ultimate boundedness (UUB) of all the states, their estimation errors, and the NN weights of the fault-tolerant system under unpredictable faults. Finally, we verify the effectiveness of the proposed method through numerical simulations.

10.
This article proposes three novel time-varying policy iteration algorithms for the finite-horizon optimal control problem of continuous-time affine nonlinear systems. We first propose a model-based time-varying policy iteration algorithm. The method considers time-varying solutions to the Hamilton–Jacobi–Bellman equation for finite-horizon optimal control. Based on this algorithm, value function approximation is applied to the Bellman equation by establishing neural networks with time-varying weights. A novel update law for the time-varying weights is put forward based on the idea of iterative learning control, which obtains optimal solutions more efficiently compared to previous works. Considering that system models may be unknown in real applications, we propose a partially model-free time-varying policy iteration algorithm that applies integral reinforcement learning to acquire the time-varying value function. Moreover, analysis of convergence, stability, and optimality is provided for every algorithm. Finally, simulations for different cases are given to verify the convenience and effectiveness of the proposed algorithms.

11.
This paper proposes a novel finite-time optimal control method based on input–output data for unknown nonlinear systems using an adaptive dynamic programming (ADP) algorithm. In this method, a single-hidden-layer feed-forward network (SLFN) with extreme learning machine (ELM) is used to construct a data-based identifier of the unknown system dynamics. Based on the data-based identifier, the finite-time optimal control method is established by the ADP algorithm. Two other SLFNs with ELM are used in the ADP method to facilitate the implementation of the iterative algorithm, which aim to approximate the performance index function and the optimal control law at each iteration, respectively. A simulation example is provided to demonstrate the effectiveness of the proposed control scheme.
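A rough sketch of the ELM idea used for the identifier: hidden-layer weights are drawn at random and frozen, and only the output weights are solved by least squares from input–output data. The layer size, regularisation, and toy dynamics below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

class ELMIdentifier:
    """Single-hidden-layer feed-forward network trained as an extreme
    learning machine: hidden-layer weights are random and fixed, only the
    output weights are fitted by least squares. Here it maps
    (x_k, u_k) -> x_{k+1}, i.e. a data-based identifier of unknown dynamics.
    """
    def __init__(self, n_in, n_hidden=50, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(n_in, n_hidden))   # random, frozen
        self.b = rng.normal(size=n_hidden)
        self.beta = None                             # output weights (fitted)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, Y, reg=1e-6):
        H = self._hidden(X)
        self.beta = np.linalg.solve(H.T @ H + reg * np.eye(H.shape[1]), H.T @ Y)

    def predict(self, X):
        return self._hidden(X) @ self.beta

# Identify x_{k+1} = 0.9 x_k + 0.1 u_k from input-output data (toy example)
rng = np.random.default_rng(1)
x, u = rng.normal(size=500), rng.normal(size=500)
XU = np.column_stack([x, u])
x_next = 0.9 * x + 0.1 * u
elm = ELMIdentifier(n_in=2)
elm.fit(XU, x_next)
print(np.max(np.abs(elm.predict(XU) - x_next)))  # should print a small fit error
```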

12.
In this paper we present an online adaptive control algorithm based on policy iteration reinforcement learning techniques to solve the continuous-time (CT) multi-player non-zero-sum (NZS) game with infinite horizon for linear and nonlinear systems. NZS games allow for players to have a cooperative team component and an individual selfish component of strategy. The adaptive algorithm learns online the solution of coupled Riccati equations and coupled Hamilton–Jacobi equations for linear and nonlinear systems, respectively. This adaptive control method finds, in real time, approximations of the optimal value and the NZS Nash equilibrium, while also guaranteeing closed-loop stability. The optimal-adaptive algorithm is implemented as a separate actor/critic parametric network approximator structure for every player, and involves simultaneous continuous-time adaptation of the actor/critic networks. A persistence of excitation condition is shown to guarantee convergence of every critic to the actual optimal value function for that player. A detailed mathematical analysis is done for 2-player NZS games. Novel tuning algorithms are given for the actor/critic networks. The convergence to the Nash equilibrium is proven and stability of the system is also guaranteed. This provides optimal adaptive control solutions for both non-zero-sum games and their special case, the zero-sum games. Simulation examples show the effectiveness of the new algorithm.
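For context, the coupled Hamilton–Jacobi equations referred to above take the following standard form for N players with control-affine dynamics ẋ = f(x) + Σⱼ gⱼ(x)uⱼ and cost densities Qᵢ(x) + Σⱼ uⱼᵀRᵢⱼuⱼ; this is the usual textbook notation, assumed here rather than quoted from the paper.

$$
0=Q_i(x)+\nabla V_i^{\top}\Bigl(f(x)+\sum_{j}g_j(x)\,u_j^{*}\Bigr)+\sum_{j}u_j^{*\top}R_{ij}\,u_j^{*},\qquad
u_j^{*}=-\tfrac{1}{2}R_{jj}^{-1}g_j^{\top}(x)\nabla V_j,\quad i=1,\dots,N
$$

Each player's equation depends on every other player's value gradient, which is why the critics must be tuned simultaneously.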

13.
This paper develops an online adaptive critic algorithm based on policy iteration for partially unknown nonlinear optimal control with an infinite horizon cost function. In the proposed method, only a critic network is established, which eliminates the action network, to simplify the architecture. The online least squares support vector machine (LS-SVM) is utilized to approximate the gradient of the associated cost function in the critic network by updating the input-output data. Additionally, a data buffer memory is added to alleviate the computational load. Finally, the feasibility of the online learning algorithm is demonstrated in simulation on two example systems.

14.
In this paper, an online optimal distributed learning algorithm is proposed to solve the leader-synchronization problem of nonlinear multi-agent differential graphical games. Each player approximates its optimal control policy using single-network approximate dynamic programming (ADP), where only one critic neural network (NN) is employed instead of the typical actor–critic structure composed of two NNs. The proposed distributed weight tuning laws for the critic NNs guarantee stability in the sense of uniform ultimate boundedness (UUB) and convergence of the control policies to the Nash equilibrium. In this paper, by introducing novel distributed local operators in the weight tuning laws, there is no longer a requirement for initial stabilizing control policies. Furthermore, the overall closed-loop system stability is guaranteed by Lyapunov stability analysis. Finally, simulation results show the effectiveness of the proposed algorithm.

15.
To address the difficulty that nonlinear continuous-time systems have in tracking time-varying trajectories, this paper first introduces new state variables through a system transformation, converting the optimal tracking problem of the nonlinear system into an optimal control problem for a general nonlinear time-invariant system, and obtains the approximate optimal value function and optimal control policy based on the approximate dynamic programming (ADP) algorithm. To implement the algorithm effectively, a critic network and an actor network are used to estimate the value function and the corresponding control policy, and both are updated online. To eliminate the errors produced in the neural network approximation, a robust term is added in the controller design; it is proved via the Lyapunov stability theorem that the proposed control policy guarantees asymptotic convergence of the system tracking error to zero, and it is also verified that, within a small error bound, the control policy approaches the optimal control policy. Finally, two time-varying trajectory-tracking examples are given to demonstrate the feasibility and effectiveness of the method.

16.
This paper proposes an online adaptive approximate solution for the infinite-horizon optimal tracking control problem of continuous-time nonlinear systems with unknown dynamics. The requirement of complete knowledge of the system dynamics is avoided by employing an adaptive identifier in conjunction with a novel adaptive law, such that the estimated identifier weights converge to a small neighborhood of their ideal values. An adaptive steady-state controller is developed to maintain the desired tracking performance at the steady state, and an adaptive optimal controller is designed to stabilize the tracking error dynamics in an optimal manner. For this purpose, a critic neural network (NN) is utilized to approximate the optimal value function of the Hamilton-Jacobi-Bellman (HJB) equation, which is used in the construction of the optimal controller. The learning of the two NNs, i.e., the identifier NN and the critic NN, is continuous and simultaneous by means of a novel adaptive law design methodology based on the parameter estimation error. Stability of the whole system consisting of the identifier NN, the critic NN and the optimal tracking control is guaranteed using Lyapunov theory; convergence to a near-optimal control law is proved. Simulation results exemplify the effectiveness of the proposed method.

17.
The Hamilton–Jacobi–Bellman (HJB) equation can be solved to obtain optimal closed-loop control policies for general nonlinear systems. As it is seldom possible to solve the HJB equation exactly for nonlinear systems, either analytically or numerically, methods to build approximate solutions through simulation-based learning have been studied under various names such as neurodynamic programming (NDP) and approximate dynamic programming (ADP). The aspect of learning connects these methods to reinforcement learning (RL), which also tries to learn optimal decision policies through trial-and-error learning. This study develops a model-based RL method, which iteratively learns the solution to the HJB and its associated equations. We focus particularly on the control-affine system with a quadratic objective function and the finite-horizon optimal control (FHOC) problem with time-varying reference trajectories. The HJB solutions for such systems involve time-varying value, costate, and policy functions subject to boundary conditions. To represent the time-varying HJB solution in high-dimensional state space in a general and efficient way, deep neural networks (DNNs) are employed. It is shown that the use of DNNs, compared to shallow neural networks (SNNs), can significantly improve the performance of a learned policy in the presence of uncertain initial state and state noise. Examples involving a batch chemical reactor and a one-dimensional diffusion-convection-reaction system are used to demonstrate this and other key aspects of the method.
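A minimal sketch of the time-varying value representation described above: time is fed to the network as an extra input so a single approximator covers the whole horizon. Only the forward pass is shown, in plain NumPy; the layer sizes and random weights are placeholders, and in practice the parameters would be trained with an autodiff framework against the HJB residual and the terminal boundary condition.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random initial parameters for a small fully connected network."""
    return [(rng.normal(scale=0.5, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def value_net(params, t, x):
    """Deep value-function approximator V_theta(t, x): time enters as an
    extra input so one network represents the whole time-varying HJB
    solution over the horizon (layer sizes are illustrative)."""
    h = np.concatenate(([t], x))
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)
    W, b = params[-1]
    return float(h @ W + b)

# 1 time input + 2 state inputs -> two hidden layers -> scalar value
params = init_mlp([3, 32, 32, 1])
x = np.array([0.4, -0.2])
print(value_net(params, 0.0, x), value_net(params, 1.0, x))  # V at t=0 and at t=T
```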

18.
Xu Xin, Shen Dong, Gao Yan-Qing, Wang Kai. Acta Automatica Sinica, 2012, 38(5): 673-687
Learning control of dynamic systems based on Markov decision processes (MDPs) has in recent years become an interdisciplinary research direction involving machine learning, control theory, and operations research; its main goal is data-driven multi-stage optimal control of systems whose models are complex or uncertain. This paper surveys the research frontier of MDP-based learning control theory, algorithms, and applications, focusing on recent progress in reinforcement learning (RL) and approximate dynamic programming (ADP) theory and methods, including temporal-difference learning theory, value function approximation methods for MDPs with continuous state and action spaces, direct policy search and approximate policy iteration, and adaptive critic design algorithms. Finally, applications and development trends in the related research fields are analyzed and discussed.

19.
In this paper, we propose an identifier–critic-based approximate dynamic programming (ADP) structure to solve online the H∞ control problem of nonlinear continuous-time systems without knowing the precise system dynamics, in which the actor neural network (NN) that has been widely used in the standard ADP learning structure is avoided. We first use an identifier NN to approximate the completely unknown nonlinear system dynamics and disturbances. Then, another critic NN is proposed to approximate the solution of the induced optimal equation. The H∞ control pair is obtained by using the proposed identifier–critic ADP structure. A recently developed adaptation algorithm is used to directly estimate the unknown NN weights online and simultaneously, where convergence to the optimal solution can be rigorously guaranteed, and the stability of the closed-loop system is analysed. Thus, this new ADP scheme can improve the computational efficiency of H∞ control implementation. Finally, simulation results confirm the effectiveness of the proposed methods.

20.
Kernel-based least squares policy iteration for reinforcement learning.
In this paper, we present a kernel-based least squares policy iteration (KLSPI) algorithm for reinforcement learning (RL) in large or continuous state spaces, which can be used to realize adaptive feedback control of uncertain dynamic systems. By using KLSPI, near-optimal control policies can be obtained without much a priori knowledge of the dynamic models of control plants. In KLSPI, Mercer kernels are used in the policy evaluation of a policy iteration process, where a new kernel-based least squares temporal-difference algorithm called KLSTD-Q is proposed for efficient policy evaluation. To keep the sparsity and improve the generalization ability of KLSTD-Q solutions, a kernel sparsification procedure based on approximate linear dependency (ALD) is performed. Compared to previous work on approximate RL methods, KLSPI makes two advances that eliminate the main difficulties of existing results. One is the better convergence and (near) optimality guarantee obtained by using the KLSTD-Q algorithm for policy evaluation with high precision. The other is automatic feature selection using the ALD-based kernel sparsification. Therefore, the KLSPI algorithm provides a general RL method with generalization performance and a convergence guarantee for large-scale Markov decision problems (MDPs). Experimental results on a typical RL task for a stochastic chain problem demonstrate that KLSPI can consistently achieve better learning efficiency and policy quality than the previous least squares policy iteration (LSPI) algorithm. Furthermore, the KLSPI method was also evaluated on two nonlinear feedback control problems, including a ship heading control problem and the swing-up control of a double-link underactuated pendulum called the acrobot. Simulation results illustrate that the proposed method can optimize controller performance using little a priori information about uncertain dynamic systems. It is also demonstrated that KLSPI can be applied to online learning control by incorporating an initial controller to ensure online performance.
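The ALD-based sparsification mentioned above admits a compact implementation: a sample is added to the kernel dictionary only if its feature-space image cannot be reproduced, within a tolerance, by linear combinations of the samples already stored. The sketch below is a generic version of that test; the RBF kernel, the tolerance nu, and the jitter are illustrative assumptions.

```python
import numpy as np

def rbf(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def ald_dictionary(samples, nu=0.1, kernel=rbf):
    """Approximate linear dependency (ALD) sparsification: keep a sample only
    if its feature-space image cannot be approximated, within tolerance nu,
    by a linear combination of the images of samples already in the dictionary.
    Returns the retained dictionary points (nu and the RBF width are
    illustrative choices).
    """
    dictionary = [samples[0]]
    for x in samples[1:]:
        K = np.array([[kernel(a, b) for b in dictionary] for a in dictionary])
        k = np.array([kernel(a, x) for a in dictionary])
        c = np.linalg.solve(K + 1e-9 * np.eye(len(dictionary)), k)
        delta = kernel(x, x) - k @ c          # ALD test statistic
        if delta > nu:
            dictionary.append(x)              # not representable: add to dictionary
    return dictionary

rng = np.random.default_rng(0)
samples = rng.uniform(-1, 1, size=(300, 2))
print(len(ald_dictionary(samples)))           # far fewer points than 300
```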
