Similar documents: 20 results found.
1.
In this paper, we introduce an online algorithm that uses integral reinforcement knowledge for learning the continuous-time optimal control solution for nonlinear systems with infinite horizon costs and partial knowledge of the system dynamics. This algorithm is a data-based approach to the solution of the Hamilton–Jacobi–Bellman equation, and it does not require explicit knowledge of the system's drift dynamics. A novel adaptive control algorithm is given that is based on policy iteration and implemented using an actor/critic structure having two adaptive approximator structures. Both actor and critic approximation networks are adapted simultaneously. A persistence of excitation condition is required to guarantee convergence of the critic to the actual optimal value function. Novel adaptive control tuning algorithms are given for both critic and actor networks, with extra terms in the actor tuning law being required to guarantee closed-loop dynamical stability. The approximate convergence to the optimal controller is proven, and stability of the system is also guaranteed. Simulation examples support the theoretical result. Copyright © 2013 John Wiley & Sons, Ltd.
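As an illustration of the integral-reinforcement idea above, the following is a minimal sketch (not the paper's implementation) of a batch least-squares critic update implied by the interval Bellman equation V(x(t)) = ∫_t^{t+T} r dτ + V(x(t+T)) with a linear-in-parameters critic V(x) ≈ w⊤φ(x); the basis φ and the sampled interval data are illustrative assumptions.

```python
# Hedged sketch: least-squares critic weights from integral-reinforcement data.
import numpy as np

def critic_ls_update(phi, interval_pairs, integral_rewards):
    """phi: feature map R^n -> R^m (assumed basis); interval_pairs: list of
    (x_t, x_{t+T}) state pairs; integral_rewards: reward integrated over [t, t+T].
    Solves  w^T (phi(x_t) - phi(x_{t+T})) = integral reward  in the least-squares sense."""
    A = np.array([phi(x0) - phi(x1) for x0, x1 in interval_pairs])
    b = np.asarray(integral_rewards, dtype=float)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w
```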

2.
In this paper we present an online adaptive control algorithm based on policy iteration reinforcement learning techniques to solve the continuous-time (CT) multi-player non-zero-sum (NZS) game with infinite horizon for linear and nonlinear systems. NZS games allow players to have a cooperative team component and an individual selfish component of strategy. The adaptive algorithm learns online the solution of coupled Riccati equations and coupled Hamilton–Jacobi equations for linear and nonlinear systems, respectively. This adaptive control method finds in real time approximations of the optimal value and the NZS Nash equilibrium, while also guaranteeing closed-loop stability. The optimal-adaptive algorithm is implemented as a separate actor/critic parametric network approximator structure for every player and involves simultaneous continuous-time adaptation of the actor/critic networks. A persistence of excitation condition is shown to guarantee convergence of every critic to the actual optimal value function for that player. A detailed mathematical analysis is given for 2-player NZS games. Novel tuning algorithms are given for the actor/critic networks. The convergence to the Nash equilibrium is proven, and stability of the system is also guaranteed. This provides optimal adaptive control solutions for both non-zero-sum games and their special case, zero-sum games. Simulation examples show the effectiveness of the new algorithm.
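For the two-player linear-quadratic case mentioned above, one common form of the coupled algebraic Riccati equations is sketched below; the notation ẋ = Ax + B₁u₁ + B₂u₂ with player-i weights Q_i, R_{i1}, R_{i2} is an assumption for illustration, not necessarily the paper's exact formulation.

```latex
0 = (A - B_1K_1 - B_2K_2)^{\top}P_i + P_i\,(A - B_1K_1 - B_2K_2)
    + Q_i + K_1^{\top}R_{i1}K_1 + K_2^{\top}R_{i2}K_2,
\qquad K_i = R_{ii}^{-1}B_i^{\top}P_i, \quad i = 1,2 .
```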

3.
In this paper we discuss an online algorithm based on policy iteration for learning the continuous-time (CT) optimal control solution with infinite horizon cost for nonlinear systems with known dynamics. That is, the algorithm learns online in real-time the solution to the optimal control design HJ equation. This method finds in real-time suitable approximations of both the optimal cost and the optimal control policy, while also guaranteeing closed-loop stability. We present an online adaptive algorithm implemented as an actor/critic structure which involves simultaneous continuous-time adaptation of both actor and critic neural networks. We call this ‘synchronous’ policy iteration. A persistence of excitation condition is shown to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for both critic and actor networks, with extra nonstandard terms in the actor tuning law being required to guarantee closed-loop dynamical stability. The convergence to the optimal controller is proven, and the stability of the system is also guaranteed. Simulation examples show the effectiveness of the new algorithm.
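A deliberately simplified sketch of what "synchronous" actor/critic adaptation can look like in code: a normalized-gradient critic update driven by the Bellman (Hamiltonian) residual, with the actor weights pulled toward the critic. The paper's actual actor tuning law contains additional nonstandard stabilizing terms that are omitted here; every name and gain below is an illustrative assumption.

```python
# Simplified sketch only: one Euler step of synchronous critic/actor tuning.
import numpy as np

def critic_actor_step(Wc, Wa, x, xdot, grad_phi, reward, a1=10.0, a2=1.0, dt=1e-3):
    """grad_phi(x): Jacobian of the critic basis phi(x), shape (m, n)."""
    sigma = grad_phi(x) @ xdot                # d/dt phi(x(t)) along the trajectory
    e = sigma @ Wc + reward                   # Bellman (Hamiltonian) residual
    Wc = Wc - dt * a1 * sigma * e / (sigma @ sigma + 1.0) ** 2   # normalized gradient
    Wa = Wa - dt * a2 * (Wa - Wc)             # pull actor weights toward the critic
    return Wc, Wa
```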

4.
A new online iterative algorithm for solving the H∞ control problem of continuous-time Markovian jumping linear systems is developed. For comparison, an available offline iterative algorithm that converges to the solution of the H∞ control problem is first proposed. Based on the offline iterative algorithm and a new online decoupling technique named the subsystems transformation method, a set of linear subsystems, which are implemented in parallel, is obtained. By means of the adaptive dynamic programming technique, the two-player zero-sum game with the coupled game algebraic Riccati equation is then solved online. The convergence of the novel policy iteration algorithm is also established. Finally, simulation results illustrate the effectiveness and applicability of the two methods. Copyright © 2016 John Wiley & Sons, Ltd.

5.
The main bottleneck for the application of H∞ control theory to practical nonlinear systems is the need to solve the Hamilton–Jacobi–Isaacs (HJI) equation. The HJI equation is a nonlinear partial differential equation (PDE) that has proven impossible to solve analytically; even an approximate solution is difficult to obtain. In this paper, we propose a simultaneous policy update algorithm (SPUA), in which the nonlinear HJI equation is solved by iteratively solving a sequence of Lyapunov function equations, which are linear PDEs. By constructing a fixed-point equation, the convergence of the SPUA is established rigorously by proving that it is essentially Newton's iteration method for finding the fixed point. Subsequently, a computationally efficient SPUA (CESPUA) based on Galerkin's method is developed to solve the Lyapunov function equations in each iterative step of the SPUA. The CESPUA is simple to implement because only one iterative loop is involved. Simulation studies on three examples demonstrate that the proposed CESPUA is valid and efficient. Copyright © 2012 John Wiley & Sons, Ltd.
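A hedged sketch of the kind of iteration the abstract describes, written for dynamics ẋ = f(x) + g(x)u + k(x)w and cost q(x) + u⊤Ru − γ²‖w‖² (this notation is an assumption; the paper's exact formulation may differ): both policies are updated from the current value function, and the next value function solves a linear first-order PDE.

```latex
u^{(i)} = -\tfrac{1}{2}R^{-1}g(x)^{\top}\nabla V^{(i)}, \qquad
w^{(i)} = \tfrac{1}{2\gamma^{2}}\,k(x)^{\top}\nabla V^{(i)},
```
```latex
\nabla V^{(i+1)\top}\bigl(f + g\,u^{(i)} + k\,w^{(i)}\bigr)
+ q(x) + u^{(i)\top}R\,u^{(i)} - \gamma^{2}\lVert w^{(i)}\rVert^{2} = 0 .
```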

6.
This paper develops a concurrent learning-based approximate dynamic programming (ADP) algorithm for solving the two-player zero-sum (ZS) game arising in H∞ control of continuous-time (CT) systems with unknown nonlinear dynamics. First, the H∞ control problem is formulated as a ZS game, and then an online algorithm is developed that learns the solution to the Hamilton-Jacobi-Isaacs (HJI) equation without using any knowledge of the system dynamics. This is achieved by using a neural network (NN) identifier to approximate the uncertain system dynamics. The algorithm is implemented on an actor-critic-disturbance NN structure, along with the NN identifier, to approximate the optimal value function and the corresponding Nash solution of the game. All NNs are tuned at the same time. By using the idea of concurrent learning, the need to check the persistence of excitation condition is relaxed to a simpler condition. The stability of the overall system is guaranteed and the convergence to the Nash solution of the game is shown. Simulation results show the effectiveness of the algorithm.

7.
An H∞ suboptimal state feedback controller for constrained input systems is derived using the Hamilton-Jacobi-Isaacs (HJI) equation of a corresponding zero-sum game that uses a special quasi-norm to encode the constraints on the input. The unique saddle point in feedback strategy form is derived. Using policy iterations on both players, the HJI equation is broken into a sequence of differential equations linear in the cost, for which closed-form solutions are easier to obtain. Policy iterations on the disturbance are shown to converge to the available storage function of the associated L2-gain dissipative dynamics. The resulting constrained optimal control feedback strategy has the largest domain of validity within which L2-performance for a given γ is guaranteed.

8.
In this paper, a new online model-free adaptive dynamic programming algorithm is developed to solve the H∞ control problem of the continuous-time linear system with completely unknown system dynamics. Solving the game algebraic Riccati equation, commonly used in H∞ state feedback control design, is often referred to as a two-player differential game where one player tries to minimize the predefined performance index while the other tries to maximize it. Using data generated in real time along the system trajectories, this new method can solve online the game algebraic Riccati equation without requiring full knowledge of the system dynamics. A rigorous proof of convergence of the proposed algorithm is given. Finally, simulation studies on two examples demonstrate the effectiveness of the proposed method.
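For context, a hedged offline sketch of the game algebraic Riccati equation that the data-based method solves online: the model-based simultaneous-policy-update iteration below assumes matrices A, B (control input), D (disturbance input), weights Q, R, an attenuation level γ, and a suitable initial guess P0, all of which are illustrative; the paper instead learns the solution from trajectory data without knowing A, B, D.

```python
# Hedged, model-based sketch: iterate the game ARE
#   A'P + PA + Q - P B R^{-1} B' P + gamma^{-2} P D D' P = 0
# by repeatedly solving Lyapunov equations.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def game_are_iteration(A, B, D, Q, R, gamma, P0, iters=50):
    P = P0
    for _ in range(iters):
        K = np.linalg.solve(R, B.T @ P)          # control-player gain,  u = -K x
        L = (D.T @ P) / gamma**2                 # disturbance-player gain,  w = L x
        Ac = A - B @ K + D @ L                   # closed loop under both policies
        # Solve  Ac' P + P Ac = -(Q + K' R K - gamma^2 L' L)  for the next P.
        P = solve_continuous_lyapunov(Ac.T, -(Q + K.T @ R @ K - gamma**2 * L.T @ L))
    K = np.linalg.solve(R, B.T @ P)
    L = (D.T @ P) / gamma**2
    return P, K, L
```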

9.
Automatica, 2014, 50(12): 3281–3290
This paper addresses the model-free nonlinear optimal control problem based on data by introducing the reinforcement learning (RL) technique. It is known that the nonlinear optimal control problem relies on the solution of the Hamilton–Jacobi–Bellman (HJB) equation, which is a nonlinear partial differential equation that is generally impossible to solve analytically. Even worse, most practical systems are too complicated to admit an accurate mathematical model. To overcome these difficulties, we propose a data-based approximate policy iteration (API) method that uses real system data rather than a system model. First, a model-free policy iteration algorithm is derived and its convergence is proved. The implementation of the algorithm is based on the actor–critic structure, where actor and critic neural networks (NNs) are employed to approximate the control policy and cost function, respectively. To update the weights of the actor and critic NNs, a least-squares approach is developed based on the method of weighted residuals. The data-based API is an off-policy RL method, in which the “exploration” is improved by arbitrarily sampling data on the state and input domain. Finally, we test the data-based API control design method on a simple nonlinear system, and further apply it to a rotational/translational actuator system. The simulation results demonstrate the effectiveness of the proposed method.

10.
Viscosity Solutions of Nonlinear H∞ Control and Approximation Analysis
This paper discusses a viscosity-solution approach to nonlinear H∞ control (the disturbance attenuation problem) when the saddle-point condition holds. The method is based on game theory and the Hamilton-Jacobi-Isaacs (HJI) inequality. The main results are threefold. First, the solution of the HJI inequality is extended to the non-differentiable viscosity-solution case. Second, the stabilization of the controlled system by the H∞ state-feedback controller in this setting is discussed. Finally, the theoretical basis for an approximation method for solving this problem is given, together with a preliminary discussion of the algorithm.

11.
This paper develops an online adaptive critic algorithm based on policy iteration for partially unknown nonlinear optimal control with an infinite horizon cost function. In the proposed method, only a critic network is established, eliminating the action network and simplifying the architecture. An online least-squares support vector machine (LS-SVM) is utilized to approximate the gradient of the associated cost function in the critic network by updating the input-output data. Additionally, a data buffer memory is added to alleviate the computational load. Finally, the feasibility of the online learning algorithm is demonstrated in simulation on two example systems.

12.
The H∞ framework provides an efficient and systematic method for the design of controllers for both linear and nonlinear systems. In nonlinear controller synthesis, however, the limitation of this method is usually associated with the existence of a solution to the Hamilton–Jacobi–Isaacs (HJI) equation. In this paper, an innovative energy-compensation-based approach to the solution of the HJI equations is presented and compared with the existing methods relying on Taylor series expansion. This new approach provides an efficient methodology that ensures the existence of a solution to the HJI equation. A numerical application to spacecraft attitude control is presented to validate the developments. Copyright © 2015 John Wiley & Sons, Ltd.

13.
In this paper, the H∞ tracking control of linear discrete-time systems is studied via reinforcement learning. By defining an improved value function, the tracking game algebraic Riccati equation with a discount factor is obtained, which is solved by iterative learning algorithms. In particular, Q-learning based on value iteration is presented for H∞ tracking control, which requires neither the system model information nor an initial admissible control policy. In addition, to improve the practicality of the algorithm, a convergence analysis of the proposed algorithm with a discount factor is given. Finally, the feasibility of the proposed algorithms is verified by simulation examples.
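As a small hedged illustration of the Q-learning setting above: once a quadratic Q-function Q(x,u,w) = [x;u;w]⊤H[x;u;w] has been learned from data, the control and disturbance policies follow from the saddle-point stationarity conditions. The block sizes n, m, q and the partitioning below are assumptions for illustration (for the tracking problem, x would be the augmented state/reference vector).

```python
# Hedged sketch: extract feedback gains from a learned quadratic Q-matrix H.
import numpy as np

def policies_from_H(H, n, m, q):
    """H is (n+m+q)x(n+m+q), symmetric, for z = [x; u; w]."""
    Hux, Huu, Huw = H[n:n+m, :n], H[n:n+m, n:n+m], H[n:n+m, n+m:]
    Hwx, Hwu, Hww = H[n+m:, :n],  H[n+m:, n:n+m],  H[n+m:, n+m:]
    # dQ/du = 0 and dQ/dw = 0 give a linear system for the saddle-point gains.
    lhs = np.block([[Huu, Huw], [Hwu, Hww]])
    K = np.linalg.solve(lhs, -np.vstack([Hux, Hwx]))
    Ku, Kw = K[:m], K[m:]                    # u = Ku @ x,  w = Kw @ x
    return Ku, Kw
```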

14.
In this paper, we propose an identifier–critic-based approximate dynamic programming (ADP) structure to solve online the H∞ control problem of nonlinear continuous-time systems without knowing the precise system dynamics, in which the actor neural network (NN) widely used in the standard ADP learning structure is avoided. We first use an identifier NN to approximate the completely unknown nonlinear system dynamics and disturbances. Then, another critic NN is proposed to approximate the solution of the induced optimal equation. The H∞ control pair is obtained by using the proposed identifier–critic ADP structure. A recently developed adaptation algorithm is used to directly estimate the unknown NN weights online and simultaneously, where convergence to the optimal solution can be rigorously guaranteed and the stability of the closed-loop system is analysed. Thus, this new ADP scheme can improve the computational efficiency of H∞ control implementation. Finally, simulation results confirm the effectiveness of the proposed methods.

15.
In this paper, we develop a novel event-triggered robust control strategy for continuous-time nonlinear systems with unmatched uncertainties. First, we establish a relationship showing that the event-triggered robust control can be obtained by solving an event-triggered nonlinear optimal control problem for an auxiliary system. Then, within the framework of reinforcement learning, we propose an adaptive critic approach to solve the event-triggered nonlinear optimal control problem. Unlike the typical actor-critic dual approximators used in reinforcement learning, we employ a single critic approximator to derive the solution of the event-triggered Hamilton-Jacobi-Bellman equation arising in the nonlinear optimal control problem. The critic approximator is updated via the gradient descent method, and the persistence of excitation condition is necessary. Meanwhile, under a newly proposed event-triggering condition, we prove that the developed critic approximator update rule guarantees all signals in the auxiliary closed-loop system to be uniformly ultimately bounded. Moreover, we demonstrate that the obtained event-triggered optimal control ensures that the original system is stable in the sense of uniform ultimate boundedness. Finally, an F-16 aircraft plant and a nonlinear system are provided to validate the present event-triggered robust control scheme.
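For illustration only, a minimal sketch of a typical relative event-triggering test of the kind referred to above; the paper's actual condition is derived from its stability analysis, and the threshold form and constants below are assumptions.

```python
# Hedged sketch: trigger an update when the event error grows too large
# relative to the current state.
import numpy as np

def should_trigger(x, x_event, alpha=0.5, eps=1e-3):
    """x_event is the state sampled at the last event (held by the controller)."""
    gap = np.linalg.norm(x - x_event) ** 2
    return gap > alpha * np.linalg.norm(x) ** 2 + eps
```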

16.
Cai Yuliang, Zhang Huaguang, Zhang Kun, Liu Chong. Neural Computing & Applications, 2020, 32(13): 8763–8781

In this paper, a novel online iterative scheme based on fuzzy adaptive dynamic programming is proposed for distributed optimal leader-following consensus of heterogeneous nonlinear multi-agent systems under a directed communication graph. This scheme combines game theory and adaptive dynamic programming with the generalized fuzzy hyperbolic model (GFHM). First, based on a precompensation technique, an appropriate model transformation is proposed to convert the error system into an augmented error system, and a suitable performance index function is defined for this system. Second, on the basis of the Hamilton–Jacobi–Bellman (HJB) equation, the optimal consensus control is designed and a novel policy iteration (PI) algorithm is put forward to learn the solutions of the HJB equation online. Here, the proposed PI algorithm is implemented by means of GFHMs. Compared with the dual-network model comprising a critic network and an action network, the proposed scheme requires only a critic network. Third, the augmented consensus error of each agent and the weight estimation error of each GFHM are proved to be uniformly ultimately bounded, and the stability of the method is verified. Finally, numerical examples and application examples are conducted to demonstrate the effectiveness of the theoretical results.


17.
In this article, a novel off-policy cooperative game Q-learning algorithm is proposed for achieving optimal tracking control of linear discrete-time multiplayer systems subject to exogenous dynamic disturbances. The key strategy, for the first time, is to integrate reinforcement learning and cooperative games with output regulation under the discrete-time sampling framework to achieve data-driven optimal tracking control and disturbance rejection. Without knowledge of the state and input matrices of the multiplayer system, or of the dynamics of the exogenous disturbance and the command generator, the coordination equilibrium solution and the steady-state control laws are learned from data by a novel off-policy Q-learning approach, such that the multiplayer system can tolerate the disturbance and follow the reference signal in an optimal manner. Moreover, rigorous theoretical proofs of the unbiasedness of the coordination equilibrium solution and the convergence of the proposed algorithm are presented. Simulation results are given to show the efficacy of the developed approach.

18.
A sufficient condition for a general nonlinear stochastic system to have L2–L∞ gain less than or equal to a prescribed positive number is established in terms of a certain Hamilton–Jacobi inequality (HJI). Based on this criterion, the existence of an L2–L∞ filter is given by a second-order nonlinear HJI, and the filter matrices can be obtained by solving such an HJI. Copyright © 2011 John Wiley and Sons Asia Pte Ltd and Chinese Automatic Control Society

19.
We propose a novel event-triggered optimal tracking control algorithm for nonlinear systems with an infinite horizon discounted cost. The problem is formulated by appropriately augmenting the system and the reference dynamics and then using ideas from reinforcement learning to provide a solution. Namely, a critic network is used to estimate the optimal cost while an actor network is used to approximate the optimal event-triggered controller. Because the actor network updates only when an event occurs, we use a zero-order hold along with appropriate tuning laws to account for this behavior. Because the dynamics evolve in both continuous and discrete time, we write the closed-loop system as an impulsive model and prove asymptotic stability of the equilibrium point and exclusion of Zeno behavior. Simulation results for a helicopter, a one-link rigid robot in a gravitational field, and a controlled Van der Pol oscillator are presented to show the efficacy of the proposed approach. Copyright © 2016 John Wiley & Sons, Ltd.
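A minimal hedged sketch of the zero-order-hold pattern described above: the state flows continuously while the actor's control input is held constant between events. The dynamics f, policy pi, and trigger test are placeholders, not the paper's specific forms.

```python
# Hedged sketch: event-triggered simulation loop with a zero-order-hold input.
import numpy as np

def simulate_event_triggered(f, pi, trigger, x0, dt=1e-3, steps=10_000):
    x = np.asarray(x0, dtype=float)
    x_event = x.copy()
    u = pi(x_event)                          # control computed at the last event
    for _ in range(steps):
        if trigger(x, x_event):              # event: resample the state, update u
            x_event = x.copy()
            u = pi(x_event)
        x = x + dt * f(x, u)                 # Euler flow step with the held input
    return x
```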

20.
This paper proposes an intermittent model-free learning algorithm for linear time-invariant systems, in which the control policy and transmission decisions are co-designed simultaneously while also being subjected to worst-case disturbances. The control policy is designed by introducing an internal dynamical system to further reduce the transmission rate and provide bandwidth flexibility in cyber-physical systems. Moreover, a Q-learning algorithm with two actors and a single critic structure is developed to learn the optimal parameters of a Q-function. It is shown by using an impulsive system approach that the closed-loop system has an asymptotically stable equilibrium and that no Zeno behavior occurs. Furthermore, a qualitative performance analysis of the model-free dynamic intermittent framework is given, showing the degree of suboptimality with respect to the optimal continuously updated controller. Finally, a numerical simulation of an unknown system is carried out to highlight the efficacy of the proposed framework.
