Similar Articles
20 similar articles found.
1.
This paper mathematically analyzes the integral generalized policy iteration (I-GPI) algorithms applied to a class of continuous-time linear quadratic regulation (LQR) problems with the unknown system matrix A. GPI is the general idea of interacting policy evaluation and policy improvement steps of policy iteration (PI) for computing the optimal policy. We first introduce the update horizon ε, and then show that (i) all of the I-GPI methods with the same ε can be considered equivalent and that (ii) the value function approximated in the policy evaluation step monotonically converges to the exact one as ε → ∞. This reveals the relation between the computational complexity and the update (or time) horizon of I-GPI, as well as between I-PI and I-GPI in the limit ε → ∞. We also provide and discuss two modes of convergence of I-GPI: in one mode I-GPI behaves like PI, and in the other it performs like value iteration for discrete-time LQR and infinitesimal GPI (ε → 0). From these results, a new classification of integral reinforcement learning is formed with respect to ε. Two matrix inequality conditions for stability, the region of local monotone convergence, and data-driven (adaptive) implementation methods are also provided with detailed discussion. Numerical simulations are carried out for verification and further investigation.
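As a point of reference for the limit described above, the sketch below runs the exact model-based policy iteration (Kleinman's algorithm) whose fixed point I-GPI approximates from trajectory data; the plant matrices are hypothetical, and the Lyapunov-equation solve stands in for the paper's data-driven integral policy-evaluation step.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical second-order plant; the paper treats A as unknown and learns
# from data, whereas this model-based sketch uses A directly.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])   # Hurwitz, so K = 0 is admissible
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

K = np.zeros((1, 2))                        # initial stabilising gain
for i in range(50):
    Ac = A - B @ K
    # Policy evaluation: solve Ac'P + P Ac + Q + K'RK = 0 (Lyapunov equation)
    P = solve_continuous_lyapunov(Ac.T, -(Q + K.T @ R @ K))
    K_next = np.linalg.solve(R, B.T @ P)    # policy improvement
    if np.linalg.norm(K_next - K) < 1e-10:
        break
    K = K_next
print(P)   # converges to the stabilising solution of the Riccati equation
```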

2.
Several researchers have recently investigated the connection between reinforcement learning and classification. We are motivated by proposals of approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervised learning problem. This paper proposes variants of an improved policy iteration scheme which addresses the core sampling problem of evaluating a policy through simulation as a multi-armed bandit problem. The resulting algorithm offers performance comparable to that of the previous algorithm, achieved, however, with significantly less computational effort. An order-of-magnitude improvement is demonstrated experimentally in two standard reinforcement learning domains: inverted pendulum and mountain car.
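A minimal sketch of the bandit view of rollout-based policy evaluation (not the paper's exact algorithm): each candidate action at a sampled state is an arm, and successive elimination concentrates simulation budget on arms that are still plausibly optimal. The `rollout` simulator is an assumed user-supplied function returning returns scaled to [0, 1].

```python
import numpy as np

def best_action(rollout, actions, state, budget=200, delta=0.1):
    """Successive-elimination sketch: spend rollouts only on actions whose
    confidence interval still overlaps that of the empirical best arm."""
    alive = list(actions)
    total = {a: 0.0 for a in actions}
    n = {a: 0 for a in actions}
    while len(alive) > 1 and sum(n.values()) < budget:
        for a in alive:                      # one more pull per surviving arm
            total[a] += rollout(state, a)
            n[a] += 1
        mean = {a: total[a] / n[a] for a in alive}
        rad = {a: np.sqrt(np.log(2 * len(actions) / delta) / (2 * n[a]))
               for a in alive}
        best = max(alive, key=mean.get)      # eliminate clearly inferior arms
        alive = [a for a in alive if mean[a] + rad[a] >= mean[best] - rad[best]]
    return max(alive, key=lambda a: total[a] / max(n[a], 1))
```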

3.
In this article, a novel iteration algorithm named two-stage approximate dynamic programming (TSADP) is proposed to solve the nonlinear switched optimal control problem. At each iteration of TSADP, a multivariate optimal control problem is transformed into a certain number of univariate optimal control problems. It is shown that the value function at each iteration can be characterised pointwise by a set of smooth functions recursively obtained from TSADP, and the associated control policy, including the continuous control and the switching control law, is explicitly provided in state-feedback form. Moreover, the convergence and optimality of TSADP are strictly proven. To implement the algorithm efficiently, neural networks (critic and action networks) are utilised to approximate the value function and the continuous control law, respectively. Thus, the value function is expressed pointwise by the weights of the critic networks. Besides, redundant weights are ruled out at each iteration to curb the exponentially growing computational burden. Finally, a simulation example is provided to demonstrate the effectiveness of the approach.
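To illustrate a value function that is a pointwise minimum of smooth functions, the following hedged sketch uses a discrete-time switched linear-quadratic stand-in (the paper's setting is more general): each backward step maps every stored quadratic through every mode, and the value estimate is the minimum over the stored set. All matrices are hypothetical, and the paper's pruning of redundant weights is omitted.

```python
import numpy as np

# Two hypothetical switched linear modes with quadratic cost.
A = [np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([[0.9, 0.0], [0.2, 0.8]])]
B = [np.array([[0.0], [0.1]]), np.array([[0.1], [0.0]])]
Q, R = np.eye(2), np.eye(1)

P_set = [Q]                                  # V_0(x) = x'Qx
for _ in range(4):
    new_set = []
    for P in P_set:
        for Ai, Bi in zip(A, B):             # Riccati-style backup per mode
            S = R + Bi.T @ P @ Bi
            Pn = Q + Ai.T @ P @ Ai - Ai.T @ P @ Bi @ np.linalg.solve(S, Bi.T @ P @ Ai)
            new_set.append(Pn)
    P_set = new_set                          # redundant quadratics not pruned here

def V(x):
    """Iterated value estimate: pointwise minimum over stored quadratics."""
    return min(float(x @ P @ x) for P in P_set)

print(V(np.array([1.0, -1.0])))
```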

4.
Adaptive Optimal Control (AOC) by reinforcement synthesis is proposed to facilitate the application of optimal control theory in feedback control. Reinforcement synthesis uses the critic–actor architecture of reinforcement learning to carry out sequential optimization. Optimality conditions for AOC are formulated using the discrete minimum principle. A proof of the convergence conditions for the reinforcement synthesis algorithm is presented. As the final time extends to infinity, the reinforcement synthesis algorithm becomes equivalent to the Dual Heuristic dynamic Programming (DHP) algorithm, a version of approximate dynamic programming. Thus, formulating DHP within the AOC approach yields rigorous proofs of optimality and convergence. The efficacy of AOC by reinforcement synthesis is demonstrated by solving a linear quadratic regulator problem.

5.
The singular optimal control problem for asymptotic stabilisation has been extensively studied in the literature. In this paper, the optimal singular control problem is extended to address a weaker version of closed-loop stability, namely semistability, which is of paramount importance for consensus control of network dynamical systems. Three approaches are presented to address the nonlinear semistable singular control problem. First, a singular perturbation method is presented to construct a state-feedback singular controller that guarantees closed-loop semistability for nonlinear systems. In this approach, we show that for a non-negative cost-to-go function the minimum cost of a nonlinear semistabilising singular controller is lower than the minimum cost of a singular controller that guarantees asymptotic stability of the closed-loop system. In the second approach, we solve the nonlinear semistable singular control problem by using the cost-to-go function to cancel the singularities in the corresponding Hamilton–Jacobi–Bellman equation. For this case, we show that the minimum value of the singular performance measure is zero. Finally, we provide a framework based on the concepts of state-feedback linearisation and feedback equivalence to solve the singular control problem for semistabilisation of nonlinear dynamical systems. For this approach, we also show that the minimum value of the singular performance measure is zero. Three numerical examples are presented to demonstrate the efficacy of the proposed singular semistabilisation frameworks.

6.
An online adaptive optimal control is proposed for continuous-time nonlinear systems with completely unknown dynamics, achieved by developing a novel identifier–critic-based approximate dynamic programming algorithm with a dual neural network (NN) approximation structure. First, an adaptive NN identifier is designed to obviate the requirement of complete knowledge of the system dynamics, and a critic NN is employed to approximate the optimal value function. Then, the optimal control law is computed based on the information from the identifier NN and the critic NN, so that no actor NN is needed. In particular, a novel adaptive law design method based on the parameter estimation error is proposed to update the weights of the identifier NN and the critic NN simultaneously online, such that they converge to small neighbourhoods around their ideal values. Closed-loop stability and convergence to a small vicinity of the optimal solution are proved by means of Lyapunov theory. The proposed adaptation algorithm is further improved to achieve finite-time convergence of the NN weights. Finally, simulation results are provided to exemplify the efficacy of the proposed methods.

7.
In this paper, a novel iterative adaptive dynamic programming (ADP) algorithm, called the generalised policy iteration ADP algorithm, is developed to solve optimal tracking control problems for discrete-time nonlinear systems. The idea is to use two iteration procedures, an i-iteration and a j-iteration, to obtain the iterative tracking control laws and the iterative value functions. By system transformation, we first convert the optimal tracking control problem into an optimal regulation problem. Then the generalised policy iteration ADP algorithm, which interleaves policy and value iteration, is introduced to deal with the optimal regulation problem. The convergence and optimality properties of the generalised policy iteration algorithm are analysed. Three neural networks are used to implement the developed algorithm. Finally, simulation examples are given to illustrate the performance of the present algorithm.
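The system-transformation step can be made concrete with a small sketch (hypothetical scalar plant, not from the paper): with plant x(k+1) = f(x) + g(x)u and reference r(k+1) = ρ(r), splitting off the steady control u_d(r) = g(r)⁻¹(ρ(r) − f(r)) turns tracking of r into regulation of the error e = x − r.

```python
import numpy as np

# Hypothetical scalar dynamics f, g and reference generator rho.
f = lambda x: 0.5 * np.sin(x)
g = lambda x: 1.0 + 0.1 * np.cos(x)
rho = lambda r: 0.9 * r

def steady_control(r):
    return (rho(r) - f(r)) / g(r)            # keeps x = r exactly

def error_dynamics(e, v, r):
    """e(k+1) given error e, feedback part v = u - u_d(r), and reference r."""
    x = e + r
    u = steady_control(r) + v
    return f(x) + g(x) * u - rho(r)

print(error_dynamics(0.0, 0.0, 1.0))         # 0: zero error stays zero
```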

8.
In this paper we propose a new scheme, based on adaptive critics, for finding online the state-feedback, infinite-horizon, optimal control solution of linear continuous-time systems using only partial knowledge of the system dynamics. In other words, the algorithm solves an algebraic Riccati equation online without knowing the internal dynamics model of the system. Being based on a policy iteration technique, the algorithm alternates between policy evaluation and policy update steps until an update of the control policy no longer improves the system performance. The result is a direct adaptive control algorithm which converges to the optimal control solution without using an explicit, a priori obtained, model of the system's internal dynamics. The effectiveness of the algorithm is shown by finding the optimal load-frequency controller for a power system.
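The core trick — evaluating a policy without the internal dynamics matrix A — can be sketched as a least-squares fit over measured cost integrals (a hedged illustration for a 2-D state; the sample tuples are assumed to be collected along the closed-loop trajectory):

```python
import numpy as np

def evaluate_policy_from_data(samples):
    """Model-free policy evaluation sketch for V(x) = x'Px: each sample is an
    assumed measured tuple (x_t, x_tT, c), where c is the integral over
    [t, t+T] of x'(Q + K'RK)x dt under the current gain K. Regressing
    phi(x_t) - phi(x_tT) on c recovers P without knowing A."""
    phi = lambda x: np.array([x[0]**2, 2 * x[0] * x[1], x[1]**2])
    Phi = np.array([phi(xt) - phi(xtT) for xt, xtT, _ in samples])
    c = np.array([ci for _, _, ci in samples])
    p, *_ = np.linalg.lstsq(Phi, c, rcond=None)
    return np.array([[p[0], p[1]], [p[1], p[2]]])   # symmetric P

# Policy improvement then needs only the input matrix B: K = R^{-1} B' P.
```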

9.
The optimization problems of Markov control processes (MCPs) with exact knowledge of the system parameters, in the form of transition probabilities or infinitesimal transition rates, can be solved using the concept of the Markov performance potential, which plays an important role in the sensitivity analysis of MCPs. In this paper, using an equivalent infinitesimal generator, we first introduce a definition of discounted Poisson equations for semi-Markov control processes (SMCPs), similar to that for MCPs, and define the performance potentials of SMCPs as the solution of these equations. Some optimization techniques based on performance potentials for MCPs can then be extended to the optimization of SMCPs if the system parameters are known with certainty. Unfortunately, exact values of the distributions of the sojourn times at some states, or of the transition probabilities of the embedded Markov chain, are generally difficult or impossible to obtain for a large-scale SMCP, which leads to uncertainty in the semi-Markov kernel and thereby in the equivalent infinitesimal transition rates. Analogous to the optimization of uncertain MCPs, a potential-based policy iteration method is proposed in this work to search for the optimal robust control policy for SMCPs with uncertain infinitesimal transition rates represented as compact sets. In addition, the convergence of the algorithm is discussed.
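For orientation, here is a hedged sketch of the nominal (non-robust) potential-based policy iteration on a finite-state model; the paper's robust version additionally minimises over the uncertainty set of transition rates, which is omitted here.

```python
import numpy as np

def potential_policy_iteration(P, r, gamma=0.95):
    """Policy iteration driven by performance potentials g solving the
    discounted Poisson equation (I - gamma * P_pi) g = r_pi.
    P[a] is the transition matrix under action a; r[s, a] is the reward."""
    n_s, n_a = r.shape
    pi = np.zeros(n_s, dtype=int)
    while True:
        P_pi = np.array([P[pi[s]][s] for s in range(n_s)])
        r_pi = r[np.arange(n_s), pi]
        g = np.linalg.solve(np.eye(n_s) - gamma * P_pi, r_pi)   # potentials
        q = np.stack([r[:, a] + gamma * P[a] @ g for a in range(n_a)], axis=1)
        pi_new = q.argmax(axis=1)
        if np.array_equal(pi_new, pi):
            return pi, g
        pi = pi_new
```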

10.
In this paper, we consider a two-player stochastic differential game problem over an infinite time horizon, where the players invoke controller and stopper strategies in a nonlinear stochastic differential game driven by Brownian motion. The optimal strategies for the two players are given explicitly by exploiting connections between stochastic Lyapunov stability theory and stochastic Hamilton–Jacobi–Isaacs theory. In particular, we show that asymptotic stability in probability of the differential game problem is guaranteed by means of a Lyapunov function, which can clearly be seen to be the solution to the steady-state form of the stochastic Hamilton–Jacobi–Isaacs equation, hence guaranteeing both stochastic stability and optimality of the closed-loop control and stopper policies. In addition, we develop optimal feedback controller and stopper policies for affine nonlinear systems using an inverse optimality framework tailored to the stochastic differential game problem. These results are then used to extend the linear feedback controller and stopper policies obtained in the literature to nonlinear feedback controllers and stoppers that minimise and maximise general polynomial and multilinear performance criteria.

11.
International Journal of Computer Mathematics, 2012, 89(16): 2259–2273
In this paper, a novel hybrid method based on two approaches, evolutionary algorithms and an iterative scheme, is presented for obtaining the approximate solution of optimal control problems governed by nonlinear Fredholm integral equations. By converting the problem to a discretized form, it is treated as a quasi-assignment problem, and an iterative method is then applied to find an approximate solution of the discretized integral equation. A convergence analysis of the proposed iterative method and its implementation on numerical examples are also given.

12.
In this paper, a controlled stochastic delay heat equation with Neumann boundary noise and boundary control is considered. The existence and uniqueness of the mild solution of the associated Hamilton–Jacobi–Bellman equations are obtained by means of backward stochastic differential equations, and the results are applied to the optimal control problem.

13.
In this paper, fixed-final-time optimal control laws using neural networks and HJB equations are proposed for general affine-in-the-input nonlinear systems. The method utilizes Kronecker matrix methods along with neural network approximation over a compact set to solve a time-varying HJB equation. The result is a neural network feedback controller whose time-varying coefficients are found by a priori offline tuning. Convergence results are shown, and the results of this paper are demonstrated on an example.
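In the linear-quadratic special case, the time-varying HJB equation reduces to the differential Riccati equation, which the sketch below integrates backward from the terminal cost to produce the time-varying feedback gains that the paper's NN coefficients emulate; the plant is a hypothetical double integrator.

```python
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])       # hypothetical double integrator
B = np.array([[0.0], [1.0]])
Q = Qf = np.eye(2)
R = np.eye(1)
tf, dt = 5.0, 1e-3

# Backward sweep of dP/dt = -(A'P + PA - P B R^{-1} B'P + Q), P(tf) = Qf.
P = Qf.copy()
gains = []
for _ in range(int(tf / dt)):                # explicit Euler, backward in time
    K = np.linalg.solve(R, B.T @ P)
    gains.append(K.copy())                   # time-varying feedback u = -K(t)x
    Pdot = -(A.T @ P + P @ A - P @ B @ K + Q)
    P -= dt * Pdot                           # step from t back to t - dt
gains.reverse()                              # gains[k] ≈ K(k*dt)
```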

14.
T. Zhang, S. S. Ge, C. C. Hang. Automatica, 2000, 36(12)
This paper focuses on adaptive control of strict-feedback nonlinear systems using multilayer neural networks (MNNs). By introducing a modified Lyapunov function, a smooth and singularity-free adaptive controller is first designed for a first-order plant. Then, an extension is made to high-order nonlinear systems using neural network approximation and adaptive backstepping techniques. The developed control scheme guarantees the uniform ultimate boundedness of the closed-loop adaptive systems. In addition, the relationship between the transient performance and the design parameters is given explicitly to guide the tuning of the controller. One important feature of the proposed NN controller is its highly structural property, which makes it particularly suitable for parallel processing in actual implementation. Simulation studies are included to illustrate the effectiveness of the proposed approach.

15.
This paper presents a numerical method for solving a nonlinear 2-D optimal control problem (2DOP). The performance index of the nonlinear 2DOP is described by a state and a control function, and the dynamic constraint of the system is given by a classical diffusion equation. The Ritz method is used to find a numerical solution of the problem, based on a Legendre polynomial basis. With this method, the given nonlinear 2DOP reduces to the problem of solving a system of algebraic equations. The benefit of the method is that it provides great flexibility in imposing the given initial and boundary conditions of the problem. Moreover, compared with the eigenfunction method, satisfactory results are obtained with only a small number of polynomial basis terms. This numerical approach is applicable and effective for this kind of nonlinear 2DOP. The convergence of the method is discussed extensively, and two illustrative examples are included to demonstrate the validity and applicability of the new technique.
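The Ritz reduction can be illustrated on a toy 1-D analogue (this stand-in problem and its data are assumptions, not the paper's 2-D diffusion setting): expanding the state in a Legendre basis turns the functional minimisation into a finite-dimensional algebraic problem.

```python
import numpy as np
from numpy.polynomial import legendre as L
from scipy.optimize import minimize

# Toy Ritz problem: minimise J = ∫_{-1}^{1} (x(t)^2 + u(t)^2) dt
# with x'(t) = u(t) and x(-1) = 1, x expanded in a Legendre series.
deg = 6
nodes, weights = L.leggauss(16)              # Gauss–Legendre quadrature

def J(c):
    x = L.legval(nodes, c)                   # state from the Legendre series
    u = L.legval(nodes, L.legder(c))         # control u = x' by the dynamics
    return float(np.sum(weights * (x**2 + u**2)))

bc = {"type": "eq", "fun": lambda c: L.legval(-1.0, c) - 1.0}   # x(-1) = 1
res = minimize(J, np.zeros(deg + 1), constraints=[bc])
print(res.fun)                               # approximate optimal cost
```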

16.
In this paper, an integral reinforcement learning (IRL) algorithm on an actor–critic structure is developed to learn online the solution to the Hamilton–Jacobi–Bellman equation for partially unknown constrained-input systems. The technique of experience replay is used to update the critic weights to solve an IRL Bellman equation. This means that, unlike existing reinforcement learning algorithms, recorded past experiences are used concurrently with current data for adaptation of the critic weights. It is shown that, using this technique, instead of the traditional persistence-of-excitation condition, which is often difficult or impossible to verify online, an easy-to-check condition on the richness of the recorded data is sufficient to guarantee convergence to a near-optimal control law. Stability of the proposed feedback control law is shown, and the effectiveness of the proposed method is illustrated with simulation examples.
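A hedged sketch of the replay idea (the feature map `phi` and sample format are assumptions, not the paper's exact update law): each sweep takes a normalised gradient step on the squared IRL Bellman residual of every recorded sample plus the current one, so old data keeps exciting the weights.

```python
import numpy as np

def critic_replay_update(W, replay, current, phi, lr=0.1):
    """One replay sweep: minimise the residual e = W'(phi(x_t) - phi(x_tT)) - c
    over recorded samples and the current one. Samples are assumed tuples
    (x_t, x_tT, c), with c the measured cost integral over [t, t+T]."""
    for x_t, x_tT, c in list(replay) + [current]:
        d = phi(x_t) - phi(x_tT)
        e = float(W @ d) - c
        W = W - lr * e * d / (1.0 + d @ d)   # normalised gradient step
    return W
```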

17.
In this paper, a robust optimal control problem during feedback disruption is considered for a class of nonlinear systems controlled by an observer-based output feedback controller. It is shown that, during feedback disruption, there exists an optimal control input which keeps both the system states and the observer errors within a specified bound for the longest time. It is then shown that such an optimal control input can be practically implemented, in terms of control performance, by a bang-bang control input. One numerical and one practical example are given for clear illustration.

18.
In this paper, a finite-horizon neuro-optimal tracking control strategy for a class of discrete-time nonlinear systems is proposed. Through system transformation, the optimal tracking problem is converted into designing a finite-horizon optimal regulator for the tracking error dynamics. Then, with convergence analysis in terms of the cost function and control law, the iterative adaptive dynamic programming (ADP) algorithm via the heuristic dynamic programming (HDP) technique is introduced to obtain the finite-horizon optimal tracking controller, which makes the cost function close to its optimal value within an ε-error bound. Three neural networks are used as parametric structures to implement the algorithm, approximating the cost function, the control law, and the error dynamics, respectively. Two simulation examples are included to complement the theoretical discussions.

19.
Tapani Raiko, Matti Tornio. Neurocomputing, 2009, 72(16–18): 3704
This paper studies identification and model predictive control in nonlinear hidden state-space models. Nonlinearities are modelled with neural networks, and system identification is done with variational Bayesian learning. In addition to robustness of control, the stochastic approach allows for various control schemes, including combinations of direct and indirect control, as well as using probabilistic inference for control. We study the noise robustness, speed, and accuracy of three different control schemes, as well as the effect of changing horizon lengths and initialisation methods, using a simulated cart–pole system. The simulations indicate that the proposed method is able to find a representation of the system state that makes control easier, especially under high noise.

20.
This article presents an approximated scalar sign function-based digital design methodology to develop an optimal anti-windup digital controller for analogue nonlinear systems with input constraints. The approximated scalar sign function, a mathematically smooth nonlinear function, is utilised to represent the constrained input functions, which are often expressed by mathematically non-smooth nonlinear functions. Then, an optimal linearisation technique is applied to the resulting nonlinear system (with smooth nonlinear input functions) to find an optimal linear model, which has the exact dynamics of the original nonlinear system at the operating point of interest. This optimal linear model is used to design an optimal anti-windup LQR, and an iterative procedure is developed to systematically adjust the weighting matrices in the performance index when actuator saturation occurs, so that the designed optimal anti-windup controller lies within the desired saturation range. In addition, the designed optimal analogue controller is digitally implemented using the prediction-based digital redesign technique for the effective digital control of stable and unstable multivariable nonlinear systems with input constraints.
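The idea of replacing the non-smooth saturation with a smooth surrogate can be sketched as follows; a tanh-based form is used here purely for illustration, and the paper's approximated scalar sign function is a specific construction this sketch does not reproduce.

```python
import numpy as np

def smooth_sat(u, u_max):
    """Smooth surrogate for the non-smooth saturation
    sat(u) = clip(u, -u_max, u_max); differentiable everywhere, so standard
    optimal-linearisation machinery applies at any operating point."""
    return u_max * np.tanh(u / u_max)

u = np.linspace(-3.0, 3.0, 7)
print(np.max(np.abs(smooth_sat(u, 1.0) - np.clip(u, -1.0, 1.0))))  # small gap
```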

