Similar Documents
20 similar documents found.
1.
In this paper, we analyse the convergence and stability properties of generalised policy iteration (GPI) applied to discrete-time linear quadratic regulation (LQR) problems. GPI is one of the generalised adaptive dynamic programming methods for solving optimal control problems, and is composed of policy evaluation and policy improvement steps. To analyse the convergence and stability of GPI, a dynamic programming (DP) operator is defined, and GPI and its equivalent formulas are presented in terms of this operator. The convergence of the approximate value function to the exact one in the policy evaluation step is proven from the equivalent formulas. Furthermore, the positive semi-definiteness, stability, and monotone convergence (PI-mode and VI-mode convergence) of GPI are established under certain conditions on the initial value function. An online least-squares method is also presented for the implementation of GPI. Finally, numerical simulations are carried out to verify the effectiveness of GPI and to further investigate its convergence and stability properties.
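As a concrete illustration of the evaluation/improvement structure described in this abstract, the sketch below runs GPI on a toy discrete-time LQR instance: the policy evaluation step is truncated to a fixed number of inner sweeps, which interpolates between value iteration (one sweep) and policy iteration (many sweeps). The system matrices, the initial gain, and the number of sweeps are illustrative assumptions, not values from the paper.

```python
# Minimal GPI sketch for discrete-time LQR (illustrative system, not the paper's).
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.eye(1)

P = np.zeros((2, 2))                 # initial value function (VI-mode start)
K = np.array([[0.5, 1.0]])           # initial gain, assumed stabilising
N_EVAL = 5                           # inner sweeps: 1 ~ VI-mode, large ~ PI-mode

for _ in range(500):
    Ac = A - B @ K
    Qbar = Q + K.T @ R @ K
    for _ in range(N_EVAL):          # truncated policy evaluation
        P = Qbar + Ac.T @ P @ Ac     # one sweep of the DP (Bellman) operator
    K_new = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # policy improvement
    if np.linalg.norm(K_new - K) < 1e-9:
        K = K_new
        break
    K = K_new

print("GPI gain K:", K)
```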

2.
In this paper, a novel iterative adaptive dynamic programming (ADP) algorithm, called the generalised policy iteration ADP algorithm, is developed to solve optimal tracking control problems for discrete-time nonlinear systems. The idea is to use two iteration procedures, an i-iteration and a j-iteration, to obtain the iterative tracking control laws and the iterative value functions. By system transformation, we first convert the optimal tracking control problem into an optimal regulation problem. Then the generalised policy iteration ADP algorithm, which combines the ideas of policy iteration and value iteration, is introduced to solve the optimal regulation problem. The convergence and optimality properties of the generalised policy iteration algorithm are analysed. Three neural networks are used to implement the developed algorithm. Finally, simulation examples are given to illustrate the performance of the presented algorithm.

3.
This paper studies data-driven learning-based methods for the finite-horizon optimal control of linear time-varying discrete-time systems. First, a novel finite-horizon Policy Iteration (PI) method for linear time-varying discrete-time systems is presented. Its connections with existing infinite-horizon PI methods are discussed. Then, both data-driven off-policy PI and Value Iteration (VI) algorithms are derived to find approximate optimal controllers when the system dynamics is completely unknown. Under mild conditions, the proposed data-driven off-policy algorithms converge to the optimal solution. Finally, the effectiveness and feasibility of the developed methods are validated by a practical example of spacecraft attitude control.
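For reference, the model-based solution that such data-driven algorithms approximate is the finite-horizon backward Riccati recursion. The sketch below computes it for an illustrative time-varying system; the paper's contribution is recovering this solution from data when the A_k and B_k below are unknown.

```python
# Backward Riccati recursion for finite-horizon time-varying LQR (illustrative).
import numpy as np

N = 20                                        # horizon length
n, m = 2, 1
Q, R, QN = np.eye(n), np.eye(m), np.eye(n)    # stage and terminal weights
A = [np.array([[1.0, 0.1 * np.sin(0.1 * k)],
               [0.0, 1.0]]) for k in range(N)]
B = [np.array([[0.0], [0.1]]) for _ in range(N)]

P = [None] * (N + 1)
K = [None] * N
P[N] = QN
for k in range(N - 1, -1, -1):                # sweep backward in time
    S = R + B[k].T @ P[k + 1] @ B[k]
    K[k] = np.linalg.solve(S, B[k].T @ P[k + 1] @ A[k])
    P[k] = Q + A[k].T @ P[k + 1] @ (A[k] - B[k] @ K[k])

print("Initial-stage gain K_0:", K[0])
```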

4.
王涛, 张化光. 《控制与决策》, 2015, 30(9): 1674-1678

For stochastic linear continuous-time systems with partially unknown model parameters, the infinite-horizon stochastic linear quadratic (LQ) optimal control problem is solved via a policy iteration algorithm. Solving the stochastic LQ optimal control problem is equivalent to solving the associated stochastic algebraic Riccati equation (SARE). First, the Itô formula is used to transform the stochastic differential equation into a deterministic one, and the policy iteration algorithm generates a sequence of approximate solutions to the SARE. It is then proven that this sequence converges to the solution of the SARE and that the system remains mean-square stabilisable during the iterations. Finally, a simulation example shows the feasibility of the policy iteration algorithm.
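As a sketch of the policy-iteration idea in the simpler deterministic continuous-time case (Kleinman's algorithm), the loop below alternates Lyapunov-equation evaluation with gain improvement. The paper's stochastic setting instead iterates on the SARE, which this simplified deterministic sketch does not model, and the matrices are illustrative.

```python
# Kleinman-style policy iteration for a deterministic continuous-time LQR
# (a simplified, deterministic analogue of the paper's stochastic algorithm).
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[0.0, 1.0], [-1.0, -2.0]])   # Hurwitz, so K = 0 is admissible
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)

K = np.zeros((1, 2))
for _ in range(30):
    Ac = A - B @ K
    # Policy evaluation: solve Ac' P + P Ac + Q + K' R K = 0.
    P = solve_continuous_lyapunov(Ac.T, -(Q + K.T @ R @ K))
    K_new = np.linalg.solve(R, B.T @ P)    # policy improvement: K = R^{-1} B' P
    if np.linalg.norm(K_new - K) < 1e-9:
        K = K_new
        break
    K = K_new

print("Converged gain K:", K)
```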


5.
Xinmin, Huanshui, Lihua. Automatica, 2009, 45(9): 2067-2073
This paper considers the stochastic LQR problem for systems with input delay and stochastic parameter uncertainties in the state and input matrices. The problem is known to be difficult due to the presence of interactions among the delayed input channels and the stochastic parameter uncertainties in the channels. The key to our approach is to convert the LQR control problem into an optimization one in a Hilbert space for an associated backward stochastic model and then obtain the optimal solution to the stochastic LQR problem by exploiting the dynamic programming approach. Our solution is given in terms of two generalized Riccati difference equations (RDEs) of the same dimension as that of the plant.

6.
This paper mathematically analyzes the integral generalized policy iteration (I-GPI) algorithms applied to a class of continuous-time linear quadratic regulation (LQR) problems with unknown system matrix A. GPI is the general idea of interacting the policy evaluation and policy improvement steps of policy iteration (PI) to compute the optimal policy. We first introduce the update horizon T, and then show that (i) all of the I-GPI methods with the same T can be considered equivalent and that (ii) the value function approximated in the policy evaluation step monotonically converges to the exact one as T → ∞. This reveals the relation between the computational complexity and the update (or time) horizon of I-GPI, as well as the relation between I-PI and I-GPI in the limit T → ∞. We also provide and discuss two modes of convergence of I-GPI: in one mode I-GPI behaves like PI, and in the other it performs like value iteration for discrete-time LQR and infinitesimal GPI (T → 0). From these results, a new classification of integral reinforcement learning is formed with respect to T. Two matrix inequality conditions for stability, the region of local monotone convergence, and data-driven (adaptive) implementation methods are also provided with detailed discussion. Numerical simulations are carried out for verification and further investigation.

7.
8.
We investigate the optimization of linear impulse systems with the reinforcement-learning-based adaptive dynamic programming (ADP) method. For linear impulse systems, the optimal objective function is shown to be a quadratic form of the pre-impulse states. The ADP method provides solutions that iteratively converge to the optimal objective function. If an initial guess of the pre-impulse objective function is selected as a quadratic form of the pre-impulse states, the objective function iteratively converges to the optimal one through ADP. Though direct use of the quadratic objective function of the states within the ADP method is theoretically possible, a numerical singularity problem may occur due to the matrix inversion involved as the system dimensionality increases. A neural-network-based ADP method can circumvent this problem. A neural network with polynomial activation functions is selected to approximate the pre-impulse objective function and trained iteratively using the ADP method to achieve optimal control. After successful training, the optimal impulse control can be derived. Simulations are presented for illustrative purposes.

9.
S. Sen, S. J. Yakowitz. Automatica, 1987, 23(6): 749-752
We develop a quasi-Newton differential dynamic programming algorithm (QDDP) for discrete-time optimal control problems. In the spirit of dynamic programming, the quasi-Newton approximations are performed in a stagewise manner. We establish the global convergence of the method and also show a superlinear convergence rate. Among other advantages of the QDDP method, second derivatives need not be calculated. In theory, the computational effort of each recursion grows in proportion to the number of stages N, whereas with conventional quasi-Newton techniques, which do not take advantage of the optimal control problem structure, the growth is as N². Computational results are also reported.

10.
In this article, an adaptive critic scheme with a novel performance index function is developed to solve the tracking control problem; it eliminates the tracking error and possesses an adjustable convergence rate in the offline learning process. Under some conditions, the convergence and monotonicity of the accelerated value function sequence can be guaranteed. Combining the advantages of the adjustable and general value iteration schemes, an integrated algorithm with guaranteed fast convergence is proposed, which involves two stages, namely the acceleration stage and the convergence stage. Moreover, an effective approach is given to adaptively determine the acceleration interval. With this operation, the fast convergence of the new value iteration scheme can be fully utilized. Finally, numerical results are presented, in comparison with general value iteration, to verify the fast convergence and the tracking performance of the developed adaptive critic design.

11.
林小峰, 丁强. 《控制与决策》, 2015, 30(3): 495-499
To solve finite-horizon optimal control problems, adaptive dynamic programming (ADP) algorithms typically require that the controlled system can be driven to zero in one step. For nonlinear systems that cannot be driven to zero in one step, an improved ADP algorithm is proposed in which the initial cost function is constructed from an arbitrary finite-time admissible control sequence. The iterative procedure of the algorithm is derived and its convergence is proven. When the approximation errors of the critic network are taken into account and certain assumptions are satisfied, the iterative cost function converges to a bounded neighbourhood of the optimal cost function. A simulation example verifies the effectiveness of the proposed method.

12.
We consider the revenue management problem of capacity control under customer choice behavior. An exact solution of the underlying stochastic dynamic program is difficult because of the multi-dimensional state space, and thus approximate dynamic programming (ADP) techniques are widely used. The key idea of ADP is to encode the multi-dimensional state space by a small number of basis functions, often leading to a parametric approximation of the dynamic program’s value function. In general, two classes of ADP techniques for learning value function approximations exist: mathematical programming and simulation. So far, the literature on capacity control has largely focused on the first class. In this paper, we develop a least-squares approximate policy iteration (API) approach, which belongs to the second class. We suggest value function approximations that are linear in the parameters and estimate the parameters via linear least-squares regression. Exploiting both exact and heuristic knowledge of the value function, we enforce structural constraints on the parameters to facilitate learning a good policy. We perform an extensive simulation study to investigate the performance of our approach. The results show that it obtains revenues competitive with, and often superior to, state-of-the-art capacity control methods in reasonable computational time. Depending on the scarcity of capacity and the point in time, revenue improvements of around 1% or more can be observed. Furthermore, the proposed approach contributes to simulation-based ADP, bringing forth research on numerically estimating piecewise linear value function approximations and their application in revenue management environments.
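The least-squares step at the core of such an API scheme reduces to a linear regression of sampled values on basis functions. The sketch below shows that step with an illustrative basis in remaining capacity c and time t and synthetic data; the basis and data are assumptions, not taken from the paper.

```python
# Least-squares fit of a value-function approximation linear in its parameters.
import numpy as np

rng = np.random.default_rng(0)

def features(c, t):
    """Illustrative basis in remaining capacity c and time-to-go t."""
    return np.array([1.0, c, t, c * t])

# Synthetic (state, sampled value) pairs, e.g. from simulated policy evaluation.
states = [(rng.integers(0, 50), rng.integers(0, 100)) for _ in range(500)]
values = np.array([2.0 * c + 0.1 * c * t + rng.normal() for c, t in states])

Phi = np.vstack([features(c, t) for c, t in states])   # regression matrix
theta, *_ = np.linalg.lstsq(Phi, values, rcond=None)   # least-squares parameters
print("Fitted parameters:", theta)
```

In the paper's approach, structural constraints on the parameters are additionally enforced; the sketch shows only the basic unconstrained regression step.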

13.
In this paper, a novel theoretical formulation based on adaptive dynamic programming (ADP) is developed to solve online the optimal tracking problem of a continuous-time linear system with unknown dynamics. First, the original system dynamics and the reference trajectory dynamics are transformed into an augmented system. Then, under the same performance index as for the original system dynamics, an augmented algebraic Riccati equation is derived. Furthermore, the solutions of the optimal control problem for the augmented system are proven to be equal to the standard solutions of the optimal tracking problem for the original system dynamics. Moreover, a new online algorithm based on the ADP technique is presented to solve the optimal tracking problem of the linear system with unknown system dynamics. Finally, simulation results are given to verify the effectiveness of the theoretical results.
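The augmentation step can be made concrete as follows: stack the plant state with the reference-generator state and solve a single algebraic Riccati equation for the stacked system. The matrices below are illustrative, and the sketch solves the augmented ARE directly with known dynamics, whereas the paper solves it online without knowing them; the slight damping of the reference generator is an assumption that keeps the standard (undiscounted) ARE solvable.

```python
# Augmented-system formulation of LQ tracking (illustrative sketch).
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [-2.0, -3.0]])       # plant: x_dot = A x + B u
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])                     # tracked output y = C x
F = np.array([[-0.05, 1.0], [-1.0, -0.05]])    # reference generator r_dot = F r
                                               # (slightly damped by assumption)
n, p = A.shape[0], F.shape[0]
A_aug = np.block([[A, np.zeros((n, p))], [np.zeros((p, n)), F]])
B_aug = np.vstack([B, np.zeros((p, 1))])
E = np.hstack([C, -np.array([[1.0, 0.0]])])    # tracking error e = E [x; r]
Q_aug = E.T @ E                                # penalise the tracking error
R = np.eye(1)

P = solve_continuous_are(A_aug, B_aug, Q_aug, R)
K = np.linalg.solve(R, B_aug.T @ P)            # u = -K [x; r]
print("Augmented gain:", K)
```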

14.
For a mobile ammunition-loading manipulator system that is nonlinear, strongly coupled, and subject to multiple sources of uncertainty, this paper proposes a trajectory-tracking control method based on adaptive dynamic programming that contains only a critic network structure, which effectively reduces the system tracking error. First, taking into account the nonlinear characteristics of the system, the strong coupling among its variables, and the effect of gravity, the dynamic model of the mobile ammunition-loading manipulator is established via the Lagrange equations. Second, to address the problem that the upper bound of the system uncertainties is unknown, ...

15.
We consider an inverse linear programming (LP) problem in which the parameters in both the objective function and the constraint set of a given LP problem need to be adjusted as little as possible so that a known feasible solution becomes the optimal one. We formulate this problem as a linear complementarity constrained minimization problem. With the help of the smoothed Fischer–Burmeister function, we propose a perturbation approach to solve the inverse problem and demonstrate its global convergence. An inexact Newton method is constructed to solve the perturbed problem and numerical results are reported to show the effectiveness of the approach.
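For readers unfamiliar with the Fischer–Burmeister (FB) device: the equation φ(a, b) = 0 encodes the complementarity condition a ≥ 0, b ≥ 0, ab = 0, and smoothing makes it differentiable so Newton-type methods apply. The smoothing form below is one standard choice and is assumed here, not quoted from the paper.

```python
# Smoothed Fischer-Burmeister function (one standard smoothing; illustrative).
import numpy as np

def fb_smoothed(a, b, mu):
    """phi_mu(a, b) = a + b - sqrt(a^2 + b^2 + 2*mu^2).
    For mu = 0, phi(a, b) = 0 iff a >= 0, b >= 0 and a*b = 0."""
    return a + b - np.sqrt(a * a + b * b + 2.0 * mu * mu)

# As mu -> 0, the smooth function approaches the nonsmooth FB function:
for mu in (1.0, 0.1, 0.01, 0.0):
    print(mu, fb_smoothed(2.0, 0.0, mu))   # (2, 0) is complementary, value -> 0
```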

16.
We consider the problem of asymptotic rejection of exogenous harmonic inputs having unknown amplitudes, phases, and frequencies on the output for a class of uncertain and nonminimum-phase linear systems. Special emphasis is given to the case in which the controlled system has multiple zeros at the origin. It is shown how the method recently proposed to design internal models by means of regression arguments, combined with control strategies based on the redesign of the zero dynamics of the system through redefinition of the output, can be successfully used to solve the problem in the presence of plant parameter uncertainties.

17.
In this paper we propose a new scheme based on adaptive critics for finding online the state-feedback, infinite-horizon, optimal control solution of linear continuous-time systems using only partial knowledge of the system dynamics. In other words, the algorithm solves online an algebraic Riccati equation without knowing the internal dynamics model of the system. Being based on a policy iteration technique, the algorithm alternates between policy evaluation and policy update steps until an update of the control policy no longer improves the system performance. The result is a direct adaptive control algorithm which converges to the optimal control solution without using an explicit, a priori obtained, model of the system internal dynamics. The effectiveness of the algorithm is shown while finding the optimal load-frequency controller for a power system.
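The model-free policy-evaluation step can be sketched as follows: with u = −Kx applied, the quadratic value function x'Px satisfies an integral Bellman identity that is linear in the entries of P, so P can be estimated by least squares from measured trajectories without knowing the internal dynamics matrix A (the subsequent gain update K = R⁻¹B'P needs only B). The system, gain, and Euler simulation below are illustrative assumptions.

```python
# Least-squares policy evaluation from trajectory data (illustrative sketch):
#   x(t)' P x(t) - x(t+T)' P x(t+T) = integral over [t, t+T] of (x'Qx + u'Ru) dt,
# which is linear in the entries of P.
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, -2.0]])    # used ONLY to generate data below
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
K = np.array([[0.2, 0.3]])                  # current stabilising policy u = -K x

def quad_features(x):
    """Basis for x' P x with symmetric P: [x1^2, 2*x1*x2, x2^2]."""
    return np.array([x[0] ** 2, 2 * x[0] * x[1], x[1] ** 2])

dt, steps = 0.001, 200                      # reinforcement interval T = 0.2 s
rows, rhs = [], []
for trial in range(10):                     # several short trajectories
    x = np.random.default_rng(trial).uniform(-1, 1, 2)
    x0, cost = x.copy(), 0.0
    for _ in range(steps):                  # Euler integration of the closed loop
        u = -K @ x
        cost += float(x @ Q @ x + u @ R @ u) * dt
        x = x + (A @ x + B @ u) * dt
    rows.append(quad_features(x0) - quad_features(x))
    rhs.append(cost)

p, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
P = np.array([[p[0], p[1]], [p[1], p[2]]])
print("Estimated P:\n", P)
```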

18.
In this paper, we consider a class of infinite-horizon discrete-time linear quadratic N-person games, in which one of the players lacks complete information about the game. With the assumptions of a perfect state information pattern and steady state feedback strategies, we convert the original game problem into a multivariable adaptive control problem by making use of the concept of fictitious play and the scheme of adaptive control. For the proposed adjustment procedure, we prove that each element of the estimates converges to its corresponding true value under the condition of persistent excitation. We also carry out a sensitivity analysis of performance indices with respect to the embedded unknowns by using multiple models.

19.
To address the premature convergence of the basic fruit fly optimization algorithm caused by the loss of population diversity during the search, a multi-subgroup fruit fly optimization algorithm with sequential quadratic programming (SQP) local search (MFOA-SQP) is proposed. The new algorithm divides the fruit fly population evenly into multiple subgroups and introduces the inertia weight and learning factors of particle swarm optimization to jointly adjust the direction and step size of the fruit flies' movement. The subgroups are re-partitioned every fixed number of iterations to avoid population homogenisation, making it easier for the algorithm to escape local optima, and an SQP search is applied to the best individual of each subgroup to improve local search performance. Experiments on six benchmark functions and on optimising a generalised regression neural network for bank-customer classification show that the algorithm performs well in both accuracy and speed and can effectively improve the classification accuracy of the generalised regression neural network.
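A hedged sketch of the hybrid scheme follows: a basic fruit-fly-style random search is run per subgroup, and each subgroup's best individual is refined by an SQP local step (here SciPy's SLSQP). The objective, subgroup sizes, and update rules are illustrative simplifications, not the paper's exact MFOA-SQP; in particular, the inertia weight, learning factors, and periodic re-partitioning are omitted.

```python
# Simplified multi-subgroup fruit-fly search with SQP local refinement.
import numpy as np
from scipy.optimize import minimize

def sphere(x):
    """Illustrative test objective."""
    return float(np.sum(x ** 2))

rng = np.random.default_rng(1)
dim, n_sub, per_sub, iters = 5, 4, 10, 30
centers = rng.uniform(-5, 5, size=(n_sub, dim))      # one centre per subgroup

best_x, best_f = None, np.inf
for _ in range(iters):
    for s in range(n_sub):
        # Smell phase: random candidates around the subgroup centre.
        cand = centers[s] + rng.uniform(-1, 1, size=(per_sub, dim))
        f = np.array([sphere(x) for x in cand])
        i = int(np.argmin(f))
        if f[i] < sphere(centers[s]):                # vision phase: move centre
            centers[s] = cand[i]
        # SQP local refinement of the subgroup best.
        res = minimize(sphere, centers[s], method="SLSQP")
        if res.success and res.fun < sphere(centers[s]):
            centers[s] = res.x
        if res.fun < best_f:
            best_x, best_f = res.x, float(res.fun)

print("Best value found:", best_f)
```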

20.
For a class of typical nonlinear discrete-time systems with control constraints, a multi-setpoint tracking control method based on an adaptive dynamic programming (ADP) algorithm is proposed, and its convergence and stability are rigorously analysed. Building on ADP iterative tracking control and following the idea of multiple-model control, a staircase reference trajectory is set so that the system state tracks the final setpoint gradually, which guarantees stability, greatly reduces overshoot, speeds up the response, and improves control quality. Meanwhile, owing to the controller constraints, a non-quadratic performance index function is introduced so that the control input always varies within a bounded range. Finally, the simulation results are analysed, demonstrating the feasibility and effectiveness of the method.
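One widely used non-quadratic control cost for enforcing a bound |u| ≤ λ integrates the inverse of a saturating function; the specific integrand below (λ·artanh(u/λ), weighted by r) is a standard choice in the constrained-ADP literature and is assumed here rather than quoted from the paper. Its marginal cost blows up as the input approaches the bound, so the minimising control never saturates.

```python
# Non-quadratic control cost U(u) = 2 * integral_0^u lambda*artanh(v/lambda)*r dv,
# evaluated in closed form (a standard choice; illustrative, not the paper's).
import numpy as np

def bounded_control_cost(u, lam=1.0, r=1.0):
    """Closed form: 2*r*lam*(u*artanh(u/lam) + (lam/2)*ln(1 - (u/lam)^2))."""
    s = u / lam
    return 2.0 * r * lam * (u * np.arctanh(s) + 0.5 * lam * np.log(1.0 - s * s))

# The marginal cost 2*r*lam*artanh(u/lam) grows without bound as u -> lam:
for u in (0.0, 0.5, 0.9, 0.99):
    print(u, round(bounded_control_cost(u), 4))
```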
