Related Articles
A total of 20 related articles were retrieved.
1.
This paper first presents necessary and sufficient conditions for the solvability of discrete-time mean-field stochastic linear-quadratic optimal control problems. Second, the optimal control within a class of linear feedback controls is investigated using a matrix dynamical optimization method. Third, by introducing several sequences of bounded linear operators, the problem is formulated as an operator stochastic linear-quadratic optimal control problem. By the kernel-range decomposition representation of the expectation operator and its pseudo-inverse, the optimal control is derived using solutions to two algebraic Riccati difference equations. Finally, by completing the square, the two Riccati equations and the optimal control are also obtained.
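As a rough illustration of the kind of recursion these results build on, the sketch below runs the backward Riccati difference recursion for a plain finite-horizon discrete-time LQ problem (deterministic, non-mean-field); the coupled pair of Riccati difference equations and the expectation-operator decomposition of the paper are not reproduced, and all matrices are hypothetical toy data.

```python
import numpy as np

def lq_riccati_recursion(A, B, Q, R, Qf, N):
    """Backward Riccati difference recursion for a finite-horizon
    discrete-time LQ problem (deterministic simplification of the
    mean-field setting discussed above)."""
    P = Qf.copy()
    gains = []
    for _ in range(N):
        # feedback gain K_k = (R + B' P B)^{-1} B' P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update P_k = Q + A' P (A - B K)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return P, gains[::-1]   # gains ordered from k = 0 to N-1

# toy usage with hypothetical data
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R, Qf = np.eye(2), np.array([[1.0]]), np.eye(2)
P0, Ks = lq_riccati_recursion(A, B, Q, R, Qf, N=50)
```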

2.
王涛, 张化光. 《控制与决策》(Control and Decision), 2015, 30(9): 1674-1678

For stochastic linear continuous-time systems with partially unknown model parameters, the infinite-horizon stochastic linear-quadratic (LQ) optimal control problem is solved via a policy iteration algorithm. Solving the stochastic LQ optimal control problem is equivalent to solving the associated stochastic algebraic Riccati equation (SARE). First, the Itô formula is used to transform the stochastic differential equation into a deterministic one, and the policy iteration algorithm generates a sequence of approximations to the SARE solution. It is then proved that this sequence converges to the solution of the SARE and that the system remains mean-square stabilizable throughout the iterations. Finally, a simulation example demonstrates the feasibility of the policy iteration algorithm.
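For orientation, here is a minimal sketch of the policy iteration idea in the simpler deterministic continuous-time LQ setting (a Kleinman-type iteration on Lyapunov equations) with a fully known model; the paper's algorithm instead handles the stochastic SARE with partially unknown parameters. The system matrices below are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def pi_care(A, B, Q, R, K0, iters=30):
    """Policy iteration for the deterministic continuous-time algebraic
    Riccati equation, starting from a stabilizing gain K0."""
    K = K0
    for _ in range(iters):
        Ac = A - B @ K
        # policy evaluation: solve Ac' P + P Ac + Q + K' R K = 0
        P = solve_continuous_lyapunov(Ac.T, -(Q + K.T @ R @ K))
        # policy improvement: K = R^{-1} B' P
        K = np.linalg.solve(R, B.T @ P)
    return P, K

A = np.array([[0.0, 1.0], [-1.0, -0.5]])   # hypothetical, already Hurwitz
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])
K0 = np.zeros((1, 2))                       # K0 = 0 is stabilizing here
P, K = pi_care(A, B, Q, R, K0)
```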


3.
Automatica, 2014, 50(12): 3281-3290
This paper addresses the model-free nonlinear optimal control problem based on data by introducing the reinforcement learning (RL) technique. It is known that the nonlinear optimal control problem relies on the solution of the Hamilton–Jacobi–Bellman (HJB) equation, which is a nonlinear partial differential equation that is generally impossible to solve analytically. Even worse, most practical systems are too complicated to model accurately. To overcome these difficulties, we propose a data-based approximate policy iteration (API) method that uses real system data rather than a system model. First, a model-free policy iteration algorithm is derived and its convergence is proved. The implementation of the algorithm is based on the actor–critic structure, where actor and critic neural networks (NNs) are employed to approximate the control policy and cost function, respectively. To update the weights of the actor and critic NNs, a least-squares approach is developed based on the method of weighted residuals. The data-based API is an off-policy RL method, where the “exploration” is improved by arbitrarily sampling data on the state and input domain. Finally, we test the data-based API control design method on a simple nonlinear system, and further apply it to a rotational/translational actuator system. The simulation results demonstrate the effectiveness of the proposed method.
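A minimal sketch of the least-squares flavor of the policy-evaluation step, assuming a discrete-time system, a fixed policy, and a small hand-picked quadratic basis in place of a critic neural network; the actor network, the weighted-residuals derivation, and the off-policy exploration scheme of the paper are not reproduced. The data-generating system and the basis `phi` are hypothetical.

```python
import numpy as np

def phi(x):
    """Quadratic basis for the critic, V(x) ~ w' phi(x)."""
    x1, x2 = x
    return np.array([x1**2, x1 * x2, x2**2])

def lstsq_policy_evaluation(xs, costs, xs_next):
    """Fit critic weights from sampled transitions by least squares on the
    undiscounted Bellman residual  V(x) - V(x') - c(x, u) ~ 0."""
    Phi = np.array([phi(x) - phi(xn) for x, xn in zip(xs, xs_next)])
    w, *_ = np.linalg.lstsq(Phi, np.asarray(costs), rcond=None)
    return w

# toy data: a stable linear system under a fixed (zero) policy
A = np.array([[0.9, 0.1], [0.0, 0.8]])
rng = np.random.default_rng(0)
xs = rng.standard_normal((200, 2))
xs_next = xs @ A.T                      # x' = A x under the fixed policy
costs = np.sum(xs**2, axis=1)           # stage cost c(x, u) = x'x here
w = lstsq_policy_evaluation(xs, costs, xs_next)
```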

4.
The purpose of this paper is to investigate the role that the so-called constrained generalized Riccati equation plays within the context of continuous-time singular linear–quadratic (LQ) optimal control. This equation has been defined by analogy with the discrete-time setting. However, while in the discrete-time case the connections between this equation and the linear–quadratic optimal control problem have been thoroughly investigated, to date very little is known about these connections in the continuous-time setting. This note addresses this point. We show, in particular, that when the continuous-time constrained generalized Riccati equation admits a solution, the corresponding linear–quadratic problem admits an impulse-free optimal control. We also address the corresponding infinite-horizon LQ problem, for which we establish a similar result under the additional constraint that there exists a control input for which the cost index is finite.

5.
In this study, we investigate a continuous-time infinite-horizon linear quadratic stochastic optimal control problem with multiplicative noise in the control and state variables. Using the techniques of stochastic stability, exact observability, and stochastic approximation, a value iteration algorithm is developed to solve the corresponding generalized algebraic Riccati equation. Unlike the existing policy iteration algorithm, this algorithm does not rely on an initial stabilizing control. Further,...
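The "no initial stabilizing control" feature is easiest to see in the deterministic discrete-time analogue, where value iteration on the algebraic Riccati equation can simply be started from P = 0; a hedged sketch follows. The paper's algorithm works on the continuous-time generalized ARE with multiplicative noise, which this does not reproduce, and the system data are hypothetical.

```python
import numpy as np

def value_iteration_dare(A, B, Q, R, tol=1e-10, max_iter=10000):
    """Value iteration for the discrete-time algebraic Riccati equation,
    started from P = 0 (no stabilizing initial policy required)."""
    P = np.zeros((A.shape[0], A.shape[0]))
    for _ in range(max_iter):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ (A - B @ K)
        if np.max(np.abs(P_next - P)) < tol:
            return P_next, K
        P = P_next
    return P, K

A = np.array([[1.1, 0.2], [0.0, 0.9]])   # open-loop unstable toy system
B = np.array([[0.0], [1.0]])
P, K = value_iteration_dare(A, B, np.eye(2), np.array([[1.0]]))
```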

6.
A new approach to the indefinite stochastic linear quadratic (LQ) optimal control problem, called the "equivalent cost functional method", was introduced by Yu (2013) in the setting of Hamiltonian systems. Another important issue along this research direction is the possible state-feedback representation of the optimal control and the solvability of the associated indefinite stochastic Riccati equations. In response, this paper continues to develop the equivalent cost functional method by extending it to the Riccati equation setting. Our analysis is characterized by the introduction of equivalent cost functionals that build a bridge between the indefinite and positive-definite stochastic LQ problems. With this bridge, a solvability relation between the indefinite and positive-definite Riccati equations is further characterized. Notably, the solvability of the former is considerably more complicated than that of the latter, so this relation provides an alternative and useful viewpoint. Consequently, the corresponding indefinite linear quadratic problem is discussed, for which the unique optimal control is derived in state-feedback form via the solution of the Riccati equation. In addition, an example is studied using the theoretical results.

7.
8.
《国际计算机数学杂志》(International Journal of Computer Mathematics), 2012, 89(14): 3311-3327
In this article, singular optimal control for a stochastic linear singular system with quadratic performance is obtained using ant colony programming (ACP). To obtain the optimal control, the solution of the matrix Riccati differential equation is computed by solving the underlying differential-algebraic equation with a novel, nontraditional ACP approach. The solution obtained by this method is equivalent or very close to the exact solution of the problem, and its accuracy is qualitatively better. The solution of this novel method is compared with that of the traditional Runge–Kutta method. An illustrative numerical example is presented for the proposed method.
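For reference, a minimal sketch of the Runge–Kutta baseline mentioned above: classical RK4 integration of a standard matrix Riccati differential equation backward from its terminal condition. The singular-system structure and the ACP solver itself are not reproduced, and the system data are hypothetical.

```python
import numpy as np

def riccati_rhs(P, A, B, Q, Rinv):
    """Equals -dP/dt for the finite-horizon LQ matrix Riccati ODE."""
    return A.T @ P + P @ A - P @ B @ Rinv @ B.T @ P + Q

def rk4_riccati(A, B, Q, R, Pf, T, steps=200):
    """Integrate dP/dt = -(A'P + PA - PBR^{-1}B'P + Q) backward in time
    from P(T) = Pf using classical RK4 (in the reversed-time variable)."""
    Rinv = np.linalg.inv(R)
    h = T / steps
    P = Pf.copy()
    for _ in range(steps):
        k1 = riccati_rhs(P, A, B, Q, Rinv)
        k2 = riccati_rhs(P + 0.5 * h * k1, A, B, Q, Rinv)
        k3 = riccati_rhs(P + 0.5 * h * k2, A, B, Q, Rinv)
        k4 = riccati_rhs(P + h * k3, A, B, Q, Rinv)
        P = P + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return P   # approximates P(0)

A = np.array([[0.0, 1.0], [-2.0, -1.0]])
B = np.array([[0.0], [1.0]])
P0 = rk4_riccati(A, B, np.eye(2), np.array([[1.0]]), Pf=np.zeros((2, 2)), T=5.0)
```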

9.
This article proposes three novel time-varying policy iteration algorithms for the finite-horizon optimal control problem of continuous-time affine nonlinear systems. We first propose a model-based time-varying policy iteration algorithm. The method considers time-varying solutions to the Hamilton–Jacobi–Bellman equation for finite-horizon optimal control. Based on this algorithm, value function approximation is applied to the Bellman equation by establishing neural networks with time-varying weights. A novel update law for the time-varying weights is put forward based on the idea of iterative learning control, which obtains optimal solutions more efficiently than previous works. Considering that system models may be unknown in real applications, we propose a partially model-free time-varying policy iteration algorithm that applies integral reinforcement learning to acquire the time-varying value function. Moreover, analysis of convergence, stability, and optimality is provided for every algorithm. Finally, simulations for different cases are given to verify the convenience and effectiveness of the proposed algorithms.

10.
Based on the Lagrange equations, the dynamic model of a multi-joint robot manipulator is transformed into a linear state equation. For this linear state equation, a robust optimal control law is then obtained by solving a linear quadratic optimization problem, which guarantees global asymptotic convergence of the joint variables. Finally, taking a two-joint robot as an example, simulation results demonstrate the effectiveness and robustness of the designed control law.
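A minimal sketch of the LQ design step for a linearized manipulator model, assuming a hypothetical double-integrator-style state equation for the two joints; the Lagrangian derivation and the robustness argument of the paper are not reproduced.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical linearized manipulator model x_dot = A x + B u,
# where x stacks the joint positions and velocities of a two-joint arm.
A = np.block([[np.zeros((2, 2)), np.eye(2)],
              [np.zeros((2, 2)), np.zeros((2, 2))]])
B = np.vstack([np.zeros((2, 2)), np.eye(2)])
Q = np.diag([10.0, 10.0, 1.0, 1.0])   # weight joint errors more than velocities
R = 0.1 * np.eye(2)

P = solve_continuous_are(A, B, Q, R)   # solve the algebraic Riccati equation
K = np.linalg.solve(R, B.T @ P)        # LQ state-feedback gain, u = -K x
```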

11.
This paper mathematically analyzes the integral generalized policy iteration (I-GPI) algorithms applied to a class of continuous-time linear quadratic regulation (LQR) problems with unknown system matrix A. GPI is the general idea of interleaving the policy evaluation and policy improvement steps of policy iteration (PI) to compute the optimal policy. We first introduce the update horizon, and then show that (i) all of the I-GPI methods with the same update horizon can be considered equivalent and that (ii) the value function approximated in the policy evaluation step monotonically converges to the exact one as the update horizon tends to infinity. This reveals the relation between the computational complexity and the update (or time) horizon of I-GPI, as well as between I-PI and I-GPI in the limit of an infinite update horizon. We also provide and discuss two modes of convergence of I-GPI: in one mode I-GPI behaves like PI, and in the other it performs like value iteration for discrete-time LQR and infinitesimal GPI (the update horizon tending to zero). From these results, a new classification of integral reinforcement learning is formed with respect to the update horizon. Two matrix inequality conditions for stability, the region of local monotone convergence, and data-driven (adaptive) implementation methods are also provided with detailed discussion. Numerical simulations are carried out for verification and further investigation.

12.
In this paper we propose a new scheme based on adaptive critics for finding online the state-feedback, infinite-horizon, optimal control solution of linear continuous-time systems using only partial knowledge of the system dynamics. In other words, the algorithm solves an algebraic Riccati equation online without knowing the internal dynamics model of the system. Being based on a policy iteration technique, the algorithm alternates between the policy evaluation and policy update steps until an update of the control policy no longer improves the system performance. The result is a direct adaptive control algorithm which converges to the optimal control solution without using an explicit, a priori obtained, model of the system internal dynamics. The effectiveness of the algorithm is demonstrated by finding the optimal load-frequency controller for a power system.

13.
The optimization problems of Markov control processes (MCPs) with exact knowledge of the system parameters, in the form of transition probabilities or infinitesimal transition rates, can be solved by using the concept of the Markov performance potential, which plays an important role in the sensitivity analysis of MCPs. In this paper, using an equivalent infinitesimal generator, we first introduce a definition of discounted Poisson equations for semi-Markov control processes (SMCPs), similar to that for MCPs, and the performance potentials of SMCPs are defined as the solution of these equations. Some related optimization techniques based on performance potentials for MCPs may be extended to the optimization of SMCPs if the system parameters are known with certainty. Unfortunately, exact values of the distributions of the sojourn times at some states, or of the transition probabilities of the embedded Markov chain, for a large-scale SMCP are generally difficult or impossible to obtain, which leads to uncertainty in the semi-Markov kernel and thereby in the equivalent infinitesimal transition rates. Similar to the optimization of uncertain MCPs, a potential-based policy iteration method is proposed in this work to search for the optimal robust control policy for SMCPs with uncertain infinitesimal transition rates that are represented as compact sets. In addition, convergence of the algorithm is discussed.

14.
To understand the essence of Butterworth optimal control and to reveal the relationship between the Butterworth optimal transfer function and the weighting matrices Q and R, this paper studies the inverse problem of Butterworth optimal control. First, the state feedback gain matrix K is determined from the Butterworth optimal control; then a parameterized formula for computing the weighting matrix Q is given; finally, an example illustrates the effectiveness and simplicity of this method for determining the weighting matrices Q and R.

15.
This paper discusses the discrete-time stochastic linear quadratic (LQ) problem in the infinite horizon with state- and control-dependent noise, where the weighting matrices in the cost function are allowed to be indefinite. The problem gives rise to a generalized algebraic Riccati equation (GARE) that involves equality and inequality constraints. The well-posedness of the indefinite LQ problem is shown to be equivalent to the feasibility of a linear matrix inequality (LMI). Moreover, the existence of a stabilizing solution to the GARE is equivalent to the attainability of the LQ problem. All optimal controls are obtained in terms of the solution to the GARE. Finally, we give an LMI-based approach to solving the GARE via semidefinite programming.
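A hedged sketch of the semidefinite-programming flavor of this approach in the standard definite, noise-free discrete-time case, using cvxpy: under the usual stabilizability and definiteness assumptions, the stabilizing ARE solution can be recovered as the trace-maximizing P satisfying a Riccati-type LMI. The indefinite GARE with state- and control-dependent noise treated in the paper involves a different and more involved LMI, and the system data below are hypothetical.

```python
import numpy as np
import cvxpy as cp

# Hypothetical noise-free discrete-time LQ data with Q >= 0, R > 0.
A = np.array([[1.0, 0.2], [0.0, 0.95]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

P = cp.Variable((2, 2), symmetric=True)
# Riccati-type LMI: by a Schur complement (valid since R + B'PB > 0) it encodes
# A'PA - P + Q - A'PB (R + B'PB)^{-1} B'PA >= 0.
lmi = cp.bmat([[A.T @ P @ A - P + Q, A.T @ P @ B],
               [B.T @ P @ A,         R + B.T @ P @ B]])
prob = cp.Problem(cp.Maximize(cp.trace(P)), [lmi >> 0])
prob.solve()
P_star = P.value   # maximal LMI solution; coincides with the stabilizing ARE solution
```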

16.
In optimal control problems where the system model parameters are unknown, whether policy iteration can converge quickly to the optimal control policy depends critically on the estimation of the value function. To improve the accuracy and convergence speed of value function estimation, this paper proposes a policy iteration optimal control algorithm with an adaptively adjusted window length. Historical sample data collected over a period of time are fully exploited: an influence function is used to build a quantitative relationship between the window length and the value function estimation performance, and the window length is adapted according to how strongly it influences that performance. Finally, the proposed method is applied to a continuous fermentation process. The results show that it accelerates convergence to the optimal control policy, overcomes the effects of parameter variations and external disturbances on control performance, and thus improves control accuracy.

17.
In this paper we discuss an online algorithm based on policy iteration for learning the continuous-time (CT) optimal control solution with infinite-horizon cost for nonlinear systems with known dynamics. That is, the algorithm learns online, in real time, the solution of the Hamilton–Jacobi (HJ) equation of the optimal control design. This method finds in real time suitable approximations of both the optimal cost and the optimal control policy, while also guaranteeing closed-loop stability. We present an online adaptive algorithm implemented as an actor/critic structure which involves simultaneous continuous-time adaptation of both actor and critic neural networks. We call this ‘synchronous’ policy iteration. A persistence-of-excitation condition is shown to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for both critic and actor networks, with extra nonstandard terms in the actor tuning law being required to guarantee closed-loop dynamical stability. Convergence to the optimal controller is proven, and the stability of the system is also guaranteed. Simulation examples show the effectiveness of the new algorithm.

18.
Consider a discrete-time nonlinear system with random disturbances appearing in the real plant and in the output channel, where the randomly perturbed output is measurable. An iterative procedure based on the linear quadratic Gaussian optimal control model is developed for solving the optimal control of this stochastic system. The optimal state estimate provided by Kalman filtering theory and the optimal control law obtained from the linear quadratic regulator problem are then integrated into the dynamic integrated system optimisation and parameter estimation (DISOPE) algorithm. Upon convergence, the iterative solutions of the model-based optimal control problem converge to the solution of the original optimal control problem of the discrete-time nonlinear system, despite model-reality differences. An illustrative example is solved using the proposed method. The results obtained show the effectiveness of the proposed algorithm.
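A minimal sketch of the two LQG building blocks named above, assuming a known discrete-time linear-Gaussian model: the LQR gain from the discrete algebraic Riccati equation and one predict/update cycle of the Kalman filter. The DISOPE iteration that reconciles model-reality differences is not reproduced, and the model data are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical model x_{k+1} = A x_k + B u_k + w_k,  y_k = C x_k + v_k
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
C = np.array([[1.0, 0.0]])
Q, R = np.eye(2), np.array([[1.0]])          # LQR weights
W, V = 0.01 * np.eye(2), np.array([[0.1]])   # process / measurement noise covariances

# LQR gain from the discrete algebraic Riccati equation
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # control law u_k = -K x_hat_k

def kalman_step(x_hat, Sigma, u, y):
    """One predict/update cycle of the discrete-time Kalman filter."""
    x_pred = A @ x_hat + B @ u
    S_pred = A @ Sigma @ A.T + W
    S_inn = C @ S_pred @ C.T + V                    # innovation covariance
    L = S_pred @ C.T @ np.linalg.inv(S_inn)         # Kalman gain
    x_new = x_pred + L @ (y - C @ x_pred)
    S_new = (np.eye(2) - L @ C) @ S_pred
    return x_new, S_new

# toy usage
x_hat, Sigma = np.zeros(2), np.eye(2)
u = -K @ x_hat
x_hat, Sigma = kalman_step(x_hat, Sigma, u, np.array([0.3]))
```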

19.
A method is presented for solving the infinite-time Hamilton–Jacobi–Bellman (HJB) equation for certain state-constrained stochastic problems. The HJB equation is reformulated as an eigenvalue problem, such that the principal eigenvalue corresponds to the expected cost per unit time, and the corresponding eigenfunction gives the value function (up to an additive constant) for the optimal control policy. The eigenvalue problem is linear, and hence fast numerical methods are available for finding the solution.
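A hedged sketch of the eigenvalue idea in a discrete-state, linearly solvable (average-cost) setting of the Todorov type, which is one simple case where the reformulation is explicit: the desirability vector solves a linear eigenvalue problem, the principal eigenvalue encodes the expected cost per unit time, and the value function is its negative logarithm up to an additive constant. The continuous-state, state-constrained construction of the paper is not reproduced; the passive dynamics and costs below are hypothetical.

```python
import numpy as np

def principal_eig_power(G, iters=2000):
    """Power iteration for the principal eigenpair of a nonnegative matrix."""
    z = np.ones(G.shape[0])
    for _ in range(iters):
        z = G @ z
        z /= np.linalg.norm(z)
    lam = z @ G @ z / (z @ z)   # Rayleigh quotient at the converged eigenvector
    return lam, z

# Hypothetical 3-state passive dynamics P and state costs q.
P = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.3, 0.7]])
q = np.array([0.0, 1.0, 2.0])

G = np.diag(np.exp(-q)) @ P     # linear operator of the eigenvalue problem
lam, z = principal_eig_power(G)
avg_cost = -np.log(lam)         # expected cost per unit time
value = -np.log(z)              # value function, up to an additive constant
```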

20.
This paper discusses the relationship between the infinite-horizon free-endpoint stochastic optimal regulator problem and the solutions of the corresponding generalized algebraic Riccati equation. Specifically, it is shown that the infinite-horizon free-endpoint stochastic optimal regulator corresponds to the minimal nonnegative solution of the generalized algebraic Riccati equation, and that the kernel of this minimal solution equals the exactly unobservable subspace of the stochastic system. In addition, an error is pointed out in a proof from the existing literature concerning the existence of the maximal solution of the generalized algebraic Riccati equation, and the error is analyzed.
