Similar literature
Found 20 similar articles (search time: 31 ms)
1.
Using Pontryagin’s maximum principle, the problem of the quickest transfer of a multidimensional object onto the surface of an ellipsoid is reduced to solving a scalar algebraic equation. The concentration of the endpoints of optimal trajectories in the vicinity of the points forming the boundary in the case of a degenerate ellipsoid is demonstrated. An example in which the optimal control has a jump and the Bellman function has a discontinuity when the magnitude of the initial velocity vector undergoes a small change is constructed. It is also shown that the jump in the optimal control can occur without the discontinuity of the Bellman function.

2.
The optimal control of deterministic discrete time-invariant automaton-type systems is considered. Changes in the system’s state are governed by a recurrence equation. The switching times and their order are not specified in advance. They are found by optimizing a functional that takes into account the cost of each switching. This problem is a generalization of the classical optimal control problem for discrete time-invariant systems. It is proved that, in the time-invariant case, switchings of the optimal trajectory (possibly multiple instantaneous switchings) are possible only at the initial and/or terminal points in time. This fact is used in the derivation of equations for finding the value (Hamilton–Jacobi–Bellman) function and its generators. The necessary and sufficient optimality conditions are proved. It is shown that the generators of the value function in linear–quadratic problems are quadratic, and the value function itself is piecewise quadratic. Algorithms for the synthesis of the optimal closed-loop control are developed. The application of the optimality conditions is demonstrated by examples.

3.
In a series of papers, we proved theorems characterizing the value function in exit time optimal control as the unique viscosity solution of the corresponding Bellman equation that satisfies appropriate side conditions. The results applied to problems which satisfy a positivity condition on the integral of the Lagrangian. This positive integral condition assigned a positive cost for remaining outside the target on any interval of positive length. In this note, we prove a new theorem which characterizes the exit time value function as the unique bounded-from-below viscosity solution of the Bellman equation that vanishes on the target. The theorem applies to problems satisfying an asymptotic condition on the trajectories, including cases where the positive integral condition is not satisfied. Our results are based on an extended version of “Barbălat's lemma”. We apply the theorem to variants of the Fuller problem and other examples where the Lagrangian is degenerate.

4.
Extremality conditions in the brachistochrone problem for a perfectly rigid body sliding without friction down an unknown (to be determined) curve in a vertical plane are found. In this case, the body has to track the tangent to the trajectory. According to the principle of constraint release, the reaction of the support creates a torque used as control. For dimensionless Appell’s equations with a single similarity coefficient, the standard problem of the fastest descent from a given initial point to a given terminal point assuming that the initial velocity is zero is formulated. The Okhotsimskii-Pontryagin method is used to analyze the differential of the objective function. Necessary optimality conditions are found, and a formula for the optimal control that does not involve adjoint variables is derived from them. Properties of the optimal trajectories are investigated analytically both in the general case and for the limiting (zero and infinite) values of the similarity coefficient. It is found that cycloid-shaped brachistochrones occur as the similarity coefficient tends to infinity. For some values of the similarity coefficient, numerical results are presented that demonstrate the shape of the corresponding brachistochrones and the optimal time of motion. The results are compared with those obtained by solving the classical brachistochrone problem.
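For orientation, the classical brachistochrone that this abstract uses as a baseline can be sketched numerically. The snippet below is only an illustration of the textbook point-mass case (frictionless, zero initial velocity), not of the rigid-body model in the paper. The function names, the bisection bounds, and the value g = 9.81 are assumptions introduced here. It uses the standard cycloid x = R(θ − sin θ), y = R(1 − cos θ) with y measured downward, for which the descent time is T = θ_f·sqrt(R/g).

```python
import math

G = 9.81  # assumed gravitational acceleration, m/s^2


def cycloid_parameter(x_f, y_f):
    """Solve (theta - sin theta) / (1 - cos theta) = x_f / y_f on (0, 2*pi).

    The left-hand side is strictly increasing on (0, 2*pi), so bisection works.
    Assumes x_f > 0, y_f > 0 and that x_f / y_f is not extremely small.
    """
    target = x_f / y_f
    lo, hi = 1e-3, 2.0 * math.pi - 1e-9  # hypothetical safe bracket
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        ratio = (mid - math.sin(mid)) / (1.0 - math.cos(mid))
        if ratio < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cycloid_descent_time(x_f, y_f, g=G):
    """Descent time along the cycloid from (0, 0) to (x_f, y_f), y downward."""
    theta = cycloid_parameter(x_f, y_f)
    radius = y_f / (1.0 - math.cos(theta))
    return theta * math.sqrt(radius / g)


def straight_line_descent_time(x_f, y_f, g=G):
    """Descent time along a straight frictionless incline, for comparison."""
    length_sq = x_f * x_f + y_f * y_f
    return math.sqrt(2.0 * length_sq / (g * y_f))


if __name__ == "__main__":
    print(cycloid_descent_time(math.pi, 2.0))
    print(straight_line_descent_time(math.pi, 2.0))
```

For the endpoint (π, 2) the cycloid parameter is θ_f = π and R = 1, so the cycloid time reduces to π/sqrt(g), which is shorter than the straight-incline time for the same endpoints.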

5.
An approach to solve finite time horizon suboptimal feedback control problems for partial differential equations is proposed by solving dynamic programming equations on adaptive sparse grids. A semi-discrete optimal control problem is introduced and the feedback control is derived from the corresponding value function. The value function can be characterized as the solution of an evolutionary Hamilton–Jacobi–Bellman (HJB) equation which is defined over a state space whose dimension is equal to the dimension of the underlying semi-discrete system. Besides a low dimensional semi-discretization it is important to solve the HJB equation efficiently to address the curse of dimensionality. We propose to apply a semi-Lagrangian scheme using spatially adaptive sparse grids. Sparse grids allow the discretization of the value functions in (higher) space dimensions since the curse of dimensionality of full grid methods arises to a much smaller extent. For additional efficiency an adaptive grid refinement procedure is explored. The approach is illustrated for the wave equation and an extension to equations of Schrödinger type is indicated. We present several numerical examples studying the effect the parameters characterizing the sparse grid have on the accuracy of the value function and the optimal trajectory.

6.
This paper is concerned with cost optimization of an insurance company. The surplus of the insurance company is modeled by a controlled regime-switching diffusion, in which the regime-switching mechanism provides the fluctuations of the random environment. The goal is to find an optimal control that minimizes the total cost up to a stochastic exit time. A weaker sufficient condition than that of Fleming and Soner (2006, Section V.2) for the continuity of the value function is obtained. Further, the value function is shown to be a viscosity solution of a Hamilton–Jacobi–Bellman equation.

7.
A discrete system that models the operation of a dynamic finite state machine (automaton) with memory is considered. In contrast to the classical model of a discrete-time system, in which the states are changed (switched) at prescribed instants of time, automaton-type systems may change their states at arbitrary instants. Moreover, multiple instantaneous switchings are allowed. Furthermore, the choice of the instants when the automaton “fires” and the number of switchings at these instants are considered as control resources, and they are subject to optimization. Sufficient optimality conditions for such systems are proved. Equations for the optimal open-loop control and for the value (Bellman) function are derived. A method for the synthesis of the optimal control is proposed based on the construction of the value function as the lower envelope of a family of auxiliary functions (generators). Application of the proposed method is illustrated by examples.

8.
We introduce the optimal control problem associated with ultradiffusion processes as a stochastic differential equation constrained optimization of the expected system performance over the set of feasible trajectories. The associated Bellman function is characterized as the solution to a Hamilton–Jacobi equation evaluated along an optimal process. For an important class of ultradiffusion processes, we define the value function in terms of the time and the natural state variables. Approximation solvability is shown and an application to mathematical finance demonstrates the applicability of the paradigm. In particular, we utilize a method-of-lines finite element method to approximate the value function of a European style call option in a market subject to asset liquidity risk (including limit orders) and brokerage fees.

9.
In this paper, we introduce new methods for finding functions that lower bound the value function of a stochastic control problem, using an iterated form of the Bellman inequality. Our method is based on solving linear or semidefinite programs, and produces both a bound on the optimal objective and a suboptimal policy that appears to work very well. These results extend and improve bounds obtained in a previous paper using a single Bellman inequality condition. We describe the methods in a general setting and show how they can be applied in specific cases including the finite state case, constrained linear quadratic control, switched affine control, and multi‐period portfolio investment. Copyright © 2014 John Wiley & Sons, Ltd.

10.
A two-parameter family of optimal curves in the brachistochrone problem in the case of Coulomb friction is found. The problem is represented in the form of the standard time minimization control problem. The normal component of the support reaction is used as control. It turned out that the formula for the optimal control, which does not include adjoint variables, has a singularity at the zero motion velocity. A system of ordinary differential equations is derived for which the solution of the Cauchy initial value problem makes it possible to obtain optimal trajectories that have a vertical tangent at the initial point. The self-similarity property of such trajectories is proved. It is shown how this property can be used to obtain by scaling all optimal trajectories from the set of optimal trajectories with fixed initial conditions and different terminal slope angles of the tangent.

11.
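Research on gradient algorithms for neural-network reinforcement learning   Total citations: 11, self-citations: 1, citations by others: 11
徐昕 (Xu Xin), 贺汉根 (He Hangen). 《计算机学报》 (Chinese Journal of Computers), 2003, 26(2): 227-233
For Markov decision problems with continuous state spaces and discrete action spaces, a new gradient-descent reinforcement learning algorithm is proposed that uses a multilayer feedforward neural network for value function approximation. The algorithm adopts a Boltzmann-distribution action selection policy that is approximately greedy and continuously differentiable, and it approximates the optimal value function of the Markov decision process by minimizing a sum-of-squared-Bellman-residuals performance index under a non-stationary behavior policy. The convergence of the algorithm and the performance of the resulting near-optimal policy are analyzed theoretically. Simulation studies on the Mountain-Car learning control problem further verify the learning efficiency and generalization performance of the algorithm.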
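The two ingredients named in this abstract, Boltzmann action selection and the Bellman residual, can be illustrated with a minimal sketch. This is not the authors' algorithm: it omits the neural network and the gradient step, and the function names, the temperature parameter, and the use of the policy-weighted next-state value are assumptions made here for illustration.

```python
import math


def boltzmann_probs(q_values, temperature):
    """Boltzmann (softmax) action probabilities for a list of action values.

    Lower temperature makes the policy closer to greedy; the probabilities
    remain smooth functions of the action values.
    """
    scaled = [q / temperature for q in q_values]
    shift = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def bellman_residual(q_sa, reward, gamma, q_next, temperature):
    """One-sample Bellman residual under the Boltzmann policy (illustrative).

    residual = r + gamma * sum_a' pi(a' | s') Q(s', a') - Q(s, a)
    """
    probs = boltzmann_probs(q_next, temperature)
    expected_next = sum(p * q for p, q in zip(probs, q_next))
    return reward + gamma * expected_next - q_sa


def squared_residual_loss(samples, gamma, temperature):
    """Sum of squared Bellman residuals over (q_sa, reward, q_next) samples."""
    return sum(
        bellman_residual(q_sa, r, gamma, q_next, temperature) ** 2
        for q_sa, r, q_next in samples
    )
```

In a full method of this kind, the action values would come from a function approximator and the loss would be minimized by gradient descent with respect to its weights; here the values are plain numbers so the definitions can be checked by hand.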

12.
The problem of damping a spacecraft's rotation (reducing the initial angular velocity to zero) in minimal time is studied. Two formulations of the optimization problem are considered, which differ in the form of the constraints on the control torque. An analytical solution to the formulated problem is obtained in closed form, and numerical expressions for synthesizing the optimal angular velocity control program are given. A similar problem of time-optimal angular acceleration of the spacecraft to a given value is also solved. A procedure for determining the control torque at the initial time instant for the problem of accelerating the spacecraft to the required angular velocity is presented. A numerical example of solving the problems of spin-up and damping of the spacecraft rotation velocity in minimal time is given.

13.
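A reinforcement learning algorithm based on optimal average cost per stage   Total citations: 4, self-citations: 0, citations by others: 4
This paper presents a new reinforcement learning algorithm derived from a method for solving the optimal cost function, namely a reinforcement learning algorithm based on the optimal average cost per stage. The algorithm is an effective reinforcement learning method for Markov decision problems with incomplete information. Starting from the method for solving the optimal average cost per stage, the paper analyzes the existence of the optimal solution, the relationship between the optimal average cost per stage and the initial state, and the associated Bellman equation. With this approach established, many results from dynamic programming (DP) algorithms can be applied directly to reinforcement learning research.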
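The Bellman equation for the average cost per stage can be made concrete with a small sketch. The code below uses relative value iteration, a standard dynamic programming method for the model-known case, and is not the learning algorithm of the paper, which addresses incomplete information. The function name, the reference-state convention, and the two-state example are assumptions introduced here. The equation being solved is λ + h(i) = min_u [ g(i,u) + Σ_j p_ij(u) h(j) ], where λ is the optimal average cost and h the differential cost.

```python
def relative_value_iteration(cost, prob, ref_state=0, iters=1000, tol=1e-10):
    """Relative value iteration for an average-cost MDP (illustrative sketch).

    cost[i][u]    : stage cost g(i, u)
    prob[i][u][j] : transition probability p_ij(u)
    Returns (gain, h), where gain estimates the optimal average cost per
    stage and h is the differential cost, normalized so h[ref_state] == 0.
    Convergence is assumed here (it holds under suitable unichain and
    aperiodicity conditions).
    """
    n = len(cost)
    h = [0.0] * n
    gain = 0.0
    for _ in range(iters):
        # Apply the dynamic programming operator T to h.
        th = [
            min(
                cost[i][u] + sum(p * hj for p, hj in zip(prob[i][u], h))
                for u in range(len(cost[i]))
            )
            for i in range(n)
        ]
        gain = th[ref_state]
        new_h = [v - gain for v in th]
        converged = max(abs(a - b) for a, b in zip(new_h, h)) < tol
        h = new_h
        if converged:
            break
    return gain, h


# Hypothetical two-state example: in each state, action 0 stays and
# action 1 moves to the other state.
COST = [[1.0, 3.0], [2.0, 0.5]]
PROB = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]

if __name__ == "__main__":
    print(relative_value_iteration(COST, PROB))
```

In this example the best long-run behavior is to reach state 0 and stay there at cost 1 per stage, so the optimal average cost does not depend on the initial state, while the differential cost h records the transient advantage of starting in state 1.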

14.
The principle of optimality, or dynamic programming, leads to the derivation of a partial differential equation (PDE) for solving optimal control problems, namely the Hamilton‐Jacobi‐Bellman (HJB) equation. In general, this equation cannot be solved analytically; thus many computing strategies have been developed for optimal control problems. Many problems in financial mathematics involve the solution of stochastic optimal control (SOC) problems. In this work, the variational iteration method (VIM) is applied for solving SOC problems. In fact, solutions for the value function and the corresponding optimal strategies are obtained numerically. We solve a stochastic linear regulator problem to investigate the applicability and simplicity of the presented method and prove its convergence. In particular, for Merton's portfolio selection model as a problem of portfolio optimization, the proposed numerical method is applied for the first time and its usefulness is demonstrated. For the nonlinear case, we investigate its convergence using Banach's fixed point theorem. The numerical results confirm the simplicity and efficiency of our method.

15.
A new generalized model of propagation of a medicine in tissue is considered. Instead of the traditional diffusion model described by a parabolic equation, the model described by a more general hyperbolic equation is postulated, which predicts finite velocity of disturbance propagation. As a result, the medicine is delivered to the invaded tissue with a finite velocity. The first disturbance (precursor) carries information from the injection to any point in the tissue and, what is important, to the affected zone, wherefrom information arrives at the brain in the form of neurological disorder. The generalized solution of the problem is considered. A brief generalization of the Bellman problem is given concerning the injection of medicine with respect to the time of injection.

16.
This paper proposes an algorithm to compute optimal trajectories for a maneuvering satellite by using a nonlinear programming representation of an optimal control problem. In this problem, a satellite must reach a given final position with a given velocity, starting from a given initial position and velocity, without passing through a prohibited region such as the atmosphere, while achieving minimum fuel consumption. Optimal control theory is applied to obtain a set of ordinary differential equations subject to two-point boundary conditions (TPB) on the adjoint system. Then an exact penalty function method is employed to obtain the optimal trajectories by solving the TPB problem, with the initial conditions for the adjoint system and the unknown final time regarded as decision variables. This formulation, where the optimal control technique and the nonlinear programming method are incorporated, permits more systematic and flexible algorithm implementation.

17.
The brachistochrone problem in the case of dry (Coulomb) and viscous friction with the coefficient that arbitrarily depends on speed is solved. According to the principle of constraint release, the normal component of the supporting curve is used as control. The standard problem of the fastest descent from a given initial point to a given terminal point assuming that the initial velocity is zero is formulated. The Okhotsimskii-Pontryagin method is used to analyze the differential of the objective function. Necessary optimality conditions are found, and a formula for the optimal control that does not involve adjoint variables is derived from them. Differential equations that allow one to obtain extremals by solving a Cauchy problem are set up. Properties of these equations are investigated. A class of simple brachistochrones is distinguished, for which singular points constituting the terminal curve and the reachability domain in the vertical plane are found. Conditions for the existence of zero controls are obtained. For some friction laws, numerical results demonstrating the shape of the determined brachistochrones and optimal time are presented.

18.
The automation of many of the functions currently performed by the controllers and pilots is essential for a safe and efficient air traffic control (ATC) system. A major aspect of the overall ATC system is the guidance and flight control of aircraft. This paper deals with the horizontal guidance of aircraft in and near the terminal area. The problem of guiding an aircraft in minimum time from an arbitrary point to the outer marker is formulated as a nonlinear optimal control problem, and the control law solution is obtained by the application of the maximum principle. It is found that for some initial states the problem is singular. Furthermore, the extremal controls for this problem are not unique. Consequently, the optimal controls must be obtained on the basis of the value of the performance index. The control law is implemented in the form of a digital computer program which computes the optimal trajectory for arbitrary initial conditions.

19.
This article proposes three novel time-varying policy iteration algorithms for the finite-horizon optimal control problem of continuous-time affine nonlinear systems. We first propose a model-based time-varying policy iteration algorithm. The method considers time-varying solutions to the Hamilton–Jacobi–Bellman equation for finite-horizon optimal control. Based on this algorithm, value function approximation is applied to the Bellman equation by establishing neural networks with time-varying weights. A novel update law for time-varying weights is put forward based on the idea of iterative learning control, which obtains optimal solutions more efficiently compared to previous works. Considering that system models may be unknown in real applications, we propose a partially model-free time-varying policy iteration algorithm that applies integral reinforcement learning to acquire the time-varying value function. Moreover, analysis of convergence, stability, and optimality is provided for every algorithm. Finally, simulations for different cases are given to verify the convenience and effectiveness of the proposed algorithms.

20.
In this paper, a data-based scheme is proposed to solve the optimal tracking problem of autonomous nonlinear switching systems. The system state is forced to track the reference signal by minimizing the performance function. First, the problem is transformed into solving the corresponding Bellman optimality equation in terms of the Q-function (also called the action-value function). Then, an iterative algorithm based on adaptive dynamic programming (ADP) is developed to find the optimal solution, which is based entirely on sampled data. A linear-in-parameter (LIP) neural network is taken as the value function approximator. Considering the presence of approximation error at each iteration step, the generated sequence of approximate value functions is proved to be bounded around the exact optimal solution under some verifiable assumptions. Moreover, the effect of terminating the learning process after a finite number of iterations is investigated. A sufficient condition for asymptotic stability of the tracking error is derived. Finally, the effectiveness of the algorithm is demonstrated with three simulation examples.
