Similar Literature
 20 similar articles found
1.
In this article, optimal control problems of differential equations with delays are investigated, for which the associated Hamilton–Jacobi–Bellman (HJB) equations are nonlinear partial differential equations with delays. This type of HJB equation has not been studied previously and is difficult to solve because the state equations do not possess smoothing properties. We introduce a new notion of viscosity solutions and identify the value functional of the optimal control problems as the unique solution to the associated HJB equations. An analytical example is given as an application.

2.
In this paper we consider nonautonomous optimal control problems of infinite-horizon type, whose control actions are given by L1-functions. We verify that the value function is locally Lipschitz. The equivalence between dynamic programming inequalities and Hamilton–Jacobi–Bellman (HJB) inequalities for proximal sub- and supergradients is proven. Using this result we show that the value function is a Dini solution of the HJB equation. We obtain a verification result for the class of Dini sub-solutions of the HJB equation and also prove a minimax property of the value function with respect to the sets of Dini semi-solutions of the HJB equation. We introduce the concept of viscosity solutions of the HJB equation in the infinite-horizon setting and prove the equivalence between this and the concept of Dini solutions. In the Appendix we provide an existence theorem.

3.
This paper considers mobile-to-base-station power control for lognormal fading channels in wireless communication systems within a centralized-information stochastic optimal control framework. Under a bounded power rate-of-change constraint, the stochastic control problem and its associated Hamilton-Jacobi-Bellman (HJB) equation are analyzed by the viscosity solution method; the degenerate HJB equation is then perturbed to admit a classical solution, and a suboptimal control law is designed based on the perturbed HJB equation. When a quadratic-type cost is used without a bound constraint on the control, the value function is a classical solution to the degenerate HJB equation and the feedback control is affine in the system power. In addition, in this case we develop approximate but highly scalable solutions to the HJB equation in terms of a local polynomial expansion of the exact solution. When the channel parameters are not known a priori, one can obtain on-line estimates of the parameters and thereby adaptive versions of the control laws. In numerical experiments with both of the above cost functions, the following phenomenon is observed: whenever the users have different initial conditions, the power levels first converge to a common level and then evolve approximately together toward a stochastically varying optimum.

4.
In this paper we study the optimal stochastic control problem for stochastic differential equations on Riemannian manifolds. The cost functional is specified by controlled backward stochastic differential equations in Euclidean space. Under some suitable assumptions, we conclude that the value function is the unique viscosity solution to the associated Hamilton–Jacobi–Bellman equation, which is a fully nonlinear parabolic partial differential equation on Riemannian manifolds.

5.
Munos, Rémi. Machine Learning, 2000, 40(3): 265–299
This paper proposes a study of Reinforcement Learning (RL) for continuous state-space and time control problems, based on the theoretical framework of viscosity solutions (VSs). We use the method of dynamic programming (DP), which introduces the value function (VF), the expectation of the best future cumulative reinforcement. In the continuous case, the value function satisfies a nonlinear first-order (or second-order, for stochastic processes) differential equation called the Hamilton-Jacobi-Bellman (HJB) equation. It is well known that this equation admits infinitely many generalized solutions (differentiable almost everywhere) other than the VF. We show that gradient-descent methods may converge to one of these generalized solutions, thus failing to find the optimal control. In order to solve the HJB equation, we use the powerful framework of viscosity solutions and state that there exists a unique viscosity solution to the HJB equation, which is the value function. Then, we use another main result of VSs (their stability when passing to the limit) to prove the convergence of numerical approximation schemes based on finite difference (FD) and finite element (FE) methods. These methods discretize, at some resolution, the HJB equation into the DP equation of a Markov Decision Process (MDP), which can be solved by DP methods (thanks to a strong contraction property) if all the initial data (the state dynamics and the reinforcement function) were perfectly known. However, in the RL approach we consider a system interacting with an environment that is a priori (at least partially) unknown, so the initial data are not perfectly known and must be approximated during learning. The main contribution of this work is to derive a general convergence theorem for RL algorithms when one uses only approximations (in the sense of satisfying some weak contraction property) of the initial data.
This result can be used for model-based or model-free RL algorithms, with off-line or on-line updating methods, for deterministic or stochastic state dynamics (though the latter case is not described here), and based on FE or FD discretization methods. It is illustrated with several RL algorithms and a numerical simulation for the Car-on-the-Hill problem.
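The paper's discretization step (FD scheme turning the HJB equation into the DP equation of an MDP, solved by contraction) can be sketched on a toy problem. This is an illustrative example, not the paper's method: a 1-D discounted problem with dynamics dx/dt = u, u ∈ {-1, +1}, and running reward -|x| (all of these choices are assumptions), discretized at resolution dx and solved by value iteration.

```python
import numpy as np

# Illustrative FD-style discretization of a continuous-time discounted
# control problem into an MDP, solved by value iteration. The contraction
# factor is the per-step discount gamma < 1, so convergence is guaranteed.
def solve_hjb_by_value_iteration(n=201, x_max=1.0, gamma_rate=1.0, tol=1e-9):
    xs = np.linspace(-x_max, x_max, n)
    dt = xs[1] - xs[0]                  # time step matched to the grid: |u|*dt = dx
    gamma = np.exp(-gamma_rate * dt)    # discount accumulated over one step
    reward = -np.abs(xs) * dt           # running reward integrated over dt
    V = np.zeros(n)
    while True:
        left = np.concatenate(([V[0]], V[:-1]))    # successor value under u = -1
        right = np.concatenate((V[1:], [V[-1]]))   # successor value under u = +1
        V_new = reward + gamma * np.maximum(left, right)
        if np.max(np.abs(V_new - V)) < tol:
            return xs, V_new
        V = V_new

xs, V = solve_hjb_by_value_iteration()
```

By symmetry of the dynamics and reward, the computed value function peaks at the origin and is symmetric about it.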

6.
The Hamilton–Jacobi–Bellman (HJB) equation can be solved to obtain optimal closed-loop control policies for general nonlinear systems. As it is seldom possible to solve the HJB equation exactly for nonlinear systems, either analytically or numerically, methods that build approximate solutions through simulation-based learning have been studied under various names, such as neurodynamic programming (NDP) and approximate dynamic programming (ADP). The learning aspect connects these methods to reinforcement learning (RL), which also tries to learn optimal decision policies through trial-and-error learning. This study develops a model-based RL method that iteratively learns the solution to the HJB and its associated equations. We focus particularly on control-affine systems with a quadratic objective function and the finite horizon optimal control (FHOC) problem with time-varying reference trajectories. The HJB solutions for such systems involve time-varying value, costate, and policy functions subject to boundary conditions. To represent the time-varying HJB solution in high-dimensional state space in a general and efficient way, deep neural networks (DNNs) are employed. It is shown that the use of DNNs, compared to shallow neural networks (SNNs), can significantly improve the performance of a learned policy in the presence of uncertain initial states and state noise. Examples involving a batch chemical reactor and a one-dimensional diffusion-convection-reaction system are used to demonstrate this and other key aspects of the method.

7.
This paper is concerned with developing accurate and efficient numerical methods for fully nonlinear second order elliptic and parabolic partial differential equations (PDEs) in multiple spatial dimensions. It presents a general framework for constructing high order local discontinuous Galerkin (LDG) methods for approximating viscosity solutions of these fully nonlinear PDEs. The proposed LDG methods are natural extensions of a narrow-stencil finite difference framework recently proposed by the authors for approximating viscosity solutions. The idea of the methodology is to use multiple approximations of first and second order derivatives as a way to resolve the potential low regularity of the underlying viscosity solution. Consistency and generalized monotonicity properties are proposed that ensure the numerical operator approximates the differential operator. The resulting algebraic system has several linear equations coupled with only one nonlinear equation that is monotone in many of its arguments. This structure can be exploited to design nonlinear solvers. This paper also presents and analyzes numerical results for several test problems in two dimensions which are used to gauge the accuracy and efficiency of the proposed LDG methods.

8.
This paper considers an infinite horizon investment-consumption model in which a single agent consumes and distributes his wealth between two assets, a bond and a stock. The problem of maximizing the total utility from consumption is treated, when state (amount allocated in assets) and control (consumption, rates of trading) constraints are present. The value function is characterized as the unique viscosity solution of the Hamilton-Jacobi-Bellman equation, which is in fact a variational inequality with gradient constraints. Numerical schemes are then constructed in order to compute the value function and the location of the free boundaries of the so-called transaction regions. These schemes are a combination of implicit and explicit schemes; their convergence is obtained from the uniqueness of viscosity solutions to the HJB equation.

9.
The Hamilton-Jacobi-Bellman (HJB) equation corresponding to constrained control is formulated using a suitable nonquadratic functional. It is shown that the constrained optimal control law has the largest region of asymptotic stability (RAS). The value function of this HJB equation is obtained by solving a sequence of cost functions satisfying a sequence of Lyapunov equations (LEs). A neural network is used to approximate the cost function associated with each LE using the method of least squares on a well-defined region of attraction of an initial stabilizing controller. As the order of the neural network is increased, the least-squares solution of the HJB equation converges uniformly to the exact solution of the inherently nonlinear HJB equation associated with the saturating control inputs. The result is a nearly optimal constrained state-feedback controller that has been tuned a priori off-line.
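The successive-approximation idea above (a sequence of Lyapunov equations whose cost functions converge to the HJB solution) reduces, in the linear unconstrained special case, to Kleinman-style policy iteration for the LQR problem. The sketch below illustrates only that special case; the system matrices, weights, and initial gain are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def lyapunov(A, M):
    """Solve A^T P + P A = -M for P via Kronecker vectorization (row-major vec)."""
    n = A.shape[0]
    L = np.kron(A.T, np.eye(n)) + np.kron(np.eye(n), A.T)
    return np.linalg.solve(L, -M.flatten()).reshape(n, n)

def policy_iteration_lqr(A, B, Q, R, K0, iters=30):
    K = K0
    for _ in range(iters):
        Ac = A - B @ K                        # closed loop under the current policy
        P = lyapunov(Ac, Q + K.T @ R @ K)     # policy evaluation: one Lyapunov equation
        K = np.linalg.solve(R, B.T @ P)       # policy improvement
    return P, K

A = np.array([[0., 1.], [0., 0.]])            # double integrator (illustrative)
B = np.array([[0.], [1.]])
Q, R = np.eye(2), np.eye(1)
K0 = np.array([[1., 1.]])                     # an initial stabilizing gain
P, K = policy_iteration_lqr(A, B, Q, R, K0)
```

For this system the iterates converge to the stabilizing Riccati solution P = [[√3, 1], [1, √3]], which plays the role of the exact HJB solution here.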

10.
The principle of optimality, or dynamic programming, leads to a partial differential equation (PDE) for solving optimal control problems, namely the Hamilton-Jacobi-Bellman (HJB) equation. In general, this equation cannot be solved analytically; thus many computational strategies have been developed for optimal control problems. Many problems in financial mathematics involve the solution of stochastic optimal control (SOC) problems. In this work, the variational iteration method (VIM) is applied to solve SOC problems. In fact, solutions for the value function and the corresponding optimal strategies are obtained numerically. We solve a stochastic linear regulator problem to investigate the applicability and simplicity of the presented method and prove its convergence. In particular, for Merton's portfolio selection model as a problem of portfolio optimization, the proposed numerical method is applied for the first time and its usefulness is demonstrated. For the nonlinear case, we investigate convergence using Banach's fixed-point theorem. The numerical results confirm the simplicity and efficiency of our method.
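As background for the method named above: VIM improves an initial guess through a correction functional with a Lagrange multiplier. A minimal illustration (a toy linear ODE, not the stochastic control setting of the paper) is VIM with multiplier λ = -1 applied to u'(t) = u(t), u(0) = 1, where each iterate adds one Taylor term of the exact solution exp(t). Polynomials are represented as coefficient lists so the correction integral is elementary.

```python
def vim_iterate(coeffs, steps):
    """VIM with multiplier lambda = -1 for u' - u = 0; coeffs[k] multiplies t**k."""
    for _ in range(steps):
        n = len(coeffs)
        deriv = [k * coeffs[k] for k in range(1, n)] + [0.0]      # u'
        resid = [d - c for d, c in zip(deriv, coeffs)]            # u' - u
        integ = [0.0] + [r / (k + 1) for k, r in enumerate(resid)]  # integral from 0 to t
        coeffs = [c - i for c, i in zip(coeffs + [0.0], integ)]   # u_{n+1} = u_n - integral
    return coeffs

taylor = vim_iterate([1.0], 4)   # coefficients of 1 + t + t**2/2 + t**3/6 + t**4/24
```

Each pass kills the lowest-order residual term, which is why the iteration converges for this linear problem; the nonlinear SOC case in the paper relies on the Banach fixed-point argument instead.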

11.
An approach to solving finite time horizon suboptimal feedback control problems for partial differential equations is proposed, based on solving dynamic programming equations on adaptive sparse grids. A semi-discrete optimal control problem is introduced, and the feedback control is derived from the corresponding value function. The value function can be characterized as the solution of an evolutionary Hamilton–Jacobi–Bellman (HJB) equation defined over a state space whose dimension equals that of the underlying semi-discrete system. Besides a low-dimensional semi-discretization, it is important to solve the HJB equation efficiently to address the curse of dimensionality. We propose to apply a semi-Lagrangian scheme using spatially adaptive sparse grids. Sparse grids allow the discretization of the value functions in (higher) space dimensions, since they suffer from the curse of dimensionality to a much smaller extent than full-grid methods. For additional efficiency, an adaptive grid refinement procedure is explored. The approach is illustrated for the wave equation, and an extension to equations of Schrödinger type is indicated. We present several numerical examples studying the effect that the parameters characterizing the sparse grid have on the accuracy of the value function and the optimal trajectory.

12.
A new approach for solving finite-time horizon feedback control problems for distributed parameter systems is proposed. It is based on model reduction by proper orthogonal decomposition combined with efficient numerical methods for solving the resulting low-order evolutionary Hamilton-Jacobi-Bellman (HJB) equation. The feasibility of the proposed methodology is demonstrated by means of optimal feedback control for the Burgers equation. The method for solving the HJB equation is first tested on several 1-D problems and then successfully applied to the control of the reduced order Burgers equation. The effect of noise is investigated, and parallelism is used for computational speedup.
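The model-reduction step above (proper orthogonal decomposition, POD) extracts a low-order basis from solution snapshots via an SVD; the low-order HJB equation is then posed in the coordinates of that basis. A minimal sketch, using illustrative snapshot data (a travelling Gaussian profile, not the Burgers setup of the paper):

```python
import numpy as np

def pod_basis(snapshots, r):
    """Columns of `snapshots` are state snapshots; return the r leading POD modes
    and the fraction of snapshot energy they capture."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    return U[:, :r], energy[r - 1]

x = np.linspace(0.0, 1.0, 200)
times = np.linspace(0.0, 1.0, 50)
# Illustrative snapshot matrix: a Gaussian profile translating over time.
X = np.stack([np.exp(-50.0 * (x - 0.3 - 0.2 * t)**2) for t in times], axis=1)

Phi, captured = pod_basis(X, r=10)
X_reduced = Phi @ (Phi.T @ X)          # project onto the basis and lift back
err = np.linalg.norm(X - X_reduced) / np.linalg.norm(X)
```

A handful of modes typically capture nearly all the snapshot energy, which is what makes the reduced HJB equation (state dimension r instead of 200) tractable.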

13.
On reachability and minimum cost optimal control
Questions of reachability for continuous and hybrid systems can be formulated as optimal control or game theory problems, whose solution can be characterized using variants of the Hamilton-Jacobi-Bellman or Isaacs partial differential equations. The formal link between the solution to the partial differential equation and the reachability problem is usually established in the framework of viscosity solutions. This paper establishes such a link between reachability, viability and invariance problems and viscosity solutions of a special form of the Hamilton-Jacobi equation. This equation is developed to address optimal control problems where the cost function is the minimum of a function of the state over a specified horizon. The main advantage of the proposed approach is that the properties of the value function (uniform continuity) and the form of the partial differential equation (standard Hamilton-Jacobi form, continuity of the Hamiltonian and simple boundary conditions) make the numerical solution of the problem much simpler than other approaches proposed in the literature. This fact is demonstrated by applying our approach to a reachability problem that arises in flight control and using numerical tools to compute the solution.
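In a discrete-time analogue of the minimum-over-horizon formulation above, the value function satisfies V_k(x) = min( l(x), min over u of V_{k+1}(x + f(x,u)·dt) ), and the backward reachable set of the target {l ≤ 0} is the sublevel set {V ≤ 0}. The 1-D sketch below is illustrative only (dynamics dx/dt = u with |u| ≤ 1, target |x| ≤ 0.2, horizon T = 1; all choices assumed), where the reachable set should be approximately |x| ≤ 1.2.

```python
import numpy as np

n, x_max = 401, 2.0
xs = np.linspace(-x_max, x_max, n)
dx = xs[1] - xs[0]
dt = dx                              # with |u| <= 1, one step moves at most one cell
l = np.abs(xs) - 0.2                 # target set: l(x) <= 0, i.e. |x| <= 0.2
V = l.copy()
for _ in range(int(round(1.0 / dt))):            # backward recursion over horizon T = 1
    left = np.concatenate(([V[0]], V[:-1]))      # successor value under u = -1
    right = np.concatenate((V[1:], [V[-1]]))     # successor value under u = +1
    V = np.minimum(l, np.minimum(V, np.minimum(left, right)))

reachable = xs[V <= 1e-9]            # approximates {x : |x| <= 1.2}
```

The min over the running cost l at each step is what distinguishes this equation from the standard sum-of-costs Bellman recursion.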

14.
For the optimal production control problem of unreliable stochastic manufacturing systems with diffusion terms, a numerical method is adopted to solve the mode-coupled nonlinear partial differential HJB equation satisfied by the optimal control. First, a Markov chain is constructed to approximate the evolution of the production system state; based on the principle of local consistency, solving the continuous-time stochastic control problem is converted into solving a discrete-time Markov decision process. Value-iteration and policy-iteration algorithms are then employed to carry out the numerical solution of the optimal control. Simulation results at the end of the paper verify the correctness and effectiveness of the method.

15.
In this paper, we present an empirical study of iterative least-squares minimization of the Hamilton-Jacobi-Bellman (HJB) residual with a neural network (NN) approximation of the value function. Although the nonlinearities in the optimal control problem and the NN approximator preclude theoretical guarantees and raise concerns about numerical instabilities, we present two simple methods for promoting convergence, whose effectiveness is demonstrated in a series of experiments. The first method involves a gradual increase of the horizon time scale, with a corresponding gradual increase in value function complexity. The second method involves the assumption of stochastic dynamics, which introduces a regularizing second-derivative term into the HJB equation. A gradual reduction of this term provides further stabilization of the convergence. We demonstrate the solution of several problems, including the 4-D inverted-pendulum system with bounded control. Our approach requires no initial stabilizing policy or any restrictive assumptions on the plant or cost function, only knowledge of the plant dynamics. In the Appendix, we provide the equations for first- and second-order differential backpropagation.
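The core idea above, fitting a value-function approximator by least-squares minimization of the HJB residual, can be sketched in its simplest instance. Here the "network" is a single quadratic term V(x) = c·x² for the scalar problem dx/dt = a·x + b·u with cost integrand q·x² + r·u² (all numbers are illustrative, not the paper's 4-D pendulum), so the exact answer is known from the Riccati equation and convergence can be checked directly.

```python
import numpy as np

a, b, q, r = -1.0, 1.0, 1.0, 1.0
xs = np.linspace(-1.0, 1.0, 101)     # collocation points for the residual

def hjb_residual(c, x):
    """HJB residual with the minimizing control substituted in: u* = -b V'/(2r)."""
    dV = 2.0 * c * x                 # V'(x) for V = c*x**2
    return q * x**2 + a * x * dV - b**2 * dV**2 / (4.0 * r)

c, lr = 0.0, 1e-3
for _ in range(5000):
    R = hjb_residual(c, xs)
    dR_dc = 2.0 * a * xs**2 - (2.0 * b**2 * c / r) * xs**2   # d(residual)/dc
    c -= lr * np.sum(2.0 * R * dR_dc)    # gradient step on the sum-of-squares loss

# Stabilizing root of the scalar Riccati equation 2*a*p - b**2*p**2/r + q = 0.
p_exact = r * (a + np.sqrt(a**2 + q * b**2 / r)) / b**2
```

With a richer approximator the loss landscape is no longer this benign, which is exactly why the paper's two stabilization methods (horizon scheduling and a decaying second-derivative regularizer) are needed.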

16.
In this paper, a new formulation for the optimal tracking control problem (OTCP) of continuous-time nonlinear systems is presented. This formulation extends the integral reinforcement learning (IRL) technique, a method for solving optimal regulation problems, to learn the solution to the OTCP. Unlike existing solutions to the OTCP, the proposed method does not need to have or to identify knowledge of the system drift dynamics, and it also takes the input constraints into account a priori. An augmented system composed of the error system dynamics and the command generator dynamics is used to introduce a new nonquadratic discounted performance function for the OTCP. This encodes the input constraints into the optimization problem. A tracking Hamilton–Jacobi–Bellman (HJB) equation associated with this nonquadratic performance function is derived, which gives the optimal control solution. An online IRL algorithm is presented to learn the solution to the tracking HJB equation without knowing the system drift dynamics. Convergence to a near-optimal control solution and stability of the whole system are shown under a persistence of excitation condition. Simulation examples are provided to show the effectiveness of the proposed method.

17.
In this paper a generator of hybrid methods with minimal phase-lag is developed for the numerical solution of the Schrödinger equation and related problems. The generator's methods are dissipative and are of eighth algebraic order. In order to have minimal phase-lag with the new methods, their coefficients are determined automatically. Numerical results obtained by their application to some well known problems with periodic or oscillating solutions and to the coupled differential equations of the Schrödinger type indicate the efficiency of these new methods.

18.
Many simulation algorithms (chemical reaction systems, differential systems arising from the modelling of transient behaviour in the process industries, etc.) involve the numerical solution of systems of differential equations. For the efficient solution of the above-mentioned problems, linear multistep methods or Runge-Kutta single-step methods are used. For the simulation of chemical procedures, the radial Schrödinger equation is used frequently. In the present paper we study a class of linear multistep methods. More specifically, the purpose of this paper is to develop an efficient algorithm for the approximate solution of the radial Schrödinger equation and related problems. This algorithm belongs to the category of multistep methods. In order to produce an efficient multistep method, the phase-lag property and its derivatives are used. The main result of this paper is thus the development of an efficient multistep method for the numerical solution of systems of ordinary differential equations with oscillating or periodic solutions. The reason for their efficiency, as the analysis shows, is that the phase-lag and its derivatives are eliminated. Another reason for the efficiency of the newly obtained methods is that they have high algebraic order.

19.
In this paper, we consider the use of nonlinear networks for obtaining nearly optimal solutions to the control of nonlinear discrete-time (DT) systems. The method is based on a least-squares successive approximation solution of the generalized Hamilton-Jacobi-Bellman (GHJB) equation, which appears in optimization problems. Successive approximation using the GHJB has not previously been applied to nonlinear DT systems. The proposed recursive method solves the GHJB equation in DT on a well-defined region of attraction. Definitions of the GHJB, the pre-Hamiltonian function, and the HJB equation, together with a method of updating the control function for affine nonlinear DT systems under a small-perturbation assumption, are proposed. A neural network (NN) is used to approximate the GHJB solution. It is shown that the result is a closed-loop control based on an NN that has been tuned a priori in offline mode. Numerical examples show that, for linear DT systems, the updated control laws converge to the optimal control, and for nonlinear DT systems, they converge to a suboptimal control.

20.
An optimal control problem is considered for a multi-degree-of-freedom (MDOF) system excited by a white-noise random force. The problem is to minimize the expected response energy by a given time instant T by applying a vector control force with given bounds on the magnitudes of its components. This problem is governed by the Hamilton-Jacobi-Bellman, or HJB, partial differential equation. This equation has been studied previously [1] for the case of a single-degree-of-freedom system by developing a hybrid solution. Specifically, an exact analytical solution was obtained within a certain outer domain of the phase plane, which provides the necessary boundary conditions for a numerical solution within an inner domain bounded in velocity, thereby alleviating the problem of numerical analysis over an unbounded domain. This hybrid approach is extended here to MDOF systems using the common transformation to modal coordinates. The multidimensional HJB equation is solved explicitly for the corresponding outer domain, thereby reducing the problem to a set of numerical solutions within bounded inner domains. Thus, the problem of bounded optimal control is solved completely as long as the necessary modal control forces can be implemented in the actuators. If, however, the control forces can be applied to the original generalized coordinates only, the resulting optimal control law may become infeasible. The reason is the nonlinearity of the maximization operation for the modal control forces, which may lead to violation of some constraints after the inverse transformation to the original coordinates. A semioptimal control law is illustrated for this case, based on projecting boundary points of the domain of the admissible transformed control forces onto the boundaries of the domain of the original control forces. The case of a single control force is also considered, and a similar solution to the HJB equation is derived.
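For bounded control of this kind, the outer-domain HJB solution is known to yield a "dry-friction" bang-bang law of the form u = -u_max·sign(v) in each mode. The Monte Carlo sketch below (a single-degree-of-freedom oscillator under Euler-Maruyama integration; all parameter values are illustrative assumptions) checks that this law reduces the expected response energy relative to the uncontrolled system.

```python
import numpy as np

def mean_energy(u_max, omega=1.0, sigma=1.0, T=2.0, dt=1e-3, paths=2000, seed=0):
    """Expected response energy at time T under the bang-bang law u = -u_max*sign(v)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(paths)                             # displacement
    v = np.zeros(paths)                             # velocity
    for _ in range(int(T / dt)):
        u = -u_max * np.sign(v)                     # bounded bang-bang control force
        dw = rng.normal(0.0, np.sqrt(dt), paths)    # white-noise increment
        x, v = x + v * dt, v + (-omega**2 * x + u) * dt + sigma * dw
    return np.mean(0.5 * (v**2 + omega**2 * x**2))  # sample mean of response energy

e_controlled = mean_energy(u_max=1.0)
e_uncontrolled = mean_energy(u_max=0.0)
```

For the uncontrolled oscillator the expected energy grows linearly (at rate σ²/2), so at T = 2 it is close to 1; the bang-bang law keeps it well below that.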
