Similar Literature
20 similar documents found.
1.
We consider Markov decision processes with a target set, where the criterion function is the expectation of a minimum function. We formulate the problem as an infinite-horizon case with a recurrent class. We show, under some conditions, that the optimal value function is the unique solution to an optimality equation and that a stationary optimal policy exists. We also give a policy improvement method.
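For reference, the following minimal sketch runs standard policy iteration (evaluation plus improvement) on a small discounted MDP with an absorbing, cost-free target state. The transition kernel, costs, and discount factor are invented for illustration, and the discounted total-cost criterion stands in for the paper's expectation-of-minimum criterion.

```python
import numpy as np

# Hypothetical finite MDP: 4 states, state 3 is the target, 2 actions.
# P[a][s] is the transition distribution, c[s, a] the one-step cost.
n_states, n_actions, gamma, target = 4, 2, 0.95, 3
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
c = rng.uniform(1.0, 2.0, size=(n_states, n_actions))
P[:, target, :] = 0.0
P[:, target, target] = 1.0          # make the target absorbing
c[target, :] = 0.0                  # and cost-free once reached

def evaluate(policy):
    """Solve (I - gamma * P_pi) v = c_pi for the value of a stationary policy."""
    P_pi = np.array([P[policy[s], s] for s in range(n_states)])
    c_pi = np.array([c[s, policy[s]] for s in range(n_states)])
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, c_pi)

policy = np.zeros(n_states, dtype=int)
while True:
    v = evaluate(policy)
    q = c + gamma * np.einsum("ast,t->sa", P, v)   # one-step lookahead
    improved = q.argmin(axis=1)                    # greedy improvement step
    if np.array_equal(improved, policy):
        break
    policy = improved
print("optimal policy:", policy, "value function:", v.round(3))
```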

2.
In this paper we study the problem of ergodic impulsive control of Feller processes with costly information. We prove continuity of the value functions for optimal stopping and impulsive control with long-run average cost. We characterize the value functions as generalized solutions of the respective quasi-variational inequalities and describe optimal policies. We also study an equation associated with impulsive control with long-run average cost.

3.
Markov decision processes are used to study the optimal control problem for probabilistic discrete event systems. First, by introducing definitions of the cost function, the objective function, and the optimal function, an optimality equation that determines the optimal supervisor is established. Then, from this optimality equation, the supremal controllable, ∈-containing closed sublanguage of a given language is obtained. Finally, algorithms for computing the optimal cost and the optimal supervisor are given.

4.
A deterministic optimal control problem is solved for a control-affine non-linear system with a non-quadratic cost function. We algebraically solve the Hamilton–Jacobi equation for the gradient of the value function, which eliminates the need to explicitly solve a Hamilton–Jacobi partial differential equation. We interpret the value function in terms of the control Lyapunov function, and then provide the stabilizing controller and the stability margins. Furthermore, we derive an optimal controller for a control-affine non-linear system using the state-dependent Riccati equation (SDRE) method; this method yields an optimal controller similar to the one obtained by the algebraic method. We also find the optimal controller when the cost function is of the exponential-of-integral type, known as risk-sensitive (RS) control. Finally, we show that the SDRE and RS methods give equivalent optimal controllers for non-linear deterministic systems. Examples demonstrate the proposed methods.
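The SDRE recipe itself is mechanical: factor the drift as f(x) = A(x)x, freeze A at the current state, solve the resulting algebraic Riccati equation, and apply the corresponding LQ gain. The sketch below does this for an invented pendulum-like plant; the dynamics, weights, and simulation settings are assumptions, not the paper's examples.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Control-affine system x' = f(x) + g(x) u in SDC form f(x) = A(x) x.
def A(x):
    # Pendulum-like example, x = [angle, rate]; np.sinc gives sin(x1)/x1.
    return np.array([[0.0, 1.0],
                     [np.sinc(x[0] / np.pi), -0.5]])

B = np.array([[0.0], [1.0]])
Q = np.eye(2)          # state weight (locally quadratic cost)
R = np.array([[1.0]])  # control weight

def sdre_control(x):
    """Freeze A(x), solve the ARE, apply u = -R^-1 B^T P x."""
    P = solve_continuous_are(A(x), B, Q, R)
    return -np.linalg.solve(R, B.T @ P @ x)

# One simulated closed-loop run (forward Euler, dt = 0.01, 10 seconds).
x = np.array([1.0, 0.0])
for _ in range(1000):
    u = sdre_control(x)
    x = x + 0.01 * (A(x) @ x + (B @ u).ravel())
print("state after 10 s:", x.round(4))
```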

5.
In this paper we study the average cost criterion induced by a regular utility function (the U-average cost criterion) for continuous-time Markov decision processes. This criterion generalizes the risk-sensitive average cost and expected average cost criteria. We first introduce an auxiliary risk-sensitive first passage optimization problem and obtain the properties of the corresponding optimal value function under mild conditions. Then we show that the pair of optimal value functions of the risk-sensitive average cost criterion and the risk-sensitive first passage criterion is a solution to the optimality equation of the risk-sensitive average cost criterion, allowing the risk-sensitivity parameter to take any nonzero value. Moreover, we show that the optimal value function of the risk-sensitive average cost criterion is continuous with respect to the risk-sensitivity parameter. Finally, we establish the connections between the U-average cost criterion and the average cost criteria induced by the identity function and the exponential utility function, and prove the existence of a U-average optimal deterministic stationary policy in the class of all randomized Markov policies.

6.
This paper deals with Markov decision processes with a target set for nonpositive rewards. Two types of threshold probability criteria are discussed. The first criterion is the probability that the total reward is not greater than a given initial threshold value, and the second is the probability that the total reward is strictly less than it. The first (resp. second) optimization problem is to minimize the first (resp. second) threshold probability. In these problems the threshold value is a permissible level of the total reward for reaching the goal (the target set); that is, we aim to reach this set with a total reward above that level, if possible. For both problems, we show that 1) the optimal threshold probability is the unique solution to an optimality equation, 2) there exists an optimal deterministic stationary policy, and 3) value iteration and policy space iteration methods are available, as sketched below. In addition, we prove that the first (resp. second) optimal threshold probability is a monotone increasing and right (resp. left) continuous function of the initial threshold value, and we propose a method for obtaining an optimal policy and the optimal threshold probability in the first problem from those of the second problem.
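A minimal sketch of value iteration for the first criterion, run on the augmented (state, threshold) space: collecting a reward shifts the remaining threshold, and the target state fixes the boundary values. The random MDP, integer rewards, and truncated threshold grid are assumptions made for illustration.

```python
import numpy as np

# Random MDP: states 0..2, state 2 is the absorbing target; rewards are
# nonpositive integers. All of this is an illustrative assumption.
n_s, n_a, target = 3, 2, 2
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n_s), size=(n_a, n_s))
P[:, target, :] = 0.0
P[:, target, target] = 1.0                    # absorbing target
r = -rng.integers(1, 3, size=(n_s, n_a))      # rewards in {-1, -2}
t_grid = np.arange(-20, 1)                    # truncated threshold grid

# F[s, i] approximates min over policies of P(total reward <= t_grid[i] | s).
F = np.zeros((n_s, len(t_grid)))
F[target, :] = (t_grid >= 0).astype(float)    # at the target the total is 0

for _ in range(300):
    F_new = F.copy()
    for s in range(n_s):
        if s == target:
            continue
        for i, t in enumerate(t_grid):
            q = []
            for a in range(n_a):
                # collecting r[s, a] shifts the remaining threshold upward
                j = min(np.searchsorted(t_grid, t - r[s, a]), len(t_grid) - 1)
                q.append(P[a, s] @ F[:, j])
            F_new[s, i] = min(q)
    if np.max(np.abs(F_new - F)) < 1e-12:
        break
    F = F_new
print("optimal threshold probabilities:\n", F.round(3))
```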

7.
In this paper we study the optimal stochastic control problem for stochastic differential equations on Riemannian manifolds. The cost functional is specified by controlled backward stochastic differential equations in Euclidean space. Under suitable assumptions, we conclude that the value function is the unique viscosity solution to the associated Hamilton–Jacobi–Bellman equation, which is a fully nonlinear parabolic partial differential equation on Riemannian manifolds.

8.
The Hamilton–Jacobi–Bellman (HJB) equation corresponding to constrained control is formulated using a suitable nonquadratic functional. It is shown that the constrained optimal control law has the largest region of asymptotic stability (RAS). The value function of this HJB equation is found by solving a sequence of Lyapunov equations (LEs) for a sequence of cost functions. A neural network is used to approximate the cost function associated with each LE using the method of least squares on a well-defined region of attraction of an initial stabilizing controller. As the order of the neural network is increased, the least-squares solution of the HJB equation converges uniformly to the exact solution of the inherently nonlinear HJB equation associated with the saturating control inputs. The result is a nearly optimal constrained state feedback controller that has been tuned a priori off-line.
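The successive-approximation structure can be illustrated in one dimension: for a fixed controller, solve the associated generalized Hamilton-Jacobi (Lyapunov-type) equation by least squares over a set of basis functions, then update the controller from the fitted gradient. The sketch below uses a scalar plant and a polynomial basis in place of a neural network, and it omits the input saturation treated in the paper; all numerical choices are assumptions.

```python
import numpy as np

# Scalar plant x' = f(x) + g*u with quadratic running cost Q x^2 + R u^2.
f = lambda x: -x + x**3
g, Q, R = 1.0, 1.0, 1.0

phi  = lambda x: np.stack([x**2, x**4, x**6], axis=1)      # basis for V
dphi = lambda x: np.stack([2*x, 4*x**3, 6*x**5], axis=1)   # its gradient

xs = np.linspace(-0.8, 0.8, 101)      # collocation points inside the RAS
u = lambda x: -2.0 * x                # initial stabilizing controller

for it in range(10):
    ux = u(xs)
    # Least squares on  dV/dx * (f + g*u) + Q x^2 + R u^2 = 0  for weights w.
    Amat = dphi(xs) * (f(xs) + g * ux)[:, None]
    b = -(Q * xs**2 + R * ux**2)
    w, *_ = np.linalg.lstsq(Amat, b, rcond=None)
    # Improved controller: u = -(1/2) R^-1 g dV/dx.
    u = lambda x, w=w: -0.5 / R * g * (dphi(np.atleast_1d(x)) @ w)
print("basis weights for V:", w.round(4))
```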

9.
In this paper we consider nonautonomous optimal control problems of infinite-horizon type, whose control actions are given by L1-functions. We verify that the value function is locally Lipschitz. The equivalence between dynamic programming inequalities and Hamilton–Jacobi–Bellman (HJB) inequalities for proximal sub- (super-) gradients is proven. Using this result, we show that the value function is a Dini solution of the HJB equation. We obtain a verification result for the class of Dini sub-solutions of the HJB equation and also prove a minimax property of the value function with respect to the sets of Dini semi-solutions of the HJB equation. We introduce the concept of viscosity solutions of the HJB equation on an infinite horizon and prove its equivalence with the concept of Dini solutions. In the Appendix we provide an existence theorem.

10.
Optimal Policies for Continuous-Time MCPs on Compact Action Sets
This paper studies the optimal control problem for a class of continuous-time Markov control processes (CTMCPs) under the infinite-horizon average-cost criterion. Using the infinitesimal generator and basic properties of the performance potential, the optimality equation for the average-cost model on a compact action set and an existence theorem for its solutions are derived directly. A numerical iteration algorithm for computing ε-optimal stationary control policies is proposed, and a convergence proof for this algorithm is given. Finally, a numerical example is analyzed to illustrate the application of the method.
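One standard route to such a numerical iteration is uniformization: convert the generator matrices into a discrete-time kernel and run relative value iteration for the average cost. The sketch below follows that route on an invented three-state model; it is not the paper's algorithm, only a common baseline for the same criterion.

```python
import numpy as np

# Invented continuous-time model: Q_rates[a] is a generator matrix
# (nonnegative off-diagonal, rows summing to zero), cost[s, a] a cost rate.
n_s, n_a = 3, 2
rng = np.random.default_rng(2)
Q_rates = rng.uniform(0.5, 2.0, size=(n_a, n_s, n_s))
for a in range(n_a):
    np.fill_diagonal(Q_rates[a], 0.0)
    np.fill_diagonal(Q_rates[a], -Q_rates[a].sum(axis=1))
cost = rng.uniform(0.0, 5.0, size=(n_s, n_a))

# Uniformization: P_a = I + Q_a / Lambda is a proper stochastic kernel.
Lam = 1.05 * max(-Q_rates[a, s, s] for a in range(n_a) for s in range(n_s))
P = np.eye(n_s) + Q_rates / Lam

h = np.zeros(n_s)                      # relative value function, h[0] pinned
for _ in range(5000):                  # relative value iteration
    T = (cost / Lam + np.einsum("ast,t->sa", P, h)).min(axis=1)
    g_step = T[0]                      # per-step gain estimate
    h_new = T - g_step
    if np.max(np.abs(h_new - h)) < 1e-12:
        break
    h = h_new
policy = (cost / Lam + np.einsum("ast,t->sa", P, h)).argmin(axis=1)
print("average cost rate:", round(Lam * g_step, 4), "policy:", policy)
```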

11.
In this article, we consider an optimal control problem in which the controlled state dynamics are governed by a stochastic evolution equation in Hilbert spaces and the cost functional has quadratic growth. The existence and uniqueness of the optimal control are obtained by means of an associated backward stochastic differential equation with quadratic growth and an unbounded terminal value. As an application, an optimal control of stochastic partial differential equations with dynamical boundary conditions is also given to illustrate our results.

12.
We discuss optimal control problems with integral state-control constraints. We rewrite the problem in an equivalent form as an optimal control problem with state constraints for an extended system, and prove that the value function, although possibly discontinuous, is the unique viscosity solution of the constrained boundary value problem for the corresponding Hamilton–Jacobi equation. The state constraint is the epigraph of the minimal solution of a second Hamilton–Jacobi equation. Our framework applies, for instance, to systems with design uncertainties.

13.
In this note, we present a sampling algorithm, called the recursive automata sampling algorithm (RASA), for control of finite-horizon Markov decision processes (MDPs). By recursively extending Sastry's learning automata pursuit algorithm, designed for solving nonsequential stochastic optimization problems, RASA returns an estimate of both the optimal action from a given state and the corresponding optimal value. Based on the finite-time analysis of the pursuit algorithm by Rajaraman and Sastry, we analyze the finite-time behavior of RASA. Specifically, for a given initial state, we derive the following probability bounds as a function of the number of samples: 1) a lower bound on the probability that RASA will sample the optimal action, and 2) an upper bound on the probability that the deviation between the true optimal value and the RASA estimate exceeds a given error.
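The pursuit update at the heart of the method is easy to state: keep a probability vector over actions and move it a small step toward the current greedy action under running value estimates. The sketch below shows the nonsequential (one-stage) case that RASA extends recursively; the reward distributions and step size are assumptions. In RASA itself, the sampled reward would be replaced by the recursive estimate of the next stage's optimal value.

```python
import numpy as np

rng = np.random.default_rng(3)
true_means = np.array([1.0, 1.5, 0.8])   # unknown to the algorithm
n_a = len(true_means)

p = np.ones(n_a) / n_a        # action-selection probabilities
q = np.zeros(n_a)             # running sample-mean value estimates
counts = np.zeros(n_a)
mu = 0.01                     # small pursuit step keeps exploration alive

for k in range(20000):
    a = rng.choice(n_a, p=p)
    reward = true_means[a] + rng.normal(0.0, 0.5)
    counts[a] += 1
    q[a] += (reward - q[a]) / counts[a]   # incremental sample mean
    greedy = np.zeros(n_a)
    greedy[q.argmax()] = 1.0
    p = (1 - mu) * p + mu * greedy        # "pursue" the current greedy action
print("estimated best action:", p.argmax(), "value estimates:", q.round(3))
```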

14.
In a series of papers, we proved theorems characterizing the value function in exit time optimal control as the unique viscosity solution of the corresponding Bellman equation that satisfies appropriate side conditions. The results applied to problems satisfying a positivity condition on the integral of the Lagrangian. This positive integral condition assigned a positive cost to remaining outside the target on any interval of positive length. In this note, we prove a new theorem which characterizes the exit time value function as the unique bounded-from-below viscosity solution of the Bellman equation that vanishes on the target. The theorem applies to problems satisfying an asymptotic condition on the trajectories, including cases where the positive integral condition is not satisfied. Our results are based on an extended version of Barbălat's lemma. We apply the theorem to variants of the Fuller problem and other examples where the Lagrangian is degenerate.

15.
In this paper, we present a numerical method for optimal experiment design for nonlinear dynamic processes. We propose optimizing an approximation of the predicted variance-covariance matrix of the parameter estimates, which can be computed as the solution of a Riccati differential equation. In contrast to existing approaches, the proposed method allows us to take process noise into account and requires fewer derivative states to be computed compared with the traditional Fisher-information-matrix-based approach. The process noise is assumed to be a time-varying random disturbance that is not known at the time the experiment is designed. We illustrate the technique by solving an optimal experiment design problem for a fed-batch bioreactor benchmark case study, concentrating on how the optimal input design and the associated accuracy of the parameter identification are influenced when process noise is present.
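A sketch of the covariance propagation step under stated assumptions: integrate a filtering-type Riccati differential equation P' = AP + PA' + W - PC'V^(-1)CP and score a candidate input by the A-criterion trace(P(T)). The model matrices, their input dependence, and the two candidate inputs are all illustrative; a full design would optimize this score over the input.

```python
import numpy as np
from scipy.integrate import solve_ivp

W = 0.1 * np.eye(2)               # process-noise intensity
V = np.array([[0.05]])            # measurement-noise intensity
C = np.array([[1.0, 0.0]])

def A_of_t(t, u):
    # Linearized dynamics; the input u(t) enters the (1, 0) entry here.
    return np.array([[-1.0, 0.5], [u(t), -2.0]])

def riccati_rhs(t, p_flat, u):
    P = p_flat.reshape(2, 2)
    A = A_of_t(t, u)
    dP = A @ P + P @ A.T + W - P @ C.T @ np.linalg.solve(V, C @ P)
    return dP.ravel()

def a_criterion(u, T=5.0):
    P0 = np.eye(2)                 # prior covariance
    sol = solve_ivp(riccati_rhs, (0.0, T), P0.ravel(), args=(u,), rtol=1e-8)
    return np.trace(sol.y[:, -1].reshape(2, 2))

# Compare two candidate input profiles; a real design optimizes over u.
print("constant input:", round(a_criterion(lambda t: 1.0), 5))
print("sinusoid input:", round(a_criterion(lambda t: np.sin(2 * t)), 5))
```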

16.
An approach for solving finite-time-horizon suboptimal feedback control problems for partial differential equations is proposed, based on solving dynamic programming equations on adaptive sparse grids. A semi-discrete optimal control problem is introduced, and the feedback control is derived from the corresponding value function. The value function can be characterized as the solution of an evolutionary Hamilton–Jacobi–Bellman (HJB) equation defined over a state space whose dimension equals that of the underlying semi-discrete system. Besides a low-dimensional semi-discretization, it is important to solve the HJB equation efficiently to address the curse of dimensionality. We propose to apply a semi-Lagrangian scheme using spatially adaptive sparse grids. Sparse grids allow the discretization of the value functions in (higher) space dimensions, since the curse of dimensionality of full-grid methods arises to a much smaller extent. For additional efficiency, an adaptive grid refinement procedure is explored. The approach is illustrated for the wave equation, and an extension to equations of Schrödinger type is indicated. We present several numerical examples studying the effect that the parameters characterizing the sparse grid have on the accuracy of the value function and the optimal trajectory.
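A semi-Lagrangian sweep is simple to write down in one dimension: for each grid node and each control, follow the characteristic one time step, interpolate the value function at the foot of the characteristic, and minimize over controls. The sketch below does this on a full uniform grid with invented dynamics and cost; the paper's contribution is replacing this full grid with an adaptive sparse grid in higher dimensions.

```python
import numpy as np

# 1-D finite-horizon HJB with integrator dynamics; all choices are assumptions.
xs = np.linspace(-2.0, 2.0, 201)
actions = np.linspace(-1.0, 1.0, 21)
dt, T = 0.01, 1.0

f = lambda x, a: a                    # dynamics x' = a
ell = lambda x, a: x**2 + 0.1 * a**2  # running cost

V = xs**2                             # terminal condition V(T, x) = x^2
for _ in range(int(T / dt)):          # march backwards in time
    # For each action, follow the characteristic one step and interpolate V.
    candidates = [dt * ell(xs, a) + np.interp(xs + dt * f(xs, a), xs, V)
                  for a in actions]
    V = np.min(candidates, axis=0)
print("V(0, 0) ~", round(V[len(xs) // 2], 4))
```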

17.
An effective optimal control design method is proposed for continuous nonlinear systems. The generalized fuzzy hyperbolic model (GFHM) is used for the first time as an approximator to estimate the solution of the HJB (Hamilton-Jacobi-Bellman) equation (the value function, i.e., the mapping from states to the cost function); the optimal control is then obtained from this approximate solution. The method requires only one GFHM to estimate the value function. First, the design procedure for optimal control of continuous nonlinear systems is described; then the approximation error is proven to be uniformly ultimately bounded (UUB); finally, a numerical example verifies the effectiveness of the method. A further example demonstrates the advantages of the method through comparison with a neural-network-based adaptive dynamic programming approach.
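The GFHM representation can be illustrated with a least-squares fit: the value-function candidate is a weighted sum of a constant and hyperbolic-tangent terms of shifted states. In the sketch below the weights are fitted to a stand-in target function, whereas in the paper they are determined from the HJB equation itself; the target, shifts, and sampling are assumptions.

```python
import numpy as np

xs = np.linspace(-2.0, 2.0, 200)
shifts = np.array([-1.0, 0.0, 1.0])

# Regressors: a constant term plus tanh(x - d) for each shift d.
basis = np.column_stack([np.ones_like(xs)] +
                        [np.tanh(xs - d) for d in shifts])

target_V = 4.0 * xs**2 / (1.0 + xs**2)       # stand-in "value function"
w, *_ = np.linalg.lstsq(basis, target_V, rcond=None)
print("GFHM weights:", w.round(4),
      "max fit error:", np.abs(basis @ w - target_V).max().round(4))
```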

18.
In this short note we formulate an infinite-horizon stochastic optimal control problem for jump-diffusions of Itô–Lévy type as an LP problem in a measure space, and prove that the optimal value functions of both problems coincide. The main tools are the dual formulation of the primal LP problem, which is strongly connected to the notion of a sub-solution of the partial integro-differential equation of Hamilton–Jacobi–Bellman type associated with the optimal control problem, and the Krylov regularization method for viscosity solutions.
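A finite-state analogue conveys the LP idea: for a discounted minimization MDP the value function is the componentwise-largest v satisfying v(s) <= c(s,a) + gamma * sum_s' P(s'|s,a) v(s') for all (s,a), so it solves a linear program. The random MDP below is an assumption; the paper works with jump-diffusions and LPs over measure spaces.

```python
import numpy as np
from scipy.optimize import linprog

n_s, n_a, gamma = 4, 2, 0.9
rng = np.random.default_rng(5)
P = rng.dirichlet(np.ones(n_s), size=(n_a, n_s))
c = rng.uniform(0.0, 1.0, size=(n_s, n_a))

A_ub, b_ub = [], []
for s in range(n_s):
    for a in range(n_a):
        row = -gamma * P[a, s]
        row[s] += 1.0                  # constraint pattern: e_s - gamma P_a(s, :)
        A_ub.append(row)
        b_ub.append(c[s, a])

# Maximize sum(v) subject to the constraints (linprog minimizes, hence -1).
res = linprog(c=-np.ones(n_s), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n_s)
print("value function from the LP:", res.x.round(4))
```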

19.
The authors develop a theory characterizing optimal stopping times for discrete-time ergodic Markov processes with discounted rewards. The theory differs from prior work in its view of the per-stage and terminal reward functions as elements of a certain Hilbert space. In addition to a streamlined analysis establishing the existence and uniqueness of a solution to Bellman's equation, this approach provides an elegant framework for the study of approximate solutions. In particular, the authors propose a stochastic approximation algorithm that tunes the weights of a linear combination of basis functions in order to approximate a value function. They prove that this algorithm converges (almost surely) and that the limit of convergence has some desirable properties. The utility of the approximation method is illustrated via a computational case study involving the pricing of a path-dependent financial derivative security that gives rise to an optimal stopping problem with a 100-dimensional state space.
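In the spirit of that algorithm, the sketch below tunes the weights of a linear combination of basis functions by a stochastic-approximation update driven by a simulated ergodic chain. The AR(1) dynamics, put-style payoff, quadratic basis, and step-size schedule are illustrative assumptions, not the paper's 100-dimensional pricing example.

```python
import numpy as np

rng = np.random.default_rng(4)
alpha = 0.95                                   # per-stage discount factor
g = lambda x: max(1.0 - x, 0.0)                # stopping (put-style) payoff
phi = lambda x: np.array([1.0, x, x * x])      # basis functions

w = np.zeros(3)
x = 0.0
for t in range(1, 200_001):
    x_next = 0.9 * x + 0.3 * rng.normal()      # ergodic AR(1) state process
    # Sampled Bellman target: stop for g(x) now, or continue and discount.
    target = max(g(x), alpha * (phi(x_next) @ w))
    w += (1.0 / t**0.7) * (target - phi(x) @ w) * phi(x)
    x = x_next

print("approximate value at x=0:", round(phi(0.0) @ w, 4), "vs payoff g(0)=1.0")
```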

20.
The robust maximum principle applied to the minimax linear quadratic problem is derived for stochastic differential equations containing a control-dependent diffusion term. Parametric families of first- and second-order adjoint stochastic processes are obtained to construct the corresponding Hamiltonian formalism. The Hamiltonian function used for the construction of the robust optimal control is shown to equal the sum of the standard stochastic Hamiltonians corresponding to each value of the uncertain parameter from a given finite set. The cost function is considered on a finite horizon (containing the mathematical expectation of both an integral and a terminal term) and on an infinite one (a time-averaged loss function). These problems belong to the class of minimax stochastic optimization problems. It is shown that the construction of the minimax optimal controller can be reduced to an optimization problem on a finite-dimensional simplex and consists in analyzing the dependence of the Riccati equation solution on the weight parameters to be found.
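A crude illustration of the reduction to a weight simplex, under stated assumptions: design an LQ gain from a lambda-weighted model, evaluate its worst-case cost over the finite parameter set via Lyapunov equations, and search over lambda. The two plants and the simple model-averaging step are inventions for this sketch and do not reproduce the paper's adjoint-based construction.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_lyapunov

A = [np.array([[0.0, 1.0], [-1.0, -0.2]]),
     np.array([[0.0, 1.0], [-2.0, -0.1]])]      # two uncertain parameter values
B = np.array([[0.0], [1.0]])
Q, R, X0 = np.eye(2), np.array([[1.0]]), np.eye(2)

def closed_loop_cost(K, Ai):
    """Cost of u = -K x on plant Ai: trace(P X0), P from a Lyapunov equation."""
    Acl = Ai - B @ K
    Qcl = Q + K.T @ R @ K
    P = solve_lyapunov(Acl.T, -Qcl)     # Acl' P + P Acl = -Qcl
    return np.trace(P @ X0)

best = None
for lam in np.linspace(0.0, 1.0, 51):   # one-dimensional simplex here
    A_lam = lam * A[0] + (1 - lam) * A[1]
    P = solve_continuous_are(A_lam, B, Q, R)
    K = np.linalg.solve(R, B.T @ P)
    worst = max(closed_loop_cost(K, Ai) for Ai in A)
    if best is None or worst < best[0]:
        best = (worst, lam)
print("best worst-case cost %.4f at lambda = %.2f" % best)
```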
