1.
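Assume that the stock price follows a jump-diffusion process. Building on the classical mean-variance portfolio selection model, we maximize the expected terminal return while minimizing the variance of terminal wealth. A stochastic linear-quadratic (LQ) optimal control problem is introduced as an approximation of the original problem. We prove a verification theorem for a general optimal control problem whose state follows a jump-diffusion process, and by applying this theorem to solve the HJB (Hamilton-Jacobi-Bellman) equation we obtain the optimal strategy for the original problem. Finally, an explicit expression for the efficient frontier of the original problem is given.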
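The two criteria in this abstract are the mean and the variance of terminal wealth. As a hedged illustration of those criteria only (not of the paper's LQ/HJB solution), the sketch below estimates both by Monte Carlo for a constant-proportion strategy. It assumes a Merton-style jump diffusion with lognormal jump sizes; the function name `terminal_wealth_stats` and every parameter value are hypothetical.

```python
import numpy as np

def terminal_wealth_stats(pi, w0=1.0, r=0.03, mu=0.08, sigma=0.2,
                          lam=1.0, jump_mean=-0.05, jump_std=0.1,
                          T=1.0, n_steps=252, n_paths=20_000, seed=0):
    """Monte Carlo estimate of E[W_T] and Var[W_T] for a constant-proportion strategy.

    Assumed (illustrative) model: dS/S = mu dt + sigma dB + (J - 1) dN, where N is
    Poisson with rate lam and log J ~ Normal(jump_mean, jump_std**2). A fraction pi
    of wealth is held in the stock; the rest earns the risk-free rate r.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    w = np.full(n_paths, w0)
    for _ in range(n_steps):
        dB = rng.normal(0.0, np.sqrt(dt), n_paths)
        n_jumps = rng.poisson(lam * dt, n_paths)
        # Sum of n_jumps i.i.d. normal log-jumps; scale 0 (no jump) yields exactly 0.
        log_j = rng.normal(n_jumps * jump_mean, np.sqrt(n_jumps) * jump_std)
        stock_ret = mu * dt + sigma * dB + np.expm1(log_j)
        # Euler step of the self-financing wealth equation dW/W = r dt + pi (dS/S - r dt).
        w = w * (1.0 + r * dt + pi * (stock_ret - r * dt))
    return w.mean(), w.var()
```

Sweeping `pi` and collecting the (variance, mean) pairs traces the trade-off within this restricted strategy class; the efficient frontier in the paper is obtained by optimizing over general strategies and is not reproduced by this sketch.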
2.
3.
Risk-Sensitive Reinforcement Learning (total citations: 3; self-citations: 0; citations by others: 3)
Most reinforcement learning algorithms optimize the expected return of a Markov Decision Problem. Practice has taught us that this criterion is not always the most suitable, because many applications require robust control strategies that also take into account the variance of the return. Classical control literature provides several techniques for risk-sensitive optimization goals, such as the so-called worst-case optimality criterion, which focuses exclusively on risk-avoiding policies, and classical risk-sensitive control, which transforms the returns by exponential utility functions. While the former approach is typically too restrictive, the latter suffers from the absence of an obvious way to design a corresponding model-free reinforcement learning algorithm. Our risk-sensitive reinforcement learning algorithm is based on a very different philosophy. Instead of transforming the return of the process, we transform the temporal differences during learning. While our approach reflects important properties of the classical exponential utility framework, it avoids that framework's serious drawbacks for learning. Based on an extended set of optimality equations, we are able to formulate risk-sensitive versions of various well-known reinforcement learning algorithms which converge with probability one under the usual conditions.
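The central idea, transforming temporal differences instead of returns, can be sketched for tabular Q-learning. The piecewise-linear transform below, which scales positive TD errors by (1 - kappa) and negative ones by (1 + kappa), is an assumption about the exact form used in the paper; the toy task and the names `transform_td`, `risk_sensitive_q_update`, `safe_or_risky` and `train` are hypothetical.

```python
import numpy as np

def transform_td(delta, kappa):
    """Piecewise-linear TD transform (assumed form); kappa in (-1, 1).

    kappa > 0 shrinks positive surprises and amplifies negative ones (risk-averse),
    kappa < 0 does the opposite (risk-seeking), and kappa = 0 is ordinary TD.
    """
    return (1.0 - kappa) * delta if delta > 0 else (1.0 + kappa) * delta

def risk_sensitive_q_update(q, s, a, r, s_next, done, alpha, gamma, kappa):
    """One tabular Q-learning step in which the TD error is transformed before use."""
    target = r if done else r + gamma * np.max(q[s_next])
    q[s, a] += alpha * transform_td(target - q[s, a], kappa)

def safe_or_risky(a, rng):
    # Hypothetical one-step task: action 0 pays 0.5 for sure,
    # action 1 pays 1.2 or 0 with equal probability (mean 0.6).
    if a == 0:
        return 0.5
    return 1.2 if rng.random() < 0.5 else 0.0

def train(kappa, n_steps=40_000, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros((1, 2))        # a single state, two actions
    visits = np.zeros((1, 2))
    for t in range(n_steps):
        a = t % 2               # round-robin exploration
        r = safe_or_risky(a, rng)
        visits[0, a] += 1
        # Each episode ends after one step, so done=True and s_next is unused.
        risk_sensitive_q_update(q, 0, a, r, 0, True, 1.0 / visits[0, a], 0.9, kappa)
    return q[0]
```

With kappa = 0 this is ordinary Q-learning and the risky action wins on expected reward. With kappa = 0.5 the risky action's value settles near 0.6(1 - kappa) = 0.3, the point where the transformed TD errors balance, so the agent prefers the certain 0.5.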
4.
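Introducing Markov decision processes at an abstract level allows complex Markov decision processes to be represented concisely and declaratively, which addresses the difficulty of representing the large state spaces that conventional Markov decision processes (MDPs) encounter in practice. This paper introduces the basic concepts of two different types of abstract Markov decision processes, structural and generalization-based, together with exact or approximate algorithms for computing optimal policies in various typical abstract MDPs. These include an algorithm that differs fundamentally from those for conventional MDPs: extending the Bellman equation to the abstract state space. The paper also summarizes the history of research on these models and offers some perspectives on their development, aiming to give readers a thorough and comprehensive understanding of the essentials.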
5.
We study sufficient conditions for optimality of the controls in a linear autonomous time-optimal problem with a Lipschitz-continuous cost functional. These conditions involve a generalized Hamilton-Jacobi-Bellman equation.
6.
The existence and uniqueness of the solution to the Bellman equation for ergodic control of one-dimensional diffusions is established under a 'near-monotonicity' condition on the cost. Necessary and sufficient conditions for the optimality of a stable Markov control are given in terms of this equation.
7.
8.
In this paper, an extension of some theorems of [8] is given using the notion of Clarke's generalized gradient. Moreover, a feedback law for optimal controls and Bellman's equation are obtained.
9.
In this paper we study the convergence of the Galerkin approximation method applied to the generalized Hamilton-Jacobi-Bellman (GHJB) equation over a compact set containing the origin. The GHJB equation gives the cost of an arbitrary control law and can be used to improve the performance of this control. The GHJB equation can also be used to successively approximate the Hamilton-Jacobi-Bellman equation. We state sufficient conditions that guarantee that the Galerkin approximation converges to the solution of the GHJB equation and that the resulting approximate control is stabilizing on the same region as the initial control. The method is demonstrated on a simple nonlinear system and is compared to a result obtained by using exact feedback linearization in conjunction with the LQR design method.
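The successive Galerkin procedure described above can be sketched for a scalar, control-affine system x' = f(x) + g(x)u on the compact set [-1, 1] with cost integral of q(x) + r u^2. Several details are assumptions not taken from the abstract: the even polynomial basis x^2, x^4, ..., Gauss-Legendre quadrature for the projection integrals, and the quadratic control penalty, which gives the improvement step u = -g(x)V'(x)/(2r). Each pass solves the GHJB for the current control (a linear system in the basis coefficients) and then improves the control; `galerkin_ghjb` is a hypothetical name.

```python
import numpy as np

def galerkin_ghjb(f, g, q, r, u0, n_basis=3, n_iter=8, n_quad=40):
    """Successive Galerkin approximation of the GHJB equation for x' = f(x) + g(x) u on [-1, 1].

    Returns coefficients c of V(x) = sum_j c_j x^(2j), j = 1..n_basis (hypothetical basis choice).
    """
    xs, ws = np.polynomial.legendre.leggauss(n_quad)    # quadrature nodes and weights on [-1, 1]
    powers = 2 * np.arange(1, n_basis + 1)              # exponents 2, 4, ..., 2 * n_basis
    phi = xs[:, None] ** powers                         # basis values, shape (n_quad, n_basis)
    dphi = powers * xs[:, None] ** (powers - 1)         # basis derivatives, same shape
    weighted_phi = (phi * ws[:, None]).T                # test functions times weights, (n_basis, n_quad)
    u = u0(xs)
    c = np.zeros(n_basis)
    for _ in range(n_iter):
        closed_loop = f(xs) + g(xs) * u
        # GHJB for the current control: V'(x) (f + g u) + q(x) + r u^2 = 0, projected onto each phi_k.
        a = weighted_phi @ (dphi * closed_loop[:, None])
        b = -weighted_phi @ (q(xs) + r * u**2)
        c = np.linalg.solve(a, b)
        # Policy improvement for the quadratic control penalty: u = -g(x) V'(x) / (2 r).
        u = -g(xs) * (dphi @ c) / (2.0 * r)
    return c
```

For the linear test case f(x) = x, g(x) = 1, q(x) = x^2, r = 1, starting from the stabilizing control u0(x) = -2x, the first coefficient approaches the Riccati value 1 + sqrt(2) (about 2.414) and the higher-order coefficients vanish, since the exact value function is quadratic.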
10.
Rémi Munos, Machine Learning, 2000, 40(3): 265-299
This paper proposes a study of Reinforcement Learning (RL) for continuous state-space and time control problems, based on the theoretical framework of viscosity solutions (VSs). We use the method of dynamic programming (DP), which introduces the value function (VF), the expectation of the best future cumulative reinforcement. In the continuous case, the value function satisfies a nonlinear first-order (deterministic case) or second-order (stochastic case) differential equation called the Hamilton-Jacobi-Bellman (HJB) equation. It is well known that this equation has infinitely many generalized solutions (differentiable almost everywhere) other than the VF. We show that gradient-descent methods may converge to one of these generalized solutions, thus failing to find the optimal control. In order to solve the HJB equation, we use the powerful framework of viscosity solutions and state that there exists a unique viscosity solution to the HJB equation, which is the value function. Then, we use another main result of VSs (their stability when passing to the limit) to prove the convergence of numerical approximation schemes based on finite difference (FD) and finite element (FE) methods. These methods discretize, at some resolution, the HJB equation into a DP equation of a Markov Decision Process (MDP), which could be solved by DP methods (thanks to a strong contraction property) if all the initial data (the state dynamics and the reinforcement function) were perfectly known. However, in the RL approach we consider a system that interacts with an a priori (at least partially) unknown environment and learns from experience, so the initial data are not perfectly known but have to be approximated during learning. The main contribution of this work is to derive a general convergence theorem for RL algorithms when one uses only approximations (in the sense of satisfying some weak contraction property) of the initial data.
This result can be used for model-based or model-free RL algorithms, with off-line or on-line updating methods, for deterministic or stochastic state dynamics (though this latter case is not described here), and based on FE or FD discretization methods. It is illustrated with several RL algorithms and one numerical simulation for the Car on the Hill problem.
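To illustrate how an upwind finite-difference scheme turns an HJB equation into the DP equation of a discrete MDP, here is a minimal one-dimensional sketch in the spirit of the FD schemes mentioned above. Everything about the problem is hypothetical: dynamics x' = u(1 + x) with u in {-1, +1} on [0, 1], no running reward, exit rewards 0.5 at x = 0 and 1 at x = 1, and discount rate beta. Crossing one cell of width h takes time tau = h/(1 + x), so each grid point becomes an MDP state whose discount factor is exp(-beta tau). In one dimension each transition goes to a single neighbour with probability 1. Because the model is known here, plain value iteration solves the discrete problem; the paper's convergence theorem addresses the case where these data are only approximated during learning.

```python
import numpy as np

def solve_fd_mdp(n_cells=200, beta=1.0, max_iter=10_000, tol=1e-12):
    """Value iteration on the MDP obtained from an upwind finite-difference grid (toy 1-D problem)."""
    x = np.linspace(0.0, 1.0, n_cells + 1)
    h = x[1] - x[0]
    speed = 1.0 + x                      # |f(x, u)| = 1 + x for both controls u = -1, +1
    gamma = np.exp(-beta * h / speed)    # discount over the time tau = h / |f| needed to cross one cell
    v = np.zeros_like(x)
    v[0], v[-1] = 0.5, 1.0               # exit rewards on the boundary (absorbing)
    for _ in range(max_iter):
        new = v.copy()
        # Interior points: u = -1 moves to the left neighbour, u = +1 to the right one.
        new[1:-1] = gamma[1:-1] * np.maximum(v[:-2], v[2:])
        if np.max(np.abs(new - v)) < tol:
            return x, new
        v = new
    return x, v
```

For this toy problem the exact value function is V(x) = max(((1 + x)/2)^beta, 0.5(1 + x)^(-beta)), since reaching x = 1 takes time ln(2/(1 + x)) and reaching x = 0 takes time ln(1 + x); the grid solution approaches it as h shrinks.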