47 search results found (search time: 109 ms)
1.
Optimal Portfolio Selection with Jump-Diffusion Stock Prices   Cited by: 8 (self-citations: 0, citations by others: 8)
Stock prices are assumed to follow a jump-diffusion process. Building on the classical mean-variance portfolio model, the objective is to maximize the expected terminal wealth while minimizing its variance. A stochastic linear-quadratic optimal control problem is introduced as an approximation to the original problem. A verification theorem is proved for a general optimal control problem whose state follows a jump-diffusion process. Applying the verification theorem to solve the HJB (Hamilton-Jacobi-Bellman) equation yields the optimal strategy for the original problem. Finally, an expression for the efficient frontier of the original problem is given.
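For orientation, a minimal sketch of the kind of jump-diffusion price dynamics and mean-variance objective this abstract refers to might look as follows; the notation (drift, volatility, jump term, target return d) is assumed here for illustration and is not taken from the paper:

```latex
% Generic jump-diffusion price dynamics and a constrained mean-variance objective
% (illustrative standard form, not the paper's exact formulation)
\begin{align*}
  dS(t) &= S(t^{-})\Big[\mu\,dt + \sigma\,dW(t)
           + \int_{\mathbb{R}}\gamma(z)\,\tilde{N}(dt,dz)\Big],\\
  \min_{\pi}\ \operatorname{Var}\big[X^{\pi}(T)\big]
  &\quad\text{subject to}\quad \mathbb{E}\big[X^{\pi}(T)\big] = d ,
\end{align*}
% W: Brownian motion; \tilde{N}: compensated Poisson random measure;
% X^{\pi}(T): terminal wealth under portfolio \pi; d: prescribed expected return.
```

The stochastic linear-quadratic auxiliary problem mentioned in the abstract is the usual device for handling the variance term, since Var[X(T)] is not of the standard expected-utility form.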
2.
A Reinforcement Learning Algorithm Based on Optimal Per-Stage Average Cost   Cited by: 4 (self-citations: 0, citations by others: 4)
Using a method for solving the optimal cost function, this paper presents a new reinforcement learning algorithm based on the optimal per-stage average cost. The algorithm is an effective reinforcement learning method for solving Markov decision problems with incomplete information. Starting from the computation of the stage-wise optimal average cost function, the paper analyzes the existence of optimal solutions, the relationship between the stage-wise optimal average cost function and the initial state, and the associated Bellman equation. Establishing this framework allows many results from dynamic programming (DP) to be carried over directly to reinforcement learning research.
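For reference, the average-cost Bellman optimality equation that such an analysis revolves around is usually written as follows for a finite MDP (a textbook form, assumed here rather than quoted from the paper):

```latex
% Average-cost (per-stage) Bellman optimality equation, standard finite-MDP form:
% rho*: optimal average cost per stage; h*: relative (bias) cost;
% c(s,a): one-stage cost; P: transition kernel.
\[
  \rho^{*} + h^{*}(s) \;=\; \min_{a \in A(s)}
  \Big[\, c(s,a) + \sum_{s'} P(s' \mid s, a)\, h^{*}(s') \,\Big],
  \qquad s \in S .
\]
```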
3.
Risk-Sensitive Reinforcement Learning   Cited by: 3 (self-citations: 0, citations by others: 3)
Most reinforcement learning algorithms optimize the expected return of a Markov Decision Problem. Practice has taught us the lesson that this criterion is not always the most suitable because many applications require robust control strategies which also take into account the variance of the return. Classical control literature provides several techniques to deal with risk-sensitive optimization goals like the so-called worst-case optimality criterion exclusively focusing on risk-avoiding policies or classical risk-sensitive control, which transforms the returns by exponential utility functions. While the first approach is typically too restrictive, the latter suffers from the absence of an obvious way to design a corresponding model-free reinforcement learning algorithm. Our risk-sensitive reinforcement learning algorithm is based on a very different philosophy. Instead of transforming the return of the process, we transform the temporal differences during learning. While our approach reflects important properties of the classical exponential utility framework, we avoid its serious drawbacks for learning. Based on an extended set of optimality equations, we are able to formulate risk-sensitive versions of various well-known reinforcement learning algorithms which converge with probability one under the usual conditions.
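To make the "transform the temporal differences" idea concrete, here is a small tabular sketch. The piecewise-linear transform of the TD error and the ToyChain environment are assumptions chosen for illustration; they show one common way to realize a risk-sensitive TD update, not necessarily the paper's exact scheme.

```python
import numpy as np

class ToyChain:
    """Hypothetical 5-state chain, only here to make the sketch runnable:
    action 0 is a safe step (reward -1), action 1 a high-variance jump (+8 or -10)."""
    def __init__(self, n=5, seed=0):
        self.n, self.rng, self.s = n, np.random.default_rng(seed), 0
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):
        if a == 0:                                   # safe step: small sure cost
            self.s, r = min(self.s + 1, self.n - 1), -1.0
        else:                                        # risky jump: high variance
            r = 8.0 if self.rng.random() < 0.5 else -10.0
            self.s = min(self.s + 2, self.n - 1)
        return self.s, r, self.s == self.n - 1

def risk_sensitive_q_learning(env, n_states, n_actions,
                              kappa=0.5, alpha=0.1, gamma=0.95,
                              epsilon=0.1, episodes=2000, seed=0):
    """Tabular Q-learning whose TD error is transformed before the update.
    kappa in (-1, 1): kappa > 0 weights negative surprises more (risk-averse),
    kappa = 0 recovers ordinary Q-learning."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            target = r if done else r + gamma * np.max(Q[s_next])
            delta = target - Q[s, a]                 # ordinary TD error
            # Transform the TD error (not the return) -- the core idea named in the
            # abstract; this piecewise-linear transform is an illustrative choice.
            delta = (1 - kappa) * delta if delta > 0 else (1 + kappa) * delta
            Q[s, a] += alpha * delta
            s = s_next
    return Q

Q_risk_averse = risk_sensitive_q_learning(ToyChain(), n_states=5, n_actions=2, kappa=0.8)
Q_risk_neutral = risk_sensitive_q_learning(ToyChain(seed=1), n_states=5, n_actions=2, kappa=0.0)
```

With kappa close to 1 the update amplifies negative surprises and dampens positive ones, so the learned policy tends to shy away from the high-variance jump even when its expected reward is competitive.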
4.
Two Types of Abstraction for Markov Decision Processes   Cited by: 2 (self-citations: 1, citations by others: 1)
Introducing Markov decision processes at an abstract level allows complex Markov decision processes to be expressed concisely and declaratively, addressing the problem of representing the large state spaces that conventional Markov decision processes (MDPs) encounter in practice. This paper introduces the basic concepts of two types of abstract MDPs, structural and aggregative, together with exact and approximate algorithms for optimal policies in various typical abstract MDPs, including one algorithm fundamentally different from those for conventional MDPs: extending the Bellman equation to abstract state spaces. The paper also reviews the research history of these models and offers an outlook on their development, aiming at a thorough, comprehensive, and focused understanding of them.
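The remark about extending the Bellman equation to abstract state spaces can be illustrated, in the simplest state-aggregation setting, by an optimality equation of roughly the following form (an illustrative textbook form with assumed notation, not the survey's own):

```latex
% Bellman optimality equation lifted to abstract (aggregated) states:
% phi maps ground states to abstract states; w(. | sbar) weights the ground states inside sbar.
\[
  \bar{V}^{*}(\bar{s}) \;=\; \max_{a}\ \sum_{s \in \phi^{-1}(\bar{s})} w(s \mid \bar{s})
  \Big[\, R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, \bar{V}^{*}\!\big(\phi(s')\big) \,\Big].
\]
```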
5.
We study sufficient conditions for optimality of the controls in a linear-autonomous optimal-time problem with Lipschitz-continuous cost functional. These conditions involve a generalized Hamilton-Jacobi-Bellman equation.
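As a point of reference, the classical (smooth) HJB equation for the minimum-time function of a linear autonomous system reads as follows; the generalized equation studied here relaxes the differentiability required of the value function (this classical form is an assumption for orientation, not the paper's statement):

```latex
% Classical (smooth) HJB equation for the minimum-time function T(x)
% of \dot{x} = Ax + Bu with admissible controls u \in U (illustrative form):
\[
  \min_{u \in U}\ \big\langle \nabla T(x),\, Ax + Bu \big\rangle \;+\; 1 \;=\; 0 .
\]
```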
6.
The existence and uniqueness of the solution to the Bellman equation for ergodic control of one-dimensional diffusions is established under a ‘near-monotonicity’ condition on the cost. Necessary and sufficient conditions for the optimality of a stable Markov control are given in terms of the same condition.
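For context, the ergodic Bellman equation for a controlled one-dimensional diffusion with running cost c(x,u) is commonly written as follows (a standard form with assumed notation, not quoted from the paper):

```latex
% Ergodic (average-cost) HJB equation for dX_t = b(X_t,u_t)dt + sigma(X_t)dW_t:
% rho is the optimal long-run average cost, V a relative value function.
\[
  \min_{u}\ \Big[\, \tfrac{1}{2}\sigma^{2}(x)\, V''(x) + b(x,u)\, V'(x) + c(x,u) \,\Big]
  \;=\; \rho , \qquad x \in \mathbb{R} .
\]
```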
7.
8.
In this paper, an extension of some theorems of [8] is given using the notion of Clarke's generalized gradient. Moreover, a feedback law for optimal controls and Bellman's equation are obtained.
9.
In this paper we study the convergence of the Galerkin approximation method applied to the generalized Hamilton-Jacobi-Bellman (GHJB) equation over a compact set containing the origin. The GHJB equation gives the cost of an arbitrary control law and can be used to improve the performance of this control. The GHJB equation can also be used to successively approximate the Hamilton-Jacobi-Bellman equation. We state sufficient conditions that guarantee that the Galerkin approximation converges to the solution of the GHJB equation and that the resulting approximate control is stabilizing on the same region as the initial control. The method is demonstrated on a simple nonlinear system and is compared to a result obtained by using exact feedback linearization in conjunction with the LQR design method.
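In a common formulation (assumed here for orientation; the paper may use different notation), for dynamics dx/dt = f(x) + g(x)u with a fixed admissible control u(x) and cost integrand l(x) + u'Ru, the GHJB equation and the associated improved control read:

```latex
% GHJB equation for the cost V of a fixed admissible control u(x),
% and the policy-improvement step derived from it (common formulation):
\begin{align*}
  &\nabla V(x)^{\top}\big(f(x) + g(x)\,u(x)\big) + l(x) + u(x)^{\top} R\, u(x) = 0,
  \qquad V(0) = 0,\\
  &u^{+}(x) = -\tfrac{1}{2}\, R^{-1} g(x)^{\top} \nabla V(x).
\end{align*}
```

Substituting u = u⁺ back into the GHJB equation recovers the full HJB equation, which is why successive GHJB solves can approximate it, as the abstract describes.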
10.
This paper proposes a study of Reinforcement Learning (RL) for continuous state-space and time control problems, based on the theoretical framework of viscosity solutions (VSs). We use the method of dynamic programming (DP) which introduces the value function (VF), expectation of the best future cumulative reinforcement. In the continuous case, the value function satisfies a non-linear first (or second) order (depending on the deterministic or stochastic aspect of the process) differential equation called the Hamilton-Jacobi-Bellman (HJB) equation. It is well known that there exists an infinity of generalized solutions (differentiable almost everywhere) to this equation, other than the VF. We show that gradient-descent methods may converge to one of these generalized solutions, thus failing to find the optimal control. In order to solve the HJB equation, we use the powerful framework of viscosity solutions and state that there exists a unique viscosity solution to the HJB equation, which is the value function. Then, we use another main result of VSs (their stability when passing to the limit) to prove the convergence of numerical approximations schemes based on finite difference (FD) and finite element (FE) methods. These methods discretize, at some resolution, the HJB equation into a DP equation of a Markov Decision Process (MDP), which can be solved by DP methods (thanks to a strong contraction property) if all the initial data (the state dynamics and the reinforcement function) were perfectly known. However, in the RL approach, as we consider a system in interaction with some a priori (at least partially) unknown environment, which learns from experience, the initial data are not perfectly known but have to be approximated during learning. The main contribution of this work is to derive a general convergence theorem for RL algorithms when one uses only approximations (in a sense of satisfying some weak contraction property) of the initial data. This result can be used for model-based or model-free RL algorithms, with off-line or on-line updating methods, for deterministic or stochastic state dynamics (though this latter case is not described here), and based on FE or FD discretization methods. It is illustrated with several RL algorithms and one numerical simulation for the Car on the Hill problem.
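To illustrate the "discretize the HJB equation into a DP equation of an MDP and solve it by DP" step, here is a minimal sketch for a toy one-dimensional deterministic problem; the grid, dynamics, and reward below are assumptions chosen purely for illustration (this is not the Car-on-the-Hill setup of the paper):

```python
import numpy as np

def solve_hjb_by_discretization(n=101, x_min=-1.0, x_max=1.0,
                                actions=(-1.0, 0.0, 1.0),
                                rho=1.0, tol=1e-8, max_iter=10_000):
    """Semi-Lagrangian / finite-difference discretization of a toy deterministic
    HJB equation into a discounted MDP, solved by value iteration.

    Toy problem (assumed for illustration only):
        dynamics  dx/dt = u,  u in {-1, 0, +1},
        reward    r(x) = -x^2   (drive the state toward the origin),
        HJB       rho * V(x) = max_u [ r(x) + V'(x) * u ].
    """
    xs = np.linspace(x_min, x_max, n)
    h = xs[1] - xs[0]
    tau = h                        # time step tied to the grid resolution
    gamma = np.exp(-rho * tau)     # discount factor of the induced MDP
    r = -xs**2                     # running reward on the grid

    V = np.zeros(n)
    for _ in range(max_iter):
        Q = np.empty((len(actions), n))
        for k, u in enumerate(actions):
            # The off-grid successor x + tau*u is split between its two grid
            # neighbours; the interpolation weights act as the MDP's transitions.
            x_next = np.clip(xs + tau * u, x_min, x_max)
            idx = np.clip(np.searchsorted(xs, x_next) - 1, 0, n - 2)
            w = (x_next - xs[idx]) / h
            V_next = (1 - w) * V[idx] + w * V[idx + 1]
            Q[k] = tau * r + gamma * V_next
        V_new = Q.max(axis=0)                        # Bellman (DP) backup
        if np.max(np.abs(V_new - V)) < tol:          # contraction => convergence
            V = V_new
            break
        V = V_new
    policy = np.array(actions)[Q.argmax(axis=0)]
    return xs, V, policy

xs, V, policy = solve_hjb_by_discretization()
```

The interpolation weights play the role of transition probabilities of the induced MDP, and the discounted backup is a contraction, which is what guarantees convergence of the value iteration at a fixed resolution.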