Similar Documents (20 results)
1.
In this paper we study the average cost criterion induced by a regular utility function (the U-average cost criterion) for continuous-time Markov decision processes. This criterion generalizes both the risk-sensitive average cost and the expected average cost criteria. We first introduce an auxiliary risk-sensitive first passage optimization problem and obtain the properties of the corresponding optimal value function under mild conditions. Then we show that the pair of optimal value functions of the risk-sensitive average cost criterion and the risk-sensitive first passage criterion is a solution to the optimality equation of the risk-sensitive average cost criterion, allowing the risk-sensitivity parameter to take any nonzero value. Moreover, we show that the optimal value function of the risk-sensitive average cost criterion is continuous with respect to the risk-sensitivity parameter. Finally, we establish the connections between the U-average cost criterion and the average cost criteria induced by the identity function and the exponential utility function, and prove the existence of a U-average optimal deterministic stationary policy in the class of all randomized Markov policies.
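As a reference point, the exponential-utility (risk-sensitive) average cost that the U-average criterion generalizes is commonly written as follows; the notation (cost rate c, risk-sensitivity parameter θ) is generic rather than the paper's own:

\[
J_\theta(x,\pi) \;=\; \limsup_{T\to\infty} \frac{1}{\theta T}\,
\ln \mathbb{E}^{\pi}_{x}\!\left[\exp\!\Big(\theta \int_0^T c(x_t,a_t)\,dt\Big)\right],
\qquad \theta \neq 0.
\]

Letting \(\theta \to 0\) recovers the expected (risk-neutral) average cost, one of the two special cases whose connection to the U-average criterion the paper makes precise.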

2.
An M/G/1 queue in which the server may take repeated vacations is considered. Whenever a busy period terminates, the server takes a vacation of random duration. At the end of each vacation, the server may either take a new vacation or resume service; if the queue is found empty, the server always takes a new vacation. The cost structure includes a holding cost per unit of time per customer in the system and a cost incurred each time the server is turned on. One discounted cost criterion and two average cost criteria are investigated. It is shown that the vacation policy minimizing the discounted cost criterion over all policies (randomized, history dependent, etc.) converges to a threshold policy as the discount factor goes to zero. This result relies on a nonstandard use of the value iteration algorithm of dynamic programming and is used to prove that both average cost problems are also minimized by a threshold policy.
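For intuition, here is a minimal sketch of how value iteration can expose the threshold structure in a simplified version of this model. It replaces the M/G/1 dynamics with a uniformized M/M/1 queue and exponential vacations; all rates, costs, and the truncation level are illustrative assumptions, not the paper's model:

```python
import numpy as np

# Simplified, uniformized M/M/1 vacation queue (the paper treats M/G/1
# with general vacations); every number below is illustrative.
lam, mu, nu = 0.4, 1.0, 0.5   # arrival, service, vacation-completion rates
h, K = 1.0, 5.0               # holding cost rate, server switch-on cost
beta, N = 0.99, 60            # discount factor, queue-length truncation
U = lam + mu + nu             # uniformization constant

# States (n, s): n customers, s = 0 server on vacation, s = 1 server busy.
V = np.zeros((N + 1, 2))
for _ in range(3000):
    Vn = np.empty_like(V)
    for n in range(N + 1):
        up = min(n + 1, N)
        # Busy server: arrivals at rate lam, departures at rate mu;
        # when the last customer leaves, a vacation begins.
        Vn[n, 1] = h * n + beta / U * (
            lam * V[up, 1]
            + mu * (V[n - 1, 1] if n > 1 else V[0, 0])
            + nu * V[n, 1]            # fictitious self-loop
        )
        # Vacationing server: at vacation completion (rate nu) choose to
        # resume (pay K) or take another vacation; empty queue forces the latter.
        resume = K + V[n, 1] if n > 0 else np.inf
        Vn[n, 0] = h * n + beta / U * (
            lam * V[up, 0]
            + mu * V[n, 0]            # fictitious self-loop
            + nu * min(resume, V[n, 0])
        )
    V = Vn

# The decision at vacation completion switches at a queue-length threshold.
thr = next((n for n in range(1, N) if K + V[n, 1] <= V[n, 0]), None)
print("resume service once the queue reaches about", thr, "customers")
```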

3.
Constrained Markov decision problems (CMDPs) with the average cost criterion and a single ergodic chain, or with the discounted cost criterion and a general multichain structure, are considered. Conditions are established for the stability of the optimal value and optimal control with respect to changes in the parameters of the problem, such as the immediate costs, the transition probabilities, and the discount factor. Singular constrained problems, for which the optimal value and controls exhibit discontinuities, are also studied.
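The single-ergodic-chain average-cost case is commonly analyzed through its occupation-measure linear program, whose sensitivity to the problem data is exactly what such stability results address. Below is a minimal sketch of that LP on a randomly generated MDP; the chain, costs, and constraint level are all illustrative:

```python
import numpy as np
from scipy.optimize import linprog

# Occupation-measure LP for a constrained average-cost MDP with a single
# ergodic class; every piece of data below is illustrative.
rng = np.random.default_rng(0)
S, A = 4, 2
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[x, a, :] = next-state law
c = rng.uniform(0, 1, size=(S, A))           # cost to be minimized
d = rng.uniform(0, 1, size=(S, A))           # constrained cost
D = 0.5                                      # illustrative constraint level

# Decision variables rho[x, a] >= 0 (stationary state-action frequencies).
# Balance: sum_a rho[y, a] = sum_{x, a} rho[x, a] P[x, a, y] for every y,
# plus the normalization sum rho = 1.
A_eq = np.zeros((S + 1, S * A))
for y in range(S):
    for x in range(S):
        for a in range(A):
            A_eq[y, x * A + a] = float(x == y) - P[x, a, y]
A_eq[S, :] = 1.0
b_eq = np.r_[np.zeros(S), 1.0]

res = linprog(c.ravel(), A_ub=d.ravel()[None, :], b_ub=[D],
              A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
rho = res.x.reshape(S, A)
# An optimal stationary policy randomizes in at most one state here.
policy = rho / rho.sum(axis=1, keepdims=True)
print("optimal constrained average cost:", res.fun)
```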

4.
Systems & Control Letters, 2007, 56(11-12): 663-668
According to Assaf, a dynamic programming problem is called invariant if its transition mechanism depends only on the chosen action. This paper studies properties of risk-sensitive invariant problems with a general state space. The main result establishes the optimality equation for the risk-sensitive average cost criterion without any restrictions on the risk factor. Moreover, a practical algorithm is provided for solving the optimality equation in the case of a finite action space.

5.
The purpose of this paper is to characterize and prove robustness properties of risk-sensitive controllers precisely. In particular, we establish a stochastic version of the small gain theorem. This theorem is expressed in terms of an inequality that bounds the average output power in terms of the input power. Since this inequality is closely related to the risk-sensitive criterion, our stochastic small gain theorem can be expressed in terms of the risk-sensitive criterion. This provides a concrete motivation for using the risk-sensitive criterion to obtain stochastic robustness.
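A hedged sketch of the type of inequality involved (generic notation, not the paper's exact statement): for disturbance input w and regulated output z, a stochastic small gain bound on average power takes the form

\[
\limsup_{T\to\infty}\frac{1}{T}\,\mathbb{E}\!\int_0^T \lvert z_t\rvert^2\,dt
\;\le\; \gamma^2\,\limsup_{T\to\infty}\frac{1}{T}\,\mathbb{E}\!\int_0^T \lvert w_t\rvert^2\,dt \;+\; \beta,
\]

and the link to risk sensitivity comes from indices of the form \(\frac{2}{\theta T}\log\mathbb{E}\exp\big(\frac{\theta}{2}\int_0^T \lvert z_t\rvert^2\,dt\big)\), whose small-\(\theta\) limit reproduces the average output power on the left.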

6.
The problem considered is one of simultaneously identifying an unknown system while adequately controlling it. The system can be any fairly general discrete-time system and the cost criterion can be either of a discounted type or of a long-term average type, the chief restriction being that the unknown parameter lies in a finite parameter set. For a previously introduced scheme of identification and control based on "biased" maximum likelihood estimates, it is shown that 1) every Cesaro-limit point of the parameter estimates is "closed-loop equivalent" to the unknown parameter; 2) for both the discounted and long-term average cost criteria, the adaptive control law Cesaro-converges to the set of optimal control laws; and 3) in the case of the long-term average cost criterion, the actual cost incurred by the use of the adaptive controller is optimal and cannot be bettered even if one knew the value of the unknown parameter at the start.

7.
Modified age replacement policies, in which a system is not always as good as new after each maintenance action, are discussed. The expected total discounted cost over an infinite time span, incorporating several cost components and a continuous-type discount rate, is applied as the criterion of optimality, and the optimum policies minimizing this cost are obtained. It is shown that, under certain conditions, a finite and unique optimum policy exists. Relations to existing publications are also shown.
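For orientation, in the classical (unmodified) age replacement model with perfect repair, a simplification of the imperfect-maintenance setting studied here, the expected total discounted cost of replacing at age T is commonly written as

\[
C(T) \;=\; \frac{c_f \displaystyle\int_0^{T} e^{-\alpha t}\,dF(t) \;+\; c_p\, e^{-\alpha T}\,\bar F(T)}
{1 \;-\; \displaystyle\int_0^{T} e^{-\alpha t}\,dF(t) \;-\; e^{-\alpha T}\,\bar F(T)},
\]

where F is the failure-time distribution, \(\bar F = 1 - F\), \(\alpha > 0\) is the continuous discount rate, and \(c_f\), \(c_p\) are the failure and preventive replacement costs; the optimum age minimizes C(T). (Generic renewal-theoretic notation, not the paper's.)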

8.
9.
Finite-dimensional optimal risk-sensitive filters and smoothers are obtained for discrete-time nonlinear systems by adjusting the standard exponential-of-quadratic risk-sensitive cost index to one involving the plant nonlinearity. It is seen that these filters and smoothers are the same as those for a fictitious linear plant with the exponential of the squared estimation error as the corresponding risk-sensitive cost index. Such finite-dimensional filters do not exist for nonlinear systems in the case of minimum variance filtering and control.

10.
This work concerns semi-Markov chains evolving on a finite state space. The system development generates a cost when a transition is announced, as well as a holding cost which is incurred continuously during each sojourn time. It is assumed that these costs are paid by an observer with positive and constant risk-sensitivity, and the overall performance of the system is measured by the corresponding (long-run) risk-sensitive average cost criterion. In this framework, conditions are provided under which the average index does not depend on the initial state and is characterized in terms of a single Poisson equation.
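For orientation, in the simpler discrete-time Markov chain case the risk-sensitive average cost g and relative value function h are tied together by a multiplicative Poisson equation of the generic form

\[
e^{\theta\,(g + h(x))} \;=\; e^{\theta\, C(x)} \sum_{y} p(x,y)\, e^{\theta\, h(y)},
\]

with θ > 0 the risk-sensitivity and C the one-step cost; the semi-Markov equation studied here additionally weights costs by the random sojourn times. (A sketch in generic notation, not the paper's statement.)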

11.
Dynamic games in which each player has an exponential cost criterion are referred to as risk-sensitive dynamic games. In this note, Nash equilibria are considered for such games. Feedback risk-sensitive Nash equilibrium solutions are derived for two-person discrete-time linear-quadratic nonzero-sum games, both under complete state observation and under shared partial observation.
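A hedged sketch of the criterion involved: in such games each player i minimizes an exponential (risk-sensitive) cost of the generic form

\[
J_i(u^1,u^2) \;=\; \frac{1}{\theta_i}\,\log \mathbb{E}\!\left[\exp\!\Big(\theta_i \sum_{k=0}^{N-1}\big(x_k^\top Q_i\, x_k + (u_k^i)^\top R_i\, u_k^i\big)\Big)\right],
\]

and a feedback Nash equilibrium is a pair of state-feedback strategies from which neither player benefits by a unilateral deviation. (Notation generic, not the note's.)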

12.
In this note, we study the problem of output-feedback control design for a class of strict-feedback stochastic nonlinear systems. Under an infinite-horizon risk-sensitive cost criterion, the controller designed can guarantee an arbitrarily small long-term average cost for an arbitrary risk-sensitivity parameter and achieve boundedness in probability for the closed-loop system, using the integrator backstepping methodology. Furthermore, the controller preserves the equilibrium at the origin of the nonlinear system.

13.
The output-feedback design problem is studied for a class of strict-feedback stochastic nonlinear systems. Under an infinite-horizon risk-sensitivity index, a controller is designed by means of the integrator backstepping technique. The designed controller guarantees an arbitrarily small index for any risk-sensitivity coefficient, and the closed-loop system is bounded in probability. In particular, the designed controller also preserves the equilibrium condition. A simulation example verifies the correctness of the theoretical results.

14.
A new approach to studying ergodicity of filtering processes is presented. It is based on the vanishing discount approach applied to discounted functionals of the filtering process. We show that the limit superior of the Cesaro averages of the functionals is the same for all initial conditions, from which the uniqueness of invariant measures of the filtering process follows. The approach rests on a certain assumption, for which we provide a sufficient condition using concavity arguments. In addition, we show the existence of solutions to the Poisson equation corresponding to the filtering process with a concave functional. The assumptions are then extended to the controlled case, and using similar concavity arguments we obtain the existence of solutions to the Bellman equation corresponding to the partially observed average-cost-per-unit-time problem.
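The vanishing discount device referred to here is the standard one: with \(v_\beta\) the β-discounted value of the functional and a fixed reference point \(x_0\), one studies

\[
h_\beta(x) \;=\; v_\beta(x) - v_\beta(x_0),
\qquad
\rho \;=\; \lim_{\beta \uparrow 1}\,(1-\beta)\,v_\beta(x),
\]

so that, when these limits behave, ρ is the average cost per unit time and limits of \(h_\beta\) supply the relative value function in the Poisson equation (and, in the controlled case, the Bellman equation).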

15.
The tracking control problem for a class of stochastic and uncertain non-linear systems is addressed. The proposed controller uses suitable radial basis function neural network designs to approximate the unknown non-linearities, while it is regulated arbitrarily in order to penalize the tracking error effectively. This regulation is implemented through a risk-sensitivity parameter. A stability analysis based on Lyapunov functions obtained by the backstepping technique proves that all the error variables are bounded in probability; simultaneously, for any given risk-sensitivity parameter the system performance is regulated with respect to both a desired small average tracking error and a low long-term average cost in accordance with a risk-sensitive cost criterion. Moreover, the larger this parameter is, the smaller the region in which the mean-square tracking error is semiglobally uniformly ultimately bounded, and the lower the long-term average cost achieved. The effectiveness of the design approach is illustrated by simulation results, wherein it becomes clear how one can achieve a tradeoff between good response and control effort.
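The approximation device mentioned is the usual one: over a compact set, each unknown nonlinearity f is represented by a radial basis function network,

\[
f(Z) \;=\; W^{*\top} S(Z) + \varepsilon(Z),
\qquad
S_i(Z) \;=\; \exp\!\Big(-\frac{\lVert Z - \xi_i\rVert^2}{\eta_i^2}\Big),
\]

with ideal weights \(W^*\), centers \(\xi_i\), widths \(\eta_i\), and a bounded reconstruction error ε; the adaptive law then tunes an estimate of \(W^*\) inside the backstepping design. (Generic notation, not the paper's.)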

16.
Markov decision processes (MDP) with discounted cost are equivalent to processes with a finite random duration, and, hence, the discount factor models a (random) time horizon for the life of the process. We elaborate on this idea, but show that an objective function which is a linear combination of several discounted costs (each with a different discount factor) does not, in general, model processes with several time scales, but rather processes with partial information.
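The equivalence in the first sentence is easy to check numerically: discounting by β gives the same expected total cost as running undiscounted until an independent geometric lifetime with success probability 1-β. A minimal sketch with an illustrative chain and costs:

```python
import numpy as np

# Check: beta-discounted value = expected undiscounted cost up to an
# independent geometric horizon. Chain and costs are illustrative.
rng = np.random.default_rng(1)
beta, S = 0.9, 3
P = rng.dirichlet(np.ones(S), size=S)
c = rng.uniform(0, 1, size=S)

def episode(horizon):
    x, total = 0, 0.0
    for _ in range(horizon):
        total += c[x]
        x = rng.choice(S, p=P[x])
    return total

# Analytic discounted value from state 0: v = (I - beta P)^{-1} c.
v = np.linalg.solve(np.eye(S) - beta * P, c)[0]

# Monte Carlo with a geometric lifetime: P(tau = k) = beta^(k-1) (1-beta).
sims = [episode(rng.geometric(1 - beta)) for _ in range(20000)]
print(v, np.mean(sims))   # the two estimates should agree
```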

17.
Under an infinite-horizon risk-sensitivity index, the controller design problem is studied for a class of Markov jump nonlinear systems in strict-feedback form. First, the solvability of this problem is reduced to the solvability of a class of HJB equations; then, based on this equation, a mode-independent controller is constructed which guarantees that the closed-loop system is bounded in probability and that the risk-sensitivity index is no greater than any given positive constant; in particular, when the noise terms vanish at the origin, the risk-sensitivity index can be made exactly zero. Finally, a simulation example verifies the correctness of the theoretical results.

18.
In this paper, we address the problem of risk-sensitive filtering and smoothing for discrete-time Hidden Markov Models (HMM) with finite-discrete states. The objective of risk-sensitive filtering is to minimise the expectation of the exponential of the squared estimation error weighted by a risk-sensitive parameter. We use the so-called Reference Probability Method in solving this problem. We achieve finite-dimensional linear recursions in the information state, and thereby the state estimate that minimises the risk-sensitive cost index. Also, fixed-interval smoothing results are derived. We show that L2 or risk-neutral filtering for HMMs can be extracted as a limiting case of the risk-sensitive filtering problem when the risk-sensitive parameter approaches zero.
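In generic notation (not necessarily the paper's), the index being minimised over the estimates is of the form

\[
\mathbb{E}\!\left[\exp\!\Big(\tfrac{\theta}{2}\sum_{k=0}^{N}\lVert x_k - \hat x_k\rVert^2\Big)\right],
\]

and since \(\frac{1}{\theta}\log\mathbb{E}\,e^{\theta Z} \to \mathbb{E}\,Z\) as \(\theta \to 0\), the criterion collapses to the sum of mean-square errors, which is the risk-neutral (L2) limit the abstract refers to.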

19.
We provide an analytical comparison between discounted and average reward temporal-difference (TD) learning with linearly parameterized approximations. We first consider the asymptotic behavior of the two algorithms. We show that as the discount factor approaches 1, the value function produced by discounted TD approaches the differential value function generated by average reward TD. We further argue that if the constant function, which is typically used as one of the basis functions in discounted TD, is appropriately scaled, the transient behaviors of the two algorithms are also similar. Our analysis suggests that the computational advantages of average reward TD that have been observed in some prior empirical work may have been caused by inappropriate basis function scaling rather than fundamental differences in problem formulations or algorithms.
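A minimal sketch of the two TD(0) updates being compared; the chain, rewards, features, and step sizes are all illustrative (the abstract's scaling argument concerns the constant basis function usually included in discounted TD, omitted here for simplicity):

```python
import numpy as np

# Discounted vs. average-reward TD(0) with linear features (illustrative).
rng = np.random.default_rng(2)
S, k = 5, 3
P = rng.dirichlet(np.ones(S), size=S)
r = rng.uniform(0, 1, size=S)
phi = rng.normal(size=(S, k))
gamma, alpha = 0.99, 0.01

w_disc = np.zeros(k)   # discounted TD(0) weights
w_avg = np.zeros(k)    # average-reward TD(0) weights
rho = 0.0              # running estimate of the average reward
x = 0
for _ in range(200_000):
    y = rng.choice(S, p=P[x])
    # Discounted TD(0).
    delta = r[x] + gamma * phi[y] @ w_disc - phi[x] @ w_disc
    w_disc += alpha * delta * phi[x]
    # Average-reward TD(0): no discounting; subtract the average reward.
    delta = r[x] - rho + phi[y] @ w_avg - phi[x] @ w_avg
    w_avg += alpha * delta * phi[x]
    rho += 10 * alpha * (r[x] - rho)
    x = y

print("discounted values:   ", phi @ w_disc)
print("differential values: ", phi @ w_avg)
```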

20.
Risk-Sensitive Reinforcement Learning
Mihatsch, Oliver; Neuneier, Ralph. Machine Learning, 2002, 49(2-3): 267-290
Most reinforcement learning algorithms optimize the expected return of a Markov Decision Problem. Practice has taught us the lesson that this criterion is not always the most suitable, because many applications require robust control strategies which also take into account the variance of the return. The classical control literature provides several techniques to deal with risk-sensitive optimization goals, such as the so-called worst-case optimality criterion, which focuses exclusively on risk-avoiding policies, and classical risk-sensitive control, which transforms the returns by exponential utility functions. While the first approach is typically too restrictive, the latter suffers from the absence of an obvious way to design a corresponding model-free reinforcement learning algorithm. Our risk-sensitive reinforcement learning algorithm is based on a very different philosophy. Instead of transforming the return of the process, we transform the temporal differences during learning. While our approach reflects important properties of the classical exponential utility framework, we avoid its serious drawbacks for learning. Based on an extended set of optimality equations we are able to formulate risk-sensitive versions of various well-known reinforcement learning algorithms which converge with probability one under the usual conditions.
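A minimal sketch of the abstract's idea of transforming the temporal differences, here grafted onto tabular Q-learning. The piecewise-linear map with risk parameter κ follows the paper's κ-scheme as it is usually cited; the environment and all rates are illustrative assumptions:

```python
import numpy as np

# Risk-sensitive Q-learning: the TD error (not the return) is transformed.
# kappa in (-1, 1); kappa > 0 overweights bad surprises (risk-averse).
rng = np.random.default_rng(3)
S, A, gamma, alpha, kappa = 4, 2, 0.95, 0.1, 0.5
P = rng.dirichlet(np.ones(S), size=(S, A))
R = rng.uniform(0, 1, size=(S, A))

def transform(delta):
    # Negative TD errors are amplified, positive ones attenuated.
    return (1 - kappa) * delta if delta > 0 else (1 + kappa) * delta

Q = np.zeros((S, A))
x = 0
for _ in range(100_000):
    a = rng.integers(A) if rng.random() < 0.1 else int(Q[x].argmax())
    y = rng.choice(S, p=P[x, a])
    delta = R[x, a] + gamma * Q[y].max() - Q[x, a]
    Q[x, a] += alpha * transform(delta)   # transformed, not raw, TD error
    x = y

print(Q)
```

Setting kappa = 0 recovers standard (risk-neutral) Q-learning, which matches the abstract's claim that the scheme extends well-known algorithms rather than replacing them.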
