1.
In this paper we study the average cost criterion induced by a regular utility function (the U-average cost criterion) for continuous-time Markov decision processes. This criterion generalizes the risk-sensitive average cost and expected average cost criteria. We first introduce an auxiliary risk-sensitive first passage optimization problem and obtain the properties of the corresponding optimal value function under mild conditions. Then we show that the pair of optimal value functions of the risk-sensitive average cost criterion and the risk-sensitive first passage criterion is a solution to the optimality equation of the risk-sensitive average cost criterion, allowing the risk-sensitivity parameter to take any nonzero value. Moreover, we show that the optimal value function of the risk-sensitive average cost criterion is continuous with respect to the risk-sensitivity parameter. Finally, we establish the connections between the U-average cost criterion and the average cost criteria induced by the identity function and the exponential utility function, and prove the existence of a U-average optimal deterministic stationary policy in the class of all randomized Markov policies.
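For reference, the risk-sensitive average cost criterion with risk-sensitivity parameter λ ≠ 0 is commonly written as follows; the notation here is a standard formulation, not taken verbatim from the paper:

```latex
J_\lambda(x,\pi) \;=\; \limsup_{T \to \infty} \frac{1}{\lambda T}
  \log \mathbb{E}_x^{\pi}\!\left[ \exp\!\left( \lambda \int_0^{T} c(x_t, a_t)\, \mathrm{d}t \right) \right],
\qquad \lambda \neq 0,
```

where c is the running cost. A formal expansion of the log-moment-generating function shows that as λ → 0 this criterion recovers the expected average cost, consistent with the continuity in the risk-sensitivity parameter established in the paper.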
2.
An M/G/1 queue where the server may take repeated vacations is considered. Whenever a busy period terminates, the server takes a vacation of random duration. At the end of each vacation, the server may either take a new vacation or resume service; if the queue is found empty, the server always takes a new vacation. The cost structure includes a holding cost per unit of time and per customer in the system and a cost each time the server is turned on. One discounted cost criterion and two average cost criteria are investigated. It is shown that the vacation policy that minimizes the discounted cost criterion over all policies (randomized, history dependent, etc.) converges to a threshold policy as the discount factor goes to zero. This result relies on a nonstandard use of the value iteration algorithm of dynamic programming and is used to prove that both average cost problems are minimized by a threshold policy.
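As an illustration only (not the paper's proof technique), the following is a minimal event-driven simulation of the multiple-vacation M/G/1 queue under an assumed N-threshold policy; the function name, distributions, and cost weights are all hypothetical:

```python
import random

def simulate_mg1_threshold(lam, service, vacation, threshold, horizon, seed=0):
    """Simulate an M/G/1 queue with multiple vacations under an N-threshold
    policy: at each vacation completion the server resumes service only if at
    least `threshold` customers are present; otherwise it takes a new vacation.
    Returns (time-average number in system, server activations per unit time)."""
    rng = random.Random(seed)
    t, n, area, activations = 0.0, 0, 0.0, 0
    next_arrival = rng.expovariate(lam)
    serving = False
    next_event = vacation(rng)          # time at which the current vacation ends
    while t < horizon:
        t_next = min(next_arrival, next_event)
        area += n * (min(t_next, horizon) - t)   # accumulate integral of n(t)
        t = t_next
        if t >= horizon:
            break
        if t == next_arrival:
            n += 1
            next_arrival = t + rng.expovariate(lam)
        elif serving:
            n -= 1                       # service completion
            if n > 0:
                next_event = t + service(rng)
            else:                        # busy period ends: take a vacation
                serving = False
                next_event = t + vacation(rng)
        else:
            # vacation ends: resume service only if the threshold is reached
            if n >= threshold:
                serving = True
                activations += 1
                next_event = t + service(rng)
            else:
                next_event = t + vacation(rng)
    return area / horizon, activations / horizon

avg_n, rate_on = simulate_mg1_threshold(
    lam=0.5,
    service=lambda r: r.expovariate(1.0),    # exponential service, an assumption
    vacation=lambda r: r.expovariate(2.0),   # exponential vacations, an assumption
    threshold=3, horizon=10_000.0)
cost_rate = 1.0 * avg_n + 5.0 * rate_on      # holding cost + switch-on cost (illustrative weights)
```

Sweeping `threshold` and comparing the resulting `cost_rate` values gives a numerical feel for why a threshold policy emerges as the minimizer.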
3.
Constrained Markov decision problems (CMDPs) with the average cost criterion and a single ergodic chain, or with the discounted cost and a general multichain structure, are considered. Conditions for stability of the optimal value and control under changes of the parameters of the problem, such as immediate costs, transition probabilities, and the discount factor, are established. Singular constrained problems, for which the optimal value and controls exhibit discontinuities, are studied.
4.
Systems & Control Letters, 2007, 56(11-12): 663-668
According to Assaf, a dynamic programming problem is called invariant if its transition mechanism depends only on the chosen action. This paper studies properties of risk-sensitive invariant problems with a general state space. The main result establishes the optimality equation for the risk-sensitive average cost criterion without any restrictions on the risk factor. Moreover, a practical algorithm is provided for solving the optimality equation in case of a finite action space.
5.
Paul Dupuis, Matthew R. James, Ian Petersen. Mathematics of Control, Signals, and Systems (MCSS), 2000, 13(4): 318-332
The purpose of this paper is to precisely characterize and prove robustness properties of risk-sensitive controllers. In particular, we establish a stochastic version of the small gain theorem. This theorem is expressed in terms of an inequality which bounds the average output power in terms of the input power. Since this inequality is closely related to the risk-sensitive criterion, our stochastic small gain theorem can be expressed in terms of the risk-sensitive criterion. This provides a concrete motivation for the use of the risk-sensitive criterion in stochastic robustness.
Date received: September 10, 1998. Date revised: June 5, 2000.
6.
The problem considered is one of simultaneously identifying an unknown system while adequately controlling it. The system can be any fairly general discrete-time system and the cost criterion can be either of a discounted type or of a long-term average type, the chief restriction being that the unknown parameter lies in a finite parameter set. For a previously introduced scheme of identification and control based on "biased" maximum likelihood estimates, it is shown that 1) every Cesaro-limit point of the parameter estimates is "closed-loop equivalent" to the unknown parameter; 2) for both the discounted and long-term average cost criteria, the adaptive control law Cesaro-converges to the set of optimal control laws; and 3) in the case of the long-term average cost criterion, the actual cost incurred by the use of the adaptive controller is optimal and cannot be bettered even if one knew the value of the unknown parameter at the start.
7.
Modified age replacement policies, in which the system is not always as good as new after each maintenance, are discussed. The expected total discounted cost over an infinite time span, incorporating several costs and a continuous-type discount rate, is applied as the criterion of optimality, and the optimum policies minimizing this cost are obtained. It is shown that, under certain conditions, there exists a finite and unique optimum policy. Relations to the existing literature are also shown.
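As a baseline for comparison (this is the classical perfect-repair case, not the paper's modified model), with failure distribution F, planned-replacement cost c_p, failure-replacement cost c_f, and discount rate α, the expected total discounted cost of the policy "replace at age T or at failure, whichever comes first" follows from the renewal argument Z(T) = (discounted cost of first cycle) + E[e^{-αL}] Z(T), where L is the cycle length:

```latex
Z(T) \;=\;
\frac{c_f \displaystyle\int_0^{T} e^{-\alpha t}\, \mathrm{d}F(t) \;+\; c_p\, e^{-\alpha T}\,\overline{F}(T)}
     {1 \;-\; \displaystyle\int_0^{T} e^{-\alpha t}\, \mathrm{d}F(t) \;-\; e^{-\alpha T}\,\overline{F}(T)},
\qquad \overline{F}(T) = 1 - F(T).
```

The modified policies in the paper change the cycle structure of this argument, since after maintenance the system is not as good as new.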
8.
9.
Finite-dimensional optimal risk-sensitive filters and smoothers are obtained for discrete-time nonlinear systems by adjusting the standard exponential of a quadratic risk-sensitive cost index to one involving the plant nonlinearity. It is seen that these filters and smoothers are the same as those for a fictitious linear plant with the exponential of squared estimation error as the corresponding risk-sensitive cost index. Such finite-dimensional filters do not exist for nonlinear systems in the case of minimum variance filtering and control.
10.
Rolando Cavazos-Cadena. Discrete Event Dynamic Systems, 2016, 26(4): 633-656
This work concerns semi-Markov chains evolving on a finite state space. The system development generates a cost when a transition is announced, as well as a holding cost which is incurred continuously during each sojourn time. It is assumed that these costs are paid by an observer with positive and constant risk-sensitivity, and the overall performance of the system is measured by the corresponding (long-run) risk-sensitive average cost criterion. In this framework, conditions are provided under which the average index does not depend on the initial state and is characterized in terms of a single Poisson equation.
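For orientation, in the discrete-time analogue the single Poisson equation characterizing the risk-sensitive average cost takes a multiplicative form; the notation below (risk-sensitivity λ > 0, average cost g, relative value function h) is a standard formulation, not the paper's semi-Markov version, which additionally accounts for random sojourn times and holding costs:

```latex
e^{\lambda\,(g + h(x))} \;=\; \sum_{y \in S} p(x,y)\, e^{\lambda\,(c(x,y) + h(y))},
\qquad x \in S.
```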
11.
Dynamic games in which each player has an exponential cost criterion are referred to as risk-sensitive dynamic games. In this note, Nash equilibria are considered for such games. Feedback risk-sensitive Nash equilibrium solutions are derived for two-person discrete-time linear-quadratic nonzero-sum games, both under complete state observation and shared partial observation.
12.
Output feedback control design for strict-feedback stochastic nonlinear systems under a risk-sensitive cost
Yungang Liu, Zigang Pan, Songjiao Shi. IEEE Transactions on Automatic Control, 2003, 48(3): 509-513
In this note, we study the problem of output-feedback control design for a class of strict-feedback stochastic nonlinear systems. Under an infinite-horizon risk-sensitive cost criterion, the controller designed can guarantee an arbitrarily small long-term average cost for any risk-sensitivity parameter and achieve boundedness in probability for the closed-loop system, using the integrator backstepping methodology. Furthermore, the controller preserves the equilibrium at the origin of the nonlinear system.
13.
14.
A new approach to studying the ergodicity of filtering processes is presented. It is based on the vanishing discount approach applied to discounted functionals of the filtering process. We show that the limit superior of the Cesaro averages of the functionals is the same for all initial conditions, from which the uniqueness of invariant measures of filtering processes follows. The approach rests on a certain assumption, for which we provide a sufficient condition using concavity arguments. In addition, we show the existence of solutions to the Poisson equation corresponding to the filtering process with a concave functional. The assumptions are then extended to the controlled case and, using similar concavity arguments, we obtain the existence of solutions to the Bellman equation corresponding to the partially observed average cost per unit time problem.
15.
H. E. Psillakis. International Journal of Control, 2013, 86(2): 107-118
The tracking control problem for a class of stochastic and uncertain non-linear systems is addressed. The proposed controller uses suitable radial basis function neural network designs to approximate the unknown non-linearities, while it is arbitrarily regulated in order to effectively penalize the tracking error. This regulation is implemented through a risk-sensitivity parameter. A stability analysis based on Lyapunov functions obtained by the backstepping technique proves that all the error variables are bounded in probability; simultaneously, for any given risk-sensitivity parameter the system performance is regulated with respect to both a small desired average tracking error and a low long-term average cost in accordance with a risk-sensitive cost criterion. Moreover, the larger this parameter is, the smaller the region in which the mean square tracking error is semiglobally uniformly ultimately bounded, and the lower the long-term average cost achieved. The effectiveness of the design approach is illustrated by simulation results, which make clear how one can achieve a tradeoff between good response and control effort.
16.
Markov decision processes (MDPs) with discounted cost are equivalent to processes with a finite random duration, and, hence, the discount factor models a (random) time horizon for the life of the process. We elaborate on this idea, but show that an objective function which is a linear combination of several discounted costs (each with a different discount factor) does not, in general, model processes with several time scales, but rather processes with partial information.
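The equivalence mentioned above, namely that a discount factor β acts like an independent geometric "killing" time with per-step survival probability β, can be checked numerically. The small chain and all names below are illustrative:

```python
import random

def discounted_value(costs, P, beta, tol=1e-10):
    """Exact discounted cost vector via value iteration: v = c + beta * P v."""
    n = len(costs)
    v = [0.0] * n
    while True:
        new = [costs[i] + beta * sum(P[i][j] * v[j] for j in range(n))
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, v)) < tol:
            return new
        v = new

def killed_horizon_value(costs, P, beta, x0, runs=100_000, seed=1):
    """Monte Carlo estimate of the *undiscounted* cost accumulated until an
    independent killing time T with P(T > t) = beta**t (geometric horizon)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        x = x0
        while True:
            total += costs[x]
            if rng.random() >= beta:      # the process is killed after this step
                break
            x = rng.choices(range(len(P)), weights=P[x])[0]
    return total / runs
```

For example, with `costs = [1.0, 3.0]`, `P = [[0.5, 0.5], [0.2, 0.8]]`, and `beta = 0.9`, the two computations agree up to Monte Carlo error. The paper's point is that a sum of such objectives with different β's does not, in general, correspond to one process with a single (or layered) random horizon.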
17.
18.
In this paper, we address the problem of risk-sensitive filtering and smoothing for discrete-time Hidden Markov Models (HMM) with finite-discrete states. The objective of risk-sensitive filtering is to minimise the expectation of the exponential of the squared estimation error weighted by a risk-sensitive parameter. We use the so-called Reference Probability Method in solving this problem. We achieve finite-dimensional linear recursions in the information state, and thereby the state estimate that minimises the risk-sensitive cost index. Also, fixed-interval smoothing results are derived. We show that L2 or risk-neutral filtering for HMMs can be extracted as a limiting case of the risk-sensitive filtering problem when the risk-sensitive parameter approaches zero.
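In a standard formulation of this objective (the notation here, with risk-sensitive parameter θ > 0 and observation history 𝒴_k, is assumed rather than quoted from the paper), the filter chooses the estimate minimizing an exponential-of-sum cost:

```latex
\hat{x}_{k} \;=\; \arg\min_{\xi}\;
\mathbb{E}\!\left[ \exp\!\left( \frac{\theta}{2} \sum_{t=0}^{k} \lVert x_t - \hat{x}_t \rVert^{2} \right)
\,\middle|\, \mathcal{Y}_k \right].
```

Since (2/θ) log E[exp((θ/2) S)] → E[S] as θ → 0, the risk-neutral (L2) filter is recovered in the small-θ limit, as stated in the abstract.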
19.
We provide an analytical comparison between discounted and average reward temporal-difference (TD) learning with linearly parameterized approximations. We first consider the asymptotic behavior of the two algorithms. We show that as the discount factor approaches 1, the value function produced by discounted TD approaches the differential value function generated by average reward TD. We further argue that if the constant function—which is typically used as one of the basis functions in discounted TD—is appropriately scaled, the transient behaviors of the two algorithms are also similar. Our analysis suggests that the computational advantages of average reward TD that have been observed in some prior empirical work may have been caused by inappropriate basis function scaling rather than fundamental differences in problem formulations or algorithms.
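A minimal tabular sketch of the two updates being compared (one-hot features, hence a special case of linear parameterization; the chain, rewards, and step sizes are illustrative assumptions):

```python
import random

def run_td(P, r, steps=200_000, gamma=0.99, alpha=0.01, k=0.001, seed=0):
    """Run discounted TD(0) and average-reward TD(0) on the same trajectory
    of a small Markov chain. Returns (discounted values v, differential
    values h, average-reward estimate rho)."""
    rng = random.Random(seed)
    n = len(P)
    v = [0.0] * n        # discounted value estimates
    h = [0.0] * n        # differential value estimates
    rho = 0.0            # running average-reward estimate
    x = 0
    for _ in range(steps):
        y = rng.choices(range(n), weights=P[x])[0]
        reward = r[x]
        # discounted TD(0): delta = r + gamma * v(x') - v(x)
        v[x] += alpha * (reward + gamma * v[y] - v[x])
        # average-reward TD(0): delta = r - rho + h(x') - h(x)
        rho += k * (reward - rho)
        h[x] += alpha * (reward - rho + h[y] - h[x])
        x = y
    return v, h, rho
```

Centering the discounted values (subtracting their mean) and comparing them with the centered differential values for gamma close to 1 illustrates the convergence of discounted values to differential values discussed in the abstract.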
20.
Risk-Sensitive Reinforcement Learning
Most reinforcement learning algorithms optimize the expected return of a Markov Decision Problem. Practice has taught us the lesson that this criterion is not always the most suitable, because many applications require robust control strategies which also take into account the variance of the return. The classical control literature provides several techniques to deal with risk-sensitive optimization goals, such as the so-called worst-case optimality criterion, which focuses exclusively on risk-avoiding policies, or classical risk-sensitive control, which transforms the returns by exponential utility functions. While the first approach is typically too restrictive, the latter suffers from the absence of an obvious way to design a corresponding model-free reinforcement learning algorithm. Our risk-sensitive reinforcement learning algorithm is based on a very different philosophy. Instead of transforming the return of the process, we transform the temporal differences during learning. While our approach reflects important properties of the classical exponential utility framework, we avoid its serious drawbacks for learning. Based on an extended set of optimality equations, we are able to formulate risk-sensitive versions of various well-known reinforcement learning algorithms which converge with probability one under the usual conditions.
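One way to sketch the transformed-temporal-difference idea in code; the two-armed bandit, the payoff values, and the function names are illustrative assumptions, and the piecewise-linear transform below is one concrete choice consistent with the description above:

```python
import random

def chi(d, kappa):
    """Piecewise-linear transform of a TD error: for kappa in (0, 1) negative
    surprises are amplified (risk-averse behavior); for kappa in (-1, 0)
    positive surprises are; kappa = 0 recovers the ordinary risk-neutral update."""
    return (1 - kappa) * d if d > 0 else (1 + kappa) * d

def learn_bandit(kappa, steps=50_000, alpha=0.01, seed=0):
    """Two-armed bandit sketch: arm 0 ('safe') always pays 0.5; arm 1 ('risky')
    pays +3.0 w.p. 0.6 and -1.0 w.p. 0.4 (mean 1.4). Arms are sampled in turn
    so both value estimates converge."""
    rng = random.Random(seed)
    q = [0.0, 0.0]
    for t in range(steps):
        a = t % 2
        r = 0.5 if a == 0 else (3.0 if rng.random() < 0.6 else -1.0)
        # transform the temporal difference, not the return
        q[a] += alpha * chi(r - q[a], kappa)
    return q
```

With `kappa = 0` the update is the ordinary risk-neutral one and the risky arm (higher mean) receives the higher value; with `kappa` close to 1, negative surprises dominate and the safe arm is preferred.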