Similar Literature
20 similar documents found.
1.
This paper deals with discrete-time Markov control processes with Borel state space, allowing unbounded costs and noncompact control sets. For these models, the existence of average optimal stationary policies has recently been established under very general assumptions, using an optimality inequality. Here we give a condition, which is a strengthened version of a variant of the ‘vanishing discount factor’ approach, for the optimality equation to hold.
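For orientation, the optimality inequality and the average cost optimality equation (ACOE) that recur throughout these abstracts take the following standard form; the notation here (ρ* for the optimal average cost, h for a relative value function, c for the cost, Q for the transition kernel) is assumed for illustration rather than taken from the paper:

```latex
% Average cost optimality inequality (sufficient for average optimality):
\rho^* + h(x) \;\ge\; \min_{a \in A(x)} \Big[ c(x,a) + \int_X h(y)\, Q(dy \mid x,a) \Big],
\qquad x \in X .
% The strengthened condition aims at upgrading the inequality to the
% average cost optimality equation (ACOE):
\rho^* + h(x) \;=\; \min_{a \in A(x)} \Big[ c(x,a) + \int_X h(y)\, Q(dy \mid x,a) \Big].
```

Any stationary policy selecting a minimizer on the right-hand side is then average optimal.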

2.
Necessary conditions are given for the existence of a bounded solution to the optimality equation arising in Markov decision processes, under a long-run, expected average cost criterion. The relationships of some of our results to known sufficient conditions are also shown.

3.
We consider average reward Markov decision processes with discrete time parameter and denumerable state space. We are concerned with the following problem: Find necessary and sufficient conditions so that, for arbitrary bounded reward function, the corresponding average reward optimality equation has a bounded solution. This problem is solved for a class of systems including the case in which, under the action of any stationary policy, the state space is an irreducible positive recurrent class.

4.
This article deals with multiconstrained continuous-time Markov decision processes in a denumerable state space, with unbounded cost and transition rates. The criterion to be optimised is the long-run expected average cost, and several kinds of constraints are imposed on some associated costs. The existence of a constrained optimal policy is ensured under suitable conditions by using a martingale technique and introducing an occupation measure. Furthermore, for the unichain model, we transform this multiconstrained problem into an equivalent linear programming problem, and then construct a constrained optimal policy from an optimal solution of that linear program. Finally, we use an example of a controlled queueing system to illustrate an application of our results.
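A schematic version of the linear programming step, written for a denumerable unichain model with transition rates q(y|x,a), cost c, constraint costs d_k, and constraint bounds θ_k (all notation assumed here for illustration), is:

```latex
% Linear program over occupation measures \mu(x,a):
\min_{\mu \ge 0} \; \sum_{x,a} c(x,a)\,\mu(x,a)
\quad \text{subject to} \quad
\sum_{x,a} d_k(x,a)\,\mu(x,a) \le \theta_k, \quad k = 1,\dots,m,
\qquad
\sum_{x,a} q(y \mid x,a)\,\mu(x,a) = 0 \;\; \text{for all } y,
\qquad
\sum_{x,a} \mu(x,a) = 1 .
```

From an optimal μ*, a constrained optimal stationary policy can be constructed by randomising at each state x in proportion to μ*(x,·).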

5.
Based on the average criterion for continuous-time Markov decision processes, we give new conditions for the average optimality and constrained optimality of a special class of Markov decision processes, namely controlled queueing systems. The new conditions involve only the primitive data of the model, but exploit the ergodic theory of birth-death processes. It can be shown that controlled queueing systems admit average optimal stationary policies as well as constrained average optimal policies.
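The birth-death ergodicity ingredient is classical: for a queue that, under a stationary policy, evolves as a birth-death process with birth rates λ_n and death rates μ_n (policy dependence suppressed), positive recurrence is characterised by:

```latex
% Classical ergodicity criterion for a birth-death process
% (\pi_0 = 1 by convention):
\pi_n = \prod_{i=1}^{n} \frac{\lambda_{i-1}}{\mu_i}, \qquad
\sum_{n \ge 0} \pi_n < \infty
\quad \text{and} \quad
\sum_{n \ge 0} \frac{1}{\lambda_n \pi_n} = \infty .
```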

6.
We first discuss a class of semi-Markov decision problems under the discounted cost and average cost performance criteria, respectively. Based on the performance potential approach, we derive the optimality equations satisfied by optimal stationary policies. We then discuss the relationship between the two models, showing that the conclusions for the average model can be obtained by taking the limits of the corresponding conclusions for the discounted model as the discount factor tends to zero.
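For reference, in a semi-Markov decision process the average cost optimality equation carries the expected holding time τ(x,a) alongside the one-stage cost; in standard (assumed) notation, with η the optimal average cost and g the performance potential:

```latex
% Average cost optimality equation of a semi-Markov decision process:
\min_{a \in A(x)} \Big[ c(x,a) - \eta\,\tau(x,a)
  + \sum_{y} p(y \mid x,a)\, g(y) - g(x) \Big] = 0, \qquad x \in X .
```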

7.
This work considers denumerable state Markov decision processes with discrete time parameter. The performance of a control policy is measured by the (lim sup) expected average cost criterion, the action sets are compact metric spaces, and the cost function is continuous and bounded. Within this framework, necessary and sufficient conditions are given so that the vanishing interest rate (VIR) method — also known as the vanishing discount approach — yields an average optimal stationary policy.
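Schematically, the VIR method recovers the average cost quantities as limits of the α-discounted values V_α, relative to a fixed reference state x₀ (standard construction, notation assumed):

```latex
\rho^* = \lim_{\alpha \uparrow 1}\,(1-\alpha)\,V_\alpha(x),
\qquad
h(x) = \lim_{\alpha \uparrow 1}\,\big( V_\alpha(x) - V_\alpha(x_0) \big).
```

The paper's necessary and sufficient conditions identify exactly when these limits produce a solution yielding an average optimal stationary policy.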

8.
We show the existence of average cost optimal stationary policies for Markov control processes with Borel state space and unbounded costs per stage, under a set of assumptions recently introduced by L.I. Sennott (1989) for control processes with countable state space and finite control sets.

9.
This work is concerned with controlled Markov chains with bounded costs. Assuming that the transition probabilities satisfy a simultaneous Doeblin condition, it is shown that Schweitzer’s transformation on the transition law yields a strong ergodicity condition that implies that the solution to the average cost optimality equation can be approximated, at a geometric rate, via the value iteration scheme.
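A minimal numerical sketch of this scheme for a finite model (the finite-state restriction, function names, and array layout are assumptions for illustration; the paper works with general bounded-cost chains under a simultaneous Doeblin condition):

```python
import numpy as np

def schweitzer_rvi(P, c, tau=0.9, tol=1e-9, max_iter=100_000):
    """Relative value iteration on Schweitzer's transformed model.

    P: array of shape (A, S, S), P[a, x, y] = Pr(x -> y under action a).
    c: array of shape (A, S), one-stage costs.
    Schweitzer's transformation replaces each P[a] by (1-tau)*I + tau*P[a]
    and rescales the costs by tau; it preserves the average cost model
    while forcing aperiodicity, which underlies the geometric rate.
    """
    A, S, _ = P.shape
    Pt = (1.0 - tau) * np.eye(S) + tau * P     # transformed transition law
    ct = tau * c                               # matching cost rescaling
    v = np.zeros(S)
    for _ in range(max_iter):
        q = ct + Pt @ v                        # (A, S) state-action values
        Tv = q.min(axis=0)
        g = Tv[0]                              # gain estimate at reference state 0
        Tv = Tv - g                            # relative VI keeps iterates bounded
        if np.abs(Tv - v).max() < tol:
            break
        v = Tv
    policy = (ct + Pt @ v).argmin(axis=0)      # greedy stationary policy
    return g / tau, v, policy                  # undo the tau cost scaling
```

Under the strong ergodicity condition the residual Tv − v contracts geometrically, which is the approximation rate the paper establishes.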

10.
The existence of an average cost optimal stationary policy in a countable state Markov decision chain is shown under assumptions weaker than those given by Sennott (1989). This treatment is a modification of that given by Hu (1992), and is related to conditions of Hordijk (1977). An example is given for which the new axiom set holds whereas the axiom set of Sennott (1989) fails to hold.

11.
This work concerns average Markov decision chains with denumerable state space. Assuming that the Lyapunov function condition holds, it is shown that the value iteration scheme yields convergent approximations to the solution of the average cost optimality equation. This result is obtained using a particular implementation of the value iteration procedure involving an artificial control action under which the system remains static.

12.
Usual conditions for the existence of stationary average optimal policies in denumerable MDPs with general bounded rewards are shown to be also sufficient for strong 1-optimality. Moreover, we prove that all limit points of discounted optimal stationary policies, as the discount factor goes to 1, are strong 1-optimal.

13.
This paper considers a Markov decision process in Borel state and action spaces with the aggregated (or iterated) coherent risk measure to be minimised. For this problem, we establish the Bellman optimality equation as well as the value and policy iteration algorithms, and show the existence of a deterministic stationary optimal policy. The cost function, while being allowed to be unbounded from below (in the sense that its negative part needs to be bounded by some nonnegative real-valued, possibly unbounded weight function), can be arbitrarily unbounded from above and possibly infinitely valued.
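Schematically, the Bellman equation in this setting replaces the expectation of the classical equation by a coherent risk measure ρ_{x,a} of the next-state value; the notation below (α a discount factor, X' the next state) is an assumed illustration rather than the paper's exact formulation:

```latex
% Bellman equation with an iterated coherent risk measure:
V(x) = \inf_{a \in A(x)} \Big[ c(x,a) + \alpha\, \rho_{x,a}\big( V(X') \big) \Big],
\qquad X' \sim Q(\cdot \mid x,a).
```

Taking ρ_{x,a} to be the expectation E_{x,a} recovers the risk-neutral discounted model.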

14.
We consider Markov decision processes with a target set, where the criterion function is the expectation of a minimum function. We formulate the problem as an infinite horizon case with a recurrent class. We show, under some conditions, that the optimal value function is the unique solution to an optimality equation and that there exists a stationary optimal policy. We also give a policy improvement method.
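One plausible schematic form of such a criterion, with target set K, hitting time τ_K, and running reward r (purely an illustration under assumed notation, not necessarily the paper's exact model):

```latex
% Expected-minimum criterion up to the hitting time of the target set K:
J^{\pi}(x) = \mathbb{E}^{\pi}_{x}\Big[ \min_{0 \le t \le \tau_K} r(X_t) \Big],
% with a fixed-point equation in which min and expectation interleave:
v(x) = \max_{a \in A(x)} \sum_{y} p(y \mid x,a)\, \min\{\, r(x),\, v(y) \,\},
\qquad x \notin K .
```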

15.
This note deals with continuous-time Markov decision processes with a denumerable state space and the average cost criterion. The transition rates are allowed to be unbounded, and the action set is a Borel space. We give a new set of conditions under which the existence of optimal stationary policies is ensured by using the optimality inequality. Our results are illustrated with a controlled queueing model. Moreover, we use an example to show that our conditions do not imply the existence of a solution to the optimality equations in the previous literature.

17.
18.
The paper discusses the robustness of discrete-time Markov control processes whose transition probabilities are known only up to a certain degree of accuracy. Upper bounds on the increase of the discounted cost are derived when an optimal control policy of the approximating process is used to control the original one. The bounds are given in terms of the weighted total variation distance between transition probabilities. They hold for processes on Borel spaces with unbounded one-stage cost functions.
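Bounds of this kind generically take a "simulation lemma" shape; the version below, with discount factor α, weight function W, weighted total variation radius ε, and an unspecified constant C, is a schematic illustration rather than the paper's exact statement:

```latex
% If \sup_{x,a} \| P(\cdot \mid x,a) - \widehat{P}(\cdot \mid x,a) \|_{W} \le \varepsilon
% and \widehat{f} is optimal for the approximating kernel \widehat{P}, then
V_\alpha^{\widehat{f}}(x) - V_\alpha^{*}(x) \;\le\; \frac{C\,\alpha\,\varepsilon}{(1-\alpha)^2}\, W(x).
```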

19.
This paper studies the problem of the existence of stationary optimal policies for finite state controlled Markov chains, with compact action space and imperfect observations, under the long-run average cost criterion. It presents sufficient conditions for the existence of solutions to the associated dynamic programming equation that strengthen past results. There is a detailed discussion comparing the different assumptions commonly found in the literature.

20.
We study the optimal control problem for probabilistic discrete event systems using Markov decision processes. First, by introducing definitions of a cost function, an objective function, and an optimal function, we establish an optimality equation from which an optimal supervisor can be determined. From this optimality equation we then obtain the supremal controllable, ∈-containing closed sublanguage of a given language. Finally, we give algorithms for computing the optimal cost and an optimal supervisor.
