Similar Documents
20 similar documents found.
1.
2.
We consider the optimal control problem for a system described by a Markov chain with integral constraints. It is shown that the solution of the optimal control problem over the set of all predictable controls satisfies the Markov property. This optimal Markov control can be obtained as a solution of the corresponding dual problem (if the regularity condition holds) or, otherwise, by means of the proposed regularization method. The problems arising from nonregularity of the system, along with a way to cope with them, are illustrated by an example: the optimal control problem for a single-channel queueing system.
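A standard way to make a constrained Markov control problem and its dual concrete is the occupation-measure linear program for a finite constrained MDP. The sketch below is a minimal illustration under toy assumptions (a two-state, two-action average-cost model with a single integral constraint); it is not the paper's construction, and all numbers are invented for the example. The dual variables of the flow-balance constraints play the role of the dual problem mentioned in the abstract.

```python
import numpy as np
from scipy.optimize import linprog

# Toy two-state, two-action average-cost model (all numbers invented).
# P[a][s, s2] = transition probability from s to s2 under action a.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.5, 0.5], [0.6, 0.4]])]
c = np.array([[1.0, 3.0], [2.0, 0.5]])   # c[s, a]: objective cost rate
d = np.array([[0.0, 1.0], [1.0, 0.0]])   # d[s, a]: constrained cost rate
D = 0.4                                   # integral-constraint level
nS, nA = 2, 2

# LP over the occupation measure x[s, a] (flattened row-major).
cost = c.ravel()
dvec = d.ravel()

# Flow balance: for every s2, sum_{s,a} x[s,a] (P[a][s,s2] - 1{s==s2}) = 0,
# plus the normalization sum_{s,a} x[s,a] = 1.
A_eq = np.zeros((nS + 1, nS * nA))
for s2 in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[s2, s * nA + a] = P[a][s, s2] - (1.0 if s == s2 else 0.0)
A_eq[nS, :] = 1.0
b_eq = np.append(np.zeros(nS), 1.0)

res = linprog(cost, A_ub=dvec[None, :], b_ub=[D],
              A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
x = res.x.reshape(nS, nA)
policy = x / x.sum(axis=1, keepdims=True)   # randomized stationary policy
print("optimal average cost:", res.fun)
print("policy (rows = states):", policy)
```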

3.
We consider some problems of estimation (filtering and identification) in observation systems describing Markov processes with finite state spaces. The transition intensity matrices and the observation plan are random, with unknown distributions from some class. The conditional expectations, given the accessible observations, of certain quadratic functions of the estimation errors are used as performance criteria. The estimation problems under study consist in constructing estimates that minimize the conditional mean losses corresponding to the least favorable distribution of the "transition intensity matrix-observation plan matrix" pair from the set of admissible distributions. For the corresponding minimax problems, the existence of saddle points is proved and the form of the corresponding minimax estimates is established.
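As a point of reference for the filtering part, here is a minimal discrete-time analogue: the exact conditional distribution of a finite-state chain given noisy observations. All matrices are toy assumptions; the paper's minimax step (optimizing against the least favorable distribution of the model pair) is only indicated in the comments.

```python
import numpy as np

# A minimal discrete-time analogue of exact finite-state filtering.
A = np.array([[0.95, 0.05],      # A[i, j] = P(X_{t+1}=j | X_t=i)
              [0.10, 0.90]])
B = np.array([[0.8, 0.2],        # B[i, y] = P(Y_t=y | X_t=i)
              [0.3, 0.7]])
pi = np.array([0.5, 0.5])        # prior on the initial state

def filter_step(p, y):
    """One predict/update step of the exact (Wonham-type) filter."""
    pred = p @ A                 # prediction through the transition matrix
    post = pred * B[:, y]        # Bayes update with the observation
    return post / post.sum()

# The conditional mean-square-optimal estimate is the posterior itself;
# a minimax variant would evaluate the quadratic loss under the least
# favorable model in the admissible class and pick the estimator
# minimizing that worst-case loss.
p = pi
for y in [0, 0, 1, 1, 1]:        # a toy observation record
    p = filter_step(p, y)
    print(p)
```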

4.
5.
We consider the problem of time reversal for a nonstationary continuous-time Markov process with k states. The proposed approach relies strongly on the backward representation of the Gaussian analog of the original Markov process.
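The discrete-time version of the reversal is a one-line Bayes computation; the sketch below shows it for a two-state chain with invented numbers (the paper's Gaussian backward representation is not reproduced).

```python
import numpy as np

# Discrete-time sketch of time reversal: the reversed transition kernel
# follows from Bayes' rule,
#   P(X_t = i | X_{t+1} = j) = p_t(i) P_t(i, j) / p_{t+1}(j),
# where p_t is the (time-varying) state distribution.
P_t = np.array([[0.7, 0.3],
                [0.4, 0.6]])          # forward kernel at time t (toy values)
p_t = np.array([0.25, 0.75])          # marginal at time t
p_next = p_t @ P_t                    # marginal at time t + 1

# Reversed kernel R[j, i] = P(X_t = i | X_{t+1} = j).
R = (p_t[None, :] * P_t.T) / p_next[:, None]
print(R, R.sum(axis=1))               # rows sum to 1
```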

6.
For a countable-state Markov decision process we introduce an embedding which produces a finite-state Markov decision process. The finite-state embedded process has the same optimal cost and, moreover, the same dynamics as the original process when restricted to the approximating set. The embedded process can therefore be used as an approximation which, being finite, is more convenient for computation and implementation.
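To illustrate the idea, the sketch below truncates a countable-state birth-death control model to the finite set {0, ..., N}, folds the escaping probability mass onto the boundary, and solves the embedded finite MDP by value iteration. The model, costs, and rates are assumptions for the example, not taken from the paper.

```python
import numpy as np

# Discounted control of a toy birth-death (queue-like) model, embedded
# into the finite state space {0, ..., N}.
N, gamma = 30, 0.95
actions = [0.3, 0.6]                 # available service rates
arrival = 0.4                        # arrival probability per step

def step_cost(s, mu):
    return s + 4.0 * mu              # holding cost plus service cost

V = np.zeros(N + 1)
for _ in range(2000):                # value iteration on the embedded MDP
    Q = np.empty((N + 1, len(actions)))
    for s in range(N + 1):
        for k, mu in enumerate(actions):
            up = min(s + 1, N)       # mass leaving {0..N} folds onto N
            down = max(s - 1, 0)
            serve = mu if s > 0 else 0.0
            stay = 1.0 - arrival - serve
            nxt = arrival * V[up] + serve * V[down] + stay * V[s]
            Q[s, k] = step_cost(s, mu) + gamma * nxt
    V_new = Q.min(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
policy = Q.argmin(axis=1)
print(V[:5], policy[:10])
```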

7.
Continuous-time Markov decision processes (CTMDPs) with finite state and action spaces have been studied for a long time. It is known that under fairly general conditions the reward gained over a finite horizon can be maximized by a so-called piecewise-constant policy, which changes only finitely often in a finite interval. Although this result has been available for more than 30 years, numerical approaches to computing the optimal policy and reward have been restricted to discretization methods, which are known to converge to the true solution as the discretization step goes to zero. In this paper, we present a new method, based on uniformization of the CTMDP, that allows one to compute an ε-optimal policy up to a predefined precision in a numerically stable way using adaptive time steps.
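A minimal sketch of the uniformization idea, under toy assumptions: with a uniformization rate Λ ≥ max |q_a(s, s)| and a step dt ≤ 1/Λ, the kernels P_a = I + dt·Q_a are stochastic, and backward induction on the grid yields a piecewise-constant policy. The paper's adaptive step-size control and error bound are not reproduced here.

```python
import numpy as np

# Toy CTMDP: 2 states, 2 actions; Q[a] is the generator under action a.
Q = [np.array([[-1.0, 1.0], [2.0, -2.0]]),
     np.array([[-3.0, 3.0], [0.5, -0.5]])]
r = np.array([1.0, 0.0])          # reward rate per state (action-independent)
T, K = 1.0, 200                   # horizon and number of grid steps
dt = T / K
Lam = max(-Qa[s, s] for Qa in Q for s in range(2))
assert dt <= 1.0 / Lam            # keeps all kernel entries nonnegative
P = [np.eye(2) + dt * Qa for Qa in Q]   # stochastic one-step kernels

V = np.zeros(2)                   # zero terminal reward
policy = np.zeros((K, 2), dtype=int)
for k in reversed(range(K)):      # backward induction over the grid
    Qval = np.stack([dt * r + P[a] @ V for a in range(2)], axis=1)
    policy[k] = Qval.argmax(axis=1)
    V = Qval.max(axis=1)
print("expected reward from each state over [0, T]:", V)
```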

8.
Peter. Performance Evaluation, 2005, 62(1-4): 349-365
A new method to compute bounds on stationary results of finite Markov processes in discrete or continuous time is introduced. The method extends previously published approaches that use polyhedra of eigenvectors for stochastic matrices with known lower and upper bounds on their elements. Known techniques compute one set of bounds on the elements of the stationary vector from the lower bounds on the matrix elements and another set from the upper bounds. The resulting bounds are usually not sharp when both lower and upper bounds on the elements are known. The new approach combines lower and upper bounds, resulting in sharp bounds that are often much tighter than bounds computed using only one bounding value for the matrix elements.
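The guaranteed bounds of the paper come from polyhedra of eigenvectors; as a rough illustration only, the sketch below samples stochastic matrices from the interval family [L, U] and records the empirical range of the stationary probabilities, which gives an inner (not guaranteed) approximation of such bounds. All interval data are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
L = np.array([[0.5, 0.1], [0.2, 0.6]])   # elementwise lower bounds (toy)
U = np.array([[0.9, 0.5], [0.4, 0.8]])   # elementwise upper bounds (toy)

def sample_row(l, u, rng):
    """Random stochastic row with l <= row <= u (needs sum(l) <= 1 <= sum(u))."""
    row, slack = l.copy(), 1.0 - l.sum()
    order = rng.permutation(len(l))
    for i in order:                       # spread the slack randomly ...
        add = min(u[i] - row[i], slack) * rng.random()
        row[i] += add
        slack -= add
    for i in order:                       # ... then dump any remainder
        add = min(u[i] - row[i], slack)
        row[i] += add
        slack -= add
    return row

def stationary(P):
    """Stationary vector of an irreducible stochastic matrix."""
    w, v = np.linalg.eig(P.T)
    x = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return x / x.sum()

lo, hi = np.ones(2), np.zeros(2)
for _ in range(5000):
    P = np.vstack([sample_row(L[i], U[i], rng) for i in range(2)])
    p = stationary(P)
    lo, hi = np.minimum(lo, p), np.maximum(hi, p)
print("empirical bounds on stationary probabilities:", lo, hi)
```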

9.
10.
The computation of ϵ-optimal policies for continuous-time Markov decision processes (CTMDPs) over finite time intervals is a sophisticated problem because the optimal policy may change at arbitrary times. Numerical algorithms based on time discretization or uniformization have been proposed for computing optimal policies. The uniformization-based algorithm has been shown to be more reliable and often also more efficient, but it is currently only available for processes where the gain or reward does not depend on the decision taken in a state. In this paper, we present two new uniformization-based algorithms for computing ϵ-optimal policies for CTMDPs with decision-dependent rewards over a finite time horizon. Thanks to a new and tighter upper bound, the proposed algorithms can not only be applied to decision-dependent rewards; they also outperform the available approach for rewards that do not depend on the decision. In particular, for models where the policy changes only rarely, optimal policies can be computed much faster.
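The sketch below shows the decision-dependent case on a uniformized grid with toy rates: reward rates r(s, a) enter the backward recursion directly, and the grid points where the maximizing action changes are recorded, reflecting the piecewise-constant structure these algorithms exploit. The paper's tighter bound and the resulting adaptivity are not modeled.

```python
import numpy as np

# Toy CTMDP with decision-dependent reward rates r[s, a].
Q = [np.array([[-2.0, 2.0], [1.0, -1.0]]),
     np.array([[-0.5, 0.5], [3.0, -3.0]])]
r = np.array([[4.0, 1.0], [0.0, 2.0]])        # reward rate depends on action
T, K = 2.0, 400
dt = T / K
assert dt <= 1.0 / max(-Qa[s, s] for Qa in Q for s in range(2))
P = [np.eye(2) + dt * Qa for Qa in Q]         # stochastic one-step kernels

V = np.zeros(2)
prev, changes = None, []
for k in reversed(range(K)):                  # backward induction
    Qval = np.stack([dt * r[:, a] + P[a] @ V for a in range(2)], axis=1)
    act = Qval.argmax(axis=1)
    V = Qval.max(axis=1)
    if prev is not None and not np.array_equal(act, prev):
        changes.append(k * dt)                # a policy switch time
    prev = act
print("value:", V, "switch times:", changes[::-1])
```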

11.
We consider Markov decision processes with a target set, where the criterion function is the expectation of a minimum function. We formulate the problem as an infinite-horizon case with a recurrent class. We show, under some conditions, that the optimal value function is the unique solution of an optimality equation and that a stationary optimal policy exists. We also give a policy improvement method.
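One concrete reading of the "expectation of a minimum" criterion can be implemented by augmenting the state with the running minimum of the rewards observed until the target set is hit; the toy dynamic program below does exactly that with value iteration (the paper's policy improvement method would alternate evaluation and greedy improvement on the same optimality equation). All model data are assumptions.

```python
import numpy as np

# Toy model: states {0, 1, 2}, state 2 is the absorbing target.
# P[s][a] = transition distribution from nontarget state s under action a.
P = {0: [np.array([0.2, 0.5, 0.3]), np.array([0.6, 0.1, 0.3])],
     1: [np.array([0.3, 0.3, 0.4]), np.array([0.1, 0.8, 0.1])]}
r = np.array([5.0, 2.0, 4.0])            # reward observed in each state
target = {2}
ys = sorted(set(r))                       # possible values of the running minimum
yi = {y: i for i, y in enumerate(ys)}

V = np.zeros((3, len(ys)))                # V[s, j]: value with running minimum ys[j]
for _ in range(300):                      # value iteration on the augmented chain
    newV = V.copy()
    for s in (0, 1):
        for y in ys:
            vals = []
            for a in (0, 1):
                v = 0.0
                for s2, p in enumerate(P[s][a]):
                    y2 = min(y, r[s2])    # update the running minimum on arrival
                    v += p * (y2 if s2 in target else V[s2, yi[y2]])
                vals.append(v)
            newV[s, yi[y]] = max(vals)    # maximize the expected final minimum
    V = newV
print("value from state 0 (initial minimum r[0]):", V[0, yi[r[0]]])
```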

12.
13.
We give a randomized algorithm (the “Wedge Algorithm”) for any metrical task system on a uniform space of k points, for any k ≥ 2, whose competitiveness is expressed in terms of H_k = 1 + 1/2 + ⋯ + 1/k, the kth harmonic number. This algorithm has better competitiveness than the Irani-Seiden algorithm if k is smaller than 108, and is better by a factor of 2 if k < 47.
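The Wedge Algorithm itself is defined in the paper; as a hedged stand-in, the sketch below runs a classic marking-style randomized strategy on a uniform metrical task system (unit move cost) and compares its cost with the offline optimum computed by dynamic programming. Task costs are random toy inputs.

```python
import numpy as np

rng = np.random.default_rng(1)
k, T = 8, 300
tasks = rng.random((T, k)) * 0.3          # task cost vectors (toy input)

def offline_opt(tasks, k):
    """Offline optimum via the work-function recursion."""
    V = np.zeros(k)
    for c in tasks:
        V = np.array([min(V[s2] + (0 if s2 == s else 1) for s2 in range(k))
                      + c[s] for s in range(k)])
    return V.min()

def marking_online(tasks, k, rng):
    """Marking-style strategy: leave a state once it accumulates cost 1."""
    acc = np.zeros(k)                     # accumulated cost per state this phase
    s, total = 0, 0.0
    for c in tasks:
        total += c[s]                     # pay the task in the current state
        acc += c
        if acc[s] >= 1.0:                 # current state saturated: move
            unsat = np.flatnonzero(acc < 1.0)
            if unsat.size == 0:           # phase ends: reset all counters
                acc[:] = 0.0
                unsat = np.arange(k)
            s = rng.choice(unsat)         # random unsaturated state
            total += 1.0                  # uniform move cost
    return total

print("OPT:", offline_opt(tasks, k), "online:", marking_online(tasks, k, rng))
```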

14.
15.
16.
17.
We consider utility-constrained Markov decision processes, in which the expected utility of the total discounted reward is maximized subject to multiple expected utility constraints. By introducing a corresponding Lagrange function, a saddle-point theorem for the utility-constrained optimization is derived. The existence of a constrained optimal policy is characterized by optimal action sets specified through a parametric utility.
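The saddle-point idea can be sketched in the risk-neutral special case (linear utility), where the Lagrangian reduces to an ordinary weighted reward: for a fixed multiplier the inner maximization is an unconstrained discounted MDP, and dual subgradient ascent adjusts the multiplier. All model numbers below are toy assumptions, and the paper's genuinely nonlinear utilities are not treated.

```python
import numpy as np

P = [np.array([[0.8, 0.2], [0.3, 0.7]]),
     np.array([[0.1, 0.9], [0.9, 0.1]])]
r = np.array([[1.0, 0.2], [0.0, 1.5]])   # objective reward r[s, a]
g = np.array([[0.0, 1.0], [1.0, 0.0]])   # constrained reward g[s, a]
theta, gamma = 3.0, 0.9                   # constraint level, discount
mu0 = np.array([1.0, 0.0])                # initial distribution

def solve(lam):
    """Value iteration for reward r + lam * g; returns a greedy policy."""
    V = np.zeros(2)
    for _ in range(2000):
        Q = np.stack([r[:, a] + lam * g[:, a] + gamma * P[a] @ V
                      for a in range(2)], axis=1)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def discounted(policy, w):
    """Exact discounted value of reward w under a stationary policy."""
    Pp = np.stack([P[policy[s]][s] for s in range(2)])
    wp = np.array([w[s, policy[s]] for s in range(2)])
    return mu0 @ np.linalg.solve(np.eye(2) - gamma * Pp, wp)

lam = 0.0
for _ in range(200):
    pol = solve(lam)
    slack = discounted(pol, g) - theta   # constraint: discounted g >= theta
    lam = max(0.0, lam - 0.05 * slack)   # dual subgradient step
print("multiplier:", lam,
      "objective:", discounted(pol, r), "g-value:", discounted(pol, g))
```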

18.
19.
20.
In this article we propose a synthesis of recent work on a qualitative approach, based on possibility theory, to multi-stage decision making under uncertainty. Our framework is a qualitative possibilistic counterpart of Markov decision processes (MDPs), for which we propose dynamic-programming-like algorithms. The classical MDP algorithms and their possibilistic counterparts are then compared experimentally on a family of benchmark examples. Finally, we also explore the case of partial observability, thus providing qualitative counterparts of the partially observable Markov decision process framework.
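A minimal sketch of qualitative (pessimistic) possibilistic backward induction: values and possibility degrees live on a finite ordered scale, and the Bellman-style update replaces sum/product by min/max with an order-reversing map. The scale, transition possibilities, and preferences below are invented for the example.

```python
import numpy as np

# Qualitative scale L = {0, ..., 4}; n(x) = 4 - x is order-reversing.
Lmax = 4
pi = {  # pi[a][s, s2] in {0..4}: possibility of s -> s2 under action a
    0: np.array([[4, 2], [1, 4]]),
    1: np.array([[3, 4], [4, 0]]),
}
pref = np.array([1, 4])                   # qualitative preference on final states

V = pref.copy()
H = 3                                     # horizon
for _ in range(H):                        # backward induction
    newV = np.zeros_like(V)
    for s in range(2):
        best = 0
        for a in (0, 1):
            # pessimistic utility: min over successors of
            # max(n(possibility), value of successor)
            val = min(max(Lmax - pi[a][s, s2], V[s2]) for s2 in range(2))
            best = max(best, val)
        newV[s] = best
    V = newV
print("qualitative values:", V)
```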
