Similar Documents
3.
In this article we propose a synthesis of recent works concerning a qualitative approach, based on possibility theory, to multi-stage decision under uncertainty. Our framework is a qualitative, possibilistic counterpart to Markov decision processes (MDPs), for which we propose dynamic-programming-like algorithms. The classical MDP algorithms and their possibilistic counterparts are then compared experimentally on a family of benchmark examples. Finally, we explore the case of partial observability, providing qualitative counterparts to the partially observable Markov decision process framework.
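A minimal sketch of what an optimistic possibilistic value-iteration step might look like, assuming possibility degrees and qualitative utilities share a common ordered scale normalized to [0, 1]; the array layout and function name are illustrative and not taken from the article.

```python
import numpy as np

def optimistic_possibilistic_value_iteration(pi, u, horizon):
    """Finite-horizon optimistic criterion (illustrative sketch).

    pi[a][s, s2] : possibility degree of reaching s2 from s under action a
    u[s]         : qualitative utility of the final state s
    Both are assumed to lie on a common ordered scale, here [0, 1];
    min/max replace the product/sum of stochastic value iteration.
    """
    v = np.asarray(u, dtype=float)
    policy = np.zeros(len(v), dtype=int)
    for _ in range(horizon):
        # Q(s, a) = max_{s'} min(pi(s' | s, a), V(s'))
        q = np.stack([np.max(np.minimum(pi_a, v[None, :]), axis=1) for pi_a in pi])
        policy = q.argmax(axis=0)   # greedy decision after the latest backup
        v = q.max(axis=0)
    return v, policy
```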

5.
Desirable properties of the infinite histories of a finite-state Markov decision process are specified in terms of a finite number of events represented as ω-regular sets. An infinite history of the process produces a reward that depends on the properties it satisfies. The authors investigate the existence of optimal policies and provide algorithms for the construction of such policies.

6.
Target-sensitive control of Markov and semi-Markov processes
We develop the theory of Markov and semi-Markov control using dynamic programming and reinforcement learning, in which a form of semi-variance, measuring the variability of rewards below a pre-specified target, is penalized. The objective is to optimize a function of the rewards and risk in which the risk is penalized. Penalizing variance, which is popular in the literature, has drawbacks that can be avoided with semi-variance.
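A small illustration of the downside-risk idea: only rewards below a target contribute to the penalty. The target and risk weight below are illustrative parameters, not values from the paper.

```python
import numpy as np

def semi_variance_objective(rewards, target, risk_weight):
    """Sketch of a target-sensitive (semi-variance-penalized) score.

    Only shortfalls below `target` contribute to the risk term, unlike
    ordinary variance, which also penalizes rewards above the target.
    """
    rewards = np.asarray(rewards, dtype=float)
    shortfall = np.maximum(target - rewards, 0.0)   # downside deviations only
    semi_var = np.mean(shortfall ** 2)
    return rewards.mean() - risk_weight * semi_var

# Example: identical means, but the second reward stream has more downside risk.
print(semi_variance_objective([9, 10, 11], target=10, risk_weight=1.0))
print(semi_variance_objective([6, 10, 14], target=10, risk_weight=1.0))
```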

7.
This paper proposes a simple analytical model, the M-time-scale Markov decision process (MMDP), for hierarchically structured sequential decision-making processes in which decisions at the M levels of the hierarchy are made on M different discrete time scales. In this model the state space and control space of each level are non-overlapping with those of the other levels, and the hierarchy is structured in a "pyramid" sense: a decision made at level m (the slower time scale), together with the level-m state, affects the evolution of the decision-making process at the lower level m+1 (the faster time scale) until a new decision is made at the higher level, whereas lower-level decisions themselves do not affect the transition dynamics of higher levels. The performance produced by the lower-level decisions does, however, affect the higher-level decisions. A hierarchical objective function is defined such that the finite-horizon value of following a (nonstationary) policy at level m+1 over a decision epoch of level m, plus an immediate reward at level m, constitutes the single-step reward for the decision-making process at level m. From this we define a "multi-level optimal value function" and derive a "multi-level optimality equation." We discuss how to solve MMDPs exactly and study approximation methods, along with heuristic sampling-based schemes, for solving them.
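To make the hierarchical objective concrete, here is a minimal two-level sketch that, for brevity, collapses the upper- and lower-level state spaces into one (the model keeps them separate): the upper level's single-step reward is its own immediate reward plus the finite-horizon value of the fixed lower-level policy over one upper-level epoch. All names and shapes are illustrative.

```python
import numpy as np

def lower_level_epoch_value(P_low, R_low, policy_low, epoch_len):
    """Finite-horizon value of a fixed lower-level policy over one
    upper-level decision epoch (illustrative sketch).

    P_low[a, s, s'] : lower-level transition probabilities
    R_low[s, a]     : lower-level expected rewards
    policy_low[s]   : lower-level action chosen in state s
    """
    n = R_low.shape[0]
    idx = np.arange(n)
    r_pi = R_low[idx, policy_low]        # reward under the fixed policy
    P_pi = P_low[policy_low, idx, :]     # row s is P(. | s, policy_low[s])
    v = np.zeros(n)
    for _ in range(epoch_len):           # backward induction over the epoch
        v = r_pi + P_pi @ v
    return v

def upper_level_single_step_reward(r_up, P_low, R_low, policy_low, epoch_len):
    """Single-step reward seen by the upper level: its own immediate reward
    plus the value accumulated by the lower level during the epoch."""
    return r_up + lower_level_epoch_value(P_low, R_low, policy_low, epoch_len)
```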

8.
While complaints about typical edge operators are common, proposals articulating a notion of the "perfect" edge map are comparatively rare, hindering the improvement of contour-enhancement techniques. To address this situation, we suggest that one objective of visual contour computation is the estimation of a clean sketch from a corrupted rendition, the latter modeling noisy and low-contrast edge or line operator responses to an image. Our formal model of this clean sketch is the curve indicator random field (CIRF), whose role is to provide a basis for defining edge likelihood models by eliminating the parameter along each curve to create an image of curves. For curves modeled with stationary Markov processes, this ideal edge prior is non-Gaussian and its moment-generating functional has a form closely related to the Feynman-Kac formula. This sketch model leads to a nonlinear, minimum mean-squared-error contour-enhancement filter that requires the solution of two elliptic partial differential equations. The framework is also independent of the order of the contour model, allowing us to introduce a Markov process model for contour curvature. We analyze the distribution of such curves and show that its mode is the Euler spiral, a curve minimizing changes in curvature. Example computations using the contour-enhancement filter with the curvature-based contour model are provided, highlighting how the filter is curvature-selective even when curvature is absent in the input.

9.
We show that it is possible to learn the forces driving an observed two-dimensional stochastic Markov process, extending the ideas presented in our earlier work [1–3], where we discussed one-dimensional processes. Appropriate short-time correlation function measurements are used as constraints in the maximum information principle of Jaynes, allowing us to formulate the joint probability distribution function of the process. This is done using the method of Lagrange multipliers, which we determine by means of a dynamical learning method. We then derive explicit formulas expressing the drift and diffusion coefficients of the Ito-Langevin equation corresponding to the process in terms of the Lagrange multipliers, which provides the sought-after underlying deterministic and stochastic dynamics. The method was tested on a simulated Ornstein-Uhlenbeck process, showing good agreement with the theory.
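For comparison, the drift and diffusion coefficients of a simulated Ornstein-Uhlenbeck process can also be recovered with the standard conditional-moment (Kramers-Moyal) estimator sketched below. This is a simpler stand-in for, not a reproduction of, the Lagrange-multiplier scheme described above; the process here is one-dimensional and all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a 1-D Ornstein-Uhlenbeck process dX = -theta*X dt + sigma dW (Euler scheme).
theta, sigma, dt, n = 1.0, 0.5, 1e-3, 500_000
x = np.empty(n)
x[0] = 0.0
noise = rng.standard_normal(n - 1)
for i in range(n - 1):
    x[i + 1] = x[i] - theta * x[i] * dt + sigma * np.sqrt(dt) * noise[i]

# Conditional short-time moments: drift D1(x) = <dX | X=x> / dt,
# diffusion D2(x) = <dX^2 | X=x> / (2 dt).
bins = np.linspace(-0.8, 0.8, 17)
centers = 0.5 * (bins[:-1] + bins[1:])
dx = np.diff(x)
idx = np.digitize(x[:-1], bins) - 1
drift = np.array([dx[idx == k].mean() / dt for k in range(len(centers))])
diffusion = np.array([(dx[idx == k] ** 2).mean() / (2 * dt) for k in range(len(centers))])

# For an OU process the drift should be close to -theta * x and the
# diffusion close to sigma**2 / 2, independent of x.
print("estimated -theta:", np.polyfit(centers, drift, 1)[0])
print("estimated sigma^2/2:", diffusion.mean(), "true:", sigma ** 2 / 2)
```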

10.
Simulation-based optimization of Markov reward processes
This paper proposes a simulation-based algorithm for optimizing the average reward in a finite-state Markov reward process that depends on a set of parameters. As a special case, the method applies to Markov decision processes where optimization takes place within a parametrized set of policies. The algorithm relies on the regenerative structure of finite-state Markov processes, involves the simulation of a single sample path, and can be implemented online. A convergence result (with probability 1) is provided.
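A toy illustration of optimizing a simulated average reward over a parametrized chain. Note that this sketch uses a crude finite-difference gradient on a two-state example, not the single-sample-path, regeneration-based estimator proposed in the paper; all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def average_reward(theta, steps=5_000):
    """Simulated average reward of a two-state chain: reward 1 in state 0,
    0 in state 1; the probability of staying in state 0 is sigmoid(theta)."""
    p_stay = 1.0 / (1.0 + np.exp(-theta))
    s, total = 0, 0.0
    for _ in range(steps):
        total += 1.0 if s == 0 else 0.0
        if s == 0:
            s = 0 if rng.random() < p_stay else 1
        else:
            s = 0 if rng.random() < 0.5 else 1
    return total / steps

# Crude finite-difference ascent on the simulated average reward.
theta, step_size, eps = 0.0, 2.0, 0.2
for _ in range(15):
    grad = (average_reward(theta + eps) - average_reward(theta - eps)) / (2 * eps)
    theta += step_size * grad
print("theta:", theta, "average reward:", average_reward(theta))
```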

11.
The notion of a successful coupling of Markov processes, based on the idea that both components of the coupled system "intersect" in finite time with probability 1, is extended to cover situations where the coupling is not necessarily Markovian and its components only converge (in a certain sense) to each other with time. Under these assumptions the unique ergodicity of the original Markov process is proved. The price of this generalization is weak convergence to the unique invariant measure instead of strong convergence. Applying these ideas to infinite interacting particle systems, we consider even more involved situations where unique ergodicity can be proved only for a restriction of the original system to a certain class of initial distributions (e.g., translation-invariant ones). Questions about the existence of invariant measures with a given particle density are also discussed.

12.
In control systems theory, the Markov decision process (MDP) is a widely used optimization model involving the selection of the optimal action in each state visited by a discrete-event system driven by Markov chains. The classical MDP model is suitable for an agent or decision-maker interested in maximizing expected revenues, but it does not account for minimizing variability in the revenues. An MDP model in which the agent can maximize revenues while simultaneously controlling their variance is proposed. This work is rooted in machine learning/neural network concepts, where updating is based on system feedback and step sizes. First, a Bellman equation for the problem is proposed. Thereafter, convergent dynamic programming and reinforcement learning techniques for solving the MDP are provided, along with encouraging numerical results on a small MDP and a preventive maintenance problem.
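One common way to trade expected revenue off against variability is to fold a quadratic penalty into the immediate reward and run ordinary value iteration on the penalized model, as sketched below. This scalarization only illustrates the idea and is not claimed to be the Bellman equation proposed in the paper; the reference level `rho` and weight `lam` are illustrative parameters.

```python
import numpy as np

def variance_penalized_value_iteration(P, R, rho, lam, gamma=0.95, iters=500):
    """Value iteration with a variance-penalized immediate reward.

    P[a, s, s'] : transition probabilities, R[s, a] : expected reward.
    The penalized reward r - lam * (r - rho)**2 discourages deviation of
    the reward from the reference level rho.
    """
    n_actions, n_states, _ = P.shape
    R_pen = R - lam * (R - rho) ** 2                     # shape (n_states, n_actions)
    v = np.zeros(n_states)
    for _ in range(iters):
        q = R_pen + gamma * np.einsum('asn,n->sa', P, v)
        v = q.max(axis=1)
    q = R_pen + gamma * np.einsum('asn,n->sa', P, v)     # greedy policy extraction
    return v, q.argmax(axis=1)
```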

13.
Opacity is a generic security property that has been defined for (non-probabilistic) transition systems and, later, for Markov chains with labels. For a secret predicate, given as a subset of runs, and a function describing the view of an external observer, the value of interest for opacity is a measure of the set of runs disclosing the secret. We extend this definition to the richer framework of Markov decision processes, where non-deterministic choice is combined with probabilistic transitions, and we study related decidability problems under partial- or complete-observation hypotheses for the schedulers. We prove that all questions are decidable with complete observation and ω-regular secrets. With partial observation, we prove that all quantitative questions are undecidable, but the question of whether a system is almost surely non-opaque becomes decidable for a restricted class of ω-regular secrets, as well as for all ω-regular secrets under finite-memory schedulers.

15.
This paper treats the problem of optimal control of finite-state Markov processes observed in noise. Two types of noisy observations are considered: additive white Gaussian noise and jump-type observations. Sufficient conditions for the optimality of a control law are obtained that are similar to the stochastic Hamilton-Jacobi equation for perfectly observed Markov processes. An illustrative example concludes the paper.

19.
Hierarchical algorithms for Markov decision processes have proved useful for problem domains with multiple subtasks. Although existing hierarchical approaches are strong in task decomposition, they are weak in task abstraction, which is more important for task analysis and modeling. In this paper, we propose a task-oriented design to strengthen task abstraction. Our approach learns an episodic task model from the problem domain, with which the planner obtains the same control effect as with the original model, but with a more concise structure and much better performance. According to our analysis and experimental evaluation, our approach outperforms existing hierarchical algorithms such as MAXQ and HEXQ.

20.
This communique presents an algorithm called "value set iteration" (VSI) for solving infinite-horizon discounted Markov decision processes with finite state and action spaces, as a simple generalization of value iteration (VI) and as a counterpart to Chang's policy set iteration. VSI generates a sequence of value functions by manipulating a set of value functions at each iteration, and this sequence converges to the optimal value function. VSI preserves the convergence properties of VI while converging no slower than VI; in particular, if the set used in VSI contains the value functions of sample policies generated independently from a given distribution, together with a properly defined policy-switching policy, a probabilistic exponential convergence rate of VSI can be established. Because the set used in VSI can contain the value functions of any policies generated by other existing algorithms, VSI also provides a general framework for combining multiple solution methods.
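A sketch of one plausible reading of a VSI-style update, assuming the set contains value functions of previously evaluated policies (e.g. sample policies): apply the Bellman operator to the current iterate and to every value function in the set, then keep the pointwise maximum. With an empty set this reduces to plain VI; names and shapes are illustrative, not the communique's notation.

```python
import numpy as np

def bellman_operator(v, P, R, gamma):
    """Standard Bellman optimality operator for a finite MDP.
    P[a, s, s'] : transition probabilities, R[s, a] : expected rewards."""
    q = R + gamma * np.einsum('asn,n->sa', P, v)
    return q.max(axis=1)

def value_set_iteration(P, R, gamma, value_set, iters=200):
    """VSI-style iteration sketch: the next iterate is the pointwise maximum
    of the Bellman backup of the current iterate and of each value function
    in `value_set` (e.g. value functions of policies from other algorithms)."""
    v = np.zeros(P.shape[1])
    for _ in range(iters):
        candidates = [bellman_operator(v, P, R, gamma)]
        candidates += [bellman_operator(u, P, R, gamma) for u in value_set]
        v = np.max(candidates, axis=0)
    return v
```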
