Similar Literature
Found 20 similar documents (search time: 15 ms)
1.
Adaptive control of finite-state Markov chains is discussed. The optimal performance is characterized through the minimization of a long-run average cost functional, subject to constraints on several other such functionals. Under mild structural and feasibility conditions, two explicit adaptive control policies are exhibited for the case where the transition probabilities are unknown. The policies are optimal under the constrained optimization criterion. They rely on a powerful estimation scheme which provides consistent estimators for the transition probabilities. This scheme is of independent interest, as it provides strong consistency under a large number of adaptive schemes and is independent of any identifiability conditions. As an application, an optimal adaptive policy is derived for a system of K competing queues with countable state space, for which the constrained criteria arise naturally in the context of communication networks.
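The estimation scheme itself is not reproduced in the abstract; as a rough illustration of the kind of count-based consistent estimator involved, here is a minimal sketch for a finite chain (the function name and data layout are hypothetical):

```python
import numpy as np

def estimate_transition_matrix(n_states, n_actions, trajectory):
    """Count-based estimator P_hat(j | i, a) = N(i, a, j) / N(i, a).

    trajectory: iterable of (state, action, next_state) triples collected
    under whatever adaptive policy is in force.
    """
    counts = np.zeros((n_states, n_actions, n_states))
    for i, a, j in trajectory:
        counts[i, a, j] += 1.0
    totals = counts.sum(axis=2, keepdims=True)
    # Uniform placeholder rows for (i, a) pairs never visited.
    return np.where(totals > 0, counts / np.maximum(totals, 1.0), 1.0 / n_states)
```

When every state-action pair is visited infinitely often, the strong law of large numbers makes these ratios converge almost surely, which is the sort of strong consistency the abstract refers to.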

2.
We consider Markov decision processes with denumerable state space and finite control sets; the performance index of a control policy is a long-run expected average cost criterion and the cost function is bounded below. For these models, the existence of average optimal stationary policies was recently established in [11] under very general assumptions. Such a result was obtained via an optimality inequality. Here, we use a simple example to prove that the conditions in [11] do not imply the existence of a solution to the average cost optimality equation.
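For reference, the two objects being contrasted can be written in standard notation, with ρ the optimal average cost, h a relative value function, and c the one-stage cost:

```latex
% Average cost optimality equation (ACOE):
\rho + h(x) \;=\; \min_{a \in A(x)} \Big[\, c(x,a) + \sum_{y} p(y \mid x,a)\, h(y) \Big],
% versus the weaker average cost optimality inequality (ACOI):
\rho + h(x) \;\ge\; \min_{a \in A(x)} \Big[\, c(x,a) + \sum_{y} p(y \mid x,a)\, h(y) \Big].
```

A stationary policy attaining the minimum in the ACOI is already average optimal; the point of the example is that the inequality can hold strictly, so no solution to the equation need exist.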

3.
Consider a controlled Markov chain whose transition probabilities depend upon an unknown parameter α taking values in a finite set A. To each α is associated a prespecified stationary control law φ(α). The adaptive control law selects at each time t the control action indicated by φ(α_t), where α_t is the maximum likelihood estimate of α. It is shown that α_t converges to a parameter α* such that the "closed-loop" transition probabilities corresponding to α* and φ(α*) are the same as those corresponding to α0 and φ(α0), where α0 is the true parameter. The situation when α0 does not belong to the model set A is briefly discussed.
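A minimal sketch of the certainty-equivalence scheme described above, for a finite model set; the data layout and helper names are illustrative, not the paper's:

```python
import numpy as np

def ml_adaptive_step(history, models, phi):
    """One step of maximum-likelihood adaptive control over a finite set A.

    history: observed (state, control, next_state) transitions so far
    models:  dict alpha -> P_alpha with P_alpha[u][i, j] = Pr(j | i, u)
    phi:     dict alpha -> prespecified stationary control law
    Returns the ML estimate alpha_hat and the control law phi(alpha_hat).
    """
    def log_likelihood(P):
        return sum(np.log(P[u][i, j] + 1e-300) for i, u, j in history)

    alpha_hat = max(models, key=lambda a: log_likelihood(models[a]))
    return alpha_hat, phi[alpha_hat]
```

Note the abstract's main point: this iteration need not identify the true α0; it converges to an α* whose closed-loop behavior under φ(α*) is indistinguishable from that of the true parameter.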

4.
This paper deals with discrete-time Markov control processes with Borel state space, allowing unbounded costs and noncompact control sets. For these models, the existence of average optimal stationary policies has been recently established under very general assumptions, using an optimality inequality. Here we give a condition, which is a strengthened version of a variant of the 'vanishing discount factor' approach, for the optimality equation to hold.
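The 'vanishing discount factor' route referred to passes from the β-discounted value functions V_β to the average cost problem; schematically, with a fixed reference state z,

```latex
h_\beta(x) \;:=\; V_\beta(x) - V_\beta(z),
\qquad
\rho \;:=\; \lim_{\beta \uparrow 1}\, (1-\beta)\, V_\beta(z),
```

and conditions controlling h_β as β ↑ 1 yield the optimality inequality; the strengthened condition of this paper is what upgrades the inequality to the equation.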

5.
The computational burden associated with controlling a plant modeled as a Markov chain with a large number of states is addressed by proposing a two-layer feedback control structure. At the lower layer a regulator continuously monitors the plant. When the state of the plant reaches an extreme value, the supervisor at the higher layer intervenes to reset the regulator. It is shown that the plant dynamics and cost originally defined at the lower layer can be "lifted" to the supervisor layer and that the supervisor's control task can be defined in a way that permits wide flexibility in the design of the regulator.

6.
An optimal control problem with constraints is considered on a finite interval for a non-stationary Markov chain with a finite state space. The constraints are given as a set of inequalities. The existence of an optimal solution is proved under the natural assumption that the set of admissible controls is non-empty. The stochastic control problem is reduced to a deterministic one, and it is shown that the optimal solution satisfies the maximum principle; moreover, it can be chosen within the class of Markov controls. On the basis of this result, an approach to the numerical solution is proposed and its implementation is illustrated by examples.
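One standard deterministic reduction (not necessarily the authors' construction) is a linear program over state-action occupation measures; a finite-horizon sketch with a hypothetical data layout:

```python
import numpy as np
from scipy.optimize import linprog

def constrained_mdp_lp(P, c, d, D, mu0):
    """Finite-horizon constrained MDP as an LP over occupation measures.

    P:   (T, S, A, S) array, P[t, s, a, s2] = Pr(s2 | s, a) at stage t
    c:   (T, S, A) costs; d: (T, S, A) constraint costs
    D:   bound on expected total constraint cost; mu0: initial law, shape (S,)
    Variables x[t, s, a] = Pr(state s and action a at stage t).
    """
    T, S, A, _ = P.shape
    n = T * S * A
    idx = lambda t, s, a: (t * S + s) * A + a

    A_eq, b_eq = [], []
    for s in range(S):                       # stage-0 marginals match mu0
        row = np.zeros(n)
        for a in range(A):
            row[idx(0, s, a)] = 1.0
        A_eq.append(row); b_eq.append(mu0[s])
    for t in range(T - 1):                   # flow conservation between stages
        for s2 in range(S):
            row = np.zeros(n)
            for a in range(A):
                row[idx(t + 1, s2, a)] = 1.0
            for s in range(S):
                for a in range(A):
                    row[idx(t, s, a)] -= P[t, s, a, s2]
            A_eq.append(row); b_eq.append(0.0)

    res = linprog(c=c.ravel(), A_ub=[d.ravel()], b_ub=[D],
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=[(0, None)] * n)
    return res.x.reshape(T, S, A) if res.success else None
```

An optimal (randomized) Markov control is read off as u_t(a | s) ∝ x[t, s, a], which matches the conclusion that the optimal solution can be chosen within the class of Markov controls.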

7.
We give conditions for the existence of average optimal policies for continuous-time controlled Markov chains with a denumerable state space and Borel action sets. The transition rates are allowed to be unbounded, and the reward/cost rates may have neither upper nor lower bounds. In the spirit of the "drift and monotonicity" conditions for continuous-time Markov processes, we propose a new set of conditions on the controlled process's primitive data. Under these conditions, the existence of optimal (deterministic) stationary policies within the class of randomized Markov policies is proved using the extended generator approach, instead of Kolmogorov's forward equation used in the previous literature, and the convergence of a policy iteration method is also shown. Moreover, we use a controlled queueing system to show that all of our conditions are satisfied, whereas those in the previous literature fail to hold.
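A sketch of the policy iteration method for a finite truncation of such a model, assuming the chain is irreducible under every stationary policy (array layout hypothetical):

```python
import numpy as np

def ctmdp_policy_iteration(q, c, ref=0, max_iters=100):
    """Average-cost policy iteration for a finite continuous-time MDP.

    q: (S, A, S) transition rates with q[x, a].sum() == 0 for each (x, a)
    c: (S, A) cost rates; `ref` pins the relative value h(ref) = 0.
    """
    S, A, _ = q.shape
    policy = np.zeros(S, dtype=int)
    for _ in range(max_iters):
        # Evaluation: solve the Poisson system rho = c_f(x) + (Q_f h)(x).
        Qf = q[np.arange(S), policy]           # (S, S) generator under policy
        cf = c[np.arange(S), policy]
        M = np.zeros((S + 1, S + 1))
        M[:S, :S] = Qf
        M[:S, S] = -1.0                        # column for the unknown rho
        M[S, ref] = 1.0                        # normalization h(ref) = 0
        sol = np.linalg.solve(M, np.append(-cf, 0.0))
        h, rho = sol[:S], sol[S]
        # Improvement: minimize c(x, a) + sum_y q(y | x, a) h(y) over a.
        new_policy = np.argmin(c + q @ h, axis=1)
        if np.array_equal(new_policy, policy):
            return policy, rho, h
        policy = new_policy
    return policy, rho, h
```

The paper's contribution is precisely that convergence of this iteration, and existence of the stationary optimum, survive unbounded rates and costs on a denumerable state space.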

8.
The stochastic model considered is a linear jump diffusion process X for which the coefficients and the jump processes depend on a Markov chain Z with finite state space. First, we study the optimal filtering and control problem for these systems with non-Gaussian initial conditions, given noisy observations of the state X and perfect measurements of Z. We derive a new sufficient condition which ensures the existence and the uniqueness of the solution of the nonlinear stochastic differential equations satisfied by the output of the filter. We study a quadratic control problem and show that the separation principle holds. Next, we investigate an adaptive control problem for a state process X defined by a linear diffusion for which the coefficients depend on a Markov chain, the processes X and Z being observed in independent white noises. Suboptimal estimates for the processes X and Z and an approximate control law are investigated for a large class of probability distributions of the initial state. Asymptotic properties of these filters and of this control law are obtained. Upper bounds for the corresponding error are given.

9.
This note deals with continuous-time Markov decision processes with a denumerable state space and the average cost criterion. The transition rates are allowed to be unbounded, and the action set is a Borel space. We give a new set of conditions under which the existence of optimal stationary policies is ensured by using the optimality inequality. Our results are illustrated with a controlled queueing model. Moreover, we use an example to show that our conditions do not imply the existence of a solution to the optimality equations in the previous literature.
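In generator notation (transition rates q(y|x,a), optimal average cost ρ), the optimality inequality used here takes a form analogous to the discrete-time ACOI, roughly:

```latex
\rho \;\ge\; \min_{a \in A(x)} \Big[\, c(x,a) + \sum_{y} q(y \mid x,a)\, h(y) \Big]
\qquad \text{for all } x,
```

with any stationary policy attaining the minimum being average optimal; the exact hypotheses are those of the note.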

10.
For a Markovian decision problem in which the transition probabilities are unknown, two learning algorithms are devised from the viewpoint of asymptotic optimality. At each time, the algorithms select the decisions to be used on the basis not only of the estimates of the unknown probabilities but also of their uncertainty. It is shown that the algorithms are asymptotically optimal in the sense that the probability of selecting an optimal policy converges to unity.
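The abstract does not spell the algorithms out; as an illustration of a decision rule that uses the uncertainty of the estimates and not just their values, here is a Dirichlet posterior-sampling sketch (not the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_transition_model(counts):
    """Sample transition probabilities from a Dirichlet posterior.

    counts[s, a, s2] = number of observed (s, a) -> s2 transitions.
    With a Dirichlet(1, ..., 1) prior, the sample stays spread out exactly
    where data are scarce, so decisions reflect estimation uncertainty.
    """
    S, A, _ = counts.shape
    P = np.zeros(counts.shape)
    for s in range(S):
        for a in range(A):
            P[s, a] = rng.dirichlet(counts[s, a] + 1.0)
    return P
```

Acting optimally in the sampled model forces continued exploration of poorly estimated state-action pairs, which is the mechanism behind asymptotic-optimality results of this kind.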

11.
A Markov chain is controlled by a decision maker receiving his observations of the state via a noisy memoryless channel. That information is encoded causally. The encoder is assumed to have perfect channel feedback information. Separation results are derived and used to prove that encoding is useless for a class of symmetric channels. This paper extends the results of the authors (1983) by using methods similar to those of that paper.

12.
This work concerns semi-Markov chains evolving on a finite state space. The system development generates a cost when a transition is announced, as well as a holding cost which is incurred continuously during each sojourn time. It is assumed that these costs are paid by an observer with positive and constant risk-sensitivity, and the overall performance of the system is measured by the corresponding (long-run) risk-sensitive average cost criterion. In this framework, conditions are provided under which the average index does not depend on the initial state and is characterized in terms of a single Poisson equation.
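For orientation, the discrete-time analogue of the single Poisson equation referred to, with risk-sensitivity λ > 0 and average index g, reads (the semi-Markov version also weights the random sojourn times):

```latex
e^{\lambda\,(g + h(x))} \;=\; \sum_{y} p(y \mid x)\, e^{\lambda\,(c(x,y) + h(y))},
```

so the average index g enters multiplicatively through the exponential utility rather than additively.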

13.
This paper aims to investigate the problem of robust nonfragile guaranteed cost control for uncertain Takagi-Sugeno fuzzy systems with Markov jump parameters, time-varying delay and input constraint. A nonfragile mode-dependent fuzzy controller with mode-dependent average dwell time (MDADT) is designed with input constraint. A sufficient condition is developed to ensure that the resulting closed-loop system is robust almost surely asymptotically stable with guaranteed cost index not exceeding the specified upper bound. Subsequently, the controller gain and upper bound of the guaranteed cost index can be obtained by solving a set of linear matrix inequalities. Finally, numerical and practical examples are provided to demonstrate the performance of the proposed approach.

14.
Stabilization of linear Markov jump systems via adaptive control is considered in this paper. The switching law is assumed to be an unobservable Markov process. A sufficient condition for stochastic stabilizability is obtained, based on common quadratic Lyapunov functions (QLFs). The constructive proof provides a method to build a sampling adaptive stabilizer. An example illustrates the design of the adaptive control law that stabilizes the system.
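A common quadratic Lyapunov function can be searched for as a semidefinite feasibility problem; a minimal sketch using cvxpy, with hypothetical mode matrices A_i (the common-P condition is sufficient for arbitrary, hence also unobservable Markovian, switching):

```python
import cvxpy as cp
import numpy as np

def common_qlf(A_modes, eps=1e-6):
    """Find one P > 0 with A_i^T P + P A_i < 0 for every mode, if it exists."""
    n = A_modes[0].shape[0]
    P = cp.Variable((n, n), symmetric=True)
    I = np.eye(n)
    constraints = [P >> eps * I]
    constraints += [Ai.T @ P + P @ Ai << -eps * I for Ai in A_modes]
    problem = cp.Problem(cp.Minimize(0), constraints)
    problem.solve(solver=cp.SCS)
    return P.value if problem.status == cp.OPTIMAL else None

# Example: two stable modes sharing a quadratic Lyapunov function.
A1 = np.array([[-1.0, 0.5], [0.0, -2.0]])
A2 = np.array([[-2.0, 0.0], [0.3, -1.0]])
P = common_qlf([A1, A2])
```

This only checks the Lyapunov condition; the paper's sampling adaptive stabilizer is the constructive part built on top of such a certificate.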

15.
Adaptive guaranteed cost control of systems with uncertain parameters
Guaranteed cost control is a method of synthesizing a closed-loop system in which the controlled plant has large parameter uncertainty. This paper gives the basic theoretical development of guaranteed cost control, and shows how it can be incorporated into an adaptive system. The uncertainty in system parameters is first reduced by either 1) on-line measurement and evaluation, or 2) prior knowledge of the parametric dependence on a certain easily measured situation parameter. Guaranteed cost control is then used to take up the residual uncertainty. It is shown that the uncertainty in system parameters can be accounted for by an additional term in the Riccati equation. A Fortran program for computing the guaranteed cost matrix and control law is developed and applied to an airframe control problem with large parameter variations.
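Schematically, the modification is to the familiar LQ Riccati equation: an extra term U(P), whose exact form depends on the uncertainty description, absorbs the parameter uncertainty (a sketch, not the paper's precise equation):

```latex
A^{\mathsf T} P + P A \;-\; P B R^{-1} B^{\mathsf T} P \;+\; Q \;+\; U(P) \;=\; 0,
```

and the solution P serves as the guaranteed cost matrix: x_0^T P x_0 bounds the closed-loop cost over all admissible parameter values.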

16.
Modeling genetic algorithms with Markov chains
We model a simple genetic algorithm as a Markov chain. Our method is both complete (selection, mutation, and crossover are incorporated into an explicitly given transition matrix) and exact; no special assumptions are made which restrict populations or population trajectories. We also consider the asymptotics of the steady-state distributions as population size increases. This research was supported by the National Science Foundation (IRI-8917545).
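A tiny concrete instance of such a matrix, for a selection-plus-mutation GA (crossover omitted here for brevity; the paper's matrix incorporates it as well):

```python
import itertools, math
import numpy as np

def ga_transition_matrix(length=2, pop=2, mu=0.1,
                         fitness=lambda x: 1 + sum(x)):
    """Exact Markov chain of a fitness-proportional selection + mutation GA.

    States are unordered populations of `pop` bitstrings of `length` bits.
    Each offspring slot independently picks a parent with probability
    proportional to fitness, then flips each bit with probability `mu`.
    """
    genes = list(itertools.product((0, 1), repeat=length))
    pops = sorted(set(tuple(sorted(p))
                      for p in itertools.product(genes, repeat=pop)))

    def mut(x, y):          # probability that x mutates into y
        flips = sum(a != b for a, b in zip(x, y))
        return mu**flips * (1 - mu)**(length - flips)

    def slot_dist(p):       # offspring distribution for one slot
        w = np.array([fitness(x) for x in p], dtype=float)
        w /= w.sum()
        return {y: sum(wi * mut(x, y) for wi, x in zip(w, p)) for y in genes}

    T = np.zeros((len(pops), len(pops)))
    for i, p in enumerate(pops):
        d = slot_dist(p)
        for j, q in enumerate(pops):
            # multinomial probability of drawing the multiset q
            coef, prob = math.factorial(pop), 1.0
            for g in set(q):
                k = q.count(g)
                coef //= math.factorial(k)
                prob *= d[g] ** k
            T[i, j] = coef * prob
    return pops, T
```

Each row of T sums to one; the asymptotics of the resulting steady-state distribution as the population size grows is exactly what the paper goes on to study.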

17.
Pairwise Markov chains
We propose a model called a pairwise Markov chain (PMC), which generalizes the classical hidden Markov chain (HMC) model. The generalization, which allows one to model more complex situations, in particular implies that in PMC the hidden process is not necessarily a Markov process. However, PMC allows one to use the classical Bayesian restoration methods like maximum a posteriori (MAP), or maximal posterior mode (MPM). So, akin to HMC, PMC allows one to restore hidden stochastic processes, with numerous applications to signal and image processing, such as speech recognition, image segmentation, and symbol detection or classification, among others. Furthermore, we propose an original method of parameter estimation, which generalizes the classical iterative conditional estimation (ICE) valid for a classical hidden Markov chain model, and whose extension to possibly non-Gaussian and correlated noise is briefly treated. Some preliminary experiments validate the interest of the new model.
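The defining feature is that the pair (X, Y) is Markov even though X alone need not be; restoration then rests on forward-backward recursions over the pair transitions. A forward-pass sketch for finite state spaces (array layout hypothetical):

```python
import numpy as np

def pmc_forward(q, p0, y):
    """Filtering in a pairwise Markov chain.

    q[x, yv, x2, y2] = p(X_{t+1}=x2, Y_{t+1}=y2 | X_t=x, Y_t=yv)
    p0[x, yv]        = p(X_0=x, Y_0=yv)
    y: observed sequence of indices.  Returns p(X_t = . | y_0..t) row by row.
    """
    T, S = len(y), p0.shape[0]
    alpha = np.zeros((T, S))
    alpha[0] = p0[:, y[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(T - 1):
        alpha[t + 1] = alpha[t] @ q[:, y[t], :, y[t + 1]]
        alpha[t + 1] /= alpha[t + 1].sum()
    return alpha
```

MPM restoration combines this with a symmetric backward recursion and picks the marginal maximizer at each time, exactly as in the hidden Markov special case.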

18.
We consider the control of a dynamic system modeled as a Markov chain. The transition probability matrix of the Markov chain depends on the control u and also on an unknown parameter α0. The unknown parameter belongs to a given finite set A. The long-run average cost depends on the control policy and the unknown parameter. Thus, a direct approach to the optimization of the performance is not feasible. A common procedure calls for an on-line estimation of the unknown parameter and the minimization of the cost functional using the estimate in lieu of the true parameter. It is well known that this "certainty equivalence" (CE) solution may fail to achieve optimal performance, even asymptotically. In this presentation of a new optimization-oriented approach to adaptive control, we consider a composite functional which simultaneously takes care of the estimation and control needs. The global minimum of this composite functional coincides with the minimum of the original cost functional. Thus, its joint minimization with respect to control and parameter estimates would yield the optimal control policy. This joint minimization is not feasible, but it suggests an algorithm that asymptotically achieves the desired goal. The transient behavior of the algorithm, as well as the situation when α0 ∉ A, are also investigated.

19.
The existence of an average cost optimal stationary policy in a countable state Markov decision chain is shown under assumptions weaker than those given by Sennott (1989). This treatment is a modification of that given by Hu (1992), and is related to conditions of Hordijk (1977). An example is given for which the new axiom set holds whereas the axiom set of Sennott (1989) fails to hold.
