Similar Literature
20 similar documents found.
1.
Optimization algorithms are studied for a class of semi-Markov control processes (SMCPs) with compact action sets. Both the long-run average- and discounted-cost problems are considered. Formulas for performance potentials and optimality equations for SMCPs are derived. A policy iteration algorithm and a value iteration algorithm are proposed, each of which leads to an optimal or suboptimal stationary policy in a finite number of iterations. The convergence of these algorithms is established without assuming that the corresponding iteration operator is a span contraction. In addition, an online policy iteration optimization algorithm based on a single sample path is provided to avoid the 'curse of dimensionality'. Finally, a numerical example illustrates the application of the algorithms.
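To make the flavor of such algorithms concrete, the following is a minimal sketch of potential-based policy iteration for a small average-cost semi-Markov control process. It is not the paper's implementation: the model data (embedded transition matrices P, sojourn times tau, costs c) are hypothetical, and the chain is assumed ergodic under every stationary policy.

```python
import numpy as np

# Hypothetical 3-state, 2-action semi-Markov control process.
# P[a][i, j]: embedded transition probabilities under action a
# tau[i, a] : expected sojourn time in state i under action a
# c[i, a]   : expected cost accumulated during one sojourn in i under a
rng = np.random.default_rng(0)
n, m = 3, 2
P = np.array([rng.dirichlet(np.ones(n), size=n) for _ in range(m)])
tau = rng.uniform(0.5, 2.0, size=(n, m))
c = rng.uniform(0.0, 5.0, size=(n, m))

def stationary(Pd):
    """Stationary distribution of an ergodic chain: solve pi (I - P) = 0."""
    A = np.vstack([(np.eye(n) - Pd).T, np.ones(n)])
    b = np.zeros(n + 1); b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def evaluate(d):
    """Average cost eta and potentials g of policy d via the Poisson
    equation (I - P_d) g = c_d - eta * tau_d, normalized by pi g = 0."""
    Pd = P[d, np.arange(n)]
    cd, taud = c[np.arange(n), d], tau[np.arange(n), d]
    pi = stationary(Pd)
    eta = pi @ cd / (pi @ taud)          # average cost per unit time
    A = np.vstack([np.eye(n) - Pd, pi])  # append normalization row
    b = np.append(cd - eta * taud, 0.0)
    g = np.linalg.lstsq(A, b, rcond=None)[0]
    return eta, g

d = np.zeros(n, dtype=int)               # initial stationary policy
for _ in range(100):                     # PI terminates in finitely many steps
    eta, g = evaluate(d)
    # improvement: minimize c(i,a) - eta*tau(i,a) + sum_j P_a[i,j] g[j]
    q = c - eta * tau + np.einsum('aij,j->ia', P, g)
    d_new = q.argmin(axis=1)
    if np.array_equal(d_new, d):
        break
    d = d_new
print("average cost:", round(eta, 4), "optimal policy:", d)
```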

2.
Solution techniques for Markov decision problems rely on exact knowledge of the transition rates, which may be difficult or impossible to obtain. In this paper, we consider Markov decision problems with uncertain transition rates represented as compact sets. We first consider the problem of sensitivity analysis, where the aim is to quantify the range of uncertainty of the average per-unit-time reward given the range of uncertainty of the transition rates. We then develop solution techniques for obtaining the max-min optimal policy, which maximizes the worst-case average per-unit-time reward. In each of these problems, we distinguish between systems whose transition rates can be chosen independently and those whose transition rates depend on each other. Our solution techniques are applicable to Markov decision processes with fixed but unknown transition rates and to those with time-varying transition rates.
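As a simplified illustration of the max-min idea, here is a sketch of robust value iteration for a discounted, discrete-time MDP whose transition probabilities lie in independently chosen intervals. The paper itself treats average-reward models with uncertain transition rates; all model data below are hypothetical.

```python
import numpy as np

def worst_case_dist(lo, hi, V):
    """Adversary's choice: probability vector in [lo, hi] minimizing p @ V.
    Greedy: give every state its lower bound, then spend the remaining
    mass on the states with the smallest values of V."""
    p = lo.copy()
    budget = 1.0 - lo.sum()
    for j in np.argsort(V):              # cheapest states first
        extra = min(hi[j] - lo[j], budget)
        p[j] += extra
        budget -= extra
        if budget <= 1e-12:
            break
    return p

def robust_vi(r, lo, hi, beta=0.9, tol=1e-8):
    """r[i, a]: reward; lo/hi[a, i, :]: interval bounds on row i of P_a."""
    n, m = r.shape
    V = np.zeros(n)
    while True:
        Q = np.empty((n, m))
        for i in range(n):
            for a in range(m):
                p = worst_case_dist(lo[a, i], hi[a, i], V)
                Q[i, a] = r[i, a] + beta * (p @ V)
        V_new = Q.max(axis=1)            # max over actions, min over nature
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

# Hypothetical 3-state, 2-action instance with +/-0.1 interval uncertainty.
rng = np.random.default_rng(1)
n, m = 3, 2
P = np.array([rng.dirichlet(np.ones(n), size=n) for _ in range(m)])
lo, hi = np.clip(P - 0.1, 0, 1), np.clip(P + 0.1, 0, 1)
r = rng.uniform(0, 1, size=(n, m))
V, policy = robust_vi(r, lo, hi)
print("worst-case optimal values:", V, "policy:", policy)
```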

3.
Robust control policies for closed queueing networks with uncertain routing probabilities
The paper is concerned with robust control problems for exponential controlled closed queueing networks (CCQNs) under uncertain routing probabilities. Since the rows of some parameter matrices, such as the infinitesimal generator, may be dependent, we first transform the objective vector under the discounted-cost criterion into a weighted-average cost. Through the solution of the Poisson equation, i.e., the Markov performance potentials, we then treat the discounted-cost and average-cost problems in a unified way and derive the gradient formula of the new objective function with respect to the routing probabilities. Solution techniques for finding the optimal robust control policy are discussed. Finally, a numerical example is presented and analyzed.
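In this potential-theory literature, the Poisson equation and the gradient formula referred to above usually take the following standard form; this is a sketch with assumed notation, not the paper's exact statement.

```latex
% Sketch of the standard potential-based sensitivity relations:
% A(\theta) is the generator depending on the routing probabilities
% \theta, \pi the stationary distribution, f the cost vector,
% \eta = \pi f the average cost, g the performance potential,
% and e = (1,\dots,1)^{\top}.
\begin{aligned}
  A(\theta)\, g &= -f + \eta e
      && \text{(Poisson equation)} \\
  \frac{\partial \eta}{\partial \theta}
      &= \pi \, \frac{\partial A(\theta)}{\partial \theta}\, g
      && \text{(sensitivity of the average cost)}
\end{aligned}
```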

4.
The paper is concerned with robust control problems for exponential controlled closed queuing networks (CCQNs) under uncertain routing probabilities. Since the rows of some parameter matrices, such as the infinitesimal generator, may be dependent, we first transform the objective vector under the discounted-cost criterion into a weighted-average cost. Through the solution of the Poisson equation, i.e., the Markov performance potentials, we then treat the discounted-cost and average-cost problems in a unified way and derive the gradient formula of the new objective function with respect to the routing probabilities. Solution techniques for finding the optimal robust control policy are discussed. Finally, a numerical example is presented and analyzed.

5.
Based on performance potential theory and the equivalent-Markov-process method, this paper studies simulation-based optimization algorithms for a class of semi-Markov decision processes (SMDPs) under parameterized randomized stationary policies, and briefly analyzes their convergence. Through the equivalent Markov process of the SMDP, a uniformized Markov chain is defined; the gradient of the SMDP's average-cost performance measure with respect to the policy parameters is then estimated from a single sample path of this uniformized chain, in order to find optimal (or suboptimal) policies. The algorithm uses a neural network to approximate the parameterized randomized stationary policy, which saves computer memory and avoids the 'curse of dimensionality', making it suitable for performance optimization of systems with large state spaces. Finally, a simulation example illustrates the application of the algorithm.
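A tabular stand-in for the approach described above, with a softmax policy in place of the neural network, might look as follows. The estimator is a Marbach-Tsitsiklis-style regenerative policy-gradient scheme, and all model data are hypothetical.

```python
import numpy as np

# Single-sample-path policy-gradient sketch for an average-cost MDP with a
# softmax randomized stationary policy (a simplified, tabular stand-in for
# the neural-network parameterization). Hypothetical 4-state, 2-action model.
rng = np.random.default_rng(2)
n, m = 4, 2
P = np.array([rng.dirichlet(np.ones(n), size=n) for _ in range(m)])
c = rng.uniform(0, 1, size=(n, m))
theta = np.zeros((n, m))                 # softmax policy parameters

def policy(i):
    z = np.exp(theta[i] - theta[i].max())
    return z / z.sum()

def grad_log_policy(i, a):
    """d/dtheta log mu_theta(a|i) for the softmax parameterization."""
    g = np.zeros_like(theta)
    g[i] = -policy(i)
    g[i, a] += 1.0
    return g

def estimate_gradient(T=100_000, ref=0):
    """Between visits to the reference state, accumulate the score function
    weighted by the observed cost minus the running average-cost estimate."""
    x, eta = ref, 0.0
    grad = np.zeros_like(theta)
    z = np.zeros_like(theta)             # eligibility: sum of score terms
    for t in range(1, T + 1):
        a = rng.choice(m, p=policy(x))
        cost = c[x, a]
        eta += (cost - eta) / t          # running average-cost estimate
        z += grad_log_policy(x, a)
        grad += (cost - eta) * z
        x = rng.choice(n, p=P[a, x])
        if x == ref:                     # regeneration: reset eligibility
            z[:] = 0.0
    return grad / T, eta

for it in range(10):                     # crude gradient descent on eta
    grad, eta = estimate_gradient()
    theta -= 2.0 * grad
    print(f"iteration {it}: estimated average cost = {eta:.4f}")
```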

6.
This paper provides an introductory discussion of an important concept, the performance potentials of Markov processes, and their relations with perturbation analysis (PA), average-cost Markov decision processes (MDPs), Poisson equations, α-potentials, the fundamental matrix, and the group inverse of the transition matrix (or of the infinitesimal generator). Applications to single-sample-path-based performance sensitivity estimation and performance optimization are also discussed. Online algorithms for performance sensitivity estimates and online schemes for policy iteration methods are presented. The approach is closely related to reinforcement learning algorithms.
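For a discrete-time chain with transition matrix P, the objects named above are tied together by the following standard relations; this is a sketch, with the normalization conventions stated inline.

```latex
% P: transition matrix, \pi: stationary distribution, f: cost vector,
% \eta = \pi f, e = (1,\dots,1)^{\top}.
\begin{aligned}
  (I - P)\,g + \eta\, e &= f
      && \text{(Poisson equation for the potential } g\text{)} \\
  g &= (I - P + e\pi)^{-1} f
      && \text{(via the fundamental matrix; here } \pi g = \eta\text{)} \\
  g &= (I - P)^{\#} f
      && \text{(via the group inverse; here } \pi g = 0\text{)}
\end{aligned}
```

The two closed forms differ only by an additive multiple of e, which is immaterial because the Poisson equation determines g only up to a constant.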

7.
The paper discusses the robustness of discrete-time Markov control processes whose transition probabilities are known only up to a certain degree of accuracy. Upper bounds on the increase of the discounted cost are derived when an optimal control policy of the approximating process is used to control the original one. The bounds are given in terms of a weighted total variation distance between transition probabilities, and they hold for processes on Borel spaces with unbounded one-stage cost functions.

8.
A unified NDP method based on TD(0) learning for MDPs under average and discounted criteria
To meet the needs of practical large-scale Markov systems, this paper discusses simulation-based learning optimization for Markov decision processes (MDPs). Starting from the defining formulas, a temporal-difference formula for performance potentials that is unified across the average and discounted performance criteria is established, and a neural network is used to represent the estimated potentials; parametric TD(0) learning formulas and algorithms are derived for approximate policy evaluation. Then, based on the approximated potentials, a neuro-dynamic programming (NDP) optimization method unified across the two criteria is realized through approximate policy iteration. The results also apply to semi-Markov decision processes. A numerical example shows that the neural policy iteration algorithm in this paper works under both criteria, and verifies that the average-cost problem is the limit of the discounted problem as the discount factor tends to zero.
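A minimal tabular sketch of a unified TD(0) recursion for potentials, with a lookup table in place of the paper's neural network and hypothetical model data:

```python
import numpy as np

# Tabular TD(0) for learning performance potentials under either the
# discounted (gamma < 1) or the average-cost (gamma = 1) criterion.
rng = np.random.default_rng(3)
n = 5
P = rng.dirichlet(np.ones(n), size=n)    # fixed policy -> Markov chain
c = rng.uniform(0, 1, size=n)

def td0_potentials(gamma=1.0, T=200_000, lr=0.01):
    """gamma < 1 gives discounted potentials; gamma = 1 gives average-cost
    potentials, with the running estimate of eta subtracted so that the
    TD recursion stays bounded."""
    g = np.zeros(n)
    eta, x = 0.0, 0
    for t in range(1, T + 1):
        y = rng.choice(n, p=P[x])
        eta += (c[x] - eta) / t          # running average-cost estimate
        baseline = eta if gamma == 1.0 else 0.0
        delta = c[x] - baseline + gamma * g[y] - g[x]   # TD(0) error
        g[x] += lr * delta
        x = y
    return g - g.mean(), eta

g_avg, eta = td0_potentials(gamma=1.0)
g_disc, _ = td0_potentials(gamma=0.95)
print("average cost:", round(eta, 4))
print("average-cost potentials:", g_avg)
print("discounted potentials:  ", g_disc)
```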

9.
闻继伟, 刘飞. 《控制与决策》(Control and Decision), 2010, 25(6): 916-920
A feedback predictive controller design method is proposed for a class of constrained uncertain discrete-time Markov jump systems. For ease of engineering application, the method considers polytopic uncertainty in the system dynamics of each mode, as well as partly unknown transition probabilities between modes. The predictive controller and its corresponding ellipsoidal invariant set are determined by optimizing an infinite-horizon quadratic performance index, and the controller guarantees robust mean-square stability of the closed-loop system. Meanwhile, a family of asymptotically stable polyhedral invariant sets is constructed online from the obtained control gains, which enlarges the feasible region of the system states to some extent. A numerical example verifies the feasibility and effectiveness of the proposed method.

10.
Online optimization algorithms for Markov control processes based on a single sample path
Based on Markov performance potential theory, this paper studies performance optimization algorithms for Markov control processes. Unlike traditional computation-based methods, the algorithms estimate the gradient of the performance measure with respect to the policy parameters by simulating a single sample path, in order to find optimal (or suboptimal) randomized stationary policies. Since appropriate algorithm parameters can be chosen according to the characteristics of different practical systems, the algorithms can meet the online optimization needs of various practical engineering systems. Finally, the convergence of these algorithms with probability 1 on an infinitely long sample path is briefly analyzed, and a numerical example of a three-state controlled Markov process is given.

11.
Cao's work shows that, by defining an α-dependent equivalent infinitesimal generator Aα, a semi-Markov decision process (SMDP) with both average- and discounted-cost criteria can be treated as an α-equivalent Markov decision process (MDP), and that the performance potential theory can also be developed for SMDPs. In this work, we focus on establishing error bounds for potential- and Aα-based iterative optimization methods. First, we introduce an α-uniformized Markov chain (UMC) for an SMDP via Aα and a uniformization parameter, and show their relations. In particular, we obtain that their performance potentials, as solutions of the corresponding Poisson equations, are proportional, so that potential-based studies of an SMDP and its α-UMC are unified. Using these relations, we derive error bounds for a potential-based policy-iteration algorithm and a value-iteration algorithm, respectively, in the presence of various calculation errors. The results apply directly to the special cases of continuous-time MDPs and Markov chains, and can be extended to simulation-based optimization methods such as reinforcement learning and neuro-dynamic programming, where estimation or approximation errors are common. Finally, we give an application example on the look-ahead control of a conveyor-serviced production station (CSPS) and show the corresponding error bounds.
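The uniformization construction and the proportionality of potentials mentioned above follow the standard pattern sketched below (assumed notation, not the paper's exact statement):

```latex
% With uniformization constant \lambda \ge \max_i |(A_\alpha)_{ii}|,
% the uniformized chain and its properties are:
\begin{aligned}
  P_\lambda &= I + \tfrac{1}{\lambda} A_\alpha,
      && \pi P_\lambda = \pi \text{ (same stationary distribution)}, \\
  -A_\alpha\, g = f - \eta e
      \;&\Longleftrightarrow\;
      (I - P_\lambda)\,\tilde g = f - \eta e,
      && \tilde g = \lambda\, g \text{ (proportional potentials)}.
\end{aligned}
```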

12.
Performance potentials play a crucial role in the performance sensitivity analysis and policy iteration of Markov decision processes. The potentials can be estimated on a single sample path of a Markov process. In this paper, we propose two potential-based online policy iteration algorithms for the performance optimization of Markov systems. The algorithms are based on online estimation of potentials and stochastic approximation. We prove that with these two algorithms the optimal policy can be attained after a finite number of iterations. A simulation example is given to illustrate the main ideas and the convergence rates of the algorithms.

13.
Semi-Markov decision problems and performance sensitivity analysis
Recent research indicates that Markov decision processes (MDPs) can be viewed from a sensitivity point of view, and that perturbation analysis (PA), MDPs, and reinforcement learning (RL) are three closely related areas in the optimization of discrete-event dynamic systems that can be modeled as Markov processes. The goal of this paper is two-fold. First, we develop the PA theory for semi-Markov processes (SMPs); we then extend the aforementioned results on the relations among PA, MDPs, and RL to SMPs. In particular, we show that performance sensitivity formulas and policy iteration algorithms for semi-Markov decision processes can be derived from the performance potential and the realization matrix. Both the long-run average- and discounted-cost problems are considered. This approach provides a unified framework for both problems, with the long-run average problem corresponding to the discount factor being zero. The results indicate that performance sensitivities and optimization depend only on first-order statistics. Single-sample-path-based implementations are discussed.

14.
Adaptive control of finite-state Markov chains is discussed. The optimal performance is characterized through the minimization of a long-run average cost functional, subject to constraints on several other such functionals. Under mild structural and feasibility conditions, two explicit adaptive control policies are exhibited for the case where the transition probabilities are unknown. The policies are optimal under the constrained optimization criterion. They rely on a powerful estimation scheme that provides consistent estimators for the transition probabilities. This scheme is of independent interest, as it provides strong consistency under a large number of adaptive schemes and is independent of any identifiability conditions. As an application, an optimal adaptive policy is derived for a system of K competing queues with countable state space, for which constrained criteria arise naturally in the context of communication networks.
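A minimal sketch of count-based estimation of unknown transition probabilities with forced exploration, the ingredient that strong consistency rests on. The model and the greedy rule below are hypothetical placeholders, not the paper's scheme.

```python
import numpy as np

# Count-based estimator for the unknown transition probabilities of a
# controlled finite Markov chain. Occasional forced exploration keeps
# every (state, action) pair visited infinitely often, which is what
# consistency of the estimates requires.
rng = np.random.default_rng(4)
n, m = 4, 2
P_true = np.array([rng.dirichlet(np.ones(n), size=n) for _ in range(m)])

counts = np.ones((m, n, n))              # Laplace-smoothed visit counts

def estimate():
    """Empirical transition probabilities from the visit counts."""
    return counts / counts.sum(axis=2, keepdims=True)

def greedy_action(x):
    """Placeholder for the certainty-equivalent (constrained-optimal)
    action; for this sketch we simply pick the least-tried action."""
    return int(counts[:, x].sum(axis=1).argmin())

x = 0
for t in range(1, 100_000):
    # explore with probability ~ 1/sqrt(t), otherwise act greedily
    a = rng.integers(m) if rng.random() < t ** -0.5 else greedy_action(x)
    y = rng.choice(n, p=P_true[a, x])
    counts[a, x, y] += 1
    x = y

print("sup-norm estimation error:", np.abs(estimate() - P_true).max())
```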

15.
Basic Ideas for Event-Based Optimization of Markov Systems
The goal of this paper is two-fold. First, we present a sensitivity point of view on the optimization of Markov systems. We show that Markov decision processes (MDPs) and the policy-gradient approach, or perturbation analysis (PA), can be derived easily from two fundamental sensitivity formulas, and that such formulas can be constructed flexibly, from first principles, with performance potentials as building blocks. Second, with this sensitivity view we propose an event-based optimization approach, including event-based sensitivity analysis and event-based policy iteration. The approach utilizes the special features of a system characterized by events, and illustrates how the potentials can be aggregated using these features and how the aggregated potentials can be used in policy iteration. Compared with the traditional MDP approach, the event-based approach has several advantages: the number of aggregated potentials may scale with the system size even though the number of states grows exponentially in the system size, which reduces the policy space and saves computation; the approach does not require actions at different states to be independent; and it exploits the special features of a system and does not need the exact transition probability matrix. The main ideas of the approach are illustrated by an admission control problem.

16.
High-level semi-Markov modelling paradigms such as semi-Markov stochastic Petri nets and process algebras are used to capture realistic performance models of computer and communication systems, but often have the drawback of generating huge underlying semi-Markov processes. Extraction of performance measures such as steady-state probabilities and passage-time distributions therefore relies on sparse matrix-vector operations involving very large transition matrices. Previous studies have shown that exact state-by-state aggregation of semi-Markov processes can be applied to reduce the number of states. This can, however, lead to a dramatic increase in matrix density caused by the creation of additional transitions between the remaining states. Our paper addresses this issue by presenting the concept of state-space partitioning for aggregation. We present a new deterministic partitioning method, which we term barrier partitioning, and show that it is capable of splitting very large semi-Markov models into a number of partitions such that first-passage-time analysis can be performed more quickly and using up to 99% less memory than existing algorithms.

17.
Performance sensitivity formulas for state-dependent closed queueing networks
By studying the sensitivity of the steady-state performance measures of a class of Markov processes under parameter perturbations of the infinitesimal generator, and by using three equivalent quantities that describe steady-state performance sensitivity (the group inverse of the infinitesimal generator, the realization matrix, and the performance potential), this paper derives steady-state performance sensitivity formulas for state-dependent closed queueing networks under parameter perturbations. These results can be applied directly to the control and optimization of queueing networks.

18.
In this paper, a feedback model predictive control method is presented to tackle constrained multivariable control problems for uncertain discrete-time nonlinear Markovian jump systems. An uncertain Markovian jump fuzzy system (MJFS) is obtained by employing the Takagi-Sugeno (T-S) fuzzy model to represent a discrete-time nonlinear system with norm-bounded uncertainties and Markovian jump parameters. For greater generality, the transition probabilities of the Markov chain are assumed to be partly unknown and partly accessible. The predictive formulation adopts an online optimization paradigm that utilizes the closed-loop state-feedback controller and is solved using standard semi-definite programming (SDP). To reduce the online computational burden, a mode-independent control move is calculated at every sampling time based on a stochastic fuzzy Lyapunov function (FLF) and a parallel distributed compensation (PDC) scheme. Robust mean-square stability, performance minimization, and constraint satisfaction are guaranteed under the control move for all admissible uncertainties. A numerical example is given to show the efficiency of the developed approach.

19.
We consider the look-ahead control of a conveyor-serviced production station (CSPS) in the context of a semi-Markov decision process (SMDP) model, and our goal is to find an optimal control policy under either the average- or the discounted-cost criterion. Policy iteration (PI), combined with the concept of performance potential, can be applied to provide a unified optimisation framework for both criteria. However, a major difficulty arises in the exact solution scheme: it requires not only full knowledge of the model parameters, but also a considerable amount of work to obtain and process the necessary system and performance matrices. To overcome this difficulty, we propose a potential-based online PI algorithm in this article. During implementation, by analysing and utilising the historical information from past operation of a practical CSPS system, the potentials and state-action values are learned online through an effective exploration scheme. We finally illustrate the successful application of this learning-based technique to CSPS systems with an example.

20.
This paper studies the stochastic stability and H∞ control of a class of networked control systems with packet dropouts and partly unknown state transition probabilities. A networked control system with packet dropouts between the sensor and the controller and between the controller and the actuator is modeled as a jump system with four subsystems, where the jumps among the four subsystems follow a Markov jump process with partly unknown transition probabilities. Sufficient conditions for the stochastic stability of such systems are obtained using the Lyapunov stability theorem and linear matrix inequality (LMI) techniques, and a corresponding H∞ state-feedback controller design method is given. Numerical simulation results verify the correctness and effectiveness of the proposed method.
