Similar Literature
A total of 20 similar documents were found.
1.
Basic Ideas for Event-Based Optimization of Markov Systems
The goal of this paper is two-fold. First, we present a sensitivity point of view on the optimization of Markov systems. We show that Markov decision processes (MDPs) and the policy-gradient approach, or perturbation analysis (PA), can be derived easily from two fundamental sensitivity formulas, and that such formulas can be constructed flexibly, by first principles, with performance potentials as building blocks. Second, with this sensitivity view we propose an event-based optimization approach, including event-based sensitivity analysis and event-based policy iteration. This approach exploits the special feature of a system characterized by events and illustrates how the potentials can be aggregated using this feature and how the aggregated potentials can be used in policy iteration. Compared with the traditional MDP approach, the event-based approach has several advantages: the number of aggregated potentials may scale with the system size even though the number of states grows exponentially in the system size, which reduces the policy space and saves computation; the approach does not require actions at different states to be independent; and it exploits the special feature of a system and does not need to know the exact transition probability matrix. The main ideas of the approach are illustrated by an admission control problem. Supported in part by a grant from Hong Kong UGC.

2.
We propose a unified framework for Markov decision problems and performance sensitivity analysis for multichain Markov processes with both discounted and average-cost performance criteria. With the fundamental concept of performance potentials, we derive both performance-gradient and performance-difference formulas, which play a central role in performance optimization. The standard policy iteration algorithms for both discounted- and average-reward MDPs can be established using the performance-difference formulas in a simple and intuitive way, and the performance-gradient formulas, together with stochastic approximation, may lead to new optimization schemes. This sensitivity-based point of view of performance optimization provides insights that link perturbation analysis, Markov decision processes, and reinforcement learning together. The research is an extension of previous work on ergodic Markov chains (Cao, Automatica 36 (2000) 771).
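For reference, the performance-difference and performance-derivative formulas referred to here take the following standard form in the ergodic, average-reward case; the notation (transition matrix P, reward vector f, stationary distribution π, all-ones vector e, potentials g) is supplied here for illustration and is not taken from the abstract itself.

```latex
% Poisson equation defining the performance potentials g (e is the all-ones vector)
\[(I - P)\,g + \eta e = f, \qquad \eta = \pi f\]
% Performance-difference formula for two policies (P, f) and (P', f')
\[\eta' - \eta = \pi'\left[(P' - P)\,g + (f' - f)\right]\]
% Performance-derivative formula along P_\delta = P + \delta(P' - P), f_\delta = f + \delta(f' - f)
\[\left.\frac{d\eta_\delta}{d\delta}\right|_{\delta = 0} = \pi\left[(P' - P)\,g + (f' - f)\right]\]
```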

3.
Semi-Markov decision problems and performance sensitivity analysis
Recent research indicates that Markov decision processes (MDPs) can be viewed from a sensitivity point of view, and that perturbation analysis (PA), MDPs, and reinforcement learning (RL) are three closely related areas in the optimization of discrete-event dynamic systems that can be modeled as Markov processes. The goal of this paper is two-fold. First, we develop the PA theory for semi-Markov processes (SMPs); then we extend the aforementioned results on the relations among PA, MDPs, and RL to SMPs. In particular, we show that performance sensitivity formulas and policy iteration algorithms of semi-Markov decision processes can be derived based on the performance potentials and the realization matrix. Both the long-run-average and discounted-cost problems are considered. This approach provides a unified framework for both problems, with the long-run-average problem corresponding to the discount factor being zero. The results indicate that performance sensitivities and optimization depend only on first-order statistics. Single-sample-path-based implementations are discussed.

4.
We introduce a sensitivity-based view of the area of learning and optimization of stochastic dynamic systems. We show that this sensitivity-based view provides a unified framework for many different disciplines in this area, including perturbation analysis, Markov decision processes, reinforcement learning, identification and adaptive control, and singular stochastic control, and that this unified framework applies to both discrete-event dynamic systems and continuous-time, continuous-state systems. Many results in these disciplines can be derived simply and explained intuitively by using two performance sensitivity formulas. In addition, we show that this sensitivity-based view leads to new results and opens up new directions for future research. For example, the nth-bias optimality of Markov processes has been established, and event-based optimization may be developed; this approach has computational and other advantages over state-based approaches.

5.
The goals of perturbation analysis (PA), Markov decision processes (MDPs), and reinforcement learning (RL) are common: to make decisions that improve the system performance based on the information obtained by analyzing the current system behavior. In this paper, we study the relations among these closely related fields. We show that MDP solutions can be derived naturally from the performance sensitivity analysis provided by PA. Performance potentials play an important role in both PA and MDPs; they also offer a clear intuitive interpretation for many results. Reinforcement learning, TD(λ), neuro-dynamic programming, etc., are efficient ways of estimating the performance potentials and related quantities based on sample paths. The sensitivity point of view of PA, MDPs, and RL brings new insight to the area of learning and optimization. In particular, gradient-based optimization can be applied to parameterized systems with large state spaces, and gradient-based policy iteration can be applied to some nonstandard MDPs, such as systems with correlated actions. Potential-based online approaches and their advantages are also discussed.
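To make the connection between the potentials and policy iteration concrete, the policy-improvement step for an average-reward MDP with rewards to be maximized can be written as follows; this is a standard form, with notation supplied here rather than taken from the abstract.

```latex
% Policy improvement using the potentials g^d of the current policy d, for every state i
\[d'(i) \in \operatorname*{arg\,max}_{a \in \mathcal{A}(i)} \Big\{ f(i,a) + \sum_{j} p(j \mid i,a)\, g^{d}(j) \Big\}\]
```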

6.
A unified NDP approach based on TD(0) learning for MDPs with average and discounted criteria
To meet the needs of practical large-scale Markov systems, this paper discusses simulation-based learning optimization of Markov decision processes (MDPs). Starting from the definition of performance potentials, a unified temporal-difference formula for the potentials under both the average and discounted performance criteria is established; a neural network is used to represent the estimates of the potentials, and a parametric TD(0) learning formula and algorithm are derived for approximate policy evaluation. Then, based on the approximated potentials, approximate policy iteration is used to realize a unified neuro-dynamic programming (NDP) optimization method under the two criteria. The results also apply to semi-Markov decision processes. A numerical example shows that the neural policy iteration algorithm in the paper works for both criteria and verifies that the average-cost problem is the limit of the discounted problem as the discount factor tends to zero.
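As an illustration of the kind of simulation-based evaluation described above, the sketch below estimates the performance potentials under the average-cost criterion with TD(0). It is a minimal sketch under my own assumptions: the function names are placeholders, a linear approximator stands in for the paper's neural network, and the step sizes are arbitrary.

```python
import numpy as np

def td0_potential_estimate(sample_path, rewards, phi, n_features,
                           alpha=0.01, beta=0.01):
    """Illustrative TD(0) estimation of potentials g(x) ~ phi(x)^T w under the
    average-cost criterion, from a single sample path.

    sample_path : visited states x_0, x_1, ..., x_T
    rewards     : rewards f(x_t) observed along the path (len >= T)
    phi         : feature map, phi(x) -> np.ndarray of length n_features
    alpha, beta : step sizes for the weights and the average-reward estimate
    """
    w = np.zeros(n_features)      # weights of the potential approximator
    eta = 0.0                     # running estimate of the average reward
    for t in range(len(sample_path) - 1):
        x, x_next = sample_path[t], sample_path[t + 1]
        # TD(0) error for potentials: f(x) - eta + g(x') - g(x)
        delta = rewards[t] - eta + phi(x_next) @ w - phi(x) @ w
        w += alpha * delta * phi(x)          # update the potential approximation
        eta += beta * (rewards[t] - eta)     # update the average-reward estimate
    return w, eta
```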

7.
Using a perturbation method based on the infinitesimal generator (matrix), this paper discusses the sensitivity analysis of the steady-state performance of a class of Markov processes with respect to parameter perturbations. The steady-state performance sensitivity analysis of closed queueing networks is then studied, and, for the case of parameter-dependent performance functions, sensitivity formulas for several steady-state performance measures of the network are given. These formulas show that steady-state performance sensitivities can be computed easily via the network potentials.
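The sensitivity formulas referred to here have the following standard form for a continuous-time Markov process; the notation (generator A(θ), stationary distribution π, performance function f(θ), potentials g) is supplied here for illustration.

```latex
% Poisson equation for the potentials of a continuous-time Markov process with generator A
\[A\,g = -f + \eta e, \qquad \eta = \pi f\]
% Derivative of the steady-state performance with respect to a parameter \theta
\[\frac{d\eta}{d\theta} = \pi \frac{dA}{d\theta}\, g + \pi \frac{df}{d\theta}\]
```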

8.
This paper provides an introductory discussion of an important concept, the performance potentials of Markov processes, and its relations with perturbation analysis (PA), average-cost Markov decision processes (MDPs), Poisson equations, α-potentials, the fundamental matrix, and the group inverse of the transition matrix (or of the infinitesimal generator). Applications to single-sample-path-based performance sensitivity estimation and performance optimization are also discussed. Online algorithms for performance sensitivity estimates and online schemes for policy iteration methods are presented. The approach is closely related to reinforcement learning algorithms.
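For orientation, the standard relations among these quantities for an ergodic chain with transition matrix P, stationary distribution π, and all-ones vector e are as follows (notation supplied here, not taken from the abstract).

```latex
% Fundamental matrix and the group inverse of (I - P) for an ergodic chain
\[Z = (I - P + e\pi)^{-1}, \qquad (I - P)^{\#} = Z - e\pi\]
% Potentials as a solution of the Poisson equation (unique up to an additive constant)
\[(I - P)\,g + \eta e = f, \qquad g = Z f \ \text{is one particular solution}\]
```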

9.
Performance potentials play a crucial role in performance sensitivity analysis and policy iteration of Markov decision processes. The potentials can be estimated on a single sample path of a Markov process. In this paper, we propose two potential-based online policy iteration algorithms for performance optimization of Markov systems. The algorithms are based on online estimation of potentials and stochastic approximation. We prove that with these two algorithms the optimal policy can be attained after a finite number of iterations. A simulation example is given to illustrate the main ideas and the convergence rates of the algorithms.
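The sketch below illustrates the general shape of a potential-based policy iteration loop driven by sample-path estimates. It is a generic sketch under my own assumptions (simulate and improve are placeholder callables, and the potentials are estimated with a simple regenerative average relative to a reference state); it is not the paper's two algorithms, which rely on stochastic approximation.

```python
import numpy as np

def estimate_potentials(path, rewards, n_states, ref_state=0):
    """Estimate potentials from one sample path, up to an additive constant:
    g(i) ~ expected sum of (f - eta) from a visit to state i until the next
    visit to the reference state.  This suffices for policy improvement.
    (Naive O(T^2) implementation, for illustration only.)"""
    eta = float(np.mean(rewards))            # average-reward estimate
    sums = np.zeros(n_states)
    counts = np.zeros(n_states)
    for t in range(len(path) - 1):
        i, acc, s = path[t], 0.0, t
        while s < len(rewards) and not (s > t and path[s] == ref_state):
            acc += rewards[s] - eta
            s += 1
        sums[i] += acc
        counts[i] += 1
    return sums / np.maximum(counts, 1), eta

def online_policy_iteration(simulate, improve, policy, n_states, n_iters=20):
    """Generic loop: evaluate the current policy on a simulated sample path,
    then improve it using the estimated potentials.
    simulate(policy) -> (path, rewards); improve(g, policy) -> new policy."""
    for _ in range(n_iters):
        path, rewards = simulate(policy)
        g, _ = estimate_potentials(path, rewards, n_states)
        new_policy = improve(g, policy)      # e.g. greedy w.r.t. f + P g
        if new_policy == policy:             # no change: stop iterating
            break
        policy = new_policy
    return policy
```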

10.
Two fundamental concepts and quantities, realization factors and performance potentials, are introduced for Markov processes. The relations among these two quantities and the group inverse of the infinitesimal generator are studied. It is shown that the sensitivity of the steady-state performance with respect to a change of the infinitesimal generator can be calculated easily by using any of these three quantities, and that these quantities can be estimated by analyzing a single sample path of a Markov process. Based on these results, algorithms for estimating performance sensitivities on a single sample path of a Markov process can be developed. The potentials in this paper are defined through realization factors and are shown to be the same as those defined by Poisson equations. The results provide a unified framework of perturbation realization for infinitesimal perturbation analysis (IPA) and non-IPA approaches to the sensitivity analysis of steady-state performance; they also provide a theoretical background for the PA algorithms developed in recent years.
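In the standard notation of this line of work, the two quantities and their relation to the potentials of a Markov process with infinitesimal generator A can be summarized as follows (notation supplied here for illustration).

```latex
% Realization factors as differences of potentials, and the realization matrix D
\[d(i,j) = g(j) - g(i), \qquad D = e\,g^{\mathsf T} - g\,e^{\mathsf T}\]
% Potentials defined (up to a constant) by the Poisson equation for the generator A
\[A\,g = -f + \eta e\]
```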

11.
Using a sample path approach, we derive a new formula for performance sensitivities of discrete-time Markov chains. A distinguishing feature of this formula is that the quantities involved can be estimated by analyzing a single sample path of a Markov chain. Thus, the formula provides a new direction for sensitivity analysis and can be viewed as an extension of the perturbation realization theory to problems where infinitesimal perturbation analysis does not work well.

12.
In this paper, two single-sample-path-based recursive approaches for Markov decision problems are proposed. One is based on the simultaneous perturbation approach and can be applied to general state spaces, but its convergence rate is low. In this algorithm, a small perturbation of the current parameters is needed to obtain another sample path for comparison, but this perturbation may degrade the system's performance. Hence, we introduce another approach, which directly estimates the performance gradient for optimization by means of the theory of potentials. This algorithm is limited to finite-state systems, but its convergence speed is higher than that of the first one. The gradient estimate can be obtained from the sample path under the current parameters, without any perturbation. This approach is therefore better suited to practical applications.
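For context, a simultaneous-perturbation gradient estimate of the kind mentioned first can be sketched as below; this is a generic SPSA-style sketch under my own assumptions (placeholder names and a two-sided perturbation), not the paper's recursion.

```python
import numpy as np

def spsa_gradient(eval_performance, theta, c=0.1, rng=None):
    """One simultaneous-perturbation estimate of the gradient of a performance
    measure eta(theta), using two sample-path evaluations at theta +/- c*delta.

    eval_performance : callable returning an estimate of eta at a parameter vector
    theta            : current parameter vector (np.ndarray)
    c                : perturbation size
    """
    rng = np.random.default_rng() if rng is None else rng
    delta = rng.choice([-1.0, 1.0], size=theta.shape)    # Rademacher perturbation
    eta_plus = eval_performance(theta + c * delta)       # one perturbed sample path
    eta_minus = eval_performance(theta - c * delta)      # a second perturbed path
    return (eta_plus - eta_minus) / (2.0 * c * delta)    # componentwise estimate
```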

13.
Performance sensitivity formulas for state-dependent closed queueing networks
By studying the sensitivity of the steady-state performance measures of a class of Markov processes under parameter perturbations of the infinitesimal generator, and using the three equivalent quantities that describe steady-state performance sensitivity, namely the group inverse of the infinitesimal generator, the realization matrix, and the potentials, this paper gives sensitivity formulas for the steady-state performance measures of state-dependent closed queueing networks under parameter perturbations. These results can be applied directly to the control and optimization of queueing networks.

14.
The standard approach to stochastic control is dynamic programming. In this paper, we introduce an alternative approach based on the direct comparison of the performance of any two policies. This is achieved by modeling the state process as a continuous-time, continuous-state Markov process and applying the same ideas as in the discrete-time, discrete-state case. This approach is simple and intuitively clear; it applies in the same way to different problems: finite and infinite horizons, discounted and long-run-average performance, and continuous and jump diffusions. Discounting is not needed when dealing with long-run average performance. The approach provides a unified framework for stochastic control and other optimization theories and methodologies, including Markov decision processes, perturbation analysis, and reinforcement learning.
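Under suitable regularity conditions, the direct-comparison idea carries the discrete-state performance-difference formula over to this setting in a form like the following, where the symbol for the infinitesimal generator of the controlled process and the rest of the notation are supplied here for illustration; this is an analogue written by analogy with the discrete case, not a quotation of the paper's result.

```latex
% Potentials via the Poisson equation for the generator \mathcal{A} of the current policy
\[\mathcal{A} g(x) + f(x) = \eta\]
% Performance-difference formula comparing two policies (primes denote the other policy)
\[\eta' - \eta = \int \left[ (\mathcal{A}' - \mathcal{A})\, g(x) + f'(x) - f(x) \right] \pi'(dx)\]
```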

15.
A simulation-based optimization algorithm for CTMDPs under randomized stationary policies
Based on the theory of Markov performance potentials and the neuro-dynamic programming (NDP) method, this paper studies simulation-based optimization of a class of continuous-time Markov decision processes (CTMDPs) under randomized stationary policies. The algorithm converts the continuous-time process into its uniformized Markov chain and then estimates, from a single sample path, the gradient of the average-cost performance measure with respect to the policy parameters in order to find a suboptimal policy. The method is suitable for performance optimization of systems with large state spaces. A numerical example of a controlled Markov process is given.
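Uniformization, the conversion step mentioned above, takes the standard form below: given a generator A and a rate λ no smaller than the largest exit rate, the uniformized chain P has the same stationary distribution as the original process (notation supplied here for illustration).

```latex
% Uniformized transition matrix of a generator A, with rate \lambda \ge \max_i |a_{ii}|
\[P = I + \frac{1}{\lambda}\,A\]
```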

16.
Online optimization algorithms for Markov control processes based on a single sample path
On the basis of the theory of Markov performance potentials, this paper studies performance optimization algorithms for Markov control processes. Unlike traditional computation-based methods, the algorithms estimate the gradient of the performance measure with respect to the policy parameters from the simulation of a single sample path, in order to find an optimal (or suboptimal) randomized stationary policy. Since the algorithm parameters can be chosen according to the characteristics of different practical systems, the algorithms can meet the online optimization needs of various engineering systems. Finally, the convergence of these algorithms with probability 1 over an infinitely long sample path is briefly analyzed, and a numerical example of a three-state controlled Markov process is given.
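The sketch below shows one way a single-sample-path gradient estimate of this kind can be organized, using a truncated sum of centered rewards as a stand-in for the potential; it is a minimal sketch under my own assumptions (placeholder names grad_log_p and run_path, an arbitrary truncation window and step size), not the paper's algorithms.

```python
import numpy as np

def policy_gradient_estimate(theta, path, rewards, grad_log_p, K=50):
    """Single-sample-path estimate of d(eta)/d(theta) via potentials.
    grad_log_p(theta, x, x_next) returns the gradient of log P_theta(x_next | x);
    the potential g(x_{t+1}) is approximated by a truncated sum of (f - eta).
    Assumes the path is longer than K."""
    eta = float(np.mean(rewards))
    grads = []
    for t in range(len(path) - 1 - K):
        g_next = float(np.sum(np.asarray(rewards[t + 1:t + 1 + K]) - eta))
        grads.append(grad_log_p(theta, path[t], path[t + 1]) * g_next)
    return np.mean(grads, axis=0)

def optimize(theta, run_path, grad_log_p, step=0.01, n_iters=100):
    """Stochastic-approximation update of the policy parameters
    (ascent for rewards; use -step for costs)."""
    for _ in range(n_iters):
        path, rewards = run_path(theta)      # simulate one sample path under theta
        theta = theta + step * policy_gradient_estimate(theta, path, rewards, grad_log_p)
    return theta
```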

17.
A basic formula for online policy gradient algorithms
This note presents a (new) basic formula for sample-path-based estimates of performance gradients for Markov systems (called policy gradients in the reinforcement learning literature). With this basic formula, many policy-gradient algorithms, including those that have previously appeared in the literature, can be easily developed. The formula follows naturally from a sensitivity equation in perturbation analysis. New research directions are discussed.

18.
It is known that performance potentials (or, equivalently, perturbation realization factors) can be used as building blocks for the performance sensitivities of Markov systems. In parameterized systems, the changes in parameters may affect only some states, and the explicit transition probability matrix may not be known. In this paper, we use an example to show that potentials can be used to construct performance sensitivities in a more flexible way; only the potentials at the affected states need to be estimated, and the transition probability matrix need not be known. Policy iteration algorithms, which are simpler than the standard ones, can be established.

19.
Based on the theory of performance potentials and the equivalent-Markov-process method, this paper studies simulation-based optimization algorithms for a class of semi-Markov decision processes (SMDPs) under parameterized randomized stationary policies, and briefly analyzes the convergence of the algorithms. Through the equivalent Markov process of the SMDP, a uniformized Markov chain is defined, and the gradient of the SMDP's average-cost performance measure with respect to the policy parameters is then estimated from a single sample path of this uniformized chain, in order to find an optimal (or suboptimal) policy. The algorithm uses a neural network to approximate the parameterized randomized stationary policy, which saves memory, avoids the "curse of dimensionality", and makes the method suitable for performance optimization of systems with large state spaces. Finally, a simulation example is given to illustrate the application of the algorithm.

20.
A new sample-path analysis approach based on the smoothing property of conditional expectation is proposed for estimating the performance sensitivity of discrete-event dynamic systems. Several examples are presented to show how this approach overcomes a difficulty of ordinary infinitesimal perturbation analysis. The basic message is that one can obtain more knowledge about the system performance by observing and analyzing the sample path than by using the conventional simulation approach. It is also pointed out that the classical queueing-theory approach to obtaining performance sensitivities and the sample-path-based infinitesimal perturbation analysis approach can be unified in the framework of the new approach, smoothed (conditional) perturbation analysis.
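The smoothing idea can be stated compactly in its standard form (notation supplied here): for a sample performance L(θ, ξ) and a characterization Z of the sample path chosen so that the conditional expectation is differentiable in θ,

```latex
% Smoothed (conditional) perturbation analysis: differentiate the conditional expectation
\[\frac{d}{d\theta}\,\mathbb{E}\big[L(\theta,\xi)\big]
   = \mathbb{E}\Big[\frac{\partial}{\partial\theta}\,\mathbb{E}\big[L(\theta,\xi)\,\big|\,\mathcal{Z}\big]\Big]\]
```

whereas ordinary IPA requires the interchange of derivative and expectation to be valid for L itself.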
