Similar Literature
20 similar documents found.
1.
We study the structure of sample paths of Markov systems by using performance potentials as the fundamental units. With a sample-path-based approach, we show that performance sensitivity formulas (performance gradients and performance differences) of Markov systems can be constructed intuitively, from first principles, with performance potentials (or, equivalently, perturbation realization factors) as building blocks. In particular, we derive sensitivity formulas for two Markov chains with possibly different state spaces. The proposed approach can be used to flexibly obtain sensitivity formulas for a wide range of problems, including those with partial information. These formulas are the basis for the performance optimization of discrete event dynamic systems, including perturbation analysis, Markov decision processes, and reinforcement learning. The approach thus provides insight into online learning and performance optimization and opens up new research directions; sample-path-based algorithms can be developed.
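The performance-difference construction described above can be checked numerically on a toy example. The sketch below uses invented transition matrices and rewards (not taken from the paper): it solves the Poisson equation for the potentials g of one chain, then verifies the standard difference formula η' − η = π'(P' − P)g for a second chain with the same reward vector.

```python
import numpy as np

# Two small ergodic chains on the same state space (illustrative numbers).
P  = np.array([[0.5, 0.5], [0.3, 0.7]])
P2 = np.array([[0.6, 0.4], [0.2, 0.8]])
r  = np.array([1.0, 2.0])   # common reward vector

def stationary(P):
    # Solve pi P = pi, sum(pi) = 1 as a least-squares system.
    n = len(P)
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    return np.linalg.lstsq(A, np.append(np.zeros(n), 1.0), rcond=None)[0]

def potentials(P, r):
    # Poisson equation (I - P) g = r - eta*1, normalized by pi . g = 0.
    pi = stationary(P)
    eta = pi @ r
    A = np.vstack([np.eye(len(P)) - P, pi])
    g = np.linalg.lstsq(A, np.append(r - eta, 0.0), rcond=None)[0]
    return eta, g

eta, g = potentials(P, r)
pi2 = stationary(P2)
eta2 = pi2 @ r
# Performance-difference formula (same r): eta' - eta = pi' (P' - P) g
lhs = eta2 - eta
rhs = pi2 @ (P2 - P) @ g
```

Only the potentials of the *original* chain appear on the right-hand side, which is what makes the construction useful: one sample path of the current system suffices to predict the performance of a perturbed one.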

2.
Two fundamental concepts and quantities, realization factors and performance potentials, are introduced for Markov processes. The relations among these two quantities and the group inverse of the infinitesimal generator are studied. It is shown that the sensitivity of the steady-state performance with respect to a change in the infinitesimal generator can be easily calculated using any of these three quantities, and that these quantities can be estimated by analyzing a single sample path of a Markov process. Based on these results, algorithms for estimating performance sensitivities on a single sample path of a Markov process can be proposed. The potentials in this paper are defined through realization factors and are shown to be the same as those defined by Poisson equations. The results provide a uniform framework of perturbation realization for infinitesimal perturbation analysis (IPA) and non-IPA approaches to the sensitivity analysis of steady-state performance; they also provide a theoretical background for the PA algorithms developed in recent years.
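To illustrate the single-sample-path idea, here is a minimal sketch (made-up numbers, and a discrete-time chain rather than the continuous-time processes of the paper): the potentials are estimated from one simulated trajectory as truncated sums of centered rewards, g(i) ≈ E[Σ_{t<L} (r(X_t) − η) | X_0 = i].

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.5, 0.5], [0.3, 0.7]])   # illustrative transition matrix
r = np.array([1.0, 2.0])                 # illustrative reward vector

# Simulate one long sample path of the two-state chain.
T, L = 500_000, 50
path = np.zeros(T, dtype=int)
u = rng.random(T)
for t in range(1, T):
    path[t] = u[t] < P[path[t - 1], 1]   # go to state 1 with prob P[i, 1]

# Estimate the average reward and the potentials from the same path.
eta_hat = r[path].mean()
centered = r[path] - eta_hat
cs = np.concatenate([[0.0], np.cumsum(centered)])
window_sums = cs[L:] - cs[:-L]           # window_sums[t] = sum over [t, t+L)
starts = path[: T - L + 1]
g_hat = np.array([window_sums[starts == i].mean() for i in (0, 1)])

# Only differences of potentials matter; for this P and r the exact
# difference is g(0) - g(1) = -1.25.
diff = g_hat[0] - g_hat[1]
```

The truncation length L only needs to exceed the mixing time of the chain; the bias decays geometrically in L.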

3.
A new sample path analysis approach, based on the smoothing property of conditional expectation, is proposed for estimating the performance sensitivity of discrete event dynamical systems. Several examples are presented to show how this approach overcomes a difficulty of ordinary infinitesimal perturbation analysis. The basic message is that one can learn more about system performance by observing and analyzing the sample path than by using the conventional simulation approach. It is also pointed out that the classical queueing-theory approach to performance sensitivity and the sample-path-based infinitesimal perturbation analysis approach can be unified in the framework of the new approach, smoothed (conditional) perturbation analysis.

4.
A basic formula for online policy gradient algorithms
This note presents a new basic formula for sample-path-based estimates of performance gradients for Markov systems (called policy gradients in the reinforcement learning literature). With this basic formula, many policy-gradient algorithms, including those that have previously appeared in the literature, can be easily developed. The formula follows naturally from a sensitivity equation in perturbation analysis. New research directions are discussed.
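The sensitivity equation the note builds on has the general form dη/dθ = π (dP/dθ) g when the rewards do not depend on θ (whether this is exactly the note's formula is an assumption; it is the standard perturbation-analysis gradient formula). The sketch below checks it against a central finite difference on an invented linearly parametrized chain:

```python
import numpy as np

# Invented one-parameter family of 2-state chains: P(theta) = (1-theta)P0 + theta P1,
# so dP/dtheta = P1 - P0. The reward vector r does not depend on theta.
P0 = np.array([[0.5, 0.5], [0.3, 0.7]])
P1 = np.array([[0.6, 0.4], [0.2, 0.8]])
r = np.array([1.0, 2.0])

def stationary(P):
    A = np.vstack([P.T - np.eye(2), np.ones(2)])
    return np.linalg.lstsq(A, np.array([0.0, 0.0, 1.0]), rcond=None)[0]

def potentials(P, r):
    pi = stationary(P)
    eta = pi @ r
    A = np.vstack([np.eye(2) - P, pi])   # Poisson equation with pi . g = 0
    g = np.linalg.lstsq(A, np.append(r - eta, 0.0), rcond=None)[0]
    return eta, g

def eta_of(theta):
    return stationary((1 - theta) * P0 + theta * P1) @ r

theta = 0.3
P = (1 - theta) * P0 + theta * P1
pi = stationary(P)
_, g = potentials(P, r)
grad = pi @ (P1 - P0) @ g                           # dη/dθ = π (dP/dθ) g
fd = (eta_of(theta + 1e-6) - eta_of(theta - 1e-6)) / 2e-6
```

In an online algorithm both π and g on the right-hand side are replaced by estimates from the running sample path, which is what makes policy-gradient methods derivable from this single equation.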

5.
In this paper, two single-sample-path-based recursive approaches for Markov decision problems are proposed. The first is based on the simultaneous perturbation approach and can be applied to general-state problems, but its convergence rate is low. In this algorithm, a small perturbation of the current parameters is necessary to obtain another sample path for comparison, which may degrade the system's performance. Hence, we introduce a second approach, which directly estimates the performance gradient for optimization using "potential" theory. This algorithm is limited to finite-state-space systems, but it converges faster than the first. The gradient estimate can be obtained from the sample path under the current parameters, without any perturbation, which makes this approach more acceptable for practical applications.

6.
An extension of the concept of perturbation analysis (PA) is introduced that can be applied to cases where sample-path discontinuity with respect to system parameters has prevented the application of simple PA rules. Both Markov and generalized semi-Markov processes are considered. The robustness of the extended PA is examined and found to be sufficient.

7.
The perturbation realization factor is an important concept in the perturbation analysis of both queueing systems and Markov systems; it measures the effect of a perturbation on system performance and is central to the performance sensitivity analysis and performance optimization of these systems. Since perturbations in queueing systems are continuous in nature while those in Markov systems are discrete, it is not straightforward to establish the relationship between these two types of fundamental quantities. This note solves this long-standing problem. We find a formula that links the two types of perturbation realization factors in Gordon-Newell and open Jackson networks. The results enhance our understanding of perturbation analysis and lead to new research directions.

8.
Perturbation analysis via coupling
Perturbation analysis is an efficient method for the performance analysis of discrete event dynamic systems: it yields gradient information from a single sample path observation. Over the last two decades, various perturbation analysis techniques have been developed to handle a large class of problems. Coupling is a method of generating multiple random samples, with a wide range of applications in applied probability. This paper is concerned with perturbation analysis via coupling. The approach offers great versatility in the form of the gradient estimators, which is potentially useful for variance reduction and for efficient implementation. Several known perturbation analysis techniques can be viewed as special ways of coupling. The coupling method is further applied to gradient estimation for Markov chains, where it is used not only to derive a gradient estimator but also to implement it. The estimator is proved to be strongly consistent. Finally, different coupling schemes are compared using an illustrative example.

9.
With the long-run average performance as the primary criterion for a Markov decision process, variance measures are studied as secondary criteria. The steady-state variance and the limiting average variance along a sample path are discussed. The latter is difficult to handle due to its special form. With a sensitivity-based approach, the difference formula for the sample-path variance under different policies is constructed intuitively, and the optimality equation is then presented. Moreover, a policy iteration algorithm is developed. This work extends the sensitivity-based construction approach to Markov decision processes with non-standard performance criteria. The difference between these two types of variance criteria and the bias criterion is illustrated with a numerical example.

10.
Performance potentials play a crucial role in the performance sensitivity analysis and policy iteration of Markov decision processes. The potentials can be estimated on a single sample path of a Markov process. In this paper, we propose two potential-based online policy iteration algorithms for the performance optimization of Markov systems. The algorithms are based on online estimation of the potentials and stochastic approximation. We prove that with these two algorithms the optimal policy can be attained after a finite number of iterations. A simulation example illustrates the main ideas and the convergence rates of the algorithms.
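As background for the online algorithms above, here is an offline sketch of potential-based policy iteration on a toy two-state, two-action average-reward MDP (all numbers hypothetical; the paper's algorithms replace the exact linear solves below with single-sample-path estimates and stochastic approximation):

```python
import numpy as np

# Toy 2-state, 2-action average-reward MDP (illustrative numbers).
P = {0: np.array([[0.9, 0.1], [0.4, 0.6]]),   # transitions under action 0
     1: np.array([[0.2, 0.8], [0.7, 0.3]])}   # transitions under action 1
r = {0: np.array([1.0, 0.0]),                  # rewards under action 0
     1: np.array([0.0, 2.0])}                  # rewards under action 1

def stationary(P):
    """Stationary distribution of an ergodic transition matrix."""
    n = len(P)
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    return np.linalg.lstsq(A, np.append(np.zeros(n), 1.0), rcond=None)[0]

def potentials(P, r):
    """Average reward eta and potentials g (Poisson equation, pi . g = 0)."""
    pi = stationary(P)
    eta = pi @ r
    A = np.vstack([np.eye(len(P)) - P, pi])
    g = np.linalg.lstsq(A, np.append(r - eta, 0.0), rcond=None)[0]
    return eta, g

policy = [0, 0]
for _ in range(20):
    # Policy evaluation: potentials of the chain induced by the current policy.
    Pd = np.array([P[policy[s]][s] for s in range(2)])
    rd = np.array([r[policy[s]][s] for s in range(2)])
    eta, g = potentials(Pd, rd)
    # Policy improvement: maximize r_a(s) + P_a(s) . g in each state.
    new = [max((0, 1), key=lambda a: r[a][s] + P[a][s] @ g) for s in range(2)]
    if new == policy:
        break
    policy = new
```

The improvement step needs only the potentials of the current policy, which is exactly the quantity that can be estimated online from one sample path.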

11.
The stability of piecewise deterministic linear systems driven by an underlying finite Markov chain is analyzed. Necessary and sufficient conditions for moment stability are obtained by means of an explicit formula for the corresponding Lyapunov exponent. The relationship between almost-sure and moment stability is elucidated, revealing the possible occurrence of high-order moment instability and large deviations from stable sample-path behavior.

12.
We propose a unified framework for Markov decision problems and performance sensitivity analysis of multichain Markov processes with both discounted and average-cost performance criteria. With the fundamental concept of performance potentials, we derive both performance-gradient and performance-difference formulas, which play the central role in performance optimization. The standard policy iteration algorithms for both discounted- and average-reward MDPs can be established from the performance-difference formulas in a simple and intuitive way, and the performance-gradient formulas, together with stochastic approximation, may lead to new optimization schemes. This sensitivity-based view of performance optimization provides insights that link perturbation analysis, Markov decision processes, and reinforcement learning. The research extends previous work on ergodic Markov chains (Cao, Automatica 36 (2000) 771).

13.
Zhang Kai, Liu Jingju. 《计算机科学》 (Computer Science), 2021, 48(5): 294-300
Analyzing network intrusion paths from the attacker's perspective is important for guiding network security defense. To address two problems in existing analysis methods based on absorbing Markov chains, namely incomplete consideration of state-transition cases and unreasonable computation of state-transition probabilities, an intrusion path analysis method based on absorbing Markov chains is proposed. Starting from a generated attack graph, the method uses the exploitability scores of the vulnerabilities that enable each state transition, fully accounts for the case in which a non-absorbing node fails to transition, proposes a new way to compute the state-transition probabilities, and maps the attack graph onto an absorbing Markov chain model. Using the properties of the chain's transition probability matrix, it computes a threat ranking of the nodes on the intrusion paths and the expected intrusion path length. Experimental results show that the method effectively computes node threat rankings and expected path lengths; comparative analysis shows that its results match the realities of network attack and defense better than those of existing methods.
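The quantities at the end of the abstract come from standard absorbing-chain machinery. The sketch below uses a hypothetical 4-state attack graph with invented transition probabilities (stand-ins for the ones the paper derives from vulnerability exploitability scores) and computes expected visit counts and expected steps to absorption via the fundamental matrix:

```python
import numpy as np

# Hypothetical attack graph mapped to an absorbing Markov chain:
# states 0-2 are transient attack steps, state 3 is the absorbing goal.
P = np.array([
    [0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.2, 0.0, 0.0, 0.8],
    [0.0, 0.0, 0.0, 1.0],
])

Q = P[:3, :3]                        # transient-to-transient block
N = np.linalg.inv(np.eye(3) - Q)     # fundamental matrix: expected visit counts
t = N @ np.ones(3)                   # expected steps to absorption per start state

visits = N[0]                        # expected visits to each node from state 0
ranking = np.argsort(-visits)        # node threat ranking by expected visits
```

Ranking nodes by expected visits and reading off t at the attacker's entry node gives the two outputs the abstract mentions; the paper's contribution lies in how P itself is built, which this sketch does not reproduce.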

14.
Simulation-based optimization of Markov reward processes
This paper proposes a simulation-based algorithm for optimizing the average reward in a finite-state Markov reward process that depends on a set of parameters. As a special case, the method applies to Markov decision processes where optimization takes place within a parametrized set of policies. The algorithm relies on the regenerative structure of finite-state Markov processes, involves the simulation of a single sample path, and can be implemented online. A convergence result (with probability 1) is provided.

15.
Online optimization algorithms for Markov control processes based on a single sample path
Building on the theory of Markov performance potentials, performance optimization algorithms for Markov control processes are studied. Unlike traditional computation-based methods, the algorithms estimate the gradient of the performance measure with respect to the policy parameters from simulation of a single sample path, in order to find an optimal (or suboptimal) randomized stationary policy. Since the algorithm parameters can be chosen according to the characteristics of different real systems, the algorithms can meet the online optimization needs of various practical engineering systems. Finally, convergence of the algorithms with probability 1 along an infinitely long sample path is briefly analyzed, and a numerical example of a three-state controlled Markov process is given.

16.
A new model of processing production lines and its perturbation analysis
This paper first establishes new algebraic models (under a new algebraic system) for serial production lines and assembly lines, which are widely used in practice. Based on this class of models, and without any restriction on the magnitude of the perturbation, mathematical equations for perturbation propagation are established for perturbations of the mean service times, and sensitivity formulas for the system output rate with respect to the mean service times are given. Simulation results confirm the theoretical analysis, with satisfactory accuracy.

17.
Model Checking Markov Chains with Actions and State Labels
In the past, logics of several kinds have been proposed for reasoning about discrete-time or continuous-time Markov chains. Most of these logics rely on either state labels (atomic propositions) or transition labels (actions). However, in several applications it is useful to reason about both state properties and action sequences. For this purpose, we introduce the logic asCSL, which provides a powerful means to characterize execution paths of Markov chains with actions and state labels. asCSL can be regarded as an extension of the purely state-based logic CSL (continuous stochastic logic). In asCSL, path properties are characterized by regular expressions over actions and state formulas. Thus, the truth value of path formulas depends not only on the available actions in a given time interval, but also on the validity of certain state formulas in intermediate states. We compare the expressive power of CSL and asCSL and show that even the state-based fragment of asCSL is strictly more expressive than CSL if time intervals starting at zero are employed. Using an automaton-based technique, an asCSL formula and a Markov chain with actions and state labels are combined into a product Markov chain. For time intervals starting at zero, we establish a reduction of the model checking problem for asCSL to CSL model checking on this product Markov chain. The usefulness of our approach is illustrated with an elaborate model of a scalable cellular communication system, for which several properties are formalized by means of asCSL formulas and checked using the new procedure.

18.
Semi-Markov decision problems and performance sensitivity analysis
Recent research indicates that Markov decision processes (MDPs) can be viewed from a sensitivity point of view, and that perturbation analysis (PA), MDPs, and reinforcement learning (RL) are three closely related areas in the optimization of discrete-event dynamic systems that can be modeled as Markov processes. The goal of this paper is two-fold. First, we develop PA theory for semi-Markov processes (SMPs); then we extend the aforementioned results on the relation among PA, MDPs, and RL to SMPs. In particular, we show that performance sensitivity formulas and policy iteration algorithms for semi-Markov decision processes can be derived from the performance potential and realization matrix. Both the long-run average and discounted-cost problems are considered. This approach provides a unified framework for both problems, with the long-run average problem corresponding to a discount factor of zero. The results indicate that performance sensitivities and optimization depend only on first-order statistics. Single-sample-path-based implementations are discussed.

19.
The sensitivity-based optimization of Markov systems has become an increasingly important area. From the perspective of performance sensitivity analysis, policy-iteration algorithms and gradient estimation methods can be obtained directly for Markov decision processes (MDPs). In this correspondence, sensitivity-based optimization is extended to average-reward partially observable MDPs (POMDPs). We derive the performance-difference and performance-derivative formulas for POMDPs. On the basis of the performance-derivative formula, we present a new method to estimate the performance gradients. From the performance-difference formula, we obtain a sufficient optimality condition without the discounted reward formulation. We also propose a policy-iteration algorithm to obtain a nearly optimal finite-state-controller policy.

20.
We consider a closed Jackson-like queueing network with arbitrary service time distributions and derive an unbiased second-derivative estimator of the throughput over N customers served at some node with respect to a parameter of the service distribution at that node. Our approach is based on observing a single sample path of this system and evaluating all second-order effects on interdeparture times resulting from the parameter perturbation. We then define an estimator as a conditional expectation over appropriate observable quantities, as in smoothed perturbation analysis (SPA). This process recovers the first-derivative estimator along the way (which can also be derived using other techniques) and gives new insight into event-order-change phenomena, which are of higher order, and into the type of sample-path information one needs to condition on for higher-order derivative estimation. Despite the complexity of the analysis, the final algorithm we obtain is relatively simple. Our estimators can be used in conjunction with other techniques to obtain rational approximations of the entire throughput response surface as a function of system parameters.
