期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A single sample path-based performance sensitivity formula forMarkov chains

Xi-Ren Cao Xue-Ming Yuan Li Qiu 《Automatic Control, IEEE Transactions on》1996,41(12):1814-1817

Using a sample path approach, we derive a new formula for performance sensitivities of discrete-time Markov chains. A distinguished feature of this formula is that the quantities involved can be estimated by analyzing a single sample path of a Markov chain. Thus, the formula provides a new direction for sensitivity analysis and can be viewed as an extension of the perturbation realization theory to problems where infinitesimal perturbation analysis does not work well 相似文献

2.

闭排队网络当性能函数与参数相关时的性能灵敏度分析 总被引：1，自引：0，他引：1

下载免费PDF全文

殷保群周亚平奚宏生孙德敏《控制理论与应用》2002,19(2):311-312

借助于无穷小矩阵摄动方法, 讨论了一类Markov过程, 其稳态性能关于参数摄动的灵敏度分析问题. 然后研究了闭排队网络的稳态性能灵敏度分析问题, 并在参数相关性能函数的情况下, 给出了网络的几种稳态性能的灵敏度公式. 这些公式表明稳态性能灵敏度很容易通过网络势能进行计算. 相似文献

3.

The potential structure of sample paths and performance sensitivities of Markov systems 总被引：1，自引：0，他引：1

Xi-Ren Cao 《Automatic Control, IEEE Transactions on》2004,49(12):2129-2142

We study the structure of sample paths of Markov systems by using performance potentials as the fundamental units. With a sample path-based approach, we show that performance sensitivity formulas (performance gradients and performance differences) of Markov systems can be constructed intuitively, by first principles, with performance potentials (or equivalently, perturbation realization factors) as building blocks. In particular, we derive sensitivity formulas for two Markov chains with possibly different state spaces. The proposed approach can be used to obtain flexibly the sensitivity formulas for a wide range of problems, including those with partial information. These formulas are the basis for performance optimization of discrete event dynamic systems, including perturbation analysis, Markov decision processes, and reinforcement learning. The approach thus provides insight on on-line learning and performance optimization and opens up new research directions. Sample path based algorithms can be developed. 相似文献

4.

The Relations Among Potentials, Perturbation Analysis, and Markov Decision Processes 总被引：16，自引：0，他引：16

Xi-Ren Cao 《Discrete Event Dynamic Systems》1998,8(1):71-87

This paper provides an introductory discussion for an important concept, the performance potentials of Markov processes, and its relations with perturbation analysis (PA), average-cost Markov decision processes (MDP), Poisson equations, -potentials, the fundamental matrix, and the group inverse of the transition matrix (or the infinitesimal generators). Applications to single sample path-based performance sensitivity estimation and performance optimization are also discussed. On-line algorithms for performance sensitivity estimates and on-line schemes for policy iteration methods are presented. The approach is closely related to reinforcement learning algorithms. 相似文献

5.

Potential-based online policy iteration algorithms for Markov decision processes

Hai-Tao Fang Xi-Ren Cao 《Automatic Control, IEEE Transactions on》2004,49(4):493-505

Performance potentials play a crucial role in performance sensitivity analysis and policy iteration of Markov decision processes. The potentials can be estimated on a single sample path of a Markov process. In this paper, we propose two potential-based online policy iteration algorithms for performance optimization of Markov systems. The algorithms are based on online estimation of potentials and stochastic approximation. We prove that with these two algorithms the optimal policy can be attained after a finite number of iterations. A simulation example is given to illustrate the main ideas and the convergence rates of the algorithms. 相似文献

6.

Smoothed (conditional) perturbation analysis of discrete event dynamical systems 总被引：1，自引：0，他引：1

Wei-Bo Gong Yu-Chi Ho 《Automatic Control, IEEE Transactions on》1987,32(10):858-866

A new sample path analysis approach based on the smoothing property of conditional expectation for estimating the performance sensitivity of discrete event dynamical systems is proposed. Several examples are presented to show how this approach overcomes a difficulty of the ordinary infinitesimal perturbation analysis. The basic message is that one can get more knowledge about the system performance by observing and analyzing the sample path than by using the conventional simulation approach. It is also pointed out that the classical queueing theory approach for getting the performance sensitivity and the sample path based infinitesimal perturbation analysis approach can be unified in the framework of the new approach, the smoothed (conditional) perturbation analysis. 相似文献

7.

The optimal robust control policy for uncertain semi-Markov control processes

Tang Hao Xi Hongsheng Yin Baoqun 《International journal of systems science》2013,44(13):791-800

The optimization problems of Markov control processes (MCPs) with exact knowledge of system parameters, in the form of transition probabilities or infinitesimal transition rates, can be solved by using the concept of Markov performance potential which plays an important role in the sensitivity analysis of MCPs. In this paper, by using an equivalent infinitesimal generator, we first introduce a definition of discounted Poisson equations for semi-Markov control processes (SMCPs), which is similar to that for MCPs, and the performance potentials of SMCPs are defined as solution of the equation. Some related optimization techniques based on performance potentials for MCPs may be extended to the optimization of SMCPs if the system parameters are known with certainty. Unfortunately, exact values of the distributions of the sojourn times at some states or the transition probabilities of the embedded Markov chain for a large-scale SMCP are generally difficult or impossible to obtain, which leads to the uncertainty of the semi-Markov kernel, and thereby to the uncertainty of equivalent infinitesimal transition rates. Similar to the optimization of uncertain MCPs, a potential-based policy iteration method is proposed in this work to search for the optimal robust control policy for SMCPs with uncertain infinitesimal transition rates that are represented as compact sets. In addition, convergence of the algorithm is discussed. 相似文献

8.

Sensitivity formulas of performance in two-server cyclic queueing networks with phase-type distributed service times

B. Yin Y. Zhou H. Xi D. Sun 《International Transactions in Operational Research》1999,6(6):649-663

With the approach of the infinitesimal generator perturbation, sensitivities of the steady-state performance are studied in two-server cyclic queueing networks with phase-type distributed service times. Sensitivity formulas expressed by the potentials of the networks are given for parameter-dependent performance functions. An algorithm to compute the potentials and the performance derivatives of the networks is given, and a numerical example is provided. 相似文献

9.

Performance sensitivities for parameterized Markov systems

Xiren Cao Junyu Zhang 《控制理论与应用(英文版)》2004,2(1):65-68

It is known that the performance potentials (or equivalently, perturbation realization factors) can be used as building blocks for performance sensitivities of Markov systems. In parameterized systerns, the changes in parameters may only affect some states, and the explicit transition probability matrix may not be known. In this paper, we use an example to show that we can use potentials to construct performance sensitivities m a more flexible way; only the potentials at the affected states need to be estimated, and the transition probability matrix need not be known. Policy iteration algorithms, which are simpler than the standard one, can be established. 相似文献

10.

Semi-Markov decision problems and performance sensitivity analysis 总被引：1，自引：0，他引：1

Xi-Ren Cao 《Automatic Control, IEEE Transactions on》2003,48(5):758-769

Recent research indicates that Markov decision processes (MDPs) can be viewed from a sensitivity point of view; and the perturbation analysis (PA), MDPs, and reinforcement learning (RL) are three closely related areas in optimization of discrete-event dynamic systems that can be modeled as Markov processes. The goal of this paper is two-fold. First, we develop the PA theory for semi-Markov processes (SMPs); and then we extend the aforementioned results about the relation among PA, MDP, and RL to SMPs. In particular, we show that performance sensitivity formulas and policy iteration algorithms of semi-Markov decision processes can be derived based on the performance potential and realization matrix. Both the long-run average and discounted-cost problems are considered. This approach provides a unified framework for both problems, and the long-run average problem corresponds to the discounted factor being zero. The results indicate that performance sensitivities and optimization depend only on first-order statistics. Single sample path-based implementations are discussed. 相似文献

11.

Relationship Between Perturbation Realization Factors With Queueing Models and Markov Models 总被引：2，自引：0，他引：2

《Automatic Control, IEEE Transactions on》2006,51(10):1699-1704

Perturbation realization factor is an important concept in perturbation analysis of both queueing systems and Markov systems. A perturbation realization factor measures the effect of a perturbation on the system performance. This concept is important for the performance sensitivity and performance optimization of these systems. Since the perturbations in queueing systems are continuous in nature and those in Markov systems are discrete, it is not straightforward to establish the relationship between these two types of fundamental concepts. This note solves this long-standing problem. We find a formula that links these two types of perturbation realization factors in Gordon-Newell and open Jackson networks together. The results enhance our understanding of perturbation analysis and lead to new research directions. 相似文献

12.

状态相关闭排队网络中的性能指标灵敏度公式 总被引：12，自引：3，他引：9

殷保群周亚平杨孝先溪宏生孙德敏《控制理论与应用》1999,16(2):255-257

本文通过研究一类Ｍａｒｋｏｖ过程在无穷小矩阵的参数摄动下稳态性能指标的灵敏度,运用无穷小矩阵的群逆,实现矩阵和势能这三个描述稳定性能指标灵敏度的等价量,给出了状态相关闭排队网络在参数摄动下的稳态性能指标灵敏度公式,这些结果可直接用于排队网络的控制和优化。相似文献

13.

Recursive Approaches for Single Sample Path Based Markov Reward Processes

Hai‐Tao Fang Han‐Fu Chen Xi‐Ren Cao 《Asian journal of control》2001,3(1):21-26

In this paper, two single sample path‐based recursive approaches for Markov decision problems are proposed. One is based on the simultaneous perturbation approach and can be applied to the general state problem, but its convergence rate is low. In this algorithm, the small perturbation on current parameters is necessary to get another sample path for comparison, but it may worsen the system. Hence, we introduce another approach, which directly estimates the gradient of the performance for optimization by “potential” theory. This algorithm, however, is limited to finite state space systems, but its convergence speed is higher than the first one. The estimate for gradient can be obtained by using the sample path with current parameters without any perturbation. This approach is more acceptable for practical applications. 相似文献

14.

A unified approach to Markov decision problems and performance sensitivity analysis with discounted and average criteria: multichain cases

Xi-Ren Cao Author Vitae Xianping Guo^{Author Vitae} 《Automatica》2004,40(10):1749-1759

We propose a unified framework to Markov decision problems and performance sensitivity analysis for multichain Markov processes with both discounted and average-cost performance criteria. With the fundamental concept of performance potentials, we derive both performance-gradient and performance-difference formulas, which play the central role in performance optimization. The standard policy iteration algorithms for both discounted- and average-reward MDPs can be established using the performance-difference formulas in a simple and intuitive way; and the performance-gradient formulas together with stochastic approximation may lead to new optimization schemes. This sensitivity-based point of view of performance optimization provides some insights that link perturbation analysis, Markov decision processes, and reinforcement learning together. The research is an extension of the previous work on ergodic Markov chains (Cao, Automatica 36 (2000) 771). 相似文献

15.

From Perturbation Analysis to Markov Decision Processes and Reinforcement Learning 总被引：8，自引：0，他引：8

Xi-Ren Cao 《Discrete Event Dynamic Systems》2003,13(1-2):9-39

The goals of perturbation analysis (PA), Markov decision processes (MDPs), and reinforcement learning (RL) are common: to make decisions to improve the system performance based on the information obtained by analyzing the current system behavior. In this paper, we study the relations among these closely related fields. We show that MDP solutions can be derived naturally from performance sensitivity analysis provided by PA. Performance potential plays an important role in both PA and MDPs; it also offers a clear intuitive interpretation for many results. Reinforcement learning, TD(), neuro-dynamic programming, etc., are efficient ways of estimating the performance potentials and related quantities based on sample paths. The sensitivity point of view of PA, MDP, and RL brings in some new insight to the area of learning and optimization. In particular, gradient-based optimization can be applied to parameterized systems with large state spaces, and gradient-based policy iteration can be applied to some nonstandard MDPs such as systems with correlated actions, etc. Potential-based on-line approaches and their advantages are also discussed. 相似文献

16.

Markov控制过程基于单个样本轨道的在线优化算法 总被引：3，自引：1，他引：3

唐昊奚宏生殷保群《控制理论与应用》2002,19(6):865-871

在Markov性能势理论基础上, 研究了Markov控制过程的性能优化算法. 不同于传统的基于计算的方法, 文中的算法是根据单个样本轨道的仿真来估计性能指标关于策略参数的梯度, 以寻找最优 (或次优 )随机平稳策略. 由于可根据不同实际系统的特征来选择适当的算法参数, 因此它能满足不同实际工程系统在线优化的需要. 最后简要分析了这些算法在一个无限长的样本轨道上以概率 1的收敛性, 并给出了一个三状态受控Markov过程的数值实例. 相似文献

17.

随机平稳策略下半Markov决策过程的仿真优化算法

代桂平唐昊奚宏生《控制理论与应用》2006,23(4):547-551

基于性能势理论和等价Markov过程方法,研究了一类半Markov决策过程(SMDP)在参数化随机平稳策略下的仿真优化算法,并简要分析了算法的收敛性．通过SMDP的等价Markov过程,定义了一个一致化Markov链,然后根据该一致化Markov链的单个样本轨道来估计SMDP的平均代价性能指标关于策略参数的梯度,以寻找最优(或次优)策略．文中给出的算法是利用神经元网络来逼近参数化随机平稳策略,以节省计算机内存,避免了“维数灾”问题,适合于解决大状态空间系统的性能优化问题．最后给出了一个仿真实例来说明算法的应用．相似文献

18.

On choosing the characterization for smoothed perturbation analysis 总被引：1，自引：0，他引：1

Fu M.C. Hu J.-Q. 《Automatic Control, IEEE Transactions on》1991,36(11):1331-1336

Using second derivative estimators of the GI/G/1 queue as an illustrative example, the authors demonstrate that smoothed perturbation analysis estimators are not necessarily as distribution-free as infinitesimal perturbation analysis estimators, in the sense that the appropriate choice of conditioning quantities-the so-called characterization-may depend on the underlying distribution. Through a different choice of characterization, the authors derive an estimator that works for distributions for which a previously derived estimator fails. The importance of finding the appropriate characterization, or set of conditioning quantities, when applying the technique of smoothed perturbation analysis is illustrated by the contrast between two estimators of the same quantity (second derivative of mean steady-state system time) based on different characterizations of the simple path 相似文献

19.

Perturbation analysis of closed queueing networks with generalservice time distributions

Cao X.-R. 《Automatic Control, IEEE Transactions on》1991,36(11):1327-1331

Perturbation analysis of closed queuing networks with nonexponential service time distributions is studied. Perturbation analysis formulas using realization probabilities are extended to these networks. A perturbation generation function, which generalizes the perturbation generation rule, is defined. equations for realization probability and formulas for sensitivity of the system throughput with respect to service time distribution parameters are presented. The formulas provide an analytical method of calculating throughput sensitivity and an explanation of the application of perturbation analysis algorithms for networks with general service time distributions. The author focuses on the extension of concepts and intuitive explanations of the formulas rather than on mathematical derivations 相似文献

20.

CTMDP基于随机平稳策略的仿真优化算法 总被引：2，自引：2，他引：2

唐昊奚宏生殷保群《自动化学报》2004,30(2):229-234

基于Markov性能势理论和神经元动态规划(NDP)方法,研究一类连续时间Markov决策过程(MDP)在随机平稳策略下的仿真优化问题,给出的算法是把一个连续时间过程转换成其一致化Markov链,然后通过其单个样本轨道来估计平均代价性能指标关于策略参数的梯度,以寻找次优策略,该方法适合于解决大状态空间系统的性能优化问题.并给出了一个受控Markov 过程的数值实例. 相似文献