1.
Y. J. Liu G. Yin Q. Zhang J. B. Moore 《Mathematics of Control, Signals, and Systems (MCSS)》2007,19(3):207-234
In this work, we establish a framework for balanced realization of linear systems subject to regime switching modulated by a continuous-time Markov chain with a finite state space. First, a definition of balanced realization is given. Then a ρ-balanced realization is developed to approximate the system of balancing equations, which is a system of time-varying algebraic equations. When the state space of the Markov chain is large, the computational effort becomes a real concern. To resolve this problem, we introduce a two-time-scale formulation and use decomposition/aggregation and averaging techniques to reduce the computational complexity. Based on the two-time-scale formulation, further approximation procedures are developed. Numerical examples are also presented for demonstration.
2.
In this paper, the optimal maintenance policy for a multi-state system with no observation is considered. Unlike most existing works, only a limited number of imperfect preventive maintenance actions can be performed between two successive replacements. It is assumed that the system's deterioration state cannot be observed during its operation except after each replacement, and that it evolves as a discrete-time Markov chain with a finite state space. After choosing the information state as the state variable, the problem is formulated as a Markov decision process over the infinite time horizon. To increase computational efficiency, several key structural properties are developed by minimising the total expected cost per unit time. The existence of an optimal threshold-type maintenance policy is proved and the monotonicity of the threshold is obtained. Finally, a numerical example is given to illustrate the optimal policy.
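As an illustrative aside (not taken from the paper, which works with information states and the average-cost criterion), the dynamic-programming core of such a finite MDP can be sketched with standard value iteration for a discounted model; the two-state maintenance example in the test is invented:

```python
import numpy as np

def value_iteration(P, c, beta=0.95, tol=1e-8):
    """Solve a finite-state, finite-action discounted MDP by value iteration.

    P[a] is the transition matrix under action a; c[a] the per-state cost
    vector. Returns the optimal value function and a greedy policy.
    """
    n = P[0].shape[0]
    V = np.zeros(n)
    while True:
        # Q[a, s] = immediate cost + discounted expected future cost
        Q = np.array([c[a] + beta * P[a] @ V for a in range(len(P))])
        V_new = Q.min(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmin(axis=0)
        V = V_new
```

For a toy two-state system (good/bad) with a "do nothing" and a "replace" action, the routine recovers a threshold-like policy that replaces only in the bad state.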
3.
Chris M. Strickland Ian W. Turner Robert Denham Kerrie L. Mengersen 《Computational statistics & data analysis》2009,53(12):4116-4125
A Bayesian Markov chain Monte Carlo methodology is developed for the estimation of multivariate linear Gaussian state space models. In particular, an efficient simulation smoothing algorithm is proposed that makes use of the univariate representation of the state space model. Substantial gains in computational efficiency over existing algorithms are achieved using the new simulation smoother for the analysis of high-dimensional multivariate time series. The methodology is used to analyse a multivariate time series dataset of the Normalised Difference Vegetation Index (NDVI), a proxy for the level of live vegetation, for a grazing property located in Queensland, Australia.
4.
In this paper we present a new algorithm for the approximate transient analysis of large stochastic models. The algorithm is based on the self-correcting analysis principle for continuous-time Markov chains (CTMCs). The approach uses different time-dependent aggregations of the CTMC of a stochastic model. Using uniformization, the transient state probabilities of each aggregated CTMC are calculated for one time step. The derived probabilities are used to construct stronger aggregations, which in turn correct the transition rates of the previous aggregations. This proceeds step by step until the final time is reached. Strong aggregation of the original CTMC yields time- and space-efficient computation, so the approximate transient analysis method based on self-correcting aggregation can be applied to models with large state spaces. For queuing networks with phase-type distributions of the service times, the newly developed algorithm is implemented in WinPEPSY-QNS, a tool for performance evaluation and prediction of stochastic models based on queuing networks. The tool consists of a graphical editor for constructing queuing networks and an easy-to-use evaluation component that offers suitable analysis methods. The new algorithm is applied to several examples, and the results are compared with the results of simulation runs in cases where exact values cannot be obtained.
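Uniformization, which the algorithm applies to each aggregated CTMC, converts transient analysis of a generator Q into a Poisson-weighted sum of powers of a discrete-time chain. A minimal generic sketch (not the self-correcting aggregation scheme itself; the truncation level K is an assumption):

```python
import numpy as np

def transient_probs(Q, pi0, t, K=200):
    """Transient distribution of a CTMC with generator Q at time t,
    via uniformization: pi(t) = sum_k Poisson(Lam*t; k) * pi0 @ P^k."""
    Lam = max(-Q.diagonal())          # uniformization rate
    P = np.eye(Q.shape[0]) + Q / Lam  # uniformized DTMC
    w = np.exp(-Lam * t)              # Poisson weight for k = 0
    pi = w * pi0
    v = pi0.copy()
    for k in range(1, K + 1):
        v = v @ P                     # v = pi0 @ P^k
        w *= Lam * t / k              # next Poisson weight, computed iteratively
        pi += w * v
    return pi
```

For large t the result approaches the stationary distribution, which gives a convenient sanity check.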
5.
《Computers & Mathematics with Applications》2006,51(2):189-198
This paper studies a job search problem on a partially observable Markov chain, which can be considered an extension of the job search in a dynamic economy in [1]. The problem is formulated so that the state changes according to a partially observable Markov chain: the current state cannot be observed, but some information regarding the present state is available. All information about the unobservable state is summarized by probability distributions on the state space, and Bayes' theorem is employed as the learning procedure. Total positivity of order two, or simply TP2, is a fundamental property in the study of sequential decision problems, and it also plays an important role in the Bayesian learning procedure for a partially observable Markov process. Using this property, we examine relationships among the prior information, the posterior information, and the optimal policy. We also examine the probabilities of making a transition into each state after some additional transitions under the optimal policy. In a stock-market setting, suppose the states correspond to the business situation of a company and one state designates default; the problem is then when to sell the stock before bankruptcy, and the probability of bankruptcy is also examined.
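The Bayesian learning procedure referred to here, summarizing all information about the hidden state by a distribution and updating it after each observation, has a compact generic form. An illustrative sketch (the transition and observation matrices in the test are invented, not from the paper):

```python
import numpy as np

def belief_update(b, P, O, y):
    """One Bayes step for a partially observable Markov chain.

    b: current belief over hidden states; P: transition matrix;
    O[s, y]: probability of observing signal y in hidden state s.
    """
    pred = b @ P               # predict the next hidden state
    post = pred * O[:, y]      # weight by the observation likelihood
    return post / post.sum()   # normalize (Bayes' theorem)
```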
6.
For the optimal control of linear hybrid switching systems with a diffusion term, a Monte Carlo statistical prediction method is proposed to reduce the computational complexity of the optimization. First, the continuous-time optimal control problem is converted into a discrete-time Markov decision process by numerical solution techniques. Then, within several finite state subspaces, the optimal control policy of each subspace is solved using a reflecting-boundary technique. Finally, based on the structural properties of the optimal control policy, a statistical prediction method is used to predict the optimal control policy over the whole state space. The method effectively reduces the computational complexity of solving optimal control problems for linear hybrid switching systems involving large state spaces and multidimensional variables; simulation results at the end of the paper verify its effectiveness.
7.
Yat-wah Wan 《Automatica》2006,42(3):393-403
The solution of Markov Decision Processes (MDPs) often relies on special properties of the processes. For two-level MDPs, the difference in the rates of state changes of the upper and lower levels has led to limiting or approximate solutions of such problems. In this paper, we solve a two-level MDP without making any assumption on the rates of state changes of the two levels. We first show that such a two-level MDP is a non-standard one in which the optimal actions of different states can be related to each other. We then give assumptions (conditions) under which such a specially constrained MDP can be solved by policy iteration. We further show that the computational effort can be reduced by decomposing the MDP: a two-level MDP with M upper-level states can be decomposed into one MDP for the upper level and between M and M(M-1) MDPs for the lower level, depending on the structure of the two-level MDP. The upper-level MDP is solved by time aggregation, a technique introduced in [Cao, X.-R., Ren, Z. Y., Bhatnagar, S., Fu, M., & Marcus, S. (2002). A time aggregation approach to Markov decision processes. Automatica, 38(6), 929-943], and the lower-level MDPs are solved by embedded Markov chains.
8.
This paper studies maintenance policies for multi-component systems with failure interaction among their components: the failure of one component may accelerate the deterioration processes, or induce instantaneous failures, of the remaining components. We formulate this maintenance problem as a Markov decision process (MDP) with the objective of minimising the total discounted maintenance cost. However, the action set and state space of the MDP grow exponentially with the number of components, which makes traditional approaches computationally intractable. To deal with this curse of dimensionality, a modified iterative aggregation procedure (MIAP) is proposed. We prove mathematically that the iterations of MIAP converge and that the policy obtained is optimal. Numerical case studies show that failure interaction should not be ignored in maintenance policy decision making, and that the proposed MIAP is faster and requires less memory than linear programming.
9.
State-based analysis of Markovian models faces the problem of state space explosion. To handle huge state spaces, compositional modeling and aggregation of components are often used. Exact aggregation, yielding exact transient or stationary results, is possible only in some cases, namely when the Markov process is lumpable; therefore approximate aggregation is often applied to reduce the state space. Several approximate aggregation methods exist, usually based on heuristics. This paper presents a new aggregation approach for Markovian components which computes aggregates that minimize an algebraically defined function describing the difference between the component and the aggregate. If the difference becomes zero, the aggregation is exact, meaning that component and aggregate are indistinguishable in the sense that transient and stationary results in any environment are identical. For the computation of aggregates, an alternating least squares approach is presented which minimizes the norm-wise difference between the original component and the aggregate. Algorithms to compute aggregates are introduced, and the quality of the approximation is evaluated by means of several examples.
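A common baseline that such approximation methods refine is to collapse a transition matrix over a state partition, weighting the states inside each block (for instance by the stationary distribution). A generic sketch of this baseline (not the paper's alternating-least-squares method; the weighting choice is an assumption):

```python
import numpy as np

def aggregate_chain(P, partition, pi):
    """Aggregate a Markov transition matrix P over a state partition,
    weighting the states inside each block by the distribution pi."""
    k = len(partition)
    Pa = np.zeros((k, k))
    for i, Bi in enumerate(partition):
        wi = pi[Bi] / pi[Bi].sum()            # weights within block Bi
        for j, Bj in enumerate(partition):
            # aggregated transition probability from block Bi to block Bj
            Pa[i, j] = wi @ P[np.ix_(Bi, Bj)].sum(axis=1)
    return Pa
```

When the chain happens to be lumpable with respect to the partition, the aggregate is exact regardless of the weights.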
10.
Annalisa Cesaro Dario Pacciarelli 《Computers & Industrial Engineering》2011,61(1):150-160
This study compares approximation techniques for estimating the operational availability of a corrective maintenance system. The assessment is based on the practical maintenance of safety equipment in operation at 38 Italian airports. A single-echelon, one-for-one ordering policy with complete pooling is analyzed, with a deterministic rule for lateral transshipments. Under this policy, the state probabilities of the associated Markov model cannot be expressed in product form. Since exact computation of the state probabilities becomes impractical as the number of states in the Markov chain increases, this study describes three approximation techniques and assesses their performance in terms of computational effort, memory requirement, and error with respect to the exact value. The first two techniques are based on a method by Alfredsson and Verrijdt and on the Equivalent Random Traffic method, respectively; the idea of both is to approximate the state probabilities with a product form so that the Markov chain can be decomposed. The third technique is based on the multi-dimensional scaling-down approach, which studies an equivalent reduced Markov chain rather than decomposing the original one.
11.
12.
This paper studies an optimal Markov control system, i.e., a system whose statistical law is described by a Markov chain depending on a sequence of decisions; we call such a decision sequence a policy. There exists a target state with the following property: once the system reaches this state, the state never changes again. Our aim is to choose a policy that maximizes, for every initial state, the probability of eventually reaching the target state. We first propose a policy iteration method for finding an optimal policy within the set of stationary policies, and then prove that this policy is also optimal over the set of all policies, stationary or not.
13.
Antonio Pietrabissa 《International journal of control》2013,86(10):1814-1827
This article presents a connection admission control (CAC) algorithm for UMTS networks based on the Markov decision process (MDP) approach. To deal with the non-stationary environment due to the time-varying statistical characteristics of the offered traffic, the admission policy has to be recomputed periodically from on-line measurements, and computing the optimal policy is too time-consuming to be performed on-line. This article therefore proposes a reduction of the policy space, coupled with an aggregation of the state space, for the fast computation of a sub-optimal admission policy. Theoretical results and numerical simulations show the effectiveness of the proposed approach.
14.
Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations
Dimitri P. Bertsekas 《IEEE/CAA Journal of Automatica Sinica》2019,6(1):1-31
In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration, in which the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.
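The construction of the smaller "aggregate" problem can be illustrated with hard aggregation, its simplest special case: a feature map assigns each original state to one aggregate state, and aggregation/disaggregation matrices transform the transition and cost data. An illustrative sketch with uniform disaggregation weights (an assumption; the paper allows general weights and feature constructions):

```python
import numpy as np

def aggregate_mdp(P, c, phi, k):
    """Build an aggregate MDP by hard aggregation.

    phi maps each original state to one of k aggregate states; original
    states in a block receive uniform disaggregation weights here.
    """
    n, A = P[0].shape[0], len(P)
    D = np.zeros((k, n))                 # disaggregation weights
    for s in range(n):
        D[phi[s], s] = 1.0
    D /= D.sum(axis=1, keepdims=True)    # uniform within each block
    Phi = np.zeros((n, k))               # aggregation (membership) matrix
    for s in range(n):
        Phi[s, phi[s]] = 1.0
    Pa = [D @ P[a] @ Phi for a in range(A)]   # aggregate transitions
    ca = [D @ c[a] for a in range(A)]         # aggregate costs
    return Pa, ca
```

Solving the aggregate problem gives values V_agg, lifted back to the original problem as V(s) ≈ V_agg[phi[s]].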
15.
A stochastic games framework for verification and control of discrete time stochastic hybrid systems
Jerry Ding Maryam Kamgarpour Sean Summers Alessandro Abate John Lygeros Claire Tomlin 《Automatica》2013
We describe a framework for analyzing probabilistic reachability and safety problems for discrete time stochastic hybrid systems within a dynamic games setting. In particular, we consider finite horizon zero-sum stochastic games in which a control has the objective of reaching a target set while avoiding an unsafe set in the hybrid state space, and a rational adversary has the opposing objective. We derive an algorithm for computing the maximal probability of achieving the control objective, subject to the worst-case adversary behavior. From this algorithm, sufficient conditions of optimality are also derived for the synthesis of optimal control policies and worst-case disturbance strategies. These results are then specialized to the safety problem, in which the control objective is to remain within a safe set. We illustrate our modeling framework and computational approach using both a tutorial example with jump Markov dynamics and a practical application in the domain of air traffic management.
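For the safety specialization described above, the max-min dynamic programming recursion has a compact form once state and action sets are finite. A sketch for that finite case (illustrative only; the paper treats general hybrid state spaces, and the test model is invented):

```python
import numpy as np

def safety_value(P, safe, N):
    """Backward recursion for the max-min probability of remaining in a
    safe set over N steps. P[u][d] is the transition matrix when the
    controller plays u and the adversary plays d; safe is a 0/1 vector."""
    V = safe.astype(float)                  # indicator of the safe set
    for _ in range(N):
        # Q[u, d, s] = expected next-step safe value under (u, d)
        Q = np.array([[P[u][d] @ (safe * V) for d in range(len(P[0]))]
                      for u in range(len(P))])
        V = Q.min(axis=1).max(axis=0)       # worst-case adversary, best control
    return safe * V
```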
16.
17.
A primary challenge of agent-based policy learning in complex and uncertain environments is the escalating computational complexity with the size of the task space (action choices and world states) and the number of agents. Nonetheless, there is ample evidence in the natural world that high-functioning social mammals learn to solve complex problems with ease, both individually and cooperatively. This ability to solve computationally intractable problems stems from brain circuits for hierarchical representation of state and action spaces and learned policies, as well as from constraints imposed by social cognition. Using biologically derived mechanisms for state representation and mammalian social intelligence, we constrain state-action choices in reinforcement learning in order to improve learning efficiency. Analysis results bound the reduction in computational complexity due to state abstraction, hierarchical representation, and socially constrained action selection in agent-based learning problems that can be described as variants of Markov decision processes. Investigation of two task domains, single-robot herding and multi-robot foraging, shows that the theoretical bounds hold and that acceptable policies emerge which reduce task completion time, computational cost, and/or memory resources compared to learning without hierarchical representations and with no social knowledge.
18.
Zobel C.W. Scherer W.T. 《IEEE transactions on systems, man, and cybernetics. Part A, Systems and humans : a publication of the IEEE Systems, Man, and Cybernetics Society》2001,31(6):609-622
This paper presents a new problem-solving approach, termed simulation-based policy generation (SPG), that is able to generate solutions to problems that may otherwise be computationally intractable. The SPG method uses a simulation of the original problem to create an approximating Markov decision process (MDP) model which is then solved via traditional MDP solution approaches. Since this approximating MDP is a fairly rich and robust sequential optimization model, solution policies can be created which represent an intelligent and structured search of the policy space. An important feature of the SPG approach is its adaptive nature, in that it uses the original simulation model to generate improved aggregation schemes, allowing the approach to be applied in situations where the underlying problem structure is largely unknown. In order to illustrate the performance of the SPG methodology, we apply it to a common but computationally complex problem of inventory control, and we briefly discuss its application to a large-scale telephone network routing problem.
19.
We consider average reward Markov decision processes with discrete time parameter and denumerable state space. We are concerned with the following problem: Find necessary and sufficient conditions so that, for arbitrary bounded reward function, the corresponding average reward optimality equation has a bounded solution. This problem is solved for a class of systems including the case in which, under the action of any stationary policy, the state space is an irreducible positive recurrent class.
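For finite (rather than denumerable) state spaces, bounded solutions of the average reward optimality equation can be approximated numerically by relative value iteration, which subtracts the value at a reference state to keep the iterates bounded. An illustrative sketch (the single-action test chain is invented; iteration count is a fixed assumption rather than a convergence test):

```python
import numpy as np

def relative_value_iteration(P, r, iters=500):
    """Relative value iteration for a finite average-reward MDP.

    P[a] is the transition matrix under action a; r[a] the per-state
    reward. Returns the gain estimate and relative value function.
    """
    n = P[0].shape[0]
    V = np.zeros(n)
    for _ in range(iters):
        # One application of the average-reward optimality operator
        V = np.array([r[a] + P[a] @ V for a in range(len(P))]).max(axis=0)
        g = V[0]          # gain estimate at the reference state 0
        V = V - g         # re-center so the iterates stay bounded
    return g, V
```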
20.
This paper is concerned with filtering of hidden Markov processes (HMP) which possess (or approximately possess) the property of lumpability. This property is a generalization of the lumpability of a Markov chain, which has previously been addressed by others. In essence, lumpability means that there is a partition of the (atomic) states of the Markov chain into aggregated sets which act in a similar manner as far as the state dynamics and observation statistics are concerned. We prove necessary and sufficient conditions on the HMP for exact lumpability to hold. For a particular class of hidden Markov models (HMM), namely finite output alphabet models, conditions for lumpability of all HMP representable by a specified HMM are given. The corresponding optimal filter algorithms for the aggregated states are then derived. The paper also describes an approach to efficient suboptimal filtering for HMP which are approximately lumpable, meaning that the HMM generating the process may be approximated by a lumpable HMM. This approach involves directly finding a lumped HMM which approximates the original HMM well, in a matrix norm sense. An alternative approach for model reduction, based on approximating a given HMM by an exactly lumpable HMM, is also derived; this method is based on the alternating convex projections algorithm. Some simulation examples are presented which illustrate the performance of the suboptimal filtering algorithms.
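Strong lumpability of the underlying Markov chain, the special case this paper generalizes, can be checked directly from the transition matrix: every state in a block must carry the same total transition probability into each block. A minimal numeric check (illustrative; the paper's HMP lumpability conditions also involve the observation statistics):

```python
import numpy as np

def is_lumpable(P, partition, tol=1e-10):
    """Check strong lumpability of transition matrix P with respect to a
    partition of the state indices."""
    for Bj in partition:
        col = P[:, Bj].sum(axis=1)      # probability of jumping into block Bj
        for Bi in partition:
            if np.ptp(col[Bi]) > tol:   # all states in Bi must agree
                return False
    return True
```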