Similar Documents
20 similar documents found (search time: 31 ms).
1.
In this work, we establish a framework for balanced realization of linear systems subject to regime switching modulated by a continuous-time Markov chain with a finite state space. First, a definition of balanced realization is given. Then a ρ-balanced realization is developed to approximate the system of balancing equations, which is a system of time-varying algebraic equations. When the state space of the Markov chain is large, the computational effort becomes a real concern. To resolve this problem, we introduce a two-time-scale formulation and use decomposition/aggregation and averaging techniques to reduce the computational complexity. Based on the two-time-scale formulation, further approximation procedures are developed. Numerical examples are also presented for demonstration.

2.
In this paper, the optimal maintenance policy for a multi-state system with no observation is considered. Different from most existing works, only a limited number of imperfect preventive maintenance actions can be performed between two successive replacements. It is assumed that the system's deterioration state cannot be observed during operation except after each replacement, and that it evolves as a discrete-time Markov chain with a finite state space. After choosing the information state as the state variable, the problem is formulated as a Markov decision process over the infinite time horizon. To increase computational efficiency, several key structural properties are developed by minimising the total expected cost per unit time. The existence of an optimal threshold-type maintenance policy is proved and the monotonicity of the threshold is obtained. Finally, a numerical example is given to illustrate the optimal policy.
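As an informal illustration of the information-state formulation described above, the sketch below runs value iteration on a coarsely discretised belief space for a small hypothetical maintenance problem with three hidden deterioration states and three actions (do nothing, imperfect preventive maintenance, replace). The transition matrices, costs, and grid resolution are illustrative assumptions, not the paper's model.

```python
# A minimal sketch (not the paper's code): value iteration over the
# information state (belief over unobservable deterioration states)
# for a hypothetical maintenance MDP with no in-service observations.
import itertools
import numpy as np

S = 3                                   # hidden states: good, degraded, bad
P_idle = np.array([[0.8, 0.15, 0.05],   # natural deterioration
                   [0.0, 0.70, 0.30],
                   [0.0, 0.00, 1.00]])
P_pm = np.array([[1.0, 0.0, 0.0],       # imperfect PM pushes toward better states
                 [0.6, 0.4, 0.0],
                 [0.0, 0.7, 0.3]])
c_run = np.array([0.0, 2.0, 8.0])       # running cost per period in each hidden state
c_pm, c_rep, gamma = 1.5, 10.0, 0.95

# Discretise the belief simplex on a coarse grid (entries sum to 1, step 0.1).
grid = np.array([np.array(b) / 10
                 for b in itertools.product(range(11), repeat=S) if sum(b) == 10])

def nearest(b):
    """Index of the grid point closest to belief b."""
    return int(np.argmin(np.linalg.norm(grid - b, axis=1)))

V = np.zeros(len(grid))
for _ in range(200):                    # value iteration on the discretised belief space
    V_new = np.empty_like(V)
    for i, b in enumerate(grid):
        q_idle = b @ c_run + gamma * V[nearest(b @ P_idle)]
        q_pm = c_pm + b @ c_run + gamma * V[nearest(b @ P_pm)]
        q_rep = c_rep + gamma * V[nearest(np.array([1.0, 0.0, 0.0]))]  # replacement resets the belief
        V_new[i] = min(q_idle, q_pm, q_rep)
    V = V_new
```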

3.
A Bayesian Markov chain Monte Carlo methodology is developed for the estimation of multivariate linear Gaussian state space models. In particular, an efficient simulation smoothing algorithm is proposed that makes use of the univariate representation of the state space model. Substantial gains over existing algorithms in computational efficiency are achieved using the new simulation smoother for the analysis of high-dimensional multivariate time series. The methodology is used to analyse a multivariate time series dataset of the Normalised Difference Vegetation Index (NDVI), which is a proxy for the level of live vegetation, for a particular grazing property located in Queensland, Australia.

4.
In this paper we present a new algorithm for the approximate transient analysis of large stochastic models. The algorithm is based on the self-correcting analysis principle for continuous-time Markov chains (CTMCs). The approach uses different time-dependent aggregations of the CTMC of a stochastic model. Using uniformization, the transient state probabilities of each aggregated CTMC are calculated for one time step. The resulting probabilities are used to construct stronger aggregations, which in turn correct the transition rates of the previous aggregations; this is repeated step by step until the final time is reached. Coarse aggregation of the original CTMC keeps the time and space requirements low, so the approximate transient analysis method based on self-correcting aggregation can be applied to models with large state spaces. For queuing networks with phase-type service time distributions, the new algorithm is implemented in WinPEPSY-QNS, a tool for performance evaluation and prediction of stochastic models based on queuing networks. It consists of a graphical editor for the construction of queuing networks and an easy-to-use evaluation component, which offers suitable analysis methods. The algorithm is applied to several examples, and the results are compared with simulation results in cases where exact values cannot be obtained.
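The uniformization step that the abstract relies on can be sketched as follows; the generator matrix, initial distribution, and time horizon are illustrative, and the truncation of the Poisson sum is deliberately crude.

```python
# A minimal sketch of uniformization for the transient probabilities of a CTMC:
# p(t) = sum_k e^{-Lambda t} (Lambda t)^k / k! * p(0) P^k,  with P = I + Q/Lambda.
import numpy as np

def transient_probs(Q, p0, t, eps=1e-10):
    """Transient state probabilities at time t of a CTMC with generator Q."""
    Lambda = max(-np.diag(Q)) * 1.05          # uniformization rate >= max exit rate
    P = np.eye(Q.shape[0]) + Q / Lambda       # DTMC of the uniformized chain
    term = np.exp(-Lambda * t)                # Poisson weight for k = 0
    result = term * p0
    v = p0.copy()
    k = 0
    while term > eps or k < Lambda * t:       # crude truncation of the Poisson sum
        k += 1
        v = v @ P
        term *= Lambda * t / k
        result += term * v
    return result

Q = np.array([[-2.0, 2.0, 0.0],               # an illustrative 3-state generator
              [1.0, -3.0, 2.0],
              [0.0, 4.0, -4.0]])
print(transient_probs(Q, np.array([1.0, 0.0, 0.0]), t=0.5))
```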

5.
This paper studies a job search problem on a partially observable Markov chain, which can be considered an extension of the job search in a dynamic economy in [1]. The state changes according to a partially observable Markov chain: the current state cannot be observed, but some information about the present state is available. All information about the unobservable state is summarized by a probability distribution on the state space, and Bayes' theorem is employed as the learning procedure. Total positivity of order two (TP2) is a fundamental property for investigating sequential decision problems, and it also plays an important role in the Bayesian learning procedure for a partially observable Markov process. Using this property, we examine relationships among the prior information, the posterior information, and the optimal policy. We also examine the probability of moving into each state after some additional transitions under the optimal policy. In a stock-market setting, suppose the states correspond to the business situation of a company and one state designates default; the problem is then when to sell the stock before bankruptcy, and the probability of bankruptcy is also examined.
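A minimal sketch of the Bayesian learning procedure mentioned above, i.e. propagating the belief through the chain and correcting it with Bayes' theorem; the transition matrix, observation matrix, and observation sequence below are hypothetical.

```python
# Belief update for a partially observable Markov chain via Bayes' theorem.
import numpy as np

P = np.array([[0.9, 0.1, 0.0],    # hidden-state transitions (e.g. business condition)
              [0.0, 0.8, 0.2],
              [0.0, 0.0, 1.0]])
O = np.array([[0.7, 0.2, 0.1],    # O[s, y] = Pr(observation y | hidden state s)
              [0.2, 0.6, 0.2],
              [0.1, 0.2, 0.7]])

def bayes_update(belief, obs):
    """One step of prediction (through P) and correction (Bayes' theorem)."""
    predicted = belief @ P                    # prior for the next period
    unnormalised = predicted * O[:, obs]      # multiply by the observation likelihood
    return unnormalised / unnormalised.sum()  # posterior belief

belief = np.array([1.0, 0.0, 0.0])
for y in [0, 1, 1, 2]:                        # a hypothetical observation sequence
    belief = bayes_update(belief, y)
print(belief)
```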

6.
Song Chunyue, WANG Hui, Li Ping. Acta Automatica Sinica (自动化学报), 2008, 34(8): 1028-1032
For the optimal control problem of linear hybrid switching systems with diffusion terms, a Monte Carlo statistical prediction method is proposed to reduce the computational complexity of the optimization. First, the continuous-time optimal control problem is converted into a discrete-time Markov decision process by numerical discretisation; then, on several finite state subspaces, a reflecting-boundary technique is used to solve for the optimal control policy of each subspace; finally, based on the structural properties of the optimal policy, statistical prediction is used to predict the optimal control policy over the whole state space. The method effectively reduces the computational complexity of solving optimal control for linear hybrid switching systems with large state spaces and multidimensional variables, and the simulation results at the end of the paper verify its effectiveness.

7.
The solution of Markov Decision Processes (MDPs) often relies on special properties of the processes. For two-level MDPs, the difference in the rates of state changes of the upper and lower levels has led to limiting or approximate solutions of such problems. In this paper, we solve a two-level MDP without making any assumption on the rates of state changes of the two levels. We first show that such a two-level MDP is a non-standard one where the optimal actions of different states can be related to each other. Then we give assumptions (conditions) under which such a specially constrained MDP can be solved by policy iteration. We further show that the computational effort can be reduced by decomposing the MDP. A two-level MDP with M upper-level states can be decomposed into one MDP for the upper level and M to M(M-1) MDPs for the lower level, depending on the structure of the two-level MDP. The upper-level MDP is solved by time aggregation, a technique introduced in a recent paper [Cao, X.-R., Ren, Z. Y., Bhatnagar, S., Fu, M., & Marcus, S. (2002). A time aggregation approach to Markov decision processes. Automatica, 38(6), 929-943.], and the lower-level MDPs are solved by embedded Markov chains.
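For reference, the sketch below shows plain policy iteration on a small discounted MDP, the basic routine that the paper adapts to its specially constrained two-level setting; the randomly generated transition and reward data are purely illustrative.

```python
# Standard policy iteration on a small discounted MDP (illustrative data).
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(0, 1, size=(n_states, n_actions))                 # R[s, a]

policy = np.zeros(n_states, dtype=int)
while True:
    # policy evaluation: solve (I - gamma * P_pi) V = R_pi
    P_pi = P[np.arange(n_states), policy]
    R_pi = R[np.arange(n_states), policy]
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
    # policy improvement: act greedily with respect to the Q-values
    Q = R + gamma * np.einsum('sax,x->sa', P, V)
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
print(policy, V)
```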

8.
This paper studies maintenance policies for multi-component systems which have failure interaction among their components. Component failure might accelerate the deterioration processes or induce instantaneous failures of the remaining components. We formulate this maintenance problem as a Markov decision process (MDP) with the objective of minimising the total discounted maintenance cost. However, the action set and state space of the MDP grow exponentially as the number of components increases, which makes traditional approaches computationally intractable. To deal with this curse of dimensionality, a modified iterative aggregation procedure (MIAP) is proposed. We mathematically prove that the iterations in MIAP converge and that the policy obtained is optimal. Numerical case studies find that failure interaction should not be ignored in maintenance policy decision making, and that the proposed MIAP is faster and requires less memory than linear programming.

9.
State-based analysis of Markovian models is faced with the problem of state space explosion. To handle huge state spaces, compositional modeling and aggregation of components are often used. Exact aggregation resulting in exact transient or stationary results is only possible in some cases, namely when the Markov process is lumpable. Therefore approximate aggregation is often applied to reduce the state space. Several approximate aggregation methods exist, which are usually based on heuristics. This paper presents a new aggregation approach for Markovian components which computes aggregates that minimize an algebraically defined function describing the difference between the component and the aggregate. If the difference becomes zero, aggregation is exact, which means that component and aggregate are indistinguishable in the sense that transient and stationary results in any environment are identical. For the computation of aggregates, an alternating least squares approach is presented which tries to minimize the norm-wise difference between the original component and the aggregate. Algorithms to compute aggregates are also introduced and the quality of the approximation is evaluated by means of several examples.

10.
This study compares approximation techniques for the estimation of the operational availability of a corrective maintenance system. The assessment is based on the practical maintenance of safety equipment in operation at 38 Italian airports. A single-echelon one-for-one ordering policy with complete pooling is analyzed, with a deterministic rule for lateral transshipments. With this policy, the state probabilities of the associated Markov model cannot be expressed in product form. Since the exact computation of the state probabilities is not practical as the number of states in the Markov chain increases, this study describes three approximation techniques and assesses their performance in terms of computational effort, memory requirement, and error with respect to the exact value. The first two techniques are based on a method by Alfredsson and Verrijdt and on the Equivalent Random Traffic method, respectively. The idea of both methods is to approximate the state probabilities with a product form, so that the Markov chain can be decomposed. The third technique is based on the multi-dimensional scaling-down approach, which studies an equivalent reduced Markov chain rather than decomposing the original one.

11.
For a class of resource-location services for distributed VOD systems that combine single-step k-random-walk with a global central index, the location process is described by a model based on Markov processes, and events carrying additional information are introduced so that control depends not only on the state but also on the event. On this basis, a policy optimization method that handles events directly is given, together with the related issues of state aggregation. Finally, the proposed model and method are validated by simulation of examples.

12.
吳滄浦 (Wu Cangpu). Acta Automatica Sinica (自动化学报), 1964, 2(3): 146-154
This paper studies an optimal Markov control system, described by a Markov chain whose statistical law depends on a sequence of decisions; the decision sequence is called a policy. There exists a goal state with the following property: once the system reaches it, the state no longer changes. Our aim is to choose a policy that maximises, from every initial state, the probability of eventually reaching this goal state. We first propose a policy iteration method for finding the optimal policy within the set of stationary policies, and then prove that this policy is also optimal over the set of policies containing both stationary and non-stationary policies.
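A small illustration of the objective described above (maximising the probability of eventually reaching an absorbing goal state); for brevity the sketch uses value iteration rather than the paper's iteration over stationary decisions, and the four-state model is invented.

```python
# Maximal probability of eventually reaching an absorbing goal state,
# computed by iterating V(s) <- max_a sum_s' P[s, a, s'] V(s') with V(goal) = 1.
import numpy as np

# states: 0, 1 ordinary; 2 = goal (absorbing); 3 = dead end (absorbing)
P = np.zeros((4, 2, 4))
P[0, 0] = [0.5, 0.3, 0.1, 0.1]
P[0, 1] = [0.2, 0.2, 0.4, 0.2]
P[1, 0] = [0.0, 0.5, 0.5, 0.0]
P[1, 1] = [0.3, 0.3, 0.0, 0.4]
P[2, :, 2] = 1.0                     # goal stays goal
P[3, :, 3] = 1.0                     # dead end stays dead end

V = np.zeros(4)
V[2] = 1.0
for _ in range(1000):
    V_new = np.einsum('sax,x->sa', P, V).max(axis=1)
    V_new[2], V_new[3] = 1.0, 0.0    # keep boundary values fixed
    if np.max(np.abs(V_new - V)) < 1e-12:
        break
    V = V_new
policy = np.einsum('sax,x->sa', P, V).argmax(axis=1)
print(V, policy)                     # reach probabilities and a maximising stationary policy
```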

13.
This article presents a connection admission control (CAC) algorithm for UMTS networks based on the Markov decision process (MDP) approach. To deal with the non-stationary environment due to the time-varying statistical characteristics of the offered traffic, the admission policy has to be recomputed periodically based on on-line measurements, and computing the optimal policy is too time-consuming to be performed on-line. Thus, this article proposes a reduction of the policy space coupled with an aggregation of the state space for the fast computation of a sub-optimal admission policy. Theoretical results and numerical simulations show the effectiveness of the proposed approach.

14.
In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.
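A toy sketch of hard feature-based aggregation in the spirit of the abstract: states sharing a feature value are merged, the small aggregate MDP is solved, and its values are lifted back as a piecewise-constant approximation. The data, the hard feature map, and the uniform disaggregation weights are all assumptions for illustration.

```python
# Hard feature-based aggregation of a small MDP (illustrative data).
import numpy as np

n, m, A, gamma = 6, 2, 2, 0.9
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n), size=(n, A))      # original P[s, a, s']
R = rng.uniform(0, 1, size=(n, A))              # original R[s, a]
phi = np.array([0, 0, 0, 1, 1, 1])              # feature (aggregate state) of each state

# Aggregate dynamics: average over the states mapped to each aggregate state.
Pa = np.zeros((m, A, m))
Ra = np.zeros((m, A))
for x in range(m):
    members = np.flatnonzero(phi == x)
    w = np.full(len(members), 1.0 / len(members))         # uniform disaggregation weights
    Ra[x] = w @ R[members]
    for y in range(m):
        Pa[x, :, y] = w @ P[members][:, :, phi == y].sum(axis=2)

# Solve the small aggregate MDP by value iteration, then lift the values back.
Va = np.zeros(m)
for _ in range(500):
    Va = (Ra + gamma * np.einsum('xay,y->xa', Pa, Va)).max(axis=1)
V_approx = Va[phi]                               # piecewise-constant approximation on original states
print(V_approx)
```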

15.
We describe a framework for analyzing probabilistic reachability and safety problems for discrete time stochastic hybrid systems within a dynamic games setting. In particular, we consider finite horizon zero-sum stochastic games in which a control has the objective of reaching a target set while avoiding an unsafe set in the hybrid state space, and a rational adversary has the opposing objective. We derive an algorithm for computing the maximal probability of achieving the control objective, subject to the worst-case adversary behavior. From this algorithm, sufficient conditions of optimality are also derived for the synthesis of optimal control policies and worst-case disturbance strategies. These results are then specialized to the safety problem, in which the control objective is to remain within a safe set. We illustrate our modeling framework and computational approach using both a tutorial example with jump Markov dynamics and a practical application in the domain of air traffic management.
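The finite-horizon max-min recursion underlying this formulation can be sketched on a toy finite abstraction as follows; the transition kernel, target set, and unsafe set are invented, and the controller maximises against a minimising adversary at each step.

```python
# Finite-horizon max-min reach-avoid recursion on a toy finite model:
# the controller maximises and the adversary minimises the probability of
# hitting the target set while staying outside the unsafe set beforehand.
import numpy as np

nS, nU, nD, N = 5, 2, 2, 10
rng = np.random.default_rng(2)
P = rng.dirichlet(np.ones(nS), size=(nS, nU, nD))   # P[x, u, d, x']
target = np.array([0, 0, 0, 0, 1], dtype=float)     # indicator of the target set
unsafe = np.array([0, 0, 0, 1, 0], dtype=float)     # indicator of the unsafe set
safe_not_target = (1 - target) * (1 - unsafe)

V = target.copy()                                   # terminal condition V_N(x) = 1_target(x)
for _ in range(N):
    EV = np.einsum('xudy,y->xud', P, V)             # expected value of the next step
    V = target + safe_not_target * EV.min(axis=2).max(axis=1)   # max over u of min over d
print(V)                                            # max-min reach-avoid probability over N steps
```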

16.
A simulation-based optimization algorithm for CTMDPs under randomized stationary policies
Based on Markov performance potential theory and neuro-dynamic programming (NDP), this paper studies simulation-based optimization of a class of continuous-time Markov decision processes (MDPs) under randomized stationary policies. The proposed algorithm converts the continuous-time process into its uniformized Markov chain and then estimates, from a single sample path, the gradient of the average-cost performance measure with respect to the policy parameters in order to search for a suboptimal policy. The method is suited to performance optimization of systems with large state spaces. A numerical example of a controlled Markov process is given.
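A minimal sketch of the performance-potential relation this line of work builds on: for a parameterised transition matrix P(θ) with cost vector f, the average cost is η = πf, the potential g solves the Poisson-type equation (I − P + eπ)g = f, and dη/dθ = π (dP/dθ) g. The toy chain and policy parameterisation below are illustrative assumptions; the paper itself estimates the gradient from a single sample path of the uniformized chain rather than solving these equations exactly.

```python
# Policy gradient of the average cost via Markov performance potentials (toy example).
import numpy as np

def stationary(P):
    """Stationary distribution of an irreducible stochastic matrix P."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1); b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def P_of(theta):                       # a hypothetical randomised stationary policy
    return np.array([[1 - theta, theta, 0.0],
                     [0.3, 0.4, 0.3],
                     [0.2, 0.3, 0.5]])

f = np.array([1.0, 3.0, 5.0])          # cost per state
theta = 0.4
P = P_of(theta)
pi = stationary(P)
e = np.ones(3)
g = np.linalg.solve(np.eye(3) - P + np.outer(e, pi), f)    # performance potential
dP = (P_of(theta + 1e-6) - P_of(theta - 1e-6)) / 2e-6      # derivative of P w.r.t. theta
grad = pi @ dP @ g                                          # d eta / d theta

eta = lambda th: stationary(P_of(th)) @ f                   # finite-difference check
print(grad, (eta(theta + 1e-6) - eta(theta - 1e-6)) / 2e-6)
```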

17.
A primary challenge of agent-based policy learning in complex and uncertain environments is escalating computational complexity with the size of the task space (action choices and world states) and the number of agents. Nonetheless, there is ample evidence in the natural world that high-functioning social mammals learn to solve complex problems with ease, both individually and cooperatively. This ability to solve computationally intractable problems stems from brain circuits for hierarchical representation of state and action spaces and learned policies, as well as constraints imposed by social cognition. Using biologically derived mechanisms for state representation and mammalian social intelligence, we constrain state-action choices in reinforcement learning in order to improve learning efficiency. Analysis results bound the reduction in computational complexity due to state abstraction, hierarchical representation, and socially constrained action selection in agent-based learning problems that can be described as variants of Markov decision processes. Investigation of two task domains, single-robot herding and multi-robot foraging, shows that the theoretical bounds hold and that acceptable policies emerge, which reduce task completion time, computational cost, and/or memory resources compared to learning without hierarchical representations and with no social knowledge.

18.
This paper presents a new problem-solving approach, termed simulation-based policy generation (SPG), that is able to generate solutions to problems that may otherwise be computationally intractable. The SPG method uses a simulation of the original problem to create an approximating Markov decision process (MDP) model which is then solved via traditional MDP solution approaches. Since this approximating MDP is a fairly rich and robust sequential optimization model, solution policies can be created which represent an intelligent and structured search of the policy space. An important feature of the SPG approach is its adaptive nature, in that it uses the original simulation model to generate improved aggregation schemes, allowing the approach to be applied in situations where the underlying problem structure is largely unknown. In order to illustrate the performance of the SPG methodology, we apply it to a common but computationally complex problem of inventory control, and we briefly discuss its application to a large-scale telephone network routing problem.

19.
We consider average reward Markov decision processes with discrete time parameter and denumerable state space. We are concerned with the following problem: find necessary and sufficient conditions so that, for an arbitrary bounded reward function, the corresponding average reward optimality equation has a bounded solution. This problem is solved for a class of systems including the case in which, under the action of any stationary policy, the state space is an irreducible positive recurrent class.
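For context, the average reward optimality equation referred to here is usually written as follows (generic notation, not taken from the paper itself):

```latex
% Standard form of the average reward optimality equation (ACOE); the notation
% (rho, h, r, p) is generic and not taken from the paper.
\[
  \rho + h(x) \;=\; \sup_{a \in A(x)} \Big\{ r(x,a) + \sum_{y \in S} p(y \mid x, a)\, h(y) \Big\},
  \qquad x \in S,
\]
% where rho is the optimal average reward and h is a bounded bias (relative value)
% function; the paper asks when a bounded h exists for every bounded reward r.
```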

20.
This paper is concerned with filtering of hidden Markov processes (HMP) which possess (or approximately possess) the property of lumpability. This property is a generalization of the property of lumpability of a Markov chain which has been previously addressed by others. In essence, the property of lumpability means that there is a partition of the (atomic) states of the Markov chain into aggregated sets which act in a similar manner as far as the state dynamics and observation statistics are concerned. We prove necessary and sufficient conditions on the HMP for exact lumpability to hold. For a particular class of hidden Markov models (HMM), namely finite output alphabet models, conditions for lumpability of all HMP representable by a specified HMM are given. The corresponding optimal filter algorithms for the aggregated states are then derived. The paper also describes an approach to efficient suboptimal filtering for HMP which are approximately lumpable. By this we mean that the HMM generating the process may be approximated by a lumpable HMM. This approach involves directly finding a lumped HMM which approximates the original HMM well, in a matrix norm sense. An alternative approach for model reduction based on approximating a given HMM by an exactly lumpable HMM is also derived. This method is based on the alternating convex projections algorithm. Some simulation examples are presented which illustrate the performance of the suboptimal filtering algorithms.
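A small sketch of the classical strong-lumpability test for the underlying Markov chain (the HMP case treated in the paper adds conditions on the observation statistics); the transition matrix and candidate partition are illustrative.

```python
# Strong lumpability of a Markov chain: the chain is lumpable with respect to a
# partition iff, for every block B, the row sums sum_{j in B} P[i, j] are
# identical for all states i within each block of the partition.
import numpy as np

P = np.array([[0.5, 0.2, 0.3, 0.0],
              [0.3, 0.4, 0.1, 0.2],
              [0.1, 0.1, 0.4, 0.4],
              [0.1, 0.1, 0.2, 0.6]])
partition = [[0, 1], [2, 3]]          # candidate aggregated states

def is_lumpable(P, partition, tol=1e-12):
    for block_to in partition:
        col_sum = P[:, block_to].sum(axis=1)   # Pr(move into block_to | state i)
        for block_from in partition:
            if np.ptp(col_sum[block_from]) > tol:   # must be constant within block_from
                return False
    return True

print(is_lumpable(P, partition))      # True for this example
```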
