18 similar documents were found.
1.
2.
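Building on the theory of Markov performance potentials, this paper studies performance optimization algorithms for Markov control processes. Unlike traditional computation-based methods, the algorithms estimate the gradient of the performance measure with respect to the policy parameters from the simulation of a single sample path, in order to find an optimal (or suboptimal) randomized stationary policy. Because the algorithm parameters can be chosen to match the characteristics of each practical system, the algorithms can meet the needs of online optimization in a variety of engineering systems. Finally, the convergence of these algorithms with probability 1 along an infinitely long sample path is briefly analyzed, and a numerical example of a three-state controlled Markov process is given.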
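The gradient that entry 2 estimates by simulation has a closed form in terms of performance potentials: for a parameterized chain with stationary distribution pi and potentials g solving the Poisson equation (I - P + e pi) g = f, the average reward eta = pi f satisfies d eta / d theta = pi (dP/d theta) g. The Python sketch below evaluates that formula exactly on a hypothetical three-state chain and checks it against a finite difference. It is not the paper's sample-path algorithm, which would instead estimate the needed quantities along one simulated trajectory; the matrices P0 and P1, the reward vector F, and the mixing parameterization are illustrative assumptions.

```python
# Hypothetical 3-state example: in every state the controller picks action 1
# with probability theta and action 0 otherwise, so P(theta) mixes P0 and P1.
P0 = [[0.2, 0.5, 0.3], [0.6, 0.1, 0.3], [0.3, 0.3, 0.4]]
P1 = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.5, 0.1, 0.4]]
F = [1.0, 0.0, 3.0]  # reward earned in each state (assumed)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def transition(theta):
    return [[(1 - theta) * P0[i][j] + theta * P1[i][j] for j in range(3)]
            for i in range(3)]


def stationary(p):
    """pi (I - P) = 0 written column-wise, last equation replaced by sum(pi) = 1."""
    n = len(p)
    a = [[(1.0 if i == j else 0.0) - p[j][i] for j in range(n)] for i in range(n)]
    a[-1] = [1.0] * n
    return solve(a, [0.0] * (n - 1) + [1.0])


def potentials(p, f, pi):
    """Solve the Poisson equation (I - P + e pi) g = f."""
    n = len(p)
    a = [[(1.0 if i == j else 0.0) - p[i][j] + pi[j] for j in range(n)]
         for i in range(n)]
    return solve(a, f)


def avg_reward(theta):
    pi = stationary(transition(theta))
    return sum(pi[i] * F[i] for i in range(3))


def potential_gradient(theta):
    p = transition(theta)
    pi = stationary(p)
    g = potentials(p, F, pi)
    dp = [[P1[i][j] - P0[i][j] for j in range(3)] for i in range(3)]
    # d eta / d theta = pi (dP/d theta) g
    return sum(pi[i] * dp[i][j] * g[j] for i in range(3) for j in range(3))


theta, h = 0.3, 1e-6
fd = (avg_reward(theta + h) - avg_reward(theta - h)) / (2 * h)
print(potential_gradient(theta), fd)
```

The two printed numbers should agree to several digits; the finite difference needs two extra stationary solves per parameter, while the potential formula needs only the current pi and g.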
3.
4.
5.
6.
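In intelligent planning, finding a plan is an NP or even NP-complete problem. When the effects of actions are uncertain, as in planning for Markov decision processes, solving for a plan becomes harder still. Existing planning algorithms for Markov decision processes usually describe the actual effect of an action with a single aggregate state node, trying to sidestep the complexity inside a state, whereas in practice many actions produce several propositional effects that correspond to several proposition nodes. To address this, the concepts of mapping actions, mapping path nodes, and mapping planning graphs are introduced, and on that basis an ant colony planning algorithm for Markov decision processes is proposed that solves the problem. It is also proved that the solutions found by the algorithm are reliable with at least a certain probability, even in an uncertain execution environment.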
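Entry 6 builds its planner on mapping actions and mapping planning graphs, which the abstract does not define. As a much simpler stand-in, the sketch below runs a generic ant colony search over a hypothetical acyclic graph of actions, each with an assumed, independent success probability, and returns the plan with the highest overall success probability found. GRAPH, ant_colony_plan, and all parameter values are illustrative, not the paper's algorithm.

```python
import random

# Hypothetical planning graph: node -> list of (next_node, success_probability).
# Plan reliability is taken as the product of step probabilities (independence assumed).
GRAPH = {
    "s": [("a", 0.9), ("b", 0.6)],
    "a": [("g", 0.5), ("c", 0.95)],
    "b": [("g", 0.99)],
    "c": [("g", 0.9)],
}


def ant_colony_plan(graph, start, goal, n_iter=50, n_ants=10,
                    alpha=1.0, beta=2.0, rho=0.1, seed=0):
    rng = random.Random(seed)
    tau = {(u, v): 1.0 for u, edges in graph.items() for v, _ in edges}  # pheromone
    best_path, best_rel = None, 0.0
    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            node, path, rel = start, [start], 1.0
            while node != goal:
                edges = graph[node]
                # Choice weight: pheromone^alpha * (success probability)^beta.
                weights = [tau[(node, v)] ** alpha * p ** beta for v, p in edges]
                v, p = rng.choices(edges, weights=weights)[0]
                path.append(v)
                rel *= p
                node = v
            tours.append((path, rel))
            if rel > best_rel:
                best_path, best_rel = path, rel
        # Evaporate, then each ant deposits pheromone proportional to its plan's reliability.
        for edge in tau:
            tau[edge] *= 1.0 - rho
        for path, rel in tours:
            for u, v in zip(path, path[1:]):
                tau[(u, v)] += rel
    return best_path, best_rel


path, rel = ant_colony_plan(GRAPH, "s", "g")
print(path, round(rel, 4))
```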
7.
8.
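Stochastic Switching Model and Policy Optimization for Dynamic Power Management (total citations: 2; self-citations: 0; citations by others: 2)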
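A policy optimization method for dynamic power management based on continuous-time Markov decision processes is proposed. By building a stochastic switching model of the dynamic power management system, the power management problem is converted into a constrained policy optimization problem, and a policy-gradient optimization algorithm based on vector composition is given. The stochastic switching model describes the dynamic power management system accurately, and the policy optimization algorithm is simple and effective; it can be run offline and is also suited to online optimization. Simulation experiments verify the effectiveness of the method.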
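Any policy optimization for a continuous-time Markov power-management model has to evaluate candidate policies. The sketch below does that for a hypothetical three-state device (sleep, idle, busy) with Poisson request arrivals: it solves pi Q = 0 for the stationary distribution and reports average power and the fraction of requests that find the device asleep, used here as a crude latency proxy. The constraint is handled by plain grid search over the idle-to-sleep rate u, not by the paper's vector-composition policy-gradient algorithm; the rates, power values, and function names are assumptions.

```python
# Hypothetical device model: state 0 = sleep, 1 = idle, 2 = busy.
POWER = [0.1, 1.0, 2.0]  # assumed power draw in each state


def generator(lam, mu, u):
    """CTMC generator; u is the (hypothetical) policy rate of idle -> sleep."""
    return [
        [-lam, 0.0, lam],        # sleep -> busy when a request arrives
        [u, -(u + lam), lam],    # idle -> sleep at rate u, idle -> busy on arrival
        [0.0, mu, -mu],          # busy -> idle when service completes
    ]


def solve(a, b):
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def stationary_ctmc(q):
    """Solve pi Q = 0 with sum(pi) = 1."""
    n = len(q)
    a = [[q[j][i] for j in range(n)] for i in range(n)]  # Q^T pi^T = 0
    a[-1] = [1.0] * n                                     # normalization row
    return solve(a, [0.0] * (n - 1) + [1.0])


def average_power(lam, mu, u):
    pi = stationary_ctmc(generator(lam, mu, u))
    return sum(p * w for p, w in zip(pi, POWER))


def sleep_fraction(lam, mu, u):
    # Poisson arrivals see time averages, so this is the share of requests finding the device asleep.
    return stationary_ctmc(generator(lam, mu, u))[0]


def best_rate(lam, mu, max_sleep, grid):
    feasible = [u for u in grid if sleep_fraction(lam, mu, u) <= max_sleep]
    return min(feasible, key=lambda u: average_power(lam, mu, u))


grid = [i * 0.1 for i in range(51)]
u_star = best_rate(1.0, 2.0, 0.3, grid)
print(u_star, average_power(1.0, 2.0, u_star))
```

In this toy model, sleeping more always saves power, so the latency bound is what stops u from growing; that tension is the reason the paper treats power management as a constrained problem.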
9.
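Using fuzzy mathematics, this paper studies interval fuzzy stable sets of dynamic cooperative games with transferable utility. First, the structural transitions of dynamic cooperative coalitions are described by a Markov stochastic process. For the case in which the payoff function is a triangular fuzzy number, the α-cut value regions of the cooperative game are constructed at different confidence levels α, and the interval fuzzy stable sets at different time points are then computed using the state-transition matrix of the dynamic coalition. Because coalition members must divide the actual coalition payoff once cooperation ends, the constructed interval fuzzy stable sets are used to give feasible intervals for the members' payoff allocations. Finally, an example illustrates the effectiveness and feasibility of the method.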
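The α-cut used in entry 9 has a simple closed form for a triangular fuzzy number (a, b, c): the interval [a + α(b - a), c - α(c - b)]. The sketch below computes it and, as a hypothetical illustration of combining cuts with a coalition-state transition matrix, weights the cut of each state's payoff by that state's probability after t steps. The weighting rule and the helper names are assumptions, not the construction used in the paper.

```python
def alpha_cut(tfn, alpha):
    """Closed interval of the triangular fuzzy number (a, b, c) at level alpha."""
    a, b, c = tfn
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (a + alpha * (b - a), c - alpha * (c - b))


def distribution_after(p0, trans, t):
    """Row vector p0 multiplied by the transition matrix t times."""
    dist = list(p0)
    for _ in range(t):
        dist = [sum(dist[i] * trans[i][j] for i in range(len(dist)))
                for j in range(len(trans[0]))]
    return dist


def weighted_cut(payoffs, p0, trans, t, alpha):
    """Hypothetical: probability-weighted alpha-cut over coalition states at time t."""
    dist = distribution_after(p0, trans, t)
    cuts = [alpha_cut(x, alpha) for x in payoffs]
    # Nonnegative weights keep low <= high.
    low = sum(w * lo for w, (lo, _) in zip(dist, cuts))
    high = sum(w * hi for w, (_, hi) in zip(dist, cuts))
    return (low, high)


print(alpha_cut((1, 2, 4), 0.5))
print(weighted_cut([(1, 2, 4), (3, 4, 5)], [1, 0], [[0.5, 0.5], [0, 1]], 1, 0.5))
```

At α = 1 the cut collapses to the peak b, and at α = 0 it is the full support [a, c], so raising the confidence level narrows the interval.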
10.
11.
Generating pseudo-random objects is one of the key issues in the computer simulation of complex systems. Most earlier simulation systems include procedures for generating independent and identically distributed random variables or some classical random processes, such as white noise. In this paper we propose a new approach to generating a wide range of processes characterized by their marginal distribution and autocorrelation function, which are significant in many cases. The proposed algorithm is based on truncated distributions, which make it simpler and more efficient than the previous approach. The effectiveness of the proposed algorithm is verified by computer simulation of various real examples.
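The truncated-distribution algorithm of entry 11 is not described in enough detail to reproduce. For contrast, the sketch below shows the textbook baseline for the same task in the Gaussian case: the stationary AR(1) recursion X_{t+1} = rho X_t + sqrt(1 - rho^2) e_t, with e_t standard normal, has a standard normal marginal and autocorrelation rho^k at lag k. gaussian_ar1 and sample_autocorr are hypothetical helper names.

```python
import math
import random


def gaussian_ar1(n, rho, seed=0):
    """Stationary Gaussian AR(1): N(0, 1) marginal, autocorrelation rho**k at lag k."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie in (-1, 1)")
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    x = rng.gauss(0.0, 1.0)  # start in the stationary distribution
    out = []
    for _ in range(n):
        out.append(x)
        x = rho * x + scale * rng.gauss(0.0, 1.0)
    return out


def sample_autocorr(xs, lag):
    n = len(xs)
    m = sum(xs) / n
    var = sum((v - m) ** 2 for v in xs) / n
    cov = sum((xs[i] - m) * (xs[i + lag] - m) for i in range(n - lag)) / n
    return cov / var


xs = gaussian_ar1(100_000, 0.7, seed=1)
print(sample_autocorr(xs, 1), sample_autocorr(xs, 2))
```

Mapping each X_t through the inverse CDF of a target distribution composed with the normal CDF yields any continuous marginal, but it generally changes the autocorrelation, which is what makes matching both at once nontrivial.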
12.
13.
14.
Some recently proposed exact simulation methods are extended to the case of marked point processes. Four families of algorithms are considered: coupling from the past, the clan of ancestors technique, the Gibbs sampler, and a Metropolis-Hastings algorithm based on birth and death proposals. From a theoretical point of view, conditions are given under which the algorithms yield unbiased samples in finite time. For practical application, a C++ library for marked point processes is described. The various algorithms are tested on several models, including the Widom-Rowlinson mixture model, multi-type pairwise interaction processes, and the Candy line segment model. A simulation study is carried out in order to analyse the proposed methods in terms of speed of convergence in relation to the parameters of the model. For the range of models investigated, the clan of ancestors algorithm using the incompatibility index is the fastest method among the ones analysed, while coupling from the past is applicable to the widest range of parameter values, and usually faster than the Metropolis-Hastings sampler. The latter two methods tend to be cumbersome if the underlying model is neither attractive nor repulsive. If one is prepared to approximate by discretisation, a proper choice of Gibbs sampler makes it possible to obtain samples from models that lack monotonicity or have such a high local stability bound as to rule out coupling from the past or clan of ancestors approaches in practice.
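A minimal sketch of the birth-death Metropolis-Hastings sampler named in entry 14, applied to a two-type Widom-Rowlinson model on the unit square. It assumes equal activity beta for both types and a uniform mark proposal, and it is an ordinary (not exact) MCMC sampler, unrelated to the C++ library described in the paper. With r = 0 the model reduces to two independent Poisson processes, so the mean number of points should approach 2 * beta.

```python
import random


def _conflicts(u, pts, r):
    """True if u lies within distance r of a point carrying the other mark."""
    x, y, m = u
    return any(mk != m and (px - x) ** 2 + (py - y) ** 2 < r * r
               for px, py, mk in pts)


def widom_rowlinson_mh(beta, r, n_steps, seed=0):
    """Birth-death Metropolis-Hastings for a two-type Widom-Rowlinson model.

    Window: unit square; both types have activity beta; points of different
    types must be at least r apart. Returns the final configuration and the
    trace of point counts.
    """
    rng = random.Random(seed)
    pts = []      # list of (x, y, mark)
    counts = []
    for _ in range(n_steps):
        n = len(pts)
        if rng.random() < 0.5:
            # Birth: uniform location, uniform mark. If no conflict is created,
            # the Hastings ratio is 2 * beta / (n + 1) (the 2 counts the marks).
            u = (rng.random(), rng.random(), rng.randrange(2))
            if not _conflicts(u, pts, r) and rng.random() < 2.0 * beta / (n + 1):
                pts.append(u)
        elif n > 0:
            # Death: remove a uniformly chosen point with ratio n / (2 * beta).
            i = rng.randrange(n)
            if rng.random() < n / (2.0 * beta):
                pts[i] = pts[-1]
                pts.pop()
        counts.append(len(pts))
    return pts, counts


_, counts = widom_rowlinson_mh(beta=5.0, r=0.0, n_steps=200_000, seed=1)
tail = counts[20_000:]
print(sum(tail) / len(tail))  # should be close to 2 * beta = 10
```

Deaths can never create a conflict, so only births need the hard-core check; this is what keeps the chain inside the support of the Widom-Rowlinson density.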
15.
Martin L. Hazelton 《Networks and Spatial Economics》2003,3(4):457-466
Deterministic assignment models are sometimes used to approximate properties of more complex stochastic models. One property that is of particular interest from a system optimization viewpoint is total travel cost. This paper looks at the approximation of mean total travel cost. It is shown that deterministic models will underestimate this quantity in many common situations. Furthermore, discrepancies between total travel cost under the different modelling frameworks can lead to situations in which network modifications which are detrimental according to a stochastic model appear beneficial when using the natural deterministic approximation. We conclude that estimation of mean travel cost in stochastic assignment is often best done using simulation. Some suggestions are made regarding the implementation of traffic assignment simulation.
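One mechanism behind the underestimation in entry 15 is Jensen's inequality: when total link cost x * t(x) is convex in the flow x, the cost evaluated at the mean flow is at most the mean cost. The Monte Carlo sketch below compares the two on a hypothetical two-route network with identical BPR links (free-flow time 1, capacity 1000) and travellers who each pick route 1 independently with probability p. All parameters are assumptions, and this is not the model analysed in the paper.

```python
import random


def bpr_time(flow, free_time=1.0, capacity=1000.0):
    """BPR link travel time: t0 * (1 + 0.15 * (x / C) ** 4)."""
    return free_time * (1.0 + 0.15 * (flow / capacity) ** 4)


def total_cost(x1, x2):
    return x1 * bpr_time(x1) + x2 * bpr_time(x2)


def compare(n_travellers=2000, p=0.5, n_samples=500, seed=0):
    """Return (deterministic total cost at mean flows, simulated mean total cost)."""
    rng = random.Random(seed)
    deterministic = total_cost(n_travellers * p, n_travellers * (1 - p))
    acc = 0.0
    for _ in range(n_samples):
        # Binomial route-1 flow: each traveller chooses independently.
        x1 = sum(1 for _ in range(n_travellers) if rng.random() < p)
        acc += total_cost(x1, n_travellers - x1)
    return deterministic, acc / n_samples


det, sim = compare()
print(det, sim)  # the simulated mean exceeds the deterministic value
```

Here the gap is small because route-choice noise is modest relative to 2000 travellers; it grows with flow variability and with the steepness of the cost curve.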
16.
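To meet the needs of large-scale practical Markov systems, simulation-based learning optimization of Markov decision processes (MDPs) is discussed. Starting from the defining equations, a temporal-difference formula for performance potentials that is unified across the average and discounted performance criteria is established, and a neural network is used to represent the estimated performance potentials; a parameterized TD(0) learning rule and algorithm are derived for approximate policy evaluation. Then, using the approximated potentials, approximate policy iteration yields a neuro-dynamic programming (NDP) optimization method that is unified across both criteria. The results also apply to semi-Markov decision processes. A numerical example shows that the neural policy iteration algorithm works under both criteria and verifies that the average problem is the limiting case of the discounted problem as the discount factor tends to zero.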
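Entry 16 learns performance potentials with a TD(0) rule represented by a neural network. The sketch below shows the average-criterion TD(0) update in its simplest, tabular form on a hypothetical two-state chain, with a running estimate eta of the average reward: g(s) += step * (f(s) - eta + g(s') - g(s)). Because potentials are defined only up to an additive constant, it checks the difference g(0) - g(1), which for this chain is exactly 10/7, with eta = 2/7. The chain, the step size, and the iterate averaging are assumptions.

```python
import random

P = [[0.5, 0.5], [0.2, 0.8]]  # hypothetical transition matrix
F = [1.0, 0.0]                # reward in each state


def td0_potentials(n_steps=200_000, step=0.01, seed=0):
    """Tabular average-reward TD(0); returns iterate-averaged (g, eta)."""
    rng = random.Random(seed)
    g = [0.0, 0.0]
    eta = 0.0
    g_sum, eta_sum = [0.0, 0.0], 0.0
    half = n_steps // 2
    s = 0
    for t in range(n_steps):
        s_next = 0 if rng.random() < P[s][0] else 1
        reward = F[s]
        delta = reward - eta + g[s_next] - g[s]  # temporal difference
        g[s] += step * delta
        eta += step * (reward - eta)             # running average-reward estimate
        s = s_next
        if t >= half:                            # average the second half of the iterates
            g_sum[0] += g[0]
            g_sum[1] += g[1]
            eta_sum += eta
    n_avg = n_steps - half
    return [v / n_avg for v in g_sum], eta_sum / n_avg


g, eta = td0_potentials()
print(g[0] - g[1], 10 / 7, eta, 2 / 7)
```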
17.
On the Convergence of Temporal-Difference Learning with Linear Function Approximation (total citations: 1; self-citations: 0; citations by others: 1)
Vladislav Tadić 《Machine Learning》2001,42(3):241-267
The asymptotic properties of temporal-difference learning algorithms with linear function approximation are analyzed in this paper. The analysis is carried out in the context of the approximation of a discounted cost-to-go function associated with an uncontrolled Markov chain with an uncountable finite-dimensional state-space. Under mild conditions, the almost sure convergence of temporal-difference learning algorithms with linear function approximation is established and an upper bound for their asymptotic approximation error is determined. The obtained results are a generalization and extension of the existing results related to the asymptotic behavior of temporal-difference learning. Moreover, they cover cases to which the existing results cannot be applied, while the adopted assumptions seem to be the weakest possible under which the almost sure convergence of temporal-difference learning algorithms can still be demonstrated.
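Entry 17 concerns TD learning with linear features. The sketch below runs discounted linear TD(0) on a hypothetical three-state chain with features phi(s) = (1, s), which cannot represent every value function exactly, and compares the averaged iterate with the TD fixed point theta* solving A theta = b, where A = sum_s pi(s) phi(s) (phi(s) - gamma * sum_s' P(s, s') phi(s'))^T and b = sum_s pi(s) phi(s) c(s). A finite chain is only a special case of the uncountable state space treated in the paper; the chain, costs, discount factor, and step size are assumptions.

```python
import random

P = [[0.1, 0.6, 0.3], [0.4, 0.2, 0.4], [0.5, 0.3, 0.2]]  # hypothetical chain
COST = [1.0, 0.0, 2.0]
GAMMA = 0.5


def features(s):
    return (1.0, float(s))


def stationary(p, iters=1000):
    """Stationary distribution by power iteration (the chain is aperiodic)."""
    pi = [1.0 / len(p)] * len(p)
    for _ in range(iters):
        pi = [sum(pi[i] * p[i][j] for i in range(len(p))) for j in range(len(p))]
    return pi


def td_fixed_point():
    """Solve the 2x2 system A theta = b by Cramer's rule."""
    pi = stationary(P)
    a = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    for s in range(3):
        phi = features(s)
        nxt = [sum(P[s][t] * features(t)[k] for t in range(3)) for k in range(2)]
        for i in range(2):
            b[i] += pi[s] * phi[i] * COST[s]
            for j in range(2):
                a[i][j] += pi[s] * phi[i] * (phi[j] - GAMMA * nxt[j])
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def linear_td0(n_steps=300_000, burn_in=50_000, step=0.01, seed=0):
    """On-policy linear TD(0); returns the average of post-burn-in iterates."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    avg = [0.0, 0.0]
    s = 0
    for t in range(n_steps):
        s_next = rng.choices(range(3), weights=P[s])[0]
        phi, phi_next = features(s), features(s_next)
        v = theta[0] * phi[0] + theta[1] * phi[1]
        v_next = theta[0] * phi_next[0] + theta[1] * phi_next[1]
        delta = COST[s] + GAMMA * v_next - v
        theta = [theta[k] + step * delta * phi[k] for k in range(2)]
        if t >= burn_in:
            avg = [avg[k] + theta[k] for k in range(2)]
        s = s_next
    return [x / (n_steps - burn_in) for x in avg]


print(linear_td0(), td_fixed_point())
```

Because the features are not expressive enough, the iterates settle at the TD fixed point rather than at the true discounted costs; the distance between the two is the kind of asymptotic approximation error the paper bounds.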
18.
The coefficient of stationary average productivity of a system of the type "aggregate 1 + bunker 1 + aggregate 2 + bunker 2 + customer" is investigated as a function of the system parameters in the balanced case.