1.

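Simulation optimization algorithm for CTMDPs based on randomized stationary policies  Total citations: 4, self-citations: 2, citations by others: 2

Tang Hao, Xi Hongsheng, Yin Baoqun 《Acta Automatica Sinica》2004,30(2):229-234

Based on Markov performance potential theory and neuro-dynamic programming (NDP), this paper studies simulation-based optimization of a class of continuous-time Markov decision processes (MDPs) under randomized stationary policies. The proposed algorithm converts the continuous-time process into its uniformized Markov chain and then estimates, from a single sample path of that chain, the gradient of the average-cost performance measure with respect to the policy parameters in order to find a suboptimal policy. The method is suited to performance optimization of systems with large state spaces. A numerical example of a controlled Markov process is given.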
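The uniformization step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard construction P = I + Q/λ with λ at least the largest exit rate; the function name `uniformize` and the 3-state generator are hypothetical and are not taken from the paper.

```python
def uniformize(Q, rate=None):
    """Return (P, rate), where P = I + Q / rate is the uniformized chain.

    Q is a generator matrix (off-diagonals >= 0, rows sum to zero).
    rate must be at least the largest exit rate max_i |q_ii|.
    """
    n = len(Q)
    max_exit = max(-Q[i][i] for i in range(n))
    if rate is None:
        # Default to the largest exit rate (1.0 if every state is absorbing).
        rate = max_exit if max_exit > 0 else 1.0
    if rate <= 0 or rate < max_exit:
        raise ValueError("rate must be positive and at least max_i |q_ii|")
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
         for i in range(n)]
    return P, rate


# Hypothetical 3-state generator used only for illustration.
Q = [[-2.0, 1.0, 1.0],
     [0.5, -0.5, 0.0],
     [1.0, 3.0, -4.0]]
P, rate = uniformize(Q)
for row in P:
    print(row)
```

The resulting discrete-time chain has the same stationary distribution as the continuous-time process, which is why sample-path estimates can be computed on it instead.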

2.

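Online optimization algorithms for Markov control processes based on a single sample path  Total citations: 3, self-citations: 1, citations by others: 3

Tang Hao, Xi Hongsheng, Yin Baoqun 《Control Theory & Applications》2002,19(6):865-871

Building on Markov performance potential theory, this paper studies performance optimization algorithms for Markov control processes. Unlike traditional computation-based methods, the algorithms estimate the gradient of the performance measure with respect to the policy parameters by simulating a single sample path, in order to find an optimal (or suboptimal) randomized stationary policy. Because the algorithm parameters can be chosen to suit the characteristics of the system at hand, the approach can meet the online optimization needs of different practical engineering systems. Finally, the convergence of these algorithms with probability 1 on an infinitely long sample path is briefly analyzed, and a numerical example of a three-state controlled Markov process is given.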

3.

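Optimal stationary policies for semi-Markov control processes under the discounted-cost criterion  Total citations: 1, self-citations: 1, citations by others: 0

Yin Baoqun, Li Yanjie, Zhou Yaping, Xi Hongsheng 《Control and Decision》2004,19(6):691-694

This paper addresses discounted-cost performance optimization for a class of semi-Markov control processes (SMCPs). By introducing a matrix that can serve as the infinitesimal matrix (generator) of a Markov process, a discounted Poisson equation is defined for an SMCP, and the α-potential is defined through this equation. Based on the α-potential, the optimality equation satisfied by optimal stationary policies is given. Finally, an iterative algorithm for computing an optimal stationary policy is presented, together with a numerical example illustrating its application.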

4.

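Policy gradient estimation for continuous-time partially observable Markov decision processes

Tang Bo, Li Yanjie, Yin Baoqun 《Control Theory & Applications》2009,26(7):805-808

For the optimization of continuous-time partially observable Markov decision processes (CTPOMDPs), this paper proposes a policy gradient estimation method. Using uniformization, the gradient estimation algorithm for discrete-time partially observable Markov decision processes (DTPOMDPs) is extended to the continuous-time model. The convergence and error estimates of the algorithm are studied, and a numerical example illustrates its application.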

5.

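Optimal stationary policies for a class of countable Markov control processes

Yin Baoqun, Li Yanjie, Xi Hongsheng, Zhou Yaping 《Control Theory & Applications》2005,22(1):43-46

This paper studies optimal stationary policies for a class of Markov control processes with countable state space under the infinite-horizon average-cost criterion. For such processes, a discounted Poisson equation is introduced. Using basic properties of the infinitesimal matrix and of performance potentials, the optimality equation of the average-cost model on a compact action set is derived, and an existence theorem for its solutions is proved.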

6.

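Ant colony planning algorithm for Markov decision processes

Chai Xiaolong, Hu Guiwu, Chen Aixiang 《Computer Engineering and Applications》2010,46(20):40-41

In intelligent planning, finding a plan is an NP or even NP-complete problem. When the effects of actions are uncertain, as in planning for Markov decision processes, finding a plan becomes harder still. Existing planning algorithms for Markov decision processes typically use a single whole-state node to describe the actual outcome of an action, sidestepping the internal complexity of states, whereas many real-world actions produce multiple propositional effects that correspond to multiple proposition nodes. To handle this, the paper introduces the concepts of image actions, image path segments, and image planning graphs, and on this basis proposes an ant colony planning algorithm for Markov decision processes. It is also proved that the solution found by the algorithm is reliable with at least a certain probability, even in an uncertain execution environment.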

7.

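Relationship between the discounted and average models of semi-Markov decision processes

Yin Baoqun, Li Yanjie, Tang Hao, Dai Guiping, Xi Hongsheng 《Control Theory & Applications》2006,23(1):65-68

A class of semi-Markov decision problems is first discussed under the discounted-cost and average-cost performance criteria, respectively. Based on the performance potential method, the optimality equations satisfied by optimal stationary policies are derived. The relationship between the two models is then discussed, showing that the results for the average model can be obtained as limits of the corresponding results for the discounted model as the discount factor tends to zero.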
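The limiting relationship between the two criteria can be checked numerically in a much simpler setting than the paper's. The sketch below assumes an uncontrolled two-state continuous-time Markov chain (not a semi-Markov process) with generator Q = [[-a, a], [b, -b]], cost rates f, and discount rate alpha; all names and numbers are hypothetical. It solves (alpha I - Q) V = f in closed form and compares alpha V with the long-run average cost.

```python
def discounted_values(a, b, f, alpha):
    """Solve (alpha*I - Q) V = f for Q = [[-a, a], [b, -b]] in closed form."""
    det = alpha * (alpha + a + b)  # determinant of alpha*I - Q
    v0 = ((alpha + b) * f[0] + a * f[1]) / det
    v1 = (b * f[0] + (alpha + a) * f[1]) / det
    return v0, v1


def average_cost(a, b, f):
    """Long-run average cost: stationary distribution (b, a)/(a+b) times f."""
    return (b * f[0] + a * f[1]) / (a + b)


a, b, f = 2.0, 1.0, (3.0, 0.0)  # hypothetical rates and cost rates
eta = average_cost(a, b, f)
for alpha in (1.0, 0.1, 0.001, 1e-6):
    v0, v1 = discounted_values(a, b, f, alpha)
    # alpha * V(i) approaches eta from every initial state as alpha -> 0
    print(alpha, alpha * v0, alpha * v1, eta)
```

In this toy case alpha V(i) approaches the average cost from both starting states, which is the kind of limit the abstract describes.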

8.

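Stochastic switching model and policy optimization for dynamic power management  Total citations: 2, self-citations: 0, citations by others: 2

Jiang Qi, Xi Hongsheng, Yin Baoqun 《Journal of Computer-Aided Design & Computer Graphics》2006,18(5):680-686

A policy optimization method for dynamic power management based on continuous-time Markov decision processes is proposed. By building a stochastic switching model of the dynamic power management system, the power management problem is transformed into a constrained policy optimization problem, and a policy gradient optimization algorithm based on vector composition is given. The stochastic switching model describes the dynamic power management system accurately, and the policy optimization algorithm is simple and effective: it can be run offline and is also suitable for online optimization. Simulation experiments verify the effectiveness of the method.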

9.

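Fuzzy stable sets of dynamic cooperative games based on Markov stochastic processes

Liu Tianhu, Xu Weisheng, Wu Qidi 《Computer Engineering and Applications》2009,45(12):15-19

Using fuzzy mathematics, this paper studies interval fuzzy stable sets of dynamic cooperative games with transferable utility. A Markov stochastic process is first used to describe the structural transitions of dynamic cooperative coalitions. For payoff functions that are triangular fuzzy numbers, the cut-set value regions of the cooperative game are constructed at different confidence levels α, and the interval fuzzy stable sets at different time points are then computed with the coalition state transition matrix. Because partners must divide the coalition's payoff once cooperation ends, the constructed interval fuzzy stable sets are used to give feasible payoff-allocation intervals for the partners. Finally, an example illustrates the effectiveness and feasibility of the method.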

10.

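Spectral method for simulating stationary random processes with non-rational spectra

Zhao Xiren, Liu Sheng 《Acta Automatica Sinica》1990,16(2):161-165

Based on the spectral decomposition theory of stationary random processes, this paper derives a mathematical model for simulating stationary random processes with non-rational spectra, gives a formula for the simulation error, and finally describes its application to ocean wave simulation.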

11.

Process Simulation Using Randomized Markov Chain and Truncated Marginal Distribution

Rodionov Alexei S., Choo Hyunseung, Youn Hee Y. 《The Journal of supercomputing》2002,22(1):69-85

Generating pseudo random objects is one of the key issues in computer simulation of complex systems. Most earlier simulation systems include procedures for the generation of independent and identically distributed random variables or some classical random processes, such as white noise. In this paper we propose a new approach to the generation of wide ranges of processes that are characterized by marginal distribution and autocorrelation function that are significant in many cases. The proposed algorithm is based on the use of truncated distribution that gives more simplicity and efficiency in comparison with the previous one. The effectiveness of the proposed algorithm is verified using computer simulation of various real examples.

12.

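NDPS: a packet scheduling algorithm for wireless multimedia networks  Total citations: 1, self-citations: 1, citations by others: 0

Yu Rong, Jia Zhipeng, Mei Shunliang 《Computer Engineering》2008,34(12):70-72

Packet scheduling is one of the key technologies for future wireless multimedia networks. The main difficulties are the high error rate of wireless links, the diversity of traffic types, and the unknown packet arrival model. This paper models the packet scheduling process as a Markov decision process and solves the resulting problem with neuro-dynamic programming. The proposed packet scheduling algorithm for wireless multimedia networks (NDPS) pursues three performance goals simultaneously: providing differentiated service to different traffic types, maximizing wireless bandwidth utilization, and guaranteeing service fairness. Simulation results show that NDPS outperforms two popular scheduling algorithms.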

13.

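A simulation framework for scheduling re-entrant production systems based on neuro-dynamic programming

Wang Ying, Zhu Shunzhi, Xu Wei, Miao Kehua, Li Maoqing 《Information and Control》2007,36(2):218-223

A simulation framework based on neuro-dynamic programming is proposed for scheduling re-entrant production systems. A state set is built from the characteristics of re-entrant production systems, and the scheduling problem is formulated as the corresponding Markov decision process. With a suitably chosen performance measure, neuro-dynamic programming generates the scheduling decision at each step, and the policy is optimized during simulation. A simulation example verifies the effectiveness of the method, and a comparison of the results of three scheduling policies shows the advantage of the neuro-dynamic programming approach. The framework can also be extended to other types of production scheduling problems.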

14.

Perfect simulation for marked point processes

M.N.M. van Lieshout, R.S. Stoica 《Computational statistics & data analysis》2006,51(2):679-698

Some recently proposed exact simulation methods are extended to the case of marked point processes. Four families of algorithms are considered: coupling from the past, the clan of ancestors technique, the Gibbs sampler, and a Metropolis-Hastings algorithm based on birth and death proposals. From a theoretical point of view, conditions are given under which the algorithms yield unbiased samples in finite time. For practical application, a C++ library for marked point processes is described. The various algorithms are tested on several models, including the Widom-Rowlinson mixture model, multi-type pairwise interaction processes, and the Candy line segment model. A simulation study is carried out in order to analyse the proposed methods in terms of speed of convergence in relation to the parameters of the model. For the range of models investigated, the clan of ancestors algorithm using the incompatibility index is the fastest method among the ones analysed, while coupling from the past is applicable to the widest range of parameter values, and usually faster than the Metropolis-Hastings sampler. The latter two methods tend to be cumbersome if the underlying model is neither attractive nor repulsive. If one is prepared to approximate by discretisation, a proper choice of Gibbs sampler makes it possible to obtain samples from models that lack monotonicity or have such a high local stability bound as to rule out coupling from the past or clan of ancestor approaches in practice.

15.

Total Travel Cost in Stochastic Assignment Models

Martin L. Hazelton 《Networks and Spatial Economics》2003,3(4):457-466

Deterministic assignment models are sometimes used to approximate properties of more complex stochastic models. One property that is of particular interest from a system optimization viewpoint is total travel cost. This paper looks at the approximation of mean total travel cost. It is shown that deterministic models will underestimate this quantity in many common situations. Furthermore, discrepancies between total travel cost under the different modelling frameworks can lead to situations in which network modifications which are detrimental according to a stochastic model appear beneficial when using the natural deterministic approximation. We conclude that estimation of mean travel cost in stochastic assignment is often best done using simulation. Some suggestions are made regarding the implementation of traffic assignment simulation.
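One way such an underestimate can arise is sketched below. This is an illustration under stated assumptions, not the paper's argument: it assumes a single link, a convex BPR-style travel-time function, and a flow that takes two equally likely values. With a convex total cost, evaluating the cost at the mean flow gives less than the mean of the cost (Jensen's inequality). The function `total_cost` and all numbers are hypothetical.

```python
def total_cost(flow, free_time=1.0, capacity=10.0):
    """Total travel cost on one link: flow times a BPR-style travel time.

    The travel-time form t(x) = t0 * (1 + 0.15 * (x / c)**4) is an assumption
    made for this illustration only.
    """
    return flow * free_time * (1.0 + 0.15 * (flow / capacity) ** 4)


flows = [8.0, 12.0]  # hypothetical, equally likely link flows
mean_flow = sum(flows) / len(flows)

# Stochastic model: average the cost over the flow distribution.
stochastic_mean = sum(total_cost(x) for x in flows) / len(flows)
# Deterministic approximation: evaluate the cost at the mean flow.
deterministic = total_cost(mean_flow)

print(stochastic_mean, deterministic)
```

Here the deterministic figure is the smaller of the two, consistent with the direction of the discrepancy reported in the abstract.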

16.

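A unified NDP method based on TD(0) learning for MDPs under the average and discounted criteria  Total citations: 3, self-citations: 0, citations by others: 3

Tang Hao, Zhou Lei, Yuan Jibin 《Control Theory & Applications》2006,23(2):292-296

To meet the needs of practical large-scale Markov systems, simulation-based learning optimization of Markov decision processes (MDPs) is discussed. Starting from the definition, a unified temporal-difference formula for performance potentials under the average and discounted criteria is established. A neural network is used to represent the estimated performance potentials, and a parameterized TD(0) learning formula and algorithm are derived for approximate policy evaluation. Then, based on the approximate potentials, a unified neuro-dynamic programming (NDP) optimization method for both criteria is realized through approximate policy iteration. The results also apply to semi-Markov decision processes. A numerical example shows that the neuro-policy iteration algorithm works under both criteria and confirms that the average problem is the limiting case of the discounted problem as the discount factor tends to zero.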

17.

On the Convergence of Temporal-Difference Learning with Linear Function Approximation  Total citations: 1, self-citations: 0, citations by others: 1

Vladislav Tadić 《Machine Learning》2001,42(3):241-267

The asymptotic properties of temporal-difference learning algorithms with linear function approximation are analyzed in this paper. The analysis is carried out in the context of the approximation of a discounted cost-to-go function associated with an uncontrolled Markov chain with an uncountable finite-dimensional state-space. Under mild conditions, the almost sure convergence of temporal-difference learning algorithms with linear function approximation is established and an upper bound for their asymptotic approximation error is determined. The obtained results are a generalization and extension of the existing results related to the asymptotic behavior of temporal-difference learning. Moreover, they cover cases to which the existing results cannot be applied, while the adopted assumptions seem to be the weakest possible under which the almost sure convergence of temporal-difference learning algorithms can still be demonstrated.
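For readers unfamiliar with the algorithm being analyzed, here is a minimal sketch of TD(0) with linear function approximation for a discounted cost-to-go function. It assumes a finite two-state chain, one-hot features, and a constant step size, which is far simpler than the uncountable state space and the conditions treated in the paper. The function `td0_linear` and all numbers are hypothetical.

```python
import random


def td0_linear(P, cost, features, gamma, steps, step_size, seed=0):
    """TD(0) for V(s) ~ theta . features[s] along a single sample path."""
    rng = random.Random(seed)
    theta = [0.0] * len(features[0])
    s = 0
    for _ in range(steps):
        # Sample the next state from row P[s] by inverting the CDF.
        u, acc, s_next = rng.random(), 0.0, len(P) - 1
        for j, p in enumerate(P[s]):
            acc += p
            if u < acc:
                s_next = j
                break
        v = sum(t * x for t, x in zip(theta, features[s]))
        v_next = sum(t * x for t, x in zip(theta, features[s_next]))
        # Temporal-difference error for the discounted cost-to-go.
        delta = cost[s] + gamma * v_next - v
        theta = [t + step_size * delta * x
                 for t, x in zip(theta, features[s])]
        s = s_next
    return theta


# Hypothetical two-state uncontrolled chain with one-hot features.
P = [[0.9, 0.1], [0.2, 0.8]]
cost = [1.0, 0.0]
features = [[1.0, 0.0], [0.0, 1.0]]
theta = td0_linear(P, cost, features, gamma=0.9, steps=200000, step_size=0.01)
print(theta)
```

With one-hot features the linear architecture can represent the exact cost-to-go, which solves J = c + γPJ (roughly 7.57 and 4.86 for these numbers), so the estimates should settle near those values up to noise from the constant step size.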

18.

Estimation of Stationary Efficiency of a Production Line with Two Unreliable Aggregates

A. A. Pogorui, A. F. Turbin 《Cybernetics and Systems Analysis》2002,38(6):823-829

The coefficient of stationary average productivity of a system of the type "aggregate 1 + bunker 1 + aggregate 2 + bunker 2 + customer" is investigated as a function of system parameters in the balance case.