共查询到17条相似文献,搜索用时 78 毫秒
1.
2.
3.
4.
5.
Markov控制过程基于单个样本轨道的在线优化算法 总被引:3,自引:1,他引:3
在Markov性能势理论基础上, 研究了Markov控制过程的性能优化算法. 不同于传统的基于计算的方法, 文中的算法是根据单个样本轨道的仿真来估计性能指标关于策略参数的梯度, 以寻找最优 (或次优 )随机平稳策略. 由于可根据不同实际系统的特征来选择适当的算法参数, 因此它能满足不同实际工程系统在线优化的需要. 最后简要分析了这些算法在一个无限长的样本轨道上以概率 1的收敛性, 并给出了一个三 状态受控Markov过程的数值实例. 相似文献
6.
7.
应用Markov决策过程与性能势相结合的方法,给出了呼叫接入控制的策略优化算法。所得到的最优策略是状态相关的策略,与基于节点已占用带宽决定行动的策略相比,状态相关策略具有更好的性能值,而且该算法具有很快的收敛速度。 相似文献
8.
9.
基于性能势理论和等价Markov过程方法,研究了一类半Markov决策过程(SMDP)在参数化随机平稳策略下的仿真优化算法,并简要分析了算法的收敛性.通过SMDP的等价Markov过程,定义了一个一致化Markov链,然后根据该一致化Markov链的单个样本轨道来估计SMDP的平均代价性能指标关于策略参数的梯度,以寻找最优(或次优)策略.文中给出的算法是利用神经元网络来逼近参数化随机平稳策略,以节省计算机内存,避免了“维数灾”问题,适合于解决大状态空间系统的性能优化问题.最后给出了一个仿真实例来说明算法的应用. 相似文献
10.
11.
We show the existence of average cost optimal stationary policies for Markov control processes with Borel state space and unbounded costs per stage, under a set of assumptions recently introduced by L.I. Sennott (1989) for control processes with countable state space and finite control sets. 相似文献
12.
13.
Yoshio Ohtsubo 《International Transactions in Operational Research》2007,14(6):509-520
We consider Markov decisions processes with a target set, where criterion function is an expectation of minimum function. We formulate the problem as an infinite horizon case with a recurrent class. We show under some conditions that an optimal value function is a unique solution to an optimality equation and there exists a stationary optimal policy. Also we give a policy improvement method. 相似文献
14.
15.
This paper deals with discrete-time Markov control processes with Borel state space, allowing unbounded costs and noncompact control sets. For these models, the existence of average optimal stationary policies has been recently established under very general assumptions, using an optimality inequality. Here we give a condition, which is a strengtened version of a variant of the ‘vanishing discount factor’ approach, for the optimality equation to hold. 相似文献
16.
Evgueni Gordienko 《Systems & Control Letters》1997,31(1):59
In this paper we study the average cost optimization problem for discrete-time Markov control processes on Borel space with possibly unbounded costs. Under proper ergodicity assumptions we show the existence of stationary policies for which asymptotic (as N → ∞) behavior of some functions of N-horizon costs can be better than for overtaking optimal policies. 相似文献