

Neuro-dynamic programming for SMDPs based on performance potentials (total citations: 7; self-citations: 0; citations by others: 7)

Tang Hao (唐昊), Yuan Jibin (袁继彬), Lu Yang (陆阳), Cheng Wenjuan (程文娟). Acta Automatica Sinica (《自动化学报》), 2005, 31(4): 642-645

An alpha-uniformized Markov chain is defined, via the concept of an equivalent infinitesimal generator, for a semi-Markov decision process (SMDP) under both average and discounted criteria. Based on the relations between the performance measures and performance potentials of the two processes, the SMDP can be optimized by simulating the chain. For the critic model of neuro-dynamic programming (NDP), a neuro-policy iteration (NPI) algorithm is presented, and a bound on the performance error is given for the case where each iteration step incurs an approximation error and an improvement error. The results may be extended to Markov systems and are widely applicable. Finally, a numerical example is provided.
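The uniformization idea mentioned in the abstract above can be illustrated with a minimal sketch. It shows only the textbook uniformization of a continuous-time Markov chain generator, P = I + A/rate, and not the paper's alpha-uniformized chain or its equivalent infinitesimal generator. The generator `A`, the value of `rate`, and the function names `uniformize` and `stationary_distribution` are assumptions made for illustration.

```python
def uniformize(A, rate):
    """Return P = I + A / rate for a generator A (rows sum to zero).

    `rate` must be at least the largest exit rate max_i(-A[i][i]),
    otherwise some diagonal entry of P would be negative.
    """
    n = len(A)
    max_exit = max(-A[i][i] for i in range(n))
    if rate < max_exit:
        raise ValueError("rate must be >= the largest exit rate")
    return [[(1.0 if i == j else 0.0) + A[i][j] / rate for j in range(n)]
            for i in range(n)]


def stationary_distribution(P, iterations=500):
    """Approximate the stationary row vector of P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical 3-state generator (not taken from the paper).
A = [[-2.0, 2.0, 0.0],
     [1.0, -3.0, 2.0],
     [0.0, 4.0, -4.0]]

P = uniformize(A, rate=5.0)
pi = stationary_distribution(P)

# The uniformized chain keeps the stationary distribution of A, so pi * A
# should be numerically zero in every component.
residual = [sum(pi[i] * A[i][j] for i in range(3)) for j in range(3)]
```

Choosing `rate` strictly larger than the largest exit rate keeps a positive self-loop probability in every state, which makes the discrete-time chain aperiodic and lets plain power iteration converge.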

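Robust control policies for closed queueing networks with uncertain routing probabilities (total citations: 1; self-citations: 0; citations by others: 1)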

Tang Hao (唐昊), Xi Hongsheng (奚宏生), Han Jianghong (韩江洪), Yuan Jibin (袁继彬). Acta Automatica Sinica (《自动化学报》), 2005, 31(3): 446-450

This paper is concerned with robust control problems for exponential controlled closed queueing networks (CCQNs) under uncertain routing probabilities. Because the rows of some parameter matrices, such as infinitesimal generators, may be dependent, we first transform the objective vector under the discounted-cost criterion into a weighted-average cost. Using the solution to the Poisson equation, i.e., Markov performance potentials, we then treat the discounted-cost and average-cost problems in a unified way and derive the gradient formula of the new objective function with respect to the routing probabilities. Some related solution techniques for finding the optimal robust control policy are discussed. Finally, a numerical example is presented and analyzed.
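To make the role of the Poisson equation in the abstract above concrete, here is a minimal sketch that computes performance potentials for a small continuous-time Markov chain under the average-cost criterion. It solves -A g + eta e = f for the potentials g and the average cost eta, fixing g[0] = 0 because potentials are only determined up to an additive constant. That normalization, the generator `A`, the cost vector `f`, and the helper names are assumptions for illustration, not the paper's formulation.

```python
def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def potentials(A, f):
    """Return (g, eta) with -A g = f - eta * e and g[0] = 0.

    Unknowns are g[1..n-1] and eta; state 0 is the reference state.
    """
    n = len(f)
    M = [[-A[i][j] for j in range(1, n)] + [1.0] for i in range(n)]
    sol = solve_linear(M, f)
    return [0.0] + sol[:n - 1], sol[n - 1]


# Hypothetical generator and cost rates (not taken from the paper).
A = [[-2.0, 2.0, 0.0],
     [1.0, -3.0, 2.0],
     [0.0, 4.0, -4.0]]
f = [1.0, 2.0, 5.0]

g, eta = potentials(A, f)
```

For this example the stationary distribution is (0.25, 0.5, 0.25), so eta equals the stationary average of `f`. A gradient with respect to a parameter of `A` would combine `g` with the derivative of the generator, which is not shown here.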

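A unified NDP method based on TD(0) learning for MDPs under average and discounted criteria (total citations: 3; self-citations: 0; citations by others: 3)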

Tang Hao (唐昊), Zhou Lei (周雷), Yuan Jibin (袁继彬). Control Theory & Applications (《控制理论与应用》), 2006, 23(2): 292-296

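To meet the needs of practical large-scale Markov systems, this paper discusses simulation-based learning and optimization for Markov decision processes (MDPs). Starting from the definition, a unified temporal-difference formula for performance potentials under the average and discounted performance criteria is established. A neural network is used to represent the estimated potentials, and a parametric TD(0) learning formula and algorithm are derived for approximate policy evaluation. Then, based on the approximated potentials, approximate policy iteration yields a unified neuro-dynamic programming (NDP) optimization method for the two criteria. The results also apply to semi-Markov decision processes. A numerical example shows that the neuro-policy iteration algorithm in the paper works under both criteria, and verifies that the average problem is the limiting case of the discounted problem as the discount factor tends to zero.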
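The TD(0) policy-evaluation step described in the abstract above can be sketched as follows. This is a simplified stand-in, not the paper's algorithm: it uses a lookup table instead of a neural network, covers only the discrete-time average criterion rather than the unified formula, and evaluates one fixed policy. The transition matrix `P`, the cost vector `f`, the step size, and the function name `td0_potentials` are assumptions for illustration.

```python
import random


def td0_potentials(P, f, steps=300_000, step_size=0.002, seed=0):
    """Estimate potentials g and average cost eta along one sample path.

    Update rule (tabular, average criterion):
        eta  <- running mean of the observed costs
        g[i] <- g[i] + step_size * (f[i] - eta + g[j] - g[i])
    where i is the current state and j the sampled next state.
    """
    rng = random.Random(seed)
    n = len(f)
    g = [0.0] * n
    eta = 0.0
    state = 0
    for k in range(1, steps + 1):
        # Sample the next state from row `state` of P.
        u = rng.random()
        nxt = n - 1  # fallback guards against rounding in the row sum
        acc = 0.0
        for j in range(n):
            acc += P[state][j]
            if u < acc:
                nxt = j
                break
        eta += (f[state] - eta) / k
        delta = f[state] - eta + g[nxt] - g[state]
        g[state] += step_size * delta
        state = nxt
    return g, eta


# Hypothetical two-state chain under a fixed policy (not from the paper).
P = [[0.7, 0.3],
     [0.4, 0.6]]
f = [1.0, 3.0]

g, eta = td0_potentials(P, f)
gap = g[1] - g[0]
```

For this chain the exact average cost is 13/7 and the exact potential difference g[1] - g[0] is 20/7, so `eta` and `gap` should land near those values. Only differences of potentials are meaningful here, because the estimates are not normalized to a reference level.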