
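On-line optimization algorithm for Markov control processes based on a single sample path
Cite this article: TANG Hao, XI Hong-sheng, YIN Bao-qun. On-line optimization algorithm for Markov control processes based on a single sample path[J]. Control Theory & Applications, 2002, 19(6): 865-871.
Authors: TANG Hao, XI Hong-sheng, YIN Bao-qun
Affiliation: Department of Automation, University of Science and Technology of China, Hefei 230026
Funding: Supported by the National Natural Science Foundation of China (69974037) and the National High Performance Computing Foundation (00208).
Abstract: Building on the theory of Markov performance potentials, this paper studies performance optimization algorithms for Markov control processes. Unlike traditional computation-based methods, the algorithms estimate the gradient of the performance measure with respect to the policy parameters from the simulation of a single sample path, in order to find an optimal (or suboptimal) randomized stationary policy. Because suitable algorithm parameters can be chosen to match the characteristics of a given real system, the algorithms can meet the on-line optimization needs of different practical engineering systems. Finally, the convergence of these algorithms with probability 1 on an infinitely long sample path is briefly analyzed, and a numerical example of a three-state controlled Markov process is given.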

Keywords: Markov control processes; Markov performance potentials; randomized stationary policies; on-line optimization
Article ID: 1000-8152(2002)06-0865-07
Received: 2001-05-14
Revised: 2001-05-14

On-line optimization algorithm for Markov control processes based on a single sample path
TANG Hao, XI Hong-sheng, YIN Bao-qun. On-line optimization algorithm for Markov control processes based on a single sample path[J]. Control Theory & Applications, 2002, 19(6): 865-871.
Authors: TANG Hao, XI Hong-sheng, YIN Bao-qun
Affiliation: Department of Automation, University of Science and Technology of China, Hefei 230026, China
Abstract: Based on the theory of Markov performance potentials, this paper studies a performance optimization algorithm for Markov control processes. Unlike traditional computation-based approaches, the algorithm estimates the gradient of the performance measure with respect to the policy parameters by simulating a single sample path, and uses it to search for an optimal (or suboptimal) randomized stationary policy. Because the algorithm parameters can be chosen to suit the properties of a given real system, the algorithm can meet the on-line optimization needs of different real-world engineering systems. Finally, the convergence of the algorithm with probability one on an infinitely long sample path is briefly analyzed, and a numerical example for a three-state controlled Markov chain is provided.
Keywords: Markov control processes; Markov performance potentials; randomized stationary policies; on-line optimization
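
The idea summarized in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, which is not reproduced on this page; it is a minimal example under stated assumptions. It uses the standard performance-potential derivative formula, d(eta)/d(theta) = pi (dP/d(theta) g + df/d(theta)), where eta is the average cost, pi the stationary distribution, and g the performance potential. The three-state chain, the matrices `P0` and `P1`, the costs `C0` and `C1`, and the single scalar policy parameter `theta` (the probability of choosing action 0 in every state) are all hypothetical. The sample-path estimator also assumes the two transition matrices are known and uses the expected per-state cost rather than the cost of the sampled action.

```python
import random

# Hypothetical three-state model; every number here is illustrative only.
P0 = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]]  # transitions, action 0
P1 = [[0.1, 0.6, 0.3], [0.5, 0.1, 0.4], [0.2, 0.2, 0.6]]  # transitions, action 1
C0 = [4.0, 3.0, 5.0]  # one-step cost of action 0 in each state
C1 = [1.0, 0.5, 2.0]  # one-step cost of action 1 in each state
S = range(3)


def model(theta):
    """Chain induced by the randomized policy 'choose action 0 w.p. theta'."""
    P = [[theta * P0[i][j] + (1 - theta) * P1[i][j] for j in S] for i in S]
    f = [theta * C0[i] + (1 - theta) * C1[i] for i in S]
    return P, f


def stationary(P, iters=200):
    """Stationary distribution by power iteration (the chain is ergodic)."""
    pi = [1.0 / 3] * 3
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in S) for j in S]
    return pi


def average_cost(theta):
    P, f = model(theta)
    return sum(p * c for p, c in zip(stationary(P), f))


def gradient_from(pi, g):
    """pi (dP g + df), with dP = P0 - P1 and df = C0 - C1 for this policy."""
    return sum(
        pi[i] * (sum((P0[i][j] - P1[i][j]) * g[j] for j in S) + C0[i] - C1[i])
        for i in S
    )


def exact_gradient(theta, terms=200):
    """Model-based reference: potentials as g = sum_k (P^k f - eta)."""
    P, f = model(theta)
    pi = stationary(P)
    eta = sum(p * c for p, c in zip(pi, f))
    g, v = [0.0] * 3, f[:]
    for _ in range(terms):
        g = [g[i] + v[i] - eta for i in S]
        v = [sum(P[i][j] * v[j] for j in S) for i in S]
    return gradient_from(pi, g)


def sample_path_gradient(theta, steps=200_000, window=10, seed=0):
    """Estimate pi and g from one simulated path, then plug into the formula."""
    rng = random.Random(seed)
    P, f = model(theta)
    x, path = 0, []
    for _ in range(steps):
        path.append(x)
        x = rng.choices(S, weights=P[x])[0]
    eta_hat = sum(f[s] for s in path) / steps
    prefix = [0.0]  # prefix sums of centered costs along the path
    for s in path:
        prefix.append(prefix[-1] + f[s] - eta_hat)
    visits, sums = [0] * 3, [0.0] * 3
    for n in range(steps - window):
        visits[path[n]] += 1
        sums[path[n]] += prefix[n + window] - prefix[n]  # truncated potential
    g_hat = [sums[i] / visits[i] for i in S]
    pi_hat = [v / sum(visits) for v in visits]
    return gradient_from(pi_hat, g_hat)
```

Calling `exact_gradient(0.4)` and `sample_path_gradient(0.4)` gives a model-based value and a single-path estimate that can be compared. In an on-line setting such an estimate would feed a stochastic gradient step on `theta`; the `window` length trades truncation bias against variance and is one of the tunable parameters in this sketch.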
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.