A Simulation Optimization Algorithm for CTMDPs Based on Randomized Stationary Policies
Citation: TANG Hao, XI Hong-Sheng, YIN Bao-Qun. A Simulation Optimization Algorithm for CTMDPs Based on Randomized Stationary Policies[J]. Acta Automatica Sinica, 2004, 30(2): 229-234.
Authors: TANG Hao, XI Hong-Sheng, YIN Bao-Qun
Affiliation: 1. Department of Automation, University of Science and Technology of China, Hefei; 2. Department of Computer, Hefei University of Technology, Hefei
Funding: the National Natural Science Foundation of P.R. China (60274012) and the Natural Science Foundation of Anhui Province (01042308)
Abstract: Based on Markov performance potential theory and the neuro-dynamic programming (NDP) methodology, this paper studies the simulation optimization problem for a class of continuous-time Markov decision processes (CTMDPs) under randomized stationary policies. The proposed algorithm transforms the continuous-time process into its uniformized Markov chain and then estimates the gradient of the average-cost performance measure with respect to the policy parameters from a single sample path of that chain, in order to search for a suboptimal policy. The method is well suited to performance optimization of systems with large state spaces. A numerical example of a controlled Markov process is also given.
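As a hedged illustration of the uniformization step described in the abstract, the sketch below converts a CTMDP generator matrix into the transition matrix of its uniformized discrete-time chain and then estimates the long-run average cost from a single sample path. The two-state model, the function names, and all numbers are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def uniformize(Q, Lam=None):
    """Uniformization: given a generator matrix Q, pick a rate
    Lam >= max_i |Q[i, i]| and form the DTMC transition matrix
    P = I + Q / Lam, which has the same stationary distribution."""
    Q = np.asarray(Q, dtype=float)
    if Lam is None:
        Lam = np.max(np.abs(np.diag(Q)))
    return np.eye(Q.shape[0]) + Q / Lam

def average_cost_on_path(P, cost, n_steps=50_000, x0=0, seed=0):
    """Estimate the long-run average cost from one sample path of
    the uniformized chain (valid when the chain is ergodic)."""
    rng = np.random.default_rng(seed)
    x, total = x0, 0.0
    for _ in range(n_steps):
        total += cost[x]
        x = rng.choice(len(cost), p=P[x])
    return total / n_steps

# Hypothetical 2-state generator: leave state 0 at rate 1, state 1 at rate 2.
Q = np.array([[-1.0, 1.0],
              [2.0, -2.0]])
P = uniformize(Q)            # Lam = 2, so P = [[0.5, 0.5], [1.0, 0.0]]
cost = np.array([0.0, 3.0])  # cost rate incurred in each state
est = average_cost_on_path(P, cost)
# The stationary distribution here is (2/3, 1/3), so the true
# average cost is 1.0; the sample-path estimate should be close.
```

The uniformized chain visits states at a common rate `Lam`, which is what lets a discrete-time single-sample-path estimator stand in for the continuous-time process.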

Keywords: performance potentials; neuro-dynamic programming; simulation optimization
Received: 2002-08-08

A Simulation Optimization Algorithm for CTMDPs Based on Randomized Stationary Policies
TANG Hao, XI Hong-Sheng, YIN Bao-Qun. A Simulation Optimization Algorithm for CTMDPs Based on Randomized Stationary Policies[J]. Acta Automatica Sinica, 2004, 30(2): 229-234.
Authors: TANG Hao, XI Hong-Sheng, YIN Bao-Qun
Affiliation: 1. Department of Automation, University of Science and Technology of China, Hefei; 2. Department of Computer, Hefei University of Technology, Hefei
Abstract: Based on the theory of Markov performance potentials and the neuro-dynamic programming (NDP) methodology, we study a simulation optimization algorithm for a class of continuous-time Markov decision processes (CTMDPs) under randomized stationary policies. The proposed algorithm estimates the gradient of the average-cost performance measure with respect to the policy parameters by transforming the continuous-time Markov process into its uniformized Markov chain and simulating a single sample path of that chain, with the goal of finding a suboptimal randomized stationary policy. The algorithm can meet the needs of performance optimization for many difficult systems with large state spaces. Finally, a numerical example of a controlled Markov process is provided.
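The performance-potential gradient underlying the abstract can be sketched, under simplifying assumptions, on a tiny parameterized chain: the potentials g solve the Poisson equation (I - P)g = f - ηe, and the derivative of the average cost η = πf with respect to a policy parameter θ is π (dP/dθ) g. The two-state model, parameterization, and names below are hypothetical, chosen only so the result can be checked analytically.

```python
import numpy as np

def stationary(P):
    """Stationary distribution pi solving pi P = pi with sum(pi) = 1."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1); b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def potential_gradient(P, dP, f):
    """Potential-based sensitivity: solve (I - P + e pi^T) g = f,
    a particular solution of the Poisson equation, then return
    d eta / d theta = pi (dP/dtheta) g."""
    n = P.shape[0]
    pi = stationary(P)
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f)
    return pi @ dP @ g

# Hypothetical parameterized chain: row 0 depends on theta.
theta = 0.5
P = np.array([[theta, 1 - theta],
              [0.5, 0.5]])
dP = np.array([[1.0, -1.0],
               [0.0, 0.0]])       # elementwise dP/dtheta
f = np.array([0.0, 3.0])          # one-step cost per state
grad = potential_gradient(P, dP, f)
# Analytically eta(theta) = 3(1 - theta)/(1.5 - theta), so the
# derivative at theta = 0.5 is -1.5; grad should match.
```

In the paper's setting the same quantity is estimated from a single simulated sample path rather than computed from the full matrices, which is what makes the approach scale to large state spaces.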
Keywords: performance potentials; neuro-dynamic programming; simulation optimization
