A Simulation Optimization Algorithm for CTMDPs Based on Randomized Stationary Policies
Citation: TANG Hao, XI Hong-Sheng, YIN Bao-Qun. A Simulation Optimization Algorithm for CTMDPs Based on Randomized Stationary Policies[J]. Acta Automatica Sinica, 2004, 30(2): 229-234.
Authors: TANG Hao, XI Hong-Sheng, YIN Bao-Qun
Affiliation: 1. Department of Automation, University of Science and Technology of China, Hefei; 2. Department of Computer, Hefei University of Technology, Hefei
Funding: the National Natural Science Foundation of P.R. China (60274012) and the Natural Science Foundation of Anhui Province (01042308)
Abstract: Based on Markov performance potential theory and the neuro-dynamic programming (NDP) methodology, this paper studies the simulation optimization problem for a class of continuous-time Markov decision processes (CTMDPs) under randomized stationary policies. The proposed algorithm transforms the continuous-time process into its uniformized Markov chain and then estimates the gradient of the average-cost performance measure with respect to the policy parameters from a single sample path of that chain, in order to search for a suboptimal policy. The method is well suited to performance optimization of systems with large state spaces. A numerical example of a controlled Markov process is also given.
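As a hedged illustration of the uniformization step described in the abstract, the sketch below converts a CTMDP generator matrix into the transition matrix of its uniformized discrete-time chain and then estimates the long-run average cost from a single sample path. The two-state model, the function names, and all numbers are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def uniformize(Q, Lam=None):
    """Uniformization: given a generator matrix Q, pick a rate
    Lam >= max_i |Q[i, i]| and form the DTMC transition matrix
    P = I + Q / Lam, which has the same stationary distribution."""
    Q = np.asarray(Q, dtype=float)
    if Lam is None:
        Lam = np.max(np.abs(np.diag(Q)))
    return np.eye(Q.shape[0]) + Q / Lam

def average_cost_on_path(P, cost, n_steps=50_000, x0=0, seed=0):
    """Estimate the long-run average cost from one sample path of
    the uniformized chain (valid when the chain is ergodic)."""
    rng = np.random.default_rng(seed)
    x, total = x0, 0.0
    for _ in range(n_steps):
        total += cost[x]
        x = rng.choice(len(cost), p=P[x])
    return total / n_steps

# Hypothetical 2-state generator: leave state 0 at rate 1, state 1 at rate 2.
Q = np.array([[-1.0, 1.0],
              [2.0, -2.0]])
P = uniformize(Q)            # Lam = 2, so P = [[0.5, 0.5], [1.0, 0.0]]
cost = np.array([0.0, 3.0])  # cost rate incurred in each state
est = average_cost_on_path(P, cost)
# The stationary distribution here is (2/3, 1/3), so the true
# average cost is 1.0; the sample-path estimate should be close.
```

The uniformized chain visits states at a common rate `Lam`, which is what lets a discrete-time single-sample-path estimator stand in for the continuous-time process.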

Keywords: performance potentials; neuro-dynamic programming; simulation optimization
Received: 2002-08-08

A Simulation Optimization Algorithm for CTMDPs Based on Randomized Stationary Policies
TANG Hao, XI Hong-Sheng, YIN Bao-Qun. A Simulation Optimization Algorithm for CTMDPs Based on Randomized Stationary Policies[J]. Acta Automatica Sinica, 2004, 30(2): 229-234.
Authors: TANG Hao, XI Hong-Sheng, YIN Bao-Qun
Affiliation: 1. Department of Automation, University of Science and Technology of China, Hefei; 2. Department of Computer, Hefei University of Technology, Hefei
Abstract: Based on the theory of Markov performance potentials and the neuro-dynamic programming (NDP) methodology, we study a simulation optimization algorithm for a class of continuous-time Markov decision processes (CTMDPs) under randomized stationary policies. The proposed algorithm estimates the gradient of the average-cost performance measure with respect to the policy parameters by transforming the continuous-time Markov process into its uniformized Markov chain and simulating a single sample path of that chain, with the goal of finding a suboptimal randomized stationary policy. The algorithm can meet the needs of performance optimization for many difficult systems with large state spaces. Finally, a numerical example of a controlled Markov process is provided.
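The performance-potential gradient underlying the abstract can be sketched, under simplifying assumptions, on a tiny parameterized chain: the potentials g solve the Poisson equation (I - P)g = f - ηe, and the derivative of the average cost η = πf with respect to a policy parameter θ is π (dP/dθ) g. The two-state model, parameterization, and names below are hypothetical, chosen only so the result can be checked analytically.

```python
import numpy as np

def stationary(P):
    """Stationary distribution pi solving pi P = pi with sum(pi) = 1."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1); b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def potential_gradient(P, dP, f):
    """Potential-based sensitivity: solve (I - P + e pi^T) g = f,
    a particular solution of the Poisson equation, then return
    d eta / d theta = pi (dP/dtheta) g."""
    n = P.shape[0]
    pi = stationary(P)
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f)
    return pi @ dP @ g

# Hypothetical parameterized chain: row 0 depends on theta.
theta = 0.5
P = np.array([[theta, 1 - theta],
              [0.5, 0.5]])
dP = np.array([[1.0, -1.0],
               [0.0, 0.0]])       # elementwise dP/dtheta
f = np.array([0.0, 3.0])          # one-step cost per state
grad = potential_gradient(P, dP, f)
# Analytically eta(theta) = 3(1 - theta)/(1.5 - theta), so the
# derivative at theta = 0.5 is -1.5; grad should match.
```

In the paper's setting the same quantity is estimated from a single simulated sample path rather than computed from the full matrices, which is what makes the approach scale to large state spaces.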
Keywords: performance potentials; neuro-dynamic programming; simulation optimization
