Performance Potential-based Neuro-dynamic Programming for SMDPs

Citation: TANG Hao, YUAN Ji-Bin, LU Yang, CHENG Wen-Juan. Performance potential-based neuro-dynamic programming for SMDPs[J]. Acta Automatica Sinica, 2005, 31(4): 642-645.
Authors: TANG Hao, YUAN Ji-Bin, LU Yang, CHENG Wen-Juan
Affiliation: School of Computer and Information, Hefei University of Technology, Hefei 230009, China
Funding: Supported by the National Natural Science Foundation of P. R. China (60404009, 60175011), the Natural Science Foundation of Anhui Province (050420303), and the Sustentation Project of Hefei University of Technology for Science and Technology-Innovation Groups
Abstract: An α-uniformized Markov chain is defined, via the concept of an equivalent infinitesimal generator, for a semi-Markov decision process (SMDP) under both the average and discounted criteria. Using the relations between the performance measures and performance potentials of the two processes, an SMDP can be optimized by simulating this chain. For the critic model of neuro-dynamic programming (NDP), a neuro-policy iteration (NPI) algorithm is presented, and a bound on the performance error is derived when each iteration step introduces both an approximation error and an improvement error. The results extend to general Markov systems and are widely applicable. Finally, a numerical example is provided.

Keywords: Semi-Markov decision processes, performance potentials, neuro-dynamic programming
Received: January 18, 2004
Revised: January 18, 2004
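The abstract names two technical ingredients: the α-uniformized chain built from an equivalent infinitesimal generator, and a per-iteration error bound for NPI. The following is a hedged sketch of the standard constructions behind these ideas, under assumed notation (embedded transition matrix P, mean sojourn times τ, cost rate f, average cost η, potentials g); the paper's exact definitions may differ.

% Equivalent infinitesimal generator of an SMDP (assumed notation):
A(i,j) = \frac{P(i,j) - \delta_{ij}}{\tau(i)}, \qquad
P_\alpha = I + \frac{A}{\alpha}, \qquad
\alpha \ge \max_i \frac{1 - P(i,i)}{\tau(i)}.
% If g solves the continuous-time Poisson equation A g = -(f - \eta e),
% the same g solves the discrete Poisson equation of the uniformized chain
% with per-step cost f/\alpha and average cost \eta/\alpha:
(I - P_\alpha)\, g = \frac{1}{\alpha}\,(f - \eta e),
% so the SMDP's potentials, and hence its optimization, can be obtained by
% simulating the \alpha-uniformized Markov chain.
% For comparison, the classic approximate-policy-iteration bound for
% discounted MDPs (Bertsekas & Tsitsiklis, 1996), with per-iteration
% evaluation error \epsilon, improvement error \delta, and discount
% factor \beta, has the form
\limsup_{k\to\infty} \|J^{\mu_k} - J^{*}\|_\infty
  \le \frac{\delta + 2\beta\,\epsilon}{(1-\beta)^{2}};
% the paper derives an analogous bound for the SMDP setting.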

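To make the NPI procedure concrete, here is a self-contained Python sketch of policy iteration on the α-uniformized chain of a small, invented SMDP: a tabular TD(0) critic (a stand-in for the paper's neural-network critic) estimates potentials by simulation, and a greedy actor step improves the policy. All model data, step sizes, and names are illustrative assumptions, not taken from the paper.

# Hedged sketch of neuro-policy iteration (NPI) on an alpha-uniformized
# chain. The SMDP below (embedded transitions P, mean sojourn times tau,
# cost rates f) is invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
nS, nA = 4, 2
P = rng.dirichlet(np.ones(nS), size=(nA, nS))   # P[a][i] = embedded row
tau = rng.uniform(0.5, 2.0, size=(nA, nS))      # mean sojourn time tau(i, a)
f = rng.uniform(0.0, 1.0, size=(nA, nS))        # cost rate f(i, a)
alpha = 4.0                                     # >= max_i (1 - P(i,i)) / tau(i)

def uniformized(a_of_s):
    """P_alpha = I + A/alpha, with A(i,j) = (P(i,j) - delta_ij)/tau(i)."""
    Pp = P[a_of_s, np.arange(nS)]
    A = (Pp - np.eye(nS)) / tau[a_of_s, np.arange(nS)][:, None]
    return np.eye(nS) + A / alpha

policy = np.zeros(nS, dtype=int)
for it in range(5):
    Pa = uniformized(policy)
    cost = f[policy, np.arange(nS)] / alpha     # per-step cost of the chain
    # Critic: simulate the chain; TD(0) estimates of potentials g and
    # average per-step cost eta (the tabular stand-in for the neuro critic).
    g, eta, s = np.zeros(nS), 0.0, 0
    for _ in range(20000):
        s2 = rng.choice(nS, p=Pa[s])
        td = cost[s] - eta + g[s2] - g[s]       # average-cost TD(0) error
        g[s] += 0.05 * td
        eta += 0.001 * td
        s = s2
    # Actor: greedy improvement from one-step Q-factors built on g.
    Q = np.stack([f[a] / alpha + uniformized(np.full(nS, a)) @ g
                  for a in range(nA)], axis=1)
    policy = Q.argmin(axis=1)                   # costs are minimized
    print(f"iteration {it}: avg SMDP cost ~ {eta * alpha:.4f}, policy {policy}")

Since the uniformized chain's average per-step cost is η/α, the printout rescales the critic's estimate by α to recover the SMDP's average cost rate.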
This article is indexed in the CNKI, VIP (Weipu), and Wanfang Data databases, among others.