Performance Potential-based Neuro-dynamic Programming for SMDPs

Citation: TANG Hao, YUAN Ji-Bin, LU Yang, CHENG Wen-Juan. Performance potential-based neuro-dynamic programming for SMDPs[J]. Acta Automatica Sinica, 2005, 31(4): 642-645.
Authors: TANG Hao, YUAN Ji-Bin, LU Yang, CHENG Wen-Juan
Affiliation: School of Computer and Information, Hefei University of Technology, Hefei 230009, China
Funding: Supported by the National Natural Science Foundation of P. R. China (60404009, 60175011), the Natural Science Foundation of Anhui Province (050420303), and the Sustentation Project of Hefei University of Technology for Science and Technology-Innovation Groups
Abstract: An α-uniformized Markov chain is defined, via the concept of an equivalent infinitesimal generator, for a semi-Markov decision process (SMDP) under both the average and discounted criteria. Using the relations between the performance measures and performance potentials of the two processes, an SMDP can be optimized by simulating this chain. For the critic model of neuro-dynamic programming (NDP), a neuro-policy iteration (NPI) algorithm is presented, and a bound on the performance error is derived when each iteration step introduces both an approximation error and an improvement error. The results extend to general Markov systems and are widely applicable. Finally, a numerical example is provided.

Keywords: Semi-Markov decision processes, performance potentials, neuro-dynamic programming
Received: January 18, 2004
Revised: January 18, 2004
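The abstract names two technical ingredients: the α-uniformized chain built from an equivalent infinitesimal generator, and a per-iteration error bound for NPI. The following is a hedged sketch of the standard constructions behind these ideas, under assumed notation (embedded transition matrix P, mean sojourn times τ, cost rate f, average cost η, potentials g); the paper's exact definitions may differ.

% Equivalent infinitesimal generator of an SMDP (assumed notation):
A(i,j) = \frac{P(i,j) - \delta_{ij}}{\tau(i)}, \qquad
P_\alpha = I + \frac{A}{\alpha}, \qquad
\alpha \ge \max_i \frac{1 - P(i,i)}{\tau(i)}.
% If g solves the continuous-time Poisson equation A g = -(f - \eta e),
% the same g solves the discrete Poisson equation of the uniformized chain
% with per-step cost f/\alpha and average cost \eta/\alpha:
(I - P_\alpha)\, g = \frac{1}{\alpha}\,(f - \eta e),
% so the SMDP's potentials, and hence its optimization, can be obtained by
% simulating the \alpha-uniformized Markov chain.
% For comparison, the classic approximate-policy-iteration bound for
% discounted MDPs (Bertsekas & Tsitsiklis, 1996), with per-iteration
% evaluation error \epsilon, improvement error \delta, and discount
% factor \beta, has the form
\limsup_{k\to\infty} \|J^{\mu_k} - J^{*}\|_\infty
  \le \frac{\delta + 2\beta\,\epsilon}{(1-\beta)^{2}};
% the paper derives an analogous bound for the SMDP setting.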

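To make the NPI procedure concrete, here is a self-contained Python sketch of policy iteration on the α-uniformized chain of a small, invented SMDP: a tabular TD(0) critic (a stand-in for the paper's neural-network critic) estimates potentials by simulation, and a greedy actor step improves the policy. All model data, step sizes, and names are illustrative assumptions, not taken from the paper.

# Hedged sketch of neuro-policy iteration (NPI) on an alpha-uniformized
# chain. The SMDP below (embedded transitions P, mean sojourn times tau,
# cost rates f) is invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
nS, nA = 4, 2
P = rng.dirichlet(np.ones(nS), size=(nA, nS))   # P[a][i] = embedded row
tau = rng.uniform(0.5, 2.0, size=(nA, nS))      # mean sojourn time tau(i, a)
f = rng.uniform(0.0, 1.0, size=(nA, nS))        # cost rate f(i, a)
alpha = 4.0                                     # >= max_i (1 - P(i,i)) / tau(i)

def uniformized(a_of_s):
    """P_alpha = I + A/alpha, with A(i,j) = (P(i,j) - delta_ij)/tau(i)."""
    Pp = P[a_of_s, np.arange(nS)]
    A = (Pp - np.eye(nS)) / tau[a_of_s, np.arange(nS)][:, None]
    return np.eye(nS) + A / alpha

policy = np.zeros(nS, dtype=int)
for it in range(5):
    Pa = uniformized(policy)
    cost = f[policy, np.arange(nS)] / alpha     # per-step cost of the chain
    # Critic: simulate the chain; TD(0) estimates of potentials g and
    # average per-step cost eta (the tabular stand-in for the neuro critic).
    g, eta, s = np.zeros(nS), 0.0, 0
    for _ in range(20000):
        s2 = rng.choice(nS, p=Pa[s])
        td = cost[s] - eta + g[s2] - g[s]       # average-cost TD(0) error
        g[s] += 0.05 * td
        eta += 0.001 * td
        s = s2
    # Actor: greedy improvement from one-step Q-factors built on g.
    Q = np.stack([f[a] / alpha + uniformized(np.full(nS, a)) @ g
                  for a in range(nA)], axis=1)
    policy = Q.argmin(axis=1)                   # costs are minimized
    print(f"iteration {it}: avg SMDP cost ~ {eta * alpha:.4f}, policy {policy}")

Since the uniformized chain's average per-step cost is η/α, the printout rescales the critic's estimate by α to recover the SMDP's average cost rate.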
This article is indexed in the CNKI, VIP (Weipu), and Wanfang Data databases, among others.