A survey of direct policy search methods in reinforcement learning


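Cite this article:	WANG Xue-ning, CHEN Wei, ZHANG Meng, XU Xin, HE Han-gen. A survey of direct policy search methods in reinforcement learning[J]. CAAI Transactions on Intelligent Systems, 2007, 2(1): 16-24.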

Authors:	WANG Xue-ning, CHEN Wei, ZHANG Meng, XU Xin, HE Han-gen

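Affiliation:	1. School of Electromechanical Engineering and Automation, National University of Defense Technology, Changsha, Hunan 410073, China; 2. Qinghe Building Zi 9, Beijing 100085, China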

Funding:	Supported by the National Natural Science Foundation of China (60234030, 60303012).

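Abstract:	This paper briefly introduces the policy search algorithms used in reinforcement learning and establishes a theoretical framework for policy gradient methods. Guided by this framework, several existing policy gradient algorithms are generalized, and several recently proposed methods for improving the convergence speed of policy gradient algorithms are discussed. Recent advances in non-policy-gradient search algorithms are also introduced, and directions for further research are outlined.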
Keywords:	reinforcement learning; policy search; policy gradient
Article ID:	1673-4785(2007)01-0016-09
Revision date:	2006-07-07
A survey of direct policy search methods in reinforcement learning

WANG Xue-ning, CHEN Wei, ZHANG Meng, XU Xin, HE Han-gen. A survey of direct policy search methods in reinforcement learning[J]. CAAI Transactions on Intelligent Systems, 2007, 2(1): 16-24.

Authors:	WANG Xue-ning, CHEN Wei, ZHANG Meng, XU Xin, HE Han-gen

Affiliation:	1. School of Electromechanical Engineering and Automation, National University of Defense Technology, Changsha 410073, China; 2. Qinghe Building Zi 9, Beijing 100085, China

Abstract:	Direct policy search methods in reinforcement learning are described, and a theoretical framework for policy gradient methods is presented. Based on this framework, several existing policy gradient algorithms are generalized. Recent methods for speeding up the convergence of policy gradient algorithms are discussed, and recent search methods that do not rely on the policy gradient are described. Finally, some directions for future research are given.

Keywords:	reinforcement learning; policy search; policy gradient
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
