Multi-Step Temporal Difference Learning Algorithm Based on the Recursive Least-Squares Method
Cite this article: CHEN Xue-song, YANG Yi-min. Multi-step temporal difference learning algorithm based on recursive least-squares method[J]. Computer Engineering and Applications, 2010, 46(8): 52-55.
Authors: CHEN Xue-song  YANG Yi-min
Affiliations: 1. Faculty of Applied Mathematics, Guangdong University of Technology, Guangzhou 510006, China
2. Faculty of Automation, Guangdong University of Technology, Guangzhou 510006, China
Abstract: Reinforcement learning is an important machine learning method. To speed up convergence and reduce the value-function estimation error in the learning process, a multi-step temporal difference learning algorithm based on the recursive least-squares method, RLS-TD(λ), is proposed. It is proved that, under certain conditions, the algorithm's weights converge with probability 1 to a unique solution, and a relation that the value-function estimation error must satisfy is derived and proved. A maze experiment shows that the algorithm converges faster than RLS-TD(0) and, compared with the conventional TD(λ) algorithm, reduces the value-function estimation error and thus improves precision.

Keywords: reinforcement learning  temporal difference  least squares  convergence  RLS-TD(λ) algorithm
Received: 2009-09-22
Revised: 2009-11-18

Multi-step temporal difference learning algorithm based on recursive least-squares method
CHEN Xue-song, YANG Yi-min. Multi-step temporal difference learning algorithm based on recursive least-squares method[J]. Computer Engineering and Applications, 2010, 46(8): 52-55.
Authors:CHEN Xue-song  YANG Yi-min
Affiliation: 1. Faculty of Applied Mathematics, Guangdong University of Technology, Guangzhou 510006, China; 2. Faculty of Automation, Guangdong University of Technology, Guangzhou 510006, China
Abstract: Reinforcement learning is one of the most important machine learning methods. To address the slow convergence speed and the value-function estimation error in reinforcement learning systems, a multi-step Temporal Difference (TD(λ)) learning algorithm using the Recursive Least-Squares (RLS) method, RLS-TD(λ), is proposed. The proposed algorithm is based on RLS-TD(0); its convergence is proved, and a formula for its estimation error is derived. An experiment on a maze problem demonstrates that the algorithm speeds up convergence of the learning process compared with RLS-TD(0), and improves learning precision compared with TD(λ).
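The record does not reproduce the paper's update equations. As a rough illustration of the kind of algorithm described, the sketch below follows the standard RLS-TD(λ) recursions known from the literature: an eligibility trace z, a gain vector K, and a rank-one (Sherman-Morrison) update of an inverse-correlation matrix P in place of a scalar step size. All names (`rls_td_lambda`, `delta`, the episode format) are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def rls_td_lambda(episodes, n_features, lam=0.9, gamma=1.0, delta=1.0):
    """Sketch of RLS-TD(lambda) policy evaluation with linear features.

    episodes: list of episodes; each episode is a list of transitions
        (phi_t, reward, phi_next), where phi_next is the zero vector at
        a terminal state.
    delta: initial scale of P = delta * I (an assumed regularizer).
    Returns the weight vector theta with V(s) ~= phi(s) . theta.
    """
    theta = np.zeros(n_features)
    P = delta * np.eye(n_features)        # running inverse-correlation matrix
    for episode in episodes:
        z = np.zeros(n_features)          # eligibility trace, reset per episode
        for phi, r, phi_next in episode:
            z = gamma * lam * z + phi     # accumulate trace
            d = phi - gamma * phi_next    # temporal-difference feature vector
            K = P @ z / (1.0 + d @ P @ z) # RLS gain
            theta = theta + K * (r - d @ theta)
            P = P - np.outer(K, d @ P)    # Sherman-Morrison rank-one update
    return theta
```

As a usage example, on a deterministic two-state chain (state 0 → state 1 with reward 0, state 1 → terminal with reward 1, γ = 1) with tabular features, repeated episodes drive theta toward the true values V = [1, 1], with only the initial regularization delta keeping it slightly below.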
Keywords: reinforcement learning  temporal difference  Recursive Least-Squares (RLS)  convergence
This article is indexed by CNKI, VIP, Wanfang Data, and other databases.