Research on a Reinforcement Learning Algorithm Based on Neural Network

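Cite this article:	LU Xin, GAO Yang, LI Ning, CHEN Shi-Fu. Research on a Reinforcement Learning Algorithm Based on Neural Network[J]. Journal of Computer Research and Development, 2002, 39(8): 981-985.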

Authors:	LU Xin, GAO Yang, LI Ning, CHEN Shi-Fu

Affiliation:	State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093

Funding:	Supported by the National Natural Science Foundation of China (69905001)

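Abstract (translated from the Chinese):	BP neural networks are widely used in nonlinear control systems. As a supervised learning algorithm, however, BP requires batches of input-output pairs to train the network, and in systems whose optimal policy is unknown such pairs cannot be obtained in advance. Reinforcement learning, by contrast, adjusts its policy from experience gathered on the actual system; it is a process of approaching the optimal policy, and learning requires no supervisor. This paper proposes a learning algorithm that combines reinforcement learning with the BP neural network: the RBP model. Its basic idea is to learn the control policy by reinforcement learning and, after a certain learning period, to train the neural network with the knowledge acquired, so that the network gradually converges to the optimal state. Experiments verify the effectiveness and convergence of the method.
Keywords:	neural network; reinforcement learning algorithm; RBP model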

LU Xin, GAO Yang, LI Ning, and CHEN Shi-Fu. Research on a Reinforcement Learning Algorithm Based on Neural Network[J]. Journal of Computer Research and Development, 2002, 39(8): 981-985.

Authors:	LU Xin, GAO Yang, LI Ning, and CHEN Shi-Fu

Abstract:	BP neural networks are widely used in nonlinear system controllers. As a supervised training algorithm, however, BP requires input-output pairs for training, and in systems whose optimal control policy is unknown such pairs cannot be obtained in advance. Reinforcement learning (RL), by contrast, learns behavior through trial-and-error interaction with a dynamic environment; it needs no supervisor and works online. This paper proposes the RBP model, which adapts the BP network for use in RL. The main idea of RBP is that RL learns the optimal policy from the environment and stores that policy in the network; instead of being updated instantly, the network weights are updated periodically in batch mode. A simple example illustrates the validity of the algorithm.

Keywords:	reinforcement learning; BP neural network; reinforcement back-propagation model
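
The abstract gives only the outline of RBP: collect experience by reinforcement learning, then update the BP network in batch mode once per learning period. The sketch below shows one way that outline could look in code; it is not the authors' implementation. Everything concrete in it is an assumption made for illustration: the five-state chain world, the Q-learning style target, the class `TinyBP`, the function `run_rbp_sketch`, and all hyperparameters.

```python
import math
import random


class TinyBP:
    """Hypothetical one-hidden-layer network trained by batch back-propagation."""

    def __init__(self, n_in, n_hidden, n_out, rng, lr=0.1):
        self.lr = lr
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        xb = x + [1.0]  # input plus bias term
        hb = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in self.w1] + [1.0]
        y = [sum(w * v for w, v in zip(row, hb)) for row in self.w2]
        return xb, hb, y

    def predict(self, x):
        return self.forward(x)[2]

    def train_batch(self, samples, epochs=30):
        """samples: (input, action index, target); only the chosen output carries an error."""
        if not samples:
            return
        n = len(samples)
        for _ in range(epochs):
            g1 = [[0.0] * len(row) for row in self.w1]
            g2 = [[0.0] * len(row) for row in self.w2]
            for x, a, target in samples:
                xb, hb, y = self.forward(x)
                err = y[a] - target  # gradient of 0.5 * err**2 w.r.t. output a
                for j, hj in enumerate(hb):
                    g2[a][j] += err * hj
                for j in range(len(self.w1)):
                    dh = err * self.w2[a][j] * (1.0 - hb[j] ** 2)  # tanh derivative
                    for i, xi in enumerate(xb):
                        g1[j][i] += dh * xi
            for row, grad in zip(self.w2, g2):
                for j in range(len(row)):
                    row[j] -= self.lr * grad[j] / n
            for row, grad in zip(self.w1, g1):
                for i in range(len(row)):
                    row[i] -= self.lr * grad[i] / n


def one_hot(s, n):
    return [1.0 if i == s else 0.0 for i in range(n)]


def step(s, a, n):
    """Assumed chain world: action 1 moves right, action 0 moves left; the right end pays 1."""
    s2 = min(s + 1, n - 1) if a == 1 else max(s - 1, 0)
    done = s2 == n - 1
    return s2, (1.0 if done else 0.0), done


def run_rbp_sketch(n_states=5, periods=20, steps_per_period=100, gamma=0.9, eps=0.3, seed=0):
    rng = random.Random(seed)
    net = TinyBP(n_states, 6, 2, rng)
    s = rng.randrange(n_states - 1)
    for _ in range(periods):
        buffer = []  # experience gathered during one learning period
        for _ in range(steps_per_period):
            q = net.predict(one_hot(s, n_states))
            a = rng.randrange(2) if rng.random() < eps else (1 if q[1] >= q[0] else 0)
            s2, r, done = step(s, a, n_states)
            target = r if done else r + gamma * max(net.predict(one_hot(s2, n_states)))
            buffer.append((one_hot(s, n_states), a, target))
            s = rng.randrange(n_states - 1) if done else s2
        net.train_batch(buffer)  # weights change only here, once per period
    return net
```

Calling `net = run_rbp_sketch()` and then inspecting `net.predict(one_hot(s, 5))` for each state shows which action the trained network prefers. The point of the sketch is the control flow: the network is read during a period but written only at its end, which is the batch-mode update the abstract describes.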
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
