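Research on a Reinforcement Learning Algorithm Based on Neural Networks
Cite this article: LU Xin, GAO Yang, LI Ning, CHEN Shi-Fu. Research on a reinforcement learning algorithm based on neural networks [J]. Journal of Computer Research and Development, 2002, 39(8): 981-985.
Authors: LU Xin  GAO Yang  LI Ning  CHEN Shi-Fu
Affiliation: State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093
Funding: Supported by the National Natural Science Foundation of China (69905001)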
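Abstract: BP neural networks are widely used in nonlinear control systems. As a supervised learning algorithm, however, BP requires input-output pairs to be supplied in batches to train the network, and in systems whose optimal policy is unknown such pairs cannot be obtained in advance. Reinforcement learning, by contrast, adjusts its policy from experience gathered on the actual system; it is a process of approximating the optimal policy and needs no supervision from a teacher. This paper proposes the RBP model, a learning algorithm that combines reinforcement learning with the BP neural network. The basic idea of the model is to learn the control policy through reinforcement learning and, after a certain period of learning, to train the neural network with the knowledge acquired, so that the network gradually converges to the optimal state. Experiments verify the effectiveness and convergence of the method.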

Keywords: neural network  reinforcement learning algorithm  RBP model

RESEARCH ON A REINFORCEMENT LEARNING ALGORITHM BASED ON NEURAL NETWORK
LU Xin, GAO Yang, LI Ning, CHEN Shi-Fu. RESEARCH ON A REINFORCEMENT LEARNING ALGORITHM BASED ON NEURAL NETWORK [J]. Journal of Computer Research and Development, 2002, 39(8): 981-985.
Authors: LU Xin  GAO Yang  LI Ning  CHEN Shi-Fu
Abstract: BP neural networks have been widely used as controllers for nonlinear systems. As a supervised training algorithm, however, BP requires input-output pairs for training, and in systems whose optimal control policy is unknown such pairs cannot be obtained in advance. Reinforcement learning (RL), on the other hand, learns behavior through trial-and-error interaction with a dynamic environment; it needs no supervisor and works on-line. This paper presents the RBP model, which adapts the BP network for use in RL. The main idea of RBP is that RL learns the optimal policy from the environment and stores the policy in the network. Instead of being updated instantly, the network weights are updated periodically in batch mode. A simple example illustrates the validity of the algorithm.
Keywords:reinforcement learning  BP neural network  reinforcement back-propagation model
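
The abstract describes the RBP idea only in outline: RL learns from the environment, and the network weights are updated periodically in batch mode rather than after every step. The following is a minimal sketch of that loop under stated assumptions, not the authors' implementation. The abstract does not name the RL method, so tabular Q-learning is swapped in here; the 5-state chain world, the constants, and all names (`run_rbp`, `BPNetwork`, `q_learning_episode`, `one_hot`) are hypothetical.

```python
import math
import random

# Hypothetical 5-state chain world: action 0 = left, 1 = right, goal at state 4.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4
GAMMA, ALPHA, EPSILON = 0.9, 0.5, 0.2  # assumed constants, not from the paper


def step(state, action):
    """Move along the chain; reward 1 only when the goal is reached."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def q_learning_episode(q, rng):
    """One episode of tabular Q-learning (an assumed stand-in for the RL part)."""
    state, done = 0, False
    while not done:
        if rng.random() < EPSILON:
            action = rng.randrange(N_ACTIONS)
        else:  # greedy choice, breaking ties at random
            best = max(q[state])
            action = rng.choice([a for a in range(N_ACTIONS) if q[state][a] == best])
        nxt, reward, done = step(state, action)
        target = reward if done else reward + GAMMA * max(q[nxt])
        q[state][action] += ALPHA * (target - q[state][action])
        state = nxt


def one_hot(state):
    return [1.0 if i == state else 0.0 for i in range(N_STATES)]


class BPNetwork:
    """One-hidden-layer network (tanh hidden, linear output) trained by batch BP."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = [sum(w * hj for w, hj in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        return h, y

    def train_batch(self, inputs, targets, lr=0.1, epochs=200):
        """Sum squared-error gradients over the batch, then update once per epoch."""
        n = len(inputs)
        for _ in range(epochs):
            g_w1 = [[0.0] * len(row) for row in self.w1]
            g_w2 = [[0.0] * len(row) for row in self.w2]
            g_b1, g_b2 = [0.0] * len(self.b1), [0.0] * len(self.b2)
            for x, t in zip(inputs, targets):
                h, y = self.forward(x)
                d_out = [yk - tk for yk, tk in zip(y, t)]
                d_hid = [(1.0 - hj * hj) * sum(dk * self.w2[k][j] for k, dk in enumerate(d_out))
                         for j, hj in enumerate(h)]
                for k, dk in enumerate(d_out):
                    g_b2[k] += dk
                    for j, hj in enumerate(h):
                        g_w2[k][j] += dk * hj
                for j, dj in enumerate(d_hid):
                    g_b1[j] += dj
                    for i, xi in enumerate(x):
                        g_w1[j][i] += dj * xi
            for k, row in enumerate(self.w2):
                self.b2[k] -= lr * g_b2[k] / n
                for j in range(len(row)):
                    row[j] -= lr * g_w2[k][j] / n
            for j, row in enumerate(self.w1):
                self.b1[j] -= lr * g_b1[j] / n
                for i in range(len(row)):
                    row[i] -= lr * g_w1[j][i] / n


def run_rbp(periods=10, episodes_per_period=20, seed=0):
    """Alternate a period of RL with one batch training pass of the network."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    net = BPNetwork(N_STATES, 6, N_ACTIONS, rng)
    states = list(range(GOAL))  # non-terminal states only
    for _ in range(periods):
        for _ in range(episodes_per_period):
            q_learning_episode(q, rng)
        # Periodic batch update: fit the network to what RL has learned so far.
        net.train_batch([one_hot(s) for s in states], [list(q[s]) for s in states])
    return q, net
```

Calling `q, net = run_rbp()` returns the learned value table and the trained network; `net.forward(one_hot(0))[1]` then gives the network's value estimates for the two actions in state 0. Deferring the weight updates to the end of each period means the network is always fitted to a batch of targets, which is the property the abstract emphasizes.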
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.