
1.

Analytical Mean Squared Error Curves for Temporal Difference Learning

Singh Satinder, Dayan Peter. Machine Learning, 1998, 32(1): 5-40


2.

Hu Guanghua, Wu Cangpu. Control Theory & Applications, 2000, 17(5): 660-664

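This paper discusses model-free average-reward reinforcement learning algorithms. By combining temporal-difference learning with R-learning, it extends several methods from discounted problems to the average-reward criterion and proposes two classes of algorithms: R(λ) learning and TTD(λ) learning. Existing R-learning can be viewed as the special case of R(λ) learning and TTD(λ) learning with λ = 0. Simulation results show that R(λ) and TTD(λ) learning with intermediate values of λ improve on existing methods in both reliability and convergence speed.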
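To make the idea of combining temporal-difference learning with R-learning concrete, here is a minimal tabular sketch that adds accumulating eligibility traces to an R-learning style update. It is an illustration under stated assumptions, not the authors' R(λ) or TTD(λ) algorithm. The function name and the `step(s, a)` environment interface are hypothetical. There is no discounting, and traces are not cut after exploratory actions.

```python
import numpy as np

def r_lambda_learning(step, n_states, n_actions, n_steps,
                      alpha=0.1, beta=0.05, lam=0.5, eps=0.1, seed=0):
    """Tabular average-reward learning with accumulating traces (illustrative sketch).

    step(s, a) -> (next_state, reward) is a hypothetical environment function.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))  # relative (average-adjusted) action values
    E = np.zeros_like(Q)                 # eligibility traces, one per state-action pair
    rho = 0.0                            # running estimate of the average reward per step
    s = 0
    for _ in range(n_steps):
        greedy = int(np.argmax(Q[s]))
        a = greedy if rng.random() >= eps else int(rng.integers(n_actions))
        s2, r = step(s, a)
        # undiscounted TD error measured relative to the average reward rho
        delta = r - rho + Q[s2].max() - Q[s, a]
        E *= lam          # decay every trace (no discount factor in this setting)
        E[s, a] += 1.0    # accumulate the trace of the current pair
        Q += alpha * delta * E
        if a == greedy:
            # R-learning style: only greedy steps adjust the average-reward estimate
            rho += beta * (r - rho + Q[s2].max() - Q[s].max())
        s = s2
    return Q, rho

# Hypothetical one-state MDP: action 0 pays 1, action 1 pays 0, so the best average reward is 1.
def toy_step(s, a):
    return 0, (1.0 if a == 0 else 0.0)

Q, rho = r_lambda_learning(toy_step, n_states=1, n_actions=2, n_steps=1000, eps=0.0)
print(rho)  # approaches 1.0
```

With `lam=0.0`, only the current pair is updated on each step, so the sketch reduces to plain R-learning. This mirrors the abstract's remark that R-learning is the λ = 0 special case.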

3.

The Role of Eligibility Traces in Reinforcement Learning

Sun Yu, Zhang Rubo, Xu Dong. Computer Engineering, 2002, 28(5): 128-129, 198

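The term "reinforcement learning" comes from behavioral psychology, which views learning as a process of trial and error. In a reinforcement learning system, eligibility traces are used to solve the temporal credit-assignment problem. This article introduces the basic principles of eligibility traces and how they are implemented.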
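The following sketch shows how an eligibility trace spreads a one-step TD error back over recently visited states, using tabular TD(λ) prediction with accumulating traces. The function name, the episode format, and the toy three-state chain are illustrative assumptions and are not taken from the article.

```python
import numpy as np

def td_lambda_episode(V, episode, alpha=0.1, gamma=1.0, lam=0.8):
    """Run one episode of online tabular TD(lambda) with accumulating traces.

    episode: list of (state, reward, next_state); next_state is None at termination.
    V is updated in place and also returned.
    """
    e = np.zeros_like(V)  # one eligibility trace per state
    for s, r, s_next in episode:
        v_next = 0.0 if s_next is None else V[s_next]
        delta = r + gamma * v_next - V[s]  # one-step TD error
        e *= gamma * lam                   # earlier visits lose credit geometrically
        e[s] += 1.0                        # the current state becomes fully eligible
        V += alpha * delta * e             # every recently visited state shares the error
    return V

# Hypothetical 3-state chain 0 -> 1 -> 2 -> end, with reward 1 only on the last step.
chain = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]
V = np.zeros(3)
for _ in range(500):
    td_lambda_episode(V, chain)
print(V)  # every state approaches 1.0 because gamma = 1
```

The trace lets the reward at the end of the chain reach state 0 within a single episode. With λ = 0, it would take several episodes to propagate back one state at a time.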

4.

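A Class of Generalized Average-Reward Reinforcement Learning Algorithms Based on Eligibility Traces (total citations: 1; self-citations: 0; citations by others: 1)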

Chen Huanwen, Xie Jianping. Computer Engineering and Applications, 2002, 38(1): 65-68

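This paper removes the unichain or communicating-MDP assumption of average-reward reinforcement learning. Building on eligibility traces and the discounted-reward SARSA(λ) algorithm, it generalizes conventional average-reward reinforcement learning, proposes a class of generalized average-reward reinforcement learning algorithms, and reports preliminary experiments comparing their performance.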

5.

Research on Reinforcement Learning Control of Nonlinear Systems

Jiang Zhiming, Wang Lihong, Duan Suolin, Lin Tingqi. Control Theory & Applications, 2000, 17(6): 899-902

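This paper studies a reinforcement learning (RL) control method that uses a CMAC neural network to control highly nonlinear systems. The research focuses on simplifying the algorithm and on learning functions with continuous outputs. The control strategy has two parts: a reinforcement learning controller and a fixed-gain conventional controller. The former learns the system's nonlinearity, and the latter stabilizes the system. Simulation results show that the proposed control strategy is effective and achieves high control accuracy.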
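For readers unfamiliar with CMAC, the sketch below shows the core mechanism in one dimension. It uses several shifted tilings, forms a prediction by summing one weight per tiling, and applies an LMS update shared across the active tiles. The class name, tiling sizes, and step size are illustrative assumptions, and the sketch does not reproduce the paper's network or its control loop.

```python
import numpy as np

class CMAC1D:
    """Minimal 1-D CMAC (tile coding) approximator; a sketch, not the paper's network."""

    def __init__(self, lo, hi, n_tilings=8, n_tiles=10, alpha=0.5):
        self.lo, self.n_tilings, self.n_tiles = lo, n_tilings, n_tiles
        self.width = (hi - lo) / n_tiles
        # one spare tile per tiling so that shifted tilings still cover [lo, hi]
        self.w = np.zeros((n_tilings, n_tiles + 1))
        self.alpha = alpha

    def active_tiles(self, x):
        """Index of the tile containing x in each (shifted) tiling."""
        tiles = []
        for t in range(self.n_tilings):
            shift = t * self.width / self.n_tilings
            i = int((x - self.lo + shift) // self.width)
            tiles.append(min(max(i, 0), self.n_tiles))
        return tiles

    def predict(self, x):
        # output = sum of one weight from each tiling
        return sum(self.w[t, i] for t, i in enumerate(self.active_tiles(x)))

    def update(self, x, target):
        """LMS step that spreads the error evenly over the active tiles."""
        err = target - self.predict(x)
        for t, i in enumerate(self.active_tiles(x)):
            self.w[t, i] += self.alpha / self.n_tilings * err

net = CMAC1D(0.0, 1.0)
for _ in range(200):
    net.update(0.3, 1.5)
print(net.predict(0.3), net.predict(0.31), net.predict(0.9))
```

Training at x = 0.3 also raises the prediction at the nearby point 0.31, because the two points share tiles. It leaves 0.9 untouched. This local generalization is what makes CMAC a cheap approximator for functions with continuous outputs.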

6.

Recent Advances in Hierarchical Reinforcement Learning (total citations: 22; self-citations: 0; citations by others: 22)

Andrew G. Barto, Sridhar Mahadevan. Discrete Event Dynamic Systems, 2003, 13(4): 341-379

Reinforcement learning is bedeviled by the curse of dimensionality: the number of parameters to be learned grows exponentially with the size of any compact encoding of a state. Recent attempts to combat the curse of dimensionality have turned to principled ways of exploiting temporal abstraction, where decisions are not required at each step, but rather invoke the execution of temporally-extended activities which follow their own policies until termination. This leads naturally to hierarchical control architectures and associated learning algorithms. We review several approaches to temporal abstraction and hierarchical organization that machine learning researchers have recently developed. Common to these approaches is a reliance on the theory of semi-Markov decision processes, which we emphasize in our review. We then discuss extensions of these ideas to concurrent activities, multiagent coordination, and hierarchical memory for addressing partial observability. Concluding remarks address open challenges facing the further development of reinforcement learning in a hierarchical setting.
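Under the semi-Markov decision process view that these approaches share, a temporally extended activity (option) that runs for k primitive steps is backed up as a single transition. The sketch below shows the resulting SMDP Q-learning update. The function name, its arguments, and the array layout are hypothetical.

```python
import numpy as np

def smdp_q_update(Q, s, o, rewards, s_next, alpha=0.1, gamma=0.9):
    """One SMDP Q-learning update after option o ran for k = len(rewards) steps."""
    k = len(rewards)
    # discounted reward accumulated while the option was executing
    R = sum(gamma ** i * r for i, r in enumerate(rewards))
    # bootstrap from the best option at the state where o terminated,
    # discounted by gamma**k because k primitive steps have elapsed
    target = R + gamma ** k * Q[s_next].max()
    Q[s, o] += alpha * (target - Q[s, o])
    return Q

Q = np.zeros((2, 2))   # rows: states, columns: options
Q[1] = [4.0, 2.0]
smdp_q_update(Q, s=0, o=0, rewards=[1.0, 1.0], s_next=1, alpha=0.5, gamma=0.5)
print(Q[0, 0])  # 1.25
```

The only change from one-step Q-learning is the γ^k factor. It lets options of different durations be compared in a single value table.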

7.

Autonomous Micro-Helicopter Control Based on Reinforcement Learning with Replacing Eligibility Traces

Yang Yujun, Cheng Junshi, Chen Jiapin, Zhang Chen, Xiao Yongli. Information and Control, 2003, 32(3): 229-233

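With the rapid development of micro-electro-mechanical systems (MEMS), autonomous micro-helicopters have become one of the research hotspots in this field. Because of their size, micro-helicopters cannot carry powerful sensors and processors, so complete information about the environment is hard to obtain. As a result, traditional model-based control methods are unsuitable for autonomous micro-helicopters operating in dynamic environments. Behavior-based control uses successive approximation and does not require an accurate model of the environment, which gives the system good stability. This paper uses reinforcement learning with replacing eligibility traces, combined with temporal-difference methods, to improve learning efficiency. Simulation experiments verify the effectiveness of the learning algorithm. Finally, the paper discusses some open problems in micro-helicopter control and directions for future improvement.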
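Replacing traces, used in this paper, differ from conventional accumulating traces when a state is revisited before its trace has decayed. The sketch below tracks one state's trace under both rules. The state labels and the decay value γλ = 0.9 are illustrative assumptions.

```python
def trace_values(visits, tracked, gamma_lam):
    """Trace of state `tracked` after each step, for accumulating and replacing traces."""
    acc, rep = 0.0, 0.0
    acc_hist, rep_hist = [], []
    for s in visits:
        acc *= gamma_lam   # both traces decay by gamma * lambda every step
        rep *= gamma_lam
        if s == tracked:
            acc += 1.0     # accumulating: add 1 on each visit (can exceed 1)
            rep = 1.0      # replacing: reset to 1 on each visit (never exceeds 1)
        acc_hist.append(acc)
        rep_hist.append(rep)
    return acc_hist, rep_hist

acc, rep = trace_values(["A", "A", "A", "B"], tracked="A", gamma_lam=0.9)
print(acc)  # about [1.0, 1.9, 2.71, 2.439]
print(rep)  # about [1.0, 1.0, 1.0, 0.9]
```

Capping the trace at 1 keeps a state that is revisited many times in a row from receiving a disproportionately large share of each TD error.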

8.

A Reinforcement Learning Algorithm Based on Recursive Least Squares and Its Application

Shen Zhipeng, Guo Chen. Computer Engineering and Applications, 2005, 41(8): 213-216

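This paper derives a recursive least-squares temporal-difference method. Compared with the usual temporal-difference method, it uses samples more efficiently, converges faster, and requires less computation. The RLS-based reinforcement learning is then applied to ship course control. It avoids a drawback of common intelligent algorithms, which need a certain amount of sample data before they can learn, and it learns and tunes the controller parameters online. This allows it to handle, to some extent, the uncertainty in ship motion. Simulation results show that course control remains satisfactory under various wind, wave, and current disturbances, indicating that the algorithm is effective and feasible.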
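The following is a minimal sketch of a recursive least-squares TD update for a linear value function. It assumes TD(0) targets (no eligibility traces), no forgetting factor, and a large initial P, and the paper's exact derivation and its ship-control setup may differ. Each step applies a Sherman-Morrison rank-one update instead of re-solving the least-squares system.

```python
import numpy as np

def rls_td0(transitions, n_features, gamma=0.9, p0=1000.0):
    """Recursive least-squares TD(0) for a linear value function V(s) = phi(s) @ theta."""
    theta = np.zeros(n_features)
    P = p0 * np.eye(n_features)   # large initial P = weak prior on theta
    for phi, r, phi_next in transitions:
        d = phi - gamma * phi_next            # TD feature difference
        P_phi = P @ phi
        k = P_phi / (1.0 + d @ P_phi)         # gain vector (Sherman-Morrison)
        theta = theta + k * (r - d @ theta)   # correct theta by the TD error
        P = P - np.outer(k, d @ P)            # rank-one update of the inverse matrix
    return theta

# Hypothetical two-state episode with one-hot features: s0 -> s1 -> end, reward 1 per step.
s0, s1, end = np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.zeros(2)
episode = [(s0, 1.0, s1), (s1, 1.0, end)]
theta = rls_td0(episode * 50, n_features=2, gamma=0.5)
print(theta)  # close to [1.5, 1.0]: V(s1) = 1, V(s0) = 1 + 0.5 * 1
```

Each update costs O(n²) in the number of features, versus O(n) for plain TD. In exchange, every sample is used in an exact least-squares sense, which is the source of the sample efficiency the abstract mentions.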

9.

Research on Optimality Criteria for Reinforcement Learning (total citations: 8; self-citations: 0; citations by others: 8)


Chen Huanwen, Xie Jianping. Computer Engineering & Science, 2001, 23(2): 62-65

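A reinforcement learning agent solves sequential decision problems by learning and planning optimal policies, so defining the optimality criterion for policies is one of the core problems in reinforcement learning research. This paper discusses a series of optimality criteria from dynamic programming. It uses examples to examine how well each criterion suits reinforcement learning and what its strengths and weaknesses are. It also argues that reinforcement learning algorithms must be designed specifically for each criterion.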
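The choice of criterion can change which policy counts as better. In the hypothetical example below (not one of the paper's examples), the discounted criterion and the average-reward criterion rank two reward streams in opposite order. A long finite horizon stands in for an infinite stream.

```python
def discounted_return(rewards, gamma):
    # sum of gamma**t * r_t over the stream
    return sum(gamma ** t * r for t, r in enumerate(rewards))

def average_reward(rewards):
    # long-run reward per step, approximated over the finite horizon
    return sum(rewards) / len(rewards)

H = 1000  # finite horizon standing in for an infinite reward stream
policy_a = [10.0] + [0.0] * (H - 1)  # large reward once, then nothing
policy_b = [1.0] * H                 # small reward on every step

print(discounted_return(policy_a, 0.5), discounted_return(policy_b, 0.5))  # 10.0 vs about 2.0
print(average_reward(policy_a), average_reward(policy_b))                  # 0.01 vs 1.0
```

A heavily discounting agent prefers the early windfall, whereas an average-reward agent prefers the steady stream.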

10.

Kernel-Based Reinforcement Learning (total citations: 5; self-citations: 0; citations by others: 5)

Ormoneit Dirk, Sen Śaunak. Machine Learning, 2002, 49(2-3): 161-178

We present a kernel-based approach to reinforcement learning that overcomes the stability problems of temporal-difference learning in continuous state-spaces. First, our algorithm converges to a unique solution of an approximate Bellman's equation regardless of its initialization values. Second, the method is consistent in the sense that the resulting policy converges asymptotically to the optimal policy. Parametric value function estimates such as neural networks do not possess this property. Our kernel-based approach also allows us to show that the limiting distribution of the value function estimate is a Gaussian process. This information is useful in studying the bias-variance tradeoff in reinforcement learning. We find that all reinforcement learning approaches to estimating the value function, parametric or non-parametric, are subject to a bias. This bias is typically larger in reinforcement learning than in a comparable regression problem.
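The following is a simplified sketch of kernel-based approximate value iteration in the spirit of the abstract. It assumes one-dimensional states, a Gaussian kernel normalized separately for each action, and values computed only at the sampled successor states. The function names, bandwidth, and toy data are illustrative. Each row of normalized weights sums to 1, so each sweep is a γ-contraction and the iteration reaches the same values from any starting point.

```python
import numpy as np

def kernel_weights(centers, queries, bandwidth):
    """Normalized Gaussian weights: row j holds the weight of every center for queries[j]."""
    d2 = (queries[:, None] - centers[None, :]) ** 2
    k = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return k / k.sum(axis=1, keepdims=True)

def kbrl_values(samples, gamma=0.9, bandwidth=0.1, n_iter=300, v_init=0.0):
    """Approximate value iteration on the sampled successor states.

    samples: one (X, R, Y) tuple of 1-D arrays per action, holding sampled
    states, rewards, and successor states for that action.
    """
    Y_all = np.concatenate([Y for _, _, Y in samples])
    bounds = np.cumsum([0] + [len(Y) for _, _, Y in samples])
    # W[a][j, i]: weight of transition i of action a when evaluating at Y_all[j]
    W = [kernel_weights(X, Y_all, bandwidth) for X, _, _ in samples]
    V = np.full(len(Y_all), float(v_init))
    for _ in range(n_iter):
        # kernel-weighted Bellman backup for each action, then maximize over actions
        Q = [W[a] @ (R + gamma * V[bounds[a]:bounds[a + 1]])
             for a, (_, R, _) in enumerate(samples)]
        V = np.max(Q, axis=0)
    return Y_all, V

rng = np.random.default_rng(0)
X0, X1 = rng.uniform(0, 1, 30), rng.uniform(0, 1, 30)
samples = [(X0, np.ones(30), np.clip(X0 + 0.1, 0, 1)),    # action 0 pays 1
           (X1, np.zeros(30), np.clip(X1 - 0.1, 0, 1))]   # action 1 pays 0
_, V_from_zero = kbrl_values(samples, v_init=0.0)
_, V_from_100 = kbrl_values(samples, v_init=100.0)
print(np.abs(V_from_zero - V_from_100).max())  # ~0: same fixed point from either start
```

In this toy problem, the best action always pays 1, so both runs settle at 1 / (1 − 0.9) = 10. This illustrates the initialization-independent convergence that the abstract claims.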