共查询到10条相似文献,搜索用时 55 毫秒
1.
2.
平均报酬模型的多步强化学习算法 总被引:3,自引:0,他引:3
讨论模型未知的平均报酬强化学习算法。通过结合即时差分学习与R学习算法,将折扣问题中的一些方法推广到了平均准则问题中,提出了两类算法:R(λ)学习。现有的R学习可视为R(λ)学习和TTD(λ)学习当λ=0时的一个特例。仿真结果表明,λ取中间值的R(λ)和TTD(λ)学习比现有的方法在可靠性与收敛速度上均有提高。 相似文献
3.
4.
一类基于有效跟踪的广义平均奖赏激励学习算法 总被引:1,自引:0,他引:1
取消了平均奖赏激励学习的单链或互通MDPs假设,基于有效跟踪技术和折扣奖赏型SARSA(λ)算法,对传统的平均奖赏激励学习进行了推广,提出了一类广义平均奖赏激励学习算法,并对算法的性能进行了初步的比较实验。 相似文献
5.
6.
Recent Advances in Hierarchical Reinforcement Learning 总被引:22,自引:0,他引:22
Reinforcement learning is bedeviled by the curse of dimensionality: the number of parameters to be learned grows exponentially with the size of any compact encoding of a state. Recent attempts to combat the curse of dimensionality have turned to principled ways of exploiting temporal abstraction, where decisions are not required at each step, but rather invoke the execution of temporally-extended activities which follow their own policies until termination. This leads naturally to hierarchical control architectures and associated learning algorithms. We review several approaches to temporal abstraction and hierarchical organization that machine learning researchers have recently developed. Common to these approaches is a reliance on the theory of semi-Markov decision processes, which we emphasize in our review. We then discuss extensions of these ideas to concurrent activities, multiagent coordination, and hierarchical memory for addressing partial observability. Concluding remarks address open challenges facing the further development of reinforcement learning in a hierarchical setting. 相似文献
7.
随着微电子机械系统(MEMS)的迅猛发展,自主式微直升机的研究也已成为这一领域内的研究热点之一.由于微直升机尺寸的限制,不能安装功能很强的传感器和处理器,难以获得完全的环境信息,所以传统的基于模型的控制方法不适用于环境是动态的自主微直升机控制.基于行为的控制方法采用累次逼近的方法,不需要环境的精确模型,因此系统的稳定性较好.本文采用基于替代传导径迹的增强式学习,结合即时差分方法,提高其学习效率,仿真实验验证了该学习算法的有效性.最后,本文介绍了微直升机控制中存在的一些问题和
我们以后的改进方向. 相似文献
8.
文章推导了递归最小二乘瞬时差分法,较通常的瞬时差分法有样本使用效率高,收敛速度快,计算量少等特点。并将基于递归最小二乘的强化学习应用于船舶航向控制,克服了通常智能算法的学习需要一定数量样本数据的缺陷,对控制器的参数进行在线学习与调整,可以在一定程度上解决船舶运动中的不确定性问题,仿真结果表明,在有各种分浪流干扰的条件下,船舶航向的控制仍能取得令人满意的效果,说明该算法是有效可行的。 相似文献
9.
激励学习智能体通过最优策略的学习与规划来求解序贯决策问题,因此如何定义策略的最优判所是激励学习研究的核心问题之一,本文讨论了一系列来自动态规划的最优判据,通过实例检验了各种判据对激励学习的适用性和优缺点,分析了设计各种判据的激励学习算法的必要性。 相似文献
10.
Kernel-Based Reinforcement Learning 总被引:5,自引:0,他引:5
We present a kernel-based approach to reinforcement learning that overcomes the stability problems of temporal-difference learning in continuous state-spaces. First, our algorithm converges to a unique solution of an approximate Bellman's equation regardless of its initialization values. Second, the method is consistent in the sense that the resulting policy converges asymptotically to the optimal policy. Parametric value function estimates such as neural networks do not possess this property. Our kernel-based approach also allows us to show that the limiting distribution of the value function estimate is a Gaussian process. This information is useful in studying the bias-variance tradeoff in reinforcement learning. We find that all reinforcement learning approaches to estimating the value function, parametric or non-parametric, are subject to a bias. This bias is typically larger in reinforcement learning than in a comparable regression problem. 相似文献