首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到10条相似文献,搜索用时 187 毫秒
1.
张峻伟  吕帅  张正昊  于佳玉  龚晓宇 《软件学报》2022,33(11):4217-4238
深度强化学习将深度学习的表示能力和强化学习的决策能力结合,因在复杂控制任务中效果显著而掀起研究热潮.以是否用Bellman方程为基准,将无模型深度强化学习方法分为Q值函数方法和策略梯度方法,并从模型构建方式、优化历程和方法评估等方面对两类方法分别进行了介绍.针对深度强化学习方法中样本效率低的问题进行讨论,根据两类方法的模型特性,说明了Q值函数方法过高估计问题和策略梯度方法采样无偏性约束分别是两类方法样本效率受限的主要原因.从增强探索效率和提高样本利用率两个角度,根据近年来的研究热点和趋势归纳出各类可行的优化方法,分析相关方法的优势和仍存在的问题,并对比其适用范围和优化效果.最后提出增强样本效率优化方法的通用性、探究两类方法间优化机制的迁移和提高理论完备性作为未来的研究方向.  相似文献   

2.
On one hand, multiple object detection approaches of Hough transform (HT) type and randomized HT type have been extended into an evidence accumulation featured general framework for problem solving, with five key mechanisms elaborated and several extensions of HT and RHT presented. On the other hand, another framework is proposed to integrate typical multi-learner based approaches for problem solving, particularly on Gaussian mixture based data clustering and local subspace learning, multi-sets mixture based object detection and motion estimation, and multi-agent coordinated problem solving. Typical learning algorithms, especially those that base on rival penalized competitive learning (RPCL) and Bayesian Ying-Yang (BYY) learning, are summarized from a unified perspective with new extensions. Furthermore, the two different frameworks are not only examined with one viewed crossly from a perspective of the other, with new insights and extensions, but also further unified into a general problem solving paradigm that consists of five basic mechanisms in terms of acquisition, allocation, amalgamation, admission, and affirmation, or shortly A5 paradigm.  相似文献   

3.
多Agent深度强化学习综述   总被引:10,自引:4,他引:6  
近年来, 深度强化学习(Deep reinforcement learning, DRL)在诸多复杂序贯决策问题中取得巨大突破.由于融合了深度学习强大的表征能力和强化学习有效的策略搜索能力, 深度强化学习已经成为实现人工智能颇有前景的学习范式.然而, 深度强化学习在多Agent系统的研究与应用中, 仍存在诸多困难和挑战, 以StarCraft Ⅱ为代表的部分观测环境下的多Agent学习仍然很难达到理想效果.本文简要介绍了深度Q网络、深度策略梯度算法等为代表的深度强化学习算法和相关技术.同时, 从多Agent深度强化学习中通信过程的角度对现有的多Agent深度强化学习算法进行归纳, 将其归纳为全通信集中决策、全通信自主决策、欠通信自主决策3种主流形式.从训练架构、样本增强、鲁棒性以及对手建模等方面探讨了多Agent深度强化学习中的一些关键问题, 并分析了多Agent深度强化学习的研究热点和发展前景.  相似文献   

4.

Deep learning techniques have shown success in learning from raw high-dimensional data in various applications. While deep reinforcement learning is recently gaining popularity as a method to train intelligent agents, utilizing deep learning in imitation learning has been scarcely explored. Imitation learning can be an efficient method to teach intelligent agents by providing a set of demonstrations to learn from. However, generalizing to situations that are not represented in the demonstrations can be challenging, especially in 3D environments. In this paper, we propose a deep imitation learning method to learn navigation tasks from demonstrations in a 3D environment. The supervised policy is refined using active learning in order to generalize to unseen situations. This approach is compared to two popular deep reinforcement learning techniques: deep-Q-networks and Asynchronous actor-critic (A3C). The proposed method as well as the reinforcement learning methods employ deep convolutional neural networks and learn directly from raw visual input. Methods for combining learning from demonstrations and experience are also investigated. This combination aims to join the generalization ability of learning by experience with the efficiency of learning by imitation. The proposed methods are evaluated on 4 navigation tasks in a 3D simulated environment. Navigation tasks are a typical problem that is relevant to many real applications. They pose the challenge of requiring demonstrations of long trajectories to reach the target and only providing delayed rewards (usually terminal) to the agent. The experiments show that the proposed method can successfully learn navigation tasks from raw visual input while learning from experience methods fail to learn an effective policy. Moreover, it is shown that active learning can significantly improve the performance of the initially learned policy using a small number of active samples.

  相似文献   

5.
Mahadevan  Sridhar 《Machine Learning》1996,22(1-3):159-195
This paper presents a detailed study of average reward reinforcement learning, an undiscounted optimality framework that is more appropriate for cyclical tasks than the much better studied discounted framework. A wide spectrum of average reward algorithms are described, ranging from synchronous dynamic programming methods to several (provably convergent) asynchronous algorithms from optimal control and learning automata. A general sensitive discount optimality metric calledn-discount-optimality is introduced, and used to compare the various algorithms. The overview identifies a key similarity across several asynchronous algorithms that is crucial to their convergence, namely independent estimation of the average reward and the relative values. The overview also uncovers a surprising limitation shared by the different algorithms while several algorithms can provably generategain-optimal policies that maximize average reward, none of them can reliably filter these to producebias-optimal (orT-optimal) policies that also maximize the finite reward to absorbing goal states. This paper also presents a detailed empirical study of R-learning, an average reward reinforcement learning method, using two empirical testbeds: a stochastic grid world domain and a simulated robot environment. A detailed sensitivity analysis of R-learning is carried out to test its dependence on learning rates and exploration levels. The results suggest that R-learning is quite sensitive to exploration strategies and can fall into sub-optimal limit cycles. The performance of R-learning is also compared with that of Q-learning, the best studied discounted RL method. Here, the results suggest that R-learning can be fine-tuned to give better performance than Q-learning in both domains.  相似文献   

6.
提出一种模糊神经网络的自适应控制方案。针对连续空间的复杂学习任务,提出了一种竞争式Takagi—Sugeno模糊再励学习网络,该网络结构集成了Takagi-Sugeno模糊推理系统和基于动作的评价值函数的再励学习方法。相应地,提出了一种优化学习算法,其把竞争式Takagi-Sugeno模糊再励学习网络训练成为一种所谓的Takagi-Sugeno模糊变结构控制器。以一级倒立摆控制系统为例.仿真研究表明所提出的学习算法在性能上优于其它的再励学习算法。  相似文献   

7.
竞争式Takagi-Sugeno模糊再励学习   总被引:4,自引:0,他引:4  
针对连续空间的复杂学习任务,提出了一种竞争式Takagi-Sugeno模糊再励学习网络 (CTSFRLN),该网络结构集成了Takagi-Sugeno模糊推理系统和基于动作的评价值函数的再 励学习方法.文中相应提出了两种学习算法,即竞争式Takagi-Sugeno模糊Q-学习算法和竞争 式Takagi-Sugeno模糊优胜学习算法,其把CTSFRLN训练成为一种所谓的Takagi-Sugeno模 糊变结构控制器.以二级倒立摆控制系统为例,仿真研究表明所提出的学习算法在性能上优于 其它的再励学习算法.  相似文献   

8.
提出一种模糊神经网络的自适应控制方案。针对连续空间的复杂学习任务,提出了一种竞争式Takagi-Sugeno模糊再励学习网络,该网络结构集成了Takagi-Sugeno模糊推理系统和基于动作的评价值函数的再励学习方法。相应地,提出了一种优化学习算法,其把竞争式Takagi-Sugeno模糊再励学习网络训练成为一种所谓的Takagi-Sugeno模糊变结构控制器。以一级倒立摆控制系统为例,仿真研究表明所提出的学习算法在性能上优于其它的再励学习算法。  相似文献   

9.
Task scheduling is important for the proper functioning of parallel processor systems. The static scheduling of tasks onto networks of parallel processors is well-defined and documented in the literature. However, in many practical situations a priori information about the tasks that need to be scheduled is not available. In such situations, tasks usually arrive dynamically and the scheduling should be performed on-line or “on the fly”. In this paper, we present a framework based on stochastic reinforcement learning, which is usually used to solve optimization problems in a simple and efficient way. The use of reinforcement learning reduces the dynamic scheduling problem to that of learning a stochastic approximation of an unknown average error surface. The main advantage of the proposed approach is that no prior information is required about the parallel processor system under consideration. The learning system develops an association between the best action (schedule) and the current state of the environment (parallel system). The performance of reinforcement learning is demonstrated by solving several dynamic scheduling problems. The conditions under which reinforcement learning can used to efficiently solve the dynamic scheduling problem are highlighted  相似文献   

10.
模仿学习是强化学习与监督学习的结合,目标是通过观察专家演示,学习专家策略,从而加速强化学习。通过引入任务相关的额外信息,模仿学习相较于强化学习,可以更快地实现策略优化,为缓解低样本效率问题提供了解决方案。模仿学习已成为解决强化学习问题的一种流行框架,涌现出多种提高学习性能的算法和技术。通过与图形图像学的最新研究成果相结合,模仿学习已经在游戏人工智能(artificial intelligence,AI)、机器人控制和自动驾驶等领域发挥了重要作用。本文围绕模仿学习的年度发展,从行为克隆、逆强化学习、对抗式模仿学习、基于观察量的模仿学习和跨领域模仿学习等多个角度进行深入探讨,介绍了模仿学习在实际应用上的最新情况,比较了国内外研究现状,并展望了该领域未来的发展方向。旨在为研究人员和从业人员提供模仿学习的最新进展,从而为开展工作提供参考与便利。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号