Similar Articles
10 similar articles found.
1.
Recent Advances in Hierarchical Reinforcement Learning   (cited 16 times: 0 self-citations, 16 by others)
Reinforcement learning is bedeviled by the curse of dimensionality: the number of parameters to be learned grows exponentially with the size of any compact encoding of a state. Recent attempts to combat the curse of dimensionality have turned to principled ways of exploiting temporal abstraction, where decisions are not required at each step, but rather invoke the execution of temporally-extended activities which follow their own policies until termination. This leads naturally to hierarchical control architectures and associated learning algorithms. We review several approaches to temporal abstraction and hierarchical organization that machine learning researchers have recently developed. Common to these approaches is a reliance on the theory of semi-Markov decision processes, which we emphasize in our review. We then discuss extensions of these ideas to concurrent activities, multiagent coordination, and hierarchical memory for addressing partial observability. Concluding remarks address open challenges facing the further development of reinforcement learning in a hierarchical setting.
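Common to the approaches this abstract reviews is the semi-Markov decision process (SMDP) backup, in which a temporally extended activity (an "option") launched in state $s$ runs for a random number of steps $k$ before control returns. In standard notation (illustrative; not tied to any single paper listed here), the optimality equation reads:

```latex
Q(s, o) = R(s, o) + \sum_{s',\,k} \gamma^{k} \, P(s', k \mid s, o) \, \max_{o' \in \mathcal{O}(s')} Q(s', o')
```

where $R(s,o)$ is the expected discounted reward accumulated while $o$ executes and $P(s', k \mid s, o)$ is the joint probability that $o$ terminates in state $s'$ after $k$ steps. The $\gamma^{k}$ factor is what distinguishes this multi-step backup from the ordinary one-step Bellman backup.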

2.
Operations research and management science are often confronted with sequential decision making problems with large state spaces. Standard methods that are used for solving such complex problems are associated with some difficulties. As we discuss in this article, these methods are plagued by the so-called curse of dimensionality and the curse of modelling. In this article, we discuss reinforcement learning, a machine learning technique for solving sequential decision making problems with large state spaces. We describe how reinforcement learning can be combined with a function approximation method to avoid both the curse of dimensionality and the curse of modelling. To illustrate the usefulness of this approach, we apply it to a problem with a huge state space: learning to play the game of Othello. We describe experiments in which reinforcement learning agents learn to play the game of Othello without the use of any knowledge provided by human experts. It turns out that the reinforcement learning agents learn to play the game of Othello better than players that use basic strategies.
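The combination the abstract describes, model-free learning plus function approximation, can be sketched in a few lines. The example below is not the Othello system; it is a minimal illustration on a toy random-walk domain (the domain, features, and rewards are all invented for the sketch). Replacing the Q-table with a linear approximator Q(s, a) = w · φ(s, a) means memory grows with the number of features rather than the number of states (curse of dimensionality), and since the agent only samples transitions, no model of the environment is ever built (curse of modelling).

```python
import random

# Sketch of model-free Q-learning with linear function approximation.
# Toy stand-in for the article's Othello setup (an Othello engine is far
# too long here); all names and numbers below are illustrative.

N = 21                 # states 0..20; reward +1 for reaching 20, -1 for reaching 0
ACTIONS = (-1, 1)      # move left / move right

def phi(s, a):
    """One feature slot per action: [normalised position, bias]."""
    out = [0.0] * (2 * len(ACTIONS))
    idx = 2 * ACTIONS.index(a)
    out[idx] = s / (N - 1)
    out[idx + 1] = 1.0
    return out

def q(w, s, a):
    """Q(s, a) = w . phi(s, a) -- no table entry per state."""
    return sum(wi * fi for wi, fi in zip(w, phi(s, a)))

def train(episodes=3000, alpha=0.05, gamma=0.99, eps=0.1, seed=1):
    rng = random.Random(seed)
    w = [0.0] * (2 * len(ACTIONS))
    for _ in range(episodes):
        s = N // 2
        while 0 < s < N - 1:
            a = (rng.choice(ACTIONS) if rng.random() < eps
                 else max(ACTIONS, key=lambda x: q(w, s, x)))
            s2 = s + a
            r = 1.0 if s2 == N - 1 else (-1.0 if s2 == 0 else 0.0)
            done = s2 in (0, N - 1)
            target = r if done else r + gamma * max(q(w, s2, x) for x in ACTIONS)
            td = target - q(w, s, a)            # semi-gradient TD update on the weights
            w = [wi + alpha * td * fi for wi, fi in zip(w, phi(s, a))]
            s = s2
    return w
```

After training, the greedy policy at the centre of the walk should prefer moving right, since only the right terminal pays +1.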

3.
Reinforcement learning on explicitly specified time scales   (cited 1 time: 0 self-citations, 1 by others)
In recent years hierarchical concepts of temporal abstraction have been integrated into the reinforcement learning framework to improve scalability. However, existing approaches are limited to domains where a decomposition into subtasks is known a priori. In this article we propose the concept of explicitly selecting time-scale-related abstract actions when no subgoal-related abstract actions are available. This concept is realised with multi-step actions on different time scales that are combined in a single action set. We exploit the special structure of the action set in the MSA-Q-learning algorithm. This approach is suited to learning optimal policies in unstructured domains where a decomposition into subtasks is not known in advance or does not exist at all. By learning on different explicitly specified time scales simultaneously, we achieve a considerable improvement in learning speed, which we demonstrate on several benchmark problems.
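A minimal sketch of the multi-step-action idea (not the authors' MSA-Q-learning implementation; the corridor domain, action names, and parameters below are invented for illustration): actions of two different time scales share a single action set, rewards collected during an action are discounted internally, and the bootstrap term is discounted by gamma**k, where k is the number of primitive steps the chosen action actually consumed.

```python
import random

# Sketch of Q-learning over multi-step actions on two time scales combined
# in one action set (illustrative domain, not the article's MSA-Q-learning).
# A "step" moves one cell, a "dash" moves up to three; the update discounts
# the bootstrap term by gamma**k for a k-step action.

N = 10  # corridor states 0..9, goal at state 9

def run_action(state, action, gamma):
    """Execute a multi-step action; return (next_state, discounted_reward, k, done)."""
    length = 1 if action.startswith("step") else 3
    direction = 1 if action.endswith("+") else -1
    total, disc = 0.0, 1.0
    for k in range(1, length + 1):
        state = max(0, min(N - 1, state + direction))
        if state == N - 1:            # goal reached: terminate the action early
            return state, total + disc * 1.0, k, True
        disc *= gamma
    return state, total, length, False

def msa_q_learning(episodes=2000, alpha=0.2, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    actions = ["step+", "step-", "dash+", "dash-"]
    Q = {(s, a): 0.0 for s in range(N) for a in actions}
    for _ in range(episodes):
        s = 0
        while s != N - 1:
            a = (rng.choice(actions) if rng.random() < eps
                 else max(actions, key=lambda x: Q[(s, x)]))
            s2, r, k, done = run_action(s, a, gamma)
            target = r if done else r + gamma**k * max(Q[(s2, x)] for x in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q
```

Because the discount is applied per primitive step, long and short actions are valued consistently, which is what lets both time scales coexist in one action set.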

4.
Reinforcement learning is an important branch of machine learning and artificial intelligence, and in recent years it has attracted broad attention from both the public and industry. The central problem reinforcement learning algorithms address is how an agent learns a policy by interacting directly with its environment. However, as the dimensionality of the state space grows, traditional reinforcement learning methods often face the curse of dimensionality and struggle to learn effectively. Hierarchical reinforcement learning aims to decompose a complex reinforcement learning problem into several subproblems and solve them separately, which can achieve better results than solving the whole problem directly. Hierarchical reinforcement learning is a promising route to solving large-scale reinforcement learning problems, yet it has received comparatively little attention. This article introduces and reviews the major families of hierarchical reinforcement learning methods.

5.
Reinforcement learning improves its policy through trial-and-error interaction with the environment; its capacity for self-learning and online learning has made it an important branch of machine learning research. Reinforcement learning methods have, however, long been plagued by the curse of dimensionality. In recent years, hierarchical reinforcement learning methods have achieved notable results in addressing this problem and have gradually been extended to multi-agent systems. This paper surveys and analyses the current research progress in this area, and discusses pressing open problems and future development trends.

6.
This paper studies maintenance policies for multi-component systems which have failure interaction among their components. Component failure might accelerate deterioration processes or induce instantaneous failures of the remaining components. We formulate this maintenance problem as a Markov decision process (MDP) with the objective of minimising the total discounted maintenance cost. However, the action set and state space of the MDP grow exponentially as the number of components increases. This makes traditional approaches computationally intractable. To deal with this curse of dimensionality, a modified iterative aggregation procedure (MIAP) is proposed. We mathematically prove that the MIAP iterations converge and that the policy obtained is optimal. Numerical case studies find that failure interaction should not be ignored in maintenance policy decision making, and that the proposed MIAP is faster and requires less memory than linear programming.
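The MDP formulation can be sketched concretely for the smallest interesting case, a two-component system (all costs and failure probabilities below are invented for illustration; plain value iteration stands in for the article's MIAP, and it is exactly this plain approach whose state and action sets explode as components are added).

```python
import itertools

# Sketch of a two-component maintenance MDP with failure interaction:
# a working component fails faster when the other one is down.
# Illustrative numbers throughout; minimise total discounted cost.

GAMMA = 0.9
STATES = list(itertools.product((0, 1), repeat=2))   # 0 = working, 1 = failed
ACTIONS = ("wait", "repair")

def fail_prob(own_failed, other_failed):
    """Failure interaction: the other component's failure accelerates deterioration."""
    if own_failed:
        return 1.0                      # a failed component stays failed until repaired
    return 0.4 if other_failed else 0.1

def transitions(state, action):
    """Yield (probability, next_state, cost) triples."""
    if action == "repair":
        yield 1.0, (0, 0), 5.0          # restore both components at a fixed cost
        return
    cost = 2.0 * sum(state)             # downtime cost per currently failed component
    p = [fail_prob(state[i], state[1 - i]) for i in (0, 1)]
    for n0 in (0, 1):
        for n1 in (0, 1):
            pr = (p[0] if n0 else 1 - p[0]) * (p[1] if n1 else 1 - p[1])
            if pr > 0:
                yield pr, (n0, n1), cost

def expected_cost(V, s, a):
    return sum(pr * (c + GAMMA * V[s2]) for pr, s2, c in transitions(s, a))

def value_iteration(tol=1e-8):
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            best = min(expected_cost(V, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

def policy(V):
    return {s: min(ACTIONS, key=lambda a: expected_cost(V, s, a)) for s in STATES}
```

With n components and binary condition states, STATES already has 2**n entries, which is the exponential growth the abstract refers to.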

7.
With excellent global approximation performance and interpretability, Takagi-Sugeno-Kang (TSK) fuzzy systems have enjoyed a wide range of applications in various fields, such as smart control, medicine, and finance. However, in handling high-dimensional complex data, the performance and interpretability of a single TSK fuzzy system are easily degraded by rule explosion due to the curse of dimensionality. Ensemble learning comes into play to deal with this problem by fusing multiple TSK fuzzy systems using appropriate ensemble learning strategies, which has been shown to be effective in mitigating the curse of dimensionality and reducing the number of fuzzy rules, thereby maintaining the interpretability of fuzzy systems. To this end, this paper gives a comprehensive survey of TSK fuzzy system fusion to provide insights for further research development. First, we briefly review the fundamental concepts related to TSK fuzzy systems, including fuzzy rule structures, training methods, and interpretability, and discuss three different development directions of TSK fuzzy systems. Next, along the direction of TSK fuzzy system fusion, we investigate in detail the current ensemble strategies for fusion at the hierarchical, wide, and stacked levels, and discuss their differences, merits, and weaknesses from the aspects of time complexity, interpretability (model complexity), and classification performance. We then present some applications of TSK fuzzy systems in real-world scenarios. Finally, the challenges and future directions of TSK fuzzy system fusion are discussed to foster prospective research.
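The first-order TSK rule structure the survey refers to is compact enough to state directly. Below is a minimal two-rule sketch (membership functions, rule parameters, and names are all illustrative): each rule's consequent is a linear function of the input, and the system output is the firing-strength-weighted average of the rule consequents.

```python
import math

# Minimal first-order TSK fuzzy system with two rules (illustrative
# parameters). Rule i: IF x is A_i THEN y_i = p_i * x + q_i; the output is
# the membership-weighted average of the rule consequents, which is what
# gives TSK systems their interpretable, locally linear behaviour.

def gauss(x, centre, width):
    """Gaussian membership function for a fuzzy set A = (centre, width)."""
    return math.exp(-((x - centre) ** 2) / (2 * width ** 2))

RULES = [
    # IF x is "near 0" THEN y = x
    {"centre": 0.0, "width": 1.0, "p": 1.0, "q": 0.0},
    # IF x is "near 5" THEN y = 10 - x
    {"centre": 5.0, "width": 1.0, "p": -1.0, "q": 10.0},
]

def tsk(x):
    firing = [gauss(x, r["centre"], r["width"]) for r in RULES]
    consequents = [r["p"] * x + r["q"] for r in RULES]
    return sum(w * y for w, y in zip(firing, consequents)) / sum(firing)
```

Rule explosion is visible even from this sketch: with grid partitioning, d inputs with m fuzzy sets each yield m**d rules, which is the curse-of-dimensionality effect the ensemble strategies in the survey are designed to avoid.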

8.
Reinforcement learning (RL) is an area of machine learning concerned with how an agent learns to make decisions sequentially in order to optimize a particular performance measure. To achieve such a goal, the agent has to choose between 1) exploiting previously acquired knowledge, which may end up at a local optimum, or 2) exploring to gather new knowledge that is expected to improve current performance. Among RL algorithms, Bayesian model-based RL (BRL) is well known for its ability to trade off exploitation against exploration optimally via belief planning, i.e. by solving a partially observable Markov decision process (POMDP). However, solving that POMDP often suffers from the curse of dimensionality and the curse of history. In this paper, we make two major contributions: 1) a framework integrating temporal abstraction into BRL that results in a hierarchical POMDP formulation, which can be solved online using a hierarchical sample-based planning solver; 2) a subgoal discovery method for hierarchical BRL that automatically discovers useful macro-actions to accelerate learning. In the experiments, we demonstrate that the proposed approach scales up to much larger problems and that the agent is able to discover useful subgoals that speed up Bayesian reinforcement learning.
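The exploitation-exploration trade-off via belief that BRL resolves can be illustrated in the simplest Bayesian setting, a Bernoulli bandit with Thompson sampling (a stand-in sketch, not the paper's hierarchical POMDP solver; arm probabilities and names are invented): the agent maintains a Beta posterior per arm and acts greedily with respect to a sample from its belief, which automatically balances the two choices the abstract describes.

```python
import random

# Sketch of belief-based exploration in the simplest Bayesian setting:
# a Bernoulli bandit with Thompson sampling (illustrative stand-in; not
# the hierarchical POMDP solver from the paper). Each arm keeps a
# Beta(a, b) posterior; acting greedily on a posterior *sample* explores
# exactly as much as the current uncertainty warrants.

def thompson_sampling(arm_probs, steps=5000, seed=2):
    rng = random.Random(seed)
    n = len(arm_probs)
    a = [1] * n                      # Beta posterior: successes + 1
    b = [1] * n                      # Beta posterior: failures + 1
    pulls = [0] * n
    for _ in range(steps):
        # sample a plausible success rate per arm from the current belief
        samples = [rng.betavariate(a[i], b[i]) for i in range(n)]
        i = samples.index(max(samples))
        pulls[i] += 1
        if rng.random() < arm_probs[i]:
            a[i] += 1                # observed a success: sharpen the belief
        else:
            b[i] += 1
    return pulls
```

With arms of success probability 0.2 and 0.8, nearly all pulls concentrate on the better arm once its posterior sharpens; early on, wide posteriors make the agent try both.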

9.
Reinforcement learning (RL) for large and complex problems faces the curse of dimensionality. To overcome this problem, frameworks based on temporal abstraction have been presented, each with its own advantages and disadvantages. This paper proposes a new method, in the spirit of the strategies introduced in hierarchies of abstract machines (HAMs), to create a high-level controller layer of reinforcement learning that uses options. The proposed framework employs a non-deterministic automaton as a controller to make more effective use of temporally extended actions and state-space clustering. This method can be viewed as a bridge between the option and HAM frameworks: it suggests a new framework that reduces the disadvantages of both by creating connecting structures between them while retaining their advantages. Experimental results on different test environments show the significant efficiency of the proposed method.

10.
Deep hierarchical reinforcement learning is an important research direction within deep reinforcement learning, focusing on problems that classical deep reinforcement learning struggles to solve, such as sparse rewards, sequential decision making, and weak transferability. Its core idea is to build a multi-level reinforcement learning policy according to the hierarchical principle: temporal abstraction is used to compose temporally fine-grained lower-level actions into temporally coarse-grained, semantically meaningful higher-level actions, decomposing a complex problem into several simpler subproblems to be solved. In recent years, as research has deepened, deep hierarchical reinforcement learning methods have made substantial breakthroughs and have been applied in everyday domains such as visual navigation, natural language processing, recommender systems, and video caption generation. This article first introduces the theoretical foundations of hierarchical reinforcement learning; it then describes the core techniques of deep hierarchical reinforcement learning, including hierarchical abstraction techniques and commonly used experimental environments; it analyses in detail skill-based and subgoal-based deep hierarchical reinforcement learning frameworks, comparing the research status and development trends of each class of algorithms; it then introduces applications of deep hierarchical reinforcement learning in several real-world domains; finally, it offers an outlook on and summary of deep hierarchical reinforcement learning.

