Similar Literature
 Found 20 similar documents; search time 15 ms
1.
2.
3.
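Research on a Dynamic Hierarchy Method in Hierarchical Reinforcement Learning   Total citations: 1, self-citations: 0, citations by others: 1
In hierarchical reinforcement learning, existing automatic hierarchy-construction methods all generate the hierarchy in one shot after exploring the state space to some degree. Insufficient exploration cannot guarantee solution quality, while excessive exploration slows learning. To overcome this strong dependence of learning performance on the degree of state-space exploration, this paper proposes a dynamic hierarchy method. The method integrates immune clustering and a secondary response mechanism into the Option hierarchical reinforcement learning framework proposed by Sutton. It can dynamically adjust the state space of each Option and dynamically generate Option internal policies along the learning trajectory. Simulation experiments used shortest-path planning between two points in a two-dimensional grid world with obstacles as the learning task. The results show that the dynamic hierarchy method depends very little on the degree of state-space exploration and is better suited to large-scale reinforcement learning problems.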

4.
Recent Advances in Hierarchical Reinforcement Learning   Total citations: 22, self-citations: 0, citations by others: 22
Reinforcement learning is bedeviled by the curse of dimensionality: the number of parameters to be learned grows exponentially with the size of any compact encoding of a state. Recent attempts to combat the curse of dimensionality have turned to principled ways of exploiting temporal abstraction, where decisions are not required at each step, but rather invoke the execution of temporally-extended activities which follow their own policies until termination. This leads naturally to hierarchical control architectures and associated learning algorithms. We review several approaches to temporal abstraction and hierarchical organization that machine learning researchers have recently developed. Common to these approaches is a reliance on the theory of semi-Markov decision processes, which we emphasize in our review. We then discuss extensions of these ideas to concurrent activities, multiagent coordination, and hierarchical memory for addressing partial observability. Concluding remarks address open challenges facing the further development of reinforcement learning in a hierarchical setting.
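
The reliance on semi-Markov decision processes mentioned in this abstract can be illustrated with the standard SMDP Q-learning backup, in which a temporally extended activity that ran for k steps is discounted by gamma to the power k. The following is a minimal sketch only, not code from the reviewed paper; the function name, the state and option labels, and all numeric values are assumptions chosen for illustration.

```python
from collections import defaultdict


def smdp_q_update(Q, state, option, discounted_return, duration,
                  next_state, options, alpha=0.1, gamma=0.9):
    """Apply one SMDP Q-learning backup.

    discounted_return is the reward accumulated (and discounted step by step)
    while `option` executed; duration is the number of primitive steps it took.
    """
    # Value of the best option available where the executed option terminated.
    best_next = max(Q[(next_state, o)] for o in options)
    # The bootstrap term is discounted by gamma**duration, not by gamma.
    target = discounted_return + (gamma ** duration) * best_next
    Q[(state, option)] += alpha * (target - Q[(state, option)])
    return Q[(state, option)]


# Hypothetical two-option example.
Q = defaultdict(float)
options = ["go_to_door", "go_to_goal"]
Q[("hall", "go_to_goal")] = 10.0

new_value = smdp_q_update(
    Q, "room", "go_to_door",
    discounted_return=-3.0, duration=4,
    next_state="hall", options=options,
    alpha=0.5, gamma=0.9,
)
print(new_value)  # approximately 1.7805
```

With a duration of 1 the rule reduces to ordinary one-step Q-learning, which is why the same table can hold primitive actions and extended activities side by side.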

5.
Recent Advances in Hierarchical Reinforcement Learning   Total citations: 16, self-citations: 0, citations by others: 16
Reinforcement learning is bedeviled by the curse of dimensionality: the number of parameters to be learned grows exponentially with the size of any compact encoding of a state. Recent attempts to combat the curse of dimensionality have turned to principled ways of exploiting temporal abstraction, where decisions are not required at each step, but rather invoke the execution of temporally-extended activities which follow their own policies until termination. This leads naturally to hierarchical control architectures and associated learning algorithms. We review several approaches to temporal abstraction and hierarchical organization that machine learning researchers have recently developed. Common to these approaches is a reliance on the theory of semi-Markov decision processes, which we emphasize in our review. We then discuss extensions of these ideas to concurrent activities, multiagent coordination, and hierarchical memory for addressing partial observability. Concluding remarks address open challenges facing the further development of reinforcement learning in a hierarchical setting.

6.
In this paper an optimized classification method for object recognition is presented. The proposed method is based on the Hierarchical Temporal Memory (HTM), which stems from the memory prediction theory of the human brain. As in HTM, this method comprises a tree structure of connected computational nodes, whilst utilizing different rules to memorize objects appearing in various orientations. These rules involve both the spatial and the temporal module. As HTM is inspired by brain activity, its input should also comply with the human vision system. Thus, for the representation of the input images, the log-polar representation was preferred over the Cartesian one. Compared with the original HTM method, experimental results show performance enhancements with this approach in recognition and categorization applications. The results indicate that the proposed method is more accurate and faster in training, whilst retaining the network's robustness to multiple orientation variations.

7.
We study the typical properties of polynomial Support Vector Machines within a Statistical Mechanics approach that takes into account the number of high order features relative to the input space dimension. We analyze the effect of different features' normalizations on the generalization error, for different kinds of learning tasks. If the normalization is adequately selected, hierarchical learning of features of increasing order takes place as a function of the training set size. Otherwise, the performance worsens, and there is no hierarchical learning at all.

8.
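An Automatic Option Generation Algorithm for Hierarchical Reinforcement Learning   Total citations: 2, self-citations: 1, citations by others: 2
Hierarchical reinforcement learning currently has three main approaches: Option, HAM, and MAXQ. None of them has an effective solution to the automatic hierarchy-construction problem. Addressing the first approach, this paper proposes an automatic Option generation algorithm. The algorithm takes as input the state space explored by the agent during the initial learning stage, clusters it using artificial immune network techniques, and learns a set of internal policies on each resulting state subset through experience replay, thereby generating Options. Simulation experiments verify the effectiveness of the algorithm.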

9.
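Video Hierarchical Structure Mining   Total citations: 3, self-citations: 0, citations by others: 3
The key to video processing is structuring video information. The basic structure of a video is a hierarchy composed of frames, shots, scenes, and video programs. A simple framework for mining this hierarchy consists of shot segmentation, shot feature extraction, and video scene construction. Building on shot segmentation, this paper proposes two scene construction methods, multi-feature shot clustering analysis and shot-based scene boundary detection, and thereby achieves video hierarchical structure mining. Experiments show that shot-based scene boundary detection outperforms multi-feature shot clustering analysis.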

10.
Practical Issues in Temporal Difference Learning   Total citations: 8, self-citations: 10, citations by others: 8
This paper examines whether temporal difference methods for training connectionist networks, such as Sutton's TD(λ) algorithm, can be successfully applied to complex real-world problems. A number of important practical issues are identified and discussed from a general theoretical perspective. These practical issues are then examined in the context of a case study in which TD(λ) is applied to learning the game of backgammon from the outcome of self-play. This is apparently the first application of this algorithm to a complex non-trivial task. It is found that, with zero knowledge built in, the network is able to learn from scratch to play the entire game at a fairly strong intermediate level of performance, which is clearly better than conventional commercial programs, and which in fact surpasses comparable networks trained on a massive human expert data set. This indicates that TD learning may work better in practice than one would expect based on current theory, and it suggests that further analysis of TD methods, as well as applications in other complex domains, may be worth investigating.
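
To make the temporal difference idea in this abstract concrete, the following is a minimal sketch of TD(λ) with accumulating eligibility traces. It is a deliberate simplification: the paper trains a connectionist network, whereas this sketch assumes a small lookup table of state values. The function name, the two-state episode, and the parameter values are all hypothetical.

```python
def td_lambda_episode(V, episode, alpha=0.1, gamma=1.0, lam=0.8):
    """Update the value table V in place from one episode.

    episode is a list of (state, reward, next_state) transitions, where
    next_state is None when the episode terminates.
    """
    traces = {s: 0.0 for s in V}
    for state, reward, next_state in episode:
        v_next = 0.0 if next_state is None else V[next_state]
        # One-step temporal difference error.
        delta = reward + gamma * v_next - V[state]
        # Accumulating trace: mark the state just visited.
        traces[state] += 1.0
        for s in V:
            # Every recently visited state shares in the error ...
            V[s] += alpha * delta * traces[s]
            # ... and its trace then decays by gamma * lambda.
            traces[s] *= gamma * lam
    return V


# Hypothetical two-state chain: A -> B -> terminal, reward 1 at the end.
V = {"A": 0.0, "B": 0.0}
td_lambda_episode(V, [("A", 0.0, "B"), ("B", 1.0, None)],
                  alpha=0.5, gamma=1.0, lam=0.5)
print(V)  # {'A': 0.25, 'B': 0.5}
```

The final reward reaches state A within the same episode because of the trace, which is the mechanism that lets a learner assign credit for a game outcome to earlier positions.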

11.
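A map-building method based on Hierarchical Temporal Memory (HTM) networks is proposed. The method uses HTM to cast the mapping problem as a scene recognition problem, so that the environment map consists of a series of scenes output by the HTM model. First, position-invariant robust features (PIRF) are extracted from the acquired images. A visual vocabulary is then built from the PIRFs, and the PIRF descriptors of each image are mapped to a visual word frequency vector according to the vocabulary. Sequences of visual word frequency vectors are fed into the HTM network to learn and build the environment map and to infer and recognize loop-closure scenes. Two sets of experimental data are used to validate the method. The results show that the HTM-based mapping strategy successfully builds the environment map and handles loop-closure detection efficiently.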
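
The step that maps an image's descriptors to a visual word frequency vector can be sketched as a nearest-word histogram. This is only an illustration under stated assumptions: the abstract does not say how descriptors are matched to words, so squared Euclidean distance to each vocabulary entry is assumed here, and the two-dimensional descriptors, the vocabulary, and the function name are hypothetical stand-ins for real PIRF data.

```python
def word_frequency_vector(descriptors, vocabulary):
    """Return the normalized histogram of nearest visual words."""
    counts = [0] * len(vocabulary)
    for descriptor in descriptors:
        # Assumed matching rule: assign the descriptor to the closest word.
        nearest = min(
            range(len(vocabulary)),
            key=lambda k: sum((a - b) ** 2
                              for a, b in zip(descriptor, vocabulary[k])),
        )
        counts[nearest] += 1
    total = len(descriptors)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [c / total for c in counts]


# Hypothetical vocabulary of three words and four descriptors from one image.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.0, 5.5)]

frequencies = word_frequency_vector(descriptors, vocabulary)
print(frequencies)  # [0.25, 0.5, 0.25]
```

One such vector per image, taken in acquisition order, would form the sequence that the abstract describes as the input to the HTM network.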

12.
Bursty and Hierarchical Structure in Streams   Total citations: 9, self-citations: 1, citations by others: 9
A fundamental problem in text data mining is to extract meaningful structure from document streams that arrive continuously over time. E-mail and news articles are two natural examples of such streams, each characterized by topics that appear, grow in intensity for a period of time, and then fade away. The published literature in a particular research field can be seen to exhibit similar phenomena over a much longer time scale. Underlying much of the text mining work in this area is the following intuitive premise: that the appearance of a topic in a document stream is signaled by a burst of activity, with certain features rising sharply in frequency as the topic emerges. The goal of the present work is to develop a formal approach for modeling such bursts, in such a way that they can be robustly and efficiently identified, and can provide an organizational framework for analyzing the underlying content. The approach is based on modeling the stream using an infinite-state automaton, in which bursts appear naturally as state transitions; it can be viewed as drawing an analogy with models from queueing theory for bursty network traffic. The resulting algorithms are highly efficient, and yield a nested representation of the set of bursts that imposes a hierarchical structure on the overall stream. Experiments with e-mail and research paper archives suggest that the resulting structures have a natural meaning in terms of the content that gave rise to them.
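
The idea that bursts appear as state transitions of an automaton can be sketched with a two-state simplification. The abstract describes an infinite-state automaton; the version below keeps only a base state and one burst state, which is an assumption made for brevity. Each state emits inter-arrival gaps from an exponential distribution, entering the burst state costs gamma * ln(n), and the cheapest state sequence is found by dynamic programming. The parameter names s and gamma, the function name, and the sample gaps are illustrative.

```python
import math


def detect_bursts(gaps, s=2.0, gamma=1.0):
    """Label each inter-arrival gap with 0 (base rate) or 1 (burst)."""
    n = len(gaps)
    base_rate = n / sum(gaps)
    rates = [base_rate, s * base_rate]
    up_cost = gamma * math.log(n)  # charged only when moving from 0 to 1

    def emit_cost(j, x):
        # Negative log-density of an exponential with rate rates[j].
        return -math.log(rates[j]) + rates[j] * x

    cost = [0.0, float("inf")]  # the automaton starts in the base state
    back = []
    for x in gaps:
        new_cost, pointers = [], []
        for j in (0, 1):
            candidates = [cost[i] + (up_cost if j > i else 0.0)
                          for i in (0, 1)]
            best_i = 0 if candidates[0] <= candidates[1] else 1
            new_cost.append(candidates[best_i] + emit_cost(j, x))
            pointers.append(best_i)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest path backwards from the final gap.
    state = 0 if cost[0] <= cost[1] else 1
    states = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        states.append(state)
    states.reverse()
    return states


# Hypothetical stream: slow arrivals, a run of rapid arrivals, slow again.
gaps = [10, 10, 10, 1, 1, 1, 1, 1, 1, 10, 10, 10]
states = detect_bursts(gaps)
print(states)  # [0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
```

Adding further states with geometrically increasing rates would yield bursts nested inside bursts, which is the hierarchical structure the abstract refers to.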

13.
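Reinforcement learning improves its policy through trial-and-error interaction with the environment, and its self-learning and online-learning characteristics make it an important branch of machine learning research. However, reinforcement learning has long been plagued by the "curse of dimensionality". In recent years, hierarchical reinforcement learning methods have introduced abstraction mechanisms and made significant progress in overcoming it. As theoretical background, this paper first introduces the basic principles of reinforcement learning and the Q-learning algorithm based on semi-Markov processes. It then presents the basic ideas and Q-learning update formulas of three typical single-agent hierarchical reinforcement learning methods (Option, HAM, and MAXQ), summarizes the essential characteristics of each, and compares and evaluates the three. Finally, it points out the problems that must be solved when extending single-agent hierarchical reinforcement learning methods to the multi-agent case.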

14.
Hierarchical reinforcement learning (RL) algorithms can learn a policy faster than standard RL algorithms. However, the applicability of hierarchical RL algorithms is limited by the fact that the task decomposition has to be performed in advance by the human designer. We propose a Lamarckian evolutionary approach for automatic development of the learning structure in hierarchical RL. The proposed method combines the MAXQ hierarchical RL method and genetic programming (GP). In the MAXQ framework, a subtask can optimize the policy independently of its parent task's policy, which makes it possible to reuse learned policies of the subtasks. In the proposed method, the MAXQ method learns the policy based on the task hierarchies obtained by GP, while the GP explores the appropriate hierarchies using the result of the MAXQ method. To show the validity of the proposed method, we have performed simulation experiments for a foraging task in three different environmental settings. The results show strong interconnection between the obtained learning structures and the given task environments. The main conclusion of the experiments is that the GP can find a minimal strategy, i.e., a hierarchy that minimizes the number of primitive subtasks that can be executed for each type of situation. The experimental results for the most challenging environment also show that the policies of the subtasks can continue to improve, even after the structure of the hierarchy has been evolutionarily stabilized, as an effect of Lamarckian mechanisms.

15.
The recent Polytope ARTMAP (PTAM) suggests that irregular polytopes are more flexible than predefined category geometries for approximating the borders among the desired output predictions. However, category expansion and adjustment steps without statistical information make PTAM not robust to noise and category overlap. In order to push the learning problem towards Structural Risk Minimization (SRM), this paper proposes Hierarchical Polytope ARTMAP (HPTAM), which uses a hierarchical structure with different levels determined by the complexity of the regions incorporating the input pattern. In addition, overlapping of simplexes from the same desired prediction is designed to reduce category proliferation. Although HPTAM is still inevitably sensitive to outliers in the presence of noise, the main experimental results show that HPTAM can achieve a balance between representation error and approximation error, which improves the overall generalization capabilities.

16.
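Hierarchical Reinforcement Learning Based on the MAXQ Method   Total citations: 1, self-citations: 0, citations by others: 1
Reinforcement learning is an important branch of machine learning, but in a reinforcement learning system the amount of learning grows exponentially with the number of state variables, giving rise to the "curse of dimensionality". To address this, a MAXQ-based hierarchical reinforcement learning method is proposed. By introducing an abstraction mechanism, it decomposes the reinforcement learning task across different levels, so that the learning task at each level need only be carried out in a smaller space, which greatly reduces the amount and scale of learning. A concrete algorithm, MAXQ-RLA, is also given.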
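
The decomposition of a task across levels can be illustrated with the standard MAXQ value function decomposition, in which the value of invoking child a from parent task i in state s is Q(i, s, a) = V(a, s) + C(i, s, a): the child's own value plus a completion term for finishing the parent afterwards. The sketch below is not the MAXQ-RLA algorithm from the abstract, whose details are not given; it only evaluates a fixed set of tables, and the task names and numbers are hypothetical.

```python
def maxq_value(task, state, children, primitive_value, completion):
    """Evaluate V(task, state) recursively from the MAXQ decomposition."""
    if task not in children:
        # Primitive action: its value is the expected immediate reward.
        return primitive_value[(task, state)]
    # Composite task: best child value plus the cost of completing the parent.
    return max(
        maxq_value(child, state, children, primitive_value, completion)
        + completion[(task, state, child)]
        for child in children[task]
    )


# Hypothetical three-level hierarchy evaluated in a single state "s0".
children = {"root": ["navigate", "pickup"], "navigate": ["north", "south"]}
primitive_value = {
    ("north", "s0"): -1.0,
    ("south", "s0"): -1.0,
    ("pickup", "s0"): -10.0,
}
completion = {
    ("navigate", "s0", "north"): -2.0,
    ("navigate", "s0", "south"): -5.0,
    ("root", "s0", "navigate"): 4.0,
    ("root", "s0", "pickup"): 0.0,
}

root_value = maxq_value("root", "s0", children, primitive_value, completion)
print(root_value)  # 1.0
```

Because each composite task stores only its own completion terms, a subtask such as the hypothetical "navigate" can be learned over a smaller table than a flat learner would need for the whole problem.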

17.
We study the problem of recovering temporal parameters which act as predictive operators, generalize time-to-collision and have direct interpretation for navigational purposes for piecewise arbitrarily smooth (polynomial) motion. A result stating that, for monocular observers undergoing arbitrary polynomial laws, these parameters are visually observable, is presented in the first part of this paper. This property suggests an alternate temporal representation of visual looming information. The second part of this paper is concerned with algorithmic approaches for environments with maneuvering agents. A method addressing model order determination, collision detection, and temporal parameter estimation is proposed. Experimental results are reported.

18.
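Application of Hierarchical Reinforcement Learning in Robot Soccer Games   Total citations: 4, self-citations: 0, citations by others: 4
Robot soccer is a challenging research field. Designing intelligent players involves research in computing, artificial intelligence, vision, and mechanics. A player's learning ability is the main indicator of its intelligence, and selecting appropriate action skills in a constantly changing external environment is a key problem in robot soccer. This paper introduces the Markov decision process and, under a semi-Markov decision model, uses a hierarchical reinforcement learning algorithm to learn actions and their selection at different levels simultaneously. Experiments on a simulation platform show that the learning method is very effective.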

19.
20.
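An Ad Hoc network is a new type of self-organizing network that requires no pre-installed network infrastructure and can be established quickly and automatically through the routing and forwarding functions of the network nodes themselves. This paper proposes a hierarchical Ad Hoc network based on the FCCN virtual topology, analyzes its characteristics in light of existing hierarchical structures and compares it with them, and finally analyzes its performance through simulation.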
