Similar literature
 20 similar documents found; search time 31 ms
1.
The robot soccer game has been proposed as a benchmark problem for artificial intelligence and robotics research. The decision-making system is the most important part of a robot soccer system. Because the environment is dynamic and complex, a reinforcement learning (RL) method named FNN-RL is employed to learn the decision-making strategy. The FNN-RL system consists of a fuzzy neural network (FNN) and RL. RL is used for structure identification and parameter tuning of the FNN. Conversely, the curse of dimensionality in RL can be addressed by the function approximation capability of the FNN. Furthermore, the residual algorithm is used to calculate the gradient of the FNN-RL method in order to guarantee convergence and fast learning. The complex decision-making task is divided into multiple learning subtasks, namely dynamic role assignment, action selection, and action implementation, which together constitute a hierarchical learning system. We apply the proposed FNN-RL method to soccer agents that attempt to learn each subtask at the various layers. The effectiveness of the proposed method is demonstrated by simulation and by real experiments.

2.
Speaker verification has been studied widely from different points of view, including accuracy, robustness, and real-time operation. Recent studies have turned toward better feature stability and robustness. In this paper we study the effect of nonlinear manifold-based dimensionality reduction on feature robustness. Manifold learning is a popular recent approach to nonlinear dimensionality reduction. Algorithms for this task are based on the idea that each data point may be described as a function of only a few parameters. Manifold learning algorithms attempt to uncover these parameters in order to find a low-dimensional representation of the data. Among the manifold-based dimension reduction approaches, we applied the widely used isometric mapping (Isomap) algorithm. Since in speaker verification the input utterance is compared with the model of the claimed client, a speaker-dependent feature transformation would be beneficial for deciding on the identity of the speaker. Therefore, our first contribution is to use the Isomap dimension reduction approach in the speaker-dependent context and compare its performance with two other widely used approaches, namely principal component analysis and factor analysis. The other contribution of our work is to perform the nonlinear transformation in a speaker-dependent framework. We evaluated this approach in a GMM-based speaker verification framework using the Tfarsdat telephone speech dataset for different noises and SNRs, and the evaluations have shown reliability and robustness even at low SNRs. The results also show better performance for the proposed Isomap approach compared to the other approaches.
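The Isomap idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it only shows the first two stages of the standard Isomap algorithm (a k-nearest-neighbor graph, then shortest-path "geodesic" distances) on a hypothetical toy data set of points on a semicircle. The function name `geodesic_distances`, the parameter `k`, and the toy data are assumptions made for illustration; the final embedding step (classical multidimensional scaling) is omitted.

```python
import math


def geodesic_distances(points, k):
    """Approximate geodesic distances (first two stages of Isomap).

    Hypothetical helper for illustration: builds a symmetric k-NN graph
    weighted by Euclidean distance, then runs Floyd-Warshall on it.
    """
    n = len(points)
    inf = float("inf")
    dist = [[inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        # Euclidean distances from point i to all other points, nearest first.
        neighbors = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d_ij, j in neighbors[:k]:
            # Connect i and j if either is among the other's k nearest neighbors.
            dist[i][j] = min(dist[i][j], d_ij)
            dist[j][i] = min(dist[j][i], d_ij)
    # Floyd-Warshall: all-pairs shortest paths along graph edges.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][m] + dist[m][j] < dist[i][j]:
                    dist[i][j] = dist[i][m] + dist[m][j]
    return dist


# Toy data (assumed): 11 points evenly spaced on a unit semicircle.
points = [(math.cos(math.pi * t / 10), math.sin(math.pi * t / 10)) for t in range(11)]
geo = geodesic_distances(points, k=2)
straight = math.dist(points[0], points[-1])  # straight-line (chord) distance
print("graph distance:", geo[0][10], "straight-line distance:", straight)
```

On this toy curve the graph distance between the two endpoints comes out close to the arc length π and well above the straight-line distance of 2, which is the property Isomap relies on when it unrolls a curved manifold.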

3.
Riemannian manifold learning   Cited by: 1 (self-citations: 0, citations by others: 1)
Recently, manifold learning has been widely exploited in pattern recognition, data analysis, and machine learning. This paper presents a novel framework, called Riemannian manifold learning (RML), based on the assumption that the input high-dimensional data lie on an intrinsically low-dimensional Riemannian manifold. The main idea is to formulate the dimensionality reduction problem as a classical problem in Riemannian geometry, namely how to construct coordinate charts for a given Riemannian manifold. We implement the Riemannian normal coordinate chart, which has been the most widely used chart in Riemannian geometry, for a set of unorganized data points. First, two input parameters (the neighborhood size k and the intrinsic dimension d) are estimated based on an efficient simplicial reconstruction of the underlying manifold. Then, the normal coordinates are computed to map the input high-dimensional data into a low-dimensional space. Experiments on synthetic data as well as real-world images demonstrate that our algorithm can learn intrinsic geometric structures of the data, preserve radial geodesic distances, and yield regular embeddings.

4.
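Reinforcement learning (RL) is an important branch of machine learning and artificial intelligence, and in recent years it has attracted wide attention from society and industry. The main problem that RL algorithms address is how an agent can learn a policy by interacting directly with its environment. However, as the dimensionality of the state space grows, traditional RL methods often face the curse of dimensionality and struggle to learn well. Hierarchical reinforcement learning aims to decompose a complex RL problem into several subproblems and solve them separately, which can give better results than solving the whole problem directly. Hierarchical RL is a potential route to solving large-scale RL problems, yet it has received relatively little attention. This paper introduces and reviews the main categories of hierarchical RL methods.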
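The scale of the curse of dimensionality described in this abstract can be made concrete with a small counting sketch. It is an idealized illustration, not a method from the survey: it assumes a tabular value function, that each subproblem depends on a disjoint subset of the state variables, and that every subproblem uses the same number of actions. The function names and the numbers are hypothetical.

```python
def flat_table_size(values_per_variable, n_actions):
    """Entries in a flat Q-table: one per (joint state, action) pair."""
    size = n_actions
    for n_values in values_per_variable:
        size *= n_values
    return size


def decomposed_table_size(subtasks, n_actions):
    """Total entries when each subtask keeps its own smaller Q-table.

    `subtasks` is a list of lists; each inner list gives the number of
    values of the state variables that one subtask depends on (assumed
    disjoint across subtasks).
    """
    return sum(flat_table_size(variables, n_actions) for variables in subtasks)


# Assumed example: 6 state variables with 10 values each, 4 actions.
variables = [10] * 6
flat = flat_table_size(variables, n_actions=4)
# The same variables split into three subtasks of two variables each.
split = decomposed_table_size([[10, 10], [10, 10], [10, 10]], n_actions=4)
print(flat, split)
```

Under these assumptions the flat table grows exponentially with the number of variables, while the decomposed tables grow only with the size of each subproblem. Real tasks rarely decompose this cleanly, so the saving is an upper bound on what a hierarchy can offer.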

5.
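Reinforcement learning is one of the more important methods in artificial intelligence; it has developed considerably since it was first proposed and can be used to solve many problems. However, on problems with large state spaces, ordinary reinforcement learning methods suffer from the "curse of dimensionality". Relational reinforcement learning was therefore proposed: applying reinforcement learning to relational domains can alleviate the curse of dimensionality to some extent. On this basis, the paper briefly introduces the concepts of relational reinforcement learning, the related algorithms, and the problems that remain to be solved.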

6.
In robotics, much attention has recently focused on using reinforcement learning (RL) to design robot controllers, since the environments in which robots will operate cannot be predicted in advance by human designers. However, some difficulties exist. One of them is well known as the 'curse of dimensionality'. Thus, in order to adopt RL for complicated systems, not only 'adaptability' but also 'computational efficiency' should be taken into account. The paper proposes an adaptive state recruitment strategy for NGnet-based actor-critic RL. The strategy enables the learning system to rearrange and divide its state space gradually according to the task complexity and the progress of learning. Simulation results and real robot implementations show the validity of the method.

7.
Hierarchical reinforcement learning (RL) algorithms can learn a policy faster than standard RL algorithms. However, the applicability of hierarchical RL algorithms is limited by the fact that the task decomposition has to be performed in advance by the human designer. We propose a Lamarckian evolutionary approach for automatic development of the learning structure in hierarchical RL. The proposed method combines the MAXQ hierarchical RL method and genetic programming (GP). In the MAXQ framework, a subtask can optimize its policy independently of its parent task's policy, which makes it possible to reuse learned policies of the subtasks. In the proposed method, the MAXQ method learns the policy based on the task hierarchies obtained by GP, while the GP explores the appropriate hierarchies using the result of the MAXQ method. To show the validity of the proposed method, we have performed simulation experiments for a foraging task in three different environmental settings. The results show a strong interconnection between the obtained learning structures and the given task environments. The main conclusion of the experiments is that the GP can find a minimal strategy, i.e., a hierarchy that minimizes the number of primitive subtasks that can be executed for each type of situation. The experimental results for the most challenging environment also show that the policies of the subtasks can continue to improve, even after the structure of the hierarchy has been evolutionarily stabilized, as an effect of Lamarckian mechanisms.

8.
Gaussian fields (GF) have recently received considerable attention for dimension reduction and semi-supervised classification. In this paper we show how the GF framework can be used for semi-supervised regression on high-dimensional data. We propose an active learning strategy based on entropy minimization and a maximum likelihood model selection method. Furthermore, we show how a recent generalization of the LLE algorithm for correspondence learning can be cast into the GF framework, which obviates the need to choose a representation dimensionality.

9.
An online hierarchical reinforcement learning method based on path matching   Cited by: 1 (self-citations: 0, citations by others: 1)
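Finding the correct subgoals online is the key problem in option-based hierarchical reinforcement learning. By analyzing the learning agent's actions at subgoals, we find that the effective actions at a subgoal are restricted, and we thereby transform the problem of finding subgoals into that of finding the best-matching action-restricted states along a path. For grid-world learning environments, we propose a one-way value method to represent this restricted-action property of subgoals, together with an automatic option discovery algorithm based on it. Experiments show that options generated with the one-way value method can significantly speed up the Q-learning algorithm. We also analyze how the timing and size of the generated options affect Q-learning performance.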

10.
陈珍, 夏靖波, 柏骏, 徐敏. 《计算机科学》 (Computer Science), 2015, 42(11): 288-292
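The conflict between comprehensive information and the curse of dimensionality is the foremost problem that network situation awareness must solve in the big-data era. Feature extraction has long been the mainstream approach to dimensionality reduction, but existing algorithms perform poorly on high-dimensional nonlinear data. Deep learning is a class of learning algorithms with multiple layers of nonlinear mapping that can approximate complex functions, but it is very sensitive to the hidden-layer parameters. To address these problems, the idea of evolutionary algorithms is introduced into deep learning, and a feature extraction algorithm based on evolutionary deep learning is proposed. The algorithm exploits the global search and optimization capabilities of genetic algorithms and evolution strategies to optimize the deep learning structure and its related parameters. Both theoretical analysis and experimental results demonstrate the effectiveness of the algorithm.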

11.
Task decomposition and state abstraction are crucial parts of reinforcement learning. They allow an agent to ignore aspects of its current state that are irrelevant to its current decision, and therefore speed up dynamic programming and learning. This paper presents the SVI algorithm, which uses a dynamic Bayesian network model to construct an influence graph that indicates relationships between state variables. SVI performs state abstraction for each subtask by ignoring irrelevant state variables and lower-level subtasks. Experimental results show that the task decomposition introduced by SVI can significantly accelerate the construction of a near-optimal policy. This general framework can be applied to a broad spectrum of complex real-world problems such as robotics, industrial manufacturing, and games.

12.
万建武, 杨明. 《软件学报》 (Journal of Software), 2013, 24(11): 2597-2609
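Traditional dimensionality reduction methods pursue a low recognition error rate and assume that all misclassifications have the same cost, an assumption that often fails in practice. For example, a face-recognition-based access control system has an intruder class and legitimate-user classes. The loss from misclassifying an intruder as a legitimate user is usually higher than that from misclassifying a legitimate user as an intruder, which in turn is greater than that from misclassifying a legitimate user as another legitimate user. To address this, we first analyze the face-recognition access control system and cast it as a cost-sensitive subclass learning problem. We then inject both the misclassification costs and the subclass information into the discriminant analysis framework, and propose a dimensionality reduction algorithm that approximates the pairwise Bayes risk criterion. Experimental results on the Extended Yale B and ORL face data sets demonstrate the effectiveness of the algorithm.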
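The cost asymmetry described in this abstract can be illustrated with the standard minimum-expected-cost (Bayes risk) decision rule. The sketch below is not the paper's dimensionality reduction algorithm; it only shows why unequal costs change the decision. The class names, cost values, and posterior probabilities are hypothetical, chosen so that intruder-as-legitimate is the most expensive error, legitimate-as-intruder is cheaper, and legitimate-as-other-legitimate is cheapest.

```python
def min_risk_decision(posteriors, cost):
    """Pick the label with the lowest expected misclassification cost.

    posteriors: dict mapping true label -> P(label | x)
    cost: nested dict, cost[true_label][predicted_label]
    """
    labels = list(posteriors)
    risks = {
        pred: sum(posteriors[true] * cost[true][pred] for true in labels)
        for pred in labels
    }
    return min(risks, key=risks.get), risks


# Hypothetical cost matrix for a door-access system with two users.
cost = {
    "intruder": {"intruder": 0, "alice": 20, "bob": 20},
    "alice": {"intruder": 2, "alice": 0, "bob": 1},
    "bob": {"intruder": 2, "alice": 1, "bob": 0},
}
# Hypothetical posterior probabilities for one face image.
posteriors = {"intruder": 0.2, "alice": 0.5, "bob": 0.3}

decision, risks = min_risk_decision(posteriors, cost)
most_probable = max(posteriors, key=posteriors.get)
print("most probable:", most_probable, "minimum-risk decision:", decision)
```

With these assumed numbers the most probable class is a legitimate user, yet the minimum-risk decision is to reject the person as an intruder, because admitting a real intruder is far more costly than rejecting a legitimate user.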

13.
Dimensionality reduction (DR) has been a central research topic in information theory, pattern recognition, and machine learning. The performance of many learning models relies significantly on dimensionality reduction: successful DR can greatly improve various approaches in clustering and classification, while inappropriate DR may degrade them. When applied to high-dimensional data, some existing approaches first reduce the dimensionality and then feed the reduced features to another available model, e.g., a Gaussian mixture model (GMM). Such independent learning can, however, significantly limit performance, since the optimal subspace given by a particular DR approach may not be appropriate for the subsequent model. In this paper, we investigate how unsupervised dimensionality reduction can be performed together with a GMM and whether such joint learning leads to improvement over the traditional unsupervised method. In particular, we use the mixture of factor analyzers with the assumption that a common factor loading exists for all the components. Based on that, we present an EM algorithm that converges to a locally optimal solution. This setting exactly optimizes the dimensionality reduction together with the parameters of the GMM. We describe the framework, detail the algorithm, and conduct a series of experiments to validate the effectiveness of the proposed approach. Specifically, we compare the proposed joint learning approach with two competitive algorithms on one synthetic and six real data sets. Experimental results show that the joint learning significantly outperforms the comparison methods in terms of three criteria.

14.
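As an important branch of machine learning and artificial intelligence, multi-agent hierarchical reinforcement learning combines, in a general form, the cooperative ability of multiple agents with the decision-making ability of reinforcement learning. By decomposing a complex reinforcement learning problem into several subproblems and solving them separately, it can effectively alleviate the curse of dimensionality. This makes multi-agent hierarchical reinforcement learning a potential approach to intelligent decision-making in large-scale, complex settings. The paper first describes the main techniques involved in multi-agent hierarchical reinforcement learning, including reinforcement learning, semi-Markov decision processes, and multi-agent reinforcement learning. Then, from the perspective of hierarchy, it surveys the algorithmic principles and research status of four kinds of multi-agent hierarchical reinforcement learning methods: option-based, hierarchical-abstract-machine-based, value-function-decomposition-based, and end-to-end. Finally, it describes current applications of multi-agent hierarchical reinforcement learning in robot control, game decision-making, and task planning.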

15.
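Within the semi-supervised dimensionality reduction (SSDR) framework, a semi-supervised dimensionality reduction algorithm, SCSSDR, is proposed based on pairwise constraints. Graphs are constructed from pairwise samples so that the global structure of the data is taken into account while the local structure is preserved. Optimizing the objective function makes same-class samples more compact and different-class samples more dispersed. Quantitative analysis on UCI data sets shows that the method outperforms PCA and traditional manifold learning algorithms. Further classification experiments on UCI and hyperspectral data sets show that the method is suitable for feature extraction for classification.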

16.
Many image recognition algorithms based on data learning perform dimensionality reduction before the actual learning and classification, because the high dimensionality of raw imagery would require enormous training sets to achieve satisfactory performance. A potential problem with this approach is that most dimensionality reduction techniques, such as principal component analysis (PCA), seek to capture as much of the data variation as possible in a small number of components, without considering interclass discriminability. This paper presents a neural-network-based transformation that simultaneously seeks to provide dimensionality reduction and a high degree of discriminability by combining the learning mechanism of a neural-network-based PCA with a backpropagation learning algorithm. The joint discrimination-compression algorithm is applied to infrared imagery to detect military vehicles.

17.
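With the rapid growth of power communication networks, predicting the online status of communication devices in the network is important for improving operation and maintenance reliability. In practical settings, device operating data come from complex sources and often suffer from high dimensionality, sparse features, and repeated patterns, which severely limits the performance of traditional prediction methods. This paper proposes a device status prediction model based on an attention mechanism and LSTM (long short-term memory) modules. Training proceeds in two stages to ensure that the attention mechanism, through end-to-end learning, sufficiently reduces the dimensionality of the raw features and extracts the most relevant information for status prediction. A series of validation experiments on real operation and maintenance data from a power communication network demonstrates the effectiveness of the proposed method for device status prediction.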

18.
19.
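Provided that the state space satisfies a structural condition, the complex original MDP is decomposed hierarchically into a set of simple MDP or SMDP subproblems by partitioning the state space along its dimensions, and the hierarchical structure is refined online. Embedding different reinforcement learning methods in the hierarchy yields different hierarchical learning schemes. The proposed method retains the advantages of hierarchical reinforcement learning, such as fast learning and ease of sharing, while reducing the dependence on prior knowledge and alleviating the scarcity of reward in the early stage of learning.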

20.
We examined critical characteristics of fluent cognitive skills, using the Georgia Tech Aegis Simulation Program, a tactical decision-making computer game that simulates tasks of an anti-air-warfare coordinator. To characterize learning, we adopted the unit-task analysis framework, in which a task is decomposed into several unit tasks that are further decomposed into functional-level subtasks. Our results showed that learning at a global level could be decomposed into learning smaller component tasks. Further, most learning was associated with a reduction in cognitive processes, in which people make inferences from the currently available information. Eye-movement data also revealed that the time spent on task-irrelevant regions of the display decreased more than did the time spent on task-relevant regions. In sum, although fluency in dynamic, complex problem solving was achieved by attaining efficiency in perceptual, motor, and cognitive processes, the magnitude of the gains depended on the preexisting fluency of the component skills. These results imply that a training program should decompose a task into its component skills and emphasize those components with which trainees have relatively little prior experience. Actual or potential applications of this research include learning and training of complex tasks as well as evaluation of performance on those tasks.

