Similar Documents
20 similar documents found (search time: 125 ms)
1.
Soccer robot cooperation based on modular Q-learning   (cited 2 times: 0 self-citations, 2 by others)
A new learning architecture for multi-robot systems is proposed. It reduces both the environment state space and the robots' action space, thereby speeding up learning. The effectiveness of the method was verified in robot soccer matches.
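As a concrete illustration of the kind of decomposition the abstract describes, modular Q-learning can be sketched as one small Q-table per state slice, with module values summed at action-selection time. This is a minimal sketch under assumed hyperparameters, not the paper's implementation:

```python
import random
from collections import defaultdict

class ModularQLearner:
    """Each module sees only one slice of the state, so every Q-table stays small."""
    def __init__(self, n_modules, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = [defaultdict(float) for _ in range(n_modules)]  # one Q-table per module
        self.actions, self.alpha, self.gamma, self.epsilon = actions, alpha, gamma, epsilon

    def value(self, substates, a):
        # Combine modules by summing their local action-value estimates.
        return sum(self.q[m][(s, a)] for m, s in enumerate(substates))

    def act(self, substates):
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.value(substates, a))

    def update(self, substates, a, r, next_substates):
        best_next = max(self.value(next_substates, b) for b in self.actions)
        td = r + self.gamma * best_next - self.value(substates, a)
        for m, s in enumerate(substates):
            self.q[m][(s, a)] += self.alpha * td / len(self.q)  # share the error across modules
```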

2.
Application of reinforcement learning to robot soccer   (cited 8 times: 1 self-citation, 8 by others)
Robot soccer is an interesting and complex emerging field of artificial intelligence research, and a typical multi-agent system. Reinforcement learning is applied to the action-selection problem of soccer robots: the single-agent reinforcement learning method is extended to a multi-agent reinforcement learning method, and experimental results are presented.
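The abstract does not spell out its extension here; one common, deliberately minimal reading is one independent Q-learner per robot, with teammates folded into the environment. The action names and parameters below are illustrative assumptions:

```python
import random
from collections import defaultdict

class IndependentQAgent:
    """Single-agent Q-learning reused per robot; other players are part of the environment."""
    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)
        self.actions, self.alpha, self.gamma, self.epsilon = actions, alpha, gamma, epsilon

    def act(self, state):
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s_next):
        best = max(self.q[(s_next, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best - self.q[(s, a)])

# One learner per robot; each perceives its own (local) view of the match.
team = [IndependentQAgent(actions=["shoot", "pass", "dribble", "defend"]) for _ in range(5)]
```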

3.
In multi-robot systems, the state space of reinforcement learning for cooperative environment exploration grows exponentially with the number of robots, and this enormous learning space makes convergence extremely slow. To address the problem, a reinforcement learning method based on action prediction, together with an action-selection strategy, is applied to multi-robot cooperation: predicting the probabilities of the actions other robots are likely to execute speeds up convergence of the learning algorithm. Experimental results show that the action-prediction-based method acquires multi-robot cooperation strategies faster than the original algorithm.
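One plausible minimal form of the action-prediction idea is an empirical frequency model of a teammate's action per state, used to compute an expected Q-value over the teammate's possible moves. The details below (Laplace prior, joint-Q keying) are assumptions for illustration, not taken from the paper:

```python
from collections import defaultdict

class ActionPredictor:
    """Empirical model of a teammate: P(their action | state) from observed frequencies."""
    def __init__(self, actions):
        self.counts = defaultdict(lambda: {a: 1 for a in actions})  # Laplace prior

    def observe(self, state, action):
        self.counts[state][action] += 1

    def predict(self, state):
        c = self.counts[state]
        total = sum(c.values())
        return {a: n / total for a, n in c.items()}

def expected_q(q, state, my_action, predictor):
    """Expected value of my_action when the teammate's action b is uncertain.
    q is assumed to be a dict keyed by (state, my_action, teammate_action)."""
    probs = predictor.predict(state)
    return sum(p * q.get((state, my_action, b), 0.0) for b, p in probs.items())
```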

4.
To address the large learning space and slow learning in complex multi-robot cooperative collection tasks, a two-layer reinforcement learning algorithm with a shared area is proposed. The algorithm learns not only low-level state-action pairs but also high-level condition-behavior pairs; learning at the condition-behavior level avoids a combinatorial explosion of the learning space, while the shared area strengthens cooperative learning among the robots. Simulation results show that the method speeds up learning and meets the requirements of complex multi-robot collection tasks in unknown environments.

5.
Multi-agent robot reinforcement learning with an adaptive fuzzy RBF neural network   (cited 3 times: 0 self-citations, 3 by others)
Learning in a multi-robot environment involves continuous states, continuous actions, and multiple robots, so the learning space is enormous and directly applying the Q-learning algorithm gives unsatisfactory results. For the learning problem of multi-agent robot systems, this paper proposes an adaptive fuzzy RBF neural network reinforcement learning algorithm. The network itself provides fuzzy inference, strong function approximation, and good generalization; it therefore combines human expert knowledge with machine learning, reduces the complexity of the learning problem, and realizes policy learning over continuous state and action spaces.
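A sketch of the function-approximation core: a Gaussian RBF layer with linear output weights approximating Q(s, ·). The paper's fuzzy-inference and adaptive-center machinery is omitted here; fixed centers and the learning rate are illustrative assumptions:

```python
import numpy as np

class RBFQNetwork:
    """Q(s, .) approximated by Gaussian RBF features and a linear output layer."""
    def __init__(self, centers, widths, n_actions, lr=0.05):
        self.centers = np.asarray(centers)        # (n_rbf, state_dim)
        self.widths = np.asarray(widths)          # (n_rbf,)
        self.w = np.zeros((len(self.centers), n_actions))
        self.lr = lr

    def features(self, s):
        d2 = np.sum((self.centers - s) ** 2, axis=1)
        return np.exp(-d2 / (2 * self.widths ** 2))

    def q_values(self, s):
        return self.features(s) @ self.w

    def update(self, s, a, target):
        phi = self.features(s)
        td = target - phi @ self.w[:, a]
        self.w[:, a] += self.lr * td * phi        # gradient step on the linear weights
```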

6.
Hierarchical learning for soccer robots based on artificial neural networks   (cited 10 times: 2 self-citations, 8 by others)
This work studies the application of artificial neural networks to robot soccer. A hierarchical learning model in which soccer robots use BP networks to learn basic actions and behavior decisions is introduced, and several improvements to the BP algorithm are discussed. Combining the BP network with a production system, a hybrid action selector is proposed and evaluated experimentally, and the results are reported.
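As a rough sketch of the hybrid selector idea: production rules fire first, and a BP (backpropagation-trained multilayer) network scores the basic actions otherwise. Training is omitted, and the rule and network interfaces are assumptions for illustration:

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """One hidden layer with sigmoid units -- the classic BP-network architecture."""
    h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))
    return W2 @ h + b2                        # one score per basic action

def hybrid_select(x, rules, net_params, actions):
    """Production rules take priority; the BP network decides otherwise."""
    for condition, action in rules:           # rules: [(predicate(x) -> bool, action)]
        if condition(x):
            return action
    scores = mlp_forward(x, *net_params)
    return actions[int(np.argmax(scores))]
```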

7.
The term "reinforcement learning" comes from behavioral psychology, which views behavior learning as a trial-and-error process that maps environment states to corresponding actions. In designing an intelligent robot, how can this behaviorist idea be realized so that actions are learned through interaction with the environment? Here, the actions a robot takes to avoid obstacles in an unknown environment are treated as a behavior, and reinforcement learning is used to learn that collision-avoidance behavior. To increase learning speed, quantization of the state space in the robot's local path planning is crucial; a self-organizing map (SOM) is used to quantize the space, since its inherent self-organizing property handles adaptability and flexibility well. On top of this self-organized quantization of the state space, reinforcement learning is applied, solving the robot's collision-avoidance learning problem with satisfactory results.
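A minimal sketch of the SOM quantization step the abstract relies on: a one-dimensional map whose best-matching unit index becomes the discrete state fed to the learner. Map size, learning rate, and neighborhood radius are illustrative assumptions:

```python
import numpy as np

class SOMQuantizer:
    """1-D self-organizing map: continuous sensor vectors -> discrete state indices."""
    def __init__(self, n_units, dim, lr=0.3, radius=2.0, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.uniform(0, 1, (n_units, dim))
        self.lr, self.radius = lr, radius

    def bmu(self, x):
        return int(np.argmin(np.linalg.norm(self.w - x, axis=1)))

    def train_step(self, x):
        b = self.bmu(x)
        dist = np.abs(np.arange(len(self.w)) - b)          # grid distance to the winner
        h = np.exp(-(dist ** 2) / (2 * self.radius ** 2))  # neighborhood function
        self.w += self.lr * h[:, None] * (x - self.w)
        return b                                           # discrete state for the Q-learner
```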

8.
Starting from hierarchical reinforcement learning, a hierarchical cooperation approach derived from task-structure analysis in a multi-agent system (MAS) is proposed. By distinguishing subtasks and building on them a coarser-grained, task-scenario-based state space, combined with joint-action-based task actions and a potential-field model, the curse of dimensionality in the reinforcement learning state space is addressed. An application of the subtask algorithm to robot soccer is given, and its effectiveness is verified experimentally.

9.
Multi-agent reinforcement learning and its application to role assignment in soccer robots   (cited 2 times: 0 self-citations, 2 by others)
A soccer robot system is a typical multi-agent system: each robot player's choice of action depends not only on its own state but also on the other players, so realizing soccer-robot decision policies through reinforcement learning requires joint states and joint actions. This paper studies a multi-agent reinforcement learning algorithm based on predicting agents' actions, using a naive Bayes classifier to predict the actions of the other agents, and introduces a policy-sharing mechanism that exchanges the policies the agents have learned, improving the speed of multi-agent reinforcement learning. Finally, the method is applied to dynamic role assignment for soccer robots, realizing division of labor and cooperation among multiple robots.
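A minimal naive Bayes predictor in the spirit of the abstract, estimating P(action | state features) for a teammate from observed play. The Laplace smoothing below (which assumes binary feature values) is an illustrative choice, not the paper's:

```python
from collections import defaultdict
import math

class NaiveBayesActionPredictor:
    """P(action | f1..fn) ∝ P(action) * Π P(fi | action), with Laplace smoothing."""
    def __init__(self, actions):
        self.actions = actions
        self.action_counts = defaultdict(int)
        self.feature_counts = defaultdict(int)   # key: (action, feature_index, value)

    def observe(self, features, action):
        self.action_counts[action] += 1
        for i, v in enumerate(features):
            self.feature_counts[(action, i, v)] += 1

    def predict(self, features):
        total = sum(self.action_counts.values()) or 1
        scores = {}
        for a in self.actions:
            log_p = math.log((self.action_counts[a] + 1) / (total + len(self.actions)))
            for i, v in enumerate(features):
                # +2 in the denominator assumes binary feature values.
                log_p += math.log((self.feature_counts[(a, i, v)] + 1)
                                  / (self.action_counts[a] + 2))
            scores[a] = log_p
        return max(scores, key=scores.get)
```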

10.
Self-balancing control of a two-wheeled robot based on reinforcement learning rules   (cited 1 time: 0 self-citations, 1 by others)
A two-wheeled robot is a typical unstable, nonlinear, strongly coupled self-balancing system. With the robot's system model unknown and no prior experience available, a reinforcement learning algorithm is combined with a fuzzy neural network, guaranteeing fast and convergent function approximation; self-learned balance control of the two-wheeled robot is achieved, and the reinforcement learning problem over its continuous state and action spaces is solved. Simulations and experiments show that the method not only achieves balance control within a very short time, but also maintains balance when the robot's parameters vary considerably.

11.
This paper proposes an algorithm to deal with continuous state/action spaces in the reinforcement learning (RL) problem. Extensive studies have been done to solve continuous-state RL problems, but more research is needed for RL problems with continuous action spaces. Due to the non-stationary, very large, and continuous nature of RL problems, the proposed algorithm uses two growing self-organizing maps (GSOM) to elegantly approximate the state/action space through the addition and deletion of neurons. It has been demonstrated that GSOM has better performance in topology preservation, quantization-error reduction, and non-stationary distribution approximation than the standard SOM. The novel algorithm proposed in this paper attempts to simultaneously find the best representation for the state space, an accurate estimation of Q-values, and an appropriate representation for highly rewarded regions in the action space. Experimental results on delayed-reward, non-stationary, and large-scale problems demonstrate very satisfactory performance of the proposed algorithm.
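A toy version of the growth rule in a GSOM: a unit that accumulates too much quantization error spawns a neighbor. The paper's algorithm also deletes neurons and maintains map topology; this sketch shows only the growing step, with assumed thresholds:

```python
import numpy as np

class GrowingSOM:
    """Minimal growing map: an overloaded unit spawns a new unit near itself."""
    def __init__(self, dim, lr=0.2, grow_threshold=5.0, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.uniform(0, 1, (2, dim))     # start with two units
        self.err = np.zeros(2)
        self.lr, self.grow_threshold = lr, grow_threshold

    def step(self, x):
        d = np.linalg.norm(self.w - x, axis=1)
        b = int(np.argmin(d))
        self.w[b] += self.lr * (x - self.w[b])   # move the winner toward the sample
        self.err[b] += d[b]
        if self.err[b] > self.grow_threshold:    # insert a unit near the overloaded one
            new = (self.w[b] + x) / 2
            self.w = np.vstack([self.w, new])
            self.err = np.append(self.err, 0.0)
            self.err[b] = 0.0
        return b
```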

12.
Design of heuristic reward functions in reinforcement learning algorithms and analysis of their convergence   (cited 3 times: 0 self-citations, 3 by others)
(Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016)
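No abstract is available for this entry. The canonical convergence-preserving recipe for heuristic reward design is potential-based shaping, for which Ng, Harada and Russell (1999) proved that the optimal policy is unchanged; whether this paper uses exactly that construction is an assumption. A minimal sketch with a hypothetical potential function:

```python
def shaped_reward(r, s, s_next, phi, gamma=0.99):
    """Potential-based shaping: F = gamma * phi(s') - phi(s).
    Adding F to the environment reward provably preserves the optimal policy
    (Ng, Harada & Russell, 1999), the usual route to a convergence guarantee
    for heuristic reward design."""
    return r + gamma * phi(s_next) - phi(s)

# Example heuristic potential: negative distance to a goal state at 10.
phi = lambda s: -abs(s - 10)
print(shaped_reward(0.0, s=3, s_next=4, phi=phi))  # moving closer yields a bonus
```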

13.
Engineers and researchers are paying more attention to reinforcement learning (RL) as a key technique for realizing adaptive and autonomous decentralized systems. In general, however, it is not easy to put RL into practical use. Our approach mainly deals with the problem of designing state and action spaces. Previously, an adaptive state-space construction method called the "state space filter" and an adaptive action-space construction method called "switching RL" had each been proposed with the other space held fixed. We then unified these two construction methods into a single method, treating the combination as a way of mimicking an infant's perceptual and motor development, and proposed a method based on introducing and referring to "entropy". In this paper, a computational experiment was conducted on a "robot navigation problem" with a three-dimensional continuous state space and a two-dimensional continuous action space, which is more complicated than the usual "path planning problem". The results confirm the validity of the proposed method.

14.
The key approaches for machine learning, particularly learning in unknown probabilistic environments, are new representations and computation mechanisms. In this paper, a novel quantum reinforcement learning (QRL) method is proposed by combining quantum theory and reinforcement learning (RL). Inspired by the state superposition principle and quantum parallelism, a framework of a value-updating algorithm is introduced. The state (action) in traditional RL is identified as the eigenstate (eigenaction) in QRL. The state (action) set can be represented with a quantum superposition state, and the eigenstate (eigenaction) can be obtained by randomly observing the simulated quantum state according to the collapse postulate of quantum measurement. The probability of the eigenaction is determined by the probability amplitude, which is updated in parallel according to rewards. Some related characteristics of QRL such as convergence, optimality, and balancing between exploration and exploitation are also analyzed, showing that this approach makes a good tradeoff between exploration and exploitation using the probability amplitude and can speed up learning through quantum parallelism. To evaluate the performance and practicability of QRL, several simulated experiments are given, and the results demonstrate the effectiveness and superiority of the QRL algorithm for some complex problems. This paper is also an effective exploration of the application of quantum computation to artificial intelligence.
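A classical simulation of the amplitude bookkeeping described above: probabilities are squared amplitudes, "measurement" is sampling, and rewards reshape the amplitudes. The update constant and renormalization scheme below are illustrative assumptions; the paper's algorithm performs proper quantum amplitude updating:

```python
import numpy as np

class AmplitudeActionSelector:
    """Action set as a superposition: probabilities are squared amplitudes,
    'measurement' is sampling, and rewards reshape the amplitudes."""
    def __init__(self, n_actions, k=0.05, seed=0):
        self.amp = np.ones(n_actions) / np.sqrt(n_actions)  # uniform superposition
        self.k = k
        self.rng = np.random.default_rng(seed)

    def measure(self):
        p = self.amp ** 2
        return int(self.rng.choice(len(self.amp), p=p / p.sum()))

    def reinforce(self, action, reward):
        self.amp[action] += self.k * reward       # grow the amplitude of rewarded actions
        self.amp = np.clip(self.amp, 1e-6, None)
        self.amp /= np.linalg.norm(self.amp)      # renormalize so probabilities sum to 1
```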

15.
Reinforcement Learning (RL) is a well-known technique for learning the solutions of control problems from the interactions of an agent in its domain. However, RL is known to be inefficient in real-world problems where the state space and the set of actions grow fast. Recently, heuristics, case-based reasoning (CBR), and transfer learning have been used as tools to accelerate the RL process. This paper investigates a class of algorithms called Transfer Learning Heuristically Accelerated Reinforcement Learning (TLHARL) that uses CBR as heuristics within a transfer learning setting to accelerate RL. The main contributions of this work are the proposal of a new TLHARL algorithm based on the traditional RL algorithm Q(λ) and the application of TLHARL to two distinct real-robot domains: robot soccer with small-scale robots and humanoid-robot stability learning. Experimental results show that our proposed method led to a significant improvement of the learning rate in both domains.
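The family of heuristically accelerated algorithms this builds on typically selects actions by maximizing Q(s, a) + ξ·H(s, a), where H is the heuristic (here it would come from transferred cases). A minimal sketch, with the case-retrieval step abstracted into a lookup table:

```python
import random

def harl_action(q, heuristic, state, actions, xi=1.0, epsilon=0.1):
    """Heuristically accelerated selection: the heuristic H (e.g., derived from
    transferred cases) biases but never replaces the learned Q-values."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0)
                                      + xi * heuristic.get((state, a), 0.0))
```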

16.
Dyna-Q, a well-known model-based reinforcement learning (RL) method, interplays offline simulations and action executions to update Q functions. It creates a world model that predicts the feature values in the next state and the reward function of the domain directly from the data and uses the model to train Q functions to accelerate policy learning. In general, tabular methods are used in Dyna-Q to establish the model, but a tabular model needs many more samples of experience to approximate the environment concisely. In this article, an adaptive model learning method based on tree structures is presented to enhance sampling efficiency in building the world model. The proposed method produces simulated experiences for indirect learning, so the agent has additional experience for updating the policy. The agent works backwards from collections of state transitions and associated rewards, utilizing coarse coding to learn their definitions for the region of state space that tracks back to the precedent states. The proposed method estimates the reward and transition probabilities between states from past experience. Because the resultant tree is always concise and small, the agent can use value iteration to quickly estimate the Q-values of each action in the induced states and determine a policy. The effectiveness and generality of our method are further demonstrated in two numerical simulations, a mountain car and a mobile robot in a maze. The simulation results demonstrate that our method clearly improves the training rate.
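For reference, the tabular Dyna-Q baseline that the article's tree-based model replaces can be sketched as follows; `env_step` is a hypothetical environment function and all hyperparameters are illustrative:

```python
import random
from collections import defaultdict

def dyna_q(env_step, start_state, actions, steps=200, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Dyna-Q: one real step, then n simulated updates from the learned model."""
    q = defaultdict(float)
    model = {}                                    # (s, a) -> (r, s')
    s = start_state
    for _ in range(steps):
        a = (random.choice(actions) if random.random() < epsilon
             else max(actions, key=lambda b: q[(s, b)]))
        r, s2 = env_step(s, a)                    # real experience
        best = max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (r + gamma * best - q[(s, a)])
        model[(s, a)] = (r, s2)
        for _ in range(planning_steps):           # indirect learning from the model
            ps, pa = random.choice(list(model))
            pr, ps2 = model[(ps, pa)]
            pbest = max(q[(ps2, b)] for b in actions)
            q[(ps, pa)] += alpha * (pr + gamma * pbest - q[(ps, pa)])
        s = s2
    return q
```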

17.
A reinforcement learning method with adaptive discretization of continuous states based on K-means clustering   (cited 5 times: 1 self-citation, 5 by others)
文锋, 陈宗海, 卓睿, 周光明. 《控制与决策》 (Control and Decision), 2006, 21(2): 143-148
A clustering algorithm is used to adaptively discretize a continuous state space, yielding a reinforcement learning method based on K-means clustering. The learning process has two parts: state-space learning, which adaptively discretizes the continuous state space using the K-means clustering algorithm, and policy learning, which seeks the optimal policy using the Sarsa learning algorithm with replacing eligibility traces. Simulation experiments on benchmark continuous-state reinforcement learning problems show that the method achieves adaptive discretization of the continuous state space and ultimately learns the optimal policy. Compared with a reinforcement learning method based on a CMAC network, it saves storage space and reduces computation time.
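A compact sketch of the two-part scheme: K-means centroids learned from visited states define the discrete states, and Sarsa(λ) with replacing traces learns the policy over them. Array shapes and hyperparameters are illustrative assumptions:

```python
import numpy as np

def kmeans(samples, k, iters=20, seed=0):
    """Plain K-means over visited continuous states; centroids define discrete states."""
    rng = np.random.default_rng(seed)
    centers = samples[rng.choice(len(samples), k, replace=False)].astype(float)
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(samples[:, None] - centers, axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = samples[labels == j].mean(axis=0)
    return centers

def discretize(x, centers):
    return int(np.argmin(np.linalg.norm(centers - x, axis=1)))

def sarsa_replace_update(q, e, s, a, r, s2, a2, alpha=0.1, gamma=0.99, lam=0.9):
    """Sarsa(lambda) with replacing traces; q and e are (n_states, n_actions) arrays."""
    delta = r + gamma * q[s2, a2] - q[s, a]
    e *= gamma * lam
    e[s, a] = 1.0             # replacing trace: reset instead of accumulate
    q += alpha * delta * e
```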

18.
Multi-skill project scheduling suffers from combinatorial explosion, and its complexity far exceeds that of traditional single-skill project scheduling; both heuristic and metaheuristic algorithms have shortcomings when solving it. Based on the characteristics of project scheduling and the logic of reinforcement learning, this paper designs a reinforcement-learning-based multi-skill project scheduling algorithm. First, the multi-skill project scheduling process is modeled as a sequential decision process satisfying the Markov property, and a dual-agent mechanism is designed on that basis. Then, state aggregation and action decomposition reduce the difficulty of learning the value function. Finally, to further improve performance, a skill-merging method targeting the multi-skill nature of resources is designed, significantly reducing the time complexity of the resource-allocation algorithm. Comparative experiments show that the proposed reinforcement learning algorithm achieves better solutions than heuristics, and is more stable and faster than metaheuristics.

19.
Reinforcement learning is one of the more important methods in artificial intelligence; it has developed considerably since it was proposed and can be used to solve many problems. However, when faced with large-scale state spaces, ordinary reinforcement learning methods suffer from the "curse of dimensionality". Relational reinforcement learning was therefore proposed: applying reinforcement learning in relational domains can alleviate the curse of dimensionality to some extent. On this basis, this paper briefly introduces the concept of relational reinforcement learning and related algorithms, as well as open problems that remain to be solved.

20.
Path-based relational reasoning over knowledge graphs has become increasingly popular due to a variety of downstream applications such as question answering in dialogue systems, fact prediction, and recommendation systems. In recent years, reinforcement learning (RL) based solutions for knowledge graphs have been demonstrated to be more interpretable and explainable than other deep learning models. However, the current solutions still struggle with performance issues due to incomplete state representations and large action spaces for the RL agent. We address these problems by developing HRRL (Heterogeneous Relational reasoning with Reinforcement Learning), a type-enhanced RL agent that utilizes the local heterogeneous neighborhood information for efficient path-based reasoning over knowledge graphs. HRRL improves the state representation using a graph neural network (GNN) for encoding the neighborhood information and utilizes entity type information for pruning the action space. Extensive experiments on real-world datasets show that HRRL outperforms state-of-the-art RL methods and discovers more novel paths during the training procedure, demonstrating the explorative power of our method.
