Similar Documents
 20 similar documents found (search time: 62 ms)
1.
Multi-agent reinforcement learning is mainly investigated from two perspectives: concurrent learning and game theory. The former chiefly applies to cooperative multi-agent systems, while the latter usually applies to coordinated multi-agent systems. However, both suffer from problems such as credit assignment and multiple Nash equilibria. In this paper, we propose a new multi-agent reinforcement learning model and algorithm, LMRL, from a layered perspective. The LMRL model is composed of an off-line training layer, which employs single-agent reinforcement learning to acquire stationary strategy knowledge, and an online interaction layer, which employs multi-agent reinforcement learning together with dynamically revisable strategy knowledge to interact with the environment. An agent with LMRL can improve its generalization capability, adaptability and coordination ability. Experiments show that LMRL can outperform both single-agent reinforcement learning and Nash-Q.
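
As a rough illustration of the layered idea (a sketch, not the authors' implementation), the code below first trains a tabular Q-table off-line in a single-agent setting and then keeps revising the same table online while the other agents act as part of the environment; the `env` interface and hyperparameters are assumptions.

```python
import random
from collections import defaultdict

def train_offline(env, episodes=500, alpha=0.1, gamma=0.95, eps=0.2):
    """Off-line layer: single-agent Q-learning to acquire stationary strategy knowledge."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = random.choice(env.actions) if random.random() < eps else \
                max(env.actions, key=lambda a_: Q[(s, a_)])
            s2, r, done = env.step(a)          # assumed (next_state, reward, done) API
            best_next = max(Q[(s2, a_)] for a_ in env.actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q

def act_online(Q, env, state, alpha=0.05, gamma=0.95):
    """Online layer: reuse the off-line Q-table, but keep revising it while
    interacting with the other agents (treated as part of the environment)."""
    a = max(env.actions, key=lambda a_: Q[(state, a_)])
    s2, r, done = env.step(a)                  # reward now also depends on other agents
    best_next = max(Q[(s2, a_)] for a_ in env.actions)
    Q[(state, a)] += alpha * (r + gamma * best_next - Q[(state, a)])
    return a, s2, done
```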

2.
Research on Methods for Improving the Speed of Reinforcement Learning   Cited by: 4 (self-citations: 0, other citations: 4)
The term reinforcement learning originates from behavioral psychology, which views learning as a trial-and-error process that maps environmental states to actions. This characteristic inevitably increases the difficulty faced by intelligent systems and lengthens learning time. Reinforcement learning is slow mainly because there is no explicit supervisory signal: when interacting with the environment, a reinforcement learning system has to adjust its behavior by trial and error, relying only on an external evaluation signal, so the intelligent system must go through a long learning process. How to speed up reinforcement learning is therefore one of the most important research problems. This paper discusses methods for improving the speed of reinforcement learning from several perspectives.
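
To make the trial-and-error nature concrete, here is a minimal tabular Q-learning loop on a toy one-dimensional chain task; it is a generic illustration rather than any specific method from the paper. Optimistic initialization of the Q-table, used below, is one well-known way to shorten the exploration phase.

```python
import random

N_STATES, GOAL, ACTIONS = 6, 5, (-1, +1)   # small 1-D chain: move left or right
alpha, gamma, eps = 0.1, 0.9, 0.1

# Optimistic initialization (Q starts high) encourages systematic exploration
# and is one simple way to speed up learning.
Q = {(s, a): 1.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(200):
    s = 0
    while s != GOAL:
        # Trial and error: no supervised target, only a scalar reward signal.
        a = random.choice(ACTIONS) if random.random() < eps else \
            max(ACTIONS, key=lambda a_: Q[(s, a_)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == GOAL else 0.0
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

print(max(ACTIONS, key=lambda a_: Q[(0, a_)]))   # learned action at the start state
```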

3.
Although deep reinforcement learning can solve many complex control problems, it pays the price of requiring a large amount of interaction with the environment, which is a major challenge for the field. One cause of this problem is that relying on the value-function loss alone makes it hard for the agent to extract effective features from high-dimensional, complex inputs; the agent therefore understands its states poorly and cannot assign values to them correctly. To help the agent understand its environment and improve sample efficiency, this paper proposes a representation learning method that combines forward state prediction with a latent-space constraint (regularized predictive representation learning, RPRL). It helps the agent learn and extract state features from high-dimensional visual inputs, thereby improving the sample efficiency of reinforcement learning. The method uses a forward state-transition loss as an auxiliary loss so that the learned features contain dynamic information about environment transitions. On top of the forward prediction, a regularization term constrains the state representation in the latent space, further helping the agent learn smooth, regular representations of the high-dimensional inputs. In the DeepMind Control (DMControl) suite, the method achieves better performance than other model-based methods and than model-free methods augmented with representation learning.
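
The PyTorch-style sketch below shows one plausible form of such an auxiliary objective: an encoder maps observations to latent states, a transition model predicts the next latent state from the current latent and the action, and an L2 penalty on the latent codes acts as the regularizer. The network sizes, loss weights, and module names are assumptions, not the exact RPRL architecture.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps a high-dimensional observation to a low-dimensional latent state."""
    def __init__(self, obs_dim, latent_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                 nn.Linear(256, latent_dim))
    def forward(self, obs):
        return self.net(obs)

class TransitionModel(nn.Module):
    """Predicts the next latent state from the current latent state and action."""
    def __init__(self, latent_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim + action_dim, 256), nn.ReLU(),
                                 nn.Linear(256, latent_dim))
    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=-1))

def representation_loss(encoder, transition, obs, action, next_obs, reg_weight=1e-3):
    z = encoder(obs)
    with torch.no_grad():                      # stop-gradient target from next observation
        z_next_target = encoder(next_obs)
    z_next_pred = transition(z, action)
    forward_loss = ((z_next_pred - z_next_target) ** 2).mean()   # forward prediction loss
    reg_loss = (z ** 2).mean()                                   # latent-space constraint
    return forward_loss + reg_weight * reg_loss
```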

4.
Reinforcement learning is about learning agent models that make the best sequential decisions in unknown environments. In an unknown environment, the agent needs to explore the environment while exploiting the collected information, which usually forms a sophisticated problem to solve. Derivative-free optimization, meanwhile, is capable of solving sophisticated problems. It commonly uses a sampling-and-updating framework to iteratively improve the solution, where exploration and exploitation also need to be well balanced. Derivative-free optimization therefore deals with a core issue similar to that of reinforcement learning, and has been introduced into reinforcement learning approaches under the names of learning classifier systems and neuroevolution/evolutionary reinforcement learning. Although such methods have been developed for decades, derivative-free reinforcement learning has recently been attracting increasing attention; however, an up-to-date survey of this topic is still lacking. In this article, we summarize methods of derivative-free reinforcement learning to date, and organize them by aspects including parameter updating, model selection, exploration, and parallel/distributed methods. Moreover, we discuss some current limitations and possible future directions, hoping that this article can bring more attention to this topic and serve as a catalyst for developing novel and efficient approaches.

5.
Recently, many models of reinforcement learning with hierarchical or modular structures have been proposed. They decompose a task into simpler subtasks and solve them by using multiple agents. However, these models impose certain restrictions, such as on the topological relations among agents. By relaxing these restrictions, we propose networked reinforcement learning, where each agent in a network acts autonomously by regarding the other agents as part of its environment. Although convergence to an optimal policy is no longer assured, numerical simulations show that our model functions appropriately, at least in certain simple situations. This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008.

6.
郭锐  彭军  吴敏 《计算机工程与应用》2005,41(13):36-38,146
Reinforcement learning is a branch of machine learning that improves a policy through interaction with the environment; its online and adaptive nature makes it a powerful tool for policy optimization. Multi-agent systems are a research hotspot in artificial intelligence, and research on multi-agent learning must be built on a model of the system's environment. Because multiple agents exist and influence one another, multi-agent systems are highly complex and their environments are non-deterministic Markov models, so it is inappropriate to apply reinforcement learning techniques based on the Markov model directly to multi-agent systems. Based on an independent learning mechanism among agents, this paper proposes an improved multi-agent Q-learning algorithm suited to non-deterministic Markov environments and studies its application in the multi-agent RoboCup domain. Experiments demonstrate the effectiveness and generalization ability of the technique. Finally, directions and further work in multi-agent reinforcement learning research are briefly outlined.

7.
Multi-agent reinforcement learning methods suffer from several deficiencies that are rooted in the large state space of multi-agent environments. This paper tackles two of them: the slow learning rate, and low-quality decision-making in the early stages of learning. The proposed methods are applied in a grid-world soccer game. In the proposed approach, modular reinforcement learning is applied to reduce the state space of the learning agents from exponential to linear in the number of agents. The modular model proposed here includes two new modules, a partial-module and a single-module, which are effective for increasing the speed of learning in the soccer game. We also apply instance-based learning concepts to choose proper actions in states that are not experienced adequately during learning; the key idea is to use neighbouring states that have been explored sufficiently during the learning phase. The results of experiments in the grid-soccer environment show that the proposed methods produce a higher average reward than the modular structure without them.
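
A minimal way to see why a modular decomposition keeps the tables linear in the number of agents: instead of one table over the joint state of all agents, keep one small table per other agent and combine them at decision time (here by summing Q-values). This is a generic sketch; the paper's partial-module and single-module designs and the instance-based fallback are not reproduced.

```python
from collections import defaultdict

class ModularQ:
    """One Q-table per (learner, other-agent) pair instead of one table over the
    joint state of all agents: storage grows linearly with the number of agents."""
    def __init__(self, actions, n_others, alpha=0.1, gamma=0.95):
        self.actions, self.alpha, self.gamma = actions, alpha, gamma
        self.modules = [defaultdict(float) for _ in range(n_others)]

    def value(self, my_state, other_states, action):
        # Combine modules by summing their Q-values for the candidate action.
        return sum(m[(my_state, o, action)]
                   for m, o in zip(self.modules, other_states))

    def act(self, my_state, other_states):
        return max(self.actions, key=lambda a: self.value(my_state, other_states, a))

    def update(self, my_state, other_states, action, reward, my_next, others_next):
        for m, o, o2 in zip(self.modules, other_states, others_next):
            best_next = max(m[(my_next, o2, a)] for a in self.actions)
            m[(my_state, o, action)] += self.alpha * (
                reward + self.gamma * best_next - m[(my_state, o, action)])
```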

8.
Autonomous agents that learn about their environment can be divided into two broad classes. One class of existing learners, reinforcement learners, typically employs weak learning methods to directly modify an agent's execution knowledge. These systems are robust in dynamic and complex environments but generally do not support planning or the pursuit of multiple goals. In contrast, symbolic theory revision systems learn declarative planning knowledge that allows them to pursue multiple goals in large state spaces, but these approaches are generally only applicable to fully sensed, deterministic environments with no exogenous events. This research investigates the hypothesis that by limiting an agent to procedural access to symbolic planning knowledge, the agent can combine the powerful, knowledge-intensive learning performance of theory revision systems with the robust performance in complex environments of reinforcement learners. The system, IMPROV, uses an expressive knowledge representation so that it can learn complex actions that produce conditional or sequential effects over time. By developing learning methods that require only limited procedural access to the agent's knowledge, IMPROV's learning remains tractable as the agent's knowledge is scaled to large problems. IMPROV learns to correct operator precondition and effect knowledge in complex environments that include such properties as noise, multiple agents and time-critical tasks, and demonstrates a general learning method that can be easily strengthened through the addition of many different kinds of knowledge.

9.
Multiagent learning provides a promising paradigm for studying how autonomous agents learn to achieve coordinated behavior in multiagent systems. In multiagent learning, the concurrency of multiple distributed learning processes makes the environment nonstationary for each individual learner. Developing an efficient learning approach to coordinate agents' behavior in this dynamic environment is difficult, especially when agents do not know the domain structure and have only local observability of the environment. In this paper, a coordinated learning approach is proposed to enable agents to learn where and how to coordinate their behavior in loosely coupled multiagent systems where the sparse interactions of agents constrain coordination to some specific parts of the environment. In the proposed approach, an agent first collects statistical information to detect those states where coordination is most necessary, by considering not only the potential contributions from all the domain states but also the direct causes of the miscoordination in a conflicting state. The agent then learns to coordinate its behavior with others through its local observability of the environment according to different scenarios of state transitions. To handle the uncertainties caused by agents' local observability, an optimistic estimation mechanism is introduced to guide the learning process. Empirical studies show that the proposed approach achieves better performance, improving the average agent reward compared with an uncoordinated learning approach and significantly reducing the computational complexity compared with a centralized learning approach. Copyright © 2012 John Wiley & Sons, Ltd.

10.
Reinforcement learning is one of the more prominent machine-learning technologies because of its unsupervised learning structure and its ability to continue learning even in a dynamic operating environment. Applying this learning to cooperative multi-agent systems not only allows each individual agent to learn from its own experience, but also offers the opportunity for the individual agents to learn from the other agents in the system, so the speed of learning can be accelerated. In the proposed learning algorithm, an agent adapts to comply with its peers by learning cautiously when it obtains a positive reinforcement feedback signal, but learns more aggressively if a negative reward follows the action just taken. These two properties are used to develop the proposed cooperative learning method. This research presents a novel use of the Win or Learn Fast (WoLF) policy hill-climbing method combined with policy sharing. Results from the multi-agent cooperative domain illustrate that the proposed algorithms perform better than Q-learning alone in a piano-mover environment. They also demonstrate that agents can learn to accomplish a task together efficiently through repetitive trials.
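
For reference, here is a compact sketch of the standard WoLF policy hill-climbing update for a single agent: the policy is moved toward the greedy action with a small step when the agent is "winning" (its current policy's expected value exceeds that of its average policy) and a larger step when it is "losing". Policy sharing among the cooperating agents and the piano-mover domain are not shown; the step sizes are assumptions.

```python
import numpy as np

class WoLFPHC:
    """Win or Learn Fast policy hill-climbing for one agent (tabular sketch)."""
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9,
                 delta_win=0.01, delta_lose=0.04):
        self.Q = np.zeros((n_states, n_actions))
        self.pi = np.full((n_states, n_actions), 1.0 / n_actions)      # current policy
        self.pi_avg = np.full((n_states, n_actions), 1.0 / n_actions)  # average policy
        self.counts = np.zeros(n_states)
        self.alpha, self.gamma = alpha, gamma
        self.d_win, self.d_lose = delta_win, delta_lose

    def act(self, s, rng=np.random):
        return rng.choice(len(self.pi[s]), p=self.pi[s])

    def update(self, s, a, r, s2):
        # Ordinary Q-learning step.
        self.Q[s, a] += self.alpha * (r + self.gamma * self.Q[s2].max() - self.Q[s, a])
        # Update the average policy.
        self.counts[s] += 1
        self.pi_avg[s] += (self.pi[s] - self.pi_avg[s]) / self.counts[s]
        # Win or Learn Fast: small step when winning, large step when losing.
        winning = self.pi[s] @ self.Q[s] > self.pi_avg[s] @ self.Q[s]
        delta = self.d_win if winning else self.d_lose
        best, n = self.Q[s].argmax(), len(self.pi[s])
        self.pi[s] -= delta / (n - 1)                  # decrease all actions...
        self.pi[s][best] += delta / (n - 1) + delta    # ...and increase the greedy one
        self.pi[s] = np.clip(self.pi[s], 0.0, 1.0)
        self.pi[s] /= self.pi[s].sum()                 # project back onto the simplex
```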

11.
A New Multi-Agent Q-Learning Algorithm   Cited by: 2 (self-citations: 0, other citations: 2)
郭锐  吴敏  彭军  彭姣  曹卫华 《自动化学报》2007,33(4):367-372
For multi-agent systems in non-deterministic Markov environments, a new multi-agent Q-learning algorithm is proposed. The algorithm learns the behavior policies of the other agents from statistics of joint actions, and uses the full probability distribution of the agents' policy vectors to guarantee the selection of the jointly optimal action. The convergence and learning performance of the algorithm are also analyzed. Its application in the multi-agent RoboCup domain further demonstrates the algorithm's effectiveness and generalization ability.
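
One common way to realize "learning the other agents' policies from joint-action statistics" is the joint-action-learner scheme: keep empirical counts of the other agent's actions per state, and pick the own action that maximizes the Q-value expected under that empirical distribution. The two-agent sketch below (names hypothetical) illustrates the idea rather than the exact algorithm of the paper.

```python
from collections import defaultdict

class JointActionLearner:
    """Two-agent sketch: Q is indexed by (state, own action, other's action);
    the other agent's policy is estimated from observed action counts."""
    def __init__(self, my_actions, other_actions, alpha=0.1, gamma=0.95):
        self.my_actions, self.other_actions = my_actions, other_actions
        self.alpha, self.gamma = alpha, gamma
        self.Q = defaultdict(float)                           # Q[(s, a_me, a_other)]
        self.counts = defaultdict(lambda: defaultdict(int))   # counts[s][a_other]

    def other_policy(self, s):
        total = sum(self.counts[s].values())
        if total == 0:                                        # no statistics yet: uniform
            return {b: 1.0 / len(self.other_actions) for b in self.other_actions}
        return {b: self.counts[s][b] / total for b in self.other_actions}

    def expected_q(self, s, a):
        p = self.other_policy(s)
        return sum(p[b] * self.Q[(s, a, b)] for b in self.other_actions)

    def act(self, s):
        # Best response against the estimated policy of the other agent.
        return max(self.my_actions, key=lambda a: self.expected_q(s, a))

    def update(self, s, a_me, a_other, r, s2):
        self.counts[s][a_other] += 1                          # joint-action statistics
        best_next = max(self.expected_q(s2, a) for a in self.my_actions)
        key = (s, a_me, a_other)
        self.Q[key] += self.alpha * (r + self.gamma * best_next - self.Q[key])
```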

12.
A Two-Layer Reinforcement Learning Method for Multi-Agent Cooperation   Cited by: 3 (self-citations: 0, other citations: 3)
A two-layer reinforcement learning method for multi-agent cooperation is proposed. The method is implemented mainly by building two layers of reinforcement learning units within each individual agent. It is applied to a computer simulation in which three agents cooperate to lift a circular object, and the results show that the agents cooperate better than with conventional reinforcement learning.

13.
In this paper, we first discuss the meaning of physical embodiment and the complexity of the environment in the context of multi-agent learning. We then propose a vision-based reinforcement learning method that acquires cooperative behaviors in a dynamic environment. We use the robot soccer game initiated by RoboCup (Kitano et al., 1997) to illustrate the effectiveness of our method. Each agent works with other team members to achieve a common goal against opponents. Our method estimates the relationships between a learner's behaviors and those of other agents in the environment through interactions (observations and actions), using a technique from system identification. In order to identify the model of each agent, Akaike's Information Criterion is applied to the results of Canonical Variate Analysis to clarify the relationship between the observed data in terms of actions and future observations. Next, reinforcement learning based on the estimated state vectors is performed to obtain the optimal behavior policy. The proposed method is applied to a soccer-playing situation: it successfully models a rolling ball and other moving agents and acquires the learner's behaviors. Computer simulations and real experiments are presented and discussed.

14.
Recently, robotics for use in both industry and the home has been making rapid progress. In this research, a group of robots is made to handle relatively complicated tasks; cooperative action among robots is one of the areas of robotics research that is progressing remarkably well. Reinforcement learning is a common approach in robotics for acquiring actions in dynamic environments. Until recently, however, reinforcement learning was applied mainly to single-agent problems; in multi-agent environments where several robots coexist, it was difficult to separate learning to achieve the task from learning to act cooperatively. This paper introduces a method of applying reinforcement learning to induce cooperation among a group of robots whose task is to transport luggage of various weights to a destination. The standard Q-learning method is used as the learning algorithm. In addition, switching of the learning mode is proposed to reduce the learning time and the learning space. Finally, grid-world simulations are carried out to evaluate the proposed methods.

15.
This paper examines the performance of simple reinforcement learning algorithms in a stationary environment and in a repeated game where the environment evolves endogenously based on the actions of other agents. Some types of reinforcement learning rules can be extremely sensitive to small changes in the initial conditions; consequently, events early in a simulation can affect the performance of the rule over a relatively long time horizon. However, when multiple adaptive agents interact, algorithms that performed poorly in a stationary environment often converge rapidly to stable aggregate behaviors despite the slow and erratic behavior of individual learners. Conversely, algorithms that are robust in stationary environments can exhibit slow convergence in an evolving environment.

16.
For multi-agent systems in non-deterministic Markov environments, a multi-agent Q-learning model and algorithm are proposed. The algorithm learns the behavior policies of the other agents from statistics of joint actions, and uses the full probability distribution of the agents' policy vectors to guarantee the selection of the jointly optimal action. In experiments, agent decision-making was implemented successfully and the overall competitive ability of the AFU team was improved, demonstrating the effectiveness and feasibility of the algorithm.

17.
Imitation learning combines reinforcement learning with supervised learning: by observing expert demonstrations it learns the expert's policy and thereby accelerates reinforcement learning. By introducing task-related additional information, imitation learning can optimize policies faster than reinforcement learning alone and offers a remedy for low sample efficiency. It has become a popular framework for solving reinforcement learning problems, and many algorithms and techniques for improving learning performance have emerged. Combined with recent advances in graphics and vision research, imitation learning has played an important role in game artificial intelligence (AI), robot control, and autonomous driving. This paper reviews the year's developments in imitation learning from the perspectives of behavioral cloning, inverse reinforcement learning, adversarial imitation learning, imitation learning from observation, and cross-domain imitation learning; it surveys the latest practical applications, compares the state of research at home and abroad, and discusses future directions for the field. The aim is to give researchers and practitioners an up-to-date account of imitation learning as a reference for their work.

18.
Research on Profit-Sharing Reinforcement Learning Based on Semi-Autonomous Agents   Cited by: 1 (self-citations: 0, other citations: 1)
The profit-sharing reinforcement learning method is applied in a system based on semi-autonomous agents and compared with Q-learning, which is based on dynamic programming. In dynamic environments with many uncertain factors, profit-sharing has a significant advantage when the system's state transitions do not form a Markov process. Based on the defining characteristic of semi-autonomous agents, namely that they are subject to external constraints, a reinforcement learning model oriented toward semi-autonomous agents is proposed, and the profit-sharing model is experimentally analyzed on the task of finding safe cover in a battlefield simulation.
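
A minimal sketch of the profit-sharing idea (a generic version, not the paper's semi-autonomous-agent model): the episode's state-action trace is stored, and when a reward finally arrives it is distributed backwards along the trace with a geometrically decreasing credit function, so no Markov assumption about the state transitions is needed. The decay and learning-rate values are illustrative.

```python
import random
from collections import defaultdict

class ProfitSharingAgent:
    """Profit-sharing: reinforce every rule (state-action pair) on the episode
    trace when a reward is obtained, with geometrically decaying credit."""
    def __init__(self, actions, decay=0.5, lr=0.1):
        self.actions, self.decay, self.lr = actions, decay, lr
        self.W = defaultdict(lambda: 1.0)   # rule weights (used for action selection)
        self.trace = []                     # state-action pairs of the current episode

    def act(self, state):
        # Weight-proportional (roulette) selection over the rule weights.
        weights = [self.W[(state, a)] for a in self.actions]
        a = random.choices(self.actions, weights=weights)[0]
        self.trace.append((state, a))
        return a

    def reward(self, r):
        # Distribute the reward backwards along the trace: the most recent
        # rule gets the largest share, earlier rules geometrically less.
        credit = 1.0
        for state, a in reversed(self.trace):
            self.W[(state, a)] += self.lr * r * credit
            credit *= self.decay
        self.trace.clear()
```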

19.
One of the main challenges in Grid computing is the efficient allocation of resources (CPU-hours, network bandwidth, etc.) to the tasks submitted by users. Due to the lack of centralized control and the dynamic/stochastic nature of resource availability, any successful allocation mechanism should be highly distributed and robust to changes in the Grid environment. Moreover, it is desirable to have an allocation mechanism that does not rely on the availability of coherent global information. In this paper we examine a simple algorithm for distributed resource allocation in a simplified Grid-like environment that meets the above requirements. Our system consists of a large number of heterogeneous reinforcement learning agents that share common resources for their computational needs. There is no explicit communication or interaction between the agents: the only information an agent receives is the expected response time of a job it submitted to a particular resource, which serves as a reinforcement signal for the agent. The results of our experiments suggest that even simple reinforcement learning can indeed be used to achieve load-balanced resource allocation in a large-scale heterogeneous system.
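
A small sketch of the kind of learner described: each agent keeps a running estimate of the (negative) response time of every resource and chooses epsilon-greedily, with the observed response time of its own job as the only feedback. Class and parameter names are illustrative, not taken from the paper.

```python
import random

class ResourceSelector:
    """Per-agent bandit-style learner: the value of a resource is an estimate of
    its negative response time; the response time itself is the only feedback."""
    def __init__(self, resources, eps=0.1, step=0.1):
        self.resources, self.eps, self.step = resources, eps, step
        self.value = {res: 0.0 for res in resources}

    def choose(self):
        if random.random() < self.eps:                  # occasional exploration
            return random.choice(self.resources)
        return max(self.resources, key=self.value.get)  # otherwise pick the best so far

    def feedback(self, resource, response_time):
        # Lower response time is better, so the reinforcement is its negative.
        target = -response_time
        self.value[resource] += self.step * (target - self.value[resource])
```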

20.
Learning, interaction, and their combination are essential capabilities for building robust, autonomous agents. Reinforcement learning is an important part of agent learning and comprises single-agent and multi-agent reinforcement learning. This article presents a comparative study of the two, contrasting them in terms of basic concepts, environment frameworks, learning objectives, and learning algorithms; it points out their differences and connections and discusses some open problems they face.
