Similar documents
Found 20 similar documents (search time: 156 ms)
1.
In this paper, we investigate the use of hierarchical reinforcement learning (HRL) to speed up the acquisition of cooperative multi-agent tasks. We introduce a hierarchical multi-agent reinforcement learning (RL) framework, and propose a hierarchical multi-agent RL algorithm called Cooperative HRL. In this framework, agents are cooperative and homogeneous (use the same task decomposition). Learning is decentralized, with each agent learning three interrelated skills: how to perform each individual subtask, the order in which to carry them out, and how to coordinate with other agents. We define cooperative subtasks to be those subtasks in which coordination among agents significantly improves the performance of the overall task. Those levels of the hierarchy which include cooperative subtasks are called cooperation levels. A fundamental property of the proposed approach is that it allows agents to learn coordination faster by sharing information at the level of cooperative subtasks, rather than attempting to learn coordination at the level of primitive actions. We study the empirical performance of the Cooperative HRL algorithm using two testbeds: a simulated two-robot trash collection task, and a larger four-agent automated guided vehicle (AGV) scheduling problem. We compare the performance and speed of Cooperative HRL with other learning algorithms, as well as several well-known industrial AGV heuristics. We also address the issue of rational communication behavior among autonomous agents in this paper. The goal is for agents to learn both action and communication policies that together optimize the task given a communication cost. We extend the multi-agent HRL framework to include communication decisions and propose a cooperative multi-agent HRL algorithm called COM-Cooperative HRL. In this algorithm, we add a communication level to the hierarchical decomposition of the problem below each cooperation level. 
Before an agent makes a decision at a cooperative subtask, it decides if it is worthwhile to perform a communication action. A communication action has a certain cost and provides the agent with the actions selected by the other agents at a cooperation level. We demonstrate the efficiency of the COM-Cooperative HRL algorithm as well as the relation between the communication cost and the learned communication policy using a multi-agent taxi problem.
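The communicate-or-not decision described above can be sketched as comparing the expected gain from knowing a teammate's selected subtask against the communication cost. The sketch below is a minimal illustration under assumed toy values; the function names, joint Q-table, and numbers are hypothetical, not the paper's implementation.

```python
# Toy sketch: an agent at a cooperation level decides whether to pay a
# communication cost to observe its teammate's selected subtask.
# All names and numbers here are illustrative assumptions.

def expected_value(q_joint, own_actions, belief):
    """Best expected Q-value when the teammate's action is only known
    as a probability distribution (no communication)."""
    return max(
        sum(p * q_joint[(a, b)] for b, p in belief.items())
        for a in own_actions
    )

def value_with_communication(q_joint, own_actions, teammate_action):
    """Best Q-value once the teammate's actual choice is observed."""
    return max(q_joint[(a, teammate_action)] for a in own_actions)

def should_communicate(q_joint, own_actions, belief, comm_cost):
    # Expected value of acting informed, averaged over the teammate's choices.
    informed = sum(
        p * value_with_communication(q_joint, own_actions, b)
        for b, p in belief.items()
    )
    uninformed = expected_value(q_joint, own_actions, belief)
    return (informed - uninformed) > comm_cost

# Anti-coordination payoff: agents should pick different subtasks.
q = {("goA", "goA"): 0.0, ("goA", "goB"): 1.0,
     ("goB", "goA"): 1.0, ("goB", "goB"): 0.0}
belief = {"goA": 0.5, "goB": 0.5}  # uniform belief over teammate's choice
print(should_communicate(q, ["goA", "goB"], belief, comm_cost=0.2))  # True
print(should_communicate(q, ["goA", "goB"], belief, comm_cost=0.8))  # False
```

With a low cost the information gain (0.5 here) justifies communicating; with a high cost the agent acts on its belief instead, mirroring the cost/policy trade-off the abstract studies.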

2.
This paper introduces a novel exploration scheme for multi-agent, multi-state reinforcement learning applied to dynamic spectrum access and dynamic spectrum sharing in wireless communications. With multi-agent, multi-state reinforcement learning, cognitive radios can decide, in a distributed way, which channels to use in order to maximize spectral efficiency. We argue, however, that the performance of spectrum management, covering both dynamic spectrum access and dynamic spectrum sharing, depends heavily on the exploration scheme used, and that traditional exploration schemes may be inadequate in this context. We therefore propose a novel exploration scheme and show that it improves the performance of reinforcement-learning-based spectrum management. We also investigate various real-world scenarios and confirm the validity of the proposed method.
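To make the role of the exploration scheme concrete, here is a minimal single-radio channel-selection sketch comparing a fixed exploration rate with a decaying one. The channel success probabilities, learning rate, and schedules are illustrative assumptions, not the paper's scheme.

```python
import random
random.seed(0)

N_CHANNELS = 3
# True per-channel success probabilities (unknown to the radio); toy values.
success_prob = [0.2, 0.9, 0.5]

def run(epsilon_schedule, episodes=2000, alpha=0.1):
    """A cognitive radio learning channel quality with epsilon-greedy exploration."""
    q = [0.0] * N_CHANNELS
    for t in range(episodes):
        eps = epsilon_schedule(t)
        if random.random() < eps:
            ch = random.randrange(N_CHANNELS)                   # explore
        else:
            ch = max(range(N_CHANNELS), key=q.__getitem__)      # exploit
        reward = 1.0 if random.random() < success_prob[ch] else 0.0
        q[ch] += alpha * (reward - q[ch])
    return q

# Two exploration schemes: fixed rate vs. geometrically decaying rate.
q_fixed = run(lambda t: 0.3)
q_decay = run(lambda t: max(0.01, 0.5 * 0.995 ** t))
print(max(range(N_CHANNELS), key=q_decay.__getitem__))  # best channel found
```

Both schedules identify the best channel here; the abstract's point is that in harder multi-agent, multi-state settings the choice of schedule materially changes spectral efficiency.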

3.
In this paper, a novel iterative learning control (ILC) scheme with input sharing is presented for multi-agent consensus tracking. In many ILC works on multi-agent coordination, each agent maintains its own input learning, and the input signal is corrected by local measurements over the iteration domain. If the agents are allowed to share their learned inputs, the learning process can improve because more learning resources become available. In this work, we develop a new type of learning controller that incorporates input sharing among agents and includes the traditional ILC strategy as a special case. The convergence condition is rigorously derived and analyzed. Furthermore, the proposed controller is extended to multi-agent systems under an iteration-varying graph, and it turns out to be very robust to communication variations. In the numerical study, three illustrative examples show the effectiveness of the proposed controller: the learning controller with input sharing demonstrates not only faster convergence but also smoother transient performance.
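The input-sharing idea can be sketched on a toy scalar plant: each iteration, an agent blends its own learned input with its neighbour's before applying the usual ILC error correction. The plant, gains, and sharing weight below are illustrative assumptions; setting the sharing weight to zero recovers classical per-agent ILC.

```python
# Minimal sketch of iterative learning control with input sharing between
# two agents tracking the same scalar reference. The plant and all gains
# are illustrative assumptions, not the paper's system.

def plant(u):
    return 0.8 * u  # toy static plant: output is a scaled input

REF = 1.0      # desired output
GAIN = 0.5     # ILC learning gain
SHARE = 0.3    # weight placed on the neighbour's learned input

u = [0.0, 0.2]  # initial inputs of agents 0 and 1
errors = []
for k in range(30):
    e = [REF - plant(ui) for ui in u]
    errors.append(max(abs(ei) for ei in e))
    # Each agent mixes its own input with its neighbour's, then corrects
    # with its local tracking error (SHARE = 0 recovers classical ILC).
    u = [
        (1 - SHARE) * u[i] + SHARE * u[1 - i] + GAIN * e[i]
        for i in range(2)
    ]
print(errors[0], errors[-1])  # tracking error shrinks over iterations
```

The iteration-domain error contracts geometrically here, and the sharing term additionally pulls the two agents' inputs together, illustrating the faster, smoother convergence the abstract reports.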

4.
In this paper, a multi-agent reinforcement learning method based on predicting the actions of other agents is proposed. In a multi-agent system, the action selection of a learning agent is unavoidably affected by the actions of the other agents, so joint states and joint actions are involved in the multi-agent reinforcement learning system. A novel agent action prediction method based on the probabilistic neural network (PNN) is proposed, in which the PNN is used to predict the actions of other agents. Furthermore, a policy-sharing mechanism is used to exchange the learned policies of multiple agents, with the aim of speeding up learning. Finally, the application of the presented method to robot soccer is studied. Through learning, robot players can master the mapping from state information to the action space, and the coordination and cooperation of multiple robots are well realized.
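A PNN classifies by summing Gaussian kernel activations of stored training patterns per class; applied here, the classes are another agent's possible actions and the patterns are previously observed states. The training pairs and smoothing width below are illustrative assumptions, not the paper's data.

```python
import math

# Minimal probabilistic-neural-network (Parzen-window) sketch for
# predicting another agent's action from observed states. The example
# history and SIGMA are illustrative assumptions.

SIGMA = 0.5

def gaussian_kernel(x, center):
    d2 = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-d2 / (2 * SIGMA ** 2))

def pnn_predict(x, examples):
    """examples: list of (state_vector, action_label) observed previously."""
    scores, counts = {}, {}
    for state, action in examples:
        scores[action] = scores.get(action, 0.0) + gaussian_kernel(x, state)
        counts[action] = counts.get(action, 0) + 1
    # Class score = average kernel activation (PNN summation layer).
    return max(scores, key=lambda a: scores[a] / counts[a])

history = [((0.0, 0.0), "pass"), ((0.1, 0.2), "pass"),
           ((2.0, 2.0), "shoot"), ((2.2, 1.9), "shoot")]
print(pnn_predict((0.05, 0.1), history))  # "pass"
print(pnn_predict((2.1, 2.0), history))   # "shoot"
```

The learning agent can then condition its own action choice on the predicted teammate/opponent action instead of learning over the full joint-action space.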

5.
Distributed learning and cooperative control for multi-agent systems
This paper presents an algorithm and analysis of distributed learning and cooperative control for a multi-agent system so that a global goal of the overall system can be achieved by locally acting agents. We consider a resource-constrained multi-agent system, in which each agent has limited capabilities in terms of sensing, computation, and communication. The proposed algorithm is executed by each agent independently to estimate an unknown field of interest from noisy measurements and to coordinate multiple agents in a distributed manner to discover peaks of the unknown field. Each mobile agent maintains its own local estimate of the field and updates the estimate using collective measurements from itself and nearby agents. Each agent then moves towards peaks of the field using the gradient of its estimated field while avoiding collision and maintaining communication connectivity. The proposed algorithm is based on a recursive spatial estimation of an unknown field. We show that the closed-loop dynamics of the proposed multi-agent system can be transformed into a form of a stochastic approximation algorithm and prove its convergence using Ljung’s ordinary differential equation (ODE) approach. We also present extensive simulation results supporting our theoretical results.
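The estimate-then-climb loop above can be illustrated in one dimension: each agent takes noisy slope measurements, pools them with its neighbour, keeps a recursive estimate of the peak location, and moves towards it. The quadratic field, noise level, and gains are illustrative assumptions, far simpler than the paper's spatial estimator.

```python
import random
random.seed(1)

# Toy 1-D stand-in: the unknown field is f(x) = -(x - peak)^2 with an
# unknown peak. Agents share noisy slope samples, recursively estimate
# the peak, and move towards their estimate. All values are assumptions.

TRUE_PEAK = 2.0

def noisy_slope(x):
    # Gradient of -(x - peak)^2 is -2(x - peak), measured with noise.
    return -2.0 * (x - TRUE_PEAK) + random.gauss(0.0, 0.2)

positions = [-1.0, 5.0]   # two agents start far from the peak
peak_est = [0.0, 0.0]     # each agent's recursive peak estimate
for step in range(200):
    # Collective measurement: each slope sample implies peak ≈ x + slope/2.
    samples = [x + noisy_slope(x) / 2.0 for x in positions]
    shared = sum(samples) / len(samples)   # pooled with the neighbour
    for i in range(2):
        # Recursive (exponential) estimate blending history with new data.
        peak_est[i] += 0.2 * (shared - peak_est[i])
        # Move towards the current peak estimate.
        positions[i] += 0.1 * (peak_est[i] - positions[i])
print([round(p, 1) for p in positions])  # both agents end near x = 2
```

The closed loop is exactly the stochastic-approximation shape the abstract mentions: a noisy measurement update feeding a slowly moving state.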

6.
Agents can learn to improve their coordination with their teammates and increase team performance. There are finitely many training instances, each of which is an opportunity for the learning agents to improve their coordination. In this article, we focus on allocating training instances to learning agent pairs, i.e., pairs that improve coordination with each other, with the goal of team formation. Agents learn at different rates, and hence the allocation of training instances affects the performance of the team formed. We build upon previous work on the Synergy Graph model, which is learned completely from data and represents agents’ capabilities and compatibility in a multi-agent team. We formally define the learning agents team formation problem and compare it with the multi-armed bandit problem. We consider learning agent pairs that improve linearly and geometrically, i.e., with the marginal improvement decreasing by a constant factor. We contribute algorithms that allocate the training instances and compare them against algorithms from the multi-armed bandit problem. In our simulations, we demonstrate that our algorithms perform similarly to the bandit algorithms in the linear case, and outperform them in the geometric case. Further, we apply our model and algorithms to a multi-agent foraging problem, thus demonstrating the efficacy of our algorithms in general multi-agent problems.
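The bandit framing above can be sketched directly: each learning pair is an arm whose "reward" is its marginal coordination gain, decaying geometrically with training, and a UCB1-style rule decides which pair trains next. The gains and decay rates are illustrative assumptions, and UCB1 is the baseline the paper compares against, not the paper's own allocator.

```python
import math

# Toy sketch: allocating training instances to learning agent pairs whose
# marginal gain decays geometrically, using a UCB1-style rule. All
# parameters are illustrative assumptions.

class Pair:
    def __init__(self, first_gain, decay):
        self.gain, self.decay = first_gain, decay
        self.pulls, self.total = 0, 0.0
    def train(self):
        g = self.gain * (self.decay ** self.pulls)  # geometric improvement
        self.pulls += 1
        self.total += g
        return g

pairs = [Pair(1.0, 0.5), Pair(0.6, 0.9), Pair(0.3, 0.99)]
team_gain = 0.0
for t in range(1, 101):  # 100 training instances to allocate
    def score(p):
        if p.pulls == 0:
            return float("inf")                 # try every pair once
        # UCB1: mean observed gain plus an exploration bonus.
        return p.total / p.pulls + math.sqrt(2 * math.log(t) / p.pulls)
    best = max(pairs, key=score)
    team_gain += best.train()
print(round(team_gain, 2), [p.pulls for p in pairs])
```

Because past gains overestimate a geometrically decaying pair's future gain, a plain bandit rule over-trains early winners; this is the gap the paper's allocation algorithms exploit.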

7.
Manufacturing processes need their entities to coordinate and discoordinate at different times: to achieve adequate manufacturing timing, some jobs must be done in sequence while others can be done in parallel on different machines. This paper introduces a complex discoordination problem, the multibar problem, based on the “El Farol” bar problem and devised to test enhanced complexity for multi-agent systems. A multi-agent system that learns based on the extended classifier system (MAXCS) is used for the simulation, with different classifier population sizes used to aid agent adaptation. MAXCS adapts in different ways to all possible configurations of the different bars tested across 20 experiments. The first set of experiments showed that MAXCS is able to adapt to the multibar problem, with several agents that switch bars (vacillating agents) emerging. The preliminary experiments suggested the hypothesis that the classifiers’ rule conditions, and their evolution, are irrelevant to the result. MAXCS is then compared with multi-agent Q-learning (MAQL). These experiments demonstrate the need for evolutionary computation, rather than a reinforcement learning algorithm alone, for better adaptation, disproving the earlier hypothesis. The MAXCS–MAQL comparison showed that the use of rule conditions, combined with the genetic algorithm, determines whether there is only one vacillating agent or several at the same time throughout the experiment. The solution scales when 133 agents are used for the problem. After this study, it can be concluded that the multibar problem can become an interesting benchmark for multi-agent learning and provide manufacturing processes with suitable coordination solutions.

8.
Communication and coordination are the cornerstones of reaching a constructive agreement among multi-agent systems (MASs). Attributing the overall performance of a MAS to individual agents can lead to group learning as opposed to individual learning, which is one of the weak points of MASs. This paper proposes a recursive genetic framework for solving problems with high dynamism. In this framework, a combination of genetic algorithms and multi-agent capabilities is utilised to accelerate team learning and accurate credit assignment. The argumentation feature is used to accomplish agent learning, and the negotiation features of MASs are used to achieve credit assignment. The proposed framework is quite general, and its recursive hierarchical structure can be extended. We have dedicated one special controlling module to improving convergence time. Owing to its complexity, we have applied blackjack as a test bed to evaluate the system’s performance: the learning rate of the agents is measured, as well as their credit assignment. The analysis of the obtained results leads us to believe that our robust framework, with the proposed negotiation operator, is a promising methodology for solving similar problems in other areas with high dynamism.

9.
Coordinating Agents in Organizations Using Social Commitments
One of the main challenges faced by the multi-agent community is to ensure the coordination of autonomous agents in open heterogeneous multi-agent systems. In order to coordinate their behaviour, the agents should be able to interact with each other. Social commitments have been used in recent years as an answer to the challenges of enabling heterogeneous agents to communicate and interact successfully. However, coordinating agents only by means of interaction models is difficult in open multi-agent systems, where possibly malevolent agents can enter at any time and violate the interaction rules. Agent organizations, institutions and normative systems have been used to control the way agents interact and behave. In this paper we try to bring together the two models of coordinating agents: commitment-based interaction and organizations. To this aim we describe how one can use social commitments to represent the expected behaviour of an agent playing a role in an organization. We thus make a first step towards a unified model of coordination in multi-agent systems: a definition of the expected behaviour of an agent using social commitments in both organizational and non-organizational contexts.

10.
A major challenge facing the multi-agent field is coordinating autonomous agents in open, heterogeneous multi-agent systems. To coordinate their activities, agents need to interact, and social commitments, as a communication and interaction mechanism, provide autonomous agents with a path to coordination. However, interaction alone is insufficient to achieve coordination among agents. Agent organizations, as a coordination model, can effectively control the interaction and cooperation among agents. This paper combines the two coordination mechanisms, social commitments and agent organizations, and proposes OMSC, an agent organization model based on social commitments. It analyzes how agents reason with social commitments, discusses multi-agent systems based on social commitments, and gives an example, thereby providing a new approach to coordination among multiple agents.

11.
Multi-agent reinforcement learning technologies are mainly investigated from two perspectives: concurrency and game theory. The former chiefly applies to cooperative multi-agent systems, while the latter usually applies to coordinated multi-agent systems. However, both suffer from problems such as credit assignment and multiple Nash equilibria. In this paper, we propose a new multi-agent reinforcement learning model and algorithm, LMRL, from a layered perspective. The LMRL model is composed of an off-line training layer, which employs single-agent reinforcement learning to acquire stationary strategy knowledge, and an online interaction layer, which employs multi-agent reinforcement learning together with dynamically revisable strategy knowledge to interact with the environment. An agent with LMRL can improve its generalization capability, adaptability and coordination ability. Experiments show that LMRL can perform better than both single-agent reinforcement learning and Nash-Q.

12.
A fundamental question that must be addressed in software agents for knowledge management is coordination in multi-agent systems. The coordination problem is ubiquitous in knowledge management, such as in manufacturing, supply chains, negotiation, and agent-mediated auctions. This paper summarizes several multi-agent systems for knowledge management that have been developed recently by the author and his collaborators to highlight new research directions for multi-agent knowledge management systems. In particular, the paper focuses on three areas of research:
  • Coordination mechanisms in agent-based supply chains. How do we design mechanisms for coordination, information and knowledge sharing in supply chains with self-interested agents? What would be a good coordination mechanism when we have a non-linear structure of the supply chain, such as a pyramid structure? What are the desirable properties for the optimal structure of efficient supply chains in terms of information and knowledge sharing? Will DNA computing be a viable tool for the analysis of agent-based supply chains?
  • Coordination mechanisms in agent-mediated auctions. How do we induce cooperation and coordination among various self-interested agents in agent-mediated auctions? What are the fundamental principles to promote agent cooperation behavior? How do we train agents to learn to cooperate rather than program agents to cooperate? What are the principles of trust building in agent systems?
  • Multi-agent enterprise knowledge management, performance impact and human aspects. Will people use agent-based systems? If so, how do we coordinate agent-based systems with human beings? What would be the impact of agent systems in knowledge management in an information economy?

13.
One approach to modeling multi-agent systems (MASs) is to employ a method that defines components which describe the local behavior of individual agents, as well as a special component, called a coordinator. The coordinator component coordinates the resource sharing behavior among the agents. The agent models define a set of local plans, and the combination of local plans and a coordinator defines a system’s global plan. Although earlier work has provided the base functionality needed to synthesize inter-agent resource sharing behavior for a global, conflict-free MAS environment, the lack of coordination flexibility limits the modeling capability at both the local plan level and the global plan level. In this paper, we describe a flexible design method that supports a range of coordinator components. The method defines four levels of coordination and an associated four-step coordinator generation process, which allows for the design of coordinators with increasing capabilities for handling complexity associated with resource coordination. Colored Petri net based simulation is used to analyze various properties that derive from different coordinators and synthesis of a reduced coordinator component is discussed for cases that involve homogeneous agents.

14.
Topology-based multi-agent systems (TMAS), wherein agents interact with one another according to their spatial relationship in a network, are well suited for problems with topological constraints. In a TMAS system, however, each agent may have a different state space, which can be rather large. Consequently, traditional approaches to multi-agent cooperative learning may not be able to scale up with the complexity of the network topology. In this paper, we propose a cooperative learning strategy, under which autonomous agents are assembled in a binary tree formation (BTF). By constraining the interaction between agents, we effectively unify the state space of individual agents and enable policy sharing across agents. Our complexity analysis indicates that multi-agent systems with the BTF have a much smaller state space and a higher level of flexibility, compared with the general form of n-ary (n > 2) tree formation. We have applied the proposed cooperative learning strategy to a class of reinforcement learning agents known as temporal difference-fusion architecture for learning and cognition (TD-FALCON). Comparative experiments based on a generic network routing problem, which is a typical TMAS domain, show that the TD-FALCON BTF teams outperform alternative methods, including TD-FALCON teams in single agent and n-ary tree formation, a Q-learning method based on the table lookup mechanism, as well as a classical linear programming algorithm. Our study further shows that TD-FALCON BTF can adapt and function well under various scales of network complexity and traffic volume in TMAS domains.

15.
A multi-agent cooperative learning algorithm based on quantum computation
Cooperative multi-agent reinforcement learning suffers from the curse of dimensionality in actions and states, and action selection admits multiple equilibria, so that converging to the best equilibrium requires searching the strategy space and coordinating strategy choices. To address these problems, a novel multi-agent cooperative learning algorithm based on quantum theory is proposed. Drawing on quantum computation, the new algorithm represents the agents' action and state spaces as quantum superposition states, uses quantum entanglement to coordinate strategy selection, represents action-selection probabilities by probability amplitudes, and uses a quantum search algorithm to accelerate multi-agent learning. Simulation results demonstrate the effectiveness of the new algorithm.

16.
This paper considers the problem of multiagent sequential decision making under uncertainty and incomplete knowledge of the state transition model. A distributed learning framework, where each agent learns an individual model and shares the results with the team, is proposed. The challenges associated with this approach include choosing the model representation for each agent and how to effectively share these representations under limited communication. A decentralized extension of the model learning scheme based on the Incremental Feature Dependency Discovery (Dec-iFDD) is presented to address the distributed learning problem. The representation selection problem is solved by leveraging iFDD’s property of adjusting the model complexity based on the observed data. The model sharing problem is addressed by having each agent rank the features of their representation based on the model reduction error and broadcast the most relevant features to their teammates. The algorithm is tested on the multi-agent block building and the persistent search and track missions. The results show that the proposed distributed learning scheme is particularly useful in heterogeneous learning settings, where each agent learns significantly different models. We show through large-scale planning under uncertainty simulations and flight experiments with state-dependent actuator and fuel-burn-rate uncertainty that our planning approach can outperform planners that do not account for heterogeneity between agents.

17.
Ontologies play an important role in multi-agent systems: they provide and define a shared semantic vocabulary. In practice, however, it is almost impossible for two agents to share exactly the same semantic vocabulary during communication. Because of incomplete information and the heterogeneity of ontologies, one agent can only partially understand the ontology held by another, which makes inter-agent communication very difficult. This paper explores the use of approximation techniques to realize multi-agent communication based on partially shared distributed ontologies, and thereby cooperative querying among agents. We use description logics based on the OWL web ontology language to describe approximate querying over distributed ontologies. Finally, we developed a multi-agent cooperative query system based on this semantic approximation method.

18.
In multi-agent systems, agents coordinate their behaviour and work together to achieve a shared goal through collaboration. However, in open multi-agent systems, selecting qualified participants to form effective collaboration communities is challenging. In such systems, agents do not have access to complete domain knowledge, and they leave and join the system unpredictably. More importantly, agents are mostly self-interested and have multiple goals and policies that may even conflict with those of others, which makes participant selection even more challenging. Many current approaches are not applicable in constantly evolving open systems, where their performance is affected by unpredictable behaviour, the agents’ lack of complete domain knowledge, and the impossibility of having a central coordinator agent. In open systems, agents require a mechanism that enables them to dynamically change their perception of the environment and observe their neighbouring agents, so that they can identify qualified collaboration participants with no conflicting goals and balance their level of cooperation and self-interest. In this paper, we propose OPSCO, a solution for On-demand Participant Selection for Short-term Collaboration in Open multi-agent systems. Unlike existing research, we do not assume any predefined setting for the agents’ structure in the system, do not require access to complete domain knowledge, and allow each agent to build a dynamic dependency model and maintain it when the system changes. The model captures the agent’s most recent dependency structure of goals and policies with its neighbouring agents, enabling it to identify and select a qualified, non-conflicting set of participants. OPSCO is evaluated in real-world open-system case studies on a smart grid and on constrained resource sharing. OPSCO outperforms other methods by selecting a qualified, non-conflicting set of agents to collaborate; it balances self-interest against the level of cooperation and decreases failures in the agents’ overall goals (individual and shared).

19.
乔林 (Qiao Lin), 罗杰 (Luo Jie). 《计算机科学》 (Computer Science), 2012, 39(5): 213-216
Aiming to improve the learning efficiency of the Q-learning algorithm in multi-agent systems, and using the pursuit problem as the experimental platform, this paper proposes a Q-learning algorithm based on shared experience. The algorithm mimics human team learning: all agents share the same final goal, capturing the prey, while each agent obtains its own stage goal through negotiation. Learning proceeds in stages; at the end of each stage, the agents hold a stage review and share their good learning experience with one another for use in the next stage. In this way, the fast, strong learners pull along the slow, weak ones, improving overall learning performance. Simulation experiments show that the shared-experience Q-learning algorithm improves the performance of the learning system and converges efficiently to the optimal policy.
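The stage-wise sharing described above can be sketched on a tiny chain world: two Q-learners train independently for a stage, then at the stage review each adopts, entry by entry, whichever teammate's Q-value is higher. The world, the merge rule, and all parameters are illustrative assumptions, not the paper's pursuit setup.

```python
import random
random.seed(2)

# Two Q-learners on a 5-state chain with a goal at the right end share
# experience at the end of each learning stage. Illustrative sketch only.

N_STATES, GOAL = 5, 4
ACTIONS = (-1, 1)

def step(s, a):
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

def episode(q, alpha=0.5, gamma=0.9, eps=0.3):
    s = random.randrange(N_STATES - 1)
    for _ in range(20):
        if random.random() < eps:
            a = random.choice(ACTIONS)                      # explore
        else:
            a = max(ACTIONS, key=lambda a: q[(s, a)])       # exploit
        s2, r = step(s, a)
        q[(s, a)] += alpha * (r + gamma * max(q[(s2, b)] for b in ACTIONS) - q[(s, a)])
        s = s2
        if s == GOAL:
            break

qs = [{(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS} for _ in range(2)]
for stage in range(10):                  # stage-wise learning
    for q in qs:
        for _ in range(5):
            episode(q)
    # Stage review: both agents keep the better of their two estimates,
    # so the faster learner pulls the slower one along.
    for key in qs[0]:
        qs[0][key] = qs[1][key] = max(qs[0][key], qs[1][key])

print(max(ACTIONS, key=lambda a: qs[0][(0, a)]))  # greedy action at the start
```

Taking the elementwise maximum is one simple stand-in for "sharing good experience"; the paper's agents additionally negotiate per-stage goals, which this sketch omits.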

20.
Parallel multi-task allocation is a highly challenging topic in multi-agent systems, driven by application needs such as resource allocation and disaster emergency management; it studies how to allocate a set of tasks to corresponding agent coalitions for execution. This paper proposes a distributed parallel multi-task allocation algorithm based on self-organizing, self-learning agents. The algorithm introduces P-learning to design a learning model in which a single agent searches for tasks, and gives the communication and negotiation strategies between agents. Comparative experiments show that the algorithm not only finds a solving coalition for each task quickly, but also explicitly determines the actual resource contribution of each member agent in the coalition, providing a valuable reference for practical control and decision-making tasks.

