首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 328 毫秒
1.
Agents can learn to improve their coordination with their teammates and increase team performance. There are finite training instances, where each training instance is an opportunity for the learning agents to improve their coordination. In this article, we focus on allocating training instances to learning agent pairs, i.e., pairs that improve coordination with each other, with the goal of team formation. Agents learn at different rates, and hence, the allocation of training instances affects the performance of the team formed. We build upon previous work on the Synergy Graph model, that is learned completely from data and represents agents’ capabilities and compatibility in a multi-agent team. We formally define the learning agents team formation problem, and compare it with the multi-armed bandit problem. We consider learning agent pairs that improve linearly and geometrically, i.e., the marginal improvement decreases by a constant factor. We contribute algorithms that allocate the training instances, and compare against algorithms from the multi-armed bandit problem. In our simulations, we demonstrate that our algorithms perform similarly to the bandit algorithms in the linear case, and outperform them in the geometric case. Further, we apply our model and algorithms to a multi-agent foraging problem, thus demonstrating the efficacy of our algorithms in general multi-agent problems.  相似文献   

2.
Cooperative Multi-Agent Learning: The State of the Art   总被引:5,自引:4,他引:1  
Cooperative multi-agent systems (MAS) are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize utility. Due to the interactions among the agents, multi-agent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. The challenge this presents to the task of programming solutions to MAS problems has spawned increasing interest in machine learning techniques to automate the search and optimization process. We provide a broad survey of the cooperative multi-agent learning literature. Previous surveys of this area have largely focused on issues common to specific subareas (for example, reinforcement learning, RL or robotics). In this survey we attempt to draw from multi-agent learning work in a spectrum of areas, including RL, evolutionary computation, game theory, complex systems, agent modeling, and robotics. We find that this broad view leads to a division of the work into two categories, each with its own special issues: applying a single learner to discover joint solutions to multi-agent problems (team learning), or using multiple simultaneous learners, often one per agent (concurrent learning). Additionally, we discuss direct and indirect communication in connection with learning, plus open issues in task decomposition, scalability, and adaptive dynamics. We conclude with a presentation of multi-agent learning problem domains, and a list of multi-agent learning resources.  相似文献   

3.
近年来,强化学习与自适应动态规划算法的迅猛发展及其在一系列挑战性问题(如大规模多智能体系统优化决策和最优协调控制问题)中的成功应用,使其逐渐成为人工智能、系统与控制和应用数学等领域的研究热点.鉴于此,首先简要介绍强化学习和自适应动态规划算法的基础知识和核心思想,在此基础上综述两类密切相关的算法在不同研究领域的发展历程,着重介绍其从应用于单个智能体(控制对象)序贯决策(最优控制)问题到多智能体系统序贯决策(最优协调控制)问题的发展脉络和研究进展.进一步,在简要介绍自适应动态规划算法的结构变化历程和由基于模型的离线规划到无模型的在线学习发展演进的基础上,综述自适应动态规划算法在多智能体系统最优协调控制问题中的研究进展.最后,给出多智能体强化学习算法和利用自适应动态规划求解多智能体系统最优协调控制问题研究中值得关注的一些挑战性课题.  相似文献   

4.
In this paper, we investigate Reinforcement learning (RL) in multi-agent systems (MAS) from an evolutionary dynamical perspective. Typical for a MAS is that the environment is not stationary and the Markov property is not valid. This requires agents to be adaptive. RL is a natural approach to model the learning of individual agents. These Learning algorithms are however known to be sensitive to the correct choice of parameter settings for single agent systems. This issue is more prevalent in the MAS case due to the changing interactions amongst the agents. It is largely an open question for a developer of MAS of how to design the individual agents such that, through learning, the agents as a collective arrive at good solutions. We will show that modeling RL in MAS, by taking an evolutionary game theoretic point of view, is a new and potentially successful way to guide learning agents to the most suitable solution for their task at hand. We show how evolutionary dynamics (ED) from Evolutionary Game Theory can help the developer of a MAS in good choices of parameter settings of the used RL algorithms. The ED essentially predict the equilibriums outcomes of the MAS where the agents use individual RL algorithms. More specifically, we show how the ED predict the learning trajectories of Q-Learners for iterated games. Moreover, we apply our results to (an extension of) the COllective INtelligence framework (COIN). COIN is a proved engineering approach for learning of cooperative tasks in MASs. The utilities of the agents are re-engineered to contribute to the global utility. We show how the improved results for MAS RL in COIN, and a developed extension, are predicted by the ED. Author funded by a doctoral grant of the institute for advancement of scientific technological research in Flanders (IWT).  相似文献   

5.
In this work we investigate the use of a reinforcement learning (RL) framework for the autonomous navigation of a group of mini-robots in a multi-agent collaborative environment. Each mini-robot is driven by inertial forces provided by two vibration motors that are controlled by a simple and efficient low-level speed controller. The action of the RL agent is the direction of each mini-robot, and it is based on the position of each mini-robot, the distance between them and the sign of the distance gradient between each mini-robot and the nearest one. Each mini-robot is considered a moving obstacle that must be avoided by the others. We propose suitable state space and reward function that result in an efficient collaborative RL framework. The classical and the double Q-learning algorithms are employed, where the latter is considered to learn optimal policies of mini-robots that offers more stable and reliable learning process. A simulation environment is created, using the ROS framework, that include a group of four mini-robots. The dynamic model of each mini-robot and of the vibration motors is also included. Several application scenarios are simulated and the results are presented to demonstrate the performance of the proposed approach.  相似文献   

6.
Explanation-Based Learning and Reinforcement Learning: A Unified View   总被引:3,自引:0,他引:3  
  相似文献   

7.
We address an unrelated parallel machine scheduling problem with R-learning, an average-reward reinforcement learning (RL) method. Different types of jobs dynamically arrive in independent Poisson processes. Thus the arrival time and the due date of each job are stochastic. We convert the scheduling problems into RL problems by constructing elaborate state features, actions, and the reward function. The state features and actions are defined fully utilizing prior domain knowledge. Minimizing the reward per decision time step is equivalent to minimizing the schedule objective, i.e. mean weighted tardiness. We apply an on-line R-learning algorithm with function approximation to solve the RL problems. Computational experiments demonstrate that R-learning learns an optimal or near-optimal policy in a dynamic environment from experience and outperforms four effective heuristic priority rules (i.e. WSPT, WMDD, ATC and WCOVERT) in all test problems.  相似文献   

8.
An important application of reinforcement learning (RL) is to finite-state control problems and one of the most difficult problems in learning for control is balancing the exploration/exploitation tradeoff. Existing theoretical results for RL give very little guidance on reasonable ways to perform exploration. In this paper, we examine the convergence of single-step on-policy RL algorithms for control. On-policy algorithms cannot separate exploration from learning and therefore must confront the exploration problem directly. We prove convergence results for several related on-policy algorithms with both decaying exploration and persistent exploration. We also provide examples of exploration strategies that can be followed during learning that result in convergence to both optimal values and optimal policies.  相似文献   

9.
This paper introduces a new multi-agent model for intelligent agents, called reinforcement learning hierarchical neuro-fuzzy multi-agent system. This class of model uses a hierarchical partitioning of the input space with a reinforcement learning algorithm to overcome limitations of previous RL methods. The main contribution of the new system is to provide a flexible and generic model for multi-agent environments. The proposed generic model can be used in several applications, including competitive and cooperative problems, with the autonomous capacity to create fuzzy rules and expand their own rule structures, extracting knowledge from the direct interaction between the agents and the environment, without any use of supervised algorithms. The proposed model was tested in three different case studies, with promising results. The tests demonstrated that the developed system attained good capacity of convergence and coordination among the autonomous intelligent agents.  相似文献   

10.
In this paper, we investigate the use of hierarchical reinforcement learning (HRL) to speed up the acquisition of cooperative multi-agent tasks. We introduce a hierarchical multi-agent reinforcement learning (RL) framework, and propose a hierarchical multi-agent RL algorithm called Cooperative HRL. In this framework, agents are cooperative and homogeneous (use the same task decomposition). Learning is decentralized, with each agent learning three interrelated skills: how to perform each individual subtask, the order in which to carry them out, and how to coordinate with other agents. We define cooperative subtasks to be those subtasks in which coordination among agents significantly improves the performance of the overall task. Those levels of the hierarchy which include cooperative subtasks are called cooperation levels. A fundamental property of the proposed approach is that it allows agents to learn coordination faster by sharing information at the level of cooperative subtasks, rather than attempting to learn coordination at the level of primitive actions. We study the empirical performance of the Cooperative HRL algorithm using two testbeds: a simulated two-robot trash collection task, and a larger four-agent automated guided vehicle (AGV) scheduling problem. We compare the performance and speed of Cooperative HRL with other learning algorithms, as well as several well-known industrial AGV heuristics. We also address the issue of rational communication behavior among autonomous agents in this paper. The goal is for agents to learn both action and communication policies that together optimize the task given a communication cost. We extend the multi-agent HRL framework to include communication decisions and propose a cooperative multi-agent HRL algorithm called COM-Cooperative HRL. In this algorithm, we add a communication level to the hierarchical decomposition of the problem below each cooperation level. Before an agent makes a decision at a cooperative subtask, it decides if it is worthwhile to perform a communication action. A communication action has a certain cost and provides the agent with the actions selected by the other agents at a cooperation level. We demonstrate the efficiency of the COM-Cooperative HRL algorithm as well as the relation between the communication cost and the learned communication policy using a multi-agent taxi problem.  相似文献   

11.
The adaptive critic heuristic has been a popular algorithm in reinforcement learning(RL) and approximate dynamic programming(ADP) alike.It is one of the first RL and ADP algorithms.RL and ADP algorithms are particularly useful for solving Markov decision processes(MDPs) that suffer from the curses of dimensionality and modeling.Many real-world problems,however,tend to be semi-Markov decision processes(SMDPs) in which the time spent in each transition of the underlying Markov chains is itself a random variable.Unfortunately for the average reward case,unlike the discounted reward case,the MDP does not have an easy extension to the SMDP.Examples of SMDPs can be found in the area of supply chain management,maintenance management,and airline revenue management.In this paper,we propose an adaptive critic heuristic for the SMDP under the long-run average reward criterion.We present the convergence analysis of the algorithm which shows that under certain mild conditions,which can be ensured within a simulator,the algorithm converges to an optimal solution with probability 1.We test the algorithm extensively on a problem of airline revenue management in which the manager has to set prices for airline tickets over the booking horizon.The problem has a large scale,suffering from the curse of dimensionality,and hence it is difficult to solve it via classical methods of dynamic programming.Our numerical results are encouraging and show that the algorithm outperforms an existing heuristic used widely in the airline industry.  相似文献   

12.
针对强化学习在视觉语义导航任务中准确率低,导航效率不高,容错率太差,且部分只适用于单智能体等问题,提出一种基于场景先验的多智能体目标搜索算法。该算法利用强化学习,将单智能体系统拓展到多智能体系统上将场景图谱作为先验知识辅助智能体团队进行视觉探索,利用集中式训练分布式探索的多智能体强化学习的方法以大幅度提升智能体团队的准确率和工作效率。通过在AI2THOR中进行训练测试,并与其他算法进行对比证明此方法无论在目标搜索的准确率还是效率上都优先于其他算法。  相似文献   

13.
Sophisticated agents operating in open environments must make decisions that efficiently trade off the use of their limited resources between dynamic deliberative actions and domain actions. This is the meta-level control problem for agents operating in resource-bounded multi-agent environments. Control activities involve decisions on when to invoke and the amount to effort to put into scheduling and coordination of domain activities. The focus of this paper is how to make effective meta-level control decisions. We show that meta-level control with bounded computational overhead allows complex agents to solve problems more efficiently than current approaches in dynamic open multi-agent environments. The meta-level control approach that we present is based on the decision-theoretic use of an abstract representation of the agent state. This abstraction concisely captures critical information necessary for decision making while bounding the cost of meta-level control and is appropriate for use in automatically learning the meta-level control policies.  相似文献   

14.
As an important approach to solving complex sequential decision problems, reinforcement learning (RL) has been widely studied in the community of artificial intelligence and machine learning. However, the generalization ability of RL is still an open problem and it is difficult for existing RL algorithms to solve Markov decision problems (MDPs) with both continuous state and action spaces. In this paper, a novel RL approach with fast policy search and adaptive basis function selection, which is called Continuous-action Approximate Policy Iteration (CAPI), is proposed for RL in MDPs with both continuous state and action spaces. In CAPI, based on the value functions estimated by temporal-difference learning, a fast policy search technique is suggested to search for optimal actions in continuous spaces, which is computationally efficient and easy to implement. To improve the generalization ability and learning efficiency of CAPI, two adaptive basis function selection methods are developed so that sparse approximation of value functions can be obtained efficiently both for linear function approximators and kernel machines. Simulation results on benchmark learning control tasks with continuous state and action spaces show that the proposed approach not only can converge to a near-optimal policy in a few iterations but also can obtain comparable or even better performance than Sarsa-learning, and previous approximate policy iteration methods such as LSPI and KLSPI.  相似文献   

15.
作为机器学习和人工智能领域的一个重要分支,多智能体分层强化学习以一种通用的形式将多智能体的协作能力与强化学习的决策能力相结合,并通过将复杂的强化学习问题分解成若干个子问题并分别解决,可以有效解决空间维数灾难问题。这也使得多智能体分层强化学习成为解决大规模复杂背景下智能决策问题的一种潜在途径。首先对多智能体分层强化学习中涉及的主要技术进行阐述,包括强化学习、半马尔可夫决策过程和多智能体强化学习;然后基于分层的角度,对基于选项、基于分层抽象机、基于值函数分解和基于端到端等4种多智能体分层强化学习方法的算法原理和研究现状进行了综述;最后介绍了多智能体分层强化学习在机器人控制、博弈决策以及任务规划等领域的应用现状。  相似文献   

16.
强化学习是机器学习领域的研究热点, 是考察智能体与环境的相互作用, 做出序列决策、优化策略并最大化累积回报的过程. 强化学习具有巨大的研究价值和应用潜力, 是实现通用人工智能的关键步骤. 本文综述了强化学习算法与应用的研究进展和发展动态, 首先介绍强化学习的基本原理, 包括马尔可夫决策过程、价值函数、探索-利用问题. 其次, 回顾强化学习经典算法, 包括基于价值函数的强化学习算法、基于策略搜索的强化学习算法、结合价值函数和策略搜索的强化学习算法, 以及综述强化学习前沿研究, 主要介绍多智能体强化学习和元强化学习方向. 最后综述强化学习在游戏对抗、机器人控制、城市交通和商业等领域的成功应用, 以及总结与展望.  相似文献   

17.
Local strategy learning in networked multi-agent team formation   总被引:1,自引:0,他引:1  
Networked multi-agent systems are comprised of many autonomous yet interdependent agents situated in a virtual social network. Two examples of such systems are supply chain networks and sensor networks. A common challenge in many networked multi-agent systems is decentralized team formation among the spatially and logically extended agents. Even in cooperative multi-agent systems, efficient team formation is made difficult by the limited local information available to the individual agents. We present a model of distributed multi-agent team formation in networked multi-agent systems, describe a policy learning framework for joining teams based on local information, and give empirical results on improving team formation performance. In particular, we show that local policy learning from limited information leads to a significant increase in organizational team formation performance compared to a random policy.  相似文献   

18.
针对电梯群控调度过程中交通流和乘客目的楼层的不确定性,为克服集中式群控系统计算瓶颈问题,提出了基于目的层预约结合多智能体技术的电梯群控调度方法。首先,设计了该电梯群控调度系统的结构,分析了目的层预约的多目标评价函数;然后,提出基于模糊逻辑的电梯交通模式在线识别方案,实现实时地更新不同交通模式下对应的各控制目标的权系数;最后,结合多智能体技术具有良好的实时性和自主性的特点,将各轿厢运行控制器和群控制器映射为多个代理,通过各个代理之间协调多目标优化任务以得到平均候梯时间、平均乘梯时间及能源消耗整体最优的分布式电梯群控调度算法。  相似文献   

19.
激励学习的最优判据研究   总被引:8,自引:0,他引:8       下载免费PDF全文
激励学习智能体通过最优策略的学习与规划来求解序贯决策问题,因此如何定义策略的最优判所是激励学习研究的核心问题之一,本文讨论了一系列来自动态规划的最优判据,通过实例检验了各种判据对激励学习的适用性和优缺点,分析了设计各种判据的激励学习算法的必要性。  相似文献   

20.
The paper achieves two outcomes. First, it summarizes previous work on concurrent Markov decision processes (CMDPs) currently demonstrated for use with multi-agent foraging problems. When using CMDPs, each agent models the environment using two Markov decision process (MDP). The two MDPs characterize a multi-agent foraging problem by modeling both a single-agent foraging problem, and multi-agent task allocation problem, for each agent. Second, the paper studies the effects of state uncertainty on a heterogeneous robot team that utilizes the aforementioned CMDP modelling approach. Furthermore, the paper presents a method to maintain performance despite state uncertainty. The resulting robust concurrent individual and social learning (RCISL) mechanism leads to an enhanced team learning behaviour despite state uncertainty. The paper analyzes the performance of the concurrent individual and social learning mechanism with and without a particle filter for a heterogeneous foraging scenario. The RCISL mechanism confers statistically significant performance improvements over the CISL mechanism.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号