17 similar documents found; search took 62 ms
1.
To reduce the heavy communication and computation costs in multi-agent reinforcement learning, this paper proposes an event-driven multi-agent reinforcement learning algorithm, focusing on event-driven ideas at the policy-learning level. As agents interact with the environment, the algorithm applies event-driven reasoning to design a triggering function based on the rate of change of each agent's observations, so that communication and learning updates need not occur in real time or at fixed periods, which reduces the number of data transmissions and computations over the same time span. The paper also analyzes the algorithm's computational resource consumption and proves its convergence. Finally, simulation experiments show that the algorithm reduces the number of communication rounds and policy traversals during learning, thereby alleviating communication and computation resource consumption.
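A minimal sketch of the kind of observation-rate trigger this abstract describes (the threshold value and the exact trigger form are assumptions, not the paper's rule):

```python
import numpy as np

def should_trigger(obs_now, obs_last_broadcast, dt, threshold=0.1):
    """Fire a communication/learning event when the observation's rate of
    change since the last broadcast exceeds a threshold (assumed form)."""
    rate = np.linalg.norm(obs_now - obs_last_broadcast) / dt
    return rate > threshold

# Inside an agent's interaction loop, communication and policy updates then
# happen only at triggering instants rather than at every step:
#   if should_trigger(o_t, o_last, dt):
#       broadcast(o_t); update_policy(); o_last = o_t
```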
2.
3.
4.
5.
6.
7.
Event-driven control has developed rapidly in recent years and has attracted great attention from researchers in the multi-agent systems community. This paper surveys the state of research on multi-agent systems under event-driven control. From the perspective of agent dynamics, representative results and research methods in this field are summarized. Results on multi-agent systems under edge-event-driven control strategies are then reviewed. Next, a new class of event-driven control is used to study the consensus problem for multi-agent systems. Finally, open problems and research directions worth attention are given.
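As a point of reference for the surveyed results, one common single-integrator event-triggered consensus protocol from this literature (a representative form, not a specific result of this survey) is
\[
u_i(t) = -\sum_{j \in \mathcal{N}_i} a_{ij}\bigl(\hat{x}_i(t) - \hat{x}_j(t)\bigr),
\]
where \(\hat{x}_i(t) = x_i(t_k^i)\) is agent \(i\)'s last broadcast state; agent \(i\) triggers its next broadcast when its measurement error \(e_i(t) = \hat{x}_i(t) - x_i(t)\) satisfies \(\|e_i(t)\| \ge \sigma_i \bigl\|\sum_{j \in \mathcal{N}_i} a_{ij}(\hat{x}_i - \hat{x}_j)\bigr\|\) for a design constant \(0 < \sigma_i < 1\).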
8.
Research on Reinforcement Learning Algorithms for Multi-Robot Dynamic Formation    Cited by: 8 (self-citations: 0, other citations: 8)
In artificial intelligence, reinforcement learning has drawn wide attention for its self-learning and self-adaptive properties. As multi-agent theory in distributed artificial intelligence has developed, distributed reinforcement learning algorithms have gradually become a research focus. This paper first reviews the state of reinforcement learning research and then, taking multi-robot dynamic formation as the model, describes how distributed reinforcement learning can realize multi-robot behavior control. A SOM neural network partitions the state space autonomously to speed up learning; a BP neural network implements the reinforcement learning to strengthen the system's generalization ability; and internal and external reinforcement signals are combined to balance each robot's individual interest against the group's. To keep control tasks explicit, the system uses blackboard-style communication for hierarchical control. Simulation experiments confirm the effectiveness of the method.
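A toy sketch of this architecture, with a prototype-based quantizer standing in for the SOM and tabular Q-learning standing in for the BP-network learner (the reward weighting and all sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
prototypes = rng.uniform(-1, 1, size=(16, 4))  # SOM-like codebook: 16 cells over a 4-D state
Q = np.zeros((16, 5))                          # 5 discrete actions
alpha, gamma, w_int = 0.1, 0.9, 0.3            # w_int: weight of the internal (individual) reward

def quantize(state):
    """Map a continuous state to its nearest prototype (the SOM's role here)."""
    return int(np.argmin(np.linalg.norm(prototypes - state, axis=1)))

def q_update(s, a, r_internal, r_external, s_next):
    """One Q-learning step driven by a blend of individual and team rewards."""
    r = w_int * r_internal + (1 - w_int) * r_external
    i, j = quantize(s), quantize(s_next)
    Q[i, a] += alpha * (r + gamma * Q[j].max() - Q[i, a])
```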
9.
Multi-agent deep reinforcement learning is an emerging research hotspot and application direction in machine learning. It covers numerous algorithms, rules, and frameworks and is widely applied in real-world domains such as autonomous driving, energy allocation, formation control, trajectory planning, route planning, and social dilemmas, so it carries high research value. This paper briefly introduces the basic theory and development history of multi-agent deep reinforcement learning; describes the existing classic algorithms under four categories: independent learners, communication-rule-based, mutually cooperative, and agent-modeling methods; surveys practical applications of multi-agent deep reinforcement learning algorithms and lists the existing test platforms; and summarizes the challenges and future directions facing the field in theory, algorithms, and applications.
10.
For a class of partially non-regular multi-agent systems with arbitrary initial states, an iterative learning control algorithm is proposed. The algorithm converts the formation control problem for multi-agent systems with fixed topology into a generalized tracking problem: the leader tracks a given desired trajectory, while each follower keeps the prescribed formation by tracking some designated agent, which it treats as its own leader. To let every agent reach the desired formation from an arbitrary initial state, an iterative learning law is also designed for each agent's initial state. The convergence of the algorithm is rigorously proven and sufficient conditions for convergence are given. The proposed algorithm achieves a stable formation over a finite time interval for agents starting from arbitrary initial positions. A simulation example further verifies its effectiveness.
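A generic D-type iterative learning update of the kind alluded to here (the paper's exact law and gain conditions may differ) is
\[
u_{i,k+1}(t) = u_{i,k}(t) + \Gamma\,\dot{e}_{i,k}(t), \qquad x_{i,k+1}(0) = x_{i,k}(0) + L\,e_{i,k}(0),
\]
where \(e_{i,k}(t)\) is agent \(i\)'s formation-tracking error on iteration \(k\) over the finite interval \([0,T]\); the second update is the initial-state learning law that removes the arbitrary-initial-state obstruction.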
11.
As an important branch of machine learning and artificial intelligence, multi-agent hierarchical reinforcement learning combines the cooperation of multiple agents with the decision-making ability of reinforcement learning in a general form and, by decomposing a complex reinforcement learning problem into several subproblems solved separately, can effectively address the curse of dimensionality of the state space. This makes it a potential route to intelligent decision-making in large-scale, complex settings. This paper first describes the main techniques involved, including reinforcement learning, semi-Markov decision processes, and multi-agent reinforcement learning; then, from the hierarchical viewpoint, surveys the principles and research status of four classes of multi-agent hierarchical reinforcement learning methods: option-based, hierarchies-of-abstract-machines-based, value-function-decomposition-based, and end-to-end; and finally reviews applications in robot control, game decision-making, and task planning.
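For reference, the option-based methods surveyed here rest on the standard SMDP Q-learning update
\[
Q(s,o) \leftarrow Q(s,o) + \alpha\Bigl[R + \gamma^{\tau}\max_{o'}Q(s',o') - Q(s,o)\Bigr],
\]
where the option \(o\) ran for \(\tau\) steps before terminating in \(s'\) and \(R\) is the discounted reward accumulated during its execution.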
12.
We consider the average consensus problem for the multi-agent system in the discrete-time domain. Three triggering-based control protocols are developed, which dictate the broadcast and control update instants of individual agents to alleviate communication and computational burden. Lyapunov-based design methods prescribe when agents should communicate and update their control so that the network converges to the average of the agents' initial states. We start with a static version of the distributed event-triggering law and then generalize it so that it involves an internal auxiliary variable that regulates the threshold dynamically for each agent. The third protocol uses a self-triggering algorithm to avoid continuous listening, wherein each agent estimates its next triggering time and broadcasts it to its neighbors at the current triggering time. Numerical simulations validate the efficacy of the proposed algorithms.
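A toy discrete-time sketch of the static triggering protocol described above (the graph, step size, and threshold are assumptions; note the update preserves the state average exactly since the Laplacian has zero column sums):

```python
import numpy as np

# Undirected 4-agent ring: Laplacian L, step size eps < 2 / lambda_max(L) = 0.5
L = np.array([[2, -1, 0, -1], [-1, 2, -1, 0], [0, -1, 2, -1], [-1, 0, -1, 2]], float)
eps, sigma = 0.2, 0.05                  # sigma: static trigger threshold (assumed value)
x = np.array([1.0, -2.0, 3.0, 0.5])     # initial states; target is their mean, 0.625
x_hat = x.copy()                        # last-broadcast states

for k in range(200):
    triggered = np.abs(x_hat - x) > sigma   # static event-triggering law
    x_hat[triggered] = x[triggered]         # triggered agents broadcast fresh states
    x = x - eps * L @ x_hat                 # control uses only broadcast values

print(x, x.mean())                      # states cluster near the initial average
```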
13.
This article delves into the bearing-only formation tracking control problem of nonlinear multi-agent systems with unknown disturbances and unmodeled dynamics, wherein the distributed control uses only the relative bearing information of each agent's neighbors. First, a disturbance observer combined with a neural network is proposed to eliminate the impact of unknown disturbances and unmodeled dynamics. In addition, an event-triggered backstepping control approach is put forth for the formation tracking control problem of nonlinear multi-agent systems, which economizes on communication bandwidth and computing resources by decreasing the update frequency of the controller. Finally, using the Lyapunov method, it is shown that all signals of the control system are bounded and that the convergence errors are confined to a small neighborhood of the origin, while Zeno behavior is rigorously excluded.
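For context, the measurement that "bearing-only" refers to is the unit direction vector between agents,
\[
g_{ij} = \frac{p_j - p_i}{\lVert p_j - p_i \rVert},
\]
so the controller must drive the measured bearings \(g_{ij}\) toward desired bearings \(g_{ij}^{*}\) using direction but not distance information.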
14.
15.
Mohammad Ghavamzadeh, Sridhar Mahadevan, Rajbala Makar. Autonomous Agents and Multi-Agent Systems, 2006, 13(2):197-229
In this paper, we investigate the use of hierarchical reinforcement learning (HRL) to speed up the acquisition of cooperative multi-agent tasks. We introduce a hierarchical multi-agent reinforcement learning (RL) framework, and propose a hierarchical multi-agent RL algorithm called Cooperative HRL. In this framework, agents are cooperative and homogeneous (use the same task decomposition). Learning is decentralized, with each agent learning three interrelated skills: how to perform each individual subtask, the order in which to carry them out, and how to coordinate with other agents. We define cooperative subtasks to be those subtasks in which coordination among agents significantly improves the performance of the overall task. Those levels of the hierarchy which include cooperative subtasks are called cooperation levels. A fundamental property of the proposed approach is that it allows agents to learn coordination faster by sharing information at the level of cooperative subtasks, rather than attempting to learn coordination at the level of primitive actions. We study the empirical performance of the Cooperative HRL algorithm using two testbeds: a simulated two-robot trash collection task, and a larger four-agent automated guided vehicle (AGV) scheduling problem. We compare the performance and speed of Cooperative HRL with other learning algorithms, as well as several well-known industrial AGV heuristics. We also address the issue of rational communication behavior among autonomous agents in this paper. The goal is for agents to learn both action and communication policies that together optimize the task given a communication cost. We extend the multi-agent HRL framework to include communication decisions and propose a cooperative multi-agent HRL algorithm called COM-Cooperative HRL. In this algorithm, we add a communication level to the hierarchical decomposition of the problem below each cooperation level. Before an agent makes a decision at a cooperative subtask, it decides if it is worthwhile to perform a communication action. A communication action has a certain cost and provides the agent with the actions selected by the other agents at a cooperation level. We demonstrate the efficiency of the COM-Cooperative HRL algorithm as well as the relation between the communication cost and the learned communication policy using a multi-agent taxi problem.
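A schematic of the communication decision that COM-Cooperative HRL inserts below each cooperation level (the function and Q-table structures here are illustrative assumptions, not the authors' implementation):

```python
COMMUNICATE, SKIP = 1, 0

def choose_subtask(s, Q_comm, Q_coop, Q_solo, ask_others):
    """Communication level: first decide whether communicating (whose cost is
    assumed to be folded into the learned Q_comm values) is worthwhile, then
    pick a cooperative subtask with or without the other agents' choices."""
    if Q_comm[s][COMMUNICATE] > Q_comm[s][SKIP]:
        others = ask_others()            # actions selected by the other agents
        values = Q_coop[s][others]       # subtask values given others' choices
        return max(values, key=values.get)
    return max(Q_solo[s], key=Q_solo[s].get)
```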
16.
Multi-agent systems are widely used in autonomous driving, intelligent logistics, collaborative healthcare, and many other fields, but as technology advances and system requirements grow, they face challenges of large scale and high complexity and often suffer from low training efficiency and poor adaptability. To address these problems, this paper extends gradient-based meta-learning to multi-agent deep reinforcement learning and proposes multi-agent first-order meta proximal policy optimization (MAMPPO), which learns initial model parameters for a multi-agent system and thus offers a new perspective on improving the performance of multi-agent deep reinforcement learning. The method makes full use of the experience data gathered during multi-agent reinforcement learning: through repeated adaptation it finds the parameters most sensitive to the gradient-descent direction and learns the initial parameters, so that model training starts from a strong initialization, which improves the decision efficiency of the joint policy and markedly accelerates both policy change and adaptation to new situations. Experiments on StarCraft II show that MAMPPO significantly improves training speed and adaptability, providing a new approach to improving the training efficiency and adaptability of multi-agent reinforcement learning.
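A sketch of the first-order (Reptile-style) outer loop that this description suggests, with the inner loop standing in for MAPPO training on a sampled task; it assumes a PyTorch-style policy exposing .parameters(), and all names and hyperparameters are assumptions:

```python
import copy, random

def meta_train(policy, tasks, inner_train, meta_lr=0.1, meta_iters=1000):
    """Learn an initialization by first-order meta-learning: adapt a copy of
    the policy to a sampled task, then nudge the initialization toward the
    adapted parameters (no second-order gradients required)."""
    for _ in range(meta_iters):
        task = random.choice(tasks)
        adapted = inner_train(copy.deepcopy(policy), task)  # e.g., a few MAPPO updates
        for p0, p1 in zip(policy.parameters(), adapted.parameters()):
            p0.data.add_(meta_lr * (p1.data - p0.data))     # Reptile-style outer step
    return policy
```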
17.
This paper studies the event-triggered containment control problem of multi-agent systems (MASs) under deception attacks and denial-of-service (DoS) attacks. First, to save limited network resources, an event-triggered mechanism is proposed for MASs under hybrid cyber attacks. Unlike existing event-triggered mechanisms, the proposed triggering function accounts for the negative influences of deception attacks and DoS attacks, and the communication frequency between agents is reduced. Then, based on the proposed event-triggered mechanism, a corresponding control protocol is proposed to ensure that the followers converge to the convex hull formed by the leaders under deception attacks and DoS attacks. Compared with previous research on containment control, in addition to considering hybrid cyber attacks, nonlinear functions of the agents' states are used to describe the deception attack signals in the MAS. By an orthogonal transformation of the deception attack signals, the containment control problem under deception attacks and DoS attacks is reformulated as a stability problem, from which sufficient conditions for containment control are obtained. Finally, a simulation example verifies the effectiveness of the proposed method.
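The containment objective referenced here is the standard one: every follower's state should converge to the convex hull spanned by the leaders' states,
\[
\lim_{t \to \infty} \operatorname{dist}\bigl(x_i(t),\ \operatorname{Co}\{x_l(t) : l \in \mathcal{L}\}\bigr) = 0, \qquad i \in \mathcal{F},
\]
where \(\operatorname{Co}\{\cdot\}\) denotes the convex hull and \(\mathcal{L}\), \(\mathcal{F}\) index the leaders and followers.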