Similar Documents
17 similar documents found.
1.
To address the heavy communication and computation resource consumption in multi-agent reinforcement learning, this paper proposes an event-driven multi-agent reinforcement learning algorithm, focusing on event-driven design at the policy-learning level. During the interaction between agents and the environment, the algorithm follows the event-driven idea and designs a triggering function based on the rate of change of each agent's observations, so that communication and learning updates need not occur in real time or periodically; the number of data transmissions and computations within the same time span is thereby reduced. The computational cost of the algorithm is analyzed and its convergence is established. Finally, simulation experiments show that the algorithm reduces the number of communications and policy traversals during learning, thereby alleviating communication and computation resource consumption.
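A minimal sketch of the observation-rate trigger idea described in this abstract, assuming a tabular Q-learner, a Euclidean change-rate measure, and an illustrative threshold DELTA (the class, threshold, and update rule are assumptions, not the paper's design):

```python
import numpy as np

DELTA = 0.05          # assumed trigger threshold on the observation change rate

class EventDrivenAgent:
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95):
        self.q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma = alpha, gamma
        self.last_obs = None

    def triggered(self, obs, dt=1.0):
        """Fire the event only when the observation change rate exceeds the threshold."""
        if self.last_obs is None:
            self.last_obs = np.asarray(obs, dtype=float)
            return True
        rate = np.linalg.norm(np.asarray(obs, dtype=float) - self.last_obs) / dt
        if rate > DELTA:
            self.last_obs = np.asarray(obs, dtype=float)
            return True
        return False      # no event: skip communication and the learning update

    def maybe_learn(self, s, a, r, s_next, obs):
        """s, a, s_next are discretized indices; obs is the raw observation vector."""
        if self.triggered(obs):
            # communicate with neighbors / update the policy only at event instants
            td = r + self.gamma * self.q[s_next].max() - self.q[s, a]
            self.q[s, a] += self.alpha * td
```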

2.
To reduce the heavy communication and computation resource consumption in ring formation of first-order multi-agent systems with communication delays, an event-driven control mechanism is introduced, and a control law applicable to arbitrary ring formations is designed and coupled with two types of event-triggering conditions, state-dependent and state-independent. The event-triggering function is built on the state error, so that inter-agent communication and control-signal updates occur only at triggering instants. Convergence of the system under the control law is rigorously proven, and the effectiveness of the control algorithm is verified by numerical simulation. The simulation results also show that, while the desired system performance is achieved, the update frequency of the controller input and the resource consumption of the agents are reduced.

3.
4.
A Survey of Reinforcement Learning Research   (cited: 8; self-citations: 2, by others: 8)
The learning behavior of an agent in an unknown environment is a challenging and interesting problem. Reinforcement learning improves its policy through trial-and-error interaction with the environment, and its self-learning and online-learning characteristics make it an important branch of machine learning research. This survey introduces the latest results in reinforcement learning from three perspectives: theory, algorithms, and applications. It first describes the environment model and the basic elements of reinforcement learning; it then discusses theoretical issues concerning the convergence and generalization of reinforcement learning algorithms; next, drawing on results from recent years, it reviews reinforcement learning algorithms for discounted-reward and average-reward criteria; finally, it lists successful applications of reinforcement learning in nonlinear control, robot control, AI problem solving, and multi-agent systems, as well as future research directions.
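A minimal sketch of the basic elements this survey describes (states, actions, reward, discounted return) using tabular Q-learning; the environment interface and hyperparameters are illustrative assumptions:

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, eps=0.1):
    q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s = env.reset()                       # assumed to return a discrete state index
        done = False
        while not done:
            # epsilon-greedy exploration: trial-and-error interaction with the environment
            a = int(rng.integers(n_actions)) if rng.random() < eps else int(q[s].argmax())
            s_next, r, done = env.step(a)     # assumed env interface: (state, reward, done)
            # TD update toward the discounted-return target r + γ max_a' Q(s', a')
            q[s, a] += alpha * (r + gamma * (0.0 if done else q[s_next].max()) - q[s, a])
            s = s_next
    return q                                  # greedy policy: π(s) = argmax_a Q(s, a)
```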

5.
A Survey of Multi-Agent Reinforcement Learning   (cited: 1; self-citations: 0, by others: 1)

6.
This work studies algorithms for generating optimal persistent formations in multi-agent systems and designs motion control algorithms for optimal persistent formations based on the corresponding communication topology. First, the concept of a basic cycle is introduced; through directed vertex-addition operations, distributed generation algorithms are studied for optimal persistent formations whose basic cycle is a triangle or partly contains quadrilaterals. On this basis, taking into account the state information of one-way communication neighbors in the persistent formation, a distance-based motion control algorithm for the optimal persistent formation is designed. Finally, simulation studies verify the effectiveness of the proposed algorithms.

7.
Event-driven control has developed rapidly in recent years and has attracted great attention from researchers in the multi-agent systems community. This paper surveys the state of research on multi-agent systems under event-driven control. From the perspective of agent dynamics, representative results and research methods in this area are summarized. Results on multi-agent systems under edge-event-driven control strategies are then discussed, after which a new class of event-driven control is used to study the consensus problem of multi-agent systems. Finally, open problems and promising future research directions are presented.

8.
Research on Reinforcement Learning Algorithms for Multi-Robot Dynamic Formation   (cited: 8; self-citations: 0, by others: 8)
In artificial intelligence, reinforcement learning has received wide attention for its self-learning and adaptive properties. With the continuing development of multi-agent theory in distributed artificial intelligence, distributed reinforcement learning algorithms have gradually become a research focus. This paper first reviews the state of reinforcement learning research and then, taking multi-robot dynamic formation as the research model, describes a method of applying distributed reinforcement learning to multi-robot behavior control. An SOM neural network is used to autonomously partition the state space so as to speed up learning; a BP neural network is used to implement the reinforcement learning so as to enhance the generalization ability of the system; and internal and external reinforcement signals are adopted to balance the individual interests of each robot and the interests of the group. To clarify the control tasks, the system uses blackboard communication for hierarchical control. Finally, simulation experiments demonstrate the effectiveness of the method.

9.
A Survey of Research on Multi-Agent Deep Reinforcement Learning   (cited: 1; self-citations: 0, by others: 1)
Multi-agent deep reinforcement learning is an emerging research hotspot and application direction in machine learning, covering many algorithms, rules, and frameworks, and is widely applied in real-world domains such as autonomous driving, energy allocation, formation control, trajectory planning, routing, and social dilemmas, which gives it high research value and significance. This survey briefly introduces the basic theory and development history of multi-agent deep reinforcement learning; it describes existing classic algorithms under four categories (independent learners, communication-rule-based, mutually cooperative, and agent-modeling approaches); it reviews practical applications of multi-agent deep reinforcement learning algorithms and briefly lists existing test platforms; and it summarizes the challenges and future directions of multi-agent deep reinforcement learning in theory, algorithms, and applications.

10.
Cao Wei, Sun Ming. Control and Decision (《控制与决策》), 2018, 33(9): 1619-1624
An iterative learning control algorithm is proposed for a class of partially non-regular multi-agent systems with arbitrary initial states. The algorithm converts the formation control problem of multi-agent systems with a fixed topology into a generalized tracking problem: the leader tracks a given desired trajectory, while each follower maintains the prescribed formation by tracking a designated agent, which it treats as its own leader. Meanwhile, so that every agent can reach the desired formation from an arbitrary initial state, an iterative learning law is also designed for each agent's initial state; convergence of the algorithm is rigorously proven and sufficient conditions for convergence are given. The proposed algorithm achieves stable formation of the system over a finite time interval for agents starting from arbitrary initial positions. Finally, a simulation example further verifies the effectiveness of the proposed algorithm.
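A minimal sketch of a P-type iterative learning update of the kind this abstract refers to, for a single first-order agent tracking a desired trajectory; the dynamics, gains, and initial-state learning law are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

T, dt, iters = 50, 0.1, 30
a, b = -1.0, 1.0                      # assumed agent dynamics: x' = a*x + b*u
time = np.arange(T) * dt
xd = np.sin(time)                     # desired trajectory for this agent

u = np.zeros(T)                       # control input over one trial
x0 = 0.5                              # arbitrary (incorrect) initial state
gamma, L0 = 0.8, 0.5                  # assumed learning gains

for k in range(iters):
    x = np.zeros(T)
    x[0] = x0
    for i in range(T - 1):            # simulate one trial with the current input
        x[i + 1] = x[i] + dt * (a * x[i] + b * u[i])
    e = xd - x                        # tracking error of this trial
    u[:-1] += gamma * e[1:]           # P-type ILC update: u_{k+1}(t) = u_k(t) + γ·e_k(t+1)
    x0 += L0 * e[0]                   # initial-state learning law for the arbitrary start

print("final max tracking error:", np.abs(e).max())
```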

11.
As an important branch of machine learning and artificial intelligence, multi-agent hierarchical reinforcement learning combines, in a general form, the cooperative capability of multiple agents with the decision-making capability of reinforcement learning, and by decomposing a complex reinforcement learning problem into several subproblems solved separately, it can effectively mitigate the curse of dimensionality of the state space. This also makes multi-agent hierarchical reinforcement learning a potential approach to intelligent decision-making in large-scale, complex settings. This survey first describes the main techniques involved in multi-agent hierarchical reinforcement learning, including reinforcement learning, semi-Markov decision processes, and multi-agent reinforcement learning; it then reviews, from the hierarchical perspective, the algorithmic principles and research status of four classes of multi-agent hierarchical reinforcement learning methods: option-based, hierarchy-of-abstract-machines-based, value-function-decomposition-based, and end-to-end methods; finally, it introduces applications of multi-agent hierarchical reinforcement learning in robot control, game decision-making, and task planning.
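A minimal sketch of the "option" abstraction behind the option-based methods mentioned above: an option is a triple (initiation set I, intra-option policy π, termination condition β). The class names and the Gym-style execution loop are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Any, Callable

State = Any
Action = Any

@dataclass
class Option:
    initiation_set: Callable[[State], bool]   # I: may the option start in this state?
    policy: Callable[[State], Action]         # π: action to take while the option runs
    termination: Callable[[State], float]     # β: probability of terminating in a state

def run_option(env, state: State, option: Option, rng):
    """Execute one option to termination; return the accumulated reward and next state."""
    assert option.initiation_set(state)
    total_reward = 0.0
    while True:
        action = option.policy(state)
        state, reward, done, _ = env.step(action)   # assumed Gym-style env interface
        total_reward += reward
        if done or rng.random() < option.termination(state):
            return total_reward, state
```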

12.
We consider the average consensus problem for the multi-agent system in the discrete-time domain. Three triggering based control protocols are developed, which dictate the broadcast and control update instants of individual agents to alleviate communication and computational burden. Lyapunov-based design methods prescribe when agents should communicate and update their control so that the network converges to the average of agents' initial states. We start with a static version of the distributed event-triggering law and then generalize it so that it involves an internal auxiliary variable to regulate the threshold dynamically for each agent. The third protocol uses a self-triggering algorithm to avoid continuous listening wherein each agent estimates its next triggering time and broadcasts it to its neighbors at the current triggering time. Numerical simulations are shown to validate the efficacy of the proposed algorithms.
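A minimal sketch of a static event-triggered average-consensus update in discrete time, in the spirit of the first protocol above: each agent broadcasts its state only when its error since the last broadcast exceeds a decaying threshold. The graph, step size, and threshold parameters are illustrative assumptions:

```python
import numpy as np

A = np.array([[0, 1, 0, 1],            # assumed undirected communication graph (4-cycle)
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
eps = 0.2                              # step size, chosen below 1 / (max degree)
c0, sigma = 0.5, 0.9                   # static decaying threshold c0 * sigma**k

x = np.array([3.0, -1.0, 4.0, 0.0])    # initial states; their average is 1.5
x_hat = x.copy()                       # last broadcast state of each agent

for k in range(100):
    # trigger check: broadcast only when the error since the last broadcast is too large
    err = np.abs(x_hat - x)
    triggered = err > c0 * sigma**k
    x_hat[triggered] = x[triggered]
    # consensus update driven only by broadcast states (preserves the average)
    u = -eps * (A * (x_hat[:, None] - x_hat[None, :])).sum(axis=1)
    x = x + u

print("states:", x, " average preserved:", x.mean())
```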

13.
This article delves into the bearing-only formation tracking control problem of nonlinear multi-agent systems with unknown disturbances and unmodeled dynamics, wherein each agent's distributed controller uses only the relative bearing information of its neighbors. First, a disturbance observer combined with a neural network is proposed to eliminate the impact of unknown disturbances and unmodeled dynamics. Additionally, an event-triggered backstepping control approach is put forth for the formation tracking control problem of nonlinear multi-agent systems, which economizes on communication bandwidth and computing resources by decreasing the update frequency of the controller. Finally, using the Lyapunov method, it is demonstrated that all signals of the control system are bounded and that the convergence errors are confined to a small neighborhood of the origin, while Zeno behavior is rigorously excluded.

14.
This work studies the consensus problem of second-order multi-agent systems under a fixed directed topology. To reduce unnecessary waste of network bandwidth, a consensus algorithm based on event-triggered control is presented. The algorithm constructs an event-triggering function for every agent in the system based on state errors, so that inter-agent communication and control-signal updates take place only at triggering instants. The system is analyzed and transformed using matrix theory and model transformation, and sufficient conditions for asymptotic consensus are derived via Lyapunov theory. Simulation results verify the effectiveness of the theoretical scheme.

15.
In this paper, we investigate the use of hierarchical reinforcement learning (HRL) to speed up the acquisition of cooperative multi-agent tasks. We introduce a hierarchical multi-agent reinforcement learning (RL) framework, and propose a hierarchical multi-agent RL algorithm called Cooperative HRL. In this framework, agents are cooperative and homogeneous (use the same task decomposition). Learning is decentralized, with each agent learning three interrelated skills: how to perform each individual subtask, the order in which to carry them out, and how to coordinate with other agents. We define cooperative subtasks to be those subtasks in which coordination among agents significantly improves the performance of the overall task. Those levels of the hierarchy which include cooperative subtasks are called cooperation levels. A fundamental property of the proposed approach is that it allows agents to learn coordination faster by sharing information at the level of cooperative subtasks, rather than attempting to learn coordination at the level of primitive actions. We study the empirical performance of the Cooperative HRL algorithm using two testbeds: a simulated two-robot trash collection task, and a larger four-agent automated guided vehicle (AGV) scheduling problem. We compare the performance and speed of Cooperative HRL with other learning algorithms, as well as several well-known industrial AGV heuristics. We also address the issue of rational communication behavior among autonomous agents in this paper. The goal is for agents to learn both action and communication policies that together optimize the task given a communication cost. We extend the multi-agent HRL framework to include communication decisions and propose a cooperative multi-agent HRL algorithm called COM-Cooperative HRL. In this algorithm, we add a communication level to the hierarchical decomposition of the problem below each cooperation level. Before an agent makes a decision at a cooperative subtask, it decides if it is worthwhile to perform a communication action. A communication action has a certain cost and provides the agent with the actions selected by the other agents at a cooperation level. We demonstrate the efficiency of the COM-Cooperative HRL algorithm as well as the relation between the communication cost and the learned communication policy using a multi-agent taxi problem.
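A minimal sketch of the communication decision described above: before acting at a cooperative subtask, an agent compares the learned value of communicating (which reveals the other agents' chosen subtasks but incurs a cost) against acting without that information. Only the action-selection side is shown, and the Q-tables, cost value, and epsilon-greedy rule are illustrative assumptions:

```python
import random
from collections import defaultdict

COMM_COST = -0.5               # assumed fixed cost of a communication action
EPS = 0.1                      # epsilon-greedy exploration rate

q_comm = defaultdict(float)    # Q(state, "communicate" / "no_communicate")
q_coop = defaultdict(float)    # Q((state, received info), cooperative subtask)

def choose(q_table, state, actions):
    """Epsilon-greedy choice over a Q-table keyed by (state, action)."""
    if random.random() < EPS:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])

def act_at_cooperation_level(state, subtasks, others_choices):
    # 1. communication decision: is it worth paying COMM_COST to see what others chose?
    comm = choose(q_comm, state, ["communicate", "no_communicate"])
    info = tuple(others_choices) if comm == "communicate" else None
    reward_offset = COMM_COST if comm == "communicate" else 0.0
    # 2. cooperative-subtask decision, conditioned on the (possibly empty) information
    subtask = choose(q_coop, (state, info), subtasks)
    return comm, subtask, reward_offset
```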

16.
Multi-agent systems are widely used in fields such as autonomous driving, intelligent logistics, and collaborative healthcare. However, with advancing technology and growing system requirements, these systems face challenges such as large scale and high complexity, and often suffer from low training efficiency and poor adaptability. To address these problems, this work extends gradient-based meta-learning to multi-agent deep reinforcement learning and proposes a method called Multi-Agent first-order Meta Proximal Policy Optimization (MAMPPO) for learning the initial model parameters of a multi-agent system, providing a new perspective for improving the performance of multi-agent deep reinforcement learning. The method makes full use of the experience data generated during multi-agent reinforcement learning: through repeated adaptation it identifies the parameters most sensitive along the gradient-descent direction and learns initial parameters, so that model training starts from a good starting point. This effectively improves the decision efficiency of the joint policy, significantly speeds up policy change, and markedly accelerates adaptation to new situations. Experimental results on StarCraft II show that MAMPPO significantly improves training speed and adaptability, offering a new solution for improving the training efficiency and adaptability of multi-agent reinforcement learning.
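A minimal sketch of a first-order meta-update over shared policy parameters, in the spirit of the idea described above (learning an initialization by repeatedly adapting on sampled scenarios, Reptile-style). The inner PPO update, the task sampler, and the meta step size are illustrative assumptions, not the paper's code:

```python
import copy
import torch

META_LR = 0.1
META_ITERS = 100
INNER_STEPS = 5

def meta_train(policy: torch.nn.Module, sample_task, inner_ppo_update):
    """policy: shared multi-agent policy network (the initialization being learned);
    sample_task(): returns a training scenario;
    inner_ppo_update(policy, task): runs one PPO update on that scenario in place."""
    for _ in range(META_ITERS):
        task = sample_task()
        adapted = copy.deepcopy(policy)          # start the inner loop from the current init
        for _ in range(INNER_STEPS):
            inner_ppo_update(adapted, task)      # a few adaptation steps on this task
        # first-order meta step: move the initialization toward the adapted parameters
        with torch.no_grad():
            for p_init, p_task in zip(policy.parameters(), adapted.parameters()):
                p_init += META_LR * (p_task - p_init)
    return policy
```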

17.
This paper studies the event-triggered containment control problem of multi-agent systems (MASs) under deception attacks and denial-of-service (DoS) attacks. First, to save limited network resources, an event-triggered mechanism is proposed for MASs under hybrid cyber attacks. Different from existing event-triggered mechanisms, the negative influences of deception attacks and DoS attacks are considered in the proposed triggering function, and the communication frequency between agents is thereby reduced. Then, based on the proposed event-triggered mechanism, a corresponding control protocol is proposed to ensure that the followers converge to the convex hull formed by the leaders under deception attacks and DoS attacks. Compared with previous research on containment control, in addition to considering hybrid cyber attacks, nonlinear functions related to the states of the agents are used to describe the deception attack signals in the MAS. By orthogonal transformation of the deception attack signals, the containment control problem under deception attacks and DoS attacks is reformulated as a stability problem, from which sufficient conditions for containment control are obtained. Finally, a set of simulation examples is used to verify the effectiveness of the proposed method.
