1.
To address the curse of dimensionality in actions and states in multi-agent cooperative reinforcement learning, along with the problems that action selection admits multiple equilibria, that converging to the best equilibrium requires searching the policy space, and that policy selection must be coordinated, a novel multi-agent cooperative learning algorithm based on quantum theory and the ant colony algorithm is proposed. The new algorithm first borrows from quantum computing theory: the agents' actions and states are represented as quantum superposition states, quantum entangled states are used to coordinate policy selection, and probability amplitudes drive action exploration, speeding up learning. Second, following the ant colony algorithm, a "footprint" idea is introduced to strengthen interaction among agents indirectly. Finally, both theoretical analysis and experimental results show that the improved Q-learning is feasible and can effectively raise learning efficiency.
2.
Cooperation and learning in a multi-agent environment are studied. A cooperation model, MACM, is proposed for multi-agent systems; through a flexible coordination mechanism, the model supports cooperation among agents and learning during cooperation. The learning agents in the system use a distributed reinforcement learning algorithm that reduces the storage space of the Q-value table through a mapping, lowering the demand on system resources while still guaranteeing convergence to the optimal solution.
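The abstract above does not specify its storage-reducing mapping; as a hedged illustration of the general idea, one way to get the same effect is a sparse Q-table keyed by a canonical state encoding, so that equivalent or unvisited states cost no storage (the `SparseQ` class and the sorted-positions encoding are assumptions, not MACM's actual mapping):

```python
class SparseQ:
    """Sparse Q-table: stores only visited (state, action) pairs."""

    def __init__(self, default=0.0):
        self.table = {}            # only visited (state, action) entries
        self.default = default

    @staticmethod
    def encode(state):
        # Canonical mapping: agent positions sorted, so permuted but
        # equivalent multi-agent states share one table entry.
        return tuple(sorted(state))

    def get(self, state, action):
        return self.table.get((self.encode(state), action), self.default)

    def set(self, state, action, value):
        self.table[(self.encode(state), action)] = value

q = SparseQ()
q.set([(2, 3), (0, 1)], "north", 1.5)
# The permuted-but-equivalent state maps to the same entry:
print(q.get([(0, 1), (2, 3)], "north"))   # -> 1.5
print(len(q.table))                        # -> 1
```

With a dense table, every unvisited and every permutation-equivalent state would consume its own slot; the dictionary plus canonical encoding avoids both costs.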
3.
A Reinforcement Learning Model and Algorithm for Multi-Agent Cooperation (Total citations: 2; self-citations: 0; by others: 2)
The process of multi-agent cooperative learning is discussed in combination with reinforcement learning techniques, and a new multi-agent cooperative learning model is constructed. On the basis of this model, a multi-agent cooperative learning algorithm is proposed. The algorithm fully accounts for the characteristics of agents learning together: each agent predicts its action policy from an estimate of the actions' long-term payoff and makes the corresponding decision, thereby arriving at the optimal joint action policy. Finally, simulation experiments on the hunter-prey pursuit problem verify the algorithm's convergence and show that it is an efficient, fast learning method.
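As a hedged sketch of estimating the long-term payoff of joint actions (the state, actions, and reward below are toy assumptions, not the paper's model), a tabular Q-learner over the joint action of two agents might look like:

```python
# Tabular Q-learning over the *joint* action of two cooperative agents.
# States, actions, and reward here are illustrative toys.

ACTIONS = [0, 1]              # each agent picks action 0 or 1
ALPHA, GAMMA = 0.1, 0.9

Q = {}                        # Q[(state, a1, a2)] -> estimated return

def q(state, a1, a2):
    return Q.get((state, a1, a2), 0.0)

def best_joint(state):
    # Greedy joint action under the shared long-term estimate.
    return max(((a1, a2) for a1 in ACTIONS for a2 in ACTIONS),
               key=lambda ja: q(state, *ja))

def update(state, a1, a2, reward, next_state):
    # One-step Q-update on the joint action, so the agents converge on
    # a common estimate of the joint policy's long-term payoff.
    old = q(state, a1, a2)
    target = reward + GAMMA * q(next_state, *best_joint(next_state))
    Q[(state, a1, a2)] = old + ALPHA * (target - old)

# Toy task: in state 0, only the joint action (1, 1) is rewarded.
for _ in range(200):
    for a1 in ACTIONS:
        for a2 in ACTIONS:
            update(0, a1, a2, 1.0 if (a1, a2) == (1, 1) else 0.0, 1)

print(best_joint(0))          # -> (1, 1)
```

Because the table is indexed by the joint action, both agents' decisions are evaluated together; this is exactly what makes the approach scale poorly as the number of agents grows, a point several of the other abstracts in this list address.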
4.
The multi-agent cooperative pursuit problem is a canonical problem in the study of multi-agent coordination and cooperation. For the pursuit of a single evader capable of learning, a multi-agent cooperative pursuit algorithm based on game theory and Q-learning is proposed. First, a cooperative pursuit team is formed and a game model of the cooperative pursuit is built. Second, by learning the evader's policy choices, the evader's bounded Step-T cumulative-reward trajectory is established and folded into the pursuers' strategy sets. Finally, solving the cooperative pursuit game yields a Nash equilibrium, and each agent executes its equilibrium strategy to complete the pursuit. Because the solution may contain multiple equilibria, a fictitious-play action-selection procedure is added to pick the best equilibrium strategy. C# simulation experiments show that the algorithm effectively solves the pursuit of a single learning evader in an obstacle environment, and comparative analysis of the experimental data shows that, under the same conditions, its pursuit efficiency exceeds that of purely game-theoretic or purely learning-based pursuit algorithms.
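The equilibrium-selection step above uses a fictitious-play-style rule; as a hedged, generic sketch (the 2x2 coordination payoffs are illustrative assumptions, not the paper's pursuit game), each player best-responds to the empirical frequency of the other's past actions until play settles on one equilibrium:

```python
from collections import Counter

# Symmetric coordination game with two pure Nash equilibria:
# (0, 0) with payoff 2 and (1, 1) with payoff 1 -- illustrative numbers.
PAYOFF = {(0, 0): 2.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}

def best_response(opponent_counts):
    # Best-respond to the empirical frequency of the opponent's actions.
    total = sum(opponent_counts.values())
    def expected(a):
        return sum(PAYOFF[(a, b)] * c / total
                   for b, c in opponent_counts.items())
    return max((0, 1), key=expected)

# Start from uniform beliefs about each other's play.
hist1, hist2 = Counter({0: 1, 1: 1}), Counter({0: 1, 1: 1})
for _ in range(50):
    a1 = best_response(hist2)   # player 1 responds to player 2's history
    a2 = best_response(hist1)
    hist1[a1] += 1
    hist2[a2] += 1

print(a1, a2)   # play settles on the payoff-dominant equilibrium (0, 0)
```

The appeal for pursuit games is the same as in this toy: when several equilibria exist, the history-based beliefs break the symmetry, so all agents converge on the same (and here, best-paying) one.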
5.
6.
Multi-agent cooperative problem solving is a fundamental topic in distributed artificial intelligence. Based on the notion of a manager agent, this paper proposes a new cooperation model in which the manager agent performs global allocation of cooperation across the multi-agent system and handles cooperation requests hierarchically, resolving two problems of traditional cooperation models: dependence on the application domain and applicability only to static environments.
7.
Research on Multi-Agent Cooperation Based on Reinforcement Learning (Total citations: 2; self-citations: 0; by others: 2)
Reinforcement learning provides a robust learning method for cooperation among agents. This paper first introduces the principles and components of reinforcement learning, then describes the multi-agent Markov decision process (MMDP) and gives an agent reinforcement learning model. On this basis, the two reinforcement learning modes found in multi-agent cooperation, IL (independent learning) and JAL (joint-action learning), are compared. Finally, the coordination mechanisms commonly used by cooperative multi-agent systems when multiple optimal policies exist are analyzed.
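A hedged toy makes the IL/JAL distinction concrete (the single-stage payoff matrix below is an illustrative assumption): the independent learner's value for an action averages rewards across the partner's unseen choices, while the joint-action learner keeps them apart:

```python
# IL keeps Q over its own action only; JAL keeps Q over the joint action.
ALPHA = 0.5

# Single-stage cooperative game over joint actions (a1, a2); the high
# payoff at (0, 0) is flanked by penalties, climbing-game style.
PAYOFF = {(0, 0): 11, (0, 1): -30, (1, 0): -30, (1, 1): 7}

q_il = {a: 0.0 for a in (0, 1)}        # independent learner (agent 1)
q_jal = {ja: 0.0 for ja in PAYOFF}     # joint-action learner

def il_update(own_action, reward):
    # IL folds the partner's choice into the environment, so rewards
    # caused by different partner actions are mixed into one estimate.
    q_il[own_action] += ALPHA * (reward - q_il[own_action])

def jal_update(joint_action, reward):
    # JAL conditions on the partner's action; each joint action's
    # value converges to its own payoff.
    q_jal[joint_action] += ALPHA * (reward - q_jal[joint_action])

for _ in range(100):
    for ja, r in PAYOFF.items():
        il_update(ja[0], r)
        jal_update(ja, r)

# JAL identifies the optimal joint action (0, 0); IL, with its mixed
# estimates, prefers the safer action 1 and misses the optimum.
print(max(q_jal, key=q_jal.get), max(q_il, key=q_il.get))
```

This is the core trade-off the abstract's comparison turns on: IL needs no information about the other agents but can mis-rank actions near risky optima, while JAL ranks joint actions correctly at the cost of a table exponential in the number of agents.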
8.
An Agent-Network-Based Cooperative Organization Method for Multi-Agent Systems (Total citations: 1; self-citations: 0; by others: 1)
Agent organization and cooperation mechanisms in distributed multi-agent systems are discussed. A method is proposed that builds an agent network from the physical locations of, and communication costs between, the agents, and then organizes agent cooperation over this network model by imitating the dynamic routing mechanisms of computer networks.
9.
10.
To achieve effective cooperation among learner agents in a collaborative learning system, a new cooperation mechanism, the Schoolmate Relation Web Model, is introduced to build schoolmate coalitions among the learner agents, and collaborative learning among multiple learner agents is realized on top of these coalitions. Within each coalition, every pair of agents is in a schoolmate relation, and all agents in the coalition cooperate to complete the learning task together. In addition, learner agents in a coalition communicate not directly but through a blackboard, which markedly improves communication efficiency. Because the Schoolmate Relation Web Model avoids blind coalition formation and raises the efficiency of interaction among learner agents, the resulting coalition-based collaborative learning system achieves effective cooperation among learner agents, remedying shortcomings of existing collaborative learning systems.
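A minimal sketch of blackboard-mediated communication among learner agents (the class, topics, and field names are illustrative assumptions, not the paper's design):

```python
class Blackboard:
    """Shared posting space: agents publish and read by topic, with no
    pairwise channels and no need to know who else is in the coalition."""

    def __init__(self):
        self.entries = []          # append-only list of postings

    def post(self, author, topic, content):
        self.entries.append({"author": author, "topic": topic,
                             "content": content})

    def read(self, topic, exclude_author=None):
        # Return every posting on the topic, optionally skipping the
        # reader's own contributions.
        return [e for e in self.entries
                if e["topic"] == topic and e["author"] != exclude_author]

bb = Blackboard()
bb.post("learner_A", "exercise_3", "partial solution: step 1 done")
bb.post("learner_B", "exercise_3", "attempt at step 2")
bb.post("learner_A", "exercise_5", "question about the setup")

for e in bb.read("exercise_3", exclude_author="learner_A"):
    print(e["author"], "->", e["content"])
```

The efficiency claim in the abstract follows from the topology: n agents need one channel to the blackboard each, instead of up to n(n-1)/2 direct links.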
11.
To tackle congestion at intersections, an FRTL (fog reinforcement traffic light) control model combining fog computing and reinforcement learning theory is proposed; the model coordinates traffic lights intelligently according to real-time traffic-flow information. Fog nodes upload the real-time traffic flow they collect to a fog server, the fog server shares the information on the fog platform, and the platform combines the processed shared data with Q-learning to build the traffic-light control algorithm. The algorithm computes a suitable signal-timing plan from the detected real-time traffic data, which is then applied to the lights. Simulation results show that, compared with the traditional time-of-day control scheme and the arterial control scheme (ATL), the FRTL method raises intersection throughput and reduces average vehicle waiting time, achieving the goals of sensible signal timing and congestion relief.
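A hedged miniature of the Q-learning signal-timing idea (the queue dynamics, reward, and state encoding below are toy assumptions, not FRTL's actual model): the state is the pair of queue lengths, the action chooses which approach gets green, and the reward penalizes total waiting:

```python
# Toy intersection: two approaches with queue lengths 0..9 each.
ALPHA, GAMMA = 0.2, 0.9
ACTIONS = (0, 1)   # 0: green for north-south, 1: green for east-west

Q = {}

def q(s, a):
    return Q.get((s, a), 0.0)

def step(queues, action):
    # The served approach drains 3 cars, the other gains 1 (capped).
    # Reward is the negative total queue length (a waiting penalty).
    ns, ew = queues
    if action == 0:
        ns, ew = max(0, ns - 3), min(9, ew + 1)
    else:
        ns, ew = min(9, ns + 1), max(0, ew - 3)
    return (ns, ew), -(ns + ew)

def learn(sweeps=300):
    # The toy dynamics are deterministic, so full sweeps over all
    # (state, action) pairs converge like value iteration.
    for _ in range(sweeps):
        for ns in range(10):
            for ew in range(10):
                for a in ACTIONS:
                    s = (ns, ew)
                    s2, r = step(s, a)
                    best = max(q(s2, b) for b in ACTIONS)
                    Q[(s, a)] = q(s, a) + ALPHA * (r + GAMMA * best - q(s, a))

learn()
# With a long north-south queue, the learned policy serves it first.
print(max(ACTIONS, key=lambda a: q((9, 0), a)))   # -> 0
```

In the FRTL setting the same loop would run on the fog platform, with fog nodes supplying the measured queue lengths and the chosen action pushed back to the lights.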
12.
In this work, we present an optimal cooperative control scheme for a multi-agent system in an unknown dynamic obstacle environment, based on an improved distributed cooperative reinforcement learning (RL) strategy with a three-layer collaborative mechanism. The three collaborative layers are collaborative perception layer, collaborative control layer, and collaborative evaluation layer. The incorporation of collaborative perception expands the perception range of a single agent, and improves the early warning ability of the agents for the obstacles. Neural networks (NNs) are employed to approximate the cost function and the optimal controller of each agent, where the NN weight matrices are collaboratively optimized to achieve global optimal performance. The distinction of the proposed control strategy is that cooperation of the agents is embodied not only in the input of NNs (in a collaborative perception layer) but also in their weight updating procedure (in the collaborative evaluation and collaborative control layers). Comparative simulations are carried out to demonstrate the effectiveness and performance of the proposed RL-based cooperative control scheme.
13.
14.
Mohammad Ghavamzadeh, Sridhar Mahadevan, Rajbala Makar. Autonomous Agents and Multi-Agent Systems, 2006, 13(2): 197-229
In this paper, we investigate the use of hierarchical reinforcement learning (HRL) to speed up the acquisition of cooperative multi-agent tasks. We introduce a hierarchical multi-agent reinforcement learning (RL) framework, and propose a hierarchical multi-agent RL algorithm called Cooperative HRL. In this framework, agents are cooperative and homogeneous (use the same task decomposition). Learning is decentralized, with each agent learning three interrelated skills: how to perform each individual subtask, the order in which to carry them out, and how to coordinate with other agents. We define cooperative subtasks to be those subtasks in which coordination among agents significantly improves the performance of the overall task. Those levels of the hierarchy which include cooperative subtasks are called cooperation levels. A fundamental property of the proposed approach is that it allows agents to learn coordination faster by sharing information at the level of cooperative subtasks, rather than attempting to learn coordination at the level of primitive actions. We study the empirical performance of the Cooperative HRL algorithm using two testbeds: a simulated two-robot trash collection task, and a larger four-agent automated guided vehicle (AGV) scheduling problem. We compare the performance and speed of Cooperative HRL with other learning algorithms, as well as several well-known industrial AGV heuristics. We also address the issue of rational communication behavior among autonomous agents in this paper. The goal is for agents to learn both action and communication policies that together optimize the task given a communication cost. We extend the multi-agent HRL framework to include communication decisions and propose a cooperative multi-agent HRL algorithm called COM-Cooperative HRL. In this algorithm, we add a communication level to the hierarchical decomposition of the problem below each cooperation level. Before an agent makes a decision at a cooperative subtask, it decides if it is worthwhile to perform a communication action. A communication action has a certain cost and provides the agent with the actions selected by the other agents at a cooperation level. We demonstrate the efficiency of the COM-Cooperative HRL algorithm as well as the relation between the communication cost and the learned communication policy using a multi-agent taxi problem.
15.
To address the exponential blow-up of the joint action space as the number of agents grows in multi-agent systems, the centralized-training-with-decentralized-execution framework is adopted to avoid the curse of dimensionality in the joint action space and to lower the algorithm's optimization cost. For the many multi-agent reinforcement learning settings in which the environment returns only a single global reward for the agents' joint behavior, a new global credit assignment mechanism, the Reward Highway Network (RHWNet), is proposed. Building on the original...
16.
Chang Xiaojun. 《计算机工程与应用》 (Computer Engineering and Applications), 2011, 47(23): 212-216
Building on the traditional Q-learning algorithm, a multi-agent system is introduced and a multi-agent joint Q-learning algorithm is proposed. The algorithm has the agents learn under a single shared evaluation function, and the learning process takes into account the results learned by all agents participating in the cooperation. In the RoboCup 2D soccer simulation, a field-state decomposition method is introduced to reduce the number of state components, and the optimal state obtained by joint learning serves as the optimal action group for multi-agent cooperation, effectively solving the passing-strategy and cooperation problems among the simulated agents. Simulation and experimental results demonstrate the algorithm's effectiveness and reliability.
17.
Research on Multi-Robot Cooperative Learning Based on Artificial Neural Networks (Total citations: 5; self-citations: 0; by others: 5)
Robot soccer is an interesting and complex emerging field of artificial intelligence research, and a typical multi-agent system. This paper studies the learning of cooperative behavior in robot soccer: an artificial neural network algorithm is used to let two soccer robots learn passing, and experimental results show the method's effectiveness. Finally, various improvements to the BP algorithm are discussed.
18.
19.
This paper relieves the ‘curse of dimensionality’ problem, which becomes intractable when scaling reinforcement learning to multi-agent systems. This problem is aggravated exponentially as the number of agents increases, resulting in large memory requirement and slowness in learning speed. For cooperative systems which widely exist in multi-agent systems, this paper proposes a new multi-agent Q-learning algorithm based on decomposing the joint state and joint action learning into two learning processes, which are learning individual action and the maximum value of the joint state approximately. The latter process considers others’ actions to insure that the joint action is optimal and supports the updating of the former one. The simulation results illustrate that the proposed algorithm can learn the optimal joint behavior with smaller memory and faster learning speed compared with friend-Q learning and independent learning.
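As a hedged illustration of the decomposition idea (the single-state game and the gating rule below are simplifying assumptions, not the paper's exact algorithm): each agent keeps a Q-table over its own actions only, while a shared scalar tracks the approximate maximum joint value; individual updates are gated on that maximum so the learned individual actions stay consistent with the optimal joint action:

```python
# Instead of one table over joint actions (size |A|^n per state), each
# agent keeps a table over its own action (size |A|) plus a shared
# estimate of the best achievable joint value.
ALPHA = 0.2
N_ACTIONS = 3

def payoff(a1, a2):
    # Hypothetical cooperative game; the optimum is at (2, 2).
    return 10.0 if (a1, a2) == (2, 2) else (a1 + a2) * 0.5

q1 = [0.0] * N_ACTIONS     # agent 1: own action only
q2 = [0.0] * N_ACTIONS     # agent 2: own action only
v_max = 0.0                # shared estimate of the maximum joint value

for _ in range(2000):
    for a1 in range(N_ACTIONS):
        for a2 in range(N_ACTIONS):
            r = payoff(a1, a2)
            # Individual tables absorb only experience consistent with
            # the current best joint value; this coupling keeps the two
            # independently stored policies aligned on one joint optimum.
            if r >= v_max:
                q1[a1] += ALPHA * (r - q1[a1])
                q2[a2] += ALPHA * (r - q2[a2])
            v_max = max(v_max, r)

print(q1.index(max(q1)), q2.index(max(q2)))   # -> 2 2
```

The memory saving is the point: two tables of size 3 plus one scalar, versus a joint table of size 9, and the gap widens exponentially with more agents and actions.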