Similar Documents
20 similar documents found.
1.
Individual learning in an environment where more than one agent exists is a challenging task. In this paper, a single learning agent situated in an environment where multiple agents exist is modeled based on reinforcement learning. The environment is non-stationary and only partially accessible from an agent's point of view. Therefore, the learning activities of an agent are influenced by the actions of other cooperative or competitive agents in the environment. A prey-hunter capture game that has the above characteristics is defined and used in experiments to simulate the learning process of individual agents. Experimental results show that there are no strict rules for reinforcement learning in this setting. We suggest two new methods to improve the performance of agents; both decrease the number of states while keeping as much state information as necessary.
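The setting described above lends itself to a compact illustration. Below is a minimal sketch, assuming a tabular Q-learner on a 5x5 grid (these details are illustrative assumptions, not the paper's exact game): a hunter chases a randomly moving prey, and the state is reduced to the prey's position relative to the hunter, which is one simple way to shrink the state space while keeping the information needed for the task.

```python
# Minimal sketch (illustrative assumptions, not the paper's exact game):
# tabular Q-learning for a hunter chasing a randomly moving prey on a grid.
# The state is the prey's position relative to the hunter, a reduced state
# that still carries the information needed to capture the prey.
import random

GRID = 5
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
Q = {}  # (dx, dy) -> list of action values

def rel_state(hunter, prey):
    return (prey[0] - hunter[0], prey[1] - hunter[1])

def choose(s):
    if s not in Q:
        Q[s] = [0.0] * len(ACTIONS)
    if random.random() < EPS:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[s][a])

for episode in range(2000):
    hunter, prey = (0, 0), (GRID - 1, GRID - 1)
    for step in range(50):
        s = rel_state(hunter, prey)
        a = choose(s)
        dx, dy = ACTIONS[a]
        hunter = (min(GRID - 1, max(0, hunter[0] + dx)),
                  min(GRID - 1, max(0, hunter[1] + dy)))
        captured = hunter == prey
        reward = 1.0 if captured else -0.01
        if not captured:
            prey = (random.randrange(GRID), random.randrange(GRID))  # prey moves
        s2 = rel_state(hunter, prey)
        if s2 not in Q:
            Q[s2] = [0.0] * len(ACTIONS)
        Q[s][a] += ALPHA * (reward + GAMMA * max(Q[s2]) - Q[s][a])
        if captured:
            break
```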

2.
In cooperative multi-agent reinforcement learning, the global credit assignment mechanism struggles to capture the complex cooperative relationships among agents and cannot effectively handle non-Markovian reward signals. To address these problems, an enhanced global credit assignment mechanism for cooperative multi-agent reinforcement learning is proposed. First, a new global credit assignment structure based on reward highway connections is designed, so that each agent can consider both its assigned local reward signal and the team's global reward signal when making decisions. Second, a value-function estimation method that can adapt to non-Markovian rewards is proposed by fusing multi-step reward signals. Experimental results on multiple complex scenarios of the StarCraft micromanagement platform show that the proposed method not only achieves state-of-the-art performance but also greatly improves sample efficiency.
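As a rough illustration of the multi-step reward fusion described above, the sketch below computes an n-step return that mixes a local and a global reward signal before bootstrapping from a value estimate. The mixing coefficient, horizon, and variable names are illustrative assumptions, not the paper's formulation.

```python
# Illustrative n-step return used as a TD target; `beta` mixes local and
# global rewards and is an assumption for illustration only.
import numpy as np

def n_step_target(local_r, global_r, values, gamma=0.99, n=5, beta=0.5):
    """local_r, global_r: per-step reward sequences; values: estimates V(s_0..s_T)."""
    T = len(local_r)
    targets = np.zeros(T)
    for t in range(T):
        G, discount = 0.0, 1.0
        for k in range(t, min(t + n, T)):
            G += discount * (beta * local_r[k] + (1 - beta) * global_r[k])
            discount *= gamma
        if t + n <= T:                  # bootstrap from the value estimate
            G += discount * values[t + n]
        targets[t] = G
    return targets

T = 10
print(n_step_target(np.ones(T), np.full(T, 0.5), np.zeros(T + 1)))
```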

3.
Current Research on Distributed Reinforcement Learning in Multi-Agent Systems
This paper summarizes current research results on distributed reinforcement learning, analyzes and compares the characteristics, differences, and applicable scopes of three classes of distributed reinforcement learning methods (independent reinforcement learning, social reinforcement learning, and group reinforcement learning), and discusses the problems that distributed reinforcement learning still needs to solve and its future development directions.

4.
We describe a framework and equations used to model and predict the behavior of multi-agent systems (MASs) with learning agents. A difference equation is used for calculating the progression of an agent's error in its decision function, thereby telling us how the agent is expected to fare in the MAS. The equation relies on parameters which capture the agent's learning abilities, such as its change rate, learning rate and retention rate, as well as relevant aspects of the MAS such as the impact that agents have on each other. We validate the framework with experimental results using reinforcement learning agents in a market system, as well as with other experimental results gathered from the AI literature. Finally, we use PAC-theory to show how to calculate bounds on the values of the learning parameters.
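To make the role of these parameters concrete, here is a toy iteration of a difference equation for an agent's decision-function error. The functional form, parameter names, and values are illustrative assumptions only; the paper derives its own equation.

```python
# Toy iteration of a difference equation for an agent's decision error e_t.
# This form is an illustrative assumption, not the paper's equation: each step,
# a fraction `learn` of the remaining error is corrected, a fraction
# (1 - retain) of past learning is forgotten, and other agents' changes
# (change * impact) reintroduce error by shifting the target decision function.
def error_progression(e0=1.0, learn=0.3, retain=0.9, change=0.2, impact=0.5, steps=20):
    e, history = e0, []
    for _ in range(steps):
        e = retain * e * (1 - learn) + (1 - retain) * e0 + change * impact
        history.append(e)
    return history

print(error_progression())
```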

5.
Cooperative Multi-Agent Learning: The State of the Art
Cooperative multi-agent systems (MAS) are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize utility. Due to the interactions among the agents, multi-agent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. The challenge this presents to the task of programming solutions to MAS problems has spawned increasing interest in machine learning techniques to automate the search and optimization process. We provide a broad survey of the cooperative multi-agent learning literature. Previous surveys of this area have largely focused on issues common to specific subareas (for example, reinforcement learning (RL) or robotics). In this survey we attempt to draw from multi-agent learning work in a spectrum of areas, including RL, evolutionary computation, game theory, complex systems, agent modeling, and robotics. We find that this broad view leads to a division of the work into two categories, each with its own special issues: applying a single learner to discover joint solutions to multi-agent problems (team learning), or using multiple simultaneous learners, often one per agent (concurrent learning). Additionally, we discuss direct and indirect communication in connection with learning, plus open issues in task decomposition, scalability, and adaptive dynamics. We conclude with a presentation of multi-agent learning problem domains, and a list of multi-agent learning resources.

6.
A Survey of Reinforcement Learning
The learning behavior of an agent in an unknown environment is a challenging and interesting problem. Reinforcement learning improves a policy through trial-and-error interaction with the environment, and its trial-and-error and online learning characteristics have made it an important branch of machine learning research. This paper reviews recent results on reinforcement learning theory, algorithms, and applications. It first introduces the environment model of reinforcement learning and its basic elements; next it discusses theoretical issues concerning the convergence and generalization of reinforcement learning algorithms; then, drawing on results from recent years, it surveys reinforcement learning algorithms for the discounted-return and average-return criteria; finally, it lists successful applications of reinforcement learning in areas such as nonlinear control, robot control, artificial intelligence problem solving, and multi-agent systems, and points out future development directions.
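As a small aid to the two performance criteria mentioned above, the sketch below computes the discounted return and the average reward of a finite reward sequence; the discount factor is an arbitrary example value.

```python
# Minimal sketch contrasting the two return criteria surveyed above:
# the discounted return and the average reward of a reward sequence.
def discounted_return(rewards, gamma=0.95):
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def average_reward(rewards):
    return sum(rewards) / len(rewards)

rewards = [1, 0, 0, 2, 1]
print(discounted_return(rewards), average_reward(rewards))
```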

7.
A Survey of Multi-Agent Deep Reinforcement Learning
Multi-agent deep reinforcement learning is an emerging research hotspot and application direction in machine learning. It covers a large number of algorithms, rules, and frameworks, and is widely applied in real-world domains such as autonomous driving, energy allocation, formation control, trajectory planning, route planning, and social dilemmas, which gives it great research value and significance. This survey gives a brief conceptual introduction to the basic theory and development history of multi-agent deep reinforcement learning, and organizes existing work into four categories: independent, communication-rule-based, mutually cooperative, and modeling-and-learning approaches...

8.
Agents in a competitive interaction can greatly benefit from adapting to a particular adversary, rather than using the same general strategy against all opponents. One method of such adaptation is Opponent Modeling, in which a model of an opponent is acquired and utilized as part of the agent's decision procedure in future interactions with this opponent. However, acquiring an accurate model of a complex opponent strategy may be computationally infeasible. In addition, if the learned model is not accurate, then using it to predict the opponent's actions may potentially harm the agent's strategy rather than improve it. We thus define the concept of opponent weakness, and present a method for learning a model of this simpler concept. We analyze examples of past behavior of an opponent in a particular domain, judging its actions using a trusted judge. We then infer a weakness model based on the opponent's actions relative to the domain state, and incorporate this model into our agent's decision procedure. We also make use of a similar self-weakness model, allowing the agent to prefer states in which the opponent is weak and our agent strong, i.e., where we have a relative advantage over the opponent. Experimental results spanning two different test domains demonstrate the agent's improved performance when making use of the weakness models.
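One way to picture how such weakness models could enter the decision procedure is the toy scoring rule below, which prefers successor states with a large gap between modeled opponent weakness and self weakness. The rule and the stand-in models are illustrative assumptions, not the authors' algorithm.

```python
# Illustrative sketch: pick the successor state with the largest relative
# advantage, i.e. where the learned opponent-weakness model is high and the
# self-weakness model is low. The models are stand-ins (simple callables).
def relative_advantage(state, opponent_weakness, self_weakness):
    return opponent_weakness(state) - self_weakness(state)

def choose_state(candidate_states, opponent_weakness, self_weakness):
    return max(candidate_states,
               key=lambda s: relative_advantage(s, opponent_weakness, self_weakness))

# Example with dummy weakness models over integer "states"
print(choose_state(range(10),
                   opponent_weakness=lambda s: (s * 7) % 10 / 10.0,
                   self_weakness=lambda s: s / 10.0))
```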

9.
In real-world environments, many tasks require the cooperation of multiple agents, yet communication between agents is often limited and their observations incomplete. Deep multi-agent reinforcement learning (Deep-MARL) algorithms have shown excellent performance in such challenging scenarios. Among them, QTRAN and QTRAN++ are representative methods that can learn a broad class of joint action-value functions and also come with strong theoretical guarantees. However, their performance suffers because they rely on a single joint action-value estimator and neglect preprocessing of agent observations. This paper proposes a new algorithm called OPTQTRAN that significantly improves on the performance of QTRAN and QTRAN++. First, a dual joint action-value estimator structure is introduced, using a decomposition network module to compute an additional joint action-value. To ensure accurate computation of the joint action-values, an adaptive network module is designed that effectively facilitates value-function learning. In addition, a multi-unit network structure is introduced that groups agent observations into different units to estimate each agent's utility function effectively. Multi-scenario experiments on the widely used StarCraft benchmark show that the proposed method outperforms state-of-the-art multi-agent reinforcement learning methods.

10.
Multi-agent systems need to communicate to coordinate a shared task. We show that a recurrent neural network (RNN) can learn a communication protocol for coordination, even if the actions to coordinate are performed several steps after the communication phase. We show that a separation of tasks with different temporal scales is necessary for successful learning. We contribute a hierarchical deep reinforcement learning model for multi-agent systems that separates the communication and coordination task from action selection through a hierarchical policy. We further show that a separation of concerns in communication is beneficial but not necessary. As a testbed, we propose the Dungeon Lever Game and we extend the Differentiable Inter-Agent Learning (DIAL) framework. We present and compare results from different model variations on the Dungeon Lever Game.

11.
Reinforcement Learning in the Multi-Robot Domain
This paper describes a formulation of reinforcement learning that enables learning in noisy, dynamic environments such as in the complex concurrent multi-robot learning domain. The methodology involves minimizing the learning space through the use of behaviors and conditions, and dealing with the credit assignment problem through shaped reinforcement in the form of heterogeneous reinforcement functions and progress estimators. We experimentally validate the approach on a group of four mobile robots learning a foraging task.
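A loose sketch of shaped reinforcement in this spirit is shown below: several heterogeneous reward terms plus a progress estimator are combined into one scalar signal. The particular reward terms, the progress measure, and the weights are assumptions for illustration, not the paper's functions.

```python
# Illustrative sketch of shaped reinforcement: named event rewards plus a
# progress estimator (change in a task-progress measure) combined into one
# scalar reinforcement signal. All terms and weights are illustrative.
def shaped_reinforcement(event_rewards, progress_delta, weights=(1.0, 0.5)):
    """event_rewards: dict of named reward terms; progress_delta: change in a
    task-progress estimate since the last step (e.g. distance moved toward home)."""
    event_term = sum(event_rewards.values())
    return weights[0] * event_term + weights[1] * progress_delta

r = shaped_reinforcement({"picked_up_puck": 1.0, "collision": -0.2}, progress_delta=0.3)
print(r)
```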

12.
For the cooperative optimal output regulation problem of discrete-time multi-agent systems, a distributed data-driven adaptive control strategy is proposed that does not rely on exact knowledge of the multi-agent system matrices. Based on adaptive dynamic programming and a distributed adaptive internal model, two reinforcement learning algorithms, value iteration and policy iteration, are introduced to learn the optimal controller from online data and achieve cooperative output regulation of the multi-agent system. Considering that the followers can only access an estimate of the leader for online learning, a rigorous theoretical analysis of closed-loop stability and of the convergence of the learning algorithms is given, proving that the learned control gains converge to the optimal control gains. Simulation results verify the effectiveness of the proposed control method.
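The value iteration scheme mentioned above can be illustrated, in its simplest discrete form, by the Bellman backup below; this is only a generic sketch of the iterative update on a tiny Markov decision process, not the paper's data-driven ADP formulation for output regulation.

```python
# Minimal value-iteration sketch on a tiny discrete MDP, illustrating the
# iterative Bellman update behind "value iteration"; the transition model and
# rewards are arbitrary example values.
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
P = np.array([[[0.8, 0.2, 0.0], [0.0, 0.5, 0.5]],    # P[s][a] = next-state distribution
              [[0.1, 0.9, 0.0], [0.0, 0.1, 0.9]],
              [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]])
R = np.array([[0.0, 1.0], [0.5, 2.0], [0.0, 0.0]])    # R[s][a] = expected reward

V = np.zeros(n_states)
for _ in range(100):
    Q = R + gamma * (P @ V)                            # Q[s][a] backup
    V = Q.max(axis=1)
print(V, Q.argmax(axis=1))
```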

13.
In this paper, we investigate the use of hierarchical reinforcement learning (HRL) to speed up the acquisition of cooperative multi-agent tasks. We introduce a hierarchical multi-agent reinforcement learning (RL) framework, and propose a hierarchical multi-agent RL algorithm called Cooperative HRL. In this framework, agents are cooperative and homogeneous (use the same task decomposition). Learning is decentralized, with each agent learning three interrelated skills: how to perform each individual subtask, the order in which to carry them out, and how to coordinate with other agents. We define cooperative subtasks to be those subtasks in which coordination among agents significantly improves the performance of the overall task. Those levels of the hierarchy which include cooperative subtasks are called cooperation levels. A fundamental property of the proposed approach is that it allows agents to learn coordination faster by sharing information at the level of cooperative subtasks, rather than attempting to learn coordination at the level of primitive actions. We study the empirical performance of the Cooperative HRL algorithm using two testbeds: a simulated two-robot trash collection task, and a larger four-agent automated guided vehicle (AGV) scheduling problem. We compare the performance and speed of Cooperative HRL with other learning algorithms, as well as several well-known industrial AGV heuristics. We also address the issue of rational communication behavior among autonomous agents in this paper. The goal is for agents to learn both action and communication policies that together optimize the task given a communication cost. We extend the multi-agent HRL framework to include communication decisions and propose a cooperative multi-agent HRL algorithm called COM-Cooperative HRL. In this algorithm, we add a communication level to the hierarchical decomposition of the problem below each cooperation level. Before an agent makes a decision at a cooperative subtask, it decides if it is worthwhile to perform a communication action. A communication action has a certain cost and provides the agent with the actions selected by the other agents at a cooperation level. We demonstrate the efficiency of the COM-Cooperative HRL algorithm as well as the relation between the communication cost and the learned communication policy using a multi-agent taxi problem.
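A rough way to picture the communicate-or-not decision described above is a bandit-style value update at the communication level, where the communication cost is charged whenever the agent chooses to communicate. The update rule and names are illustrative assumptions, not the COM-Cooperative HRL algorithm itself.

```python
# Illustrative sketch: at the communication level the agent learns values for
# the two choices {communicate, silent}; communicating subtracts a fixed cost
# from the reward. All names and numbers are illustrative assumptions.
comm_q = {"communicate": 0.0, "silent": 0.0}
alpha, comm_cost = 0.1, 0.3

def update(choice, task_reward):
    reward = task_reward - (comm_cost if choice == "communicate" else 0.0)
    comm_q[choice] += alpha * (reward - comm_q[choice])

update("communicate", 1.0)
update("silent", 0.5)
print(max(comm_q, key=comm_q.get))   # the currently preferred communication choice
```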

14.
In multi-agent reinforcement learning (MARL), the behaviors of each agent can influence the learning of others, and the agents have to search in an exponentially enlarged joint-action space. Hence, it is challenging for multi-agent teams to explore the environment; agents may settle on suboptimal policies and fail to solve some complex tasks. To improve exploration efficiency as well as the performance on MARL tasks, in this paper we propose a new approach that transfers knowledge across tasks. Unlike traditional MARL algorithms, we first assume that the reward functions can be computed as linear combinations of a shared feature function and a set of task-specific weights. Then, we define a set of basic MARL tasks in the source domain and pre-train them as basic knowledge for further use. Finally, once the weights for target tasks are available, it becomes easier to obtain a well-performing policy for exploring the target domain. Hence, the learning process for target tasks is sped up by making full use of the previously learned basic knowledge. We evaluate the proposed algorithm on two challenging MARL tasks: cooperative box-pushing and non-monotonic predator-prey. The experimental results demonstrate improved performance compared with state-of-the-art MARL algorithms.
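The linear-reward assumption above is easy to state concretely: with a shared feature function phi and task-specific weights w, the reward is r(s, a) = phi(s, a) . w, so transferring to a new task only requires a new weight vector while the learned feature representation is reused. The toy feature function and weights below are stand-ins, not the paper's.

```python
# Illustrative sketch of the linear-reward assumption: rewards are linear in a
# shared feature function, so a new task only needs a new weight vector.
import numpy as np

def phi(state, action):
    """Shared feature function (toy stand-in)."""
    return np.array([state, action, state * action], dtype=float)

w_source = np.array([0.0, 1.0, 0.5])   # weights defining a source task's reward
w_target = np.array([1.0, 0.0, 0.2])   # weights defining a target task's reward

s, a = 2.0, 1.0
print(phi(s, a) @ w_source, phi(s, a) @ w_target)
```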

15.
To address the exponential explosion of the joint action space as the number of agents grows in multi-agent systems, the "centralized training, decentralized execution" framework is adopted to avoid the curse of dimensionality of the joint action space and to reduce the optimization cost of the algorithm. For the common multi-agent reinforcement learning setting in which the environment only provides a global reward for the joint behavior of all agents, a new global credit assignment mechanism, the Reward Highway Network (RHWNet), is proposed. Building on the original...
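As a loose illustration of a highway-style connection in a reward network, the sketch below gates between an agent's assigned local reward and the team's global reward with a sigmoid gate; the parameterization is an assumption for illustration, not RHWNet itself.

```python
# Illustrative highway-style (gated skip) mixing of an agent's local reward
# with the team's global reward; the gate parameterization is an assumption.
import math

def highway_reward(local_r, global_r, gate_logit):
    g = 1.0 / (1.0 + math.exp(-gate_logit))   # sigmoid gate in [0, 1]
    return g * local_r + (1.0 - g) * global_r

print(highway_reward(local_r=0.2, global_r=1.0, gate_logit=0.5))
```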

16.
Reinforcement learning is a research hotspot in machine learning: it studies the interaction between an agent and its environment, in which the agent makes sequential decisions, optimizes its policy, and maximizes cumulative return. Reinforcement learning has great research value and application potential and is a key step toward general artificial intelligence. This paper surveys research progress and trends in reinforcement learning algorithms and applications. It first introduces the basic principles of reinforcement learning, including Markov decision processes, value functions, and the exploration-exploitation problem. Next, it reviews reinforcement learning...
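The exploration-exploitation problem mentioned above is often handled with epsilon-greedy action selection; a minimal sketch:

```python
# Minimal sketch of the exploration-exploitation trade-off: with probability
# epsilon pick a random action (explore), otherwise pick the action with the
# highest estimated value (exploit).
import random

def epsilon_greedy(q_values, epsilon=0.1):
    if random.random() < epsilon:                                   # explore
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])     # exploit

print(epsilon_greedy([0.2, 0.8, 0.5]))
```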

17.
To address the problems of monotonous communication content and sparse information for multi-agent systems in cooperative environments, this paper proposes a thinking-style communication network based on multi-agent deep reinforcement learning (TMACN). First, during interaction each agent accounts for the differences among information sources: it fuses the received communication messages with its own historical experience to form reasoning information, and sends this information as its new message, thereby increasing the diversity of communication content. Then, building on a soft attention mechanism, the model designs a semi-multi-round communication strategy that raises information saturation and thus improves the communication efficiency of the system. Experiments in three simulated environments (cooperative navigation, predator-prey, and traffic junction) show that TMACN improves accuracy and stability compared with other methods.
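A rough sketch of soft-attention message fusion of the kind described is given below; the dimensions, the query construction, and the fusion rule are illustrative assumptions, not TMACN's architecture.

```python
# Illustrative soft-attention aggregation: weigh incoming messages by their
# similarity to the agent's own hidden state, then fuse the weighted sum with
# that hidden state to form the new outgoing "reasoning" message.
import numpy as np

def soft_attention_fuse(own_hidden, messages):
    """own_hidden: (d,) vector; messages: (n, d) array of received messages."""
    scores = messages @ own_hidden / np.sqrt(own_hidden.shape[0])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    aggregated = weights @ messages
    return 0.5 * own_hidden + 0.5 * aggregated   # fused message (illustrative mix)

rng = np.random.default_rng(0)
print(soft_attention_fuse(rng.normal(size=4), rng.normal(size=(3, 4))))
```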

18.
The State of Research on Multi-Agent Systems
This paper surveys the organization and structure of multi-agent systems (MAS), communication mechanisms in MAS, cooperation and conflict resolution, control and management, agent-oriented programming languages, and applications of MAS.

19.
Adaptive business agents operate in electronic marketplaces, learning from past experiences to make effective decisions on behalf of their users. How best to design these agents is an open question. In this article, we present an approach for the design of adaptive business agents that uses a combination of reinforcement learning and reputation modeling. In particular, we take into account the fact that multiple selling agents may offer the same good with different qualities, and that selling agents may alter the quality of their goods. We also consider the possibility of dishonest agents in the marketplace. Our buying agents exploit the reputation of selling agents to avoid interaction with disreputable ones, and therefore to reduce the risk of purchasing low-value goods. We then experimentally compare the performance of our agents with those designed using a recursive modeling approach. We show that agents designed according to our algorithms achieve better performance in terms of satisfaction and computational time, and as such are well suited for the design of electronic marketplaces.
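The sketch below gives one way a buying agent might combine a reputation filter with value-based selection: sellers below a reputation threshold are excluded, and the remaining sellers are chosen epsilon-greedily by learned value. The smoothing rule, threshold, and seller names are illustrative assumptions, not the paper's algorithm.

```python
# Illustrative sketch: exponentially smoothed reputation per seller, a
# reputation threshold to exclude disreputable sellers, and epsilon-greedy
# selection over learned values for the rest. All names/values are examples.
import random

reputation = {"seller_a": 0.8, "seller_b": 0.3, "seller_c": 0.6}
value = {"seller_a": 1.2, "seller_b": 2.0, "seller_c": 0.9}

def update_reputation(seller, satisfied, rate=0.2):
    reputation[seller] = (1 - rate) * reputation[seller] + rate * (1.0 if satisfied else 0.0)

def choose_seller(threshold=0.5, epsilon=0.1):
    candidates = [s for s in reputation if reputation[s] >= threshold]
    if random.random() < epsilon:
        return random.choice(candidates)
    return max(candidates, key=lambda s: value[s])

print(choose_seller())
update_reputation("seller_b", satisfied=False)
```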

20.
In recent years, multi-agent deep reinforcement learning has shown great potential for solving problems of agent cooperation, competition, and communication. However, as it is applied in more and more domains, scalability has drawn increasing attention and is a key issue on the path from theoretical research to large-scale engineering applications. This paper reviews reinforcement learning theory and typical deep reinforcement learning algorithms, introduces the three learning paradigms of multi-agent deep reinforcement learning and their representative algorithms, and briefly summarizes the current mainstream open-source experimental platforms. It then discusses in detail the progress of scalability research with respect to both the number of agents and the range of scenarios, analyzes the core problems faced in each case, and presents existing solution ideas. Finally, it looks ahead to the application prospects and development trends of multi-agent deep reinforcement learning, providing reference and inspiration for further research in this field.
