Similar literature
1.
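This paper studies the consensus problem of a second-order multi-agent system with a virtual leader under a limited interaction range. All agents in the system are assumed to be able to receive the leader's information, whereas agents can communicate with each other only when the distance between them is within a certain range. Under a linear consensus protocol based on relative state feedback, a sufficient condition for second-order consensus of the multi-agent system is proved using the Lyapunov method. The theoretical results are then verified by a simulation example, and the algorithm is summarized.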
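The setting in abstract 1 can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the protocol from the paper: the agents are one-dimensional double integrators, the leader moves at constant velocity, the control law and the names `simulate`, `radius`, and `gain` are hypothetical, and explicit Euler integration is used.

```python
def simulate(positions, velocities, leader_pos, leader_vel,
             radius=2.0, gain=2.0, dt=0.01, steps=3000):
    """Euler simulation of leader-following second-order consensus (1-D).

    Assumed control law (illustrative, not taken from the paper):
        u_i = -gain * ((x_i - x0) + (v_i - v0))
              - sum over neighbours j of ((x_i - x_j) + (v_i - v_j))
    where j is a neighbour of i only if |x_i - x_j| < radius.
    """
    x = list(positions)
    v = list(velocities)
    x0, v0 = leader_pos, leader_vel
    n = len(x)
    for _ in range(steps):
        controls = []
        for i in range(n):
            # Leader term: every agent receives the leader's state.
            u = -gain * ((x[i] - x0) + (v[i] - v0))
            # Neighbour term: only agents within `radius` communicate.
            for j in range(n):
                if j != i and abs(x[i] - x[j]) < radius:
                    u -= (x[i] - x[j]) + (v[i] - v[j])
            controls.append(u)
        # Double-integrator update using the controls computed above.
        for i in range(n):
            x[i] += dt * v[i]
            v[i] += dt * controls[i]
        x0 += dt * v0  # the leader moves at constant velocity
    return x, v, x0, v0
```

Calling `simulate([0.0, 1.0, 5.0, -4.0], [0.0, 0.5, -0.5, 1.0], 0.0, 1.0)` returns the final agent states together with the leader state, so the tracking errors can be inspected directly. The neighbour set is recomputed at every step, which is what makes the communication topology state-dependent.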

2.
张明悦  金芝  刘坤 《软件学报》2024,35(2):739-757
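A mixed cooperative-competitive multi-agent system consists of controlled target agents and uncontrolled external agents. The target agents cooperate with one another and compete with the external agents, coping with dynamic changes in the environment and in the external agents, in order to complete a specified task. Existing work on training target agents to obtain optimal task strategies takes two directions: (1) focusing only on cooperation among target agents, treating the external agents as part of the environment, and training the target agents with multi-agent reinforcement learning. This approach struggles when the external agents' strategies are unknown or change dynamically; (2) focusing only on competition between target agents and external agents, modeling the competition as a two-player game, and training the target agents by self-play. This approach mainly addresses the case of a single target agent and a single external agent and is hard to extend to systems with multiple target agents and multiple external agents. Combining these two lines of research, a self-play method based on counterfactual regret advantage is proposed. Specifically, a counterfactual regret advantage policy gradient method is first designed on the basis of counterfactual regret minimization and counterfactual multi-agent policy gradients, so that target agents can update their policies more accurately. Then, imitation learning is introduced: the historical decision trajectories of the external agents are used as demonstration data to imitate their policies and explicitly model their behavior, in order to cope with dynamic changes in the external agents' strategies during self-play. Finally, based on the counterfactual regret advantage policy gradient and the modeling of external agents' behavior, ...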

3.
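Game intelligence is an interdisciplinary field spanning game theory, artificial intelligence, and related areas. It focuses on the interactions among individuals or organizations, and on how quantitative modeling of game relationships enables the exact computation of optimal strategies, ultimately yielding intelligent decision making and decision knowledge bases. In recent years, with the explosion of behavioral data and the diversification of game forms, game intelligence has attracted growing research interest and has been widely applied in practice. This paper presents a systematic survey, analysis, and summary of the field from three aspects. First, it reviews the background of game intelligence, covering single-agent Markov decision processes, game-theoretic multi-agent modeling techniques, and multi-agent solution approaches such as reinforcement learning and game learning. Second, according to the game relationships among agents, it divides games into three paradigms: cooperative games, adversarial games, and mixed games, and introduces the main research problems, mainstream methods, and typical current applications under each paradigm. Finally, it summarizes the state of research on game intelligence, along with the main open problems and research challenges, and discusses future application prospects in academia and industry, providing a reference for researchers and further advancing the national artificial intelligence development strategy.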

4.
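To exploit the strengths of logic Petri nets in modeling batch processing and the uncertainty of value passing, and to incorporate the game elements of multi-agent game processes so as to model multi-agent decision problems and solve multi-agent dynamic game decision optimization problems, this paper proposes the logic game decision Petri net. First, each token is given the attribute of a rational player, and a utility function value and a state probability transition function are defined for it. Second, decision transitions are introduced, and token utility function values are compared to determine the optimal decision...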

5.
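This paper considers the average-consensus filtering problem for second-order multi-agent systems with constant inputs and proposes a proportional-integral consensus filtering algorithm. Assuming constant inputs and a fixed, symmetric, connected topology, the Routh criterion and the Nyquist criterion are used to obtain conditions under which the second-order multi-agent system converges asymptotically to consensus without delay and under identical communication delays, respectively; the final consensus state of the system is the average of the constant inputs. Finally, a numerical simulation of a multi-agent system of 5 agents under a connected topology verifies the correctness of the theoretical results.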

6.
郝禹哲  王振雷 《计算机应用研究》2023,40(6):1692-1696+1701
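In cooperative Markov games, each agent must not only achieve the common goal but also ensure that the joint action satisfies the prescribed constraints. To this end, a cooperative multi-agent TD3 algorithm under safety constraints, MACTD3 (multi-agent constrained twin delayed deep deterministic policy gradient), is proposed. First, an attention mechanism is used to coordinate the actions taken by the individual agents with the constraints of the decision process. Then a modified cost function is constructed using Lagrange multipliers. Furthermore, to guarantee convergence of the algorithm and to ensure that every agent satisfies the preset constraints, a learning strategy with different time scales is designed: gradient descent on the Actor-Critic networks is performed on the short time scale, and the Lagrange parameters are iterated on the long time scale. Finally, experiments are carried out in heterogeneous and homogeneous cooperative multi-agent environments. The results show that, compared with other algorithms, the proposed MACTD3 algorithm consistently achieves the lowest penalty cost; scalability experiments on the number of agents show that MACTD3 still satisfies the constraints with different numbers of agents, demonstrating the effectiveness and scalability of the algorithm.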
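The two-time-scale idea in abstract 6 (fast updates of the policy parameters, slow updates of the Lagrange multiplier) can be illustrated on a toy problem. This is a hedged sketch, not MACTD3 itself: the scalar problem "minimize (theta - 3)^2 subject to theta - 1 <= 0", the function name `two_timescale_primal_dual`, and both step sizes are assumptions chosen only for illustration.

```python
def two_timescale_primal_dual(steps=5000, fast_lr=0.1, slow_lr=0.01):
    """Primal-dual iteration on a toy constrained problem (illustrative only).

    Problem (assumed):  minimize (theta - 3)^2  subject to  theta - 1 <= 0.
    Lagrangian:         L(theta, lam) = (theta - 3)^2 + lam * (theta - 1).
    """
    theta, lam = 0.0, 0.0
    for _ in range(steps):
        # Fast time scale: gradient descent on the Lagrangian in theta.
        grad_theta = 2.0 * (theta - 3.0) + lam
        theta -= fast_lr * grad_theta
        # Slow time scale: projected gradient ascent on the multiplier,
        # which grows while the constraint theta - 1 <= 0 is violated.
        lam = max(0.0, lam + slow_lr * (theta - 1.0))
    return theta, lam


theta_final, lam_final = two_timescale_primal_dual()
```

For this toy problem the constrained optimum is theta = 1 with multiplier 4, so the returned pair can be compared against those values. In the setting of the abstract, the fast step would be replaced by the Actor-Critic gradient updates and the constraint by the safety cost.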

7.
崔艳  李庆华 《计算机工程》2020,46(4):273-278,286
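At present, no method for determining the adaptive parameters of second-order multi-agent systems has been clearly given, and such systems converge slowly. To predict the next-step state of an aircraft multi-agent system in practical applications and to speed up convergence, a parameter-adaptive consensus algorithm is proposed. The current differences in position and velocity between agents are used as the feedback parameters of the consensus protocol, and the finite-time consensus problem of second-order multi-agent systems is studied under fixed and switching topologies. A Lyapunov function is constructed and, using LaSalle's invariance principle together with homogeneity theory, conditions under which the system reaches stability in finite time are obtained, achieving adaptive regulation of the input states of different aircraft. Simulation results show that the algorithm guarantees that the multi-agent system achieves consensus tracking in finite time with a fast convergence rate.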

8.
孙小童  郭戈  张鹏飞 《自动化学报》2021,47(6):1368-1376
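This paper studies the fixed-time consensus tracking problem for second-order multi-agent systems with mismatched disturbances over directed network topologies. A fixed-time disturbance observer is used to estimate the system's matched disturbances. Next, a sinusoidal compensation function is introduced to design a nonsingular distributed protocol, which avoids singularity while overcoming the mismatched disturbances, so that the multi-agent system achieves fixed-time consensus tracking. Finally, simulations verify the effectiveness of the algorithm.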

9.
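Research on formation control of multi-agent systems based on the behavior method   Total citations: 2, self-citations: 0, citations by others: 2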
宋运忠  杨飞飞 《控制工程》2012,19(4):687-690
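To achieve formation control of multi-agent systems, a control algorithm based on agent behaviors is adopted for second-order multi-agent systems. The algorithm takes into account both the move-to-goal behavior and the formation-keeping behavior of the agents and can effectively achieve formation control relative to the desired target. Because the algorithm gives the multi-agent system explicit formation feedback, it is well suited to distributed and real-time control. The agent dynamics use the unicycle model widely adopted in multi-agent research; through feedback linearization, this nonlinear model is converted into a practical double-integrator model. Matlab simulations verify the effectiveness of the algorithm, and the results show that the controller parameters are simple to tune and that the controller has good stability and robustness.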

10.
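To address time-varying communication delays and unknown disturbances in uncertain nonlinear second-order multi-agent systems, a robust adaptive flocking control law is proposed. To give the second-order multi-agent system better disturbance rejection, a robust adaptive operator based on the agents' position and velocity state information is designed, achieving distributed control of the system under time-varying communication delays. By constructing an energy function with the Lyapunov-Krasovskii method, it is proved that the network of the multi-agent system remains connected and that the agents' velocities converge to the velocity of the virtual leader, and convergence conditions for multi-agent systems with time-varying communication delays are given. Simulation results show that, under different disturbance intensities and different communication delays, the system converges quickly and forms a stable topology, demonstrating that the proposed method is correct and effective.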

11.
《Information & Management》1999,36(4):221-232
This paper discusses the multimedia processing environment, the applicability of the analytic hierarchy process (AHP) in problem solving, and how AHP can be applied to the selection of multimedia authoring systems (MAS) in a group decision environment. A MAS selection model is proposed to facilitate the group's decision making in the selection of MAS. Six technically competent and experienced software engineers participated in our study. They were trained to use AHP and then applied the technique to evaluate three MAS products for an adoption decision. The results indicated that AHP gives every participant the chance to fully understand, discuss, and objectively evaluate all MAS products before identifying and selecting the most efficient MAS.
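The core AHP computation mentioned in abstract 11 derives priority weights from a pairwise comparison matrix and checks its consistency. The sketch below is illustrative only: the function name `ahp_priorities` and the 3 x 3 example matrix are hypothetical and not taken from the study, power iteration is one of several ways to approximate the principal eigenvector, and the random index values are the commonly tabulated ones attributed to Saaty.

```python
# Commonly tabulated random consistency index values by matrix size.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_priorities(matrix, iterations=100):
    """Return (weights, consistency_ratio) for a pairwise comparison matrix."""
    n = len(matrix)
    weights = [1.0 / n] * n
    # Power iteration toward the principal eigenvector, normalised to sum 1.
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n))
                   for i in range(n)]
        total = sum(product)
        weights = [value / total for value in product]
    # Estimate the principal eigenvalue from the ratio (A w)_i / w_i.
    product = [sum(matrix[i][j] * weights[j] for j in range(n))
               for i in range(n)]
    lambda_max = sum(product[i] / weights[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    # Sizes without a tabulated random index are reported as ratio 0.0.
    ratio = consistency_index / RANDOM_INDEX[n] if n in RANDOM_INDEX else 0.0
    return weights, ratio


# Hypothetical comparison of three products on a single criterion.
example = [[1.0, 2.0, 4.0],
           [0.5, 1.0, 2.0],
           [0.25, 0.5, 1.0]]
example_weights, example_ratio = ahp_priorities(example)
```

A consistency ratio below roughly 0.1 is the usual rule of thumb for accepting a set of judgments; in a group setting, individual matrices are typically aggregated before or after this step.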

12.
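Performance evaluation generally considers only cooperation and competition between DMUs and the evaluator, neglecting the non-cooperative competitive game among DMUs. To address this, the concept of a cross-competition game is introduced, and evaluation problems are divided into two classes: cooperation, competition, and gaming between the evaluator and the DMUs, and cross-competition games among the DMUs. Because the indicator values of a DMU in a cross-competition game setting are no longer fixed but are adjusted dynamically, cross-competition game rules are designed, and the decision tree method is used to describe the DEA evaluation and selection process under cross-competition games. The way in which utility values change during evaluation is shifted from "weight-based exchange" to "indicator value adjustment based on cross-competition games", thereby improving the DEA model; a cross-competition game efficiency DEA evaluation method is designed, yielding a classification of DEA methods into deterministic, risk, and uncertain types together with the cross-competition game efficiency evaluation process. DEA is interpreted from the perspectives of game theory in economics and decision analysis in management, giving a more intuitive DMU evaluation process and an evaluation setting closer to objective reality. Finally, a numerical example verifies the feasibility, effectiveness, and order-preserving property of the proposed method.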

13.
Many algorithms have been developed for the multiple objective decision making (MODM) problem. Unfortunately, there is a lack of empirical testing of these algorithms. From a managerial point of view, it is desirable to conduct a multi-model evaluation so that decision-makers can determine which method would be best in a given situation. This study compares the applicability of two computerized interactive MODM procedures: the Steuer Method and the Franz Method. Experimentation is designed to determine how effectively the two procedures arrive at a solution based upon interaction with a simulated decision-maker. Non-parametric statistical analysis is performed on the results of randomly generated problems. Both linear and nonlinear utility functions are examined. The experimentation also determines the efficiency by which the procedures derive solutions as measured by the number of iterations necessary for convergence. The results of the study recommend the use of the Steuer procedure when (1) the accuracy of the solution is the critical factor in selecting an interactive MODM algorithm, and/or (2) the objectives under consideration are of relatively equal importance to the decision maker. It is also recommended that the Franz procedure be used when (1) the number of iterations necessary for convergence is the critical factor in the algorithm selection, and/or (2) there is one main objective that the decision maker wishes to accomplish and the remaining objectives are of diminishing importance.

14.
In this paper, we propose a bilevel direct search method for the distributed computation of equilibria in leader–follower problems. This type of direct search method is designed for characterizing the decision making process where the players' objective functions are not analytically available. We investigate the convergence of the accumulation points yielded by the method to the stationary points of the problems. Finally, we apply the method to a health insurance problem and carry out several numerical examples to illustrate how the method performs when solving leader–follower problems.

15.
In this paper, we investigate reinforcement learning (RL) in multi-agent systems (MAS) from an evolutionary dynamical perspective. Typical for a MAS is that the environment is not stationary and the Markov property is not valid. This requires agents to be adaptive. RL is a natural approach to model the learning of individual agents. These learning algorithms are, however, known to be sensitive to the correct choice of parameter settings for single-agent systems. This issue is more prevalent in the MAS case due to the changing interactions amongst the agents. It is largely an open question for a developer of MAS how to design the individual agents such that, through learning, the agents as a collective arrive at good solutions. We will show that modeling RL in MAS, by taking an evolutionary game theoretic point of view, is a new and potentially successful way to guide learning agents to the most suitable solution for their task at hand. We show how evolutionary dynamics (ED) from Evolutionary Game Theory can help the developer of a MAS make good choices of parameter settings for the RL algorithms used. The ED essentially predict the equilibrium outcomes of the MAS where the agents use individual RL algorithms. More specifically, we show how the ED predict the learning trajectories of Q-Learners for iterated games. Moreover, we apply our results to (an extension of) the COllective INtelligence framework (COIN). COIN is a proven engineering approach for learning of cooperative tasks in MASs. The utilities of the agents are re-engineered to contribute to the global utility. We show how the improved results for MAS RL in COIN, and a developed extension, are predicted by the ED. Author funded by a doctoral grant of the institute for advancement of scientific technological research in Flanders (IWT).
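As a rough illustration of the evolutionary dynamics referred to in abstract 15, the sketch below integrates the standard single-population replicator equation for a symmetric matrix game. It is a simplification under stated assumptions: the paper relates Q-learning to an extended form of these dynamics, which is not reproduced here, and the function names, the Prisoner's Dilemma payoff matrix, and the Euler step size are all hypothetical choices.

```python
def replicator_step(x, payoff, dt):
    """One Euler step of dx_i/dt = x_i * ((A x)_i - x^T A x)."""
    n = len(x)
    fitness = [sum(payoff[i][j] * x[j] for j in range(n)) for i in range(n)]
    average = sum(x[i] * fitness[i] for i in range(n))
    updated = [x[i] + dt * x[i] * (fitness[i] - average) for i in range(n)]
    # Renormalise to keep the state on the probability simplex despite
    # floating-point drift.
    total = sum(updated)
    return [value / total for value in updated]


def replicator_trajectory(x0, payoff, dt=0.01, steps=5000):
    """Return the list of population states visited from x0."""
    x = list(x0)
    trajectory = [x]
    for _ in range(steps):
        x = replicator_step(x, payoff, dt)
        trajectory.append(x)
    return trajectory


# Hypothetical Prisoner's Dilemma payoffs; rows and columns are
# (cooperate, defect) for the row player.
PRISONERS_DILEMMA = [[3.0, 0.0],
                     [5.0, 1.0]]
path = replicator_trajectory([0.9, 0.1], PRISONERS_DILEMMA)
```

Plotting the first component of each state in `path` against time shows the share of cooperators shrinking, which is the kind of trajectory that can be compared with the empirical behavior of learning agents.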

16.
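Because wireless sensor networks carry diverse services and operate in complex environments, topology control methods designed on single-layer information face challenges. To address this problem, game theory and the concept of supermodular games are introduced, and cross-layer information such as node degree, network connectivity, and the level of MAC-layer interference is incorporated into the design of the utility function to build a new topology game model. The model is proved to be a supermodular game with a pure-strategy Nash equilibrium, and a cross-layer optimized energy-balanced topology game algorithm for WSNs (COETG) is then proposed. Simulation experiments and comparative analysis show that, while guaranteeing network connectivity and robustness, the COETG algorithm reduces node transmit power, achieves good energy balance and energy efficiency, effectively extends network lifetime, and improves network performance.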

17.
Possible techniques for representing automatic decision-making behavior approximating human experts in complex simulation model experiments are of interest. Here, fuzzy logic (FL) and constraint satisfaction problem (CSP) methods are applied in a hybrid design of automatic decision making in simulation game models. The decision processes of a military headquarters are used as a model for the FL/CSP decision agents' choice of variables and rulebases. The hybrid decision agent design is applied in two different types of simulation games to test the general applicability of the design. The first application is a two-sided zero-sum sequential resource allocation game with imperfect information interpreted as an air campaign game. The second example is a network flow stochastic board game designed to capture important aspects of land manoeuvre operations. The proposed design is shown to perform well also in this complex game with a very large (billion-size) action set. Training of the automatic FL/CSP decision agents against selected performance measures is also shown, and results are presented together with directions for future research.

18.
The distributed operation of dynamic systems, such as traffic networks and the power grid, can be viewed as a dynamic game among their control agents. As the agents respond to one another’s decisions by resolving their problems, they trace a trajectory in decision space that, if convergent, arrives at a fixed point. Thus, two issues of concern are the convergence to attractors and their location relative to Pareto optimal solutions. This paper addresses these issues in games where each agent continually solves a problem from a family of unconstrained, but general optimization problems. Specifically, it delivers simple yet effective problem transformations to influence the convergence to and location of attractors; these transformations are referred to as altruistic factors and the agents that implement them are called altruistic agents. This paper proposes algorithms to draw attractors towards Pareto optimal solutions: for the case of quadratic functions, a thorough analysis of the rate of convergence is provided; for the case of general functions, a trust-region-based algorithm is proposed. An application of this game-theoretic framework is improvement of the quality of the solutions attained by distributed model predictive control, particularly in scenarios whose objective functions are quadratic and whose dynamics are linear. The text was submitted by the authors in English.

19.
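BDI-based soccer agents and the TH-soccer platform   Total citations: 9, self-citations: 0, citations by others: 9
Robot soccer is a typical MAS problem. The TH-soccer match platform was built; the soccer agent model and match algorithms were implemented using BDI, and a teamwork method was implemented on the model, which resolves conflicts between individual and team behavior selection. Methods for negotiation, cooperation, and confrontation, such as individual confrontation, local tactics, and global tactics, are also given. The work improves on the research of Milind Tambe et al.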

20.
We study the convergence times of dynamics in games involving graphical relationships of players. Our model of interaction games generalizes a variety of recently studied games in game theory and distributed computing. In a local interaction game, each agent is a node embedded in a graph and plays the same 2-player game with each neighbor. He can choose his strategy only once and must apply his choice in each 2-player game he is involved in. This represents a fundamental model of decision making with local interaction and distributed control. Furthermore, we introduce a generalization called 2-type interaction games, in which one 2-player game is played on edges and possibly another game is played on non-edges. For the popular case with symmetric 2 × 2 games, we show that several dynamics converge to a pure Nash equilibrium in polynomial time. This includes arbitrary sequential better-response dynamics, as well as concurrent dynamics resulting from a distributed protocol that does not rely on global knowledge. We supplement these results with an experimental comparison of sequential and concurrent dynamics.
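Sequential better-response dynamics in a local interaction game, as described in abstract 20, can be sketched in a few lines. The example is a minimal illustration under assumptions: the graph, the symmetric 2 x 2 coordination payoffs, the fixed node update order, and all function names are hypothetical, and no claim is made about the convergence-time bounds proved in the paper.

```python
def payoff(node, strategy, profile, graph, game):
    """Total payoff of `node` playing `strategy` against all its neighbours."""
    return sum(game[strategy][profile[neighbour]] for neighbour in graph[node])


def better_response_dynamics(graph, game, profile, max_rounds=1000):
    """Let nodes switch strategy one at a time while a switch strictly helps."""
    profile = dict(profile)
    for _ in range(max_rounds):
        changed = False
        for node in graph:
            current = payoff(node, profile[node], profile, graph, game)
            alternative = 1 - profile[node]  # strategies are 0 and 1
            if payoff(node, alternative, profile, graph, game) > current:
                profile[node] = alternative
                changed = True
        if not changed:
            return profile  # no node can improve: pure Nash equilibrium
    raise RuntimeError("no equilibrium reached within max_rounds")


def is_pure_nash(graph, game, profile):
    """Check that no single node gains by deviating unilaterally."""
    return all(
        payoff(n, profile[n], profile, graph, game)
        >= payoff(n, 1 - profile[n], profile, graph, game)
        for n in graph
    )


# Hypothetical instance: a 5-cycle with a symmetric coordination game.
CYCLE = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
COORDINATION = [[2, 0],
                [0, 1]]
start = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0}
equilibrium = better_response_dynamics(CYCLE, COORDINATION, start)
```

Each node applies one strategy to all of its pairwise games, matching the model in the abstract. A concurrent variant would let several nodes revise at once, which is where a coordinating protocol becomes necessary.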
