Similar Literature
 19 similar documents found.
1.
In adversarial game scenarios, multi-agent reinforcement learning suffers from the non-stationarity problem: an agent's policy depends not only on the environment but also on the opponents (other agents) acting in it. Predicting the opponent's policy and intentions from its interactions with the environment, and adjusting the agent's own policy accordingly, is an effective way to mitigate this problem. This paper proposes an adversarial game algorithm based on opponent action prediction, which models the opponent in the environment implicitly. The algorithm learns the opponent's policy features through supervised learning and fuses them into the agent's reinforcement learning model, reducing the opponent's impact on learning stability. Simulation experiments in a 1v1 soccer environment show that the proposed algorithm effectively predicts the opponent's actions, accelerates learning convergence, and improves the agent's competitive performance.
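As an illustration only (the paper does not publish code), the sketch below shows the general fusion pattern the abstract describes: an opponent-action classifier trained by supervised learning whose output distribution is concatenated to the agent's observation before the reinforcement-learning policy head. All names, layer sizes, and the concatenation scheme are assumptions.

```python
# Illustrative sketch only; class names, layer sizes, and fusion scheme are assumptions.
import torch
import torch.nn as nn

class OpponentModel(nn.Module):
    """Predicts the opponent's next action from the current observation (supervised)."""
    def __init__(self, obs_dim, n_opp_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, n_opp_actions),
        )

    def forward(self, obs):
        return self.net(obs)  # logits over opponent actions

class FusionPolicy(nn.Module):
    """RL policy that conditions on the predicted opponent-action distribution."""
    def __init__(self, obs_dim, n_opp_actions, n_actions):
        super().__init__()
        self.opp_model = OpponentModel(obs_dim, n_opp_actions)
        self.pi = nn.Sequential(
            nn.Linear(obs_dim + n_opp_actions, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs):
        opp_probs = torch.softmax(self.opp_model(obs), dim=-1)
        return self.pi(torch.cat([obs, opp_probs], dim=-1))  # agent action logits
```

In such a scheme the opponent model is trained with a cross-entropy loss on observed opponent actions, while the policy head is trained by whatever RL algorithm the agent uses.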

2.
For modeling and simulating large complex systems, multi-agent systems provide support for parallel logic, while the High Level Architecture (HLA) provides a common technical framework. Combining the two can markedly improve modeling and simulation, but the HLA/RTI specification constrains the flexibility and openness of multi-agent systems and cannot fully exploit agents' strengths in high-level communication such as reasoning, interaction, negotiation, and cooperation. This paper studies a multi-agent simulation environment built with the JADE platform under the High Level Architecture, proposes an overall architecture and concrete methods for the integrated system, and builds a prototype. Experiments verify the feasibility of the integration scheme and lay the groundwork for further research.

3.
Multi-agent reinforcement learning and its application to role assignment in robot soccer
A robot soccer system is a typical multi-agent system: each robot player's action selection depends not only on its own state but also on the other players, so learning decision policies for robot soccer through reinforcement learning requires joint states and joint actions. This paper studies a multi-agent reinforcement learning algorithm based on agent action prediction, using a naive Bayes classifier to predict the actions of other agents. A policy-sharing mechanism is introduced to exchange the policies learned by the agents and thereby speed up multi-agent reinforcement learning. Finally, the application of the proposed method to dynamic role assignment in robot soccer is studied, realizing division of labor and cooperation among multiple robots.
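The prediction step can be pictured with a minimal naive Bayes predictor over discretized state features, as sketched below; this is an illustrative reconstruction, not the authors' implementation, and the Laplace smoothing constant is an assumption.

```python
# Illustrative naive Bayes action predictor; smoothing constant is an assumption.
from collections import defaultdict
import math

class NaiveBayesActionPredictor:
    """Predicts another agent's next action from discretized state features."""
    def __init__(self, n_features, actions, alpha=1.0):
        self.actions = list(actions)
        self.alpha = alpha                                # Laplace smoothing
        self.action_counts = defaultdict(int)             # N(a)
        self.feature_counts = [defaultdict(int) for _ in range(n_features)]  # N(f_i = v, a)
        self.value_sets = [set() for _ in range(n_features)]

    def update(self, state_features, observed_action):
        """Record one observed (state, action) pair of the modeled agent."""
        self.action_counts[observed_action] += 1
        for i, v in enumerate(state_features):
            self.feature_counts[i][(v, observed_action)] += 1
            self.value_sets[i].add(v)

    def predict(self, state_features):
        """Return argmax_a P(a) * prod_i P(f_i | a), computed in log space."""
        total = sum(self.action_counts.values()) or 1
        best, best_lp = None, -math.inf
        for a in self.actions:
            lp = math.log((self.action_counts[a] + self.alpha) /
                          (total + self.alpha * len(self.actions)))
            for i, v in enumerate(state_features):
                num = self.feature_counts[i][(v, a)] + self.alpha
                den = self.action_counts[a] + self.alpha * max(1, len(self.value_sets[i]))
                lp += math.log(num / den)
            if lp > best_lp:
                best, best_lp = a, lp
        return best
```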

4.
Adversarial game play has long been a hot topic in artificial intelligence research. In adversarial environments, modeling the opponent makes it possible to infer the hostile agent's actions, goals, strategies, and other attributes, providing key information for devising game strategies. Opponent modeling has broad application prospects in competitive games and combat simulation; because game strategies must be formulated on the basis of every player's course of action, building an accurate model of the opponent's behavior is especially important for predicting its intent. This survey explains the necessity of opponent modeling from the perspectives of its meaning, methods, and applications, and classifies existing modeling approaches; it reviews and summarizes reinforcement-learning-based prediction methods, theory-of-mind-based reasoning methods, and Bayesian optimization methods; taking sequential games (Texas Hold'em), real-time strategy games (StarCraft), and meta-games as typical application scenarios, it analyzes the role of opponent modeling in adversarial game play; and it looks ahead to the development of opponent modeling technology from the perspectives of bounded rationality, strategic deception, and interpretability.

5.
Multi-agent systems are an effective application platform for plan recognition. This paper proposes a plan-recognition-based multi-agent cooperation algorithm and analyzes such algorithms in both adversarial and non-adversarial environments, recognizing and modeling the behavioral goals of teammates and opponents and reducing the time and difficulty of communication between cooperating agents. The cooperation algorithm is applied to robot soccer, an effective multi-agent test platform. Experimental results show that in systems with limited communication, limited information, or delayed information, the algorithm effectively predicts the behavior of teammates and opponents and thereby achieves cooperation among agents.

6.
Drawing on complex adaptive systems theory, the stock market and real-estate market are treated as a class of complex adaptive systems. A multi-agent modeling approach is adopted: modules such as random-decision agents, imitating agents, and BP neural network agents are designed, and an order book is introduced to realize information exchange among the agents. Simulating the dynamic evolution of this class of complex systems verifies the effectiveness of the model.
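The order-book mechanism mentioned here is essentially a continuous double auction. A toy version, written only to show how agent orders could be matched and not taken from the paper, might look like this:

```python
# Toy limit order book (price-time priority, immediate matching); illustrative only.
import heapq

class OrderBook:
    def __init__(self):
        self.bids = []            # heap of (-price, seq, qty, agent)
        self.asks = []            # heap of (price, seq, qty, agent)
        self.seq = 0
        self.trades = []          # (price, qty, buyer, seller)

    def submit(self, agent, side, price, qty):
        """Insert a limit order and match it against the opposite side of the book."""
        self.seq += 1
        if side == "buy":
            while qty > 0 and self.asks and self.asks[0][0] <= price:
                ask_price, _, ask_qty, seller = heapq.heappop(self.asks)
                traded = min(qty, ask_qty)
                self.trades.append((ask_price, traded, agent, seller))
                qty -= traded
                if ask_qty > traded:   # push back the unfilled remainder
                    heapq.heappush(self.asks, (ask_price, self.seq, ask_qty - traded, seller))
            if qty > 0:
                heapq.heappush(self.bids, (-price, self.seq, qty, agent))
        else:
            while qty > 0 and self.bids and -self.bids[0][0] >= price:
                neg_bid, _, bid_qty, buyer = heapq.heappop(self.bids)
                traded = min(qty, bid_qty)
                self.trades.append((-neg_bid, traded, buyer, agent))
                qty -= traded
                if bid_qty > traded:
                    heapq.heappush(self.bids, (neg_bid, self.seq, bid_qty - traded, buyer))
            if qty > 0:
                heapq.heappush(self.asks, (price, self.seq, qty, agent))
```

Each agent type (random, imitating, BP-network) would observe the book and recent trades, then call `submit` with its own price and quantity decision.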

7.
A modeling method based on advanced smart objects (ASO) is proposed to realize behavior modeling of virtual humans. First, the ASO concept is introduced; the interaction elements, interaction parts, and object actions within the interaction features are defined, and interaction actions are classified. Second, an object-driven method is proposed to solve the motion computation problem in which the virtual object acts as the active body and the virtual human as the passive body, realizing object modeling centered on interaction features. Finally, the basic behaviors of the virtual human are analyzed according to the needs of human-machine tasks: four commonly used basic actions are selected, defined, and combined, and computation methods for pose and hand shape centered on interaction elements are given. Behavior modeling is thus realized and simulated, solving the problem of heavy interaction workload in task simulation and making the simulation results reusable. The method is validated with a simulation of manual riveting in aircraft assembly.

8.
Social causal reasoning is a core aspect of social intelligence. Building a computational model of social reasoning helps enhance the cognitive and social capabilities of intelligent systems and strongly promotes the design and implementation of multi-agent systems. Based on causal knowledge of domain tasks and descriptions of agent interaction behavior, this paper proposes a computational model for reasoning about and judging social causality and social behavior. A running example of the model is given in an application setting, and comparison with related work verifies the model's effectiveness.

9.
A multi-agent-based network control architecture for a multi-robot teleoperation system is established under a shared control mode. A modular, hierarchical agent structure with common attributes such as perception, decision-making, and interaction is designed; the functions of each module are described and the interaction characteristics among the agents are clarified. On this basis, a multi-robot networked teleoperation control architecture that fuses a multi-layer distributed blackboard model with agent nodes is implemented. Finally, experiments test the activation state of the state-reasoning agent, verifying the effectiveness of the networked teleoperation control architecture under the multi-agent framework.

10.
Trajectory prediction is a key technology in autonomous driving and intelligent transportation. Accurate prediction of vehicle and pedestrian trajectories improves an autonomous driving system's awareness of changes in its surroundings and safeguards its safety. Data-driven trajectory prediction methods capture interaction features among agents and analyze the agents' motion history and static environment information within a scene to accurately predict their future trajectories. This survey introduces the mathematical model of trajectory prediction and divides methods into two classes, traditional and data-driven; it describes the four main challenges facing mainstream data-driven methods, namely agent interaction modeling, motion intent prediction, trajectory diversity prediction, and fusion of static environment information within the scene; it analyzes and compares typical data-driven methods in terms of the datasets used, performance evaluation metrics, and model characteristics; it summarizes how these methods address the above challenges and their application scenarios; and it looks ahead to future directions for trajectory prediction in autonomous driving.
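For context, a minimal traditional baseline and the displacement-error metrics commonly used in this literature (ADE/FDE) can be written in a few lines; the constant-velocity predictor below is an illustrative baseline, not a method from the survey.

```python
# Illustrative constant-velocity baseline and ADE/FDE metrics; not from the survey.
import numpy as np

def constant_velocity_predict(history, horizon):
    """history: (T, 2) observed positions; returns (horizon, 2) extrapolated positions."""
    velocity = history[-1] - history[-2]              # last observed displacement per step
    steps = np.arange(1, horizon + 1).reshape(-1, 1)
    return history[-1] + steps * velocity

def ade_fde(pred, gt):
    """Average and Final Displacement Error between predicted and true futures."""
    dists = np.linalg.norm(pred - gt, axis=-1)
    return dists.mean(), dists[-1]

# Example: a pedestrian walking roughly in a straight line.
hist = np.array([[0.0, 0.0], [0.4, 0.1], [0.8, 0.2], [1.2, 0.3]])
future = constant_velocity_predict(hist, horizon=8)
```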

11.
A negotiation between agents is typically an incomplete information game, where the agents initially do not know their opponent’s preferences or strategy. This poses a challenge, as efficient and effective negotiation requires the bidding agent to take the other’s wishes and future behavior into account when deciding on a proposal. Therefore, in order to reach better and earlier agreements, an agent can apply learning techniques to construct a model of the opponent. There is a mature body of research in negotiation that focuses on modeling the opponent, but there exists no recent survey of commonly used opponent modeling techniques. This work aims to advance and integrate knowledge of the field by providing a comprehensive survey of currently existing opponent models in a bilateral negotiation setting. We discuss all possible ways opponent modeling has been used to benefit agents so far, and we introduce a taxonomy of currently existing opponent models based on their underlying learning techniques. We also present techniques to measure the success of opponent models and provide guidelines for deciding on the appropriate performance measures for every opponent model type in our taxonomy.
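One family in this survey's taxonomy, frequency-based opponent models, can be sketched as follows; the weighting heuristic and normalization here are illustrative assumptions rather than a specific model from the survey.

```python
# Illustrative frequency-based opponent model for bilateral negotiation.
# Heuristic: issues whose values the opponent rarely changes are assumed to matter
# more to it, and frequently offered values are assumed to be preferred.
from collections import Counter

class FrequencyOpponentModel:
    def __init__(self, issues):
        self.issues = issues
        self.value_counts = {i: Counter() for i in issues}   # how often each value is offered
        self.unchanged = Counter()                           # proxy for issue weights
        self.prev_bid = None

    def update(self, bid):
        """bid: dict mapping issue -> value offered by the opponent."""
        for issue, value in bid.items():
            self.value_counts[issue][value] += 1
            if self.prev_bid is not None and self.prev_bid[issue] == value:
                self.unchanged[issue] += 1
        self.prev_bid = dict(bid)

    def estimated_utility(self, bid):
        """Rough estimate of the opponent's utility for a bid, in [0, 1]."""
        total_w = sum(self.unchanged.values()) or len(self.issues)
        u = 0.0
        for issue, value in bid.items():
            w = (self.unchanged[issue] or 1) / total_w
            top = max(self.value_counts[issue].values() or [1])
            u += w * (self.value_counts[issue][value] / top)
        return u
```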

12.
Predicting agent behavior is a challenging problem in multi-agent systems. Building on research in multi-agent systems (MAS), this paper introduces a new cooperation strategy into robot soccer in order to increase the speed of attack. First, the characteristics of MAS are introduced. An agent-reduction algorithm is then introduced for behavioral decision-making. Further, the number of agents involved is adjusted, achieving cooperation among players without communication. Future work will focus on adding opponent modeling to improve the success rate of attacks.

13.
For agents operating in environments where decision-making must take into account not only the environment but also the minimizing actions of an opponent (as in games), it is fundamental that the agent be able to progressively build a profile of its adversary, so that this profile aids in selecting appropriate actions. However, it would be unwise to base the agent's decision-making solely on such a profile, as this would deprive the agent of its "own identity" and leave it at the mercy of its opponent. Along these lines, this study proposes an automatic Checkers player, called ACE-RL-Checkers, equipped with a dynamic decision-making module that adapts to the opponent's profile over the course of the game. In this system, action selection is conducted through a composition of a multilayer perceptron neural network and a case library. The neural network represents the "identity" of the agent, i.e., an already trained, static decision-making module. The case library, in turn, represents the agent's dynamic decision-making module, which is generated by the Automatic Case Elicitation technique. This technique exhibits pseudo-random exploratory behavior, which allows the agent's dynamic decision-making to be directed either by the opponent's game profile or randomly. To avoid a high incidence of pseudo-random decisions in the initial phases of the game, when the agent has very little information about its opponent, this work proposes a new module based on sequential pattern mining that generates a base of experience rules extracted from the game records of human experts. This module improves the agent's move selection in the initial phases of the game. Experiments carried out in tournaments involving ACE-RL-Checkers and other agents related to this work confirm the superiority of the dynamic architecture proposed herein.
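The composition described above can be pictured as a move-selection routine that consults the case library first and falls back to the trained evaluator. The function and parameter names, similarity threshold, and exploration rate below are illustrative assumptions, not the ACE-RL-Checkers implementation.

```python
# Hedged sketch of composing a dynamic case library with a static trained evaluator.
# similarity(), evaluator(), thresholds and epsilon are assumed placeholders.
import random

def select_move(board, legal_moves, case_library, evaluator,
                similarity, min_similarity=0.9, epsilon=0.05):
    # 1. Dynamic module: reuse a stored case whose board is close enough and
    #    whose recommended move is currently legal.
    best_case, best_sim = None, min_similarity
    for case in case_library:                     # case = (stored_board, move, outcome)
        s = similarity(board, case[0])
        if s >= best_sim and case[1] in legal_moves:
            best_case, best_sim = case, s
    if best_case is not None and random.random() > epsilon:
        return best_case[1]
    # 2. Pseudo-random exploration keeps feeding new cases into the library.
    if random.random() < epsilon:
        return random.choice(legal_moves)
    # 3. Static module (the agent's "identity"): pick the move the trained
    #    network evaluates highest.
    return max(legal_moves, key=lambda m: evaluator(board, m))
```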

14.
In this paper, a multi-agent reinforcement learning method based on action prediction of other agents is proposed. In a multi-agent system, action selection of the learning agent is unavoidably affected by other agents' actions; therefore, joint states and joint actions are involved in the multi-agent reinforcement learning system. A novel agent action prediction method based on a probabilistic neural network (PNN) is proposed, in which the PNN is used to predict the actions of other agents. Furthermore, a policy-sharing mechanism is used to exchange the learned policies of multiple agents, with the aim of speeding up learning. Finally, the application of the presented method to robot soccer is studied. Through learning, robot players master the mapping from state information to the action space, and coordination and cooperation among multiple robots are well realized.
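A PNN in the Parzen-window sense can be sketched compactly with NumPy, as below; this is an illustrative reconstruction (kernel width and data layout are assumptions), not the authors' network.

```python
# Illustrative Parzen-window PNN for action prediction; sigma is an assumed kernel width.
import numpy as np

class PNNActionPredictor:
    def __init__(self, sigma=0.5):
        self.sigma = sigma
        self.patterns = {}                        # action -> (N, D) array of stored states

    def fit(self, states, actions):
        """Store training patterns grouped by the action that followed them."""
        states, actions = np.asarray(states, dtype=float), np.asarray(actions)
        for a in np.unique(actions):
            self.patterns[a] = states[actions == a]

    def predict(self, state):
        """Return the action whose class-conditional kernel density is highest."""
        state = np.asarray(state, dtype=float)
        best, best_score = None, -np.inf
        for a, pats in self.patterns.items():
            d2 = np.sum((pats - state) ** 2, axis=1)
            score = np.mean(np.exp(-d2 / (2 * self.sigma ** 2)))
            if score > best_score:
                best, best_score = a, score
        return best
```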

15.
Negotiation is the best-known tool for reaching an agreement between parties. Usually, the parties can be modeled as a buyer and a seller who negotiate over the price of a given item. In most cases, the parties have incomplete information about one another, but they can invest money and effort to acquire information about each other. This leads to the question of how much each party will be willing to invest in information about its opponent prior to the negotiation process. In this paper, we consider the profitability of automated negotiators acquiring information on their opponents. In our model, a buyer and a seller negotiate over the price of a given item. Time is costly, and there is incomplete information about the reservation prices of both parties. The reservation price of the buyer is the maximum price it is willing to pay for an item or service, and the reservation price of the seller is the minimum price it is willing to receive in order to sell the item or service. Our research is based on Cramton's symmetric negotiation protocol, which provides the agents with stable and symmetric strategies and uses a delay in proposing an offer as a signal. The parties in Cramton's model delay their offers in order to signal their strength, and an agreement is then reached after one or two offers. We determine the Nash equilibrium for agents that prefer to purchase information. Then, in addition to the theoretical analysis, we use simulations to check which type of equilibrium is actually obtained. We find that in most cases each agent prefers to purchase information only if its opponent does. The reason is that an agent that purchases information one-sidedly signals its weakness and thereby weakens its position in the negotiation. Our results demonstrate the efficiency of joint information acquisition by both agents, but they also show that one-sided information purchasing may be inefficient if the acquisition is revealed to the opponent, which leads it to infer that the informed agent is relatively weak.

16.
Multi-agent modeling and simulation has been widely applied across the domains that involve complex systems, and the implementation of the intelligent agents in such a system directly affects its performance and the validity of its simulation results. By analyzing reactive agents and deliberative agents, this paper points out the strengths and weaknesses of each and, in light of the practical requirements of complex-system simulation, proposes a multi-agent system framework that fuses the two kinds of agent, giving a detailed account of the key implementation issues.
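The fusion of the two agent types can be pictured as a control loop in which a reactive layer preempts a deliberative planner; the structure below is an assumed illustration, not the framework proposed in the paper.

```python
# Illustrative hybrid reactive/deliberative agent loop; structure is an assumption.
class HybridAgent:
    def __init__(self, reactive_rules, planner):
        self.reactive_rules = reactive_rules     # list of (condition, action) pairs
        self.planner = planner                   # callable: percept -> list of actions
        self.plan = []

    def step(self, percept):
        # Reactive layer: the first matching rule fires immediately and
        # invalidates any plan in progress.
        for condition, action in self.reactive_rules:
            if condition(percept):
                self.plan = []
                return action
        # Deliberative layer: replan only when the current plan is exhausted.
        if not self.plan:
            self.plan = list(self.planner(percept))
        return self.plan.pop(0) if self.plan else None
```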

17.
Multiagent research provides an extensive literature on formal Beliefs-Desires-Intentions (BDI) based models describing teamwork and cooperation. However, multiagent environments are often neither cooperative nor collaborative; in many cases, agents have conflicting interests, leading to adversarial interactions. This form of interaction has not yet been formally defined in terms of the agents' mental states: beliefs, desires, and intentions. This paper presents the Adversarial Activity model, a formal BDI-based model for bounded-rational agents operating in a zero-sum environment. In complex environments, attempts to use classical utility-based search methods with bounded-rational agents can raise a variety of difficulties (e.g., implicitly modeling the opponent as an omniscient utility maximizer, rather than leveraging a more nuanced, explicit opponent model). We define the Adversarial Activity by describing the mental states of an agent situated in such an environment. We then present behavioral axioms that are intended to serve as design principles for building such adversarial agents. We illustrate the advantages of using the model as an architectural guideline by building agents for two adversarial environments: the Connect Four game and the Risk strategic board game. In addition, we explore the application of our approach by analyzing log files of completed Connect Four games, and gain additional insight into the appropriateness of the axioms.

18.
Computer game playing is the fruit fly of artificial intelligence and a general-purpose benchmark. In recent years, solving sequential imperfect-information games has been a frontier topic in computer game research. This survey analyzes the problem of solving imperfect-information games in computer game playing. It first reviews the milestone breakthroughs in the field, briefly introduces four classes of new evaluation benchmarks, summarizes three research paradigms, and proposes a research framework for solving sequential imperfect-information games. It then surveys the game models and solution concepts of sequential imperfect-information games, briefly covering game construction, subgames and meta-games, and solution concepts and evaluation. For offline strategy solving, it systematically reviews three families of methods: algorithmic game theory, optimization theory, and game learning; for online strategy solving, it reviews another three: approximate opponent learning, discriminative opponent adaptation, and generative opponent search. Finally, it analyzes the challenges from the perspectives of the environment, the agent (opponent), and strategy solving, and looks ahead to frontier topics in five areas: game dynamics and strategy-space theory, multimodal adversarial games and sequential modeling, general policy learning and offline pre-training, opponent modeling (exploitation) and anti-exploitation, and ad hoc teaming and zero-shot coordination. This comprehensive overview of imperfect-information game solving is expected to inspire related research in artificial intelligence and game theory.

19.
The success or failure of any learning algorithm is partially due to the exploration strategy it employs. However, most exploration strategies assume that the environment is stationary and non-strategic. In this work we shed light on how to design exploration strategies in non-stationary and adversarial environments. Our proposed adversarial drift exploration (DE) is able to efficiently explore the state space while keeping track of regions of the environment that have changed. This exploration is general enough to be applied in single-agent non-stationary environments as well as in multiagent settings where the opponent changes its strategy over time. We use a two-agent strategic interaction setting to test this new type of exploration, where the opponent switches between different behavioral patterns to emulate a non-deterministic, stochastic, and adversarial environment. The agent's objective is to learn a model of the opponent's strategy in order to act optimally. Our contribution is twofold. First, we present DE as a strategy for switch detection. Second, we propose a new algorithm called R-max# for learning and planning against non-stationary opponents. To handle such opponents, R-max# reasons and acts in terms of two objectives: (1) to maximize utilities in the short term while learning and (2) to eventually explore opponent behavioral changes. We provide theoretical results showing that R-max# is guaranteed to detect the opponent's switch and learn a new model with finite sample complexity. R-max# makes efficient use of exploration experiences, resulting in rapid adaptation and efficient DE, to deal with the non-stationary nature of the opponent. We show experimentally that using DE outperforms state-of-the-art algorithms explicitly designed for modeling opponents (in terms of average rewards) in two complementary domains.
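The switch-detection idea can be illustrated by monitoring how well the learned opponent model predicts recent opponent actions; the sliding-window scheme, window size, and threshold below are illustrative assumptions and not the R-max# algorithm itself.

```python
# Illustrative switch detector; window size and accuracy threshold are assumptions.
from collections import deque

class SwitchDetector:
    def __init__(self, window=30, threshold=0.6):
        self.recent = deque(maxlen=window)   # 1 if prediction matched, else 0
        self.threshold = threshold

    def observe(self, predicted_action, actual_action):
        """Record one prediction outcome; return True if an opponent switch is suspected."""
        self.recent.append(predicted_action == actual_action)
        if len(self.recent) < self.recent.maxlen:
            return False                     # not enough evidence yet
        accuracy = sum(self.recent) / len(self.recent)
        if accuracy < self.threshold:
            self.recent.clear()              # reset evidence after signalling a switch
            return True
        return False

# On a detected switch, an R-max#-style learner would mark the affected parts of the
# opponent model as unknown again (optimistic values) and re-explore those states.
```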
