Similar Documents
17 similar documents found (search time: 637 ms).
1.
Resource allocation under adversarial conditions lies at the core of most game-theoretic decision problems. From fitting optimal solutions to computing game equilibria, solving resource-allocation strategies with game theory is a frontier topic in cognitive decision-making. This paper surveys Colonel Blotto game models and solution methods for resource allocation under adversarial conditions. First, it briefly introduces the difference between offline and online strategy learning, strategic games and their solution concepts, and online optimization and regret. Second, it reviews six classes of typical Colonel Blotto models: continuous Colonel Blotto games, discrete Colonel Blotto games, generalized Colonel Blotto games, General Lotto games, generalized rule-based Colonel Blotto games, and online discrete Colonel Blotto games. Third, distinguishing two phases (offline and online) and three game scenarios (one-shot, repeated, and multi-stage), it analyzes the solution methods for each class. Finally, it looks ahead along four lines: typical applications, generalized game models, solution methods, and future research directions. This overview of current work on Colonel Blotto games is intended to inspire research on resource allocation under adversarial conditions and related areas of game theory.
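To make the Blotto setting concrete, here is a toy sketch of a discrete Colonel Blotto game; the budget, battlefield count, and tie rule are illustrative assumptions, not taken from the survey.

```python
# A minimal sketch of a discrete Colonel Blotto game (assumptions: 2 players,
# equal budgets, a battlefield is won by the strictly larger allocation,
# ties score zero). Illustrative only; not the surveyed models themselves.
from itertools import combinations

def allocations(budget, fields):
    """Enumerate all ways to split `budget` units over `fields` battlefields."""
    for cuts in combinations(range(budget + fields - 1), fields - 1):
        parts, prev = [], -1
        for c in cuts:
            parts.append(c - prev - 1)
            prev = c
        parts.append(budget + fields - 2 - prev)
        yield tuple(parts)

def payoff(a, b):
    """Battlefields won by `a` minus those won by `b` (ties give 0)."""
    return sum((x > y) - (x < y) for x, y in zip(a, b))

# Expected payoff of one pure strategy against a uniformly random opponent.
strats = list(allocations(5, 3))
mine = (3, 1, 1)
print(sum(payoff(mine, b) for b in strats) / len(strats))
```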

2.
Computer game-playing is the "Drosophila" of artificial intelligence: it has long attracted AI researchers and has become a productive platform for studying cognitive intelligence. Poker-style adversarial problems can be modeled as imperfect-information dynamic games with well-defined boundaries and fixed rules; a computer poker AI must be capable of dynamic decision-making under incomplete information, recognizing an opponent's misleading and deceptive play, and managing chips and risk across multiple rounds. This paper first reviews the development of computer poker AI, with Texas Hold'em as the representative game; it then surveys typical models and algorithms, key technologies, and the main open problems; finally, it discusses future trends and application prospects.

3.
张蒙, 李凯, 吴哲, 臧一凡, 徐航, 兴军亮. 《自动化学报》, 2022, 48(4): 1004-1017
Large-scale imperfect-information games, exemplified by Texas Hold'em, are a common type of game in the real world. Mainstream Texas Hold'em solvers, which aim at Nash equilibrium strategies, depend on a game-tree model, consume large amounts of compute, and produce overly conservative strategies, so the resulting agents cannot maximize their payoff against different opponents. To address these problems, a lightweight and efficient imperfect-information game-solving framework is proposed that adapts quickly to changes in an opponent's strategy and thereby exploits the opponent. This framework…

4.
Intelligent game confrontation has long been a research focus in artificial intelligence. In adversarial settings, modeling the opponent makes it possible to infer a hostile agent's actions, goals, and strategies, providing key information for devising one's own game strategy. Opponent modeling has broad application prospects in competitive games and combat simulation; since strategy design must start from the action policies of all players, an accurate model of the opponent's behavior is essential for predicting its intent. This paper explains the necessity of opponent modeling from three aspects (its meaning, its methods, and its applications) and classifies existing modeling approaches. It reviews prediction methods based on reinforcement learning, reasoning methods based on theory of mind, and optimization methods based on Bayesian inference. Taking sequential games (Texas Hold'em), real-time strategy games (StarCraft), and meta-games as typical application scenarios, it analyzes the role of opponent modeling in adversarial play. Finally, it discusses future directions for opponent-modeling techniques from three perspectives: bounded rationality, strategic deception, and interpretability.
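As one concrete instance of the Bayesian family of methods mentioned above, here is a toy posterior update over opponent types; the type names and action likelihoods are invented for illustration, not drawn from the paper.

```python
# A minimal sketch of Bayesian opponent-type inference (assumptions: two
# hypothetical opponent types with known action likelihoods).
likelihood = {            # P(action | type)
    "aggressive": {"raise": 0.7, "call": 0.2, "fold": 0.1},
    "passive":    {"raise": 0.1, "call": 0.5, "fold": 0.4},
}
posterior = {"aggressive": 0.5, "passive": 0.5}   # uniform prior

def update(posterior, action):
    """One Bayes step: P(type | a) is proportional to P(a | type) * P(type)."""
    post = {t: likelihood[t][action] * p for t, p in posterior.items()}
    z = sum(post.values())
    return {t: p / z for t, p in post.items()}

for a in ["raise", "raise", "call"]:
    posterior = update(posterior, a)
print(posterior)   # belief shifts toward "aggressive" after two raises
```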

5.
Intelligent game confrontation is a pressing frontier problem in AI cognitive decision-making. Game-theoretic methods, represented by counterfactual regret minimization (CFR), and reinforcement learning methods, represented by fictitious self-play, have stood out in solving game strategies thanks to large-scale compute, yet the connection between the two paradigms has received little in-depth study. This paper defines the scope of intelligent game confrontation, traces its development, and summarizes its key challenges. It introduces models and algorithms from both the game-theoretic and the reinforcement-learning perspectives, compares the strengths and limitations of the two from multiple angles, and distills a unified view: a framework of methods and strategy-solving procedures spanning both game theory and reinforcement learning. The aim is to point toward a combination of the two paradigms, advance intelligent gaming technology, and build momentum toward general artificial intelligence.
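As an illustration of the regret-minimization idea underlying CFR, the sketch below runs regret matching in rock-paper-scissors self-play; this is the simplified single-decision case, not the full tree-based CFR algorithm discussed in the paper.

```python
# A minimal sketch of regret matching, the building block of counterfactual
# regret minimization (CFR). Full CFR additionally propagates counterfactual
# values through a game tree; here there is only one decision point.
import random

ACTIONS = 3                       # rock, paper, scissors
def util(a, b):                   # payoff of action a against action b
    return 0 if a == b else (1 if (a - b) % 3 == 1 else -1)

def strategy(regret):
    """Play actions in proportion to their positive cumulative regret."""
    pos = [max(r, 0.0) for r in regret]
    z = sum(pos)
    return [p / z for p in pos] if z > 0 else [1.0 / ACTIONS] * ACTIONS

regret = [0.0] * ACTIONS
strat_sum = [0.0] * ACTIONS
for _ in range(20000):
    s = strategy(regret)
    strat_sum = [a + b for a, b in zip(strat_sum, s)]
    me = random.choices(range(ACTIONS), weights=s)[0]
    opp = random.choices(range(ACTIONS), weights=s)[0]   # self-play
    for a in range(ACTIONS):                             # accumulate regrets
        regret[a] += util(a, opp) - util(me, opp)

z = sum(strat_sum)
print([round(x / z, 3) for x in strat_sum])   # average strategy: roughly uniform
```

The average strategy (not the final one) is what converges toward equilibrium, which for rock-paper-scissors is the uniform mix.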

6.
Game intelligence is an interdisciplinary field spanning game theory and artificial intelligence. It focuses on the interactions between individuals or organizations and on how quantitative modeling of game relations enables the exact computation of optimal strategies, ultimately producing intelligent decisions and decision knowledge bases. In recent years, with the explosion of behavioral data and the diversification of game forms, game intelligence has attracted growing research interest and found wide real-world application. This paper surveys the field from three angles. First, it reviews the background: single-agent Markov decision processes, game-theoretic multi-agent modeling, and multi-agent solution approaches such as reinforcement learning and game-theoretic learning. Second, according to the game relations among agents, it divides games into three paradigms (cooperative, adversarial, and mixed) and for each introduces the main research problems, mainstream methods, and typical applications. Finally, it summarizes the state of the art, the main open problems and research challenges, and the application outlook in academia and industry, providing a reference for researchers and further advancing the national AI development strategy.
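As a pointer to the single-agent background the survey starts from, here is a minimal value-iteration sketch for a Markov decision process; the two-state transition model is invented for illustration.

```python
# A minimal sketch of value iteration for a single-agent MDP.
GAMMA = 0.9
# P[s][a] = list of (probability, next_state, reward)
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}
V = {s: 0.0 for s in P}
for _ in range(100):  # repeated Bellman backups until approximately converged
    V = {s: max(sum(p * (r + GAMMA * V[s2]) for p, s2, r in outcomes)
                for outcomes in P[s].values())
         for s in P}
print({s: round(v, 2) for s, v in V.items()})
```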

7.
This paper studies the exploitation of a suboptimal opponent's weaknesses in imperfect-information extensive-form games. To overcome the shortcomings of opponent modeling, a common approach in this area, it proposes exploiting suboptimal opponents from the perspective of regret minimization, and extends an offline equilibrium-computation method, counterfactual regret minimization (CFR), to the online setting. It proposes estimating the counterfactual value of each information set from game outcomes, with two estimators: a static one and a dynamic one. The static estimator draws estimates directly from the distribution of game outcomes, weighting every outcome equally, whereas the dynamic estimator gives newly observed outcomes higher weight so as to adapt quickly to changes in the opponent's strategy. Building on the two estimators, an online CFR algorithm is proposed; in experiments on one-card poker it is compared with four online learning algorithms (DBBR, MCCFR-os, Q-learning, Sarsa). The results show that the proposed algorithm not only exploits weak opponents best, but also achieves the highest win rate in matches against the four baselines.
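The contrast between the two estimators can be sketched in a few lines; the payoff stream below is synthetic, and the paper's actual estimators operate on counterfactual values of information sets rather than raw payoffs.

```python
# A minimal sketch contrasting the static estimate (all outcomes weighted
# equally) with a dynamic, recency-weighted estimate that tracks a drifting
# opponent. The smoothing factor alpha is an illustrative choice.
def static_estimate(values):
    """Plain average: every observed game outcome gets equal weight."""
    return sum(values) / len(values)

def dynamic_estimate(values, alpha=0.2):
    """Exponential recency weighting: newer outcomes count more."""
    est = values[0]
    for v in values[1:]:
        est += alpha * (v - est)
    return est

# The opponent plays one way for 50 games, then switches strategy.
outcomes = [1.0] * 50 + [-1.0] * 10
print(static_estimate(outcomes))   # about 0.67: slow to notice the switch
print(dynamic_estimate(outcomes))  # about -0.79: adapts to recent play
```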

8.
Cyber-physical security analysis of smart grids based on a Bayesian sequential game model
李军, 李韬. 《自动化学报》, 2019, 45(1): 98-109
The smart grid uses information technology to optimize power transmission and distribution from suppliers to consumers. As a cyber-physical system (CPS), a smart grid consists of physical devices and the networks responsible for data computation and communication. Its security problems arise at both levels: cyber attacks such as bad-data injection and the harvesting of customers' private information, and physical attacks on grid equipment. This paper studies how the grid's system administrator (the defender) can identify the attacker's type and then choose an optimal defense strategy. A Bayesian sequential game model is proposed to determine the attacker's type, and the equilibrium strategies of both sides are derived from the sequential game tree. First, a static Bayesian game between the defender and an attacker of uncertain type is constructed; via the Harsanyi transformation, this incomplete-information game becomes a complete-information game whose Bayesian Nash equilibrium determines the attacker's type. Second, a sequential game model between attacker and defender supports the defender's decision analysis: applying backward induction to the game tree for each of the two attacker types yields the equilibrium path, and with it the attacker's optimal attack strategy and the defender's optimal defense strategy. The analysis shows that the Bayesian sequential game model enables the defender to identify the attacker's type and select the optimal defense, providing a reference for research on smart-grid information security.
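The backward-induction step can be sketched on a toy attacker/defender tree; the action names and (attacker, defender) payoffs below are invented for illustration, not the paper's smart-grid trees.

```python
# A minimal sketch of backward induction on a two-stage game tree where the
# attacker moves first and the defender replies.
def backward_induction(node, to_move=0):
    """Return (payoff vector, action path); leaves are payoff tuples."""
    if isinstance(node, tuple):          # leaf: (attacker payoff, defender payoff)
        return node, []
    best = None
    for action, child in node.items():
        value, path = backward_induction(child, 1 - to_move)
        if best is None or value[to_move] > best[0][to_move]:
            best = (value, [action] + path)
    return best

tree = {
    "cyber":    {"patch":  (1, -1), "ignore": (4, -4)},
    "physical": {"harden": (0,  0), "ignore": (3, -3)},
}
value, path = backward_induction(tree)
print(value, path)   # equilibrium path: attacker attacks cyber, defender patches
```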

9.
高博, 冯伟. 《计算机应用文摘》, 2004, (10): I026-I032
Game playing is the study of strategies by which one wins and defeats an opponent. Computer chess, that is, computer game-playing, is an important research area of artificial intelligence (AI). The man-machine matches between Deep Blue and world chess champion Garry Kasparov are still a favorite topic of conversation.

10.
To find multi-player strategies for the puzzle game "沙漠掘金" (Desert Gold Rush), the specific game rules are analyzed in depth, and strategies based on complete-information static games and complete-information dynamic games are proposed. First, simplifying the rules turns the game into a non-cooperative game problem. Second, the single-player game is treated as an optimization problem; its strategy is analyzed and then extended with game-theoretic methods to the multi-player case. Finally, for the first level, which is a complete-information static game, the players' moves are simulated to obtain the payoff matrix, and the optimal strategy is computed via mixed-strategy Nash equilibrium; for the second level, which is a complete-information dynamic game, the game tree is built and solved by backward recursion to obtain the best strategy, and multi-player competitive strategies are analyzed.
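For the static-game step, a 2x2 zero-sum example shows how a mixed-strategy equilibrium falls out of the indifference condition; the payoff matrix is invented, not the paper's 沙漠掘金 matrix.

```python
# A minimal sketch of solving a 2x2 zero-sum game for its mixed-strategy
# Nash equilibrium via indifference conditions.
A = [[3, -1],
     [0,  2]]   # row player's payoffs; the column player gets the negation

# Row player mixes (p, 1-p) so the column player is indifferent:
# p*A[0][0] + (1-p)*A[1][0] = p*A[0][1] + (1-p)*A[1][1]
denom = A[0][0] - A[1][0] - A[0][1] + A[1][1]
p = (A[1][1] - A[1][0]) / denom
# Column player mixes (q, 1-q) so the row player is indifferent.
q = (A[1][1] - A[0][1]) / denom
value = p * (q * A[0][0] + (1 - q) * A[0][1]) + \
        (1 - p) * (q * A[1][0] + (1 - q) * A[1][1])
print(p, q, value)   # here: p = 1/3, q = 1/2, game value 1.0
```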

11.
The article describes a gradient-search-based reinforcement learning algorithm for two-player zero-sum games with imperfect information. Simple gradient search may result in oscillation around solution points, a problem similar to the Crawford puzzle. To dampen oscillations, the algorithm uses lagging anchors, drawing the strategy state of the players toward a weighted average of earlier strategy states. The algorithm is applicable to games represented in extensive form. We develop methods for sampling the parameter gradient of a player's performance against an opponent, using temporal-difference learning. The algorithm is used successfully for a simplified poker game with infinite sets of pure strategies, and for the air combat game Campaign, using neural nets. We prove exponential convergence of the algorithm for a subset of matrix games.
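A rough illustration of the lagging-anchor idea on matching pennies follows; the step sizes and anchor rate are arbitrary choices, and the bilinear payoff model stands in for the paper's extensive-form, neural-net setting.

```python
# A minimal sketch: plain gradient ascent on mixed strategies in matching
# pennies oscillates around the equilibrium p = q = 0.5, while a pull toward
# slowly trailing anchors damps the oscillation.
ETA, ANCHOR_RATE, PULL = 0.05, 0.01, 0.05

def step(p, q, ap, aq, use_anchor):
    # Player 1's payoff: u = (2p - 1)(2q - 1); player 2 receives -u.
    gp, gq = 2 * (2 * q - 1), -2 * (2 * p - 1)       # strategy gradients
    p += ETA * gp + (PULL * (ap - p) if use_anchor else 0.0)
    q += ETA * gq + (PULL * (aq - q) if use_anchor else 0.0)
    p, q = min(max(p, 0.0), 1.0), min(max(q, 0.0), 1.0)
    ap += ANCHOR_RATE * (p - ap)                      # anchors lag behind play
    aq += ANCHOR_RATE * (q - aq)
    return p, q, ap, aq

for use_anchor in (False, True):
    p = q = ap = aq = 0.9
    for _ in range(5000):
        p, q, ap, aq = step(p, q, ap, aq, use_anchor)
    print(use_anchor, round(p, 3), round(q, 3))  # anchored run settles near 0.5
```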

12.
王龙, 黄锋. 《自动化学报》, 2023, 49(3): 580-613
In recent years, artificial intelligence (AI) has achieved breakthrough after breakthrough in board and card games, computer vision, natural language processing, and protein structure analysis and prediction; the traditional walls between disciplines are gradually coming down, and deep interdisciplinary fusion is increasingly evident. As three important components of modern intelligence science, game theory, multi-agent learning, and control theory have been intertwined since their inception, each containing elements of the others. In particular, driven by AI, results at their intersection have grown explosively in recent years. To reflect this academic trend, this paper systematically reviews their similarities, differences, connections, and latest advances. It first introduces the four basic game forms that link the three fields, then discusses the multi-agent learning methods corresponding to these four forms, then surveys the latest progress at their intersection topic by topic, and finally offers a summary and outlook on this emerging interdisciplinary area.

13.
Traditional game theory is based on the assumption that the opponent is a perfect reasoner and all payoff information is available. Based on this assumption, game theory recommends estimating the quality of each possible strategy by its worst possible consequences. In real life, opponents are often imperfect and payoff information is often inexact. If the only disadvantage of some action is that an unusually clever opponent can find a complicated way to counter it, then this action may be a perfect recommendation for play against a normal (not unusually clever) opponent. In other words, to estimate the quality of each move, instead of the plain minimum of possible consequences, we must consider a robust minimum that takes into account that some consequences will never occur to a normal opponent. We show that in a reasonable statistical setting, this idea leads to the class of OWA operators. It turns out that playing against an imperfect opponent is not only more realistic but often also simpler: e.g., for the simplest game in which playing against a perfect opponent is computationally intractable (NP-hard), playing against an imperfect opponent is computationally feasible.
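The OWA (ordered weighted averaging) idea can be sketched in a few lines; the weights and consequence values below are illustrative assumptions.

```python
# A minimal sketch of an OWA operator for move evaluation: instead of the
# worst case (pure min), weight the sorted consequences so the very worst,
# hard-to-find replies count less.
def owa(values, weights):
    """OWA: sort values in descending order, then take the weighted sum."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * v for w, v in zip(weights, sorted(values, reverse=True)))

consequences = [5, 2, 1, -10]       # payoffs of opponent replies to one move
print(min(consequences))            # classical worst-case score: -10
print(owa(consequences, [0.1, 0.2, 0.4, 0.3]))  # robust score: -1.7
```

Setting the last weight to 1 recovers the classical minimum, so the min operator is the special case for a perfect opponent.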

14.
Texas Hold'em is a good research object in machine game-playing. Unlike chess or checkers, it involves abstracting and processing incomplete information, multi-agent competition, risk assessment and management, opponent modeling, and more. Among these, abstracting and processing incomplete information is the foundation on which the other problems rest: it means evaluating the current hand so the result can feed later stages of play. For the case of incomplete board information, this paper discusses several hand-strength evaluation methods, analyzes their pros and cons, and proposes a new evaluation method that improves computation speed and reduces resource usage. Experiments show that the new method dispenses with the lookup table, saving space while maintaining fast computation.
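A table-free hand-strength estimate can be sketched with Monte Carlo rollouts; note that the rank() stub below is a deliberately crude placeholder (high card only), not the paper's evaluator.

```python
# A minimal sketch of hand-strength estimation by Monte Carlo rollout: deal
# random opponent hands and boards and count how often our hand wins.
import random

DECK = [(r, s) for r in range(2, 15) for s in range(4)]

def rank(cards):
    """Toy evaluator: highest card rank only (not poker-correct)."""
    return max(r for r, _ in cards)

def hand_strength(hole, trials=10000):
    wins = 0.0
    for _ in range(trials):
        rest = [c for c in DECK if c not in hole]
        draw = random.sample(rest, 7)          # 2 opponent cards + 5 board
        opp, board = draw[:2], draw[2:]
        ours, theirs = rank(hole + board), rank(opp + board)
        wins += 1.0 if ours > theirs else 0.5 if ours == theirs else 0.0
    return wins / trials

print(hand_strength([(14, 0), (14, 1)]))   # pocket aces: about 0.87 here
```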

15.
Artificial Intelligence, 2002, 134(1-2): 201-240
Poker is an interesting test-bed for artificial intelligence research. It is a game of imperfect information, where multiple competing agents must deal with probabilistic knowledge, risk assessment, and possible deception, not unlike decisions made in the real world. Opponent modeling is another difficult problem in decision-making applications, and it is essential to achieving high performance in poker. This paper describes the design considerations and architecture of the poker program Poki. In addition to methods for hand evaluation and betting strategy, Poki uses learning techniques to construct statistical models of each opponent, and dynamically adapts to exploit observed patterns and tendencies. The result is a program capable of playing reasonably strong poker, but there remains considerable research to be done to play at a world-class level.

16.
We consider the learning problem faced by two self-interested agents repeatedly playing a general-sum stage game. We assume that the players can observe each other's actions but not the payoffs received by the other player. The concept of Nash Equilibrium in repeated games provides an individually rational solution for playing such games and can be achieved by playing the Nash Equilibrium strategy for the single-shot game in every iteration. Such a strategy, however, can sometimes lead to a Pareto-dominated outcome in games like the Prisoner's Dilemma. So we prefer learning strategies that converge to a Pareto-optimal outcome that also produces a Nash Equilibrium payoff for repeated two-player, n-action general-sum games. The Folk Theorem enables us to identify such outcomes. In this paper, we introduce the Conditional Joint Action Learner (CJAL), which learns the conditional probability of an action taken by the opponent given its own actions and uses it to decide its next course of action. We empirically show that under self-play, and if the payoff structure of the Prisoner's Dilemma game satisfies certain conditions, a CJAL learner, using a random exploration strategy followed by a completely greedy exploitation technique, will learn to converge to a Pareto-optimal solution. We also show that such learning generates Pareto-optimal payoffs in a large majority of other two-player general-sum games. We compare the performance of CJAL with that of existing algorithms such as WOLF-PHC and JAL on all structurally distinct two-player conflict games with ordinal payoffs.
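A compressed CJAL-style sketch on the Prisoner's Dilemma follows; the payoffs are the standard ones, while the episode counts and count smoothing are illustrative simplifications of the paper's algorithm.

```python
# A minimal sketch of a conditional joint action learner: each agent estimates
# P(opponent action | own action) from counts and, after a random-exploration
# phase, greedily picks the action with the best estimated expected payoff.
import random

C, D = 0, 1
PAYOFF = [[(3, 3), (0, 5)],     # PAYOFF[mine][theirs] = (my payoff, their payoff)
          [(5, 0), (1, 1)]]
EXPLORE = 2000

class CJAL:
    def __init__(self):
        self.counts = [[1, 1], [1, 1]]   # counts[own][opp], smoothed
    def act(self, t):
        if t < EXPLORE:
            return random.choice((C, D))
        def ev(a):                        # estimated expected payoff of action a
            total = sum(self.counts[a])
            return sum(self.counts[a][o] / total * PAYOFF[a][o][0] for o in (C, D))
        return max((C, D), key=ev)
    def observe(self, own, opp):
        self.counts[own][opp] += 1

p1, p2 = CJAL(), CJAL()
for t in range(10000):
    a1, a2 = p1.act(t), p2.act(t)
    p1.observe(a1, a2)
    p2.observe(a2, a1)
print(a1, a2)   # under self-play the learners typically lock into (C, C)
```

The mechanism is visible in the counts: once both agents play greedily, defection makes P(opponent defects | I defect) approach 1, driving its estimated value down until conditional cooperation looks better.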

17.
We use case-injected genetic algorithms (CIGARs) to learn to competently play computer strategy games. CIGARs periodically inject individuals that were successful in past games into the population of the GA working on the current game, biasing search toward known successful strategies. Computer strategy games are fundamentally resource allocation games characterized by complex long-term dynamics and by imperfect knowledge of the game state. CIGAR plays by extracting and solving the game's underlying resource allocation problems. We show how case injection can be used to learn to play better from a human's or system's game-playing experience, and our approach to acquiring experience from human players showcases an elegant solution to the knowledge acquisition bottleneck in this domain. Results show that with an appropriate representation, case injection effectively biases the GA toward producing plans that contain important strategic elements from previously successful strategies.
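The case-injection mechanism can be sketched as follows; the bit-string fitness is a toy stand-in for the paper's resource-allocation objective, and all parameters are illustrative.

```python
# A minimal sketch of case injection into a genetic algorithm: every few
# generations, individuals saved from earlier successful runs (the case base)
# replace the worst members of the current population.
import random

GENES, POP, INJECT_EVERY = 20, 30, 5
case_base = [[1] * GENES]                 # a strategy that worked before

def fitness(ind):                          # toy objective: count of 1 bits
    return sum(ind)

def evolve(pop):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP // 2]               # truncation selection
    children = []
    while len(children) < POP - len(parents):
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, GENES)   # one-point crossover
        child = a[:cut] + b[cut:]
        if random.random() < 0.2:          # occasional point mutation
            i = random.randrange(GENES)
            child[i] ^= 1
        children.append(child)
    return parents + children

pop = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
for gen in range(1, 31):
    pop = evolve(pop)
    if gen % INJECT_EVERY == 0:            # periodic case injection
        pop.sort(key=fitness)
        pop[:len(case_base)] = [c[:] for c in case_base]
print(max(fitness(ind) for ind in pop))    # search is pulled toward the case
```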
