Similar Documents
Found 20 similar documents (search time: 129 ms)
1.
We consider network contribution games, where each agent in a network has a budget of effort that he can contribute to different collaborative projects or relationships. Depending on the contributions of the involved agents, a relationship will flourish or drown, and to measure its success we use a reward function for each relationship. Every agent tries to maximize the reward from all relationships it is involved in. We consider pairwise equilibria of this game, and characterize the existence, computational complexity, and quality of equilibrium based on the types of reward functions involved. When all reward functions are concave, we prove that the price of anarchy is at most 2. For convex functions the same holds only under some special but very natural conditions. Another special case treated extensively is minimum effort games, where the reward of a relationship depends only on the minimum effort of any participant. In these games, we can show existence of pairwise equilibria and a price of anarchy of 2 for concave functions and for special classes of games with convex functions. Finally, we show tight bounds for approximate equilibria and convergence of dynamics in these games.
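As a concrete illustration of the model, the following sketch checks pairwise stability in a hypothetical two-agent, single-relationship minimum-effort instance with a concave reward (the instance and the effort grid are invented for illustration; this is not code from the paper):

    import numpy as np

    # Hypothetical instance: two agents, each with budget 1, joined by a single
    # relationship whose reward is the concave minimum-effort function
    # f(e1, e2) = sqrt(min(e1, e2)); both endpoints receive this reward.
    f = lambda e1, e2: np.sqrt(min(e1, e2))
    grid = np.linspace(0.0, 1.0, 101)  # candidate effort levels within the budget

    def is_pairwise_equilibrium(e1, e2):
        r = f(e1, e2)
        # no agent can gain by changing its own effort unilaterally ...
        unilateral = any(f(d, e2) > r for d in grid) or any(f(e1, d) > r for d in grid)
        # ... and the pair cannot jointly move to a profile both strictly prefer
        # (both endpoints share the reward here, so this is "a higher reward exists")
        bilateral = any(f(d1, d2) > r for d1 in grid for d2 in grid)
        return not unilateral and not bilateral

    print(is_pairwise_equilibrium(1.0, 1.0))  # True: full effort is pairwise stable
    print(is_pairwise_equilibrium(0.5, 0.5))  # False: Nash, but the pair jointly
                                              # prefers (1.0, 1.0)

The second call shows why pairwise equilibria are the natural solution concept here: unilateral deviations never help in a minimum-effort relationship, so ordinary Nash equilibria include many wasteful profiles that a coordinated pair would abandon.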

2.
We consider a class of multilevel coalition games, with each game formed as the union of traditional cooperative and coalition games. Between-level interactions are described by a sequence of such games with different sets of reward functions. Translated from Kibernetika i Sistemnyi Analiz, No. 1, pp. 107–115, January–February, 1992.

3.
In this paper, we present a human-robot teaching framework that uses “virtual” games as a means for adapting a robot to its user through natural interaction in a controlled environment. We present an experimental study in which participants instruct an AIBO pet robot while playing different games together on a computer-generated playfield. By playing the games and receiving instruction and feedback from its user, the robot learns to understand the user’s typical way of giving multimodal positive and negative feedback. The games are designed in such a way that the robot can reliably predict positive or negative feedback based on the game state and explore its user’s reward behavior by making good or bad moves. We implemented a two-stage learning method combining Hidden Markov Models and a mathematical model of classical conditioning to learn how to discriminate between positive and negative feedback. The system combines multimodal speech and touch input for reliable recognition. After finishing the training, the system was able to recognize positive and negative reward with an average accuracy of 90.33%.
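The conditioning component can be pictured with a Rescorla-Wagner-style update, a standard mathematical model of classical conditioning; this is only a sketch under that assumption, the paper's exact model and its coupling to the Hidden Markov Models may differ, and the cue labels below are invented:

    # Associative strengths V[s] for multimodal feedback cues (here just string
    # labels) are driven toward the reinforcement lambda_ observed when those
    # cues co-occur with a game move of known valence.
    def rescorla_wagner(V, stimuli, lambda_, alpha=0.3, beta=1.0):
        """Update associative strengths of the co-occurring stimuli in place."""
        v_total = sum(V.get(s, 0.0) for s in stimuli)  # compound prediction
        for s in stimuli:
            V[s] = V.get(s, 0.0) + alpha * beta * (lambda_ - v_total)
        return V

    V = {}
    # Robot makes a known-good move; the user says "good boy" and pats its head:
    for _ in range(10):
        rescorla_wagner(V, ["good boy", "head-pat"], lambda_=+1.0)
    # Robot makes a known-bad move; the user says "no":
    for _ in range(10):
        rescorla_wagner(V, ["no"], lambda_=-1.0)

    print(V)  # positive strengths mark reward cues, negative mark punishment cues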

4.
This paper investigates the evolutionary dynamics and optimization problem of the boxed pig games with the mechanism of passive reward and punishment by using the semi-tensor product method. First, an algorithm is provided to construct the algebraic formulation for the dynamics of the networked evolutionary boxed pig games with the mechanism of passive reward and punishment. Then, the impact of reward and punishment parameters on the final cooperation level of the whole network is discussed. Finally, an example is provided to show the effectiveness of our results.
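For readers unfamiliar with the tool, the semi-tensor product itself is straightforward to compute. The sketch below implements the standard left semi-tensor product and applies a hypothetical 2 x 4 structure matrix L to a two-player strategy profile, which is the general shape of the algebraic formulations used in this line of work (L and its semantics are invented for illustration):

    import numpy as np
    from math import lcm  # Python 3.9+

    def stp(A, B):
        """Left semi-tensor product: for A (m x n) and B (p x q), with t = lcm(n, p),
        A |x| B = (A kron I_{t/n}) @ (B kron I_{t/p}); reduces to A @ B when n == p."""
        n, p = A.shape[1], B.shape[0]
        t = lcm(n, p)
        return np.kron(A, np.eye(t // n)) @ np.kron(B, np.eye(t // p))

    # Strategies as canonical vectors: delta_2^1 = cooperate, delta_2^2 = defect.
    d1, d2 = np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])
    # Hypothetical structure matrix mapping a profile to one player's next strategy:
    # x(t+1) = L |x| x1(t) |x| x2(t), with L of size 2 x 4.
    L = np.array([[1.0, 0.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0, 1.0]])
    print(stp(stp(L, d1), d2).ravel())  # [0. 1.]: the next strategy is delta_2^2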

5.

Repeated quantum game theory addresses long-term relations among players who choose quantum strategies. Conventional quantum game theory has widely studied single-round or at most finitely repeated games; much less is known about infinitely repeated quantum games. Investigating infinitely repeated games is crucial, since finitely repeated games do not differ much from single-round games. In this work, we establish the concept of general repeated quantum games and show the Quantum Folk Theorem, which claims that by iterating a game one can find an equilibrium strategy of the game and receive a reward that is not obtained by a Nash equilibrium of the corresponding single-round quantum game. A significant difference between the repeated quantum prisoner’s dilemma and the repeated classical prisoner’s dilemma is that the classical Pareto optimal solution is not always an equilibrium of the repeated quantum game when entanglement is sufficiently strong. When entanglement is sufficiently strong and the reward is small, mutual cooperation cannot be an equilibrium of the repeated quantum game. In addition, we present several concrete equilibrium strategies of the repeated quantum prisoner’s dilemma.

6.
Yang Rui, Yan Jiangpeng, Li Xiu. CAAI Transactions on Intelligent Systems, 2020, 15(5): 888-899
In recent years, reinforcement learning has achieved great success in sequential decision-making domains such as games and robot control, but in many practical problems the reward signal is extremely sparse, making it difficult for an agent to learn an optimal policy from its interaction with the environment; this is known as the sparse reward problem. Research on the sparse reward problem can promote the practical application and deployment of reinforcement learning, and is of great significance to reinforcement learning theory. This paper surveys the state of research on the sparse reward problem and, taking external guidance information as the organizing thread, introduces reward shaping, imitation learning, curriculum learning, hindsight experience replay, curiosity-driven methods, and hierarchical reinforcement learning. We implemented representative algorithms of these six classes of methods in the sparse-reward environment Fetch Reach for experimental validation and comparative analysis. Algorithms that use external guidance information performed better on average than those without it, but the latter depend less on data; both classes of methods are of significant research interest. Finally, the paper summarizes research on sparse reward algorithms and discusses future directions.
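As a concrete example of one surveyed technique, the following is a minimal sketch of potential-based reward shaping for a Fetch Reach-style sparse-reward task; the success threshold and choice of potential are illustrative assumptions, not the paper's settings:

    import numpy as np

    GAMMA = 0.98
    GOAL_EPS = 0.05  # success threshold (assumed value)

    def sparse_reward(achieved, goal):
        """Fetch Reach-style sparse reward: 0 on success, -1 otherwise."""
        return 0.0 if np.linalg.norm(achieved - goal) < GOAL_EPS else -1.0

    def potential(achieved, goal):
        """Potential Phi(s): negative distance to the goal (an assumed choice)."""
        return -np.linalg.norm(achieved - goal)

    def shaped_reward(achieved, next_achieved, goal):
        """Add the potential-based shaping term F = gamma*Phi(s') - Phi(s);
        this densifies the learning signal without changing the optimal policy
        (Ng et al., 1999)."""
        return (sparse_reward(next_achieved, goal)
                + GAMMA * potential(next_achieved, goal)
                - potential(achieved, goal))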

7.
The gaming approach to crowdsourcing is a major way to foster engagement and sustained participation. In such crowdsourcing games, players contribute their effort to tackle problems and receive enjoyment in return. As in any game, a fundamental mechanism in crowdsourcing games is the virtual reward system. This paper investigates how virtual reward systems evoke intrinsic motivation, perceived enjoyment, and output quality in the context of crowdsourcing games. Three mobile applications for crowdsourcing location-based content were developed for an experimental study. The Track version offered a points-based reward system for actions such as contribution of content. The Badge version offered different badges for collection, while the Share version served as a control without any virtual reward system. For each application, participants performed a series of tasks, after which a questionnaire survey was administered. Results showed that Badge and Track enhanced enjoyment emotionally, cognitively, and behaviorally. They also increased perceptions of output quality compared to Share, and better satisfied the motivational needs for autonomy and competence. Interestingly, there were also significant differences in how Badge and Track were perceived.

8.
We consider multistage network games with perfect information. At each time instant, a current network structure connecting the players is specified. It is assumed that each edge has some utility value (player’s benefit from being linked to another player), and players can change the network structure at each stage. We propose a method for finding optimal behavior for players in games of this type.

9.
We present automatic verification techniques for the modelling and analysis of probabilistic systems that incorporate competitive behaviour. These systems are modelled as turn-based stochastic multi-player games, in which the players can either collaborate or compete in order to achieve a particular goal. We define a temporal logic called rPATL for expressing quantitative properties of stochastic multi-player games. This logic allows us to reason about the collective ability of a set of players to achieve a goal relating to the probability of an event’s occurrence or the expected amount of cost/reward accumulated. We give an algorithm for verifying properties expressed in this logic and implement the techniques in a probabilistic model checker, as an extension of the PRISM tool. We demonstrate the applicability and efficiency of our methods by deploying them to analyse and detect potential weaknesses in a variety of large case studies, including algorithms for energy management in Microgrids and collective decision making for autonomous systems.
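To give a flavor of the logic, here are illustrative properties written in the style of rPATL as used in PRISM-games; the player, label, and reward names are invented, and the exact tool syntax (for instance, the reward-operator variants) may differ from this sketch:

    // Can coalition {p1,p2} guarantee reaching "goal" with probability >= 0.95?
    <<p1,p2>> P>=0.95 [ F "goal" ]

    // Maximum probability coalition {p1} can guarantee of reaching "done" in k steps:
    <<p1>> Pmax=? [ F<=k "done" ]

    // Minimum expected cumulated "energy" reward p1 can guarantee before "end":
    <<p1>> R{"energy"}min=? [ F "end" ]

Each property pairs a coalition operator with either a probabilistic or a reward objective, which is exactly the "collective ability" the abstract describes.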

10.
This paper proposes model-free deep inverse reinforcement learning to find nonlinear reward function structures. We formulate inverse reinforcement learning as a problem of density ratio estimation, and show that the log of the ratio between an optimal state transition and a baseline one is given by a part of the reward and the difference of the value functions under the framework of linearly solvable Markov decision processes. The logarithm of the density ratio is efficiently calculated by binomial logistic regression, in which the classifier is constructed from the reward and the state value function. The classifier tries to discriminate between samples drawn from the optimal state transition probability and those from the baseline one. The estimated state value function is then used to initialize part of the deep neural networks for forward reinforcement learning. The proposed deep forward and inverse reinforcement learning is applied to two benchmarks: Atari 2600 games and Reversi. Simulation results show that our method reaches the best performance substantially faster than the standard combination of forward and inverse reinforcement learning, as well as behavior cloning.
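Up to the paper's exact discounting conventions, the density-ratio identity for linearly solvable MDPs that the method builds on can be written as (a sketch, with gamma = 1 in the undiscounted case):

    \ln \frac{\pi(\mathbf{y} \mid \mathbf{x})}{b(\mathbf{y} \mid \mathbf{x})}
      = r(\mathbf{x}) + \gamma V(\mathbf{y}) - V(\mathbf{x})

where pi is the optimal state-transition probability and b the baseline one. This is what makes logistic regression applicable: a classifier whose log-odds are parameterized by r and V, trained to separate transitions sampled from pi and from b, recovers the reward and the state value function simultaneously.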

11.
Microsoft's Xbox and Sony's PlayStation overlay achievement and trophy systems onto their video games. Though these meta-game reward systems are growing in popularity, little research has examined whether players notice, use, or seek out these systems. In this study, game players participated in focus groups to discuss the advantages and disadvantages of meta-game reward systems. Participants described the value of meta-game reward systems in promoting different ways to play games, giving positive feedback about game play, and boosting self-esteem and online and offline social status. Participants discussed completionists, or gamers that want to earn all of the badges associated with the meta-game. Though self-determination theory and its subtheory cognitive evaluation theory suggest that extrinsic rewards might harm players' intrinsic motivation, our findings suggest players may see these systems as intrinsically motivating in this context. The implications of rewards systems for motivation, video game habits, and internet gaming disorder are discussed.

12.
Sampled fictitious play (SFP) is a recently proposed iterative learning mechanism for computing Nash equilibria of non-cooperative games. For games of identical interests, every limit point of the sequence of mixed strategies induced by the empirical frequencies of the best-response actions played in SFP is a Nash equilibrium. Because discrete optimization problems can be viewed as games of identical interests in which Nash equilibria define a type of local optimum, SFP has recently been employed as a heuristic optimization algorithm with promising empirical performance. However, no guarantees of convergence to a globally optimal Nash equilibrium have been established for any of the problem classes considered to date. In this paper, we introduce a variant of SFP and show that it converges almost surely to optimal policies in model-free, finite-horizon stochastic dynamic programs. The key idea is to view the dynamic programming states as players whose common interest is to maximize the total multi-period expected reward starting in a fixed initial state. We also offer empirical results suggesting that our SFP variant is effective in practice for small to moderately sized model-free problems.
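A minimal sketch of the state-as-player idea follows; the details, including the sampling scheme and the user-supplied Monte-Carlo evaluator total_reward, are assumptions for illustration rather than the paper's exact variant:

    import random
    from collections import Counter

    def sfp_for_dp(states, actions, total_reward, iters=1000, seed=0):
        """Treat each DP state as a player with the shared objective of
        maximizing total reward. Each iteration samples one past best response
        per state-player, then each player best-responds to the sampled joint
        policy. total_reward(policy) is a user-supplied Monte-Carlo estimate of
        the multi-period expected reward from the fixed initial state."""
        rng = random.Random(seed)
        history = {s: [rng.choice(actions[s])] for s in states}
        for _ in range(iters):
            sampled = {s: rng.choice(history[s]) for s in states}
            for s in states:
                best = max(actions[s],
                           key=lambda a: total_reward({**sampled, s: a}))
                history[s].append(best)
        # report the most frequently played best response per state
        return {s: Counter(history[s]).most_common(1)[0][0] for s in states}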

13.
Psychological experiments reveal that human interaction behaviors are often not what game theory predicts. One important reason is that relevant constraints are not taken into consideration when players choose their best strategies. In real life, however, games are often played in contexts where players are constrained by their capabilities, law, culture, custom, and so on. For example, if someone wants to drive a car, he/she has to have a driving license. Therefore, when a human player of a game chooses a strategy, he/she should consider not only the material payoff or monetary reward from taking the best strategy given others' best responses, but also how feasible it is to take that strategy in the context where the game is played. To solve such games, this paper establishes a model of fuzzily constrained games and introduces a solution concept of constrained equilibrium for games of this kind. Our model is consistent with psychological experiment results for ultimatum games. We also discuss what happens if Prisoner's Dilemma and Stag Hunt are played under fuzzy constraints. In general, after taking constraints into account, our model reflects well the human behaviors of fairness, altruism, self-interest, and so on, and thus can predict the outcomes of some games more accurately than conventional game theory.

14.
In this paper we first derive a necessary and sufficient condition for a stationary strategy to be a Nash equilibrium of a discounted constrained stochastic game under certain assumptions. In this process we also develop a nonlinear (non-convex) optimization problem for the discounted constrained stochastic game. We use the linear best-response functions of every player and the complementary slackness theorem for linear programs to derive both the optimization problem and the equivalent condition. We then extend this result to average-reward constrained stochastic games. Finally, we present a heuristic algorithm motivated by our necessary and sufficient conditions for a discounted-cost constrained stochastic game, and numerically observe its convergence to a Nash equilibrium.

15.
In this paper adaptive dynamic programming (ADP) is applied to learn to play Gomoku. A critic network is used to evaluate board situations. The basic idea is to penalize the last move taken by the loser and reward the last move selected by the winner at the end of a game. The results show that the presented program is able to improve its performance by playing against itself and has approached the candidate level of a commercial Gomoku program called 5-star Gomoku. We also examine the influence of two methods for generating games, self-teaching and learning by watching two experts play against each other, and present the comparative results and the reasons behind them.
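The end-of-game credit assignment can be sketched as follows, with a toy linear critic standing in for the paper's critic network (the feature interface and learning rate are illustrative assumptions):

    import numpy as np

    class LinearCritic:
        """Toy linear critic over board features, a stand-in for the paper's
        neural-network critic; all details here are illustrative assumptions."""
        def __init__(self, n_features, lr=0.05):
            self.w = np.zeros(n_features)
            self.lr = lr
        def evaluate(self, features):
            return float(self.w @ features)
        def update(self, features, target):
            # one gradient step of squared error toward the target value
            self.w += self.lr * (target - self.evaluate(features)) * features

    def end_of_game_update(critic, winner_last_features, loser_last_features):
        """The basic idea from the abstract: reward the winner's last move (+1)
        and penalize the loser's last move (-1)."""
        critic.update(winner_last_features, +1.0)
        critic.update(loser_last_features, -1.0)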

16.
We study assignment games in which jobs select machines, and in which certain pairs of jobs may conflict, which is to say they may incur an additional cost when they are both assigned to the same machine, beyond that associated with the increase in load. Questions regarding such interactions apply beyond allocating jobs to machines: when people in a social network choose to align themselves with a group or party, they typically do so based upon not only the inherent quality of that group, but also who amongst their friends (or enemies) chooses that group as well. We show how semi-smoothness, a recently introduced generalization of smoothness, is necessary to find tight bounds on the robust price of anarchy, and thus on the quality of correlated and Nash equilibria, for several natural job-assignment games with interacting jobs. For most cases, our bounds on the robust price of anarchy are either exactly 2 or approach 2. We also prove new convergence results implied by semi-smoothness for our games. Finally we consider coalitional deviations, and prove results about the existence and quality of strong equilibrium.

17.
Learning automata (LA) have recently been shown to be valuable tools for designing multi-agent reinforcement learning algorithms and are able to control stochastic games. In this paper, the concepts of stigmergy and entropy are imported into learning-automata-based multi-agent systems with the purpose of providing a simple framework for interaction and coordination in multi-agent systems and speeding up the learning process. The multi-agent system considered in this paper is designed to find optimal policies in Markov games. We consider several dummy agents that walk around the states of the environment, activate the local learning automata, and carry information so that the learning automata involved can update their local state. The entropy of the probability vector of the next state's learning automata is used to determine reward or penalty for the actions of the learning automata. Experimental results show that, in terms of the speed of reaching the optimal policy, the proposed algorithm has better learning performance than other learning algorithms.
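A sketch of how such an entropy-gated automaton update might look, using the standard linear reward-penalty (L_R-P) scheme; the paper's exact update rule, entropy threshold, and parameters are not specified here, so the values below are assumptions:

    import numpy as np

    def entropy(p):
        """Shannon entropy of an action-probability vector."""
        p = np.clip(p, 1e-12, 1.0)
        return float(-(p * np.log(p)).sum())

    def la_update(p, a, rewarded, lam1=0.1, lam2=0.05):
        """Linear reward-penalty (L_R-P) update of the probability vector p
        after playing action a."""
        e = np.zeros_like(p); e[a] = 1.0
        if rewarded:
            return p + lam1 * (e - p)                     # reinforce action a
        return p + lam2 * ((1.0 - e) / (len(p) - 1) - p)  # penalize action a

    # Per the abstract, reward vs. penalty is decided from the entropy of the
    # next state's automaton: low entropy means that automaton is confident, so
    # the move is treated as rewarded (the threshold is an assumed parameter).
    p_here = np.ones(4) / 4
    p_next = np.array([0.85, 0.05, 0.05, 0.05])
    p_here = la_update(p_here, a=2, rewarded=entropy(p_next) < 0.7)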

18.
Current research on autonomous driving focuses mainly on single-vehicle perception, decision making, and control, with little work on the interaction and gaming among multiple vehicles; as a result, it cannot effectively reduce the overall accident rate of the traffic system or improve its throughput. This paper proposes a method for the emergence of large-scale autonomous driving strategies based on cooperative game theory. A cooperative-game evolution platform oriented toward connected vehicles and multi-objective decision optimization is built, and a grid road model and a vehicle kinematics model are constructed, so that vehicles in the system interact through games with their neighbors. The system uses a distributed algorithm with indirect interaction, so the computational complexity of the model scales linearly with the number of simulated vehicles. Experimental results show that after the optimal strategy emerges, both the accident rate and the average speed improve markedly: the accident rate is reduced by 90%, and the model's computation speed increases by 30%. The method can be applied to city-scale intelligent transportation planning systems containing millions of autonomous vehicles.

19.
In this paper, we apply evolutionary games to non-cooperative forwarding control in Delay Tolerant Networks (DTNs). The main focus is on mechanisms that govern the participation of relays in the delivery of messages in DTNs. We express the success probability as a function of the competition that takes place within a large population of mobiles, and we characterize the effect of reward-based mechanisms on the performance of such systems. Devices acting as active relays sacrifice part of their batteries in order to support message replication and thus increase the probability of reaching the destination. In our scheme, a relay can choose the strategy by which it participates in message relaying. A mobile that participates receives a unit of reward based on the reward mechanism selected by the network. A utility function is introduced as the difference between the expected reward and the energy cost, i.e., the cost spent by the relay to sustain forwarding operations. We show how the evolution dynamics and the equilibrium behavior (called Evolutionary Stable Strategy, or ESS) are influenced by the characteristics of inter-contact time, energy expenditure, and pricing. We extend our analysis to mechanisms that the system can introduce in order to have the message delivered to the destination with high probability within a given deadline and under energy constraints that bound the number of released copies per message. Finally, we apply our findings to devise decentralized forwarding algorithms rooted in the theory of stochastic approximation, and demonstrate that the ESS can be attained without complete knowledge of the system state, letting the source monitor only the number of released copies per message. We provide extensive numerical results to validate the proposed scheme.
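In replicator-dynamics form, and with assumed symbols (x the fraction of mobiles that relay, r the unit reward, c the energy cost, and p_succ(x) the delivery-success probability), the evolution the abstract describes can be sketched as

    \dot{x} = x\,(1-x)\,\bigl[U_A(x) - U_N(x)\bigr], \qquad
    U_A(x) = r\,p_{\mathrm{succ}}(x) - c, \qquad U_N(x) = 0,

so an interior ESS x^* satisfies r\,p_{\mathrm{succ}}(x^*) = c. The paper's actual reward mechanisms and utility details may differ from this simplification; the point is that the reward mechanism shifts where the participation and non-participation utilities equalize.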

20.
In this paper we develop a reinforcement fuzzy learning scheme for robots playing a differential game. Differential games are played in continuous time, with continuous states and actions. Fuzzy controllers are used to approximate the calculation of future reinforcements of the game due to actions taken at a specific time. If an immediate reinforcement reward function is defined, we may use a fuzzy system to predict the reinforcement a specified time ahead. This predicted reinforcement is then used to adapt a fuzzy controller that stores the experience accumulated by the player. Simulations of a modified two-car game are provided to show the potential of the technique, and experiments are performed to validate the method. Finally, although the game used as an example involves only two players, the technique may also be used in a multi-game environment.
