首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 68 毫秒
1.
In this paper we introduce a new multi-agent reinforcement learning algorithm, called exploring selfish reinforcement learning (ESRL). ESRL allows agents to reach optimal solutions in repeated non-zero sum games with stochastic rewards, by using coordinated exploration. First, two ESRL algorithms for respectively common interest and conflicting interest games are presented. Both ESRL algorithms are based on the same idea, i.e. an agent explores by temporarily excluding some of the local actions from its private action space, to give the team of agents the opportunity to look for better solutions in a reduced joint action space. In a latter stage these two algorithms are transformed into one generic algorithm which does not assume that the type of the game is known in advance. ESRL is able to find the Pareto optimal solution in common interest games without communication. In conflicting interest games ESRL only needs limited communication to learn a fair periodical policy, resulting in a good overall policy. Important to know is that ESRL agents are independent in the sense that they only use their own action choices and rewards to base their decisions on, that ESRL agents are flexible in learning different solution concepts and they can handle both stochastic, possible delayed rewards and asynchronous action selection. A real-life experiment, i.e. adaptive load-balancing of parallel applications is added.  相似文献   

2.
In this paper, we propose a new method of computing an approximate Nash equilibrium with additional features. Existing algorithms often fail to produce an exact solution for games involving more than 3 players. Similarly, existing algorithms do not permit additional constraints on the problem. The principle idea of this paper involves proposing a methodology for computing approximate solutions through evolutionary computation. To do so, we first provide formal definitions of these problems and their approximate versions. Following which, we present the details of our solution. One of the most important advantages of the proposed solution is flexibility, which provides solutions to problems related to Nash equilibrium extensions. The proposed idea is tested on several types of games that vary with difficulty and size. All test sets are generated based on the well-known Gamut program. Additional comparisons with classical algorithms are also performed. Results indicate that Differential Evolution is capable of obtaining satisfactory solutions to large random and covariant games. The results also demonstrate that there is a high probability that even large games, in which a set of strategies with a non-zero probability of being chosen are very small, have a solution. The computation time depends mainly on the problem size, and the original Nash equilibrium problem is unaffected by additional modifications.  相似文献   

3.
研究了一类带Poisson跳扩散过程的线性二次随机微分博弈,包括非零和博弈的Nash均衡策略与零和博弈的鞍点均衡策略问题.利用微分博弈的最大值原理,得到Nash均衡策略的存在条件等价于两个交叉耦合的矩阵Riccati方程存在解,鞍点均衡策略的存在条件等价于一个矩阵Riccati方程存在解的结论,并给出了均衡策略的显式表达及最优性能泛函值.最后,将所得结果应用于现代鲁棒控制中的随机H2/H控制与随机H控制问题,得到了鲁棒控制策略的存在条件及显式表达,并验证所得结果在金融市场投资组合优化问题中的应用.  相似文献   

4.
In this paper, we study a new type of differential game problems of backward stochastic differential delay equations under partial information. A class of time‐advanced stochastic differential equations (ASDEs) is introduced as the adjoint process via duality relation. By means of ASDEs, we suggest the necessary and sufficient conditions called maximum principle for an equilibrium point of non‐zero sum games. As an application, an economic problem is putted into our framework to illustrate the theoretical results. In terms of the maximum principle and some auxiliary filtering results, an equilibrium point is obtained.  相似文献   

5.
Recent progress in Stackelberg dynamic games concentrates on either the deterministic situations or partially nested stochastic problems. In this paper, a class of "nonnested' stochastic Stackelberg dynamic games, namely LOG additive incentive problems, is solved. By explicitly introducing two essential ideas-"matching answers" and "GPD phenomenon," each incentive problem can be converted into a set of decoupled inverse team problems (ITP's). The solution of each ITP can then be found by solving a set of linear algebraic equations if it exists. Moreover, it is shown that, for a wide class of problems, if there exist more agents, then it is more advantageous to the leader in the sense that he has more free parameters to manipulate. Therefore, the leader can always induce the agents' cooperation as a team by means of incentives if there are enough agents. Application to a class of stochastic closed-loop dynamic Stackelberg games is also given. Since typically there exist a large number of free parameters in our solution, adequate parameters can then be chosen to make the incentive scheme satisfy additional useful properties such as balanced budget and noise robustness. These topics are treated in the Appendices.  相似文献   

6.
Many models from a variety of areas involve the computation of an equilibrium or fixed point of some kind. Examples include Nash equilibria in games; market equilibria; computing optimal strategies and the values of competitive games (stochastic and other games); stable configurations of neural networks; analysing basic stochastic models for evolution like branching processes and for language like stochastic context-free grammars; and models that incorporate the basic primitives of probability and recursion like recursive Markov chains. It is not known whether these problems can be solved in polynomial time. There are certain common computational principles underlying different types of equilibria, which are captured by the complexity classes PLS, PPAD, and FIXP. Representative complete problems for these classes are, respectively, pure Nash equilibria in games where they are guaranteed to exist, (mixed) Nash equilibria in two-player normal form games, and (mixed) Nash equilibria in normal form games with three (or more) players. This paper reviews the underlying computational principles and the corresponding classes.  相似文献   

7.
Sampled fictitious play (SFP) is a recently proposed iterative learning mechanism for computing Nash equilibria of non-cooperative games. For games of identical interests, every limit point of the sequence of mixed strategies induced by the empirical frequencies of best response actions that players in SFP play is a Nash equilibrium. Because discrete optimization problems can be viewed as games of identical interests wherein Nash equilibria define a type of local optimum, SFP has recently been employed as a heuristic optimization algorithm with promising empirical performance. However, there have been no guarantees of convergence to a globally optimal Nash equilibrium established for any of the problem classes considered to date. In this paper, we introduce a variant of SFP and show that it converges almost surely to optimal policies in model-free, finite-horizon stochastic dynamic programs. The key idea is to view the dynamic programming states as players, whose common interest is to maximize the total multi-period expected reward starting in a fixed initial state. We also offer empirical results suggesting that our SFP variant is effective in practice for small to moderate sized model-free problems.  相似文献   

8.
针对一类连续时间线性Markov跳变系统,本文提出了一种新的策略迭代算法用于求解系统的非零和微分反馈Nash控制问题.通过求解耦合的数值迭代解,以获得具有线性动力学特性和无限时域二次成本的双层非零和微分策略的Nash均衡解.在每一个策略层,采用策略迭代算法来计算与每一组给定的反馈控制策略相关联的最小无限时域值函数.然后,通过子系统分解将Markov跳变系统分解为N个并行的子系统,并将该算法应用于跳变系统.本文提出的策略迭代算法可以很容易求解非零和微分策略所对应的耦合代数Riccati方程,且对高维系统有效.最后通过仿真示例证明了本文设计方法的有效性和可行性.  相似文献   

9.
This paper develops an equilibrium theory for two-person two-criteria stochastic decision problems with static information patterns, wherein the decision makers (DM's) have different probabilistic models of the underlying process, the objective functionals are quadratic, and the decision spaces are general inner-product spaces. Under two different modes of decision making (viz. symmetric and asymmetric), sufficient conditions are obtained for the existence and uniqueness of equilibrium solutions (stable in the former case), and in each case a uniformly convergent iterative scheme is developed whereby the equilibrium policies of the DM's can be obtained by evaluating a number of conditional expectations. When the probability measures are Gaussian, the equilibrium solution is linear under the symmetric mode of decision making, whereas it is generically nonlinear in the asymmetric case, with the linear structure prevailing only in some special cases which are delineated in the paper.  相似文献   

10.
B. Tołwiński 《Automatica》1982,18(4):431-441
The paper proposes an equilibrium solution concept for dynamic games where players can communicate with one another, but cannot make contractual agreements. In such games, unlike the static problems without contracting possibilities, the cooperation between players is possible due to the fact that the realization of negotiated agreements can be enforced by suitably-defined strategies. The definition presented combines dynamic programming, the theory of bargaining and the notion of enforceable agreements to produce a class of cooperative solutions defined in the form of memory Nash equilibria satisfying the principle of optimality along the equilibrium trajectory. The choice of a particular solution in this class depends on players' expected actions in case of disagreement, and on an adopted negotiation scheme formalized in the form of a bargaining model. Possible formulations of disagreement policies and bargaining models are discussed in some detail.  相似文献   

11.
In this paper we introduce a model of multigenerational stochastic games of capital accumulation where each generation consists of m different players. The main objective is to prove the existence of a perfect stationary equilibrium in an infinite horizon game. A suitable change in the terminology used in this paper provides (in the case of perfect altruism between generations) a new Nash equilibrium theorem for standard stochastic games with uncountable state space.  相似文献   

12.
Most of the literature on oligopoly deals with profit-maximizing firms engaging in “static” repetitive games. As the number of firms increases, the Nash-equilibrium strategy for each Cournot oligopolist converges to the competitive solution. In a two-person, zero-sum differential game model of duopoly [1] we introduced dynamic elements and explored alternative entrepreneurial goals. The duopolists endeavor to outsell each other subject to a no-loss constraint; the saturation of present markets by past sales and the impact on future goodwill by current advertisement are handled through “state variables.” The differential game formulation [1, 2] offers two advantages: (a) near perfect information leads to frequent existence of pure strategy equilibria and (b) the use of optimal control theory facilitates the characterization of the time structure of an equilibrium. However, the two-person, zero-sum framework is too restrictive while a general theory for solving n-person, non-zero sum differential games has still not been developed [3, 4].  相似文献   

13.
Imitating successful behavior is a natural and frequently applied approach when facing complex decision problems. In this paper, we design protocols for distributed latency minimization in atomic congestion games based on imitation. We propose to study concurrent dynamics that emerge when each agent samples another agent and possibly imitates this agent’s strategy if the anticipated latency gain is sufficiently large. Our focus is on convergence properties. We show convergence in a monotonic fashion to stable states, in which none of the agents can improve their latency by imitating others. As our main result, we show rapid convergence to approximate equilibria, in which only a small fraction of agents sustains a latency significantly above or below average. Imitation dynamics behave like an FPTAS, and the convergence time depends only logarithmically on the number of agents. Imitation processes cannot discover unused strategies, and strategies may become extinct with non-zero probability. For singleton games we show that the probability of this event occurring is negligible. Additionally, we prove that the social cost of a stable state reached by our dynamics is not much worse than an optimal state in singleton games with linear latency functions. We concentrate on the case of symmetric network congestion games, but our results do not use the network structure and continue to hold accordingly for general symmetric games. They even apply to asymmetric games when agents sample within the set of agents with the same strategy space. Finally, we discuss how the protocol can be extended such that, in the long run, dynamics converge to a pure Nash equilibrium.  相似文献   

14.
This paper is concerned with a new kind of non-zero sum differential game of backward stochastic differential equations (BSDEs). It is required that the control is adapted to a sub-filtration of the filtration generated by the underlying Brownian motion. We establish a necessary condition in the form of maximum principle with Pontryagin’s type for open-loop Nash equilibrium point of this type of partial information game, and then give a verification theorem which is a sufficient condition for Nash equilibrium point. The theoretical results are applied to study a partial information linear-quadratic (LQ) game and a partial information financial problem.  相似文献   

15.
Cooperative Control and Potential Games   总被引:1,自引:0,他引:1  
We present a view of cooperative control using the language of learning in games. We review the game-theoretic concepts of potential and weakly acyclic games, and demonstrate how several cooperative control problems, such as consensus and dynamic sensor coverage, can be formulated in these settings. Motivated by this connection, we build upon game-theoretic concepts to better accommodate a broader class of cooperative control problems. In particular, we extend existing learning algorithms to accommodate restricted action sets caused by the limitations of agent capabilities and group based decision making. Furthermore, we also introduce a new class of games called sometimes weakly acyclic games for time-varying objective functions and action sets, and provide distributed algorithms for convergence to an equilibrium.  相似文献   

16.
The author formulates and solves a dynamic stochastic optimization problem of a nonstandard type, whose optimal solution features active learning. The proof of optimality and the derivation of the corresponding control policies is an indirect one, which relates the original single-person optimization problem to a sequence of nested zero-sum stochastic games. Existence of saddle points for these games implies the existence of optimal policies for the original control problem, which, in turn, can be obtained from the solution of a nonlinear deterministic, optimal control problem. The author also studies the problem of existence of stationary optimal policies when the time horizon is infinite and the objective function is discounted  相似文献   

17.
本文考虑系数未知的离散时间线性随机系统多人非合作的自适应博弈问题,每个参与者运用最小二乘算法和"必然等价原则"来设计博弈策略组合,目的是自适应优化自身的一步超前收益函数.本文证明此自适应策略组合使得闭环系统全局稳定,并且在一定意义下是该博弈问题的渐近纳什均衡解.  相似文献   

18.
《Automatica》2014,50(12):3038-3053
This paper introduces a new class of multi-agent discrete-time dynamic games, known in the literature as dynamic graphical games. For that reason a local performance index is defined for each agent that depends only on the local information available to each agent. Nash equilibrium policies and best-response policies are given in terms of the solutions to the discrete-time coupled Hamilton–Jacobi equations. Since in these games the interactions between the agents are prescribed by a communication graph structure we have to introduce a new notion of Nash equilibrium. It is proved that this notion holds if all agents are in Nash equilibrium and the graph is strongly connected. A novel reinforcement learning value iteration algorithm is given to solve the dynamic graphical games in an online manner along with its proof of convergence. The policies of the agents form a Nash equilibrium when all the agents in the neighborhood update their policies, and a best response outcome when the agents in the neighborhood are kept constant. The paper brings together discrete Hamiltonian mechanics, distributed multi-agent control, optimal control theory, and game theory to formulate and solve these multi-agent dynamic graphical games. A simulation example shows the effectiveness of the proposed approach in a leader-synchronization case along with optimality guarantees.  相似文献   

19.
We present new algorithms for determining optimal strategies for two-player games with proba- bilistic moves and reachability winning conditions. Such games, known as simple stochastic games, were extensively studied by A.Condon [Anne Condon. The complexity of stochastic games. Information and Computation, 96(2):203–224, 1992, Anne Condon. On algorithms for simple stochastic games. In Jin-Yi Cai, editor, Advances in Computational Complexity Theory, volume 13 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 51–73. AMS, 1993]. Many interesting problems, including parity games and hence also mu-calculus model checking, can be reduced to simple stochastic games. It is an open problem, whether simple stochastic games can be solved in polynomial time. Our algorithms determine the optimal expected payoffs in the game. We use geometric interpre- tation of the search space as a subset of the hyper-cube [0,1]N. The main idea is to divide this set into convex subregions in which linear optimization methods can be used. We show how one can proceed from one subregion to the other so that, eventually, a region containing the optinal payoffs will be found. The total number of subregions is exponential in the size of the game but, in practice, the algorithms need to visit only few of them to find a solution. We believe that our new algorithms could provide new insights into the difficult problem of deter- mining algorithmic complexity of simple stochastic games and other, equivallent problems.  相似文献   

20.
Stochastic game theoretic framework has been used in many fields of networks with interactive behaviors. However, further use of this framework is limited due to the following reasons. Firstly, it is difficult to build comprehensive and rigorous models for complex network structures by the state-based game model. Secondly, solving and extending the dynamic behaviors of participators of the network are nearly impossible, because of the complexity of state transitions. Last but not least, general game model is not able to describe and analyze specific events and behaviors in some kinds of networks, like enterprise networks. In this paper, we propose a new modeling paradigm (stochastic game net, or SGN) for stochastic games representation with Petri nets. Based on our graphical tool, stochastic game problems can be described clearly, and the model can be solved and extended easily. Moreover, this paper puts forth a series of methods for modeling and analyzing the competitive game by SGN, which is the main contribution of this work. Our achievements are applied to the security analysis for enterprise networks. The analysis results prove the powerful ability of our achievements in solving the complicated and dynamic game problems. Furthermore, our approaches can be used to calculate the existence and the value of an equilibrium point.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号