首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
We consider concurrent two-player games with reachability objectives. In such games, at each round, player 1 and player 2 independently and simultaneously choose moves, and the two choices determine the next state of the game. The objective of player 1 is to reach a set of target states; the objective of player 2 is to prevent this. These are zero-sum games, and the reachability objective is one of the most basic objectives: determining the set of states from which player 1 can win the game is a fundamental problem in control theory and system verification. There are three types of winning states, according to the degree of certainty with which player 1 can reach the target. From type-1 states, player 1 has a deterministic strategy to always reach the target. From type-2 states, player 1 has a randomized strategy to reach the target with probability 1. From type-3 states, player 1 has for every real ε>0ε>0 a randomized strategy to reach the target with probability greater than 1−ε1ε. We show that for finite state spaces, all three sets of winning states can be computed in polynomial time: type-1 states in linear time, and type-2 and type-3 states in quadratic time. The algorithms to compute the three sets of winning states also enable the construction of the winning and spoiling strategies.  相似文献   

2.
3.
We establish the existence of optimal scheduling strategies for time-bounded reachability in continuous-time Markov decision processes, and of co-optimal strategies for continuous-time Markov games. Furthermore, we show that optimal control does not only exist, but has a surprisingly simple structure: the optimal schedulers from our proofs are deterministic and timed positional, and the bounded time can be divided into a finite number of intervals, in which the optimal strategies are positional. That is, we demonstrate the existence of finite optimal control. Finally, we show that these pleasant properties of Markov decision processes extend to the more general class of continuous-time Markov games, and that both early and late schedulers show this behaviour.  相似文献   

4.
A stochastic graph game is played by two players on a game graph with probabilistic transitions. We consider stochastic graph games with ω-regular winning conditions specified as parity objectives, and mean-payoff (or limit-average) objectives. These games lie in NP ∩ coNP. We present a polynomial-time Turing reduction of stochastic parity games to stochastic mean-payoff games.  相似文献   

5.
We consider the complexity of stochastic games—simple games of chance played by two players. We show that the problem of deciding which player has the greatest chance of winning the game is in the class NP co-NP.  相似文献   

6.
In this paper, the Cournot competition is modeled as a stochastic dynamic game. In the proposed model, a stochastic market price function and stochastic dynamic decision functions of the rivals are considered. Since the optimal decision of a player needs the estimation of the unknown parameters of the market and rivals’ decisions, a combined estimation-optimization algorithm for decision making is proposed. The history of the rivals’ output quantities (supplies) and the market clearing price (MCP) are the only available information to the players. The convergence of the algorithm (for both estimation and decision making processes) is discussed. In addition, the stability conditions of the equilibrium points are analyzed using the converse Lyapunov theorem. Through the case studies, which are performed based on the California Independent System Operator (CA-ISO) historical public data, the theoretical results and the applicability of the proposed method are verified. Moreover, a comparative study among the agents using the proposed method, naïve expectation and adaptive expectation in the market is performed to show the effectiveness and applicability of the proposed method.  相似文献   

7.
In this work, probabilistic reachability over a finite horizon is investigated for a class of discrete time stochastic hybrid systems with control inputs. A suitable embedding of the reachability problem in a stochastic control framework reveals that it is amenable to two complementary interpretations, leading to dual algorithms for reachability computations. In particular, the set of initial conditions providing a certain probabilistic guarantee that the system will keep evolving within a desired ‘safe’ region of the state space is characterized in terms of a value function, and ‘maximally safe’ Markov policies are determined via dynamic programming. These results are of interest not only for safety analysis and design, but also for solving those regulation and stabilization problems that can be reinterpreted as safety problems. The temperature regulation problem presented in the paper as a case study is one such case.  相似文献   

8.
We consider two-player zero-sum stochastic games on graphs with ω-regular winning conditions specified as parity objectives. These games have applications in the design and control of reactive systems. We survey the complexity results for the problem of deciding the winner in such games, and in classes of interest obtained as special cases, based on the information and the power of randomization available to the players, on the class of objectives and on the winning mode. On the basis of information, these games can be classified as follows: (a) partial-observation (both players have partial view of the game); (b) one-sided partial-observation (one player has partial-observation and the other player has complete-observation); and (c) complete-observation (both players have complete view of the game). The one-sided partial-observation games have two important subclasses: the one-player games, known as partial-observation Markov decision processes (POMDPs), and the blind one-player games, known as probabilistic automata. On the basis of randomization, (a) the players may not be allowed to use randomization (pure strategies), or (b) they may choose a probability distribution over actions but the actual random choice is external and not visible to the player (actions invisible), or (c) they may use full randomization. Finally, various classes of games are obtained by restricting the parity objective to a reachability, safety, Büchi, or coBüchi condition. We also consider several winning modes, such as sure-winning (i.e., all outcomes of a strategy have to satisfy the winning condition), almost-sure winning (i.e., winning with probability 1), limit-sure winning (i.e., winning with probability arbitrarily close to 1), and value-threshold winning (i.e., winning with probability at least ν, where ν is a given rational).  相似文献   

9.
V.R. Saksena  J.B. Cruz 《Automatica》1982,18(3):295-305
A decentralized filtering and control scheme is presented for obtaining low-order Nash equilibrium strategies for decision makers using different models of the same large-scale system. Multiparameter singular perturbations are employed to capture the multimodel nature of the fast dynamic subsystems interconnected through slow dynamic variables. The small parameters are appropriately scaled so that the variables in both time scales are well defined. The decision makers have decentralized information structures and are constrained to use only finite dimensional compensators of a specified dimension. It is shown that the proposed scheme is in fact the asymptotic limit of the exact solution as the small parameters go to zero.  相似文献   

10.
We consider two classes of scalar stochastic differential games with hard constraints on controls. The solutions are found to be bang-bang, by extending a technique developed earlier for stochastic optimal control problems.  相似文献   

11.
We consider duality relations between risk-sensitive stochastic control problems and dynamic games. They are derived from two basic duality results, the first involving free energy and relative entropy and resulting from a Legendre-type transformation, the second involving power functions. Our approach allows us to treat, in essentially the same way, continuous- and discrete-time problems, with complete and partial state observation, and leads to a very natural formal justification of the structure of the cost functional of the dual. It also allows us to obtain the solution of a stochastic game problem by solving a risk-sensitive control problem.  相似文献   

12.
Consider the problem of value iteration for solving Markov stochastic games. One simply iterates backward, via a Jacobi-like procedure. The convergence of the Gauss-Seidel form of this procedure is shown for both the discounted and ergodic cost problems, under appropriate conditions, with extensions to problems where one stops when a boundary is hit or if any one of the players chooses to stop, with associated costs. Generally, the Gauss-Seidel procedure accelerates convergence.  相似文献   

13.
Non-cooperative decision-making problems in a decentralized supply chain can be characterized and studied using a stochastic game model. In an earlier paper, the authors developed a methodology that uses machine learning for finding (near) optimal policies for non-zero sum stochastic games, and applied their methodology on an N-retailer and W-warehouse inventory-planning problem. The focus of this paper is on making the methodology more amenable to practical applications by making it completely simulation-based. It is also demonstrated, through numerical example problems, how this methodology can be used to find (near) equilibrium policies, and evaluate short-term rewards of stochastic games. Short-term rewards of stochastic games could be, in many instances, more critical than equilibrium rewards. To our knowledge, no methodology exists in the open literature that can capture the short-term behaviour of non-zero sum stochastic games as examined in this paper.  相似文献   

14.
The solution for a class of stochastic pursuit-evasion differential games between two linear dynamic systems is given. This class includes the classical interception game in Euclidean space. The performance index which is optimized is quadratic, and one of the two players has imperfect (noisy) knowledge of the states of the two systems. The "certainty-equivalence principle' or, equivalently, the technique of separating the estimator and the controller which characterizes the standard stochastic control problem is shown to be applicable to this class of differential games.  相似文献   

15.
In this paper we introduce a new multi-agent reinforcement learning algorithm, called exploring selfish reinforcement learning (ESRL). ESRL allows agents to reach optimal solutions in repeated non-zero sum games with stochastic rewards, by using coordinated exploration. First, two ESRL algorithms for respectively common interest and conflicting interest games are presented. Both ESRL algorithms are based on the same idea, i.e. an agent explores by temporarily excluding some of the local actions from its private action space, to give the team of agents the opportunity to look for better solutions in a reduced joint action space. In a latter stage these two algorithms are transformed into one generic algorithm which does not assume that the type of the game is known in advance. ESRL is able to find the Pareto optimal solution in common interest games without communication. In conflicting interest games ESRL only needs limited communication to learn a fair periodical policy, resulting in a good overall policy. Important to know is that ESRL agents are independent in the sense that they only use their own action choices and rewards to base their decisions on, that ESRL agents are flexible in learning different solution concepts and they can handle both stochastic, possible delayed rewards and asynchronous action selection. A real-life experiment, i.e. adaptive load-balancing of parallel applications is added.  相似文献   

16.
Asynchronous systems consist of a set of transitions which are non-deterministically chosen and executed. We present a theory of guiding symbolic reachability in such systems by scheduling clusters of transitions. A theory of reachability expressions which specify the schedules is presented. This theory allows proving equivalence of different schedules which may have radically different performance in BDD-based search. We present experimental evidence to show that optimized reachability expressions give rise to significant performance advantages. The profiling is carried out in the NuSMV framework using examples from discrete timed automata and circuits with delays. A variant tool called NuSMVDP has been developed for interpreting reachability expressions to carry out the experiments.  相似文献   

17.
In this paper, we discuss infinite-horizon soft-constrained stochastic Nash games involving state-dependent noise in weakly coupled large-scale systems. First, we formulate linear quadratic differential games in which robustness is attained against model uncertainty. It is noteworthy that this is the first time conditions for the existence of robust equilibria have been derived based on the solutions of sets of cross-coupled stochastic algebraic Riccati equations (CSAREs). After establishing an asymptotic structure with positive definiteness for CSAREs solutions, we derive a recursive algorithm by means of Newton’s method so that it can be used to obtain solutions for CSAREs. As another important feature, we propose a high-order approximate Nash strategy based on iterative solutions. Finally, we provide a numerical example to verify the efficiency of the proposed algorithms.  相似文献   

18.
Formal solutions for a class of stochastic pursuit-evasion games   总被引:1,自引:0,他引:1  
A class of differential pursuit-evasion games is examined in which the dynamics are linear and perturbed by additive white Gaussian noise, the performance index is quadratic, and both players receive measurements perturbed independently by additive white Gaussian noise. Linear minimax solutions are characterized in terms of a set of implicit integro-differential equations. A game of this type also possesses a "certainty-coincidence" property, meaning that its minimax behavior coincides with that of the corresponding deterministic game in the event that all noise values are zero. This property is used to decompose the minimax strategies into sums of a certainty-equivalent term and error terms.  相似文献   

19.
We describe a framework for analyzing probabilistic reachability and safety problems for discrete time stochastic hybrid systems within a dynamic games setting. In particular, we consider finite horizon zero-sum stochastic games in which a control has the objective of reaching a target set while avoiding an unsafe set in the hybrid state space, and a rational adversary has the opposing objective. We derive an algorithm for computing the maximal probability of achieving the control objective, subject to the worst-case adversary behavior. From this algorithm, sufficient conditions of optimality are also derived for the synthesis of optimal control policies and worst-case disturbance strategies. These results are then specialized to the safety problem, in which the control objective is to remain within a safe set. We illustrate our modeling framework and computational approach using both a tutorial example with jump Markov dynamics and a practical application in the domain of air traffic management.  相似文献   

20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号