Similar literature
 Found 20 similar documents (search time: 125 ms)
1.
This paper considers a market where two large companies provide services to the population through “cloud” virtual operators, which buy the companies’ services and resell them to clients. Each large company sets a price for selling its services to virtual operators. The number of each company’s clients and its resource (a characteristic of the company’s attractiveness for clients) are also known. The game process is a repetition of two-step games in which virtual operators choose companies and prices for their services. Each virtual operator needs to choose a company whose services he is going to sell and to define a price for the services sold to clients. Each virtual operator sets the probability of choosing a company and the price for services, taking into account that the share of a company’s clients choosing a given operator is defined by the Hotelling specification. At each step, each virtual operator seeks to maximize his payoff. We find the optimal strategies of the virtual operators and also explore the following question: does the system reach a stationary state in this repeated two-step game, or does a repeating cycle of states form instead?

2.
We study the effects of agent movement on equilibrium selection in network-based spatial coordination games with Pareto dominant and risk dominant Nash equilibria. Our primary interest is in understanding how endogenous partner selection on networks influences equilibrium selection in games with multiple equilibria. We use agent-based models and best response behaviors of agents to study these questions. In general, we find that allowing agents to move and choose new game play partners greatly increases the probability of attaining the Pareto dominant Nash equilibrium in coordination games. We also find that agent diversity increases the ability of agents to attain larger payoffs on average.
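The difference between a Pareto dominant and a risk dominant equilibrium can be made concrete with a small symmetric 2x2 coordination game. The sketch below is only an illustration: the payoff numbers (a stag-hunt style game) and the function names are assumptions, not taken from the paper, and the risk dominance test is the standard comparison of deviation losses for symmetric 2x2 games.

```python
# Symmetric 2x2 coordination game. PAYOFF[a][b] is the payoff to a player
# choosing action a against an opponent choosing action b.
# The numbers are illustrative assumptions, not values from the paper.
PAYOFF = {
    "A": {"A": 4, "B": 0},
    "B": {"A": 3, "B": 3},
}


def is_strict_nash(action):
    """(action, action) is a strict Nash equilibrium if deviating strictly hurts."""
    other = "B" if action == "A" else "A"
    return PAYOFF[action][action] > PAYOFF[other][action]


def pareto_dominant():
    """The symmetric equilibrium that gives both players the higher payoff."""
    return max(("A", "B"), key=lambda a: PAYOFF[a][a])


def risk_dominant():
    """Compare deviation losses; the equilibrium with the larger loss from
    unilateral deviation is risk dominant (ties resolve to "B" here)."""
    loss_at_a = PAYOFF["A"]["A"] - PAYOFF["B"]["A"]
    loss_at_b = PAYOFF["B"]["B"] - PAYOFF["A"]["B"]
    return "A" if loss_at_a > loss_at_b else "B"


if __name__ == "__main__":
    print("strict Nash:", [a for a in "AB" if is_strict_nash(a)])
    print("Pareto dominant:", pareto_dominant())
    print("risk dominant:", risk_dominant())
```

With these assumed payoffs the two criteria disagree, which is the situation in which equilibrium selection becomes a meaningful question.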

3.
In this paper, using simulations on an artificial social model, we analyze the cooperative behavior of agents playing the prisoner's dilemma game, in which each agent has two strategies: cooperate and defect. Because defect yields a better payoff whichever strategy the opponent chooses, it is rational for an agent to choose defect in a single game or in a finite number of games. However, it is known that mutual cooperation can also be a Nash equilibrium if the players do not know when the game ends or if the game is infinitely repeated. To investigate such cooperative behavior, we employ an artificial social model called the Sugarscape and carry out simulations on it. Arranging three kinds of environments in the Sugarscape, we examine the cooperative behavior of agents who are essentially selfish, in the sense that they maximize their payoffs, and investigate the influence of environmental changes on cooperative behavior.
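The claim that defect is the better reply whatever the opponent does can be checked directly on a one-shot payoff table. This is a minimal sketch: the abstract gives no payoff values, so the commonly used values 0, 1, 3, and 5 are assumed here, and the helper names are hypothetical.

```python
# One-shot prisoner's dilemma. PAYOFF[(mine, theirs)] is my payoff.
# The values 0, 1, 3, 5 are an assumption; the abstract does not state them.
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}
ACTIONS = ("C", "D")


def strictly_dominates(a, b):
    """True if action a pays strictly more than b against every opponent action."""
    return all(PAYOFF[(a, opp)] > PAYOFF[(b, opp)] for opp in ACTIONS)


def is_nash(a1, a2):
    """True if neither player can gain by a unilateral deviation (symmetric game)."""
    p1_ok = all(PAYOFF[(a1, a2)] >= PAYOFF[(x, a2)] for x in ACTIONS)
    p2_ok = all(PAYOFF[(a2, a1)] >= PAYOFF[(x, a1)] for x in ACTIONS)
    return p1_ok and p2_ok


if __name__ == "__main__":
    print("D strictly dominates C:", strictly_dominates("D", "C"))
    print("(D, D) is Nash:", is_nash("D", "D"))
    print("(C, C) is Nash:", is_nash("C", "C"))
```

In the one-shot game mutual cooperation is not an equilibrium; it can only be sustained in the repeated settings the abstract mentions.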

4.
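Taking the impact of piracy into account, this paper analyzes how a software developer should choose its price and free-trial strategy. Combining learning effects and network externalities, and drawing on revenue management theory, it takes price and trial period as decision variables and builds a revenue-maximization model. Following Stackelberg game theory, backward induction is used to analyze the optimal strategy, and the relationship between the trial strategy and the cost of piracy risk is examined. Finally, numerical examples simulated in Matlab are used to analyze the influence of the relevant parameters on price, free-trial period, and revenue. The results show that software developers should offer the market a product with a relatively high functionality level, priced relatively high and with a relatively short free-trial period, in order to obtain higher revenue; that the piracy rate decreases as the cost of piracy risk increases; and that the trial strategy and the protection strategy complement each other in resisting piracy.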

5.
To improve the performance of self-organizing teams, this paper presents a self-adaptive learning algorithm for team members. Members of the self-organizing teams are simulated by agents. In the virtual self-organizing team, agents adapt their knowledge according to cooperative principles. The self-adaptive learning algorithm lets agents learn from other agents at minimal cost and improves the performance of the self-organizing team. In the algorithm, agents learn how to behave (choose different game strategies) and how much to think about how to behave (choose the learning radius). The virtual team is self-adaptively improved according to the strategies’ ability to generate better quality solutions in past generations. Six basic experiments are conducted to validate the adaptive learning algorithm. It is found that the adaptive learning algorithm often causes agents to converge to optimal actions, based on agents’ continually updated cognitive maps of how actions influence the performance of the virtual self-organizing team. Unlike existing work, this paper considers the influence of relationships in self-organizing teams. It is illustrated that the adaptive learning algorithm benefits both the development of self-organizing teams and the performance of the individual agent.

6.
This paper presents a general C++ platform for the implementation of a trade network game (TNG) that combines evolutionary game play with preferential partner selection. In the TNG, successive generations of resource-constrained traders choose and refuse trade partners on the basis of continually updated expected payoffs, engage in risky trades modelled as two-person games, and evolve their trade strategies over time. The modular design of the TNG platform facilitates experimentation with alternative specifications for market structure, trade partner matching, trading, expectation formation, and trade strategy evolution. The TNG platform can be used to study the evolutionary implications of these specifications at three different levels: individual trader attributes, trade network formation, and social welfare.

7.
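A Multi-Agent Coordination Method Based on Distributed Reinforcement Learning   Total citations: 2, self-citations: 0, citations by others: 2
范波, 潘泉, 张洪才. 《计算机仿真》 (Computer Simulation), 2005, 22(6): 115-118
Research on multi-agent systems focuses on enabling functionally independent agents to accomplish complex control tasks or solve complex problems through negotiation, coordination, and cooperation. Based on a study and analysis of distributed reinforcement learning algorithms, a multi-agent coordination method is proposed. At the coordination level, the complex system task is decomposed and a coordinating agent uses central reinforcement learning to allocate the subtasks; at the behavior level, task agents accept their respective subtasks and use independent reinforcement learning to select effective behaviors, cooperating to complete the system task. Application and experiments in Robot Soccer simulation matches show that the multi-agent coordination method based on distributed reinforcement learning outperforms traditional reinforcement learning.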

8.
While effective competition can force service providers to seek economically efficient methods to reduce costs, the deregulated electricity supply industry still allows some generators to exercise market power at particular locations, thereby preventing the deregulated power market from being perfectly competitive. In this paper, we investigate the interdependence of pricing mechanisms and the strategic behavior of suppliers. A multiperiod dynamic profit-maximizing problem is converted to a bimatrix game that is solved in the framework of mixed strategies. This procedure guarantees at least one Nash solution. Instead of considering only the perfectly competitive price and the monopoly price, we introduce other prices between these two to simulate the real market better. Numerical examples show that a new entrant that maximizes its profit will not choose the perfectly competitive price even as an entry price.

9.
The iterated prisoner’s dilemma (IPD) game has frequently been used to examine the evolution of cooperative behavior among agents. When the effect of representation schemes of IPD game strategies was examined, the same representation scheme was usually assigned to all agents. That is, in the literature, a population of homogeneous agents was usually used in computational experiments. In this article, we focus on a slightly different situation where not every agent necessarily uses the same representation scheme. That is, a population can be a mixture of heterogeneous agents with different representation schemes. In computational experiments, we used binary strings of different lengths (i.e., three-bit and five-bit strings) to represent IPD game strategies. We examined the evolution of cooperative behavior among heterogeneous agents in comparison with the case of homogeneous ones for the standard IPD game with typical payoff values of 0, 1, 3, and 5. Experimental results showed that the evolution of cooperative behavior was slowed down by the use of heterogeneous agents. It was also demonstrated that a faster evolution of cooperative behavior is achieved among majority agents than among minority ones in a heterogeneous population.
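To make the idea of a bit-string IPD strategy concrete, here is a minimal sketch of a match between two three-bit strategies using the payoff values 0, 1, 3, and 5 from the abstract. The encoding is an assumption, since the abstract does not spell it out: bit 0 is the first move, bit 1 the reply to an opponent's cooperation, bit 2 the reply to an opponent's defection, with "1" meaning cooperate. Function names are hypothetical.

```python
# (my move, opponent move) -> (my payoff, opponent payoff), standard 0/1/3/5 values.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def move(strategy, opponent_last):
    """Decode a three-bit strategy string (assumed encoding, see lead-in)."""
    if opponent_last is None:
        bit = strategy[0]          # first round: no history yet
    elif opponent_last == "C":
        bit = strategy[1]          # reply to cooperation
    else:
        bit = strategy[2]          # reply to defection
    return "C" if bit == "1" else "D"


def play(strategy1, strategy2, rounds=10):
    """Play an iterated match and return both players' total payoffs."""
    last1 = last2 = None
    total1 = total2 = 0
    for _ in range(rounds):
        m1 = move(strategy1, last2)
        m2 = move(strategy2, last1)
        p1, p2 = PAYOFF[(m1, m2)]
        total1 += p1
        total2 += p2
        last1, last2 = m1, m2
    return total1, total2


if __name__ == "__main__":
    tit_for_tat = "110"    # cooperate first, then copy the opponent
    always_defect = "000"
    print(play(tit_for_tat, tit_for_tat))
    print(play(tit_for_tat, always_defect))
```

Under this encoding "110" behaves as tit-for-tat and "000" as always-defect; a five-bit scheme would need additional bits, for example to condition on the agent's own previous move.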

10.
Population learning in dynamic economies with endogenous network formation has been traditionally studied in basic settings where agents face quite simple and predictable strategic situations (e.g. coordination). In this paper, we start instead to explore economies where the payoff landscape is very complicated (rugged). We propose a model where the payoff to any agent changes in an unpredictable way as soon as any small variation in the strategy configuration within its network occurs. We study population learning where agents: (i) are allowed to periodically adjust both the strategy they play in the game and their interaction network; (ii) employ some simple criteria (e.g. statistics such as MIN, MAX, MEAN, etc.) to myopically form expectations about their payoff under alternative strategy and network configurations. Computer simulations show that: (i) allowing for endogenous networks implies higher average payoff as compared to static networks; (ii) populations learn by employing network updating as a “global learning” device, while strategy updating is used to perform “fine tuning”; (iii) the statistics employed to evaluate payoffs strongly affect the efficiency of the system, i.e. convergence to a unique (multiple) steady-state(s); (iv) for some class of statistics (e.g. MIN or MAX), the likelihood of efficient population learning strongly depends on whether agents are change-averse in discriminating between options associated with the same expected payoff.

11.
This paper presents a novel approach to the facility layout design problem based on a multi-agent society in which the agents’ interactions form the facility layout design. Each agent corresponds to a facility with inherent characteristics, emotions, and a certain amount of money, which together form its utility function. An agent’s money is adjusted during the learning period by a manager agent, while each agent tries to tune the parameters of its utility function so that its total layout cost is minimized in competition with the others. The agents’ interactions are based on a market mechanism. In each step, an unoccupied location is presented to all applicant agents, and each agent proposes a price for it in proportion to its utility function. The agent proposing the highest price is selected as the winner and assigned to that location by an appropriate space-filling curve. The proposed method uses fuzzy theory to establish each agent’s utility function. In addition, it provides a simulation environment that uses an evolutionary algorithm to form different interactions among the agents and lets them experience various strategies. The experimental results show that the proposed approach achieves a lower total layout cost than state-of-the-art methods.

12.
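Current research in game theory focuses mainly on the analysis of cooperation, competition, and stability; studies of how psychological factors affect the game process remain scarce. Introducing emotional characteristics into the continuous evolutionary game process allows group game play to be simulated more realistically. For individuals with different decision-making styles, an individual learning mechanism and a strategy mutation mechanism driven by emotional characteristics are established. Simulation results show that learning ability helps raise both the average cooperation rate among individuals and the total payoff of the group; emotion introduces some fluctuation in individual payoffs but has little effect on the fluctuation of the group's average payoff.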

13.
A cooperative aircraft differential game is addressed in which an Attacker missile pursues an unmanned aerial vehicle (UAV), herein called the Target. The Target UAV cooperates with up to two Defender missiles, which are launched in order to intercept the Attacker before the latter reaches the Target. This is a scenario with important military applications where each of the agents is an autonomous air vehicle. Each agent plans and corrects its course of action in order to defeat an opposing force while simultaneously optimizing an operationally relevant cost/payoff performance measure. The Target and the Defenders cooperate to form a team against the Attacker. The results in this paper build on the solution of a three-agent differential game, where the three players are the Target, the Attacker, and one Defender; in this paper, the benefits of firing a second Defender are considered. Indeed, launching two interceptor missiles is standard procedure because it provides redundant backup. Building on the solution of the one-Defender problem, it is possible to address a seemingly intractable problem, where the Target needs to decide which Defender(s) to cooperate with, in addition to obtaining the optimal headings of every player in the game. Given the initial positions of the players, we solve the problem of determining whether a second Defender improves the Target/Defender(s) payoff and provide the optimal strategies for each of the agents involved. Finally, we address the game of kind (for the case of one Defender), which provides the safety regions that determine which side will win based on the initial state. These safety regions give the Target’s area of vulnerability, and using these results, we describe the reduction in the Target’s vulnerability area brought by an additional Defender.

14.
This paper presents an approach to developing bidding agents that participate in multiple auctions with the goal of obtaining an item with a given probability. The approach consists of a prediction method and a planning algorithm. The prediction method exploits the history of past auctions to compute probability functions capturing the belief that a bid of a given price may win a given auction. The planning algorithm computes a price and a set of compatible auctions, such that by sequentially bidding this price in each of the auctions, the agent can obtain the item with the desired probability. Experiments show that the approach increases the payoff of its users and the welfare of the market.
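The planning step can be illustrated with a small sketch. It rests on assumptions that go beyond the abstract: the win probability of a bid is estimated as the fraction of past auctions whose winning price did not exceed it, auctions are treated as independent, and the set of auctions is fixed rather than chosen. All function names and data are hypothetical.

```python
def win_probability(bid, past_winning_prices):
    """Empirical estimate (an assumption): share of past auctions a bid
    of this price would have won."""
    wins = sum(price <= bid for price in past_winning_prices)
    return wins / len(past_winning_prices)


def prob_at_least_one_win(bid, histories):
    """Probability of winning at least one auction when bidding the same
    price in each, assuming the auctions are independent."""
    lose_all = 1.0
    for history in histories:
        lose_all *= 1.0 - win_probability(bid, history)
    return 1.0 - lose_all


def lowest_sufficient_bid(histories, target, candidate_bids):
    """Smallest candidate price that reaches the target probability,
    or None if no candidate does."""
    for bid in sorted(candidate_bids):
        if prob_at_least_one_win(bid, histories) >= target:
            return bid
    return None


if __name__ == "__main__":
    # Hypothetical winning prices observed in two auction houses.
    histories = [[10, 12, 14, 16], [11, 13, 15, 17]]
    print(lowest_sufficient_bid(histories, target=0.6, candidate_bids=[10, 12, 14]))
```

Bidding the lowest price that still meets the target probability is what keeps the buyer's payoff high in this simplified picture.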

15.
16.
Optimal Search and One-Way Trading Online Algorithms   Total citations: 15, self-citations: 0, citations by others: 15
This paper is concerned with the time series search and one-way trading problems. In the (time series) search problem a player is searching for the maximum (or minimum) price in a sequence that unfolds sequentially, one price at a time. Once during this game the player can decide to accept the current price p, in which case the game ends and the player's payoff is p. In the one-way trading problem a trader is given the task of trading dollars to yen. Each day, a new exchange rate is announced and the trader must decide how many dollars to convert to yen at the current rate. The game ends when the trader has traded his entire dollar wealth to yen, and his payoff is the number of yen acquired. The search and one-way trading problems are intimately related. Any (deterministic or randomized) one-way trading algorithm can be viewed as a randomized search algorithm. Using the competitive ratio as a performance measure, we determine the optimal competitive performance for several variants of these problems. In particular, we show that a simple threat-based strategy is optimal and we determine its competitive ratio, which yields, for realistic values of the problem parameters, surprisingly low competitive ratios. We also consider and analyze a one-way trading game played against an adversary called Nature, where the online player knows the probability distribution of the maximum exchange rate and that distribution has been chosen by Nature. Finally, we consider some applications for a special case of portfolio selection called two-way trading, in which the trader may trade back and forth between cash and one asset. Received October 19, 1998; revised August 12, 1999.
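The search problem can be illustrated with the classic reservation price policy, which is a simpler technique than the threat-based strategy highlighted in the abstract and is shown here only as a sketch. It assumes that all prices lie in a known interval [m, M] and that the player must accept the last price if nothing was accepted earlier; accepting the first price at or above sqrt(m * M) then guarantees a competitive ratio of at most sqrt(M / m). The function name is hypothetical.

```python
import math


def reservation_price_search(prices, m, M):
    """Accept the first price that reaches the reservation price sqrt(m * M).

    Assumes m <= price <= M for every price and that the final price must
    be accepted if no earlier price qualified.
    """
    reservation = math.sqrt(m * M)
    for price in prices:
        if price >= reservation:
            return price
    return prices[-1]


if __name__ == "__main__":
    m, M = 1, 100                      # assumed price bounds
    prices = [3, 8, 12, 50]
    accepted = reservation_price_search(prices, m, M)
    ratio = max(prices) / accepted     # offline optimum divided by online payoff
    print(accepted, ratio, math.sqrt(M / m))
```

The bound follows from two cases: if a price is accepted it is at least sqrt(m * M) while the maximum is at most M, and if none is accepted the maximum stayed below sqrt(m * M) while the payoff is at least m.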

17.
A voter model of the spatial prisoner's dilemma   Total citations: 1, self-citations: 0, citations by others: 1
The prisoner's dilemma (PD) involves contests between two players and may naturally be played on a spatial grid using voter model rules. In the model of spatial PD discussed here, the sites of a two-dimensional lattice are occupied by strategies. At each time step, a site is chosen to play a PD game with one of its neighbors. The strategy of the chosen site then invades its neighbor with a probability that is proportional to the payoff from the game. Using results from the analysis of voter models, it is shown that with simple linear strategies, this scenario results in the long-term survival of only one strategy. If three nonlinear strategies have a cyclic dominance relation between one another, then it is possible for relatively cooperative strategies to persist indefinitely. With the voter model dynamics, however, the average level of cooperation decreases with time if mutation of the strategies is included. Spatial effects are not in themselves sufficient to lead to the maintenance of cooperation.

18.
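林华. 《计算机工程与设计》 (Computer Engineering and Design), 2005, 26(6): 1612-1613, 1644
This paper studies strategy adjustment in repeated agent negotiation, with the aim of giving agents the ability to learn during negotiation and making them more sensitive to the environment and to their negotiation opponents. Using a resource allocation problem, it discusses learning during agent negotiation, analyzes single-round and multi-round negotiation models on the basis of game theory, and gives and proves the strategies to follow under different information conditions.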

19.
We envision a future economy where e-markets will play an essential role as exchange hubs for commodities and services. Future e-markets should be designed to be robust to manipulation, flexible, and sufficiently efficient in facilitating exchanges. One of the most important aspects of designing an e-market is market mechanism design. A market mechanism defines the organization, information exchange process, trading procedure, and clearance rules of a market. If we view an e-market as a multi-agent system, the market mechanism also defines the structure and rules of the environment in which agents (buyers and sellers) play the market game. We design an e-market mechanism that is strategy-proof with respect to reservation price, weakly budget-balanced, and individually rational. Our mechanism also makes sellers unlikely to underreport the supply volume to drive up the market price. In addition, by bounding our market's efficiency loss, we provide fairly unrestrictive sufficient conditions for the efficiency of our mechanism to converge in a strong sense when (1) the number of agents who successfully trade is large, or (2) the number of agents, trading and not, is large. We implement our design using the RETSINA infrastructure, a multi-agent system development toolkit. This enables us to validate our analytically derived bounds by numerically testing our e-market.

20.
In machine scheduling, a set of jobs must be scheduled on a set of machines so as to minimize some global objective function, such as the makespan, which we consider in this paper. In practice, jobs are often controlled by independent, selfishly acting agents, each of which selects a machine for processing that minimizes its (expected) completion time. This scenario can be formalized as a game in which the players are job owners, the strategies are machines, and a player’s disutility is the completion time of its jobs in the corresponding schedule. The equilibria of these games may result in a larger-than-optimal overall makespan. The price of anarchy is the ratio of the worst-case equilibrium makespan to the optimal makespan. In this paper, we design and analyze scheduling policies, or coordination mechanisms, for machines which aim to minimize the price of anarchy of the corresponding game. We study coordination mechanisms for four classes of multiprocessor machine scheduling problems and derive upper and lower bounds on the price of anarchy of these mechanisms. For several of the proposed mechanisms, we also prove that the system converges to a pure-strategy Nash equilibrium in a linear number of rounds. Finally, we note that our results are applicable to several practical problems arising in communication networks.
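The ratio defined above can be computed by brute force on a tiny instance. The sketch below is not one of the paper's coordination mechanisms; it assumes identical machines and that every job on a machine finishes when the machine's whole load is done, so a job's cost is simply the load of its machine. Job sizes and function names are hypothetical.

```python
from itertools import product


def loads(assignment, sizes, n_machines):
    """Total processing time placed on each machine."""
    load = [0] * n_machines
    for job, machine in enumerate(assignment):
        load[machine] += sizes[job]
    return load


def is_pure_nash(assignment, sizes, n_machines):
    """No job can strictly lower its machine's load by moving alone."""
    load = loads(assignment, sizes, n_machines)
    for job, machine in enumerate(assignment):
        for other in range(n_machines):
            if other != machine and load[other] + sizes[job] < load[machine]:
                return False
    return True


def instance_anarchy_ratio(sizes, n_machines):
    """Worst pure-equilibrium makespan divided by the optimal makespan,
    found by enumerating every assignment (only viable for tiny inputs)."""
    assignments = list(product(range(n_machines), repeat=len(sizes)))
    optimum = min(max(loads(a, sizes, n_machines)) for a in assignments)
    worst_equilibrium = max(
        max(loads(a, sizes, n_machines))
        for a in assignments
        if is_pure_nash(a, sizes, n_machines)
    )
    return worst_equilibrium / optimum


if __name__ == "__main__":
    print(instance_anarchy_ratio([2, 2, 1, 1], n_machines=2))
```

For these four jobs on two machines, placing both size-2 jobs together is an equilibrium with makespan 4, while the optimum is 3, so the ratio on this instance is 4/3.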

