Similar Articles
 20 similar articles found (search time: 15 ms)
1.
《Applied Soft Computing》2007,7(3):818-827
This paper proposes a reinforcement learning (RL)-based game-theoretic formulation for designing robust controllers for nonlinear systems affected by bounded external disturbances and parametric uncertainties. Based on the theory of Markov games, we consider a differential game in which a ‘disturbing’ agent tries to produce the worst possible disturbance while a ‘control’ agent tries to find the best control input. The problem is formulated as finding a min–max solution of a value function. We propose an online procedure for learning the optimal value function and for calculating a robust control policy. The proposed game-theoretic paradigm has been tested on the control task of a highly nonlinear two-link robot system. We compare the performance of the proposed Markov game controller with a standard RL-based robust controller and an H∞ theory-based robust game controller. For the robot control task, the proposed controller achieved superior robustness to changes in payload mass and external disturbances over the other control schemes. Results also validate the effectiveness of neural networks in extending the Markov game framework to problems with continuous state–action spaces.
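The min–max value learning described above can be illustrated with a minimax-Q style tabular update, in which the next-state value is the control agent's best action against the disturbing agent's worst response. This is a minimal sketch with illustrative dimensions; the paper's actual method uses neural networks over continuous state–action spaces, and the pure-strategy max–min below is a simplification of solving the stage matrix game:

```python
import numpy as np

# Hedged sketch of a minimax-Q style update for a two-player zero-sum
# (control vs. disturbance) Markov game. The pure-strategy max-min value
# and the toy sizes below are illustrative assumptions, not the paper's code.

def minimax_q_update(Q, s, a, d, r, s_next, alpha=0.1, gamma=0.9):
    """Q[s, a, d]: value of control action a against disturbance d in state s."""
    # Control agent maximizes, disturbing agent minimizes: V(s') = max_a min_d Q(s', a, d)
    v_next = np.max(np.min(Q[s_next], axis=1))
    Q[s, a, d] += alpha * (r + gamma * v_next - Q[s, a, d])
    return Q

# Toy example: 2 states, 2 control actions, 2 disturbance actions
Q = np.zeros((2, 2, 2))
Q = minimax_q_update(Q, s=0, a=0, d=1, r=1.0, s_next=1)
print(Q[0, 0, 1])  # 0.1 after one step (V(s') is still zero)
```

The min–max in `v_next` is what distinguishes this from ordinary Q-learning: the learned policy hedges against the worst admissible disturbance rather than the expected one.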

2.
Reinforcement learning (RL) has now evolved into a major technique for adaptive optimal control of nonlinear systems. However, the majority of RL algorithms proposed so far impose a strong constraint on the structure of the environment dynamics by assuming that it operates as a Markov decision process (MDP). An MDP framework envisages a single agent operating in a stationary environment, thereby limiting the scope of application of RL to control problems. Recently, a new direction of research has focused on proposing Markov games as an alternative system model to enhance the generality and robustness of RL-based approaches. This paper aims to present this new direction, which seeks to synergize the broad areas of RL and game theory, as an interesting and challenging avenue for designing intelligent and reliable controllers. First, we briefly review some representative RL algorithms for the sake of completeness and then describe the recent direction that seeks to integrate RL and game theory. Finally, open issues are identified and future research directions outlined.

3.
This paper develops an adaptive fuzzy controller for robot manipulators using a Markov game formulation. The Markov game framework offers a promising platform for robust control of robot manipulators in the presence of bounded external disturbances and unknown parameter variations. We propose fuzzy Markov games as an adaptation of fuzzy Q-learning (FQL) to a continuous-action variation of Markov games, wherein the reinforcement signal is used to tune online the conclusion part of a fuzzy Markov game controller. The proposed Markov game-adaptive fuzzy controller uses a simple fuzzy inference system (FIS), is computationally efficient, generates a swift control, and requires no exact dynamics of the robot system. To illustrate the superiority of Markov game-adaptive fuzzy control, we compare the performance of the controller against a) the Markov game-based robust neural controller, b) the reinforcement learning (RL)-adaptive fuzzy controller, c) the FQL controller, d) the H∞ theory-based robust neural game controller, and e) a standard RL-based robust neural controller, on two highly nonlinear robot arm control problems of i) a standard two-link rigid robot arm and ii) a 2-DOF SCARA robot manipulator. The proposed Markov game-adaptive fuzzy controller outperformed the other controllers in terms of tracking errors and control torque requirements over different desired trajectories. The results also demonstrate the viability of FISs for accelerating learning in Markov games and extending Markov game-based control to continuous state-action space problems.
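The online tuning of the "conclusion part" mentioned above can be sketched as reinforcement-driven adjustment of the consequents of a zero-order Takagi–Sugeno FIS, with each rule's consequent updated in proportion to its normalized firing strength. The Gaussian memberships, learning rate, and TD-error value here are illustrative assumptions, not the paper's actual parameters:

```python
import numpy as np

# Hedged sketch: tuning the consequent (conclusion) part of a simple
# zero-order Takagi-Sugeno FIS with a reinforcement (TD-error) signal,
# in the spirit of fuzzy Q-learning. All constants are illustrative.

def fis_output(x, centers, widths, consequents):
    w = np.exp(-((x - centers) / widths) ** 2)   # Gaussian firing strengths
    w = w / w.sum()                              # normalize over rules
    return w @ consequents, w

def tune_consequents(consequents, w, td_error, alpha=0.5):
    # each rule's consequent moves in proportion to how strongly it fired
    return consequents + alpha * td_error * w

centers = np.array([-1.0, 0.0, 1.0])             # three illustrative rules
widths = np.array([0.5, 0.5, 0.5])
theta = np.zeros(3)                              # consequents, initially zero
u, w = fis_output(0.2, centers, widths, theta)
theta = tune_consequents(theta, w, td_error=1.0)
print(u)  # 0.0 before any tuning
```

Because only the consequents are tuned, the update is linear in the parameters, which is one reason such FIS-based controllers learn quickly and remain computationally cheap.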

4.
5.
One of the key problems in reinforcement learning (RL) is balancing exploration and exploitation. Another is learning and acting in large Markov decision processes (MDPs) where compact function approximation has to be used. This paper introduces REKWIRE, a provably efficient, model-free algorithm for finite-horizon RL problems with value function approximation (VFA) that addresses the exploration-exploitation tradeoff in a principled way. The crucial element of this algorithm is a reduction of RL to online regression in the recently proposed KWIK learning model. We show that, if the KWIK online regression problem can be solved efficiently, then the sample complexity of exploration of REKWIRE is polynomial. Therefore, the reduction suggests a new and sound direction to tackle general RL problems. The efficiency of our algorithm is verified on a set of proof-of-concept experiments where popular, ad hoc exploration approaches fail.
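The KWIK ("Knows What It Knows") model underlying the reduction above requires a learner that either makes an accurate prediction or explicitly admits ignorance, which the agent can then treat as a signal to explore. This is a minimal sketch of that protocol with a mean estimator; the sample threshold and estimator are illustrative stand-ins for the paper's online regression:

```python
# Hedged sketch of the KWIK protocol that REKWIRE reduces exploration to:
# the learner must either predict accurately or admit ignorance (None).
# The mean estimator and threshold are illustrative, not the paper's learner.

class KWIKMeanEstimator:
    """Predicts a mean only after enough samples; otherwise says 'I don't know'."""

    def __init__(self, min_samples=3):
        self.min_samples = min_samples
        self.samples = []

    def predict(self):
        if len(self.samples) < self.min_samples:
            return None  # "I don't know" -> caller should explore this input
        return sum(self.samples) / len(self.samples)

    def observe(self, y):
        self.samples.append(y)

est = KWIKMeanEstimator()
print(est.predict())           # None: not enough data yet
for y in [1.0, 2.0, 3.0]:
    est.observe(y)
print(est.predict())           # 2.0
```

The point of the protocol is that the number of "I don't know" answers can be bounded, which is what makes the sample complexity of exploration polynomial when the reduction applies.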

6.
Some domains, such as real-time strategy (RTS) games, pose several challenges to traditional planning and machine learning techniques. In this article, we present a novel on-line case-based planning architecture that addresses some of these problems. Our architecture addresses issues of plan acquisition, on-line plan execution, interleaved planning and execution, and on-line plan adaptation. We also introduce the Darmok system, which implements this architecture to play Wargus (an open source clone of the well-known RTS game Warcraft II). We present empirical evaluation of the performance of Darmok and show that it successfully learns to play the Wargus game.

7.
Mainly motivated by the current lack of a qualitative and quantitative entertainment formulation of computer games and the procedures to generate it, this article covers the following issues: It presents the features—extracted primarily from the opponent behavior—that make a predator/prey game appealing; provides the qualitative and quantitative means for measuring player entertainment in real time, and introduces a successful methodology for obtaining games of high satisfaction. This methodology is based on online (during play) learning opponents who demonstrate cooperative action. By testing the game against humans, we confirm our hypothesis that the proposed entertainment measure is consistent with the judgment of human players. As far as learning in real time against human players is concerned, results suggest that longer games are required for humans to notice some sort of change in their entertainment.

8.
Serious games open up many new opportunities for complex skills learning in higher education. The inherent complexity of such games, though, requires large efforts for their development. This paper presents a framework for serious game design, which aims to reduce the design complexity at conceptual, technical and practical levels. The approach focuses on a relevant subset of serious games labelled as scenario‐based games. At the conceptual level, it identifies the basic elements that make up the static game configuration; it also describes the game dynamics, i.e. the state changes of the various game components in the course of time. At the technical level, it presents a basic system architecture comprising various building tools, which are explained and illustrated with technical implementations that are part of the Emergo toolkit for scenario‐based game development. At the practical level, a set of design principles are presented for controlling and reducing game design complexity. The principles cover the topics of game structure, feedback and game representation, respectively. Practical application of the framework and the associated toolkit is briefly reported and evaluated.

9.
10.
Reinforcement learning (RL) can provide a basic framework for autonomous robots to learn to control and maximize future cumulative rewards in complex environments. To achieve high performance, RL controllers must consider the complex external dynamics for movements and task (reward function) and optimize control commands. For example, a robot playing tennis or squash needs to cope with the different dynamics of a tennis or squash racket and such dynamic environmental factors as the wind. In addition, this robot has to tailor its tactics simultaneously under the rules of either game. This double complexity of the external dynamics and reward function becomes even harder when both the multiple dynamics and multiple reward functions switch implicitly, as in a real (multi-agent) game of tennis where one player cannot observe the intention of her opponents or her partner. The robot must consider its opponent's and its partner's unobservable behavioral goals (reward functions). In this article, we address how an RL agent should be designed to handle such double complexity of dynamics and reward. We have previously proposed modular selection and identification for control (MOSAIC) to cope with nonstationary dynamics, where appropriate controllers are selected and learned among many candidates based on the error of each controller's paired dynamics predictor: the forward model. Here we extend this framework for RL and propose the MOSAIC-MR architecture. It resembles MOSAIC in spirit and selects and learns an appropriate RL controller based on the RL controller's TD error, using the errors of the dynamics (forward model) and reward predictors. Furthermore, unlike other MOSAIC variants for RL, RL controllers are not a priori paired with fixed predictors of dynamics and rewards. The simulation results demonstrate that MOSAIC-MR outperforms its counterparts because of this flexible association ability among RL controllers, forward models, and reward predictors.
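The module-selection step described above can be sketched as a softmax "responsibility" over the negated combined prediction errors, so that the module whose forward model and reward predictor best explain the current situation dominates control and learning. The temperature and error values below are illustrative; the actual MOSAIC-MR selection criterion involves the TD error as described in the abstract:

```python
import numpy as np

# Hedged sketch of MOSAIC-style module selection: each module pairs a
# forward model with a reward predictor, and responsibilities are a
# softmax over negated combined prediction errors. Temperature and the
# example error values are illustrative assumptions.

def responsibilities(dyn_errors, rew_errors, temperature=1.0):
    total = np.asarray(dyn_errors) + np.asarray(rew_errors)
    logits = -total / temperature            # smaller error -> larger logit
    e = np.exp(logits - logits.max())        # subtract max for stability
    return e / e.sum()

# Three candidate modules; module 0 explains the situation best
r = responsibilities(dyn_errors=[0.1, 2.0, 5.0], rew_errors=[0.2, 0.1, 0.3])
print(int(np.argmax(r)))  # 0: the module with the smallest combined error
```

Keeping the dynamics and reward predictors decoupled from the RL controllers, as MOSAIC-MR does, means a single forward model can be reused by several controllers rather than being hard-wired to one.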

11.
Cooperative Multi-Agent Learning: The State of the Art
Cooperative multi-agent systems (MAS) are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize utility. Due to the interactions among the agents, multi-agent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. The challenge this presents to the task of programming solutions to MAS problems has spawned increasing interest in machine learning techniques to automate the search and optimization process. We provide a broad survey of the cooperative multi-agent learning literature. Previous surveys of this area have largely focused on issues common to specific subareas (for example, reinforcement learning (RL) or robotics). In this survey we attempt to draw from multi-agent learning work in a spectrum of areas, including RL, evolutionary computation, game theory, complex systems, agent modeling, and robotics. We find that this broad view leads to a division of the work into two categories, each with its own special issues: applying a single learner to discover joint solutions to multi-agent problems (team learning), or using multiple simultaneous learners, often one per agent (concurrent learning). Additionally, we discuss direct and indirect communication in connection with learning, plus open issues in task decomposition, scalability, and adaptive dynamics. We conclude with a presentation of multi-agent learning problem domains, and a list of multi-agent learning resources.

12.
Adaptive dynamic programming (ADP) is an important branch of reinforcement learning for solving various optimal control problems. Most practical nonlinear systems are controlled by more than one controller. Each controller is a player, and the tradeoff between cooperation and conflict among these players can be viewed as a game. Multi-player games fall into two main categories: zero-sum games and non-zero-sum games. To obtain the optimal control policy for each player, one needs to solve Hamilton–Jacobi–Isaacs equations for zero-sum games and a set of coupled Hamilton–Jacobi equations for non-zero-sum games. Unfortunately, these equations are generally difficult or even impossible to solve analytically. To overcome this bottleneck, two ADP methods, a modified gradient-descent-based online algorithm and a novel iterative offline learning approach, are proposed in this paper. Furthermore, to implement the proposed methods, we employ a single-network structure, which markedly reduces the computational burden compared with the traditional multiple-network architecture. Simulation results demonstrate the effectiveness of our schemes.
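For intuition, the zero-sum case admits a simple discrete analogue: value iteration on a Markov game where the controller maximizes and the disturbance minimizes, which plays the role of solving the Hamilton–Jacobi–Isaacs equation numerically. The toy rewards and deterministic transitions below are illustrative, not drawn from the paper:

```python
import numpy as np

# Hedged sketch: value iteration for a discrete, deterministic zero-sum
# Markov game -- a discrete analogue of solving the Hamilton-Jacobi-Isaacs
# equation. The toy rewards/transitions are illustrative assumptions.

def zero_sum_value_iteration(R, P, gamma=0.9, iters=300):
    """R[s, a, d]: stage reward; P[s, a, d]: deterministic next state."""
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        Q = R + gamma * V[P]                # Q[s, a, d] via fancy indexing
        V = Q.min(axis=2).max(axis=1)       # control maximizes, disturbance minimizes
    return V

# Toy game: 2 states, 2 control actions, 2 disturbance actions
R = np.array([[[1.0, 0.0], [0.0, -1.0]],
              [[0.5, 0.5], [2.0, -2.0]]])
P = np.zeros((2, 2, 2), dtype=int)          # every transition returns to state 0
V = zero_sum_value_iteration(R, P)
print(V)  # game value per state: V[0] = 0.0, V[1] = 0.5
```

The ADP methods in the paper replace the table `V` with a single network approximating the value function, which is what makes the continuous-state problem tractable.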

13.
The aim of this paper is to study the implementation of online games to encourage public participation in urban planning. Its theoretical foundations are based on previous work in public participatory geographical information systems (PP GISs), play and games, with a special focus on serious games. Serious games aim to support learning processes in a new, more playful way. We developed the concept of playful public participation in urban planning, including playful elements such as storytelling, walking and moving, sketching, drawing, and games. A group of students designed an online serious public participatory game entitled NextCampus. The case study used in NextCampus was taken from the real-world question of a possible move of a university campus to a new location in the city of Hamburg, Germany. The development of the serious public participatory game NextCampus resulted in a physical prototype, user interface design, and a computational model of the game. The NextCampus game was tested with the help of two groups of urban planning students and presented to three external experts who provided valuable recommendations for further development. The critical comments questioned the level of complexity involved in such games. The positive comments included recognition of the potential for joy and the playfulness a game like NextCampus could evoke.

14.
Online game clients typically use a single monitor as the display device and cannot provide players with a large-size, high-resolution, wide-viewing-angle immersive game picture. Based on an analysis of existing online game architectures, this paper proposes an immersive display framework for online games. The framework leaves the game server logic unmodified and modifies only the game client source code. The modified immersive client is driven by a cluster system consisting of one Master node and multiple Slave nodes. The Master node performs all functions of an ordinary client except rendering. All Slave nodes keep their game-world state synchronized with the Master node, render multi-channel images in parallel, and present the game picture through a multi-projector immersive display system. The framework was applied to modify the client of a first-person shooter (FPS) online game; experimental results show that the immersive client can display immersive game images in real time and scales well with the number of cluster nodes.

15.
We propose a method for visual tracking-by-detection based on online feature learning. Our learning framework performs feature encoding with respect to an over-complete dictionary, followed by spatial pyramid pooling. We then learn a linear classifier based on the resulting feature encoding. Unlike previous work, we learn the dictionary online and update it to help capture the appearance of the tracked target as well as the background. In more detail, given a test image window, we extract local image patches from it and each local patch is encoded with respect to the dictionary. The encoded features are then pooled over a spatial pyramid to form an aggregated feature vector. Finally, a simple linear classifier is trained on these features. Our experiments show that the proposed powerful, albeit simple, tracker outperforms all the state-of-the-art tracking methods that we have tested. Moreover, we evaluate the performance of different dictionary learning and feature encoding methods in the proposed tracking framework, and analyze the impact of each component in the tracking scenario. In particular, we show that a small dictionary, learned and updated online, is as effective as and more efficient than a huge dictionary learned offline. We further demonstrate the flexibility of feature learning by showing how it can be used within a structured learning tracking framework. The outcome is one of the best trackers reported to date, which combines the advantages of both feature learning and structured output prediction. We also implement a multi-object tracker, which achieves state-of-the-art performance.
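The encode-then-pool pipeline described above can be sketched as follows: local patches are coded against an over-complete dictionary and the codes are max-pooled over a spatial pyramid into one aggregated vector. The random dictionary, soft-threshold encoder, and 2x2 pyramid here are illustrative stand-ins for the paper's learned dictionary and encoding methods:

```python
import numpy as np

# Hedged sketch of the tracker's feature pipeline: encode local patches
# against an (here random) over-complete dictionary, then max-pool the
# codes over a two-level spatial pyramid. Sizes and the soft-threshold
# encoder are illustrative assumptions, not the paper's exact method.

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))               # 64-dim patches, 128 atoms
D /= np.linalg.norm(D, axis=0, keepdims=True)    # unit-norm atoms

def encode(patch, lam=0.1):
    # soft-thresholded correlations: a cheap stand-in for sparse coding
    c = D.T @ patch
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

def pyramid_pool(codes_grid):
    """codes_grid: 2x2 grid of per-cell codes; pool whole window + each cell."""
    cells = [codes_grid[i][j] for i in range(2) for j in range(2)]
    whole = np.max(np.stack(cells), axis=0)      # level 0: whole window
    return np.concatenate([whole] + cells)       # level 1: the four cells

grid = [[encode(rng.standard_normal(64)) for _ in range(2)] for _ in range(2)]
feat = pyramid_pool(grid)
print(feat.shape)  # (640,) = (1 whole + 4 cells) * 128 atoms
```

The aggregated vector `feat` is what the linear classifier scores; updating `D` online, as the paper does, lets the same pipeline adapt to the target's changing appearance.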

16.
Networked interactivity is one of the essential factors that differentiate recent online educational games from traditional stand-alone CD-based games. Despite the growing popularity of online educational games, empirical studies about the effects of networked interactivity are relatively rare. The current study tests the effects of networked interactivity on game users' learning outcomes by comparing three groups (online educational quiz game vs. off-line educational quiz game vs. traditional classroom lecture). In addition, the study examines the mediating role of social presence in the context of educational games. Results indicate that networked interactivity in the online educational quiz game condition enhances game users' positive evaluation of learning, test performance, and feelings of social presence. However, there was no significant difference between the off-line educational quiz game and the lecture-based conditions in terms of learning outcomes. Further analyses indicate that feelings of social presence mediate the effect of networked interactivity on various learning outcomes. Theoretical and practical implications are discussed.

17.
We present a novel and uniform formulation of the problem of reinforcement learning against bounded memory adaptive adversaries in repeated games, and the methodologies to accomplish learning in this novel framework. First we delineate a novel strategic definition of best response that optimises rewards over multiple steps, as opposed to the notion of tactical best response in game theory. We show that the problem of learning a strategic best response reduces to that of learning an optimal policy in a Markov Decision Process (MDP). We deal with both finite and infinite horizon versions of this problem. We adapt an existing Monte Carlo based algorithm for learning optimal policies in such MDPs over finite horizon, in polynomial time. We show that this new efficient algorithm can obtain higher average rewards than a previously known efficient algorithm against some opponents in the contract game. Though this improvement comes at the cost of increased domain knowledge, simple experiments in the Prisoner's Dilemma, and coordination games show that even when no extra domain knowledge (besides that an upper bound on the opponent's memory size is known) is assumed, the error can still be small. We also experiment with a general infinite-horizon learner (using function-approximation to tackle the complexity of history space) against a greedy bounded memory opponent and show that while it can create and exploit opportunities of mutual cooperation in the Prisoner's Dilemma game, it is cautious enough to ensure minimax payoffs in the Rock–Scissors–Paper game.
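The reduction above is easy to see concretely: against a memory-1 opponent, the opponent's memory is the MDP state, and the strategic best response is the optimal policy of that MDP. This is a minimal sketch against Tit-for-Tat in the iterated Prisoner's Dilemma with standard payoffs; the opponent choice, payoff values, and discount factor are illustrative, not the paper's contract-game setup:

```python
import numpy as np

# Hedged sketch: against a memory-1 opponent (Tit-for-Tat in the iterated
# Prisoner's Dilemma), the opponent's memory becomes the MDP state and a
# strategic best response is that MDP's optimal policy. Payoffs and the
# discount factor are illustrative assumptions.

# Row player's payoffs for (my move, opponent's move): 0 = Cooperate, 1 = Defect
PAYOFF = np.array([[3.0, 0.0],   # I cooperate: C/C = 3, C/D = 0
                   [5.0, 1.0]])  # I defect:    D/C = 5, D/D = 1

def best_response_vs_tit_for_tat(gamma=0.9, iters=500):
    # State = the move TFT plays this round (a copy of my previous move)
    V = np.zeros(2)
    for _ in range(iters):
        Q = np.empty((2, 2))
        for s in range(2):           # opponent plays s this round
            for a in range(2):       # my move a; TFT's next state is a
                Q[s, a] = PAYOFF[a, s] + gamma * V[a]
        V = Q.max(axis=1)
    return Q.argmax(axis=1)          # optimal action per state

print(best_response_vs_tit_for_tat())  # cooperation (action 0) is optimal in both states
```

With a discount factor this high, sustained mutual cooperation (3 per round) beats the one-shot defection gain, so the multi-step strategic best response differs from the tactical one-shot best response (which would defect).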

18.
Based on the flipped‐classroom model and the potential motivational and instructional benefits of digital games, we describe a flipped game‐based learning (FGBL) strategy focused on preclass and overall learning outcomes. A secondary goal is to determine the effects, if any, of the classroom aspects of the FGBL strategy on learning efficiency. Our experiments involved 2 commercial games featuring physical motion concepts: Ballance (Newton's law of motion) and Angry Birds (mechanical energy conservation). We randomly assigned 87 8th‐grade students to game instruction (digital game before class and lecture‐based instruction in class), FGBL strategy (digital game before class and cooperative learning in the form of group discussion and practice in class), or lecture‐based instruction groups (no gameplay). Results indicate that the digital games exerted a positive effect on preclass learning outcomes and that FGBL‐strategy students achieved better overall learning outcomes than their lecture‐based peers. Our observation of similar overall outcomes between the cooperative learning and lecture‐based groups suggests a need to provide additional teaching materials or technical support when introducing video games to cooperative classroom learning activities.

19.
Despite media coverage that is usually negative or non-existent, digital (video) game playing is a major form of entertainment for significant numbers of people in developed countries. This paper begins with a brief outline of what these games are, who plays them and why there is interest within the library community. We then move through several examples of the current direct impact of digital games on libraries. This is followed by more abstract examples of how video games could, and do, influence library practice and technology. Attributes of online games are discussed, focusing on one particular game, Second Life. This game is used widely within the library community, with many librarians spending significant time developing avatars, open services and infrastructure. The paper concludes with a summary of the ten attributes of video game players, found across the research literature body, that are of relevance to digital information and library services.

20.
Cheating is a key issue in online games. Whatever the rules that govern a game, some players will always be tempted to break or elude these rules so as to gain an unfair advantage over other players. Mitigation schemes are thus needed in online gaming platforms. However, it is widely recognized that typical cheating prevention schemes introduce complications and overheads in the distributed game system, which may seriously jeopardize the online gaming experience. It turns out that, often, detecting the cheaters, instead of preventing the cheats, could represent a viable solution, especially for time cheats. We present a general framework able to model game time advancements in P2P online games. Based on this framework, time cheat detection schemes can be easily devised, which monitor the communication patterns among peers and do not affect the performance of the game system. To provide evidence of our claim, we present in this paper two different time cheats, namely fast rate cheat and look-ahead cheat, and discuss practicable methods to detect them. Simulation results confirm the viability of the proposed approach.
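A fast-rate cheat, as described above, advances game time faster than the agreed tick rate, so one natural detection approach is to compare a peer's observed update intervals against the nominal interval. This is a minimal sketch of that idea under an assumed fixed tick rate and tolerance; it is not the paper's actual detection scheme, which is built on its time-advancement framework:

```python
# Hedged sketch of fast-rate cheat detection: a peer that sends state
# updates consistently faster than the agreed tick rate is flagged by
# comparing its mean inter-update interval against the nominal one.
# The tolerance and timestamps below are illustrative assumptions.

def is_fast_rate_cheater(timestamps, nominal_interval, tolerance=0.10):
    """timestamps: arrival times of one peer's state updates, in seconds."""
    if len(timestamps) < 2:
        return False  # not enough evidence either way
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    mean_interval = sum(intervals) / len(intervals)
    # updating consistently faster than the tick rate is suspicious
    return mean_interval < nominal_interval * (1 - tolerance)

honest = [0.0, 0.10, 0.20, 0.31, 0.40]   # ~10 updates/s, the nominal rate
cheat = [0.0, 0.05, 0.10, 0.15, 0.20]    # 20 updates/s: double speed
print(is_fast_rate_cheater(honest, nominal_interval=0.10))  # False
print(is_fast_rate_cheater(cheat, nominal_interval=0.10))   # True
```

Because this only observes communication patterns, it matches the paper's point that detection, unlike prevention, adds no overhead to the game protocol itself.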


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号