Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
Colearning in Differential Games   Cited by: 1 (self-citations: 0, citations by others: 1)
Sheppard, John W. Machine Learning, 1998, 33(2-3): 201-233
Game playing has been a popular problem area for research in artificial intelligence and machine learning for many years. In almost every study of game playing and machine learning, the focus has been on games with a finite set of states and a finite set of actions. Further, most of this research has focused on a single player or team learning how to play against another player or team that is applying a fixed strategy for playing the game. In this paper, we explore multiagent learning in the context of game playing and develop algorithms for colearning in which all players attempt to learn their optimal strategies simultaneously. Specifically, we address two approaches to colearning, demonstrating strong performance by a memory-based reinforcement learner and comparable but faster performance with a tree-based reinforcement learner.

2.
Machine learning is traditionally formalized and investigated as the study of learning concepts and decision functions from labeled examples, requiring a representation that encodes information about the domain of the decision function to be learned. We are interested in providing a way for a human teacher to interact with an automated learner using natural instructions, thus allowing the teacher to communicate the relevant domain expertise to the learner without necessarily knowing anything about the internal representations used in the learning process. In this paper we propose viewing the process of learning a decision function as a natural language lesson interpretation problem, as opposed to learning from labeled examples. This view of machine learning is motivated by human learning processes, in which the learner is given a lesson describing the target concept directly and a few instances exemplifying it. We introduce a learning algorithm for the lesson interpretation problem that receives feedback from its performance on the final task, while learning jointly (1) how to interpret the lesson and (2) how to use this interpretation to do well on the final task. This approach differs from traditional machine learning by focusing on supplying the learner only with information that can be provided by a task expert. We evaluate our approach by applying it to the rules of the solitaire card game. We show that our learning approach can eventually use natural language instructions to learn the target concept and play the game legally. Furthermore, we show that the learned semantic interpreter also generalizes to previously unseen instructions.

3.
We present a novel and uniform formulation of the problem of reinforcement learning against bounded memory adaptive adversaries in repeated games, and the methodologies to accomplish learning in this novel framework. First we delineate a novel strategic definition of best response that optimises rewards over multiple steps, as opposed to the notion of tactical best response in game theory. We show that the problem of learning a strategic best response reduces to that of learning an optimal policy in a Markov Decision Process (MDP). We deal with both finite and infinite horizon versions of this problem. We adapt an existing Monte Carlo based algorithm for learning optimal policies in such MDPs over a finite horizon, in polynomial time. We show that this new efficient algorithm can obtain higher average rewards than a previously known efficient algorithm against some opponents in the contract game. Though this improvement comes at the cost of increased domain knowledge, simple experiments in the Prisoner's Dilemma and coordination games show that even when no extra domain knowledge (besides a known upper bound on the opponent's memory size) is assumed, the error can still be small. We also experiment with a general infinite-horizon learner (using function approximation to tackle the complexity of history space) against a greedy bounded memory opponent and show that while it can create and exploit opportunities of mutual cooperation in the Prisoner's Dilemma game, it is cautious enough to ensure minimax payoffs in the Rock–Scissors–Paper game.

4.
Employing a mixed-method explorative approach, this study examined the in situ use of and opinions about an educational computer game for learning English introduced in three schools offering different levels of freedom to choose school activities. The results indicated that the general behaviour of the children with the game was very different for each of the schools, while there were no significant differences in subjective opinions or previous computer game experience as measured with a questionnaire. The gaming records and interviews indicated that children do enjoy playing the game in comparison with other formal learning activities, but appreciate it less as a leisure-time activity. Furthermore, it appears that children used to teacher-initiated activities tend to depend on their teacher’s directions for how and when to play. The study highlights the level of choice as one of the important aspects to consider when introducing a game in the classroom. The study also offers some suggestions for the design of educational games, such as providing communication possibilities between players and integrating fast-paced motor-skill based games with learning content in a meaningful way.

5.
The game of Tantrix™ provides a challenging, mathematical and graphic domain for evolutionary computation. The simple task of forming long loops of colored arcs quickly becomes a search nightmare for humans and computers alike as the number of game pieces scales linearly. This paper introduces Tantrix-GA, a genetic algorithm that solves several types and sizes of Tantrix puzzles but still falls well short of (at least a few) human Tantrix experts. By introducing this problem to evolutionary computation researchers, we hope to motivate an evolutionary attack on the holy-grail Tantrix puzzles, one of which has yet to be solved by any intelligence, real or artificial.

6.
7.
An over-zealous machine learner can automatically generate large, intricate theories which can be hard to understand. However, such intricate learning is not necessary in domains that lack complex relationships. A much simpler learner can suffice in domains with narrow funnels; i.e. where most domain variables are controlled by a very small subset. Such a learner is TAR2: a weighted-class minimal contrast-set association rule learner that utilizes confidence-based pruning, but not support-based pruning. TAR2 learns treatments; i.e. constraints that can change an agent’s environment. Treatments take two forms. Controller treatments hold the smallest number of conjunctions that most improve the current state of the system. Monitor treatments hold the smallest number of conjunctions that best detect future faulty system behavior. Such treatments tell an agent what to do (apply the controller) and what to watch for (the monitor conditions) within the current environment. Because TAR2 generates very small theories, our experience has been that users prefer its tiny treatments. The success of such a simple learner suggests that many domains lack complex relationships.

8.
Neri, Filippo. Machine Learning, 2000, 38(1-2): 181-211
The goal of the reported research is the development of a computational approach that could help a cognitive scientist to interactively represent a learner's mental models, and to automatically validate their coherence with respect to the available experimental data. In a reported case-study, the student's mental models are inferred from questionnaires and interviews collected during a sequence of teaching sessions. These putative cognitive models are based on a theory of knowledge representation, derived from psychological results and educational studies, which accounts for the evolution of the student's knowledge over a learning period. The learning system WHY, able to handle (causal) domain knowledge, shows how to model the answers and the causal explanations given by the learner.

9.
Proof planning is a technique for theorem proving which replaces the ultra-efficient but blind search of classical theorem proving systems by an informed knowledge-based planning process that employs mathematical knowledge at a human-oriented level of abstraction. Standard proof planning uses methods as operators and control rules to find an abstract proof plan which can be expanded (using tactics) down to the level of the underlying logic calculus. In this paper, we propose more flexible refinements and a modification of the proof planner with an additional strategic level of control above the previous proof planning control. This strategic control guides the cooperation of the problem solving strategies by meta-reasoning. We present a general framework for proof planning with multiple strategies and describe its implementation in the Multi system. The benefits are illustrated by several large case studies, which significantly push the limits of what can be achieved by a machine today.

10.
We consider the learning problem faced by two self-interested agents repeatedly playing a general-sum stage game. We assume that the players can observe each other’s actions but not the payoffs received by the other player. The concept of Nash Equilibrium in repeated games provides an individually rational solution for playing such games and can be achieved by playing the Nash Equilibrium strategy for the single-shot game in every iteration. Such a strategy, however, can sometimes lead to a Pareto-Dominated outcome for games like Prisoner’s Dilemma. So we prefer learning strategies that converge to a Pareto-Optimal outcome that also produces a Nash Equilibrium payoff for repeated two-player, n-action general-sum games. The Folk Theorem enables us to identify such outcomes. In this paper, we introduce the Conditional Joint Action Learner (CJAL), which learns the conditional probability of an action taken by the opponent given its own actions and uses it to decide its next course of action. We empirically show that under self-play and if the payoff structure of the Prisoner’s Dilemma game satisfies certain conditions, a CJAL learner, using a random exploration strategy followed by a completely greedy exploitation technique, will learn to converge to a Pareto-Optimal solution. We also show that such learning will generate Pareto-Optimal payoffs in a large majority of other two-player general-sum games. We compare the performance of CJAL with that of existing algorithms such as WoLF-PHC and JAL on all structurally distinct two-player conflict games with ordinal payoffs.
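
The conditional-probability idea in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name, the method names, and the Prisoner's Dilemma payoff values are all assumptions introduced here, and the random exploration phase is left out so that only the greedy step is shown.

```python
from collections import defaultdict


class ConditionalJointActionLearner:
    """Hypothetical sketch of a learner that estimates P(opponent action | own action)."""

    def __init__(self, payoff):
        # payoff[(own_action, opponent_action)] -> reward received by this learner
        self.payoff = payoff
        self.joint_counts = defaultdict(int)  # times (own, opp) was observed
        self.own_counts = defaultdict(int)    # times own action was played

    def observe(self, own, opp):
        """Record one round of play."""
        self.joint_counts[(own, opp)] += 1
        self.own_counts[own] += 1

    def conditional(self, opp, own):
        """Empirical estimate of P(opp | own); 0.0 if own was never played."""
        n = self.own_counts[own]
        return self.joint_counts[(own, opp)] / n if n else 0.0

    def expected_payoff(self, own):
        """Expected reward of an own action under the estimated conditionals."""
        opp_actions = {b for (_, b) in self.payoff}
        return sum(self.conditional(b, own) * self.payoff[(own, b)]
                   for b in opp_actions)

    def greedy_action(self):
        """Pick the own action with the highest estimated expected reward."""
        own_actions = sorted({a for (a, _) in self.payoff})
        return max(own_actions, key=self.expected_payoff)


# Assumed Prisoner's Dilemma payoffs for the row player (illustrative values only).
PD_PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
```

If the opponent is observed to mostly mirror the learner's action, the estimated expected payoff of cooperating exceeds that of defecting, so the greedy step picks cooperation; this is one way to read how such a learner could reach the Pareto-Optimal outcome the abstract describes.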

11.
An active learner has a collection of data points, each with a label that is initially hidden but can be obtained at some cost. Without spending too much, it wishes to find a classifier that will accurately map points to labels. There are two common intuitions about how this learning process should be organized: (i) by choosing query points that shrink the space of candidate classifiers as rapidly as possible; and (ii) by exploiting natural clusters in the (unlabeled) data set. Recent research has yielded learning algorithms for both paradigms that are efficient, work with generic hypothesis classes, and have rigorously characterized labeling requirements. Here we survey these advances by focusing on two representative algorithms and discussing their mathematical properties and empirical performance.

12.
Spatial ability is a critical skill in geometric learning. Several studies investigate how to use digital games to improve spatial abilities. However, not every learner favors this kind of support. There is therefore a need to examine how human factors affect learners’ reactions to the use of a digital game to support geometric learning. This paper addresses the issue by developing a digital pentominoes game and examining the effects of two essential human factors, namely gender differences and spatial abilities, on students’ performance. The results demonstrate that students’ spatial abilities were significantly improved after they played the digital pentominoes game. The results also demonstrate that the digital game can reasonably reduce the differences between boys and girls. Moreover, the major gender differences lie within mental rotation among the three types of spatial ability and also mainly exist in the low spatial ability group. Finally, the findings are applied to develop a framework that can be used to enhance the understanding of gender differences and spatial abilities within the digital pentominoes game.

13.
In this paper we maintain that there are benefits to extending the scope of student models to include additional information as part of the explicit student model. We illustrate our argument by describing a student model which focuses on 1. performance in the domain; 2. acquisition order of the target knowledge; 3. analogy; 4. learning strategies; 5. awareness and reflection. The first four of these issues are explicitly represented in the student model. Awareness and reflection should occur because the student model is transparent; it is used to promote learner reflection by encouraging the learner to view, and even negotiate changes to, the model. Although the architecture is transferable across domains, each instantiation of the student model will necessarily be domain specific due to the importance of factors such as the relevant background knowledge for analogy, and typical progress through the target material. As an example of this approach we describe the student model of an intelligent computer assisted language learning system which was based on research findings on the above five topics in the field of second language acquisition. Throughout we address the issue of the generality of this model, with particular reference to the possibility of a similar architecture reflecting comparable issues in the domain of learning about electrical circuits.

14.
User modelling within tutoring systems often concentrates on the representation of the learner's status with respect to the domain, paying little attention to the user's individual characteristics in terms of capabilities and preferences. A composite learner model, incorporating both domain related data and information about personal attributes, is useful in determining not only which items should be presented, but how the student may best be able to learn them. A model of users' individual characteristics has been developed using multivariate statistical techniques as a means of generating user stereotypes from empirical data. Each stereotype has an associated profile in terms of attributes which are useful for the application in which the model is used. This paper describes the development of the model of learner attributes and its use within an adaptive tutoring system. The representation of the domain related information was in this case a basic overlay model. The results of experiments using the system with two classes of students in two successive academic years are discussed. The possibilities for application of the user model in other areas and the potential effects of combining an attribute learner model of this type with more sophisticated domain models are considered.

15.
Kulkarni, S.R.; Mitter, S.K.; Tsitsiklis, J.N. Machine Learning, 1993, 11(1): 23-35
The original and most widely studied PAC model for learning assumes a passive learner in the sense that the learner plays no role in obtaining information about the unknown concept. That is, the samples are simply drawn independently from some probability distribution. Some work has been done on studying more powerful oracles and how they affect learnability. To find bounds on the improvement in sample complexity that can be expected from using oracles, we consider active learning in the sense that the learner has complete control over the information received. Specifically, we allow the learner to ask arbitrary yes/no questions. We consider both active learning under a fixed distribution and distribution-free active learning. In the case of active learning, the underlying probability distribution is used only to measure distance between concepts. For learnability with respect to a fixed distribution, active learning does not enlarge the set of learnable concept classes, but can improve the sample complexity. For distribution-free learning, it is shown that a concept class is actively learnable iff it is finite, so that active learning is in fact less powerful than the usual passive learning model. We also consider a form of distribution-free learning in which the learner knows the distribution being used, so that distribution-free refers only to the requirement that a bound on the number of queries can be obtained uniformly over all distributions. Even with the side information of the distribution being used, a concept class is actively learnable iff it has finite VC dimension, so that active learning with the side information still does not enlarge the set of learnable concept classes.

16.
The Strength of Weak Learnability   Cited by: 136 (self-citations: 0, citations by others: 136)
This paper addresses the problem of improving the accuracy of an hypothesis output by a learning algorithm in the distribution-free (PAC) learning model. A concept class is learnable (or strongly learnable) if, given access to a source of examples of the unknown concept, the learner with high probability is able to output an hypothesis that is correct on all but an arbitrarily small fraction of the instances. The concept class is weakly learnable if the learner can produce an hypothesis that performs only slightly better than random guessing. In this paper, it is shown that these two notions of learnability are equivalent. A method is described for converting a weak learning algorithm into one that achieves arbitrarily high accuracy. This construction may have practical applications as a tool for efficiently converting a mediocre learning algorithm into one that performs extremely well. In addition, the construction has some interesting theoretical consequences, including a set of general upper bounds on the complexity of any strong learning algorithm as a function of the allowed error ε.
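
As a rough intuition for how hypotheses only slightly better than chance can be combined into an accurate one, the sketch below computes the error of a majority vote over three hypotheses. It rests on a simplifying assumption that is not taken from the abstract: the three hypotheses err independently with the same rate `e`, in which case the vote is wrong with probability 3e^2 - 2e^3. The function names are hypothetical, and this is an illustration of the amplification effect rather than the paper's construction.

```python
def majority_error(e: float) -> float:
    """Error of a 3-way majority vote when each voter errs independently with rate e.

    The vote is wrong when at least two voters are wrong:
    3 * e^2 * (1 - e) + e^3 = 3e^2 - 2e^3.
    """
    return 3 * e ** 2 - 2 * e ** 3


def amplify(e: float, rounds: int) -> float:
    """Apply the majority-vote step recursively `rounds` times."""
    for _ in range(rounds):
        e = majority_error(e)
    return e
```

For any starting error below 1/2 the value shrinks at every round, while an error of exactly 1/2 (random guessing) is a fixed point, which mirrors the requirement that the weak learner be at least slightly better than chance.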

17.
Adapting to learner characteristics is essential when selecting exercises for learners in an intelligent tutoring system. This paper investigates how humans adapt next exercise selection (in particular difficulty level) to learner personality, invested mental effort, and performance to inspire an adaptive exercise selection algorithm. First, the paper describes the investigations to produce validated materials for the main studies, namely the creation and validation of self-esteem personality stories, mental effort statements, and mathematical exercises with varying levels of difficulty. Next, through empirical studies, we investigate the impact on exercise selection of learner's self-esteem (low versus high self-esteem) and effort (minimal, little, moderate, much, and all possible effort). Three studies investigate this for learners who had different performances on a previous exercise: just passing, just failing, and performed well. Participants considered a fictional learner with a certain performance, self-esteem and effort, and selected the difficulty level of the next mathematical exercise. We found that self-esteem, mental effort, and performance all impacted the difficulty level of the exercises selected for learners. Finally, using the results from the studies, we propose an algorithm that selects exercises with varying difficulty levels adapted to learner characteristics.

18.
19.
Search procedures, such as alpha-beta and SSS*, are used to solve minimax game trees. With the notable exception of B*, most of these procedures assume the static model, i.e., the computation is done solely on the basis of static values given to terminal nodes. The first goal of this paper is to generalize these to the informed model, which permits the usage of heuristic information pertaining to nonterminal nodes, such as upper and lower bounds, and estimates, of the exact values realizable from the corresponding game positions. We provide a general framework, within which various conventional procedures including alpha-beta and SSS* can be naturally generalized to the informed model. For the static model, it is known that SSS* surpasses alpha-beta in the sense that it explores only a subset of the nodes which are explored by alpha-beta. The second goal of this paper is, assuming the informed model, to develop a precise characterization of the class of search procedures that surpass alpha-beta. It turns out that the class contains many search procedures other than SSS* (even for the static model). Finally, some computational comparison among these search procedures is made by solving the 4 × 4 Othello game.
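
For readers unfamiliar with the baseline that the abstract above generalizes, here is a minimal sketch of textbook alpha-beta pruning in the static model, where only terminal nodes carry values. It does not implement SSS*, B*, or the informed model; the nested-list tree encoding and the `visited` bookkeeping are assumptions made for this illustration.

```python
from typing import List, Optional, Union

# A tree is either an int (static value of a terminal node) or a list of subtrees.
Tree = Union[int, List["Tree"]]


def minimax(node: Tree, maximizing: bool = True) -> int:
    """Exhaustive minimax value, used here only as a reference."""
    if isinstance(node, int):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def alphabeta(node: Tree,
              alpha: float = float("-inf"),
              beta: float = float("inf"),
              maximizing: bool = True,
              visited: Optional[List[int]] = None) -> float:
    """Minimax value with alpha-beta cutoffs; records visited leaves if asked."""
    if isinstance(node, int):
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, best)
            if alpha >= beta:   # the minimizer above will never allow this line
                break
        return best
    best = float("inf")
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, best)
        if alpha >= beta:       # the maximizer above already has a better option
            break
    return best
```

On the small tree `[[3, 5], [2, 9]]` with the maximizer to move at the root, the search returns the same value as plain minimax but never evaluates the leaf `9`, since the second subtree is already known to be worth at most 2.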

20.
In this paper, we introduce a chess program able to adapt its game strategy to its opponent, as well as to adapt the evaluation function that guides the search process according to its playing experience. The adaptive and learning abilities have been implemented through Bayesian networks. We show how the program learns through an experiment consisting of a series of games, which indicates that the results improve after the learning stage.