Similar Literature
1.
This paper investigates how adaptive software agents may utilize reinforcement learning algorithms such as Q-learning to make economic decisions such as setting prices in a competitive marketplace. For a single adaptive agent facing fixed-strategy opponents, ordinary Q-learning is guaranteed to find the optimal policy. However, for a population of agents each trying to adapt in the presence of other adaptive agents, the problem becomes non-stationary and history dependent, and it is not known whether any global convergence will be obtained, and if so, whether such solutions will be optimal. In this paper, we study simultaneous Q-learning by two competing seller agents in three moderately realistic economic models. This is the simplest case in which interesting multi-agent phenomena can occur, and the state space is small enough so that lookup tables can be used to represent the Q-functions. We find that, despite the lack of theoretical guarantees, simultaneous convergence to self-consistent optimal solutions is obtained in each model, at least for small values of the discount parameter. In some cases, exact or approximate convergence is also found even at large discount parameters. We show how the Q-derived policies increase profitability and damp out or eliminate cyclic price wars compared to simpler policies based on zero lookahead or short-term lookahead. In one of the models (the Shopbot model) where the sellers' profit functions are symmetric, we find that Q-learning can produce either symmetric or broken-symmetry policies, depending on the discount parameter and on initial conditions.
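A minimal sketch of simultaneous tabular Q-learning by two sellers, assuming a hypothetical Bertrand-style market (the cheaper seller captures all demand, ties split it), a hypothetical six-point price grid and unit cost, and a state equal to the opponent's last price. It is not one of the paper's three economic models; it only shows how each agent's lookup-table update treats the other adaptive agent as part of a non-stationary environment.

```python
import random

PRICES = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  # hypothetical discrete price grid
COST = 0.4  # hypothetical unit cost


def profits(p1, p2):
    """Hypothetical market: the cheaper seller gets all demand; ties split it."""
    if p1 < p2:
        return p1 - COST, 0.0
    if p2 < p1:
        return 0.0, p2 - COST
    half = (p1 - COST) / 2
    return half, half


def train(episodes=20000, alpha=0.1, gamma=0.5, eps=0.1, seed=0):
    rng = random.Random(seed)
    n = len(PRICES)
    # Q[agent][state][action]; state = index of the opponent's last price
    Q = [[[0.0] * n for _ in range(n)] for _ in range(2)]
    state = [rng.randrange(n), rng.randrange(n)]
    for _ in range(episodes):
        acts = []
        for i in range(2):
            if rng.random() < eps:  # epsilon-greedy exploration
                acts.append(rng.randrange(n))
            else:
                row = Q[i][state[i]]
                acts.append(max(range(n), key=row.__getitem__))
        r = profits(PRICES[acts[0]], PRICES[acts[1]])
        next_state = [acts[1], acts[0]]  # each seller observes the other's price
        for i in range(2):
            # standard one-step Q-learning target; the opponent is "environment"
            target = r[i] + gamma * max(Q[i][next_state[i]])
            Q[i][state[i]][acts[i]] += alpha * (target - Q[i][state[i]][acts[i]])
        state = next_state
    return Q


def greedy_policy(q_agent):
    """Map each observed opponent price to this seller's greedy price."""
    return [PRICES[max(range(len(row)), key=row.__getitem__)] for row in q_agent]
```

Here both sellers move simultaneously; the paper's setting may differ (for example, alternating moves), so treat this as an illustration of the update rule only.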

2.
Continuous-Action Q-Learning (cited 1 time: 0 self-citations, 1 by others)
This paper presents a Q-learning method that works in continuous domains. Other characteristics of our approach are the use of an incremental topology preserving map (ITPM) to partition the input space, and the incorporation of bias to initialize the learning process. A unit of the ITPM represents a limited region of the input space and maps it onto the Q-values of M possible discrete actions. The resulting continuous action is an average of the discrete actions of the winning unit weighted by their Q-values. Then, TD(λ) updates the Q-values of the discrete actions according to their contribution. Units are created incrementally and their associated Q-values are initialized by means of domain knowledge. Experimental results in robotics domains show the superiority of the proposed continuous-action Q-learning over the standard discrete-action version in terms of both asymptotic performance and speed of learning. The paper also reports a comparison of discounted-reward against average-reward Q-learning in an infinite horizon robotics task.
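The action-blending step described above can be sketched as follows, assuming non-negative Q-values (so they can serve as weights), a plain nearest-prototype search in place of the full incremental topology preserving map, and a one-step TD update instead of the paper's TD(λ). All function names are hypothetical.

```python
import math


def winning_unit(units, x):
    """Nearest prototype (Euclidean); a stand-in for the ITPM winner."""
    return min(range(len(units)), key=lambda k: math.dist(units[k]["center"], x))


def continuous_action(actions, q):
    """Average of the discrete actions weighted by their (non-negative) Q-values."""
    total = sum(q)
    if total <= 0:
        return sum(actions) / len(actions)  # fall back to a plain average
    return sum(a * w for a, w in zip(actions, q)) / total


def td_update(q, reward, next_value, alpha=0.1, gamma=0.9):
    """One-step TD: credit each discrete action by its share of the blended action."""
    total = sum(q)
    n = len(q)
    weights = [w / total for w in q] if total > 0 else [1.0 / n] * n
    value = sum(w * qi for w, qi in zip(weights, q))  # value of the blended action
    delta = reward + gamma * next_value - value
    return [qi + alpha * delta * w for qi, w in zip(q, weights)]
```

For example, discrete actions `[-1.0, 0.0, 1.0]` with Q-values `[1.0, 1.0, 2.0]` blend to the continuous action 0.25.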

3.
Incremental Multi-Step Q-Learning (cited 23 times: 0 self-citations, 23 by others)
Peng  Jing  Williams  Ronald J. 《Machine Learning》1996,22(1-3):283-290
This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic-programming-based reinforcement learning method, with the TD(λ) return estimation process, which is typically used in actor-critic learning, another well-known dynamic-programming-based reinforcement learning method. The λ parameter is used to distribute credit throughout sequences of actions, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization. The resulting algorithm, Q(λ)-learning, thus combines some of the best features of the Q-learning and actor-critic learning paradigms. The behavior of this algorithm has been demonstrated through computer simulations.
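A minimal tabular sketch of the Q(λ) update, following the form in which Peng and Williams' incremental algorithm is commonly stated: every traced state-action pair is adjusted by the error e_t measured against V(x_t) = max_a Q(x_t, a), while the pair just visited additionally receives the one-step error e'_t. The class name, dictionary storage, and default step sizes are assumptions; check the details against the paper before relying on them.

```python
from collections import defaultdict


class PengQLambda:
    """Hypothetical tabular Q(lambda) learner with accumulating traces."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, lam=0.8):
        self.actions = list(actions)
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.Q = defaultdict(float)      # (state, action) -> value estimate
        self.trace = defaultdict(float)  # (state, action) -> eligibility

    def value(self, x):
        return max(self.Q[(x, a)] for a in self.actions)

    def step(self, x, a, r, x_next):
        v_next = self.value(x_next)
        e_prime = r + self.gamma * v_next - self.Q[(x, a)]  # one-step error
        e = r + self.gamma * v_next - self.value(x)          # error vs. V(x_t)
        # decay every trace, then credit previously visited pairs with e
        for key in list(self.trace):
            self.trace[key] *= self.gamma * self.lam
            self.Q[key] += self.alpha * self.trace[key] * e
        # the current pair gets the one-step correction and a fresh trace
        self.Q[(x, a)] += self.alpha * e_prime
        self.trace[(x, a)] += 1.0
```

With λ = 0 the traces vanish after one step and the update reduces to ordinary one-step Q-learning.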

4.
Agent-based technology has been identified as an important approach for developing next generation manufacturing systems. One of the key techniques needed for implementing such advanced systems will be learning. This paper first discusses learning issues in agent-based manufacturing systems and reviews related approaches, then describes how to enhance the performance of an agent-based manufacturing system through learning from history (based on distributed case-based learning and reasoning) and learning from the future (through system forecasting simulation). Learning from history is used to enhance coordination capabilities by minimizing communication and processing overheads. Learning from the future is used to adjust promissory schedules through forecasting simulation, by taking into account the shop floor interactions, production and transportation time. Detailed learning and reasoning mechanisms are described and partial experimental results are presented.

5.
6.
Kumiko Ikuta 《AI & Society》1990,4(2):137-146
The role of craft language in the process of teaching (learning) Waza (skill) will be discussed from the perspective of human intelligence. It may be said that the ultimate goal of learning Waza in any Japanese traditional performance is not the perfect reproduction of the teaching (learning) process of Waza. In fact, a special metaphorical language (craft language) is used, which has the effect of encouraging the learner to activate his creative imagination. It is through this activity that he learns his own habitus (Kata). It is suggested that, in considering the difference in function between natural human intelligence and artificial intelligence, attention should be paid to the imaginative activity of the learner as an essential factor for mastering Kata. This article is a modified English version of Chapter 5 of my book Waza kara shiru (Learning from Skill), Tokyo University Press, 1987, pp. 93–105.

7.
Concept learning in robotics is an extremely challenging problem: sensory data is often high dimensional, and noisy due to specularities and other irregularities. In this paper, we investigate two general strategies to speed up learning, based on spatial decomposition of the sensory representation, and simultaneous learning of multiple classes using a shared structure. We study two concept learning scenarios: hallway navigation, where the robot has to induce features such as "opening" or "wall", and recycling, where the robot has to learn to recognize objects such as a trash can. In both studies we use a common underlying function approximator, a feedforward neural network with several hundred input units and multiple output units. Despite the high degree of freedom afforded by such an approximator, we show that the two strategies provide sufficient bias to achieve rapid learning. We provide detailed experimental studies on an actual mobile robot called PAVLOV to illustrate the effectiveness of this approach.

8.
Aizenstein  Howard  Pitt  Leonard 《Machine Learning》1995,19(3):183-208
We present two related results about the learnability of disjunctive normal form (DNF) formulas. First we show that a common approach for learning arbitrary DNF formulas requires exponential time. We then contrast this with a polynomial time algorithm for learning most (rather than all) DNF formulas. A natural approach for learning boolean functions involves greedily collecting the prime implicants of the hidden function. In a seminal paper of learning theory, Valiant demonstrated the efficacy of this approach for learning monotone DNF, and suggested this approach for learning DNF. Here we show that no algorithm using such an approach can learn DNF in polynomial time. We show this by constructing a counterexample DNF formula which would force such an algorithm to take exponential time. This counterexample seems to capture much of what makes DNF hard to learn, and thus is useful to consider when evaluating the run-time of a proposed DNF learning algorithm. This hardness result, as well as other hardness results for learning DNF, relies on the construction of particular hard-to-learn formulas, formulas that appear to be relatively rare. This raises the question of whether most DNF formulas are learnable. For certain natural definitions of most DNF formulas, we answer this question affirmatively.
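The greedy step mentioned above, shrinking a positive example into a prime implicant of the hidden function, can be sketched as follows. The membership-oracle interface `f` and the brute-force implicant test (exponential in the number of free variables, so only usable with a handful of variables) are assumptions for illustration; this is not the paper's algorithm, and the paper shows that approaches built on this kind of greedy collection cannot learn all DNF in polynomial time.

```python
from itertools import product


def is_implicant(f, term, n):
    """True if every assignment satisfying `term` (dict var -> bool) satisfies f."""
    free = [i for i in range(n) if i not in term]
    for bits in product([False, True], repeat=len(free)):
        x = [None] * n
        for i, v in term.items():
            x[i] = v
        for i, b in zip(free, bits):
            x[i] = b
        if not f(x):
            return False
    return True


def reduce_to_prime(f, positive, n):
    """Greedily drop literals from a positive example while it stays an implicant."""
    term = {i: v for i, v in enumerate(positive)}
    for i in range(n):
        trial = {k: v for k, v in term.items() if k != i}
        if is_implicant(f, trial, n):
            term = trial
    return term
```

A single left-to-right pass suffices: if dropping a literal fails once, it keeps failing after further literals are dropped, because the term only becomes more general.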

9.
The sharpest visible divide in Internet use, which has deepened in recent years, is an educational one. Especially with regard to the learning disabled, this educational digital divide calls for improved inclusive teaching measures that promote media competence. A major prerequisite in this respect, which as a basic architectural principle determines system design, is that tutorial learning systems support evolutionary learning: they should be designed as guidance systems that closely follow each pupil's individual evolutionary process.

10.
Improving Generalization with Active Learning (cited 29 times: 0 self-citations, 29 by others)
Cohn  David  Atlas  Les  Ladner  Richard 《Machine Learning》1994,15(2):201-221
Active learning differs from learning from examples in that the learning algorithm assumes at least some control over what part of the input domain it receives information about. In some situations, active learning is provably more powerful than learning from examples alone, giving better generalization for a fixed number of training examples. In this article, we consider the problem of learning a binary concept in the absence of noise. We describe a formalism for active concept learning called selective sampling and show how it may be approximately implemented by a neural network. In selective sampling, a learner receives distribution information from the environment and queries an oracle on parts of the domain it considers useful. We test our implementation, called an SG-network, on three domains and observe significant improvement in generalization. A preliminary version of this article appears as Cohn et al. (1990).
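The core of selective sampling, querying the oracle only inside the current region of uncertainty, can be illustrated on the simplest noise-free concept class, a one-dimensional threshold. The version-space interval used here is a swapped-in stand-in for the paper's SG-network, and the function and parameter names are hypothetical.

```python
def selective_sampling(oracle, stream, lo=0.0, hi=1.0):
    """Learn a threshold t with label(x) = (x >= t), assuming lo < t <= hi.

    Only points inside the region of uncertainty (lo, hi) are sent to the
    oracle; points outside it are already labeled by every consistent
    hypothesis, so querying them would add no information.
    """
    queries = 0
    for x in stream:
        if lo < x < hi:
            queries += 1
            if oracle(x):
                hi = x  # x is positive, so t <= x
            else:
                lo = x  # x is negative, so t > x
    return lo, hi, queries
```

On a stream of a thousand points spread over [0, 1], only a small fraction fall inside the shrinking interval, yet the final (lo, hi) is as tight as if every point had been labeled.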

11.
Learning to Recognize Volcanoes on Venus (cited 1 time: 0 self-citations, 1 by others)
Burl  Michael C.  Asker  Lars  Smyth  Padhraic  Fayyad  Usama  Perona  Pietro  Crumpler  Larry  Aubele  Jayne 《Machine Learning》1998,30(2-3):165-194
Dramatic improvements in sensor and image acquisition technology have created a demand for automated tools that can aid in the analysis of large image databases. We describe the development of JARtool, a trainable software system that learns to recognize volcanoes in a large data set of Venusian imagery. A machine learning approach is used because it is much easier for geologists to identify examples of volcanoes in the imagery than it is to specify domain knowledge as a set of pixel-level constraints. This approach can also provide portability to other domains without the need for explicit reprogramming; the user simply supplies the system with a new set of training examples. We show how the development of such a system requires a completely different set of skills than are required for applying machine learning to toy world domains. This paper discusses important aspects of the application process not commonly encountered in the toy world, including obtaining labeled training data, the difficulties of working with pixel data, and the automatic extraction of higher-level features.

12.
Auer  Peter  Long  Philip M.  Maass  Wolfgang  Woeginger  Gerhard J. 《Machine Learning》1995,18(2-3):187-230
The majority of results in computational learning theory are concerned with concept learning, i.e. with the special case of function learning for classes of functions with range {0, 1}. Much less is known about the theory of learning functions with a larger range such as $\mathbb{N}$ or $\mathbb{R}$. In particular, relatively few results exist about the general structure of common models for function learning, and there are only very few nontrivial function classes for which positive learning results have been exhibited in any of these models. We introduce in this paper the notion of a binary branching adversary tree for function learning, which allows us to give a somewhat surprising equivalent characterization of the optimal learning cost for learning a class of real-valued functions (in terms of a max-min definition which does not involve any learning model). Another general structural result of this paper relates the cost for learning a union of function classes to the learning costs for the individual function classes. Furthermore, we exhibit an efficient learning algorithm for learning convex piecewise linear functions from $\mathbb{R}^d$ into $\mathbb{R}$. Previously, the class of linear functions from $\mathbb{R}^d$ into $\mathbb{R}$ was the only class of functions with multidimensional domain that was known to be learnable within the rigorous framework of a formal model for online learning. Finally, we give a sufficient condition for an arbitrary class $F$ of functions from $\mathbb{R}$ into $\mathbb{R}$ that allows us to learn the class of all functions that can be written as the pointwise maximum of $k$ functions from $F$. This allows us to exhibit a number of further nontrivial classes of functions from $\mathbb{R}$ into $\mathbb{R}$ for which there exist efficient learning algorithms.

13.
The mobile communication revolution has led to pervasive connectedness (as evidenced by the explosive growth of instant messaging in the home and, more recently, in the enterprise) and, together with the convergence of mobile computing, provides a basis for extending collaborative environments toward truly ubiquitous immersion. Leveraging the true anytime/anywhere access afforded by mobile computing, it becomes possible to develop applications that not only respond to users whenever and wherever, on demand, but also actively seek out and engage users when the need arises. Thus, immersive environments need no longer be thought of strictly in terms of physical immersion with clearly discernible enter and exit events; rather, through mobile-enabled computing, they may be extended toward ubiquity in terms of both time and space. Based on Media Synchronicity Theory, potential benefits are envisioned, particularly for collaborative learning environments, from shortened response cycles and increased opportunities for real-time interaction. At the same time, a number of challenging issues must be addressed in designing such an environment to ensure user acceptance and to maximize realization of the potential. Third Generation (3G) Threaded Discussion has been conceptualized as an environment, well suited to mobile learning (m-learning), that could leverage mobile-enabled ubiquity to achieve a degree of extended immersion and thereby accrue the associated collaboration benefits. Exploring this conceptualization helps surface both the opportunities and the challenges associated with such environments and identify promising design approaches, such as the use of intelligent agents. This revised version was published online in March 2005 with corrections to the cover date.

14.
In recent years a new term has arisen, cybercrime, which essentially denotes the use of computer technology to commit, or to facilitate the commission of, unlawful acts or crimes. This article explains why we treat cybercrime as a special class of crime and why we need special statutes to define cybercrime offenses. It explains the relationship between state and federal law, notes the various types of cybercrimes, and surveys the offenses created by state and federal law in the United States.

15.
Many teachers adopt networked collaborative learning strategies even though these approaches systematically increase the time needed to deal with a given subject. But who's making them do it? Presumably there has to be a return on investment, in terms of time and, obviously, in terms of educational results, that justifies that commitment. After surveying the particular features of two experimental projects based on networked collaborative learning, the paper offers a series of reflections prompted by observation of the results and the dynamics generated by this specific approach. The purpose of these reflections is to identify some key factors that make it possible to measure the real added value produced by network collaboration in terms of the acquisition of skills, knowledge, methods and attitudes that go beyond the mere learning of content (however fundamental that may be). It is precisely on the basis of these considerations that teachers usually answer the question above, explaining who (or what) made them do it!

16.
Maass  Wolfgang  Turán  György 《Machine Learning》1994,14(3):251-269
The complexity of on-line learning is investigated for the basic classes of geometrical objects over a discrete (digitized) domain. In particular, upper and lower bounds are derived for the complexity of learning algorithms for axis-parallel rectangles, rectangles in general position, balls, halfspaces, intersections of half-spaces, and semi-algebraic sets. The learning model considered is the standard model for on-line learning from counterexamples.
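For intuition about on-line learning from counterexamples, here is a deliberately naive learner for a single axis-parallel rectangle over a discrete grid: it predicts with the smallest box containing all positive counterexamples seen so far, so its hypothesis always lies inside the target and it only ever errs on positives. The class and method names are hypothetical and this is not one of the paper's algorithms; its worst-case number of mistakes grows with the side length of the grid, whereas the paper derives upper and lower bounds for this problem.

```python
class BoxLearner:
    """Naive on-line learner for one axis-parallel rectangle on an integer grid."""

    def __init__(self):
        self.box = None  # (x_min, x_max, y_min, y_max), or None = predict all negative

    def predict(self, p):
        if self.box is None:
            return False
        x, y = p
        x0, x1, y0, y1 = self.box
        return x0 <= x <= x1 and y0 <= y <= y1

    def update(self, p, label):
        """Called on a counterexample. Since the box stays inside the target,
        counterexamples are always positives; negatives leave the box unchanged."""
        if not label:
            return
        x, y = p
        if self.box is None:
            self.box = (x, x, y, y)
        else:
            x0, x1, y0, y1 = self.box
            self.box = (min(x0, x), max(x1, x), min(y0, y), max(y1, y))
```

Running it over every point of a 10 × 10 grid with the target rectangle [3, 6] × [2, 5] recovers the target exactly after a few counterexamples.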

17.
This paper presents a detailed study of Eurotra Machine Translation engines, namely the mainstream Eurotra software known as the E-Framework and two unofficial spin-offs (the C,A,T and Relaxed Compositionality translator notations), with regard to how these systems handle hard cases, and in particular their ability to handle combinations of such problems. In the C,A,T translator notation, some cases of complex transfer are wild, meaning roughly that they interact badly when presented with other complex cases in the same sentence. The effect of this is that each combination of a wild case and another complex case needs ad hoc treatment. The E-Framework is the same as the C,A,T notation in this respect. In general, the E-Framework is equivalent to the C,A,T notation for the task of transfer. The Relaxed Compositionality translator notation is able to handle each wild case (bar one exception) with a single rule even where it appears in the same sentence as other complex cases.

18.
Dr. T. Ström 《Computing》1972,10(1-2):1-7
It is a commonly occurring problem to find good norms $\|\cdot\|$ or logarithmic norms $\mu(\cdot)$ for a given matrix $A$, in the sense that $\|A\|$ and $\mu(A)$ should be close to the spectral radius $\rho(A)$ and the spectral abscissa $\alpha(A)$, respectively. Examples are the certification that $A$ is convergent, i.e. $\rho(A) \le \|A\| < 1$, or stable, i.e. $\alpha(A) \le \mu(A) < 0$. Often the ordinary norms do not suffice, and one would like to try simple modifications of them, such as applying an ordinary norm to a diagonally transformed matrix. This paper treats this problem for some of the ordinary norms.
Minimization of Norms and Logarithmic Norms by Diagonal Transformations
Summary. A frequently occurring practical problem is the construction of good norms $\|\cdot\|$ and logarithmic norms $\mu(\cdot)$ for a given matrix $A$. "Good" here means that $\|A\|$ approximates the spectral radius $\rho(A) = \max_i |\lambda_i|$ well and that $\mu(A)$ approximates the spectral abscissa $\alpha(A) = \max_i \operatorname{Re} \lambda_i$ well. Examples arise for convergent matrices, where $\rho(A) \le \|A\| < 1$ is desired, and for stable matrices, where $\alpha(A) \le \mu(A) < 0$ is to be shown. We investigate here how far one can get with diagonal transformations and the most common norms.
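As a small numerical illustration of why diagonal transformations help, the sketch below compares the maximum-row-sum norm $\|A\|_\infty$ and the corresponding logarithmic norm $\mu_\infty(A) = \max_i (a_{ii} + \sum_{j \ne i} |a_{ij}|)$ (for real $A$) before and after the similarity transform $D^{-1} A D$ with $D = \operatorname{diag}(d)$, which leaves the eigenvalues unchanged. The matrices, the scaling vector, and the function names are made-up examples, not taken from the paper.

```python
def inf_norm(A):
    """Maximum absolute row sum, ||A||_inf."""
    return max(sum(abs(v) for v in row) for row in A)


def log_inf_norm(A):
    """Logarithmic infinity-norm of a real matrix: max_i (a_ii + sum_{j != i} |a_ij|)."""
    n = len(A)
    return max(A[i][i] + sum(abs(A[i][j]) for j in range(n) if j != i) for i in range(n))


def diag_similarity(A, d):
    """D^{-1} A D with D = diag(d): entry (i, j) becomes a_ij * d_j / d_i."""
    n = len(A)
    return [[A[i][j] * d[j] / d[i] for j in range(n)] for i in range(n)]


# A has eigenvalues 0.5 +/- 0.2, so rho(A) = 0.7, but ||A||_inf = 4.5.
A = [[0.5, 4.0], [0.01, 0.5]]
# B has eigenvalues -1 +/- 0.2, so alpha(B) = -0.8, but mu_inf(B) = 3.
B = [[-1.0, 4.0], [0.01, -1.0]]
d = [1.0, 0.05]
print(inf_norm(A), inf_norm(diag_similarity(A, d)))
print(log_inf_norm(B), log_inf_norm(diag_similarity(B, d)))
```

With this scaling the transformed quantities reach $\rho(A) = 0.7$ and $\alpha(B) = -0.8$ (up to rounding), so the convergence and stability certificates succeed where the untransformed values 4.5 and 3 fail.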

19.
We propose a method for constructing regression trees with range and region splitting. We present an efficient algorithm for computing the optimal two-dimensional region that minimizes the mean squared error of an objective numeric attribute in a given database. As two-dimensional regions, we consider a class $\mathcal{R}$ of grid regions, such as x-monotone, rectilinear-convex, and rectangular regions, in the plane associated with two numeric attributes, and we compute the optimal region $R \in \mathcal{R}$. We propose to use, in the construction of regression trees, a test that splits the data into those that lie inside the region $R$ and those that lie outside it. Experiments confirm that the use of region splitting gives compact and accurate regression trees in many domains.
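For the rectangular case only, the optimal region split can be found by exhaustive search over grid rectangles with 2-D prefix sums, as sketched below in $O(m^2 n^2)$ time on an $m \times n$ grid of buckets. This is a brute-force stand-in, not the paper's efficient algorithm, and it assumes the data have already been bucketed into grid cells holding (count, sum of the objective attribute). Because the total sum of squares is fixed, minimizing the mean squared error of the split is equivalent to maximizing $S_{in}^2/N_{in} + S_{out}^2/N_{out}$.

```python
def best_rectangle(cells):
    """cells[i][j] = (count, sum_y). Returns (i0, i1, j0, j1), half-open bounds,
    of the rectangle whose inside/outside split minimizes squared error."""
    m, n = len(cells), len(cells[0])
    # 2-D prefix sums of counts and of target sums
    PC = [[0] * (n + 1) for _ in range(m + 1)]
    PS = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            c, s = cells[i][j]
            PC[i + 1][j + 1] = c + PC[i][j + 1] + PC[i + 1][j] - PC[i][j]
            PS[i + 1][j + 1] = s + PS[i][j + 1] + PS[i + 1][j] - PS[i][j]
    tot_c, tot_s = PC[m][n], PS[m][n]

    def rect(P, i0, i1, j0, j1):
        return P[i1][j1] - P[i0][j1] - P[i1][j0] + P[i0][j0]

    best, best_gain = None, float("-inf")
    for i0 in range(m):
        for i1 in range(i0 + 1, m + 1):
            for j0 in range(n):
                for j1 in range(j0 + 1, n + 1):
                    c_in = rect(PC, i0, i1, j0, j1)
                    c_out = tot_c - c_in
                    if c_in == 0 or c_out == 0:
                        continue  # not a real split
                    s_in = rect(PS, i0, i1, j0, j1)
                    s_out = tot_s - s_in
                    # SSE = sum(y^2) - gain, so maximizing gain minimizes SSE
                    gain = s_in * s_in / c_in + s_out * s_out / c_out
                    if gain > best_gain:
                        best_gain, best = gain, (i0, i1, j0, j1)
    return best
```

On a 3 × 3 grid where only the centre cell has a large target value, the search isolates exactly that cell.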

20.
We analyze four İnce Memed novels of Yaşar Kemal using six style markers: most frequent words, syllable counts, word type (part-of-speech) information, sentence length in words, word length in text, and word length in vocabulary. For the analysis, we divide each novel into 5,000-word text blocks and count the frequencies of each style marker in these blocks. The style markers showing the best separation are most frequent words and sentence lengths. We use stepwise discriminant analysis to determine the best discriminators of each style marker, and then use these markers in cross-validation-based discriminant analysis. Further investigation based on multivariate analysis of variance (MANOVA) reveals how the attributes of each style marker group distinguish among the volumes.
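The preprocessing described above (fixed-size word blocks, relative frequencies of the most frequent words, and sentence lengths) might look like the sketch below. The tokenization regex, the sentence splitter, and the function names are simplifying assumptions; the discriminant analysis and MANOVA steps would then run on the resulting feature vectors.

```python
import re
from collections import Counter

WORD = re.compile(r"\w+")  # Unicode-aware in Python 3, so "üç" is one word


def text_blocks(text, size=5000):
    """Split a text into consecutive blocks of `size` words; drop the remainder.
    Note: str.lower() does not apply Turkish dotted/dotless i rules."""
    words = WORD.findall(text.lower())
    return [words[i:i + size] for i in range(0, len(words) - size + 1, size)]


def mfw_profile(blocks, k=50):
    """Relative frequency of the k most frequent words (over all blocks) per block."""
    overall = Counter(w for block in blocks for w in block)
    vocab = [w for w, _ in overall.most_common(k)]
    rows = []
    for block in blocks:
        counts = Counter(block)
        rows.append([counts[w] / len(block) for w in vocab])
    return vocab, rows


def sentence_lengths(text):
    """Sentence lengths in words, splitting naively on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(WORD.findall(s)) for s in sentences]
```

For example, `sentence_lengths("Bir iki üç. Dört beş!")` gives `[3, 2]`.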
