Trial and error     
A PAC-learning algorithm is d-space bounded if it stores at most d examples from the sample at any time. We characterize the d-space learnable concept classes. For this purpose we introduce the compression parameter of a concept class B and design our Trial and Error Learning Algorithm. We show: B is d-space learnable if and only if the compression parameter of B is at most d. Unlike previous approaches, this learning algorithm does not produce a hypothesis consistent with the whole sample; Floyd, for example, presents consistent space-bounded learning algorithms but has to restrict herself to very special concept classes. On the other hand, our algorithm needs large samples; the compression parameter appears as an exponent in the sample size. We present several examples of polynomial-time space-bounded learnable concept classes:
  • all intersection-closed concept classes with finite VC-dimension.
  • convex n-gons in ℝ².
  • halfspaces in ℝⁿ.
  • unions of triangles in ℝ².
We further relate the compression parameter to the VC-dimension, and discuss variants of this parameter.

    Learning Conjunctive Concepts in Structural Domains   Total citations: 6 (self-citations: 6, citations by others: 0)
    We study the problem of learning conjunctive concepts from examples on structural domains like the blocks world. This class of concepts is formally defined, and it is shown that even for samples in which each example (positive or negative) is a two-object scene, it is NP-complete to determine if there is any concept in this class that is consistent with the sample. We demonstrate how this result affects the feasibility of Mitchell's version space approach and how it shows that it is unlikely that this class of concepts is polynomially learnable from random examples alone in the PAC framework of Valiant. On the other hand, we show that for any fixed bound on the number of objects per scene, this class is polynomially learnable if, in addition to providing random examples, we allow the learning algorithm to make subset queries. In establishing this result, we calculate the capacity of the hypothesis space of conjunctive concepts in a structural domain and use a general theorem of Vapnik and Chervonenkis. This latter result can also be used to estimate a sample size sufficient for heuristic learning techniques that do not use queries.

    In many machine learning settings, labeled examples are difficult to collect while unlabeled data are abundant. Also, for some binary classification problems, positive examples, which are elements of the target concept, are available. Can these additional data be used to improve the accuracy of supervised learning algorithms? In this paper we investigate the design of learning algorithms from positive and unlabeled data only. Many machine learning and data mining algorithms, such as decision tree induction algorithms and naive Bayes algorithms, use examples only to evaluate statistical queries (SQ-like algorithms). Kearns designed the statistical query learning model in order to describe these algorithms. Here, we design an algorithm scheme which transforms any SQ-like algorithm into an algorithm based on positive statistical queries (estimates of probabilities over the set of positive instances) and instance statistical queries (estimates of probabilities over the instance space). We prove that any class learnable in the statistical query learning model is learnable from positive statistical queries and instance statistical queries alone, provided that a lower bound on the weight of any target concept f can be estimated in polynomial time. Then, we design a decision tree induction algorithm POSC4.5, based on C4.5, that uses only positive and unlabeled examples and we give experimental results for this algorithm. In the case of imbalanced classes, in the sense that one of the two classes (say the positive class) is heavily underrepresented compared to the other class, the learning problem remains open. This problem is challenging because it is encountered in many real-world applications.

    A Note on Learning from Multiple-Instance Examples   Total citations: 7 (self-citations: 0, citations by others: 7)
    Blum, Avrim; Kalai, Adam. Machine Learning, 1998, 30(1): 23-29
    We describe a simple reduction from the problem of PAC-learning from multiple-instance examples to that of PAC-learning with one-sided random classification noise. Thus, all concept classes learnable with one-sided noise, which include all concepts learnable in the usual 2-sided random noise model plus others such as the parity function, are learnable from multiple-instance examples. We also describe a more efficient (and somewhat technically more involved) reduction to the Statistical-Query model that results in a polynomial-time algorithm for learning axis-parallel rectangles with sample complexity Õ(d²r/ε²), saving roughly a factor of r over the results of Auer et al. (1997).
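The link between multiple-instance labels and one-sided noise can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a transcription of the paper's construction: a bag is taken to be positive exactly when at least one of its instances is positive, and one instance per bag is drawn at random and given the bag's label. The names `make_labeled_bags` and `reduce_bags`, the uniform instance distribution, and the threshold target are all hypothetical choices made for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

Instance = float
LabeledBag = Tuple[Sequence[Instance], bool]


def make_labeled_bags(concept: Callable[[Instance], bool], num_bags: int,
                      r: int, seed: int = 0) -> List[LabeledBag]:
    """Simulate multiple-instance data: r instances per bag (assumed uniform on [0, 1])."""
    rng = random.Random(seed)
    bags = []
    for _ in range(num_bags):
        bag = [rng.uniform(0.0, 1.0) for _ in range(r)]
        # A bag is positive iff at least one of its instances is positive.
        bags.append((bag, any(concept(x) for x in bag)))
    return bags


def reduce_bags(bags: Sequence[LabeledBag], seed: int = 0) -> List[Tuple[Instance, bool]]:
    """Turn each labeled bag into one single-instance example carrying the bag's label."""
    rng = random.Random(seed)
    return [(rng.choice(list(bag)), label) for bag, label in bags]


if __name__ == "__main__":
    target = lambda x: x > 0.8  # hypothetical target concept
    examples = reduce_bags(make_labeled_bags(target, num_bags=1000, r=3, seed=1), seed=2)
    flipped = sum(1 for x, label in examples if label and not target(x))
    print("negative instances labeled positive:", flipped)
```

In the resulting examples a truly positive instance always carries a positive label, because its bag must be positive; only truly negative instances can be mislabeled. That asymmetry is what makes the noise one-sided.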

    Sample Compression, Learnability, and the Vapnik-Chervonenkis Dimension   Total citations: 2 (self-citations: 0, citations by others: 2)
    Floyd, Sally; Warmuth, Manfred. Machine Learning, 1995, 21(3): 269-304
    Within the framework of PAC-learning, we explore the learnability of concepts from samples using the paradigm of sample compression schemes. A sample compression scheme of size k for a concept class C ⊆ 2^X consists of a compression function and a reconstruction function. The compression function receives a finite sample set consistent with some concept in C and chooses a subset of k examples as the compression set. The reconstruction function forms a hypothesis on X from a compression set of k examples. For any sample set of a concept in C, the compression set produced by the compression function must lead to a hypothesis consistent with the whole original sample set when it is fed to the reconstruction function. We demonstrate that the existence of a sample compression scheme of fixed size for a class C is sufficient to ensure that the class C is PAC-learnable. Previous work has shown that a class is PAC-learnable if and only if the Vapnik-Chervonenkis (VC) dimension of the class is finite. In the second half of this paper we explore the relationship between sample compression schemes and the VC dimension. We define maximum and maximal classes of VC dimension d. For every maximum class of VC dimension d, there is a sample compression scheme of size d, and for sufficiently large maximum classes there is no sample compression scheme of size less than d. We discuss briefly classes of VC dimension d that are maximal but not maximum. It is an open question whether every class of VC dimension d has a sample compression scheme of size O(d).
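The compression and reconstruction functions described above can be made concrete for a very simple class. The following is a minimal sketch, assuming the concept class of closed intervals on the real line (a class of VC dimension 2) and a scheme that keeps at most two examples; the function names are illustrative and are not taken from the paper.

```python
from typing import Callable, List, Tuple

Example = Tuple[float, bool]  # (point, label)


def compress(sample: List[Example]) -> List[Example]:
    """Keep only the leftmost and rightmost positive examples (at most 2)."""
    positives = [x for x, label in sample if label]
    if not positives:
        return []
    lo, hi = min(positives), max(positives)
    return [(lo, True)] if lo == hi else [(lo, True), (hi, True)]


def reconstruct(compression_set: List[Example]) -> Callable[[float], bool]:
    """Rebuild a hypothesis: the smallest interval covering the kept points."""
    points = [x for x, _ in compression_set]
    if not points:
        return lambda x: False  # no positives were seen: predict negative everywhere
    lo, hi = min(points), max(points)
    return lambda x: lo <= x <= hi


def is_consistent(h: Callable[[float], bool], sample: List[Example]) -> bool:
    """Check that the hypothesis agrees with every label in the sample."""
    return all(h(x) == label for x, label in sample)


if __name__ == "__main__":
    sample = [(0.5, False), (1.0, True), (2.0, True), (3.5, True), (4.0, False)]
    kept = compress(sample)
    print(kept, is_consistent(reconstruct(kept), sample))
```

Whenever the sample is consistent with some interval, no negative point can lie between the extreme positives, so the reconstructed hypothesis agrees with the whole sample even though only two examples were stored.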

    This paper proposes the use of constructive ordinals as mistake bounds in the on-line learning model. This approach elegantly generalizes the applicability of the on-line mistake bound model to learnability analysis of very expressive concept classes like pattern languages, unions of pattern languages, elementary formal systems, and minimal models of logic programs. The main result in the paper shows that the topological property of effective finite bounded thickness is a sufficient condition for on-line learnability with a certain ordinal mistake bound. An interesting characterization of the on-line learning model is shown in terms of the identification in the limit framework. It is established that the classes of languages learnable in the on-line model with a mistake bound of α are exactly the same as the classes of languages learnable in the limit from both positive and negative data by a Popperian, consistent learner with a mind change bound of α. This result nicely builds a bridge between the two models.

    Kulkarni, S.R.; Mitter, S.K.; Tsitsiklis, J.N. Machine Learning, 1993, 11(1): 23-35
    The original and most widely studied PAC model for learning assumes a passive learner in the sense that the learner plays no role in obtaining information about the unknown concept. That is, the samples are simply drawn independently from some probability distribution. Some work has been done on studying more powerful oracles and how they affect learnability. To find bounds on the improvement in sample complexity that can be expected from using oracles, we consider active learning in the sense that the learner has complete control over the information received. Specifically, we allow the learner to ask arbitrary yes/no questions. We consider both active learning under a fixed distribution and distribution-free active learning. In the case of active learning, the underlying probability distribution is used only to measure distance between concepts. For learnability with respect to a fixed distribution, active learning does not enlarge the set of learnable concept classes, but can improve the sample complexity. For distribution-free learning, it is shown that a concept class is actively learnable iff it is finite, so that active learning is in fact less powerful than the usual passive learning model. We also consider a form of distribution-free learning in which the learner knows the distribution being used, so that distribution-free refers only to the requirement that a bound on the number of queries can be obtained uniformly over all distributions. Even with the side information of the distribution being used, a concept class is actively learnable iff it has finite VC dimension, so that active learning with the side information still does not enlarge the set of learnable concept classes.

    This paper is concerned with a sufficient condition under which a concept class is learnable in Gold’s classical model of identification in the limit from positive data. The standard principle of learning algorithms working under this model is called the MINL strategy, which is to conjecture a hypothesis representing a minimal concept among the ones consistent with the given positive data. The minimality of a concept is defined with respect to the set-inclusion relation – the strategy is semantics-based. On the other hand, refinement operators have been developed in the field of learning logic programs, where a learner constructs logic programs as hypotheses consistent with given logical formulae. Refinement operators have syntax-based definitions – they are defined based on inference rules in first-order logic. This paper investigates the relation between the MINL strategy and refinement operators in inductive inference. We first show that if a hypothesis space admits a refinement operator with certain properties, the concept class will be learnable by an algorithm based on the MINL strategy. We then present an additional condition that ensures the learnability of the class of unbounded finite unions of concepts. Furthermore, we show that under certain assumptions a learning algorithm runs in polynomial time.

    We study the problem of PAC-learning Boolean functions with random attribute noise under the uniform distribution. We define a noisy distance measure for function classes and show that if this measure is small for a class and an attribute noise distribution D, then the class is not learnable with respect to the uniform distribution in the presence of noise generated according to D. The noisy distance measure is then characterized in terms of Fourier properties of the function class. We use this characterization to show that the class of all parity functions is not learnable for any but very concentrated noise distributions D. On the other hand, we show that if a class is learnable with respect to the uniform distribution using a standard Fourier-based learning technique, then it is learnable with time and sample complexity also determined by the noisy distance. In fact, we show that this style of algorithm is nearly the best possible for learning in the presence of attribute noise. As an application of our results, we show how to extend such an algorithm for learning AC0 so that it handles certain types of attribute noise with relatively little impact on the running time.

    Valiant (1984) and others have studied the problem of learning various classes of Boolean functions from examples. Here we discuss incremental learning of these functions. We consider a setting in which the learner responds to each example according to a current hypothesis. Then the learner updates the hypothesis, if necessary, based on the correct classification of the example. One natural measure of the quality of learning in this setting is the number of mistakes the learner makes. For suitable classes of functions, learning algorithms are available that make a bounded number of mistakes, with the bound independent of the number of examples seen by the learner. We present one such algorithm that learns disjunctive Boolean functions, along with variants for learning other classes of Boolean functions. The basic method can be expressed as a linear-threshold algorithm. A primary advantage of this algorithm is that the number of mistakes grows only logarithmically with the number of irrelevant attributes in the examples. At the same time, the algorithm is computationally efficient in both time and space.
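A mistake-driven linear-threshold learner of the kind described above can be sketched in a few lines. The abstract does not give the update rule, so the version below is an assumption: a multiplicative-update scheme in the style usually called Winnow, with weights starting at 1, a threshold of n/2, doubling on false negatives, and zeroing on false positives. The class name and the test target are hypothetical.

```python
from itertools import product
from typing import List, Sequence


class WinnowSketch:
    """Multiplicative-update linear-threshold learner for monotone disjunctions."""

    def __init__(self, n: int) -> None:
        self.weights: List[float] = [1.0] * n
        self.threshold: float = n / 2.0  # assumed choice of threshold
        self.mistakes: int = 0

    def predict(self, x: Sequence[int]) -> bool:
        # Predict positive when the weight of the active attributes reaches the threshold.
        return sum(w for w, xi in zip(self.weights, x) if xi) >= self.threshold

    def update(self, x: Sequence[int], label: bool) -> bool:
        """Predict, then adjust weights only if the prediction was wrong."""
        guess = self.predict(x)
        if guess != label:
            self.mistakes += 1
            # False negative: double active weights. False positive: zero them.
            factor = 2.0 if label else 0.0
            self.weights = [w * factor if xi else w for w, xi in zip(self.weights, x)]
        return guess


if __name__ == "__main__":
    n = 8
    target = lambda x: bool(x[0] or x[3])  # hypothetical disjunction x0 OR x3
    learner = WinnowSketch(n)
    for _ in range(20):  # repeated passes over all 2^n examples
        for x in product([0, 1], repeat=n):
            learner.update(x, target(x))
    print("mistakes:", learner.mistakes)
```

Weights of relevant attributes are never zeroed, since a false positive can only occur on an example where no relevant attribute is active. Each relevant weight can therefore only be doubled a logarithmic number of times before it alone reaches the threshold, which is the intuition behind the logarithmic dependence on the number of attributes.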

    A version space is a collection of concepts consistent with a given set of positive and negative examples. Mitchell [Artificial Intelligence 18 (1982) 203-226] proposed representing a version space by its boundary sets: the maximally general (G) and maximally specific consistent concepts (S). For many simple concept classes, the size of G and S is known to grow exponentially in the number of positive and negative examples. This paper argues that previous work on alternative representations of version spaces has disguised the real question underlying version space reasoning. We instead show that tractable reasoning with version spaces turns out to depend on the consistency problem, i.e., determining if there is any concept consistent with a set of positive and negative examples. Indeed, we show that tractable version space reasoning is possible if and only if there is an efficient algorithm for the consistency problem. Our observations give rise to new concept classes for which tractable version space reasoning is now possible, e.g., 1-decision lists, monotone depth two formulas, and halfspaces.
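One way to read the connection between version space reasoning and the consistency problem is that a new instance can be classified without ever building the boundary sets: ask whether the sample stays consistent when the instance is added as a positive example, and again as a negative one. The sketch below illustrates this under an assumed concept class, closed intervals on the real line, for which consistency is easy to decide; the function names are illustrative only.

```python
from typing import List, Tuple

Example = Tuple[float, bool]  # (point, label)


def consistent(sample: List[Example]) -> bool:
    """Is there a closed interval that agrees with every labeled point?"""
    positives = [x for x, label in sample if label]
    if not positives:
        return True  # an interval avoiding finitely many negative points always exists
    lo, hi = min(positives), max(positives)
    # Any consistent interval must cover [lo, hi], so no negative may fall inside it.
    return not any(lo <= x <= hi for x, label in sample if not label)


def classify(sample: List[Example], x: float) -> str:
    """Classify x by the whole version space using two consistency checks."""
    can_be_positive = consistent(sample + [(x, True)])
    can_be_negative = consistent(sample + [(x, False)])
    if can_be_positive and can_be_negative:
        return "unknown"   # consistent concepts disagree on x
    if can_be_positive:
        return "positive"  # every consistent concept contains x
    if can_be_negative:
        return "negative"  # no consistent concept contains x
    return "empty"         # the version space itself is empty


if __name__ == "__main__":
    sample = [(2.0, True), (4.0, True), (6.0, False)]
    for point in (3.0, 5.0, 7.0):
        print(point, classify(sample, point))
```

The boundary sets G and S are never materialized; each query costs two calls to the consistency test, so the reasoning is as efficient as that test.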

    Associated with each learning system there is a class of learnable behaviors. If the target behavior to be acquired is in the learnable class, it will be learned perfectly. If it is outside that class, the machine will only be able to acquire a behavior that approximates the target and it will always make errors. It is desirable for a learning machine to have a large learnable class to maximize the chances of acquiring the unknown behavior and to minimize the expected error when only an approximation is possible. However, it is also desirable to have a small learnable class so that learning can be achieved rapidly. Thus the design of learning machines involves selecting a position on the spectrum: minimum error and slow learning time versus larger error and faster learning time. A computational method is given for finding where a given learning machine is on this spectrum. Machines that have fast learning times, relatively small learnable classes, and thus relatively large expected errors are called realization sparse in this article. These machines do little better than a random coin flipping algorithm in many situations. It is shown that many common learning systems are of this type including signature tables, linear system models, and conjunctive normal form expression based systems. These studies lead to the concept of an “optimum” machine which spreads its learnable behaviors across the behavior space in a manner to minimize the expected error. An approximation to such optimum machines is presented and its behavior is compared to the more traditional learning machines. © 1994 John Wiley & Sons, Inc.

    In this paper, we motivate the need for estimating bounds on learning curves of average-case learning algorithms when they perform the worst on training samples. We then apply the method of reducing learning problems to hypothesis testing ones to investigate the learning curves of a so-called ill-disposed learning algorithm in terms of a system complexity, the Boolean interpolation dimension. Since the ill-disposed algorithm behaves worse than ordinary ones, and the Boolean interpolation dimension is generally bounded by the number of system weights, the results can apply to interpreting or to bounding the worst-case learning curve in real learning situations. This study leads to a new understanding of the worst-case generalization in real learning situations, which differs significantly from that in the uniform learnable setting via Vapnik-Chervonenkis (VC) dimension analysis. We illustrate the results with some numerical simulations.

    We consider a variant of the ‘population learning model’ proposed by Kearns and Seung [8], in which the learner is required to be ‘distribution-free’ as well as computationally efficient. A population learner receives as input hypotheses from a large population of agents and produces as output its final hypothesis. Each agent is assumed to independently obtain labeled sample for the target concept and output a hypothesis. A polynomial time population learner is said to PAC-learn a concept class, if its hypothesis is probably approximately correct whenever the population size exceeds a certain bound which is polynomial, even if the sample size for each agent is fixed at some constant. We exhibit some general population learning strategies, and some simple concept classes that can be learned by them. These strategies include the ‘supremum hypothesis finder’, the ‘minimum superset finder’ (a special case of the ‘supremum hypothesis finder’), and various voting schemes. When coupled with appropriate agent algorithms, these strategies can learn a variety of simple concept classes, such as the ‘high–low game’, conjunctions, axis-parallel rectangles and others. We give upper bounds on the required population size for each of these cases, and show that these systems can be used to obtain a speed up from the ordinary PAC-learning model [11], with appropriate choices of sample and population sizes. With the population learner restricted to be a voting scheme, what we have is effectively a model of ‘population prediction’, in which the learner is to predict the value of the target concept at an arbitrarily drawn point, as a threshold function of the predictions made by its agents on the same point. We show that the population learning model is strictly more powerful than the population prediction model. Finally, we consider a variant of this model with classification noise, and exhibit a population learner for the class of conjunctions in this model. 
This revised version was published online in June 2006 with corrections to the Cover Date.

    Goldberg, Paul W.; Jerrum, Mark R. Machine Learning, 1995, 18(2-3): 131-148
    The Vapnik-Chervonenkis (V-C) dimension is an important combinatorial tool in the analysis of learning problems in the PAC framework. For polynomial learnability, we seek upper bounds on the V-C dimension that are polynomial in the syntactic complexity of concepts. Such upper bounds are automatic for discrete concept classes, but hitherto little has been known about what general conditions guarantee polynomial bounds on V-C dimension for classes in which concepts and examples are represented by tuples of real numbers. In this paper, we show that for two general kinds of concept class the V-C dimension is polynomially bounded in the number of real numbers used to define a problem instance. One is classes where the criterion for membership of an instance in a concept can be expressed as a formula (in the first-order theory of the reals) with fixed quantification depth and exponentially-bounded length, whose atomic predicates are polynomial inequalities of exponentially-bounded degree. The other is classes where containment of an instance in a concept is testable in polynomial time, assuming we may compute standard arithmetic operations on reals exactly in constant time. Our results show that in the continuous case, as in the discrete, the real barrier to efficient learning in the Occam sense is complexity-theoretic and not information-theoretic. We present examples to show how these results apply to concept classes defined by geometrical figures and neural nets, and derive polynomial bounds on the V-C dimension for these classes.

    High dimensionality of state representation is a major limitation for scale-up in reinforcement learning (RL). This work derives the knowledge of complexity reduction from partial solutions and provides algorithms for automated dimension reduction in RL. We propose the cascading decomposition algorithm, based on the spectral analysis of a normalized graph Laplacian, to decompose a problem into several subproblems and then conduct parameter relevance analysis on each subproblem to perform dynamic state abstraction. The elimination of irrelevant parameters projects the original state space into one with lower dimension, in which some subtasks are projected onto the same shared subtasks. The framework can identify irrelevant parameters based on performed action sequences and thus relieve the problem of high dimensionality in the learning process. We evaluate the framework with experiments and show that the dimension reduction approach can indeed make some otherwise infeasible problems learnable.

    In a statistical setting of the classification (pattern recognition) problem the number of examples required to approximate an unknown labelling function is linear in the VC dimension of the target learning class. In this work we consider the question of whether such bounds exist if we restrict our attention to computable classification methods, assuming that the unknown labelling function is also computable. We find that in this case the number of examples required for a computable method to approximate the labelling function not only is not linear, but grows faster (in the VC dimension of the class) than any computable function. No time or space constraints are put on the predictors or target functions; the only resource we consider is the training examples. The task of classification is considered in conjunction with another learning problem - data compression. An impossibility result for the task of data compression allows us to estimate the sample complexity for pattern recognition.

    Since all the algebras connected to logic have, more or less explicitly, an associated order relation, it follows, by the duality principle, that they have two presentations, dual to each other. We classify these dual presentations as “left” and “right” ones and we consider that, when dealing with several algebras in the same research, it is useful to present them unitarily, either as “left” algebras or as “right” algebras. In some circumstances, this choice is essential, for instance if we want to build the ordinal sum (product) between a BL algebra and an MV algebra. We have chosen the “left” presentation and several algebras of logic have been redefined as particular cases of BCK algebras. We introduce several new properties of algebras of logic, besides those usually existing in the literature, which generate a more refined classification, depending on the properties satisfied. In this work (Parts I–V) we make an exhaustive study of these algebras (with two bounds and with one bound) and we present classes of finite examples, in the bounded case. In Part II, we continue to present new properties, and consequently new algebras; among them, bounded α γ algebra is a common generalization of MTL algebra and divisible bounded residuated lattice (bounded commutative Rl-monoid). We introduce and study the ordinal sum (product) of two bounded BCK algebras. Dedicated to Grigore C. Moisil (1906–1973).

    In this paper we consider several variants of Valiant's learnability model that have appeared in the literature. We give conditions under which these models are equivalent in terms of the polynomially learnable concept classes they define. These equivalences allow comparisons of most of the existing theorems in Valiant-style learnability and show that several simplifying assumptions on polynomial learning algorithms can be made without loss of generality. We also give a useful reduction of learning problems to the problem of finding consistent hypotheses, and give comparisons and equivalences between Valiant's model and the prediction learning models of Haussler, Littlestone, and Warmuth (in “29th Annual IEEE Symposium on Foundations of Computer Science,” 1988).
