Similar Documents
20 similar documents found.
1.
We investigate a variant of the on-line learning model for classes of {0,1}-valued functions (concepts) in which the labels of some of the input instances are corrupted by adversarial noise. We propose an extension of a general learning strategy, known as the “Closure Algorithm”, to this noise model, and show a worst-case mistake bound of m + (d+1)K for learning an arbitrary intersection-closed concept class C, where K is the number of noisy labels, d is a combinatorial parameter measuring C's complexity, and m is the worst-case mistake bound of the Closure Algorithm for learning C in the noise-free model. For several concept classes our extended Closure Algorithm is efficient and can tolerate a noise rate up to the information-theoretic upper bound. Finally, we show how to efficiently turn any algorithm for the on-line noise model into a learning algorithm for the PAC model with malicious noise.
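To make the noise-free baseline concrete, here is a minimal sketch of the Closure Algorithm for one intersection-closed class, axis-aligned rectangles on the integer grid; the function names and representation are illustrative, and the paper's noise-tolerant extension is not shown.

```python
# Closure Algorithm sketch: the hypothesis is always the closure (smallest
# rectangle) of the positive examples seen so far, so it only grows.

def closure(points):
    """Smallest axis-aligned rectangle containing the given points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), max(xs), min(ys), max(ys))

def predict(rect, point):
    if rect is None:                       # no positive example seen yet
        return False
    x0, x1, y0, y1 = rect
    return x0 <= point[0] <= x1 and y0 <= point[1] <= y1

def run(trials):
    """Count on-line mistakes of the closure learner."""
    positives, mistakes, rect = [], 0, None
    for point, label in trials:
        if predict(rect, point) != label:
            mistakes += 1
        if label:
            positives.append(point)
            rect = closure(positives)
    return mistakes

print(run([((0, 0), True), ((2, 2), True), ((1, 1), True)]))  # 2
```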

2.
The approach of ordinal mind change complexity, introduced by Freivalds and Smith, uses (notations for) constructive ordinals to bound the number of mind changes made by a learning machine. This approach provides a measure of the extent to which a learning machine has to keep revising its estimate of the number of mind changes it will make before converging to a correct hypothesis for languages in the class being learned. Recently, this notion, which also yields a measure for the difficulty of learning a class of languages, has been used to analyze the learnability of rich concept classes.

The present paper further investigates the utility of ordinal mind change complexity. It is shown that for identification from both positive and negative data and n ≥ 1, the ordinal mind change complexity of the class of languages formed by unions of up to n + 1 pattern languages is only ω ×_o notn(n) (where notn(n) is a notation for n, ω is a notation for the least limit ordinal, and ×_o represents ordinal multiplication on notations). This result nicely extends an observation of Lange and Zeugmann that pattern languages can be identified from both positive and negative data with 0 mind changes.

Existence of an ordinal mind change bound for a class of learnable languages can be seen as an indication of its learning “tractability”. Conditions are investigated under which a class has an ordinal mind change bound for identification from positive data. It is shown that an indexed family of languages has an ordinal mind change bound if it has finite elasticity and can be identified by a conservative machine. It is also shown that the requirement of conservative identification can be sacrificed for the purely topological requirement of M-finite thickness. The interaction between identification by monotonic strategies and the existence of an ordinal mind change bound is also investigated.
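For intuition, a hedged illustration (notation mine) of what counting mind changes down from an ordinal bound means:

```latex
% With bound \omega, the learner fixes no constant in advance: its first
% mind change may set the counter to any natural number n_1, and every
% later mind change must strictly decrease the counter.
\[
  \omega \;>\; n_1 \;>\; n_2 \;>\; \cdots \;>\; 0,
  \qquad n_1, n_2, \ldots \in \mathbb{N}.
\]
% Under \omega \times_o \mathrm{notn}(n), this "choose a fresh finite
% budget" step can be taken n times in total, once per copy of \omega.
```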


3.
The present paper motivates the study of mind change complexity for learning minimal models of length-bounded logic programs. It establishes ordinal mind change complexity bounds for learnability of these classes both from positive facts and from positive and negative facts. Building on Angluin's notion of finite thickness and Wright's work on finite elasticity, Shinohara defined the property of bounded finite thickness to give a sufficient condition for learnability of indexed families of computable languages from positive data. This paper shows that an effective version of Shinohara's notion of bounded finite thickness gives sufficient conditions for learnability with ordinal mind change bound, both in the context of learnability from positive data and for learnability from complete (both positive and negative) data. Let ω be a notation for the first limit ordinal. Then, it is shown that if a language defining framework yields a uniformly decidable family of languages and has effective bounded finite thickness, then for each natural number m>0, the class of languages defined by formal systems of length ⩽m:
  • is identifiable in the limit from positive data with a mind change bound of ω^m;
  • is identifiable in the limit from both positive and negative data with an ordinal mind change bound of ω × m.
The above sufficient conditions are employed to give an ordinal mind change bound for learnability of minimal models of various classes of length-bounded Prolog programs, including Shapiro's linear programs, Arimura and Shinohara's depth-bounded linearly covering programs, and Krishna Rao's depth-bounded linearly moded programs. It is also noted that the bound for learning from positive data is tight for the example classes considered.
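A hedged worked example (the m = 2 case, notation mine) of a legal descent below the ω^m bound for learning from positive data:

```latex
% Ordinals below \omega^2 have the form \omega \cdot a + b; a mind change
% may either lower the finite part b, or lower a and pick a fresh b.
\[
  \omega^2 \;>\; \omega \cdot 3 + 5 \;>\; \omega \cdot 3 + 1
  \;>\; \omega \cdot 2 + 17 \;>\; \cdots \;>\; 0 .
\]
```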

4.
In this paper we consider several variants of Valiant's learnability model that have appeared in the literature. We give conditions under which these models are equivalent in terms of the polynomially learnable concept classes they define. These equivalences allow comparisons of most of the existing theorems in Valiant-style learnability and show that several simplifying assumptions on polynomial learning algorithms can be made without loss of generality. We also give a useful reduction of learning problems to the problem of finding consistent hypotheses, and give comparisons and equivalences between Valiant's model and the prediction learning models of Haussler, Littlestone, and Warmuth (in “29th Annual IEEE Symposium on Foundations of Computer Science,” 1988).
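The reduction to finding consistent hypotheses can be pictured with the standard finite-class argument; the sketch below and its sample-size bound are textbook material under my own naming, not results specific to this paper.

```python
# PAC learning via consistency, for a finite hypothesis class H: with
# m >= (1/eps) * (ln(len(H)) + ln(1/delta)) labelled samples, any
# hypothesis consistent with all of them is eps-accurate with
# probability at least 1 - delta.

def consistent_hypothesis(H, sample):
    """Return any h in H that agrees with every labelled example."""
    for h in H:
        if all(h(x) == y for x, y in sample):
            return h
    return None  # H contains no consistent hypothesis
```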

5.
The present paper proposes a new learning model—called stochastic finite learning—and shows the whole class of pattern languages to be learnable within this model. This main result is achieved by providing a new and improved average-case analysis of the Lange–Wiehagen (New Generation Computing, 8, 361–370) algorithm learning the class of all pattern languages in the limit from positive data. The complexity measure chosen is the total learning time, i.e., the overall time taken by the algorithm until convergence. The expectation of the total learning time is carefully analyzed and exponentially shrinking tail bounds for it are established for a large class of probability distributions. For every pattern containing k different variables it is shown that Lange and Wiehagen's algorithm possesses an expected total learning time bounded in terms of k, of two easily computable parameters arising naturally from the underlying probability distributions, and of the expected example string length. Finally, assuming a bit of domain knowledge concerning the underlying class of probability distributions, it is shown how to convert learning in the limit into stochastic finite learning.
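A much-simplified sketch in the spirit of the Lange–Wiehagen learner (the real algorithm's bookkeeping is omitted and the variable naming is mine): keep a pattern of the minimal example length seen so far, and generalise positions where examples of that length disagree into variables.

```python
def lw_step(pattern, w):
    """One update of the simplified pattern learner on example w."""
    if pattern is None or len(w) < len(pattern):
        return tuple(w)               # strictly shorter example: restart
    if len(w) > len(pattern):
        return pattern                # longer examples are ignored
    return tuple(a if a == w[i] else f"x{i}"   # mismatch -> variable x_i
                 for i, a in enumerate(pattern))

p = lw_step(None, "aba")     # ('a', 'b', 'a')
p = lw_step(p, "aca")        # ('a', 'x1', 'a'), i.e. the pattern a x1 a
```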

6.
Gold introduced the notion of learning in the limit where a class S is learnable iff there is a recursive machine M which reads the course of values of a function f and converges to a program for f whenever f is in S. An important measure for the speed of convergence in this model is the quantity of mind changes before the onset of convergence. The oldest model is to consider a constant bound on the number of mind changes M makes on any input function; such a bound is referred to here as type 1. Later this was generalized to a bound of type 2, where a counter ranges over constructive ordinals and is counted down at every mind change. Although ordinal bounds permit the inference of richer concept classes than constant bounds, they still are a severe restriction. Therefore the present work introduces two more general approaches to bounding mind changes: counting by going down in a linearly ordered set (type 3) and counting by going down in a partially ordered set (type 4). In both cases the set must not contain infinite descending recursive sequences. These four types of mind change bounds yield a hierarchy, and there are identifiable classes that cannot be learned with even the most general mind change bound of type 4. It is shown that the existence of a type 2 bound is equivalent to the existence of a learning algorithm which converges on every (also nonrecursive) input function, and the existence of a type 4 bound is shown to be equivalent to the existence of a learning algorithm which converges on every recursive function. A partial characterization of type 3 yields a result of independent interest in recursion theory. The interplay between mind change complexity and choice of hypothesis space is investigated. It is established that for certain concept classes, a more expressive hypothesis space can sometimes reduce the mind change complexity of learning these classes. The notion of a mind change bound for behaviourally correct learning is indirectly addressed by employing the above four types to restrict the number of predictive errors of commission in finite error next value learning (NV′′)—a model equivalent to behaviourally correct learning. Again, natural characterizations for type 2 and type 4 bounds are derived. Their naturalness is further illustrated by characterizing them in terms of branches of uniformly recursive families of binary trees.

7.
We discuss new efficient learning algorithms for certain subclasses of regular and even linear languages based on the notion of terminal distinguishability introduced by Radhakrishnan and Nagaraja. The learning model we use is identification in the limit from positive samples as proposed by Gold and further studied by Angluin and many others. All classes we introduce in this paper are modifications of the language families TDRL (terminal distinguishable regular) and TDELL (terminal distinguishable even linear) defined by Radhakrishnan and Nagaraja. A tradeoff between the power of the language class and the time complexity of the identification algorithm is observed when the size of the underlying alphabet is considered as an additional parameter. Extending the classes of efficiently learnable languages is also important from the viewpoint of applications of the algorithms. One of these extensions is obtained basically by making use of the concept of control language which is known from formal language theory and has been employed for learning theoretic purposes in particular by Takada.

8.
The paper explores language learning in the limit under various constraints on the number of mindchanges, memory, and monotonicity. We define language learning with limited (long term) memory and prove that learning with limited memory is exactly the same as learning via set driven machines (machines whose output does not depend on the order of the input). Further we show that every language learnable via a set driven machine is learnable via a conservative machine (making only justifiable mindchanges). We get a variety of separation results for learning with a bounded number of mindchanges or limited memory under restrictions on monotonicity. A surprising result is that there are families of languages that can be monotonically learned with at most one mindchange, but can neither be weak-monotonically nor conservatively learned. Many separation results have a variant: if a criterion I can be separated from a criterion J, then often it is possible to find a family of languages that is learnable under both I and J, but such that the number of mindchanges or the long term memory can be restricted under one of the criteria and not under the other.

9.
A challenging problem within machine learning is how to make good inferences from data sets in which pieces of information are missing. While it is valuable to have algorithms that perform well for specific domains, to gain a fundamental understanding of the problem, one needs a “theory” about how to learn with incomplete data. The important contribution of such a theory is not so much the specific algorithmic results, but rather that it provides good ways of thinking about the problem formally. In this paper we introduce the unspecified attribute value (UAV) learning model as a first step towards a theoretical framework for studying the problem of learning from incomplete data in the exact learning framework. In the UAV learning model, an example x is classified positive (resp., negative) if all possible assignments for the unspecified attributes result in a positive (resp., negative) classification. Otherwise the classification given to x is “?” (for unknown). Given an example x in which some attributes are unspecified, the oracle UAV-MQ responds with the classification of x. Given a hypothesis h, the oracle UAV-EQ returns an example x (that could have unspecified attributes) for which h(x) is incorrect. We show that any class of functions learnable in Angluin’s exact model using the MQ and EQ oracles is also learnable in the UAV model using the MQ and UAV-EQ oracles as long as the counterexamples provided by the UAV-EQ oracle have a logarithmic number of unspecified attributes. We also show that any class learnable in the exact model using the MQ and EQ oracles is also learnable in the UAV model using the UAV-MQ and UAV-EQ oracles as well as an oracle to evaluate a given boolean formula on an example with unspecified attributes. (For some hypothesis classes such as decision trees and unate formulas the evaluation can be done in polynomial time without an oracle.) We also study the learnability of a universal class of decision trees under the UAV model and of DNF formulas under a representation-dependent variation of the UAV model.
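A minimal sketch of the UAV classification rule itself (representation mine): an example is labelled 1 or 0 only if every completion of its unspecified attributes agrees, and "?" otherwise. The brute force below is exponential in the number of unspecified attributes, which is why restricting UAV-EQ counterexamples to logarithmically many of them matters.

```python
from itertools import product

def uav_classify(f, x):
    """f: Boolean function on full vectors; x: tuple of 0 / 1 / None."""
    unknown = [i for i, v in enumerate(x) if v is None]
    labels = set()
    for bits in product([0, 1], repeat=len(unknown)):
        y = list(x)
        for i, b in zip(unknown, bits):
            y[i] = b
        labels.add(f(tuple(y)))
        if len(labels) == 2:         # completions disagree
            return "?"
    return labels.pop()

f = lambda v: v[0] & v[1]            # target: x0 AND x1
print(uav_classify(f, (1, None)))    # ? (depends on the hidden x1)
print(uav_classify(f, (0, None)))    # 0 (every completion is negative)
```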

10.
Different formal learning models address different aspects of human learning. Below we compare Gold-style learning—modelling learning as a limiting process in which the learner may change its mind arbitrarily often before converging to a correct hypothesis—to learning via queries—modelling learning as a one-shot process in which the learner is required to identify the target concept with just one hypothesis. In the Gold-style model considered below, the information presented to the learner consists of positive examples for the target concept, whereas in query learning, the learner may pose a certain kind of queries about the target concept, which will be answered correctly by an oracle (called teacher). Although these two approaches seem rather unrelated at first glance, we provide characterisations of different models of Gold-style learning (learning in the limit, conservative inference, and behaviourally correct learning) in terms of query learning. Thus we describe the circumstances which are necessary to replace limit learners by equally powerful one-shot learners. Our results are valid in the general context of learning indexable classes of recursive languages. This analysis leads to an important observation, namely that there is a natural query learning type hierarchically in-between Gold-style learning in the limit and behaviourally correct learning. Astonishingly, this query learning type can then again be characterised in terms of Gold-style inference.

11.
One of the most important paradigms in the inductive inference literature is that of robust learning. This paper adapts and investigates the paradigm of robust learning to learning languages from positive data. Broadening the scope of that paradigm is important: robustness captures a form of invariance of learnability under admissible transformations on the object of study; hence, it is a very desirable property. The key to defining robust learning of languages is to impose that the latter be automatic, that is, recognisable by a finite automaton. The invariance property used to capture robustness can then naturally be defined in terms of first-order definable operators, called translators. For several learning criteria amongst a selection of learning criteria investigated either in the literature on explanatory learning from positive data or in the literature on query learning, we characterise the classes of languages all of whose translations are learnable under that criterion.

12.
A natural approach towards powerful machine learning systems is to enable options for additional machine/user interactions, for instance by allowing the system to ask queries about the concept to be learned. This motivates the development and analysis of adequate formal learning models. In the present paper, we investigate two different types of query learning models in the context of learning indexable classes of recursive languages: Angluin's original model and a relaxation thereof, called learning with extra queries. In the original model the learner is restricted to query languages belonging to the target class, while in the new model it is allowed to query other languages, too. As usual, the following standard types of queries are considered: superset, subset, equivalence, and membership queries. The learning capabilities of the resulting query learning models are compared to one another and to different versions of Gold-style language learning from only positive data and from positive and negative data (including finite learning, conservative inference, and learning in the limit). A complete picture of the relation of all these models has been elaborated. A couple of interesting differences and similarities between query learning and Gold-style learning have been observed. In particular, query learning with extra superset queries coincides with conservative inference from only positive data. This result documents the naturalness of the new query model.
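For concreteness, a minimal sketch of the four standard query types answered against a finite target set (illustration only; an actual teacher also supplies counterexamples when a subset, superset, or equivalence query fails):

```python
def make_oracles(target):
    """target: the target language as a finite set of strings."""
    return {
        "membership":  lambda w: w in target,
        "subset":      lambda L: L <= target,
        "superset":    lambda L: L >= target,
        "equivalence": lambda L: L == target,
    }

oracles = make_oracles({"a", "ab"})
print(oracles["membership"]("ab"))     # True
print(oracles["superset"]({"a"}))      # False: "ab" is missing
```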

13.
Learning Boolean Functions in an Infinite Attribute Space
This paper presents a theoretical model for learning Boolean functions in domains having a large, potentially infinite number of attributes. The model allows an algorithm to employ a rich vocabulary to describe the objects it encounters in the world without necessarily incurring time and space penalties so long as each individual object is relatively simple. We show that many of the basic Boolean functions learnable in standard theoretical models, such as conjunctions, disjunctions, K-CNF, and K-DNF, are still learnable in the new model, though by algorithms no longer quite so trivial as before. The new model forces algorithms for such classes to act in a manner that appears more natural for many learning scenarios.
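A hedged sketch (details mine, not the paper's algorithms) of the elimination idea for monotone conjunctions in this setting: objects are represented by the finite set of attributes they possess, so the learner's work is bounded by example size rather than by the infinite attribute space.

```python
def learn(stream):
    h, mistakes = None, 0            # h: attributes common to all positives
    for attrs, label in stream:      # attrs: frozenset of present attributes
        guess = h is not None and h <= attrs
        if guess != label:
            mistakes += 1
        if label:                    # shrink h toward the target conjunction
            h = attrs if h is None else h & attrs
    return h, mistakes
```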

14.
Yokomori, Takashi. Machine Learning, 1995, 19(2): 153-179
This paper deals with the polynomial-time learnability of a language class in the limit from positive data, and discusses the learning problem of a subclass of deterministic finite automata (DFAs), called strictly deterministic automata (SDAs), in the framework of learning in the limit from positive data. We first discuss the difficulty of Pitt's definition in the framework of learning in the limit from positive data, by showing that any class of languages with an infinite descending chain property is not polynomial-time learnable in the limit from positive data. We then propose new definitions for polynomial-time learnability in the limit from positive data. We show in our new definitions that the class of SDAs is iteratively, consistently polynomial-time learnable in the limit from positive data. In particular, we present a learning algorithm that learns any SDA M in the limit from positive data, satisfying the properties that (i) the time for updating a conjecture is at most O(lm), (ii) the number of implicit prediction errors is at most O(ln), where l is the maximum length of all positive data provided, m is the alphabet size of M and n is the size of M, (iii) each conjecture is computed from only the previous conjecture and the current example, and (iv) at any stage the conjecture is consistent with the sample set seen so far. This is in marked contrast to the fact that the class of DFAs is neither learnable in the limit from positive data nor polynomial-time learnable in the limit.
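Property (iii) is the defining constraint of iterative learning; as a minimal sketch (names mine):

```python
def iterative_learn(positive_data, update, initial=None):
    """Each conjecture depends only on its predecessor and the current
    example -- the learner never revisits earlier examples."""
    conjecture = initial
    for w in positive_data:
        conjecture = update(conjecture, w)
    return conjecture
```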

15.
Learning from positive data constitutes an important topic in Grammatical Inference since it is believed that the acquisition of grammar by children only needs syntactically correct (i.e. positive) instances. However, classical learning models provide no way to avoid the problem of overgeneralization. In order to overcome this problem, we use here a learning model from simple examples, where the notion of simplicity is defined with the help of Kolmogorov complexity. We show that a general and natural heuristic which allows learning from simple positive examples can be developed in this model. Our main result is that the class of regular languages is probably exactly learnable from simple positive examples.

16.
A model for learning in the limit is defined where a (so-called iterative) learner gets all positive examples from the target language, asks a teacher (oracle) for every new conjecture whether it is a subset of the target language (and if it is not, then it receives a negative counterexample), and uses only limited long-term memory (incorporated in conjectures). Three variants of this model are compared: when a learner receives least negative counterexamples, ones whose size is bounded by the maximum size of input seen so far, and arbitrary ones. A surprising result is that sometimes the absence of bounded counterexamples can help an iterative learner whereas arbitrary counterexamples are useless. We also compare our learnability model with other relevant models of learnability in the limit, study how our model works for indexed classes of recursive languages, and show that learners in our model can work in a non-U-shaped way—never abandoning the first right conjecture.

17.
In many machine learning settings, labeled examples are difficult to collect while unlabeled data are abundant. Also, for some binary classification problems, positive examples which are elements of the target concept are available. Can these additional data be used to improve the accuracy of supervised learning algorithms? We investigate in this paper the design of learning algorithms from positive and unlabeled data only. Many machine learning and data mining algorithms, such as decision tree induction algorithms and naive Bayes algorithms, use examples only to evaluate statistical queries (SQ-like algorithms). Kearns designed the statistical query learning model in order to describe these algorithms. Here, we design an algorithm scheme which transforms any SQ-like algorithm into an algorithm based on positive statistical queries (estimates of probabilities over the set of positive instances) and instance statistical queries (estimates of probabilities over the instance space). We prove that any class learnable in the statistical query learning model is learnable from positive statistical queries and instance statistical queries only, provided a lower bound on the weight of any target concept f can be estimated in polynomial time. Then, we design a decision tree induction algorithm, POSC4.5, based on C4.5, that uses only positive and unlabeled examples, and we give experimental results for this algorithm. In the case of imbalanced classes in the sense that one of the two classes (say the positive class) is heavily underrepresented compared to the other class, the learning problem remains open. This problem is challenging because it is encountered in many real-world applications.
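The transformation rests on a standard decomposition of a statistical query; a hedged reconstruction of the idea in my own notation, where D is the instance distribution, D_+ is D conditioned on the positive class, and w = Pr_{x∼D}[f(x) = 1]:

```latex
\[
  \mathbb{E}_{x \sim D}\bigl[\chi(x, f(x))\bigr]
  = \mathbb{E}_{x \sim D}\bigl[\chi(x, 0)\bigr]
  + w \Bigl( \mathbb{E}_{x \sim D_+}\bigl[\chi(x, 1)\bigr]
           - \mathbb{E}_{x \sim D_+}\bigl[\chi(x, 0)\bigr] \Bigr).
\]
% The right-hand side uses one instance query, two positive queries, and
% an estimate of w -- hence the lower-bound condition on the weight of f.
```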

18.
Associated with each learning system there is a class of learnable behaviors. If the target behavior to be acquired is in the learnable class, it will be learned perfectly. If it is outside that class, the machine will only be able to acquire a behavior that approximates the target and it will always make errors. It is desirable for a learning machine to have a large learnable class to maximize the chances of acquiring the unknown behavior and to minimize the expected error when only an approximation is possible. However, it is also desirable to have a small learnable class so that learning can be achieved rapidly. Thus the design of learning machines involves selecting a position on a spectrum: minimum error and slow learning time versus larger error and faster learning time. A computational method is given for finding where a given learning machine is on this spectrum. Machines that have fast learning times, relatively small learnable classes, and thus relatively large expected errors are called realization sparse in this article. These machines do little better than a random coin flipping algorithm in many situations. It is shown that many common learning systems are of this type, including signature tables, linear system models, and conjunctive normal form expression based systems. These studies lead to the concept of an “optimum” machine which spreads its learnable behaviors across the behavior space in a manner that minimizes the expected error. An approximation to such optimum machines is presented and its behavior is compared to that of more traditional learning machines.

19.
Trial and Error     
A PAC-learning algorithm is d-space bounded if it stores at most d examples from the sample at any time. We characterize the d-space learnable concept classes. For this purpose we introduce the compression parameter of a concept class C and design our Trial and Error Learning Algorithm. We show: C is d-space learnable if and only if the compression parameter of C is at most d. This learning algorithm does not produce a hypothesis consistent with the whole sample, unlike previous approaches, e.g. by Floyd, who presents consistent space bounded learning algorithms but has to restrict herself to very special concept classes. On the other hand, our algorithm needs large samples; the compression parameter appears as an exponent in the sample size. We present several examples of polynomial-time space bounded learnable concept classes: all intersection-closed concept classes with finite VC-dimension; convex n-gons in ℝ²; halfspaces in ℝ^n; and unions of triangles in ℝ². We further relate the compression parameter to the VC-dimension, and discuss variants of this parameter.
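A toy instance of space-bounded learning in the abstract's sense (assumptions mine): for closed intervals on the line, storing two examples suffices, so the learner is 2-space bounded.

```python
def learn_interval(sample):
    """Keep only the two extreme positive examples ever stored."""
    lo = hi = None
    for x, label in sample:
        if label:
            lo = x if lo is None else min(lo, x)
            hi = x if hi is None else max(hi, x)
    return (lo, hi)                   # hypothesis: the interval [lo, hi]

print(learn_interval([(3, True), (7, True), (5, True), (9, False)]))  # (3, 7)
```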

20.
Kulkarni, S.R., Mitter, S.K., Tsitsiklis, J.N. Machine Learning, 1993, 11(1): 23-35
The original and most widely studied PAC model for learning assumes a passive learner in the sense that the learner plays no role in obtaining information about the unknown concept. That is, the samples are simply drawn independently from some probability distribution. Some work has been done on studying more powerful oracles and how they affect learnability. To find bounds on the improvement in sample complexity that can be expected from using oracles, we consider active learning in the sense that the learner has complete control over the information received. Specifically, we allow the learner to ask arbitrary yes/no questions. We consider both active learning under a fixed distribution and distribution-free active learning. In the case of active learning, the underlying probability distribution is used only to measure distance between concepts. For learnability with respect to a fixed distribution, active learning does not enlarge the set of learnable concept classes, but can improve the sample complexity. For distribution-free learning, it is shown that a concept class is actively learnable iff it is finite, so that active learning is in fact less powerful than the usual passive learning model. We also consider a form of distribution-free learning in which the learner knows the distribution being used, so that distribution-free refers only to the requirement that a bound on the number of queries can be obtained uniformly over all distributions. Even with the side information of the distribution being used, a concept class is actively learnable iff it has finite VC dimension, so that active learning with the side information still does not enlarge the set of learnable concept classes.
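To see why arbitrary yes/no questions identify any target from a finite class with about log2 |C| queries (the fixed-distribution intuition above), a minimal sketch; the oracle closure stands in for whatever actually answers questions about the unknown target:

```python
def identify(candidates, oracle_yes):
    """Binary search over the candidate set: each question halves it."""
    cands = list(candidates)
    queries = 0
    while len(cands) > 1:
        half = cands[: len(cands) // 2]
        queries += 1
        cands = half if oracle_yes(half) else cands[len(cands) // 2:]
    return cands[0], queries

# Example: the target is one of 8 concepts labelled 0..7.
target = 5
concept, q = identify(range(8), lambda half: target in half)
print(concept, q)   # 5 3
```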

