期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Relations between Gold-style learning and query learning

Steffen Lange Sandra Zilles 《Information and Computation》2005,203(2):2562

Different formal learning models address different aspects of human learning. Below we compare Gold-style learning—modelling learning as a limiting process in which the learner may change its mind arbitrarily often before converging to a correct hypothesis—to learning via queries—modelling learning as a one-shot process in which the learner is required to identify the target concept with just one hypothesis. In the Gold-style model considered below, the information presented to the learner consists of positive examples for the target concept, whereas in query learning, the learner may pose a certain kind of queries about the target concept, which will be answered correctly by an oracle (called teacher). Although these two approaches seem rather unrelated at first glance, we provide characterisations of different models of Gold-style learning (learning in the limit, conservative inference, and behaviourally correct learning) in terms of query learning. Thus we describe the circumstances which are necessary to replace limit learners by equally powerful one-shot learners. Our results are valid in the general context of learning indexable classes of recursive languages. This analysis leads to an important observation, namely that there is a natural query learning type hierarchically in-between Gold-style learning in the limit and behaviourally correct learning. Astonishingly, this query learning type can then again be characterised in terms of Gold-style inference. 相似文献

2.

The Power of Self-Directed Learning

Goldman Sally A. Sloan Robert H. 《Machine Learning》1994,14(3):271-294

This article studies self-directed learning, a variant of the on-line (or incremental) learning model in which the learner selects the presentation order for the instances. Alternatively, one can view this model as a variation of learning with membership queries in which the learner is only charged for membership queries for which it could not predict the outcome. We give tight bounds on the complexity of self-directed learning for the concept classes of monomials, monotone DNF formulas, and axis-parallel rectangles in {0, 1, , n – 1}^d. These results demonstrate that the number of mistakes under self-directed learning can be surprisingly small. We then show that learning complexity in the model of self-directed learning is less than that of all other commonly studied on-line and query learning models. Next we explore the relationship between the complexity of self-directed learning and the Vapnik-Chervonenkis (VC-)dimension. We show that, in general, the VC-dimension and the self-directed learning complexity are incomparable. However, for some special cases, we show that the VC-dimension gives a lower bound for the self-directed learning complexity. Finally, we explore a relationship between Mitchell's version space algorithm and the existence of self-directed learning algorithms that make few mistakes. 相似文献

3.

Maximizing Agreements with One-Sided Error with Applications to Heuristic Learning

Nader H. Bshouty Lynn Burroughs 《Machine Learning》2005,59(1-2):99-123

We study heuristic learnability of classes of Boolean formulas, a model proposed by Pitt and Valiant. In this type of example-based learning of a concept class C by a hypothesis class H, the learner seeks a hypothesis h H that agrees with all of the negative (resp. positive) examples, and a maximum number of positive (resp. negative) examples. This learning is equivalent to the problem of maximizing agreement with a training sample, with the constraint that the misclassifications be limited to examples with positive (resp. negative) labels. Several recent papers have studied the more general problem of maximizing agreements without this one-sided error constraint. We show that for many classes (though not all), the maximum agreement problem with one-sided error is more difficult than the general maximum agreement problem. We then provide lower bounds on the approximability of these one-sided error problems, for many concept classes, including Halfspaces, Decision Lists, XOR, k-term DNF, and neural nets.Editor Philip M. LongThis research was supported by the fund for promotion of research at the Technion. Research no. 120-025. 相似文献

4.

Improving Generalization with Active Learning 总被引：29，自引：0，他引：29

Cohn David Atlas Les Ladner Richard 《Machine Learning》1994,15(2):201-221

Active learning differs from learning from examples in that the learning algorithm assumes at least some control over what part of the input domain it receives information about. In some situations, active learning is provably more powerful than learning from examples alone, giving better generalization for a fixed number of training examples.In this article, we consider the problem of learning a binary concept in the absence of noise. We describe a formalism for active concept learning calledselective sampling and show how it may be approximately implemented by a neural network. In selective sampling, a learner receives distribution information from the environment and queries an oracle on parts of the domain it considers useful. We test our implementation, called anSG-network, on three domains and observe significant improvement in generalization.A preliminary version of this article appears as Cohn et al. (1990). 相似文献

5.

A study on effectiveness of extreme learning machine 总被引：7，自引：0，他引：7

Yuguang WangAuthor VitaeFeilong CaoAuthor Vitae Yubo YuanAuthor Vitae 《Neurocomputing》2011,74(16):2483-2490

Extreme learning machine (ELM), proposed by Huang et al., has been shown a promising learning algorithm for single-hidden layer feedforward neural networks (SLFNs). Nevertheless, because of the random choice of input weights and biases, the ELM algorithm sometimes makes the hidden layer output matrix H of SLFN not full column rank, which lowers the effectiveness of ELM. This paper discusses the effectiveness of ELM and proposes an improved algorithm called EELM that makes a proper selection of the input weights and bias before calculating the output weights, which ensures the full column rank of H in theory. This improves to some extend the learning rate (testing accuracy, prediction accuracy, learning time) and the robustness property of the networks. The experimental results based on both the benchmark function approximation and real-world problems including classification and regression applications show the good performances of EELM. 相似文献

6.

Can Finite Samples Detect Singularities of Real-Valued Functions?

S. Ben-David 《Algorithmica》1998,22(1-2):3-17

Commonly, in learning theory, the task of the learner is to identify an unknown target object. We consider a variant of this basic task in which the learner is required only to decide whether the unknown target has a certain property. We allow an infinite learning process in which the learner is required to eventually arrive at the correct answer. We say that a problem for which such a learning algorithm exists is Decidable In the Limit (DIL). We analyze the class of DIL problems and provide a necessary and sufficient condition for the membership of a decision problem in this class. We offer an algorithm for any DIL problem, and apply it to several types of learning tasks. We introduce an extension of the usual Inductive Inference learning model—Inductive Inference with a Cheating Teacher. In this model the teacher may choose to present to the learner, not only a language belonging to the agreed-upon family of languages, but also an arbitrary language outside this family. In such a case we require that the learner will be able to eventually detect the faulty choice made by the teacher. We show that such a strong type of learning is possible, and there exist learning algorithms that will fail only on arbitrarily small sets of faulty languages. Furthermore, if an a priori probability distribution P , according to which f is being chosen, is available to the algorithm, then it can be strengthened into a finite algorithm. More precisely, for many distributions P , there exists a polynomial function, l , such that, for every 0 < δ < 1 , there is an algorithm using at most l(log(δ)) many probes that succeeds on more than (1-δ) of the f 's (as measured by P ). We believe that the new approach presented here will be found useful for many further applications. Received February 14, 1997; revised July 6, 1997, and July 18, 1997. 相似文献

7.

Tracking Drifting Concepts By Minimizing Disagreements 总被引：3，自引：0，他引：3

Helmbold David P. Long Philip M. 《Machine Learning》1994,14(1):27-45

In this paper we consider the problem of tracking a subset of a domain (called thetarget) which changes gradually over time. A single (unknown) probability distribution over the domain is used to generate random examples for the learning algorithm and measure the speed at which the target changes. Clearly, the more rapidly the target moves, the harder it is for the algorithm to maintain a good approximation of the target. Therefore we evaluate algorithms based on how much movement of the target can be tolerated between examples while predicting with accuracy . Furthermore, the complexity of the classH of possible targets, as measured byd, its VC-dimension, also effects the difficulty of tracking the target concept. We show that if the problem of minimizing the number of disagreements with a sample from among concepts in a classH can be approximated to within a factork, then there is a simple tracking algorithm forH which can achieve a probability of making a mistake if the target movement rate is at most a constant times ²/(k(d +k) ln 1/), whered is the Vapnik-Chervonenkis dimension ofH. Also, we show that ifH is properly PAC-learnable, then there is an efficient (randomized) algorithm that with high probability approximately minimizes disagreements to within a factor of 7d + 1, yielding an efficient tracking algorithm forH which tolerates drift rates up to a constant times ²/(d ² ln 1/). In addition, we prove complementary results for the classes of halfspaces and axisaligned hyperrectangles showing that the maximum rate of drift that any algorithm (even with unlimited computational power) can tolerate is a constant times ²/d. 相似文献

8.

Structural Results About On-line Learning Models With and Without Queries

Auer Peter Long Philip M. 《Machine Learning》1999,36(3):147-181

We solve an open problem of Maass and Turán, showing that the optimal mistake-bound when learning a given concept class without membership queries is within a constant factor of the optimal number of mistakes plus membership queries required by an algorithm that can ask membership queries. Previously known results imply that the constant factor in our bound is best possible.We then show that, in a natural generalization of the mistake-bound model, the usefulness to the learner of arbitrary yes-no questions between trials is very limited. We show that several natural structural questions about relatives of the mistake-bound model can be answered through the application of this general result. Most of these results can be interpreted as saying that learning in apparently less powerful (and more realistic) models is not much more difficult than learning in more powerful models. 相似文献

9.

Empirical Learning as a Function of Concept Character

Larry Rendell Howard Cho 《Machine Learning》1990,5(3):267-298

Concept learning depends on data character. To discover how, some researchers have used theoretical analysis to relate the behavior of idealized learning algorithms to classes of concepts. Others have developed pragmatic measures that relate the behavior of empirical systems such as ID3 and PLS1 to the kinds of concepts encountered in practice. But before learning behavior can be predicted, concepts and data must be characterized. Data characteristics include their number, error, size, and so forth. Although potential characteristics are numerous, they are constrained by the way one views concepts. Viewing concepts asfunctions over instance space leads to geometric characteristics such as concept size (the proportion of positive instances) and concentration (not too many peaks). Experiments show that some of these characteristics drastically affect the accuracy of concept learning. Sometimes data characteristics interact in non-intuitive ways; for example, noisy data may degrade accuracy differently depending on the size of the concept. Compared with effects of some data characteristics, the choice of learning algorithm appears less important: performance accuracy is degraded only slightly when the splitting criterion is replaced with random selection. Analyzing such observations suggests directions for concept learning research. 相似文献

10.

Inferring Finite Automata with Stochastic Output Functions and an Application to Map Learning

Dean Thomas Angluin Dana Basye Kenneth Engelson Sean Kaelbling Leslie Kokkevis Evangelos Maron Oded 《Machine Learning》1995,18(1):81-108

It is often useful for a robot to construct a spatial representation of its environment from experiments and observations, in other words, to learn a map of its environment by exploration. In addition, robots, like people, make occasional errors in perceiving the spatial features of their environments. We formulate map learning as the problem of inferring from noisy observations the structure of a reduced deterministic finite automaton. We assume that the automaton to be learned has a distinguishing sequence. Observation noise is modeled by treating the observed output at each state as a random variable, where each visit to the state is an independent trial and the correct output is observed with probability exceeding 1/2. We assume no errors in the state transition function.Using this framework, we provide an exploration algorithm to learn the correct structure of such an automaton with probability 1 – , given as inputs , an upper bound m on the number of states, a distinguishing sequence s, and a lower bound > 1/2 on the probability of observing the correct output at any state. The running time and the number of basic actions executed by the learning algorithm are bounded by a polynomial in ^–l, m, |s|, and (1/2-)^–1.We discuss the assumption that a distinguishing sequence is given, and present a method of using a weaker assumption. We also present and discuss simulation results for the algorithm learning several automata derived from office environments. 相似文献