Similar Documents
10 similar documents found.
1.
Fern, Alan; Givan, Robert. Machine Learning, 2003, 53(1-2): 71-109.
We study resource-limited online learning, motivated by the problem of conditional-branch outcome prediction in computer architecture. In particular, we consider (parallel) time- and space-efficient ensemble learners for online settings, empirically demonstrating benefits similar to those shown previously for offline ensembles. Our learning algorithms are inspired by the previously published boosting-by-filtering framework as well as the offline Arc-x4 boosting-style algorithm. We train ensembles of online decision trees using a novel variant of the ID4 online decision-tree algorithm as the base learner, and show empirical results for both boosting- and bagging-style online ensemble methods. Our results evaluate these methods on both our branch-prediction domain and online variants of three familiar machine-learning benchmarks. Our data justify three key claims. First, we show empirically that our extensions to ID4 significantly improve performance for single trees and, additionally, are critical to achieving performance gains in tree ensembles. Second, our results indicate significant improvements in predictive accuracy with ensemble size for the boosting-style algorithm. The bagging algorithms we tried showed poor performance relative to the boosting-style algorithm (but still improved upon individual base learners). Third, we show that ensembles of small trees are often able to outperform large single trees with the same number of nodes (and similarly outperform smaller ensembles of larger trees that use the same total number of nodes). This makes online boosting particularly useful in domains, such as branch prediction, with tight space restrictions (i.e., the available real estate on a microprocessor chip).
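As a concrete illustration of the bagging-style half of this design, the sketch below shows a generic online bagging ensemble in the later style of Oza and Russell, where each base learner sees each stream example Poisson(1) times; the base-learner interface (`update`/`predict`) is an assumption, and this is not the authors' exact algorithm.

```python
import numpy as np

class OnlineBaggingEnsemble:
    """Bagging-style online ensemble: each arriving example is shown to
    each base learner k ~ Poisson(1) times, approximating a bootstrap
    sample over the stream. A generic Oza/Russell-style sketch, not the
    paper's exact algorithm."""

    def __init__(self, make_base, n_members=8, seed=0):
        # `make_base` is an assumed factory for an online base learner
        # exposing update(x, y) and predict(x), e.g. an online tree.
        self.members = [make_base() for _ in range(n_members)]
        self.rng = np.random.default_rng(seed)

    def predict(self, x):
        votes = [m.predict(x) for m in self.members]
        return max(set(votes), key=votes.count)  # plurality vote

    def update(self, x, y):
        for m in self.members:
            for _ in range(self.rng.poisson(1.0)):
                m.update(x, y)
```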

2.
Boosting Methods for Regression
Duffy, Nigel; Helmbold, David. Machine Learning, 2002, 47(2-3): 153-200.
In this paper we examine ensemble methods for regression that leverage or boost base regressors by iteratively calling them on modified samples. The most successful leveraging algorithm for classification is AdaBoost, an algorithm that requires only modest assumptions on the base learning method for its strong theoretical guarantees. We present several gradient descent leveraging algorithms for regression and prove AdaBoost-style bounds on their sample errors using intuitive assumptions on the base learners. We bound the complexity of the regression functions produced in order to derive PAC-style bounds on their generalization errors. Experiments validate our theoretical results.
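A minimal sketch of the gradient-descent leveraging idea for squared loss, assuming scikit-learn trees as base regressors: each round fits a base regressor to the current residuals and adds it with a small step. It is a generic member of the family analysed here, not one of the paper's specific algorithms.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def leverage_regressors(X, y, n_rounds=50, step=0.1):
    """Gradient-descent leveraging for squared loss: each round fits a
    base regressor to the current residuals (the negative gradient of
    the squared loss) and adds it with a small step size."""
    ensemble, f = [], np.zeros(len(y))
    for _ in range(n_rounds):
        residual = y - f
        h = DecisionTreeRegressor(max_depth=3).fit(X, residual)
        ensemble.append(h)
        f += step * h.predict(X)
    # The combined regressor is the step-weighted sum of base regressors.
    return lambda X_new: step * sum(h.predict(X_new) for h in ensemble)
```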

3.
Selective Sampling Using the Query by Committee Algorithm
Freund, Yoav; Seung, H. Sebastian; Shamir, Eli; Tishby, Naftali. Machine Learning, 1997, 28(2-3): 133-168.
We analyze the query by committee algorithm, a method for filtering informative queries from a random stream of inputs. We show that if the two-member committee algorithm achieves information gain with a positive lower bound, then the prediction error decreases exponentially with the number of queries. We show that, in particular, this exponential decrease holds for query learning of perceptrons.
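A schematic sketch of the two-member committee filter described above: draw two hypotheses at random from the version space consistent with the labels gathered so far, and query only where they disagree. The names `sample_hypothesis` (a Gibbs draw from the version space) and `oracle` (the labeller) are assumed interfaces, not part of the paper.

```python
def qbc_filter(stream, sample_hypothesis, oracle, rng):
    """Two-member query-by-committee filtering: label an input only when
    two hypotheses drawn from the current version space disagree on it."""
    labelled = []
    for x in stream:
        h1 = sample_hypothesis(labelled, rng)
        h2 = sample_hypothesis(labelled, rng)
        if h1(x) != h2(x):  # disagreement: the label carries information
            labelled.append((x, oracle(x)))
    return labelled
```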

4.
On Bias, Variance, 0/1-Loss, and the Curse-of-Dimensionality
The classification problem is considered in which an output variable y assumes discrete values with respective probabilities that depend upon the simultaneous values of a set of input variables x = {x_1, ..., x_n}. At issue is how error in the estimates of these probabilities affects classification error when the estimates are used in a classification rule. These effects are seen to be somewhat counterintuitive in both their strength and nature. In particular, the bias and variance components of the estimation error combine to influence classification in a very different way than with squared error on the probabilities themselves. Certain types of (very high) bias can be canceled by low variance to produce accurate classification. This can dramatically mitigate the effect of the bias associated with some simple estimators like naive Bayes, and the bias induced by the curse of dimensionality on nearest-neighbor procedures. This helps explain why such simple methods are often competitive with, and sometimes superior to, more sophisticated ones for classification, and why bagging/aggregating classifiers can often improve accuracy. These results also suggest simple modifications to these procedures that can (sometimes dramatically) further improve their classification performance.
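A toy simulation (not from the paper) of the central point: under 0/1 loss only the side of the decision boundary matters, so a heavily biased but low-variance probability estimate can classify better than an unbiased, high-variance one.

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = 0.7  # true P(y=1|x); the Bayes rule predicts class 1
n = 100_000

# Estimator A: unbiased but high-variance estimate of p_true.
est_a = rng.normal(loc=0.7, scale=0.25, size=n)
# Estimator B: heavily biased (mean 0.9) but low-variance.
est_b = rng.normal(loc=0.9, scale=0.05, size=n)

# Under 0/1 loss only the side of the 1/2 boundary matters, so bias
# pointing away from the boundary is harmless while variance is not.
print("A picks the wrong class:", np.mean(est_a < 0.5))  # ~0.21
print("B picks the wrong class:", np.mean(est_b < 0.5))  # ~0.00
```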

5.
The calculation of slope (downhill gradient) for a point in a digital elevation model (DEM) is a common procedure in the hydrological, environmental and remote sensing sciences. The most commonly used slope calculation algorithms employed on DEM topography data make use of a three-by-three search window or kernel centered on the grid point (grid cell) in question in order to calculate the slope at that point. A comparison of eight such slope calculation algorithms has been carried out using an artificial DEM, consisting of a smooth synthetic test surface with various amounts of added Gaussian noise. Morrison's Surface III, a trigonometrically defined surface, was used as the synthetic test surface. Residual slope grids were calculated by subtracting the slope grids produced by the algorithms under test from true/reference slope grids derived by analytic partial differentiation of the synthetic surface. The resulting residual slope grids were used to calculate root-mean-square (RMS) residual error estimates that were used to rank the slope algorithms from best (lowest RMS residual error) to worst (largest RMS residual error). Fleming and Hoffer's method gave the best results for slope estimation when the amount of added Gaussian noise was very small compared to the mean smooth elevation difference (MSED) measured within three-by-three elevation point windows on the synthetic surface. Horn's method (used in ArcInfo GRID) performed better than Fleming and Hoffer's as a slope estimator when the noise amplitude was very much larger than the MSED. For the large-noise-amplitude situation the best overall performing method was that of Sharpnack and Akin. The popular Maximum Downward Gradient (MDG) method performed poorly, coming close to last in the rankings in both situations, as did the Simple method. A nomogram was produced in terms of the standard deviation of the Gaussian noise and MSED values that gave the locus of the trade-off point between Fleming and Hoffer's and Horn's methods.
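For concreteness, a compact NumPy sketch of Horn's three-by-three slope estimator mentioned above; the replicate padding at the edges and the function name are illustrative assumptions.

```python
import numpy as np

def horn_slope(dem, cell_size):
    """Slope in degrees via Horn's 3x3 finite-difference estimator (the
    method used in ArcInfo GRID)."""
    z = np.pad(dem, 1, mode="edge")
    # 3x3 neighbourhood of each cell:  z1 z2 z3
    #                                  z4 z5 z6
    #                                  z7 z8 z9
    z1, z2, z3 = z[:-2, :-2], z[:-2, 1:-1], z[:-2, 2:]
    z4, z6 = z[1:-1, :-2], z[1:-1, 2:]
    z7, z8, z9 = z[2:, :-2], z[2:, 1:-1], z[2:, 2:]
    dz_dx = ((z3 + 2 * z6 + z9) - (z1 + 2 * z4 + z7)) / (8 * cell_size)
    dz_dy = ((z7 + 2 * z8 + z9) - (z1 + 2 * z2 + z3)) / (8 * cell_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
```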

6.
Improving Generalization with Active Learning
Cohn, David; Atlas, Les; Ladner, Richard. Machine Learning, 1994, 15(2): 201-221.
Active learning differs from learning from examples in that the learning algorithm assumes at least some control over what part of the input domain it receives information about. In some situations, active learning is provably more powerful than learning from examples alone, giving better generalization for a fixed number of training examples. In this article, we consider the problem of learning a binary concept in the absence of noise. We describe a formalism for active concept learning called selective sampling and show how it may be approximately implemented by a neural network. In selective sampling, a learner receives distribution information from the environment and queries an oracle on parts of the domain it considers useful. We test our implementation, called an SG-network, on three domains and observe significant improvement in generalization. A preliminary version of this article appears as Cohn et al. (1990).

7.
This paper presents algorithms for multiterminal net channel routing where multiple interconnect layers are available. Major improvements are possible if wires are able to overlap, and our generalized main algorithm allows overlap, but only on every Kth (K ≥ 2) layer. Our algorithm will, for a problem with density d on L layers, L ≥ K + 3, provably use at most three tracks more than optimal: ⌈(d + 1)/⌊L/K⌋⌉ + 2 tracks, compared with the lower bound of ⌈d/⌊L/K⌋⌉. Our algorithm is simple, has few vias, tends to minimize wire length, and could be used if different layers have different grid sizes. Finally, we extend our algorithm in order to obtain improved results for adjacent (K = 1) overlap: ⌈(d + 2)/⌊2L/3⌋⌉ + 5 tracks for L ≥ 7. This work was supported by the Semiconductor Research Corporation under Contract 83-01-035, by a grant from the General Electric Corporation, and by a grant at the University of the Saarland.
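A small worked example, assuming the floor/ceiling reading of the bounds given above: with d = 20, L = 8 and K = 2 the algorithm needs at most ⌈21/4⌉ + 2 = 8 tracks against a lower bound of ⌈20/4⌉ = 5, the promised gap of three tracks.

```python
from math import ceil, floor

def tracks_used(d, L, K):
    # Upper bound achieved by the main algorithm (as reconstructed above).
    return ceil((d + 1) / floor(L / K)) + 2

def tracks_lower_bound(d, L, K):
    # Density lower bound.
    return ceil(d / floor(L / K))

# Density 20 on 8 layers, overlap allowed on every 2nd layer:
print(tracks_used(20, 8, 2))         # 8
print(tracks_lower_bound(20, 8, 2))  # 5, i.e. at most 3 tracks from optimal
```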

8.
Learning to Recognize Volcanoes on Venus
Burl, Michael C.; Asker, Lars; Smyth, Padhraic; Fayyad, Usama; Perona, Pietro; Crumpler, Larry; Aubele, Jayne. Machine Learning, 1998, 30(2-3): 165-194.
Dramatic improvements in sensor and image acquisition technology have created a demand for automated tools that can aid in the analysis of large image databases. We describe the development of JARtool, a trainable software system that learns to recognize volcanoes in a large data set of Venusian imagery. A machine learning approach is used because it is much easier for geologists to identify examples of volcanoes in the imagery than it is to specify domain knowledge as a set of pixel-level constraints. This approach can also provide portability to other domains without the need for explicit reprogramming; the user simply supplies the system with a new set of training examples. We show how the development of such a system requires a completely different set of skills than those required for applying machine learning to toy-world domains. This paper discusses important aspects of the application process not commonly encountered in the toy world, including obtaining labeled training data, the difficulties of working with pixel data, and the automatic extraction of higher-level features.

9.
Many reductions among combinatorial problems are known in the context of NP-completeness. These reductions preserve the optimality of solutions; however, they may change the relative error of approximate solutions dramatically. In this paper, we apply a new type of reduction, called continuous reductions. When one problem is continuously reduced to another, any approximation algorithm for the latter problem can be transformed into an approximation algorithm for the former, and the performance ratio is preserved up to a constant factor. We relate the problem Minimum Number of Inverters in CMOS Circuits, which arises in the context of logic synthesis, to several classical combinatorial problems such as Maximum Independent Set and Deletion of a Minimum Number of Vertices (Edges) in Order to Obtain a Bipartite (Partial) Subgraph.
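The transfer property can be written schematically. Below is a minimal sketch of the generic shape of a ratio-preserving reduction; the paper's exact definition of continuous reduction may differ in detail, so this is an assumption about form, not the paper's statement.

```latex
% Generic shape of a ratio-preserving reduction (f, g) from problem A
% to problem B: f maps instances, g maps solutions back. For every
% instance x of A and every feasible solution s of f(x):
\[
  R_A\bigl(x,\, g(x, s)\bigr) \;\le\; c \cdot R_B\bigl(f(x),\, s\bigr),
  \qquad c \ge 1 \text{ a constant},
\]
% where R_A, R_B denote performance ratios. Any r-approximation
% algorithm for B then yields a (c \cdot r)-approximation for A.
```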

10.
This paper presents a polynomial-time algorithm for inferring a probabilistic generalization of the class of read-once Boolean formulas over the usual basis {AND, OR, NOT}. The algorithm effectively infers a good approximation of the target formula when provided with random examples chosen according to any product distribution, i.e., any distribution in which the setting of each input bit is chosen independently of the settings of the other bits. Since the class of formulas considered includes ordinary read-once Boolean formulas, our result shows that such formulas are PAC learnable (in the sense of Valiant) against any product distribution (for instance, against the uniform distribution). Further, this class of probabilistic formulas includes read-once formulas whose behavior has been corrupted by large amounts of random noise. Such noise may affect the formula's output (misclassification noise), the input bits (attribute noise), or the behavior of individual gates of the formula. Thus, in this setting, we show that read-once formulas can be inferred (approximately) despite large amounts of noise affecting the formula's behavior.
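A tiny sketch of the setting: a fixed read-once formula evaluated on inputs drawn from a product distribution, with attribute and misclassification noise applied. The specific formula and noise rates are illustrative assumptions, not taken from the paper.

```python
import random

def read_once(x):
    # A read-once formula over {AND, OR, NOT}: every variable occurs once.
    return (x[0] and not x[1]) or (x[2] and x[3])

def noisy_example(attr_noise=0.2, label_noise=0.1):
    """Draw x from a product distribution (independent fair bits), then
    apply attribute noise to the inputs and misclassification noise to
    the output."""
    x = [random.random() < 0.5 for _ in range(4)]
    y = read_once(x)
    x = [b ^ (random.random() < attr_noise) for b in x]  # attribute noise
    y = y ^ (random.random() < label_noise)              # label noise
    return x, y
```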
