Out-of-bag estimation of the optimal sample size in bagging |
| |
Authors: | Gonzalo Martínez-Muñoz, Alberto Suárez |
| |
Affiliation: | C/Francisco Tomás y Valiente, 11 Escuela Politécnica Superior, Universidad Autónoma de Madrid, Madrid 28049, Spain |
| |
Abstract: | The performance of m-out-of-n bagging with and without replacement is analyzed as a function of the sampling ratio (m/n). Standard bagging uses resampling with replacement to generate bootstrap samples of the same size as the original training set (m_wr = n). Without-replacement methods typically use half samples (m_wor = n/2). These choices of sample size are arbitrary and need not be optimal in terms of the classification performance of the ensemble. We propose to use the out-of-bag estimates of the generalization accuracy to select a near-optimal value for the sampling ratio. Ensembles of classifiers trained on independent samples whose size is chosen so that the out-of-bag error of the ensemble is as low as possible generally improve on the performance of standard bagging and can be built efficiently. |
| |
Keywords: | Bagging; Subagging; Bootstrap sampling; Subsampling; Optimal sampling ratio; Ensembles of classifiers; Decision trees |
This article is indexed in ScienceDirect and other databases. |
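
The selection procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`train_stump`, `oob_error`, `select_sampling_ratio`), the grid of candidate ratios, the ensemble size, and the toy data are all assumptions made for illustration, and a one-feature threshold classifier stands in for the decision trees named in the keywords. For each candidate ratio m/n the sketch builds an ensemble, scores every training instance using only the classifiers whose sample did not contain it, and keeps the ratio with the lowest out-of-bag error.

```python
import random
from collections import Counter


def train_stump(X, y):
    """Fit a one-feature threshold classifier (a stand-in for a decision tree)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = sum(lab != l_lab for lab in left)
            errors += sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    _, j, t, l_lab, r_lab = best
    return lambda row: l_lab if row[j] <= t else r_lab


def oob_error(X, y, m, replace, n_estimators, rng):
    """Out-of-bag error of an ensemble trained on samples of size m."""
    n = len(X)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_estimators):
        if replace:
            idx = [rng.randrange(n) for _ in range(m)]
        else:
            idx = rng.sample(range(n), m)  # requires m <= n
        clf = train_stump([X[i] for i in idx], [y[i] for i in idx])
        in_bag = set(idx)
        # Each classifier votes only on instances it was not trained on.
        for i in range(n):
            if i not in in_bag:
                votes[i][clf(X[i])] += 1
    scored = [i for i in range(n) if votes[i]]
    if not scored:  # e.g. m = n without replacement leaves nothing out of bag
        return 1.0
    wrong = sum(votes[i].most_common(1)[0][0] != y[i] for i in scored)
    return wrong / len(scored)


def select_sampling_ratio(X, y, ratios, replace=True, n_estimators=25, seed=0):
    """Return the candidate ratio m/n with the lowest out-of-bag error."""
    rng = random.Random(seed)
    n = len(X)
    errors = {}
    for r in ratios:
        m = max(1, round(r * n))
        errors[r] = oob_error(X, y, m, replace, n_estimators, rng)
    return min(errors, key=errors.get), errors


# Toy data (an assumption for illustration): the label depends on feature 0.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(60)]
y = [int(row[0] > 0.5) for row in X]
best, errors = select_sampling_ratio(X, y, ratios=[0.2, 0.5, 1.0])
print(best, errors)
```

For sampling without replacement, pass `replace=False` and restrict the candidate ratios to values below 1, since a sample of size n leaves no out-of-bag instances to score.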