Building bagging on critical instances
Authors:Li Guo  Samia Boukir  Alexandre Aussem
Affiliation:1. G&E Laboratory (EA 4592), Bordeaux INP, Pessac, France; Atos Worldline, Seclin, France;2. G&E Laboratory (EA 4592), Bordeaux INP, Pessac, France;3. LIRIS (UMR CNRS 5205), University of Lyon, Villeurbanne, France

Abstract:The ensemble method is a powerful data mining paradigm that builds a classification model by integrating multiple diversified component learners. Bagging is one of the most successful ensemble methods. It trains classifiers on bootstrap samples of the training set and combines them into an aggregated classifier. However, in bagging, the bootstrapped training sets become more and more similar as redundancy increases. Besides redundancy, any training set is usually subject to noise. Moreover, the training set might be imbalanced. Thus, each training instance has a different impact on the learning process. This paper explores some properties of the ensemble margin and its use in improving the performance of bagging. We introduce a new approach, based on margin theory, to measure the importance of training data in learning. We then propose a new bagging method that concentrates on critical instances. This method is more accurate than bagging and more robust than boosting. Compared to bagging, it reduces the bias while generally keeping the same variance. Our findings suggest that (a) examples with low margins tend to be more critical for classifier performance; (b) examples with higher margins tend to be more redundant; (c) misclassified examples with high margins tend to be noisy examples. Our experimental results on 15 different data sets show that the generalization error of bagging can be reduced by up to 2.5%, and its resilience to noise strengthened, by iteratively removing both typical and noisy training instances, reducing the training set size by up to 75%.
Keywords:Bagging  ensemble  instance importance  instance selection  margin
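
Findings (a) to (c) in the abstract rank training instances by their ensemble margin. The abstract does not state the margin formula, so the sketch below assumes one common supervised definition: the number of votes for the true class minus the largest number of votes for any other class, divided by the ensemble size. The function name `ensemble_margins` and the toy vote matrix are hypothetical and serve only as illustration; this is not the authors' implementation.

```python
import numpy as np

def ensemble_margins(votes, y_true, n_classes):
    """Assumed supervised margin: (votes for true class - max votes for
    any other class) / ensemble size. Values lie in [-1, 1].

    votes  : (n_estimators, n_samples) array of predicted class labels
    y_true : (n_samples,) array of true class labels
    """
    votes = np.asarray(votes)
    y_true = np.asarray(y_true)
    n_estimators, n_samples = votes.shape

    # counts[i, c] = number of base classifiers that voted class c for sample i
    counts = np.zeros((n_samples, n_classes), dtype=int)
    for c in range(n_classes):
        counts[:, c] = (votes == c).sum(axis=0)

    idx = np.arange(n_samples)
    true_votes = counts[idx, y_true]

    # Mask the true class so the max runs over the other classes only
    others = counts.copy()
    others[idx, y_true] = -1
    best_other = others.max(axis=1)

    return (true_votes - best_other) / n_estimators

# Toy example: 4 base classifiers (rows), 4 training instances (columns)
votes = np.array([
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
])
y = np.array([0, 1, 1, 1])

margins = ensemble_margins(votes, y, n_classes=2)
print(margins)
```

Under this assumed definition, the first instance (margin 1.0) is classified unanimously and would count as typical or redundant, the third (margin 0.0) sits on the decision boundary and would count as critical, and the fourth (margin -1.0) is confidently misclassified, which the abstract associates with noise.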