Learning from Batched Data: Model Combination Versus Data Combination
Authors: Kai Ming Ting, Boon Toh Low, Ian H. Witten
Affiliations: 1. School of Computing and Mathematics, Deakin University, Australia
2. Department of Systems Engineering and Engineering Management, Chinese University of Hong Kong, Hong Kong
3. Department of Computer Science, University of Waikato, New Zealand
Abstract: Combining models learned from multiple batches of data provides an alternative to the common practice of learning one model from all the available data (i.e. the data combination approach). This paper empirically examines the baseline behavior of the model combination approach in this multiple-data-batches scenario. We find that model combination can lead to better performance even if the disjoint batches of data are drawn randomly from a larger sample, and we relate the relative performance of the two approaches to the learning curve of the classifier used. At the beginning of the curve, model combination has higher bias and variance than data combination and thus a higher error rate. As the amount of training data increases, model combination achieves an error rate lower than or comparable to that of data combination, because it obtains a larger variance reduction. We also show that this result is not sensitive to the method of model combination employed. Finally, we show empirically that the near-asymptotic performance of a single model in some classification tasks can be significantly improved by combining multiple models (derived from the same algorithm) in the multiple-data-batches scenario.
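To make the contrast concrete, the sketch below compares the two approaches on synthetic data: "data combination" trains a single classifier on all batches pooled together, while "model combination" trains one classifier per disjoint batch and merges their predictions by unweighted majority vote. The decision-tree learner, synthetic dataset, and voting scheme here are illustrative assumptions, not the paper's exact experimental setup.

```python
# Minimal sketch (assumptions: decision trees, synthetic data, majority vote)
# contrasting data combination with model combination over disjoint batches.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=6000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Disjoint batches drawn randomly from the same training sample.
n_batches = 5
batches = list(zip(np.array_split(X_train, n_batches),
                   np.array_split(y_train, n_batches)))

# Data combination: one model learned from all the available data.
single = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
acc_data = accuracy_score(y_test, single.predict(X_test))

# Model combination: one model per batch, combined by majority vote.
models = [DecisionTreeClassifier(random_state=0).fit(Xb, yb)
          for Xb, yb in batches]
votes = np.mean([m.predict(X_test) for m in models], axis=0)
acc_model = accuracy_score(y_test, (votes >= 0.5).astype(int))

print(f"data combination accuracy:  {acc_data:.3f}")
print(f"model combination accuracy: {acc_model:.3f}")
```

With enough training data per batch, the voted ensemble often matches or exceeds the single model because averaging the batch-wise models reduces variance, mirroring the learning-curve effect described in the abstract.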
Keywords:
This document is indexed in SpringerLink and other databases.