Distributed learning with bagging-like performance

Affiliation:	1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China;2. School of Computer Engineering, Nanyang Technological University, Blk N4-02c-110, Nanyang Avenue 639798, Singapore;1. Department of Computer Science, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, UPM Serdang, Selangor, Malaysia;2. The School of Computing, Science & Engineering, Newton Building, University of Salford, Salford, Greater Manchester, United Kingdom;3. Department of Information Systems and Cyber Security, University of Texas at San Antonio, USA;4. Information Assurance Research Group, University of South Australia, Adelaide, South Australia, Australia;5. School of Computer Science, China University of Geosciences, Wuhan, China;6. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China;7. Department of Computer Science, St. Francis Xavier University, Antigonish, NS, Canada

Abstract:	Bagging forms a committee of classifiers by bootstrap aggregation of training sets drawn from a pool of training data. A simple alternative to bagging is to partition the data into disjoint subsets. Experiments with decision tree and neural network classifiers on various datasets show that, given partitions and bags of the same size, disjoint partitions result in performance equivalent to, or better than, bootstrap aggregates (bags). Many applications (e.g., protein structure prediction) involve datasets that are too large to fit in the memory of a typical computer. Hence, bagging with samples the size of the full dataset is impractical. Our results indicate that, in such applications, the simple approach of creating a committee of n classifiers from disjoint partitions, each containing 1/n of the data (and therefore memory resident during learning), in a distributed way yields a classifier with a bagging-like performance gain. Learning from distributed disjoint partitions is significantly less complex and faster than bagging.
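
The two sampling schemes compared in the abstract can be illustrated with a minimal Python sketch. It builds n disjoint partitions, each holding 1/n of the data, and n bootstrap bags of the same size, trains one committee member per subset, and combines the members by majority vote. The function names, the one-dimensional nearest-centroid base learner (a simple stand-in for the decision trees and neural networks used in the paper), the majority-vote combination, and the toy data are all assumptions made for illustration; this is not the authors' implementation.

```python
import random
from collections import Counter


def disjoint_partitions(n_items, n_parts, seed=0):
    """Shuffle the indices 0..n_items-1 and split them into n_parts disjoint subsets."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    # Striding yields subsets whose sizes differ by at most one.
    return [idx[i::n_parts] for i in range(n_parts)]


def bootstrap_bags(n_items, n_bags, bag_size, seed=0):
    """Draw n_bags samples of bag_size indices each, with replacement."""
    rng = random.Random(seed)
    return [[rng.randrange(n_items) for _ in range(bag_size)]
            for _ in range(n_bags)]


def train_nearest_centroid(xs, ys):
    """Hypothetical base learner: predict the class whose mean is closest to x."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    centroids = {label: sums[label] / counts[label] for label in sums}
    return lambda x: min(centroids, key=lambda label: abs(x - centroids[label]))


def train_committee(xs, ys, subsets):
    """Train one member per index subset; each subset could live on its own machine."""
    return [train_nearest_centroid([xs[i] for i in s], [ys[i] for i in s])
            for s in subsets]


def committee_predict(members, x):
    """Combine the members' predictions by unweighted majority vote."""
    votes = Counter(member(x) for member in members)
    return votes.most_common(1)[0][0]


# Toy data (assumed for illustration): two well-separated classes on a line.
xs = [0.1 * i for i in range(30)] + [10.0 + 0.1 * i for i in range(30)]
ys = [0] * 30 + [1] * 30

n = 5
partitions = disjoint_partitions(len(xs), n)                  # each holds 1/n of the data
bags = bootstrap_bags(len(xs), n, bag_size=len(xs) // n)      # same size, with replacement

partition_committee = train_committee(xs, ys, partitions)
bag_committee = train_committee(xs, ys, bags)
print(committee_predict(partition_committee, 0.5),
      committee_predict(bag_committee, 0.5))
```

In this sketch the disjoint scheme needs only one shuffle and one pass over the data, and every example is used by exactly one member, whereas each bag is drawn independently from the whole pool and may repeat or omit examples.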

This article is indexed in ScienceDirect and other databases.
