A divide-and-conquer recursive approach for scaling up instance selection algorithms期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

A divide-and-conquer recursive approach for scaling up instance selection algorithms

Authors:	Aida de Haro-García Nicolás García-Pedrajas

Affiliation:	(1) Department of Computing and Numerical Analysis, University of Córdoba, 14071 Córdoba, Spain

Abstract:	Instance selection is becoming more and more relevant due to the huge amount of data that is being constantly produced. However, although current algorithms are useful for fairly large datasets, scaling problems are found when the number of instances is of hundreds of thousands or millions. In the best case, these algorithms are of efficiency O(n ²), n being the number of instances. When we face huge problems, scalability is an issue, and most algorithms are not applicable. This paper presents a divide-and-conquer recursive approach to the problem of instance selection for instance based learning for very large problems. Our method divides the original training set into small subsets where the instance selection algorithm is applied. Then the selected instances are rejoined in a new training set and the same procedure, partitioning and application of an instance selection algorithm, is repeated. In this way, our approach is based on the philosophy of divide-and-conquer applied in a recursive manner. The proposed method is able to match, and even improve, for the case of storage reduction, the results of well-known standard algorithms with a very significant reduction of execution time. An extensive comparison in 30 datasets form the UCI Machine Learning Repository shows the usefulness of our method. Additionally, the method is applied to 5 huge datasets with from 300,000 to more than a million instances, with very good results and fast execution time.

Keywords:	Instance selection Instance based learning Divide-and-conquer Scalability
本文献已被 SpringerLink 等数据库收录！

设为首页 | 免责声明 | 关于勤云 | 加入收藏