

An experimental study on the use of nearest neighbor-based imputation algorithms for classification tasks
Affiliation:1. School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran;2. Department of Computer Engineering, Faculty of Engineering, Islamic Azad University, South Tehran Branch, Tehran, Iran;1. Noccs, National Engineering School of Sousse, Tunisia;2. Computer Vision Center, Barcelona, Spain;1. Image Processing Center, Beihang University, 100191 Beijing, China;2. State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China
Abstract: The substitution of missing values, also called imputation, is an important data preparation task for data mining applications. Imputation algorithms have traditionally been compared in terms of the similarity between imputed and original values. However, this traditional approach, sometimes referred to as prediction ability, does not reveal how imputed values influence the ultimate modeling task (e.g., classification). Based on extensive experimental work, we study the influence of five nearest-neighbor-based imputation algorithms (KNNImpute, SKNN, IKNNImpute, KMI and EACImpute) and two simple algorithms widely used in practice (Mean Imputation and Majority Method) on classification problems. To assess these algorithms experimentally, missing values were simulated on six datasets using two missingness mechanisms: Missing Completely at Random (MCAR) and Missing at Random (MAR). Under MAR, the probability of missingness may depend on observed data but not on missing data, whereas under MCAR it does not depend on the observed data either. The quality of the imputed values is assessed by two measures: prediction ability and classification bias. Experimental results show that IKNNImpute outperforms the other algorithms under the MCAR mechanism. KNNImpute, SKNN and EACImpute, in turn, provided the best results under the MAR mechanism. Finally, our experiments also show that the best prediction results (in terms of mean squared error) do not necessarily yield lower classification bias.
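The two missingness mechanisms and the prediction-ability measure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' experimental code: the function names (`mcar_mask`, `mar_mask`, `mean_impute`, `knn_impute`, `imputation_mse`), the rank-based MAR rule, the synthetic data, and the choice of `k=3` are all assumptions made for illustration, and `knn_impute` is a generic nearest-neighbor imputer rather than a reimplementation of KNNImpute, SKNN, or IKNNImpute.

```python
import numpy as np


def mcar_mask(shape, rate, rng):
    """MCAR: every entry goes missing with the same probability `rate`."""
    return rng.random(shape) < rate


def mar_mask(X, target_col, driver_col, rate, rng):
    """MAR (illustrative rule): missingness in `target_col` depends only on
    the fully observed `driver_col`, never on the hidden values themselves."""
    n = X.shape[0]
    # Normalised rank of the driver value in [0, 1]; higher rank, higher risk.
    ranks = X[:, driver_col].argsort().argsort() / max(n - 1, 1)
    prob = np.clip(2.0 * rate * ranks, 0.0, 1.0)
    mask = np.zeros(X.shape, dtype=bool)
    mask[:, target_col] = rng.random(n) < prob
    return mask


def mean_impute(X_missing):
    """Replace each NaN with the mean of the observed values in its column."""
    filled = X_missing.copy()
    col_means = np.nanmean(X_missing, axis=0)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled


def knn_impute(X_missing, k=3):
    """Generic nearest-neighbor imputation: distances use only the features
    observed in both rows; each gap is filled with the mean of the k nearest
    donors that have that feature observed."""
    filled = X_missing.copy()
    observed = ~np.isnan(X_missing)
    for i in np.where(~observed.all(axis=1))[0]:
        shared = observed & observed[i]
        diff = np.where(shared, X_missing - X_missing[i], 0.0)
        counts = shared.sum(axis=1)
        with np.errstate(divide="ignore", invalid="ignore"):
            dist = np.sqrt((diff ** 2).sum(axis=1) / counts)
        dist[counts == 0] = np.inf  # no shared features: not comparable
        dist[i] = np.inf            # a row cannot donate to itself
        for j in np.where(~observed[i])[0]:
            donors = np.where(observed[:, j] & np.isfinite(dist))[0]
            if donors.size == 0:
                continue  # leave the gap if no usable donor exists
            nearest = donors[np.argsort(dist[donors])[:k]]
            filled[i, j] = X_missing[nearest, j].mean()
    return filled


def imputation_mse(X_true, X_filled, mask):
    """Prediction ability: mean squared error over the imputed entries only."""
    return float(np.mean((X_true[mask] - X_filled[mask]) ** 2))


# Synthetic data (an assumption): column 1 is strongly correlated with column 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)

mask = mar_mask(X, target_col=1, driver_col=0, rate=0.2, rng=rng)
X_missing = np.where(mask, np.nan, X)

X_mean = mean_impute(X_missing)
X_knn = knn_impute(X_missing, k=3)
print("mean imputation MSE:", imputation_mse(X, X_mean, mask))
print("kNN imputation MSE: ", imputation_mse(X, X_knn, mask))
```

The sketch measures only prediction ability. Assessing classification bias, the second measure in the abstract, would additionally require training a classifier on the original and on the imputed data and comparing the results, which is outside what this example attempts.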
This article is indexed in ScienceDirect and other databases.