Shell-neighbor method and its application in missing data imputation |
| |
Authors: | Shichao Zhang |
| |
Affiliation: | 1. Department of Computer Science, Zhejiang Normal University, Jinhua, China; 2. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China |
| |
Abstract: | Data preparation is an important step in mining incomplete data. To handle missing values at this stage, this paper introduces a new imputation
approach called SN (Shell Neighbors) imputation, or simply SNI. The SNI fills in an incomplete instance (one with missing values)
in a given dataset using only its left and right nearest neighbors with respect to each factor (attribute), which are referred
to as Shell Neighbors. The left and right nearest neighbors are selected from a set of nearest neighbors of the incomplete instance. The size of
this nearest-neighbor set is determined by cross-validation. The SNI is then generalized to deal
with missing data in datasets with mixed attributes, for example, continuous and categorical attributes. Experiments
conducted to evaluate the proposed approach demonstrate that the generalized SNI method outperforms the kNN imputation method in both imputation accuracy and classification accuracy. |
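The selection step described in the abstract can be illustrated with a minimal Python sketch for continuous attributes. The abstract does not state the distance measure, the donor pool, or how the shell neighbors' values are combined, so the choices here (Euclidean distance on the observed attributes, complete rows only as donors, an unweighted mean of the shell neighbors' values) are assumptions made for illustration, and the function name `shell_neighbor_impute` is hypothetical rather than taken from the paper.

```python
import numpy as np

def shell_neighbor_impute(X, row, k=5):
    """Fill the NaNs in X[row] from its shell neighbors (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    target = X[row]
    missing = np.isnan(target)
    observed = ~missing

    # Assumption: only complete rows other than the target act as donors.
    complete = ~np.isnan(X).any(axis=1)
    complete[row] = False
    donors = X[complete]

    # Assumption: Euclidean distance over the target's observed attributes.
    dist = np.sqrt(((donors[:, observed] - target[observed]) ** 2).sum(axis=1))
    nearest = donors[np.argsort(dist, kind="stable")[:k]]

    # For each observed attribute, keep the closest neighbor on the left
    # (value <= target) and the closest on the right (value >= target).
    shell = set()
    for j in np.flatnonzero(observed):
        diff = nearest[:, j] - target[j]
        left = np.flatnonzero(diff <= 0)
        right = np.flatnonzero(diff >= 0)
        if left.size:
            shell.add(int(left[np.argmax(diff[left])]))
        if right.size:
            shell.add(int(right[np.argmin(diff[right])]))

    # Fall back to all k neighbors if no shell neighbor could be selected.
    idx = sorted(shell) or list(range(len(nearest)))

    # Assumption: missing values are the unweighted mean over shell neighbors.
    filled = target.copy()
    filled[missing] = nearest[idx][:, missing].mean(axis=0)
    return filled

X = [[1.0, 10.0], [2.0, 20.0], [4.0, 40.0], [9.0, 90.0], [3.0, float("nan")]]
print(shell_neighbor_impute(X, row=4, k=3))
```

In this toy dataset the three nearest donors have first-attribute values 2, 4, and 1, but only 2 (left) and 4 (right) enclose the target value 3, so the fill is the mean of 20 and 40. A plain kNN mean with k = 3 would also average in the more distant row with value 10. In the paper k is chosen by cross-validation; here it is simply passed in.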
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
|