Imputation techniques for multivariate missingness in software measurement data |
| |
Authors: | Taghi M. Khoshgoftaar, Jason Van Hulse |
| |
Affiliation: | (1) Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL 33431, USA |
| |
Abstract: | The problem of missing values in software measurement data used in empirical analysis has led to the proposal of numerous
potential solutions. Imputation procedures, for example, have been proposed to 'fill in' the missing values with plausible
alternatives. We present a comprehensive study of imputation techniques using real-world software measurement datasets. Two
different datasets with dramatically different properties were utilized in this study, with the injection of missing values
according to three different missingness mechanisms: missing completely at random (MCAR), missing at random (MAR), and non-ignorable (NI). We consider the occurrence of missing values in multiple
attributes, and compare three procedures: Bayesian multiple imputation, k Nearest Neighbor imputation, and Mean imputation. We also examine the relationship between noise in the dataset and the performance
of the imputation techniques, which has not been addressed previously. Our comprehensive experiments demonstrate conclusively
that Bayesian multiple imputation is an extremely effective imputation technique.
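
As a rough illustration of two of the simpler procedures named in the abstract, the sketch below injects MCAR missingness into several attributes and then fills the gaps with mean imputation and with k Nearest Neighbor imputation. It is not the authors' implementation: the function names (`inject_mcar`, `mean_impute`, `knn_impute`), the distance measure, and the toy data are all assumptions made for this example. Bayesian multiple imputation and the MAR and NI mechanisms are not shown.

```python
import math
import random
from statistics import mean


def inject_mcar(rows, rate, rng):
    """Copy rows, setting each value to None independently with probability `rate` (MCAR)."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    col_means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        # Assumption: a column with no observed values falls back to 0.0.
        col_means.append(mean(observed) if observed else 0.0)
    return [[col_means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def _distance(a, b):
    """Root mean squared difference over the attributes observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows, k=3):
    """Replace each None with the mean of that attribute over the k nearest rows that observe it."""
    fallback = mean_impute(rows)
    out = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, v in enumerate(row):
            if v is not None:
                continue
            candidates = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            neighbors = [val for d, val in candidates[:k] if d < math.inf]
            # If no comparable neighbor exists, fall back to the column mean.
            out[i][j] = mean(neighbors) if neighbors else fallback[i][j]
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical measurement data: two correlated numeric attributes per module.
    complete = [[float(i), 10.0 * i] for i in range(1, 11)]
    damaged = inject_mcar(complete, rate=0.2, rng=rng)
    print(mean_impute(damaged))
    print(knn_impute(damaged, k=2))
```

Because the complete data are known before injection, the imputed values can be compared against the originals, which is the general idea behind evaluating imputation techniques on deliberately damaged datasets.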

Taghi M. Khoshgoftaar is a professor in the Department of Computer Science and Engineering, Florida Atlantic University, and the Director of the
Empirical Software Engineering and Data Mining and Machine Learning Laboratories. His research interests are in software engineering,
software metrics, software reliability and quality engineering, computational intelligence, computer performance evaluation,
data mining, machine learning, and statistical modeling. He has published more than 300 refereed papers in these areas. He
is a member of the IEEE, IEEE Computer Society, and IEEE Reliability Society. He was the Program Chair and General Chair of
the IEEE International Conference on Tools with Artificial Intelligence in 2004 and 2005, respectively. He has served on technical
program committees of various international conferences, symposia, and workshops. He has also served as North American Editor
of the Software Quality Journal, and is on the editorial boards of the journals Software Quality and Fuzzy Systems.

Jason Van Hulse received the Ph.D. degree in Computer Engineering from the Department of Computer Science and Engineering at Florida Atlantic
University in 2007, the M.A. degree in Mathematics from Stony Brook University in 2000, and the B.S. degree in Mathematics
from the University at Albany in 1997. His research interests include data mining and knowledge discovery, machine learning,
computational intelligence, and statistics. He has published numerous peer-reviewed research papers in various conferences
and journals, and is a member of the IEEE, IEEE Computer Society, and ACM. He has worked in the data mining and predictive
modeling field at First Data Corp. since 2000, and is currently Vice President, Decision Science. |
| |
Keywords: | Imputation; Software quality; Missing data; Data quality; Bayesian multiple imputation |
This article is indexed in SpringerLink and other databases. |
|