Effects of data set features on the performances of classification algorithms期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

Effects of data set features on the performances of classification algorithms

Authors:	Ohbyung Kwon Jae Mun Sim

Affiliation:	1. School of Management, Kyung Hee University, 26 Kyunghee-daero, Dongdaemun-gu, Seoul 130-701, Republic of Korea;2. Department of International Management, Kyung Hee University, 26 Kyunghee-daero, Dongdaemun-gu, Seoul 130-701, Republic of Korea

Abstract:	As the need to analyze big data sets grows dramatically, the role that classification algorithms play in data mining techniques also increases. Big data analysis requires more of the data sets’ characteristics to be included, such as data structure, variety of sources, and the rate of update frequency. In this paper, we evaluate scenarios that examine which data set characteristics most affect the classification algorithms’ performance. It is still a complex issue to determine which algorithm is how strong or how weak in relation to which data set. Thus, our research experimentally examines how data set characteristics affect algorithm performance, both in terms of accuracy and in elapsed time. To do so, we use a multiple regression method to evaluate the causality between data set characteristics as independent variables, and performance metrics as dependent variables. We also examine the role that classification algorithms play as moderator in this causality. All benchmark data sets in a UCI database are used that are fit to run the classification algorithm. Based on the results of the experiment, we discuss the requirements of legacy classification algorithms to address big data analysis in a new business intelligence era.

Keywords:
本文献已被 ScienceDirect 等数据库收录！

设为首页 | 免责声明 | 关于勤云 | 加入收藏