Effects of data set features on the performances of classification algorithms
Authors: Ohbyung Kwon, Jae Mun Sim
Affiliation: 1. School of Management, Kyung Hee University, 26 Kyunghee-daero, Dongdaemun-gu, Seoul 130-701, Republic of Korea; 2. Department of International Management, Kyung Hee University, 26 Kyunghee-daero, Dongdaemun-gu, Seoul 130-701, Republic of Korea
Abstract: As the need to analyze big data sets grows dramatically, classification algorithms play an increasingly important role in data mining. Big data analysis requires more data set characteristics to be taken into account, such as data structure, variety of sources, and update frequency. In this paper, we evaluate scenarios that examine which data set characteristics most affect the performance of classification algorithms. Determining how strong or weak a given algorithm is on a given data set remains a complex issue. Our research therefore examines experimentally how data set characteristics affect algorithm performance, in terms of both accuracy and elapsed time. To do so, we use multiple regression to evaluate the causal relationship between data set characteristics, as independent variables, and performance metrics, as dependent variables. We also examine the moderating role that the classification algorithm plays in this relationship. The experiments use all benchmark data sets in the UCI database that are suitable for running the classification algorithms. Based on the experimental results, we discuss what legacy classification algorithms require in order to address big data analysis in a new business intelligence era.
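The regression setup named in the abstract can be illustrated with a minimal sketch. It assumes a single data set characteristic, a two-algorithm dummy variable as the moderator, and ordinary least squares with an interaction term. The function name, the variable names, and the synthetic numbers are hypothetical and chosen only for illustration; they are not the authors' data, variables, or fitted model.

```python
import numpy as np

def fit_moderated_regression(x, is_algo_b, y):
    """OLS fit of y = b0 + b1*x + b2*d + b3*(x*d).

    x: one data set characteristic (independent variable)
    is_algo_b: 0/1 dummy for the algorithm used (moderator d)
    y: performance metric (dependent variable)
    A nonzero b3 means the effect of x on y differs by algorithm.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(is_algo_b, dtype=float)
    y = np.asarray(y, dtype=float)
    # Columns: intercept, main effect, moderator, interaction.
    design = np.column_stack([np.ones_like(x), x, d, x * d])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    names = ["intercept", "feature", "algorithm", "interaction"]
    return dict(zip(names, coef))

# Synthetic, noiseless illustration (assumed values, not measurements).
rng = np.random.default_rng(0)
n_attributes = rng.uniform(5, 50, size=200)   # hypothetical characteristic
is_algo_b = rng.integers(0, 2, size=200)      # 0 = algorithm A, 1 = algorithm B
elapsed = 1.0 + 0.2 * n_attributes + 0.5 * is_algo_b + 0.3 * n_attributes * is_algo_b

coef = fit_moderated_regression(n_attributes, is_algo_b, elapsed)
print(coef)
```

In this sketch the interaction coefficient carries the moderation effect: it measures how much the slope of the performance metric on the characteristic changes when the algorithm changes. A study with several algorithms and several characteristics would need one dummy and one interaction term per combination.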
This article is indexed in ScienceDirect and other databases.