Benchmarking binary classification models on data sets with different degrees of imbalance
Authors:Ligang Zhou and Kin Keung Lai
Affiliation:(1) Department of Management Sciences, City University of Hong Kong, Hong Kong, China
Abstract:Binary classification problems are common in practice, for example credit risk assessment and medical testing to determine whether a patient has a certain disease. Different problems have different characteristics, however, and these can make some problems harder than others. One important characteristic is the degree of imbalance between the two classes in a data set. Do commonly used binary classification methods remain feasible on data sets with different degrees of imbalance? This study reviews a range of binary classification models, covering both traditional statistical methods and more recent methods from artificial intelligence, such as linear regression, discriminant analysis, decision trees, neural networks, and support vector machines. Their performance, measured by classification accuracy and by the area under the Receiver Operating Characteristic (ROC) curve, is tested and compared on fourteen data sets with different degrees of imbalance. The results help in selecting appropriate methods for problems with different degrees of imbalance.
Keywords:binary classification; area under Receiver Operating Characteristic (ROC) curve; classification accuracy; degrees of imbalance
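The abstract evaluates models by both classification accuracy and area under the ROC curve. The following minimal sketch, which is not taken from the paper, illustrates why the two measures can disagree on an imbalanced data set. The function names (`accuracy`, `auc`, `imbalance_ratio`), the 95:5 toy labels, and the definition of imbalance as a majority-to-minority ratio are all assumptions made for illustration; the paper may define the degree of imbalance differently.

```python
def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen positive receives a higher score than a randomly
    chosen negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def imbalance_ratio(y_true):
    """Majority-to-minority class ratio (an illustrative definition)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return max(n_pos, n_neg) / min(n_pos, n_neg)


if __name__ == "__main__":
    # Hypothetical toy data: 5 positives, 95 negatives.
    y_true = [1] * 5 + [0] * 95

    # A trivial model that always predicts the majority class
    # and assigns every example the same score.
    y_pred = [0] * 100
    scores = [0.0] * 100

    print("imbalance ratio:", imbalance_ratio(y_true))
    print("accuracy:", accuracy(y_true, y_pred))
    print("AUC:", auc(y_true, scores))
```

On this toy data the majority-class model looks strong by accuracy, yet its AUC shows it cannot rank positives above negatives at all, which is one reason to report both measures when classes are imbalanced.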
This article is indexed in databases including Wanfang Data (万方数据) and SpringerLink.