Imbalanced data classification via support vector machines and genetic algorithms |
| |
Authors: | Jair Cervantes, Xiaoou Li |
| |
Affiliation: | 1. Posgrado e Investigación, UAEMEX (Autonomous University of Mexico State), Texcoco 56259, Mexico; 2. Departamento de Computacion, CINVESTAV-IPN (National Polytechnic Institute), Mexico City 07360, Mexico |
| |
Abstract: | Many real-world data sets are imbalanced: they contain a large number of patterns of one class but very few of another. Standard classification methods, such as the support vector machine (SVM), do not work well on these imbalanced data sets (IDS), because an SVM trained on imbalanced data has difficulty finding the optimal separating hyperplane. In this paper, we propose a genetic algorithm (GA)-based classification method. A draft hyperplane and support vectors are first generated by an SVM. Then, a GA is applied to compensate for the imbalance by generating additional data points. Finally, an SVM is used again to find the best hyperplane from the generated data points. Compared with other popular classification algorithms, our method achieves better classification accuracy on several IDS. |
| |
Keywords: | genetic algorithm; support vector machine; imbalanced data classification |
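
The three stages named in the abstract (draft SVM, GA-based compensation, retrained SVM) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the SVM solver, the GA operators, or the fitness function, so everything below is an assumption made for illustration. In particular, the linear SVM trained by subgradient descent, the "close to the class margin" fitness, the tournament selection, the blend crossover, the Gaussian mutation, and the names `train_linear_svm`, `ga_minority_points`, and `balance_and_retrain` are all hypothetical. Labels are assumed to be +1 (minority) and -1 (majority).

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Stand-in SVM: full-batch subgradient descent on the L2-regularized hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:          # point violates the margin
                gw = [g - yi * xj / n for g, xj in zip(gw, xi)]
                gb -= yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def ga_minority_points(seeds, w, b, label, n_new, generations=30, sigma=0.3, rng=None):
    """Evolve n_new synthetic points of class `label` (assumed operators, n_new >= 2)."""
    rng = rng or random.Random(0)

    def fitness(x):
        # Assumed fitness: highest when the point lies on its own class margin.
        return -abs(label * (dot(w, x) + b) - 1.0)

    pop = [list(rng.choice(seeds)) for _ in range(n_new)]
    for _ in range(generations):
        children = []
        for _ in range(n_new):
            p1 = max(rng.sample(pop, 2), key=fitness)   # tournament selection
            p2 = max(rng.sample(pop, 2), key=fitness)
            a = rng.random()                            # blend crossover + Gaussian mutation
            children.append([a * u + (1 - a) * v + rng.gauss(0, sigma)
                             for u, v in zip(p1, p2)])
        # Elitist survivor selection: keep the n_new fittest of parents and children.
        pop = sorted(pop + children, key=fitness, reverse=True)[:n_new]
    return pop

def balance_and_retrain(X, y, minority=1):
    w, b = train_linear_svm(X, y)                       # stage 1: draft hyperplane
    mins = [x for x, t in zip(X, y) if t == minority]
    # Approximate support vectors: minority points on or inside the margin.
    seeds = [x for x in mins if minority * (dot(w, x) + b) <= 1.0] or mins
    n_new = (len(X) - len(mins)) - len(mins)            # points needed to balance classes
    synth = ga_minority_points(seeds, w, b, minority, n_new)   # stage 2: GA compensation
    X2, y2 = X + synth, y + [minority] * n_new
    return train_linear_svm(X2, y2), synth              # stage 3: retrain on balanced data

# Toy imbalanced data: 60 majority points and 6 minority points in 2-D.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(60)] \
  + [[rng.gauss(2.5, 1), rng.gauss(2.5, 1)] for _ in range(6)]
y = [-1] * 60 + [1] * 6
(w2, b2), synth = balance_and_retrain(X, y)
print(len(synth), "synthetic minority points generated")
```

The fitness used here only rewards closeness to the draft margin, so nothing stops evolved points from drifting along the hyperplane away from the real data; a practical variant would probably also penalize distance from the seed points.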