Boosting support vector machines for imbalanced data sets

Authors:	Benjamin X. Wang, Nathalie Japkowicz

Affiliation:	1. Datalong technology Ltd., 430074, Wuhan, Hubei, China; 2. School of Information Technology and Engineering, University of Ottawa, 800 King Edward Ave., P.O. Box 450 Stn.A, Ottawa, ON, K1N 6N5, Canada

Abstract:	Real-world data mining applications must address the issue of learning from imbalanced data sets. The problem occurs when the instances of one class greatly outnumber the instances of the other class. Such data sets often cause a default classifier to be built, due to skewed vector spaces or lack of information. Common approaches for dealing with the class imbalance problem involve modifying the data distribution or modifying the classifier. In this work, we combine both approaches. We use support vector machines with soft margins as the base classifier to solve the skewed vector spaces problem. We then counter the excessive bias introduced by this approach with a boosting algorithm. We found that this ensemble of SVMs yields an impressive improvement in prediction performance, not only for the majority class, but also for the minority class.
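
The abstract does not specify the boosting procedure or the SVM settings, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a generic AdaBoost loop around a linear soft-margin SVM trained by subgradient descent on a weighted hinge loss. All function names, the synthetic data, and the parameters `lam`, `lr`, `epochs`, and `rounds` are hypothetical choices made for illustration.

```python
import math
import random


def svm_score(w, b, x):
    """Signed distance-like score of a linear SVM: w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def svm_predict(w, b, x):
    return 1 if svm_score(w, b, x) >= 0 else -1


def train_weighted_svm(X, y, d, lam=0.01, lr=0.1, epochs=200):
    """Soft-margin linear SVM (illustrative, not the paper's solver).

    Minimizes lam/2 * ||w||^2 + sum_i d[i] * max(0, 1 - y[i] * (w.x[i] + b))
    by full-batch subgradient descent; d holds the boosting sample weights.
    """
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi, di in zip(X, y, d):
            if yi * svm_score(w, b, xi) < 1:  # margin violated: hinge is active
                for j in range(dim):
                    grad_w[j] -= di * yi * xi[j]
                grad_b -= di * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def boost_svms(X, y, rounds=10):
    """Generic AdaBoost over weighted SVMs; returns (ensemble, final weights)."""
    n = len(X)
    d = [1.0 / n] * n  # start from uniform sample weights
    ensemble = []
    for _ in range(rounds):
        w, b = train_weighted_svm(X, y, d)
        preds = [svm_predict(w, b, xi) for xi in X]
        err = sum(di for di, p, yi in zip(d, preds, y) if p != yi)
        if err >= 0.5:  # no better than chance on the weighted data: stop
            break
        err = max(err, 1e-10)  # avoid division by zero for a perfect learner
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, w, b))
        # Increase the weight of misclassified instances, then renormalize.
        d = [di * math.exp(-alpha * yi * p) for di, p, yi in zip(d, preds, y)]
        total = sum(d)
        d = [di / total for di in d]
    return ensemble, d


def ensemble_predict(ensemble, x):
    """Weighted vote of the boosted SVMs."""
    vote = sum(alpha * svm_predict(w, b, x) for alpha, w, b in ensemble)
    return 1 if vote >= 0 else -1


def make_imbalanced(n_major=100, n_minor=10, seed=0):
    """Synthetic 2-D data: majority class -1, minority class +1 (assumed)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n_major)]
    X += [[rng.gauss(2.5, 1.0), rng.gauss(2.5, 1.0)] for _ in range(n_minor)]
    y = [-1] * n_major + [1] * n_minor
    return X, y


X, y = make_imbalanced()
ensemble, d = boost_svms(X, y)
preds = [ensemble_predict(ensemble, xi) for xi in X]
minority_recall = sum(1 for p, yi in zip(preds, y) if p == yi == 1) / y.count(1)
print(f"rounds kept: {len(ensemble)}, minority recall: {minority_recall:.2f}")
```

Reporting minority-class recall rather than overall accuracy is a deliberate choice here: on a 100-to-10 split, a classifier that always predicts the majority class already reaches about 91% accuracy while never detecting a minority instance.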

This article is indexed in SpringerLink and other databases.