A Dynamic Spark-based Classification Framework for Imbalanced Big Data
Authors: Nahla B Abdel-Hamid, Sally ElGhamrawy, Ali El Desouky, Hesham Arafat
Affiliation: 1. Computers Engineering Department, MISR Higher Institute for Engineering and Technology, Mansoura, Egypt; 2. Scientific Research Group in Egypt, Giza, Egypt; 3. Computers and Systems Department, Faculty of Engineering, Mansoura University, Mansoura, Egypt
Abstract: Classification of imbalanced big data has attracted extensive attention from researchers over the last decade. Standard classification methods classify minority class samples poorly. Several approaches have been introduced to solve the class imbalance problem in big data and to improve generalization in classification. However, most of these approaches neglect the effect of border samples on classification performance; high-impact border samples may be exposed to misclassification. In this paper, a Spark Based Mining Framework (SBMF) is proposed to address the imbalanced data problem. Two main modules are designed for this purpose. The first is the Border Handling Module (BHM), which undersamples the low-impact majority border instances and oversamples the minority class instances. The second is the Selective Border Instances sampling (SBI) Module, which enhances the output of the BHM. The performance of SBMF is evaluated and compared with other recent systems. A number of experiments were performed using moderate and big datasets with different imbalance ratios. Compared with recent works, the results obtained from SBMF show better performance across the different datasets and classifiers.
This article is indexed in SpringerLink and other databases.