Probabilistic oversampling filtering method for unbalanced data classification
Cite this article: MENG Qing-peng, TIAN Kai-yan, ZHANG Heng. Probabilistic oversampling filtering method for unbalanced data classification[J]. Radar & ECM (雷达与对抗), 2020, 40(1): 17-21.
Authors: MENG Qing-peng  TIAN Kai-yan  ZHANG Heng
Affiliation: No.2 Military Representatives Office of Naval Equipment Department in Nanjing, Nanjing 211153; No.8 Research Academy of CSSC, Nanjing 211153
Abstract: Non-cooperative game theory is used to determine the most likely class label for each minority-class sample synthesized by probabilistic oversampling. Synthetic samples that turn out not to belong to the minority class are filtered out, which reduces the overlapping data produced during probabilistic oversampling and yields higher-quality minority-class data, thereby mitigating data skew. In the experiments, models are built with CART and SVM classifiers, and the proposed probabilistic oversampling filtering method for unbalanced data classification, RACOG+F, is compared with the original probabilistic oversampling method on 8 KEEL unbalanced data sets. The results show that the proposed method achieves better classification performance on the evaluation metrics F-measure, G-mean, and AUC.

Keywords: unbalanced data classification  probability approximation  oversampling  filtering
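
The abstract names F-measure, G-mean, and AUC as evaluation metrics but does not define them. The following is a minimal Python sketch of how these three metrics are commonly computed for a binary problem in which the minority class is the positive class. The function names, the toy labels, and the choice of F1 (equal weight on precision and recall) are assumptions made for illustration; the paper may use a different variant.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn), treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f_measure(y_true, y_pred, positive=1):
    """F1 score: harmonic mean of precision and recall (assumed beta = 1)."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of the true positive rate and the true negative rate."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(tpr * tnr)


def auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 2 minority (1) and 6 majority (0) samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1, 0.35, 0.05]

print(f_measure(y_true, y_pred), g_mean(y_true, y_pred), auc(y_true, scores))
```

All three metrics are insensitive to the large number of correctly classified majority samples in a way that plain accuracy is not, which is presumably why they are preferred for unbalanced data sets.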
This article has been indexed by databases including VIP (维普) and Wanfang Data (万方数据).