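A Boosting classification algorithm for imbalanced drift data stream based on Hellinger distance
Cite this article:ZHANG Xi-long,HAN Meng,CHEN Zhi-qiang,WU Hong-xin,LI Mu-hang.A Boosting classification algorithm for imbalanced drift data stream based on Hellinger distance[J].Computer Engineering & Science,2022,44(5):788-799.
Authors:ZHANG Xi-long  HAN Meng  CHEN Zhi-qiang  WU Hong-xin  LI Mu-hang
Affiliation:(School of Computer Science and Engineering,North Minzu University,Yinchuan 750021,Ningxia,China)
Funding:North Minzu University Graduate Innovation Project; Natural Science Foundation of Ningxia; National Natural Science Foundation of China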
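Abstract:Class imbalance in data streams can seriously degrade the classification performance of an algorithm, and concept drift is a particularly difficult problem in stream data mining research. To improve classification performance under these conditions, a new Hellinger-distance-based Boosting classification algorithm for imbalanced drifting data streams, BCA-HD, is proposed. The algorithm combines instance-level and classifier-level weights in a novel way to dynamically update the classifiers and adapt to concept drift. At the bottom layer it uses the ensemble algorithm SMOTEBoost as the base classifier, which internally applies resampling to handle data imbalance. The proposed algorithm is compared with 9 other algorithms on 16 datasets with abrupt and gradual drift; the experimental results show that it ranks first in both the average value and the average rank of G-mean and AUC. The algorithm therefore adapts better to the simultaneous occurrence of concept drift and imbalance, which helps improve classification performance.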

Keywords:data stream  imbalanced data  concept drift  Boosting  Hellinger distance
Received:2021-11-09
Revised:2022-01-15
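
The abstract reports G-mean as an evaluation metric without defining it. The following is a minimal sketch, assuming the common binary definition (the geometric mean of the recall on the positive and negative classes). The function name `g_mean` and the toy labels are illustrative and do not come from the paper.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of per-class recall for a binary problem (assumed definition)."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Recall on each class; defined as 0.0 when a class has no samples.
    recall_pos = tp / (tp + fn) if (tp + fn) else 0.0
    recall_neg = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(recall_pos * recall_neg)


# Hypothetical imbalanced batch: 2 minority (1) and 8 majority (0) samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_majority = [0] * 10                     # 80% accuracy, ignores the minority
some_minority = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]  # finds one minority sample

print(g_mean(y_true, always_majority))
print(g_mean(y_true, some_minority))
```

A classifier that always predicts the majority class reaches 80% accuracy on this batch but a G-mean of zero, which is why the metric suits imbalanced streams.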

A Boosting classification algorithm for imbalanced drift data stream based on Hellinger distance
ZHANG Xi-long,HAN Meng,CHEN Zhi-qiang,WU Hong-xin,LI Mu-hang.A Boosting classification algorithm for imbalanced drift data stream based on Hellinger distance[J].Computer Engineering & Science,2022,44(5):788-799.
Authors:ZHANG Xi-long  HAN Meng  CHEN Zhi-qiang  WU Hong-xin  LI Mu-hang
Affiliation:(School of Computer Science and Engineering,North Minzu University,Yinchuan 750021,China)
Abstract:Class imbalance in a data stream can seriously degrade classification performance, and concept drift is a further difficult problem in stream data mining. To improve classification performance when both occur, a new Boosting Classification Algorithm for imbalanced drifting data streams based on Hellinger Distance (BCA-HD) is proposed. The algorithm uses a novel combination of instance-level and classifier-level weights to dynamically update the classifiers and adapt to concept drift. The ensemble algorithm SMOTEBoost is used as the base classifier at the bottom layer, and it internally uses resampling to handle the data imbalance. The proposed algorithm is compared with 9 other algorithms on 16 datasets with abrupt and gradual drift. The results show that it ranks first in both the average value and the average rank of G-mean and AUC. The algorithm can therefore better adapt to the simultaneous occurrence of concept drift and imbalance, which helps to improve classification performance.
Keywords:data stream  imbalanced data  concept drift  Boosting  Hellinger distance  
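
The abstract names the Hellinger distance but does not say how BCA-HD applies it. The sketch below shows only the standard definition for two discrete distributions on the same bins, H(P, Q) = sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2 / 2), which lies in [0, 1]. The helper names and the two histograms are hypothetical and are not taken from the paper.

```python
import math


def normalize(counts):
    """Turn non-negative bin counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return [c / total for c in counts]


def hellinger_distance(p, q):
    """Standard Hellinger distance between two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    squared = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(squared / 2.0)


# Hypothetical histograms of one feature in two consecutive data chunks.
old_chunk = normalize([40, 30, 20, 10])
new_chunk = normalize([10, 20, 30, 40])

print(hellinger_distance(old_chunk, old_chunk))    # identical distributions
print(hellinger_distance(old_chunk, new_chunk))    # shifted distribution
print(hellinger_distance([1.0, 0.0], [0.0, 1.0]))  # disjoint support
```

The distance is 0 for identical distributions and 1 for distributions with disjoint support, so a larger value between two chunks suggests a larger shift in the data.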
This article is indexed in Wanfang Data and other databases.