
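An imbalanced data stream classification algorithm based on ensemble learning
Cite this article: YUAN Quan, GUO Jiang-fan, ZHAO Xue-hua. An imbalanced data stream classification algorithm based on ensemble learning[J]. Computer Engineering & Science, 2019, 41(8): 1519-1524
Authors: YUAN Quan  GUO Jiang-fan  ZHAO Xue-hua
Affiliations: Research Center of New Telecommunication Technology Applications, School of Telecommunications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065; Chongqing Information Technology Designing Company Limited, Chongqing 401121; Research Center of New Telecommunication Technology Applications, School of Telecommunications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065
Abstract: Most existing data stream classification algorithms assume the ideal case of a balanced class distribution. In real data stream environments, however, the data distribution is often imbalanced, and the stream is often accompanied by concept drift. To address both imbalance and concept drift in data streams, a new imbalanced data stream classification algorithm based on ensemble learning is proposed. First, to handle the imbalance, a hybrid sampling method is applied before model training to balance the data set. Then, base classifier weighting and elimination strategies are used to handle concept drift, which improves classification performance. Finally, the algorithm is compared with classic data stream classification algorithms on synthetic and real data sets. The experimental results show that, in data stream environments with concept drift and class imbalance, the proposed algorithm achieves better overall classification performance than the other algorithms.

Keywords: data stream  concept drift  ensemble learning  imbalance
Received: 2018-12-12
Revised: 2019-08-25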

An imbalanced data stream classification algorithm based on ensemble learning
YUAN Quan, GUO Jiang-fan, ZHAO Xue-hua. An imbalanced data stream classification algorithm based on ensemble learning[J]. Computer Engineering & Science, 2019, 41(8): 1519-1524
Authors:YUAN Quan  GUO Jiang-fan  ZHAO Xue-hua
Affiliation: (1. Research Center of New Telecommunication Technology Applications, School of Telecommunications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065; 2. Chongqing Information Technology Designing Company Limited, Chongqing 401121, China)
Abstract: Most existing data stream classification algorithms assume that the class distribution is roughly balanced. In real data stream environments, however, the data distribution is often imbalanced and accompanied by concept drift. To address both imbalanced data distribution and concept drift, we propose an imbalanced data stream classification algorithm based on ensemble learning. First, to handle the imbalance in the data stream, a hybrid sampling method is applied to balance the data set before model training. Then, concept drift is handled with base classifier weighting and elimination strategies, which improves classification performance. Finally, the algorithm is compared with classic data stream classification algorithms on synthetic and real data sets. The experimental results show that, in data stream environments with concept drift and class imbalance, the proposed algorithm achieves better overall classification performance than the other algorithms.
Keywords:data stream  concept drift  ensemble learning  unbalance  
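
The abstract names two mechanisms: hybrid sampling to balance the data before training, and weighting plus elimination of base classifiers to cope with concept drift. The following is a minimal illustrative sketch of how such a pipeline could look on a chunked stream. It is not the authors' implementation: the abstract does not specify the sampling scheme, the base learner, or the weighting formula, so `hybrid_sample` (random under- and oversampling to the mean class size), `CentroidClassifier` (a nearest-class-mean stand-in), the accuracy-based weights, and the `max_size` parameter are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def hybrid_sample(chunk, rng):
    """Balance a chunk of (features, label) pairs (assumed scheme):
    undersample large classes and oversample small ones to the mean class size."""
    if not chunk:
        return []
    by_label = defaultdict(list)
    for x, y in chunk:
        by_label[y].append((x, y))
    target = max(1, len(chunk) // len(by_label))
    balanced = []
    for items in by_label.values():
        if len(items) >= target:
            balanced.extend(rng.sample(items, target))        # undersample
        else:
            extra = rng.choices(items, k=target - len(items))
            balanced.extend(items + extra)                     # oversample with replacement
    rng.shuffle(balanced)
    return balanced


class CentroidClassifier:
    """Hypothetical base learner: predicts the label of the nearest class mean."""

    def fit(self, chunk):
        sums, counts = {}, Counter()
        for x, y in chunk:
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[y] += 1
        self.centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda y: sq_dist(self.centroids[y]))


class WeightedEnsemble:
    """Chunk-based ensemble with accuracy weights and worst-member elimination."""

    def __init__(self, max_size=5, seed=0):
        self.max_size = max_size
        self.rng = random.Random(seed)
        self.members = []  # each entry is [classifier, weight]

    def update(self, chunk):
        balanced = hybrid_sample(chunk, self.rng)
        if not balanced:
            return
        # Re-weight existing members by their accuracy on the newest balanced chunk,
        # so members trained on an outdated concept lose influence.
        for member in self.members:
            hits = sum(member[0].predict(x) == y for x, y in balanced)
            member[1] = hits / len(balanced)
        # Train a new member on the newest chunk and give it full weight.
        self.members.append([CentroidClassifier().fit(balanced), 1.0])
        # Eliminate the lowest-weight member once the ensemble is over capacity.
        if len(self.members) > self.max_size:
            worst = min(range(len(self.members)), key=lambda i: self.members[i][1])
            del self.members[worst]

    def predict(self, x):
        votes = defaultdict(float)
        for clf, weight in self.members:
            votes[clf.predict(x)] += weight
        return max(votes, key=votes.get)
```

In this sketch, each incoming chunk is balanced first, every existing member is scored on it, and a fresh member is trained on it. After an abrupt drift, older members score poorly on the new chunk, so their votes shrink and they are the first to be dropped. A production system would more likely use an incremental learner and a more careful weighting rule than plain accuracy.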
This article is indexed in Wanfang Data and other databases.