
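Classification model for class imbalanced traffic data
Cite this article: LIU Dan, YAO Lishuang, WANG Yunfeng, PEI Zuofei. Classification model for class imbalanced traffic data[J]. Journal of Computer Applications, 2020, 40(8): 2327-2333.
Author names: LIU Dan  YAO Lishuang  WANG Yunfeng  PEI Zuofei
Author affiliations: 1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065; 2. Chongqing Key Laboratory of Mobile Communications Technology (Chongqing University of Posts and Telecommunications), Chongqing 400065
Funding: Program for Changjiang Scholars and Innovative Research Team in University (IRT_16R72).
Abstract: In network traffic classification, traditional models perform poorly on minority classes and are difficult to update frequently and promptly. To address these problems, a network traffic classification model based on ensemble learning (ELTCM) is proposed. First, a feature metric biased toward minority classes is defined from the class distribution information, and weighted symmetric uncertainty together with the Approximate Markov Blanket (AMB) is used to reduce the dimensionality of the network traffic features, lessening the impact of class imbalance. Then, early concept drift detection is introduced to strengthen the model's ability to cope with traffic features that change as the network changes, and incremental learning is used to make model update training more flexible. Simulation results on real traffic datasets show that, compared with the classification model based on the C4.5 decision tree (DTITC) and the classification model with error-rate-based concept drift detection (ERCDD), ELTCM improves the average overall accuracy by 1.13% and 0.26%, respectively, and outperforms both models on every minority class. ELTCM generalizes well and effectively improves minority-class classification performance without sacrificing overall classification accuracy.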

Keywords: traffic classification; class imbalance; feature selection; incremental learning; ensemble learning
Received: 2020-01-07
Revised: 2020-03-30

Classification model for class imbalanced traffic data
LIU Dan, YAO Lishuang, WANG Yunfeng, PEI Zuofei. Classification model for class imbalanced traffic data[J]. Journal of Computer Applications, 2020, 40(8): 2327-2333.
Authors: LIU Dan  YAO Lishuang  WANG Yunfeng  PEI Zuofei
Affiliation: 1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; 2. Chongqing Key Lab of Mobile Communications Technology (Chongqing University of Posts and Telecommunications), Chongqing 400065, China
Abstract: In network traffic classification, traditional models perform poorly on minority classes and cannot be updated frequently or in a timely manner. To solve these problems, a network Traffic Classification Model based on Ensemble Learning (ELTCM) was proposed. First, to reduce the impact of class imbalance, feature metrics biased towards minority classes were defined according to the class distribution information, and weighted symmetric uncertainty and the Approximate Markov Blanket (AMB) were used to reduce the dimensionality of network traffic features. Then, early concept drift detection was introduced to enhance the model's ability to cope with changes in traffic features as the network changes. At the same time, incremental learning was used to improve the flexibility of model update training. Experimental results on real traffic datasets show that, compared with Internet Traffic Classification based on C4.5 Decision Tree (DTITC) and the Classification Model for Concept Drift Detection based on Error Rate (ERCDD), the proposed ELTCM increases the average overall accuracy by 1.13% and 0.26%, respectively, and outperforms both models on every minority class. ELTCM generalizes well and effectively improves the classification performance of minority classes without sacrificing overall classification accuracy.
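The feature-reduction step named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formula for the weighted symmetric uncertainty, so the inverse-class-frequency sample weights used here (`inverse_frequency_weights`) are an assumption made only for illustration, and the Approximate Markov Blanket test follows the common FCBF-style criterion rather than anything confirmed by the source. All function names and the toy features (`pkt_size`, `pkt_size_copy`, `noise`) are hypothetical.

```python
import math
from collections import Counter

def entropy(values, weights):
    """Weighted Shannon entropy (in bits) of a discrete sequence."""
    total = sum(weights)
    mass = Counter()
    for v, w in zip(values, weights):
        mass[v] += w
    return -sum((m / total) * math.log2(m / total) for m in mass.values() if m > 0)

def symmetric_uncertainty(x, y, weights):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x, weights), entropy(y, weights)
    hxy = entropy(list(zip(x, y)), weights)
    denom = hx + hy
    return 0.0 if denom == 0 else 2.0 * (hx + hy - hxy) / denom

def inverse_frequency_weights(labels):
    """Assumed weighting (not from the paper): every class gets equal total mass."""
    counts = Counter(labels)
    return [1.0 / counts[c] for c in labels]

def select_features(features, labels, weights, eps=1e-9):
    """Rank features by SU with the class, then drop redundant ones.

    A feature f is treated as redundant when an already kept feature k
    (which has SU(k, class) >= SU(f, class)) satisfies
    SU(f, k) >= SU(f, class), i.e. k is an approximate Markov blanket for f.
    """
    su_c = {name: symmetric_uncertainty(col, labels, weights)
            for name, col in features.items()}
    ranked = sorted(su_c, key=su_c.get, reverse=True)
    kept = []
    for name in ranked:
        if su_c[name] <= eps:
            continue  # carries no information about the class
        redundant = any(
            symmetric_uncertainty(features[name], features[k], weights) >= su_c[name]
            for k in kept
        )
        if not redundant:
            kept.append(name)
    return kept, su_c

# Toy, already discretized traffic features: 6 majority flows, 2 minority flows.
labels = ["web"] * 6 + ["p2p"] * 2
features = {
    "pkt_size": ["s"] * 6 + ["l"] * 2,       # separates the two classes
    "pkt_size_copy": ["a"] * 6 + ["b"] * 2,  # same information, different coding
    "noise": ["x", "y"] * 4,                 # unrelated to the class
}
weights = inverse_frequency_weights(labels)
kept, su_c = select_features(features, labels, weights)
print(kept, su_c)
```

In this toy run the duplicated feature is dropped by the blanket test and the uninformative one by the relevance threshold, so only `pkt_size` is kept. With inverse-frequency weights the two classes contribute equal mass to every entropy term, which is one simple way to bias a metric toward minority classes; the paper's own metric may differ.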
Keywords: traffic classification; class imbalance; feature selection; incremental learning; ensemble learning
This article is indexed in Wanfang Data and other databases.