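Traffic Classification Method Based on the C4.5 Decision Tree
Cite this article: XU Peng, LIN Sen. Traffic classification method based on the C4.5 decision tree[J]. Journal of Software, 2009, 20(10): 2692-2704.
Authors: XU Peng, LIN Sen
Affiliations: Institute of Software, Chinese Academy of Sciences, Beijing 100190; Graduate School, Chinese Academy of Sciences, Beijing 100049
Funding: Supported by the National Basic Research Program of China (973 Program) under Grant No. 2007CB307100
Abstract: In recent years, applying machine learning methods to traffic classification has become an emerging research direction in network measurement. In existing work, the Naïve Bayes method and its improved variants have been widely used because they are simple to implement and classify efficiently. However, these methods depend too heavily on how samples are distributed in the sample space, which makes them potentially unstable. To address this, the C4.5 decision tree method is introduced for traffic classification. The method uses the information entropy of the training data set to build a classification model, and classifies unknown network flow samples through a simple lookup in that model. Both theoretical analysis and experimental results show that using the C4.5 decision tree for traffic classification offers a clear advantage in classification stability.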

Keywords: traffic classification; network measurement; decision tree; network flow; statistical attribute
Received: 2007/10/23
Revised: 8/7/2008

Internet Traffic Classification Using C4.5 Decision Tree
XU Peng and LIN Sen. Internet Traffic Classification Using C4.5 Decision Tree[J]. Journal of Software, 2009, 20(10): 2692-2704.
Authors: XU Peng and LIN Sen
Abstract: In recent years, Internet traffic classification using machine learning has become a new direction in network measurement. Because they are simple and efficient, Naïve Bayes and its improved variants have been widely used in this area. However, these methods depend too heavily on the probability distribution of samples in the sample space, which makes them inherently unstable. To address this problem, this paper proposes a method based on the C4.5 decision tree. The method builds a classification model from the information entropy of the training data and classifies flows by a simple search of the decision tree. Theoretical analysis and experimental results both show that the C4.5 decision tree method offers a clear advantage in classification stability when used to classify Internet traffic.
Keywords: traffic classification; network measurement; decision tree; flow; statistical attribute
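
The abstract describes two steps: building a model from the information entropy of the training data, and classifying a flow by searching the resulting tree. The sketch below illustrates both on a single numeric attribute, using the gain ratio criterion that C4.5 is known for. It is an illustration under stated assumptions, not the authors' implementation: the attribute name `mean_pkt_size`, the class labels, and the six toy flows are all hypothetical, candidate thresholds are limited to observed values, and pruning is omitted.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` on one numeric attribute."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information
    total = len(labels)
    weights = (len(left) / total, len(right) / total)
    gain = entropy(labels) - weights[0] * entropy(left) - weights[1] * entropy(right)
    split_info = -sum(w * math.log2(w) for w in weights)
    return gain / split_info

def classify(node, sample):
    """Walk from the root to a leaf; leaves are plain class-label strings."""
    while not isinstance(node, str):
        attr, threshold, low, high = node
        node = low if sample[attr] <= threshold else high
    return node

# Hypothetical toy flows: mean packet size in bytes with a made-up class label.
sizes = [80, 95, 120, 900, 1100, 1400]
classes = ["INTERACTIVE", "INTERACTIVE", "INTERACTIVE", "BULK", "BULK", "BULK"]

# Pick the observed value with the highest gain ratio as the split threshold.
best = max(sizes[:-1], key=lambda t: gain_ratio(sizes, classes, t))

# A one-level tree: (attribute, threshold, subtree if <=, subtree if >).
tree = ("mean_pkt_size", best, "INTERACTIVE", "BULK")
label = classify(tree, {"mean_pkt_size": 1000})
```

Once the tree is built, classifying a flow costs one comparison per level, which is the "simple search" the abstract refers to. A full C4.5 tree would apply the same split selection recursively over many flow attributes.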
This article is indexed in Wanfang Data and other databases.