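Internet Traffic Classification Method Based on Bootstrapping
Cite this article:LIU Zhen, WANG Ruo-yu, LIU Qiong. Internet Traffic Classification Method Based on Bootstrapping[J]. Journal of Beijing University of Posts and Telecommunications, 2014, 37(5): 66. DOI: 10.13190/j.jbupt.2014.05.014
Authors:LIU Zhen  WANG Ruo-yu  LIU Qiong
Affiliation:1. School of Software, South China University of Technology, Guangzhou 510006;
2. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006
Funding:National Natural Science Foundation of China (61171141)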
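Abstract:To address the traffic-class labeling bottleneck and the imbalanced distribution of per-class sample counts in Internet traffic classification, a Bootstrapping-based traffic classification method is proposed. A small number of labeled samples is used to train an initial classifier, and unlabeled samples are then used iteratively to extend the sample set and update the classifier. While the extended sample set is built, the correct classification of an unlabeled sample under a given posterior probability distribution is treated as a probabilistic event, and a new confidence computation is established to reduce the noisy samples in the extended set. A heuristic rule based on probably approximately correct (PAC) learning theory favors adding minority-class samples to the extended set, which eases the class imbalance. Experimental results show that, compared with the initial classifier, the Bootstrapping-based traffic classifier can improve overall classification accuracy by 9.46%; compared with existing semi-supervised learning methods, minority-class classification accuracy improves by 2.22%.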

Keywords:semi-supervised learning  class imbalance  Bootstrapping  Internet traffic classification
Received:2013-11-22

Study of Internet Traffic Classification Method Based on Bootstrapping
LIU Zhen,WANG Ruo-yu,LIU Qiong. Study of Internet Traffic Classification Method Based on Bootstrapping[J]. Journal of Beijing University of Posts and Telecommunications, 2014, 37(5): 66. DOI: 10.13190/j.jbupt.2014.05.014
Authors:LIU Zhen  WANG Ruo-yu  LIU Qiong
Affiliation:1. School of Software, South China University of Technology, Guangzhou 510006, China;
2. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China
Abstract:To address the class-labeling bottleneck and the class imbalance problem in Internet traffic classification, a bootstrapping-based traffic classification method is presented. An initial classifier is trained on a small number of labeled samples and is then updated iteratively by predicting the class labels of unlabeled samples and extending the training set. A new algorithm is devised to compute the confidence used for selecting newly labeled samples into the extension set: it treats the correct classification of an unlabeled sample under a given posterior probability distribution as a probabilistic event, which reduces the noise in the extension set. Moreover, a heuristic rule is built with the aid of probably approximately correct (PAC) learning theory; it is biased toward selecting minority-class samples into the extension set so as to reduce the degree of class imbalance. Experiments show that the bootstrapping-based classifier improves overall classification accuracy by 9.46% compared with the initial classifier, and that the recall of minority classes increases by about 2.22% on average compared with existing semi-supervised learning methods.
Keywords:semi-supervised learning  class imbalance  Bootstrapping  Internet traffic classification  
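
The iterative loop summarized in the abstract (train on a few labeled samples, label the unlabeled pool, keep confident predictions, prefer minority classes, retrain) can be illustrated with a minimal self-training sketch. This is not the authors' algorithm: the nearest-centroid base classifier, the softmax-style posterior, the fixed `threshold`, the "rarest class first" ordering, and the class names `web` and `p2p` are all assumptions made for illustration. The paper's own confidence computation and PAC-based heuristic rule are not reproduced here.

```python
import math
import random
from collections import Counter

def fit_centroids(X, y):
    """Stand-in base classifier (assumption): one mean vector per class."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}

def posterior(model, x):
    """Rough class posterior: softmax over negative squared distances."""
    scores = {c: -sum((a - b) ** 2 for a, b in zip(x, mu)) for c, mu in model.items()}
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: e / z for c, e in exp.items()}

def self_train(X_lab, y_lab, X_unl, rounds=5, threshold=0.9, per_round=10):
    """Iteratively extend the labeled set with confident pseudo-labels."""
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unl)
    model = fit_centroids(X, y)
    for _ in range(rounds):
        if not pool:
            break
        counts = Counter(y)
        candidates = []
        for idx, x in enumerate(pool):
            probs = posterior(model, x)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:  # simple confidence filter (assumption)
                candidates.append((counts[label], -probs[label], idx, label))
        if not candidates:
            break
        # Simplified minority-first rule: rarer predicted class first,
        # then higher confidence.
        candidates.sort()
        chosen = candidates[:per_round]
        for _, _, idx, label in chosen:
            X.append(pool[idx])
            y.append(label)
        taken = {idx for _, _, idx, _ in chosen}
        pool = [x for i, x in enumerate(pool) if i not in taken]
        model = fit_centroids(X, y)  # update the classifier on the extended set
    return model, X, y

# Synthetic two-feature "flows" with hypothetical class names.
random.seed(0)

def blob(cx, cy, n):
    return [(random.gauss(cx, 0.5), random.gauss(cy, 0.5)) for _ in range(n)]

X_lab = blob(0, 0, 5) + blob(5, 5, 2)      # few labeled samples, imbalanced
y_lab = ["web"] * 5 + ["p2p"] * 2
X_unl = blob(0, 0, 40) + blob(5, 5, 10)    # unlabeled pool

model, X_ext, y_ext = self_train(X_lab, y_lab, X_unl)
print(Counter(y_ext))
```

Sorting candidates by the current class count before confidence is one simple way to bias selection toward minority classes; a real system would also need a safeguard against confident but wrong pseudo-labels, which is the noise problem the abstract's confidence computation targets.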
This article is indexed in CNKI and other databases.