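Web Spam Detection Based on a Bagging-SVM Ensemble Classifier
Cite this article:TANG Shou-hong, ZHU Yan, YANG Fan. Web Spam Detection Based on a Bagging-SVM Ensemble Classifier[J]. Computer Science (计算机科学), 2015, 42(1): 239-243.
Authors:TANG Shou-hong  ZHU Yan  YANG Fan
Affiliation:School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031
Funding:This work was supported by the Sichuan Province Training Fund for Reserve Candidates of Academic and Technical Leaders (X800912371309)
Abstract:Web spam not only degrades the quality of information retrieval but also poses a serious challenge to Internet security. This paper proposes a Web spam detection method based on a Bagging-SVM ensemble classifier. In the preprocessing stage, K-means is first used to address the class imbalance of the dataset, the CFS feature selection method is then used to select an optimal feature subset, and finally the feature subset is discretized using information entropy. In the classifier training stage, Bagging is used to construct multiple training sets, and an SVM is trained on each of them to produce a weak classifier. In the detection stage, the class of each test sample is decided by a vote among the weak classifiers. Experimental results on the WEBSPAM-UK2006 dataset show that the method achieves very good detection performance while using a relatively small number of features.

Keywords:Web spam  Ensemble classifier  Feature selection  Information entropy  Weak classifier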

Web Spam Detection Based on Integrated Classifier with Bagging-SVM
TANG Shou-hong, ZHU Yan and YANG Fan. Web Spam Detection Based on Integrated Classifier with Bagging-SVM[J]. Computer Science, 2015, 42(1): 239-243.
Authors:TANG Shou-hong  ZHU Yan and YANG Fan
Affiliation:School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, China
Abstract:Web spam not only degrades the quality of information retrieval, but also threatens the security of the Internet. This paper proposes a Bagging-based ensemble of SVMs to detect Web spam. In the preprocessing stage, a K-means-based technique is first used to address the class-imbalance problem of the dataset, and an optimal feature subset is then selected using CFS. Finally, the optimal feature subset is discretized using information entropy. In the classifier training stage, several training datasets are obtained by Bagging, and an SVM is trained on each of them to produce a weak classifier. In the detection stage, the weak classifiers vote to determine the category of each test sample. Experimental results on WEBSPAM-UK2006 show that the proposed method achieves very good detection results while using relatively few features.
Keywords:Web spam  Integrated classifier  Feature selection  Information entropy  Weak classifier
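
The training and detection stages described in the abstract (bootstrap resampling, one SVM per resampled training set, majority vote) can be illustrated with a minimal sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the toy two-dimensional data, the label convention (+1 for spam, -1 for normal), and the use of a linear SVM trained by stochastic subgradient descent on the regularised hinge loss, since the abstract does not state the kernel or solver. The K-means rebalancing, CFS feature selection and entropy discretization steps are omitted.

```python
import random
from collections import Counter


def train_linear_svm(X, y, lam=0.01, eta=0.05, epochs=50, seed=0):
    """Linear SVM via stochastic subgradient descent on the hinge loss.

    Labels must be +1 / -1. lam, eta and epochs are illustrative defaults.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # L2 regularisation shrinks the weights on every step.
            w = [wj * (1.0 - eta * lam) for wj in w]
            # Hinge-loss subgradient: update only if the margin is violated.
            if y[i] * score < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def svm_predict(model, x):
    """Return +1 or -1 for a single feature vector."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else -1


def train_bagging(X, y, n_estimators=11, seed=0):
    """Train one weak SVM per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for k in range(n_estimators):
        picks = [rng.randrange(n) for _ in range(n)]
        Xb = [X[i] for i in picks]
        yb = [y[i] for i in picks]
        models.append(train_linear_svm(Xb, yb, seed=seed + k))
    return models


def predict_vote(models, x):
    """Majority vote over the weak classifiers."""
    votes = Counter(svm_predict(m, x) for m in models)
    return votes.most_common(1)[0][0]


def make_toy_data(n_per_class=40, seed=1):
    """Two well-separated Gaussian clusters standing in for real features."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((1, 2.0), (-1, -2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)])
            y.append(label)
    return X, y


X, y = make_toy_data()
ensemble = train_bagging(X, y, n_estimators=11, seed=42)
print(predict_vote(ensemble, [2.0, 2.0]), predict_vote(ensemble, [-2.0, -2.0]))
```

An odd number of weak classifiers is used here so that a two-class vote cannot tie.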