
Content and Link Based Web Spam Detection with Co-Training

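Cite this article:	WEI Xiaojuan, LI Cuiping, CHEN Hong. Content and Link Based Web Spam Detection with Co-Training[J]. Journal of Frontier of Computer Science and Technology, 2010, 4(10): 899-908.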

Authors:	WEI Xiaojuan, LI Cuiping, CHEN Hong

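Affiliation:	Key Lab of Data Engineering and Knowledge Engineering of MOE, Renmin University of China, Beijing 100872, China; School of Information, Renmin University of China, Beijing 100872, China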

Funding:	The National Natural Science Foundation of China under Grant No. 60873065; the National High-Tech Research and Development Plan of China under Grant No. 2009AA011906; the Research Program of Sciences at Universities of Inner Mongolia Autonomous Region under Grant No. NJzy08152

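Abstract:	Web spam refers to pages that deceive search engines through content spam and inter-page link spam in order to raise their own search rankings; it degrades the accuracy and relevance of search results. This paper proposes a Web spam detection method based on the Co-Training model. It uses two mutually independent sets of page features, content-based statistical features and link features derived from the Web graph, to build two independent base classifiers, and then applies the Co-Training semi-supervised learning algorithm, drawing on a large amount of unlabeled data to improve classifier quality. Experiments on the WEBSPAM-UK2007 dataset show that the algorithm improves the performance of the SVM classifier.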
Keywords:	Web spam; detection method; content spam; link spam; Co-Training algorithm
Content and Link Based Web Spam Detection with Co-Training

WEI Xiaojuan, LI Cuiping, CHEN Hong. Content and Link Based Web Spam Detection with Co-Training[J]. Journal of Frontier of Computer Science and Technology, 2010, 4(10): 899-908.

Authors:	WEI Xiaojuan, LI Cuiping, CHEN Hong

Affiliation:	1. Key Lab of Data Engineering and Knowledge Engineering of MOE, Renmin University of China, Beijing 100872, China; 2. School of Information, Renmin University of China, Beijing 100872, China

Abstract:	Web spam attempts to deceive search engines by crafting the content of Web pages or by creating tight-knit communities of links around irrelevant Web pages, in order to obtain an undeservedly high rank. It degrades the accuracy and relevance of ranking algorithms. This paper proposes a novel Web spam detection method based on the Co-Training model. It builds two base classifiers separately, one on link-based features and one on content-based features, and then leverages unlabeled data along with a few labeled examples to boost classifier performance through a semi-supervised algorithm, Co-Training. Experimental results on the WEBSPAM-UK2007 dataset demonstrate that the algorithm improves the efficiency and accuracy of the SVM classifier.
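
The two-view procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a nearest-centroid classifier stands in for the SVM base classifiers, the two-dimensional content and link features are synthetic toy data, and the names `co_train`, `CentroidClassifier`, `rounds`, and `per_round` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class CentroidClassifier:
    """Nearest-centroid stand-in for the SVM base classifiers (an assumption)."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x):
        """Return (label, margin); a larger margin means a more confident prediction."""
        ranked = sorted((sq_dist(x, c), label) for label, c in self.centroids.items())
        return ranked[0][1], ranked[1][0] - ranked[0][0]


def co_train(content, link, labels, rounds=10, per_round=2):
    """Co-training over two views; labels[i] is 0, 1, or None (unlabeled)."""
    labels = list(labels)
    for _ in range(rounds):
        known = [i for i, t in enumerate(labels) if t is not None]
        unknown = [i for i, t in enumerate(labels) if t is None]
        if not unknown:
            break
        new_labels = {}
        for view in (content, link):
            # Train one classifier per view on the shared labeled pool.
            clf = CentroidClassifier().fit([view[i] for i in known],
                                           [labels[i] for i in known])
            scored = [(clf.predict(view[i]), i) for i in unknown]
            scored.sort(key=lambda s: s[0][1], reverse=True)
            # Each view contributes its most confident pseudo-labels.
            for (label, _margin), i in scored[:per_round]:
                new_labels.setdefault(i, label)
        for i, label in new_labels.items():
            labels[i] = label
    known = [i for i, t in enumerate(labels) if t is not None]
    y = [labels[i] for i in known]
    clf_content = CentroidClassifier().fit([content[i] for i in known], y)
    clf_link = CentroidClassifier().fit([link[i] for i in known], y)
    return clf_content, clf_link, labels


# Synthetic toy data: spam pages cluster near 0.8, normal pages near 0.2.
rng = random.Random(0)


def make_view(is_spam):
    center = 0.8 if is_spam else 0.2
    return [center + rng.uniform(-0.1, 0.1) for _ in range(2)]


truth = [i % 2 for i in range(20)]        # 1 = spam, 0 = normal
content = [make_view(t) for t in truth]   # content-based view
link = [make_view(t) for t in truth]      # link-based view
labels = truth[:4] + [None] * 16          # only the first four pages are labeled

clf_content, clf_link, final_labels = co_train(content, link, labels)
```

In each round both classifiers are retrained on the shared labeled pool, and each adds the unlabeled pages it is most confident about, so evidence from one feature view becomes training data for the other. Substituting a real SVM and the WEBSPAM-UK2007 features would follow the same loop.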

Keywords:	Web spam; detection method; content-based spam; link-based spam; Co-Training
This article is indexed in Wanfang Data and other databases.