Two-layer Sampling Active Learning Algorithm for Social Spammer Detection

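Cite this article:	TAN Kan, GAO Min, LI Wen-Tao, TIAN Ren-Li, WEN Jun-Hao, XIONG Qing-Yu. Two-layer sampling active learning algorithm for social spammer detection [J]. Acta Automatica Sinica, 2017, 43(3): 448-461.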

Author names:	TAN Kan, GAO Min, LI Wen-Tao, TIAN Ren-Li, WEN Jun-Hao, XIONG Qing-Yu

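Author affiliation:	1. Key Laboratory of Dependable Service Computing in Cyber Physical Society, Ministry of Education, Chongqing 400044, China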

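Funding:	Supported by the National Key Basic Research Program of China (973 Program) (2013CB328903), the Chongqing Basic and Frontier Research Program (cstc2015jcyjA40049), the National Natural Science Foundation of China (71102065), the National Science and Technology Support Program (2015BAF05B03), and the Fundamental Research Funds for the Central Universities (106112014CDJZR095502)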

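Abstract:	The rapid development of social networks has brought convenience to users, but their openness makes them vulnerable to fake users. Fake users exploit social networks to spread false information for their own ends, which seriously undermines the security and stability of these networks. Current detection methods mainly classify users by features such as behavior, text, and network relationships; because manually labeling user data is costly, the classifier has too few labeled samples to work with. To address this problem, this paper proposes a method for detecting fake users in social networks based on two-layer sampling active learning. The method evaluates the value of unlabeled samples with three criteria (uncertainty, representativeness, and diversity) and filters them with a two-layer sampling algorithm that combines sorting and clustering, selecting the most valuable samples for expert labeling and using them to train the classification model. Experiments on the Twitter, Apontador, and YouTube datasets show that, when labeled samples are scarce, the proposed method reaches detection performance close to that of supervised learning using only a small number of labeled samples; moreover, compared with other active learning methods, it achieves higher precision and recall while requiring fewer labeled samples.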
Keywords:	social network, fake users, active learning, sample diversity
Received:	2016-04-05
Two-layer Sampling Active Learning Algorithm for Social Spammer Detection

TAN Kan, GAO Min, LI Wen-Tao, TIAN Ren-Li, WEN Jun-Hao, XIONG Qing-Yu. Two-layer Sampling Active Learning Algorithm for Social Spammer Detection [J]. Acta Automatica Sinica, 2017, 43(3): 448-461.

Authors:	TAN Kan, GAO Min, LI Wen-Tao, TIAN Ren-Li, WEN Jun-Hao, XIONG Qing-Yu

Affiliation:	1. Key Laboratory of Dependable Service Computing in Cyber Physical Society, Ministry of Education, Chongqing 400044, China; 2. School of Software Engineering, Chongqing University, Chongqing 400044, China; 3. Centre for Quantum Computation and Intelligent Systems, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW 2007, Australia; 4. Guangzhou Boguan Telecommunication Technology Limited, Guangzhou 501665, China

Abstract:	With the rapid development of social networks, more and more people join them to make friends and share their views. However, the openness of social networks leaves them exposed to fake accounts. Fake accounts, also called spammers, spread spam to achieve their own purposes, which undermines the security and reliability of social networks. Existing detection methods extract behavioral, textual, and relationship features of users and then apply machine learning algorithms to identify social spammers, but these algorithms often suffer from insufficient labeled training data. To address this problem, we propose an efficient algorithm, called two-layer sampling active learning, that constructs an accurate classifier from a minimal number of labeled samples. We present three criteria (uncertainty, representativeness, and diversity) to quantify the value of unlabeled samples, and combine sorting with clustering to actively select the samples with maximum uncertainty, representativeness, and diversity. Experimental results on the Twitter, Apontador, and YouTube datasets demonstrate the efficiency of our approach, which achieves better precision and recall than other active learning methods.

Keywords:	Social network, spammer, active learning, diversity of samples
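
The sampling strategy summarized in the abstract (score unlabeled samples, sort them, then cluster the top candidates) can be illustrated with a minimal sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration and not the authors' implementation: binary entropy as the uncertainty measure, mean inverse-distance similarity as representativeness, a plain k-means as the clustering step, and all function and parameter names (`two_layer_select`, `n_candidates`, `n_query`).

```python
import numpy as np


def uncertainty(proba):
    """Binary entropy of the predicted spammer probability (assumed measure)."""
    p = np.clip(proba, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))


def representativeness(X):
    """Mean similarity 1 / (1 + Euclidean distance) to the whole pool (assumed measure)."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return (1.0 / (1.0 + dist)).mean(axis=1)


def kmeans_labels(X, k, n_iter=20, seed=0):
    """Plain k-means; returns a cluster label for every row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave the center of an empty cluster unchanged
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def two_layer_select(X_pool, proba, n_candidates, n_query, seed=0):
    """Pick up to n_query pool indices to send to the expert for labeling."""
    X_pool = np.asarray(X_pool, dtype=float)
    proba = np.asarray(proba, dtype=float)
    score = uncertainty(proba) * representativeness(X_pool)

    # Layer 1 (sorting): keep the n_candidates highest-scoring samples.
    candidates = np.argsort(-score)[:n_candidates]

    # Layer 2 (clustering): group the candidates and take the best sample
    # from each cluster, so the queried batch is diverse.
    k = min(n_query, len(candidates))
    labels = kmeans_labels(X_pool[candidates], k, seed=seed)
    picked = []
    for j in range(k):
        members = candidates[labels == j]
        if len(members) > 0:  # an empty cluster contributes no sample
            picked.append(int(members[np.argmax(score[members])]))
    return picked
```

In an active learning loop, `proba` would come from the current classifier's predictions on the unlabeled pool; the returned indices are labeled by the expert, moved to the training set, and the classifier is retrained before the next round.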
