Practical detection of spammers and content promoters in online video sharing systems

Authors:	Benevenuto, Fabrício; Rodrigues, Tiago; Veloso, Adriano; Almeida, Jussara; Gonçalves, Marcos; Almeida, Virgílio

Affiliation:	Computer Science Department, Federal University of Ouro Preto, Ouro Preto, MG, Brazil. fabricio@dcc.ufmg.br

Abstract:	A number of online video sharing systems, of which YouTube is the most popular, provide features that allow users to post a video as a response to a discussion topic. These features open opportunities for users to introduce polluted content, or simply pollution, into the system. For instance, spammers may post an unrelated video as a response to a popular one, aiming to increase the likelihood that the response is viewed by a larger number of users. Moreover, content promoters may try to gain visibility for a specific video by posting a large number of (potentially unrelated) responses to boost the rank of the responded video, making it appear in the top lists maintained by the system. Content pollution may jeopardize users' trust in the system, thus compromising its success in promoting social interactions. Despite this, the available literature offers only a limited understanding of the problem. In this paper, we address the issue of detecting video spammers and promoters. To that end, we first manually build a test collection of real YouTube users, classifying them as spammers, promoters, or legitimate users. Using our test collection, we characterize the content, individual, and social attributes that help distinguish each user class. We then investigate the feasibility of using supervised classification algorithms to automatically detect spammers and promoters, and assess their effectiveness on our test collection. While our classification approach succeeds at separating spammers and promoters from legitimate users, the high cost of manually labeling vast numbers of examples compromises its full potential in realistic scenarios. For this reason, we further propose an active learning approach that automatically chooses a set of examples to label that is likely to provide the most information, drastically reducing the amount of required training data while maintaining comparable classification effectiveness.

This article is indexed in PubMed and other databases.
