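A Multi-Feature-Based Method for Detecting Spam Microblogs
Cite this article: ZOU Yong-Pan, LI Wei, WANG Ru-Jing. 基于多特征的垃圾微博检测方法 [J]. Computer Systems & Applications (计算机系统应用), 2017, 26(10): 184-189.
Authors: ZOU Yong-Pan (邹永潘), LI Wei (李伟), WANG Ru-Jing (王儒敬)
Affiliation: Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China; University of Science and Technology of China, Hefei 230026, China, Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China, Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
Funding: Strategic Priority Research Program of the Chinese Academy of Sciences (XDA08040110)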
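Abstract: With the rapid growth of microblog platforms, spam detection and filtering face a serious challenge. Identifying spam accurately and in real time matters greatly for improving user experience and for the sustainable development of microblog platforms. Using real data from Sina Weibo, this paper proposes a multi-feature-based method for detecting spam microblogs. First, the explicit features of a microblog (user features and content features) are extracted. Next, the latent Dirichlet allocation (LDA) topic model is used to extract latent topic features from the microblog. Finally, a classifier is built with a support vector machine (SVM) from the extracted features. Experimental results show that the method improves on existing methods in both precision and F1 score.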

Keywords: spam microblog detection; latent Dirichlet allocation; support vector machine
Received: 2017/1/16

Detection Method of Spam Based on Multi-Features of Micro-Blog
ZOU Yong-Pan, LI Wei and WANG Ru-Jing. Detection Method of Spam Based on Multi-Features of Micro-Blog[J]. Computer Systems & Applications, 2017, 26(10): 184-189.
Authors: ZOU Yong-Pan, LI Wei and WANG Ru-Jing
Affiliation: Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China; University of Science and Technology of China, Hefei 230026, China, Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China and Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
Abstract: With the rapid development of micro-blog platforms, spam detection and filtering face enormous challenges. Real-time, accurate detection of spam is important for improving user experience and for the sustainable development of micro-blog platforms. This paper proposes a spam detection method based on multiple features of micro-blogs. First, user and content features are extracted. Second, latent Dirichlet allocation (LDA) is applied to extract latent topic features. Finally, these features are fused and a classifier is trained using a support vector machine (SVM). Experimental results show that the proposed method improves precision and F1 compared with previous methods.
Keywords: spam detection; latent Dirichlet allocation; support vector machine
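
The three steps named in the abstract (explicit features, LDA topic features, SVM classifier) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the feature list, the tokenizer, the LDA inference method, the SVM kernel, or any hyperparameters, so everything here is hypothetical: the features in `explicit_features`, the toy posts, whitespace tokenization, the collapsed Gibbs sampler, the linear SVM trained with Pegasos-style updates, and all default values.

```python
import math
import random
import re

def explicit_features(post):
    """Hypothetical user/content features; the abstract does not list the real ones."""
    text = post["text"]
    return [
        math.log1p(post["followers"]),                  # user feature (assumed)
        math.log1p(post["followees"]),                  # user feature (assumed)
        float(len(re.findall(r"https?://\S+", text))),  # content feature: URL count
        float(text.count("@")),                         # content feature: mention markers
        float(text.count("#")),                         # content feature: hashtag markers
    ]

def lda_topic_features(docs, n_topics, n_iters=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic-proportion vector per document."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    def bump(d, z, v, delta):
        doc_topic[d][z] += delta
        topic_word[z][v] += delta
        topic_total[z] += delta

    assign = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            bump(d, assign[d][i], vocab[w], 1)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = vocab[w]
                bump(d, assign[d][i], v, -1)  # remove the token, then resample its topic
                weights = [
                    (doc_topic[d][k] + alpha) * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                assign[d][i] = rng.choices(range(n_topics), weights=weights)[0]
                bump(d, assign[d][i], v, 1)
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style sub-gradient descent on the hinge loss; labels must be +1 / -1."""
    rng = random.Random(seed)
    w, b, t = [0.0] * len(X[0]), 0.0, 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1.0 - eta * lam) for wj in w]  # L2 shrinkage
            if margin < 1.0:                          # margin violated: hinge-loss step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def precision_f1(y_true, y_pred):
    """Precision and F1 for the positive (spam, +1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p != 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, f1

def build_features(posts, n_topics=2):
    """Fuse explicit features with LDA topic proportions into one vector per post."""
    topics = lda_topic_features([p["text"].lower().split() for p in posts], n_topics)
    return [explicit_features(p) + t for p, t in zip(posts, topics)]

# Toy data, invented for illustration only (+1 = spam, -1 = legitimate).
posts = [
    {"text": "win free phone click http://a.example now", "followers": 3, "followees": 900},
    {"text": "free prize click http://b.example @you", "followers": 5, "followees": 1200},
    {"text": "lovely dinner with family tonight", "followers": 150, "followees": 180},
    {"text": "reading a good book on the train", "followers": 90, "followees": 110},
]
labels = [1, 1, -1, -1]
X = build_features(posts)
w, b = train_linear_svm(X, labels)
predictions = [predict(w, b, x) for x in X]
print(precision_f1(labels, predictions))
```

The sketch evaluates on its own training posts only to show the data flow. A real experiment would hold out test data, segment Chinese text into words before LDA, and most likely use an established LDA and SVM library rather than these hand-rolled versions.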