
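Feature importance analysis for spammer detection in Sina Weibo
Cite this article: Yu-xiang ZHANG, Yu SUN, Jia-hai YANG, Da-lei ZHOU, Xiang-fei MENG, Chun-jing XIAO. Feature importance analysis for spammer detection in Sina Weibo[J]. Journal on Communications, 2016, 37(8): 24-33.
Authors: Yu-xiang ZHANG, Yu SUN, Jia-hai YANG, Da-lei ZHOU, Xiang-fei MENG, Chun-jing XIAO
Affiliations: 1. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300; 2. Institute for Network Sciences and Cyberspace, Tsinghua University, Beijing 100084; 3. Tsinghua National Laboratory for Information Science and Technology, Beijing 100084; 4. Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing 100876; 5. State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100876
Funding: National Basic Research Program of China ("973" Program) (No.2009CB320505); National Key Technology R&D Program of China (No.2008BAH37B05); National Natural Science Foundation of China (No.61170211, No.U1533104, No.61301245); Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China (No.20110002110056)
Abstract: Spammers are widespread on microblogs, and their abnormal behavior and the spam they produce significantly degrade user experience. To improve detection accuracy, existing studies either define as many features as possible or keep proposing new classification and detection methods. This raises several questions. Should microblog anti-spam work prioritize finding better classification features or improving detection methods? Do more features always yield better detection? Can new methods significantly improve detection? Taking Sina Weibo as an example, this paper attempts to answer these questions through experiments that combine different feature selection methods with different classifiers. The results show that choosing the feature set matters more than improving the classifier, that features should be generated from multiple aspects (content, user behavior, and social relationships), and that more features do not necessarily yield better detection. These conclusions should help future microblog anti-spam work achieve a breakthrough.

Keywords: Sina Weibo; feature generation; feature selection; spammer detection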
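The abstract describes an experimental design that crosses several feature selection methods with several classifiers. As a rough illustration of that kind of grid (not the authors' pipeline, feature set, or data), the stdlib-only Python sketch below ranks features with two filter scores (a Fisher-style score and absolute Pearson correlation), keeps the top k, and trains two simple classifiers (nearest centroid and Gaussian naive Bayes) on each subset. The toy generator `make_synthetic`, the choice of selectors and classifiers, and the 75/25 split are all assumptions made for this example.

```python
import math
import random
from statistics import mean, pstdev


def make_synthetic(n=400, informative=5, noise=15, seed=0):
    """Hypothetical toy data, NOT the Sina Weibo dataset: label 1 = spammer.
    Only the first `informative` columns depend on the label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [rng.gauss(float(label), 1.0) for _ in range(informative)]
        row += [rng.gauss(0.0, 1.0) for _ in range(noise)]
        X.append(row)
        y.append(label)
    return X, y


def rows_of(values, y, c):
    """Items of `values` whose label equals c."""
    return [v for v, label in zip(values, y) if label == c]


def fisher_score(xs, y):
    """Filter score: squared class-mean gap over summed class variances."""
    a, b = rows_of(xs, y, 1), rows_of(xs, y, 0)
    return (mean(a) - mean(b)) ** 2 / ((pstdev(a) ** 2 + pstdev(b) ** 2) or 1e-12)


def abs_correlation(xs, y):
    """Filter score: |Pearson correlation| between one feature and the label."""
    mx, my = mean(xs), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, y))
    norm = math.sqrt(sum((a - mx) ** 2 for a in xs) * sum((b - my) ** 2 for b in y))
    return abs(cov) / (norm or 1e-12)


def rank_features(X, y, score):
    """Feature indices sorted from most to least relevant under `score`."""
    cols = list(zip(*X))
    return sorted(range(len(cols)), key=lambda j: score(cols[j], y), reverse=True)


def nearest_centroid(X, y):
    """Classifier 1: assign each row to the class with the closest mean vector."""
    cents = {c: [mean(col) for col in zip(*rows_of(X, y, c))] for c in (0, 1)}
    def predict(r):
        return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(r, cents[c])))
    return predict


def gaussian_nb(X, y):
    """Classifier 2: Gaussian naive Bayes with per-class, per-feature mean/std."""
    model = {}
    for c in (0, 1):
        rows = rows_of(X, y, c)
        params = [(mean(col), pstdev(col) or 1e-6) for col in zip(*rows)]
        model[c] = (math.log(len(rows) / len(X)), params)
    def predict(r):
        def log_post(c):
            log_prior, params = model[c]
            return log_prior + sum(-math.log(s) - (x - m) ** 2 / (2 * s * s)
                                   for x, (m, s) in zip(r, params))
        return max(model, key=log_post)
    return predict


def project(rows, keep):
    """Keep only the selected feature columns."""
    return [[r[j] for j in keep] for r in rows]


def run_grid(ks=(2, 5, 10, 20), seed=0):
    """Test accuracy for every (selector, top-k, classifier) combination."""
    X, y = make_synthetic(seed=seed)
    cut = len(X) * 3 // 4
    tr_X, tr_y, te_X, te_y = X[:cut], y[:cut], X[cut:], y[cut:]
    selectors = {"fisher": fisher_score, "abs_corr": abs_correlation}
    classifiers = {"centroid": nearest_centroid, "gauss_nb": gaussian_nb}
    results = {}
    for s_name, score in selectors.items():
        ranked = rank_features(tr_X, tr_y, score)  # rank on the training split only
        for k in ks:
            keep = ranked[:k]
            for c_name, fit in classifiers.items():
                predict = fit(project(tr_X, keep), tr_y)
                hits = sum(predict(r) == t for r, t in zip(project(te_X, keep), te_y))
                results[(s_name, k, c_name)] = hits / len(te_y)
    return results


if __name__ == "__main__":
    for (sel, k, clf), acc in sorted(run_grid().items()):
        print(f"{sel:8s} k={k:2d} {clf:8s} accuracy={acc:.3f}")
```

Printing the grid lets you compare how much accuracy moves across selectors and values of k versus across classifiers, which is the kind of comparison the paper draws; numbers from this toy data say nothing about Sina Weibo itself.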

Feature importance analysis for spammer detection in Sina Weibo
Yu-xiang ZHANG, Yu SUN, Jia-hai YANG, Da-lei ZHOU, Xiang-fei MENG, Chun-jing XIAO. Feature importance analysis for spammer detection in Sina Weibo[J]. Journal on Communications, 2016, 37(8): 24-33.
Authors: Yu-xiang ZHANG, Yu SUN, Jia-hai YANG, Da-lei ZHOU, Xiang-fei MENG, Chun-jing XIAO
Affiliation: 1. College of Computer Science, Civil Aviation University of China, Tianjin 300300, China; 2. Institute for Network Sciences and Cyberspace, Tsinghua University, Beijing 100084, China; 3. Tsinghua National Laboratory for Information Science and Technology (TNList), Beijing 100084, China; 4. Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China; 5. State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100876, China
Abstract: Microblogs attract not only legitimate users but also spammers, and the spam these accounts post significantly degrades user experience. To improve the accuracy of spammer detection, most existing studies either generate more classification features or propose new classifiers. Which of these deserves priority in research effort? Do extensive features or novel classifiers contribute more to detection accuracy? These questions are addressed by combining different feature selection methods with different classifiers on a real Sina Weibo dataset. Experimental results show that the choice of features matters more than novel classifiers for spammer detection. In addition, features should be derived from a wide range of sources, such as text content, user behavior, and social relationships, and the feature dimension should not be too high. These results should help identify where future microblog anti-spam work can achieve a breakthrough.
Keywords: Sina Weibo; feature definition; feature selection; spammer detection
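The abstract concludes that features should come from several sources: text content, user behavior, and social relationships. A minimal sketch of organizing per-user features into those three groups follows. The `UserRecord` fields and every individual feature (URL ratio, duplicate-post ratio, follower/following ratio, and so on) are hypothetical examples chosen for illustration, not the feature set defined in the paper.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\S+")


@dataclass
class UserRecord:
    """Hypothetical per-user record; all field names are assumptions."""
    posts: list[str]        # text of the user's recent posts
    reposted: list[bool]    # True where the matching post is a repost
    account_age_days: int
    followers: int
    following: int
    mutual: int             # accounts the user follows that follow back


def content_features(u: UserRecord) -> dict[str, float]:
    """Text-content group: link, mention, and duplication rates."""
    n = max(len(u.posts), 1)
    return {
        "url_ratio": sum(bool(URL_RE.search(p)) for p in u.posts) / n,
        "mention_ratio": sum(bool(MENTION_RE.search(p)) for p in u.posts) / n,
        "duplicate_ratio": 1 - len(set(u.posts)) / n if u.posts else 0.0,
    }


def behavior_features(u: UserRecord) -> dict[str, float]:
    """User-behavior group: posting rate and reliance on reposts."""
    n = max(len(u.posts), 1)
    return {
        "posts_per_day": len(u.posts) / max(u.account_age_days, 1),
        "repost_ratio": sum(u.reposted) / n,
    }


def social_features(u: UserRecord) -> dict[str, float]:
    """Social-relationship group: audience size and reciprocity."""
    return {
        "follower_following_ratio": u.followers / max(u.following, 1),
        "reciprocity": u.mutual / max(u.following, 1),
    }


def user_features(u: UserRecord) -> dict[str, float]:
    """Merge all three groups into one flat feature dict."""
    feats: dict[str, float] = {}
    for group in (content_features, behavior_features, social_features):
        feats.update(group(u))
    return feats


if __name__ == "__main__":
    demo = UserRecord(
        posts=["Win a prize http://t.cn/x", "Win a prize http://t.cn/x", "@alice hi"],
        reposted=[False, False, True],
        account_age_days=3, followers=2, following=500, mutual=1,
    )
    print(user_features(demo))
```

Keeping each group in its own function makes it easy to drop or add a whole group and rerun a selection-by-classifier comparison; the concrete definitions and any thresholds would have to be derived from real data.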