Feature importance analysis for spammer detection in Sina Weibo (新浪微博反垃圾中特征选择的重要性分析)

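Cite this article:	ZHANG Yu-xiang, SUN Yu, YANG Jia-hai, ZHOU Da-lei, MENG Xiang-fei, XIAO Chun-jing. Feature importance analysis for spammer detection in Sina Weibo [新浪微博反垃圾中特征选择的重要性分析][J]. Journal on Communications (通信学报), 2016, 37(8): 24-33.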

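Authors:	ZHANG Yu-xiang (张宇翔), SUN Yu (孙菀), YANG Jia-hai (杨家海), ZHOU Da-lei (周达磊), MENG Xiang-fei (孟祥飞), XIAO Chun-jing (肖春景)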

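Affiliations:	1. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300; 2. Institute for Network Sciences and Cyberspace, Tsinghua University, Beijing 100084; 3. Tsinghua National Laboratory for Information Science and Technology, Beijing 100084; 4. Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing 100876; 5. State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100876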

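Funding:	National Basic Research Program of China ("973" Program) (No.2009CB320505); National Key Technology R&D Program of China (No.2008BAH37B05); National Natural Science Foundation of China (No.61170211, No.U1533104, No.61301245); Doctoral Program Fund of the Ministry of Education of China (No.20110002110056)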

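Abstract:	Spammers are widespread on microblogs, and their abnormal behavior and the spam they produce significantly degrade the user experience. To improve detection accuracy, existing studies either define as many features as possible or keep proposing new classification and detection methods. This raises three questions: should microblog anti-spam work give priority to finding classification features or to improving classification methods; do more features always yield better detection; and can new methods significantly improve detection? Taking Sina Weibo as an example, the paper attempts to answer these questions through experiments that combine different feature selection methods with different classifiers. The results show that the choice of feature set matters more than improvements to the classifier, that features should be generated from several aspects (content information, user behavior, and social relationships), and that more features do not necessarily give better detection. These conclusions should help future breakthroughs in microblog anti-spam work.
Keywords:	Sina Weibo; feature generation; feature selection; spammer detection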
Feature importance analysis for spammer detection in Sina Weibo

Yu-xiang ZHANG, Yu SUN, Jia-hai YANG, Da-lei ZHOU, Xiang-fei MENG, Chun-jing XIAO. Feature importance analysis for spammer detection in Sina Weibo[J]. Journal on Communications, 2016, 37(8): 24-33.

Authors:	Yu-xiang ZHANG, Yu SUN, Jia-hai YANG, Da-lei ZHOU, Xiang-fei MENG, Chun-jing XIAO

Affiliation:	1. College of Computer Science, Civil Aviation University of China, Tianjin 300300, China; 2. Institute for Network Sciences and Cyberspace, Tsinghua University, Beijing 100084, China; 3. Tsinghua National Laboratory for Information Science and Technology (TNList), Beijing 100084, China; 4. Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China; 5. State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100876, China

Abstract:	Microblogs have drawn the attention of not only legitimate users but also spammers. The spam produced by spammers significantly degrades users' experience. To improve the accuracy of spammer detection, most existing studies focus on either generating more classification features or proposing new classifiers. Which of these two directions should be given priority in research effort? Do more features, or novel classifiers, do more to improve detection accuracy? The paper addresses these questions by combining different feature selection methods with different classifiers on a real Sina Weibo dataset. Experimental results show that the selected features matter more than novel classifiers for spammer detection. In addition, features should be derived from a wide range of sources, such as text content, user behavior, and social relationships, and the feature dimension should not be too high. These results should help identify where future breakthroughs in microblog anti-spam work can be made.

Keywords:	Sina Weibo; feature definition; feature selection; spammer detection
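
Illustrative sketch (not from the paper): the experimental design summarized in the abstract, crossing feature subsets with classifiers and comparing accuracy, might be set up roughly as follows. Everything here is an assumption made for illustration: the data are synthetic rather than Sina Weibo data, the Fisher-score filter and the two toy classifiers (nearest centroid and k-nearest neighbours) are stand-ins for whatever selection methods and classifiers the authors used, and all function names are hypothetical.

```python
import random


def make_data(n=200, seed=0):
    """Synthetic stand-in data: 2 informative features followed by 4 noise features."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # hypothetical labels: 1 = spammer, 0 = legitimate user
        informative = [rng.gauss(2.0 * label, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
        X.append(informative + noise)
        y.append(label)
    return X, y


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def fisher_scores(X, y):
    """Filter-style score per feature: squared class-mean gap over summed class variances."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, lab in zip(X, y) if lab == 1]
        neg = [row[j] for row, lab in zip(X, y) if lab == 0]
        gap = (mean(pos) - mean(neg)) ** 2
        scores.append(gap / (variance(pos) + variance(neg) + 1e-12))
    return scores


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_centroid(train_X, train_y, test_X):
    """Assign each test row to the class whose training centroid is closest."""
    centroids = {}
    for lab in sorted(set(train_y)):
        rows = [r for r, l in zip(train_X, train_y) if l == lab]
        centroids[lab] = [mean(col) for col in zip(*rows)]
    return [min(centroids, key=lambda lab: sq_dist(x, centroids[lab])) for x in test_X]


def knn(train_X, train_y, test_X, k=5):
    """Majority vote among the k nearest training rows (binary 0/1 labels)."""
    preds = []
    for x in test_X:
        nearest = sorted(zip(train_X, train_y), key=lambda p: sq_dist(x, p[0]))[:k]
        votes = sum(lab for _, lab in nearest)
        preds.append(1 if 2 * votes > k else 0)
    return preds


def run_grid(seed=0, n_train=150, k_features=2):
    """Evaluate every (feature set, classifier) pair on a held-out split."""
    X, y = make_data(seed=seed)
    train_X, test_X = X[:n_train], X[n_train:]
    train_y, test_y = y[:n_train], y[n_train:]
    # Score features on the training split only, to avoid leaking test labels.
    scores = fisher_scores(train_X, train_y)
    selected = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k_features]
    feature_sets = {"all": list(range(len(X[0]))), "selected": selected}
    classifiers = {"nearest_centroid": nearest_centroid, "knn": knn}
    results = {}
    for fs_name, cols in feature_sets.items():
        tr = [[row[j] for j in cols] for row in train_X]
        te = [[row[j] for j in cols] for row in test_X]
        for clf_name, clf in classifiers.items():
            pred = clf(tr, train_y, te)
            correct = sum(p == t for p, t in zip(pred, test_y))
            results[(fs_name, clf_name)] = correct / len(test_y)
    return results, selected


if __name__ == "__main__":
    results, selected = run_grid()
    print("selected feature indices:", selected)
    for (fs_name, clf_name), acc in results.items():
        print(f"{fs_name:>8} + {clf_name:<16} accuracy = {acc:.2f}")
```

Reading the resulting grid row by row (same classifier, different feature sets) and column by column (same feature set, different classifiers) is one simple way to compare how much each choice moves accuracy. A toy run like this says nothing about the paper's actual findings.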
