
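Micro-blog Retweet Behavior Prediction Method Fusing Anomaly Detection and Random Forest
Cite this article: ZHOU Xian-ting, HUANG Wen-ming, DENG Zhen-rong. Micro-blog Retweet Behavior Prediction Method Fusing Anomaly Detection and Random Forest [J]. Computer Science (计算机科学), 2017, 44(7): 191-196, 220.
Authors: ZHOU Xian-ting, HUANG Wen-ming, DENG Zhen-rong
Affiliations: School of Computer and Information Security, Guilin University of Electronic Technology, Guilin 541004; Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004; Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004
Funding: This work was supported by the Guangxi Science and Technology Research Project (桂科攻1598019-6).
Abstract: Current micro-blog retweet behavior prediction suffers from arbitrary feature selection and low accuracy. To address this, a retweet behavior prediction method that fuses anomaly detection with random forest is proposed. First, basic user features, basic post features, and post content topic features are extracted, and user activity and post influence are computed based on relative entropy. Second, a key feature set is selected by combining filter and wrapper feature selection methods. Finally, anomaly detection and the random forest algorithm are fused to predict retweet behavior from the selected key feature set, with the number of decision trees and the number of features in the random forest set using the out-of-bag error estimate. On a real Sina Weibo dataset, the method was compared experimentally with retweet prediction methods based on logistic regression, decision tree, naive Bayes, and random forest. The results show that the prediction accuracy of the proposed method (90.5%) is higher than that of random forest, the best of the baseline methods, and they also verify the effectiveness of the feature selection method.

Keywords: retweet prediction; random forest; anomaly detection; feature selection; relative entropy
Received: 2016/6/19
Revised: 2016/9/22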

Micro-blog Retweet Behavior Prediction Algorithm Based on Anomaly Detection and Random Forest
ZHOU Xian-ting, HUANG Wen-ming and DENG Zhen-rong. Micro-blog Retweet Behavior Prediction Algorithm Based on Anomaly Detection and Random Forest [J]. Computer Science, 2017, 44(7): 191-196, 220.
Authors: ZHOU Xian-ting, HUANG Wen-ming and DENG Zhen-rong
Affiliation: School of Computer and Information Security, Guilin University of Electronic Technology, Guilin 541004, China; Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China; and Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China
Abstract: To address the arbitrary feature selection and limited accuracy of current micro-blog retweet behavior prediction, a method that fuses anomaly detection with random forest is proposed. First, basic user features, basic post features, and post content topic features are extracted, and user activity and post influence are calculated based on relative entropy. Second, the key feature set is selected by combining filter and wrapper feature selection methods. Finally, anomaly detection and random forest algorithms are fused to predict retweet behavior from the selected features, with the random forest parameters chosen by analyzing the out-of-bag error estimate. In experiments on real data against prediction methods based on Logistic Regression, Decision Tree, Naive Bayes, and Random Forest, the proposed method reaches a prediction accuracy of 90.5%, higher than that of Random Forest, the best of the baselines. The experiments also verify the effectiveness of the feature selection method.
Keywords: Retweet prediction; Random forest; Anomaly detection; Feature filter; Relative entropy
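
The abstract says that user activity and post influence are calculated from relative entropy, but it does not give the formula. As a rough illustration of the underlying quantity only, the sketch below computes the relative entropy (Kullback-Leibler divergence) between two discrete distributions in plain Python. The function names, the `eps` guard, and the example counts are assumptions made for illustration; they are not taken from the paper, and the paper's actual activity and influence scores may be defined differently.

```python
import math


def normalize(counts):
    """Turn non-negative counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return [c / total for c in counts]


def relative_entropy(p, q, eps=1e-12):
    """Relative entropy D(P || Q) in nats for two discrete distributions.

    Terms with p_i == 0 contribute nothing. q_i is floored at `eps`
    (an assumption of this sketch) to avoid division by zero.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            total += pi * math.log(pi / max(qi, eps))
    return total


# Hypothetical data: how one user's actions are spread over three
# behavior types, compared with the spread over the whole population.
user_counts = [1, 1, 8]
population_counts = [4, 3, 3]

divergence = relative_entropy(normalize(user_counts),
                              normalize(population_counts))
print(f"D(user || population) = {divergence:.4f} nats")
```

The value is zero when the two distributions are identical and grows as the user's behavior departs from the reference distribution, which is one plausible reason to use it as a building block for an activity or influence score.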