
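Spam review detection based on subjective tendency value and the EasyEnsemble algorithm
Cite this article: Tao Chaojie, Yang Jin. Spam review detection based on subjective tendency value and the EasyEnsemble algorithm[J]. Application Research of Computers, 2021, 38(5): 1403-1408.
Authors: Tao Chaojie, Yang Jin
Affiliation: College of Science, University of Shanghai for Science and Technology, Shanghai 200093, China
Funding: Humanities and Social Sciences Planning Fund of the Ministry of Education of China (16YJA630037); Shanghai First-Class Discipline Construction Project (S1201YLXK).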
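Abstract: To identify online spam reviews effectively, this paper proposes a detection method based on the XGBoost-EasyEnsemble algorithm. First, a multi-dimensional feature model is built from the characteristics of spam reviews and a proposed method for computing the subjective tendency value. Second, to address the class imbalance in review data, the EasyEnsemble algorithm uses an ensemble strategy to compensate for the shortcomings of under-sampling and to make full use of the sample information. Finally, XGBoost models, which are "accurate yet diverse", are chosen as base classifiers to train the final classifier. Experiments on review data from Yelp, with AUC as the evaluation metric, compare the method against popular machine learning algorithms such as support vector machines, GBDT, and neural networks, and verify its effectiveness.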
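
The under-sampling ensemble idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes spam reviews are the minority class labelled 1, it averages the base learners' scores, and it uses a simple nearest-centroid scorer as a stand-in base learner where the paper uses XGBoost. The names `train_centroid_scorer` and `easy_ensemble` are hypothetical.

```python
import random
from statistics import mean


def train_centroid_scorer(X, y):
    """Stand-in base learner (NOT XGBoost): scores by distance to class centroids."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    dim = len(X[0])
    c_pos = [mean(x[j] for x in pos) for j in range(dim)]
    c_neg = [mean(x[j] for x in neg) for j in range(dim)]

    def score(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
        # Higher score means the point lies closer to the positive (spam) centroid.
        return d_neg - d_pos

    return score


def easy_ensemble(X, y, n_subsets=10, train_base=train_centroid_scorer, seed=0):
    """Train one base learner per balanced subset and average their scores.

    Assumes label 1 is the minority class and label 0 the majority class.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    scorers = []
    for _ in range(n_subsets):
        # Under-sample the majority class to the size of the minority class.
        # Each subset draws a different sample, so across the ensemble most
        # majority examples are seen by at least one base learner.
        sampled = rng.sample(majority, len(minority))
        idx = minority + sampled
        scorers.append(train_base([X[i] for i in idx], [y[i] for i in idx]))

    def ensemble_score(x):
        return mean(s(x) for s in scorers)

    return ensemble_score
```

Any function with the same `train_base(X, y) -> score` shape could be passed in place of the stand-in, for example a wrapper around a gradient-boosted tree model.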

Keywords: spam review; class imbalance; subjective tendency value; EasyEnsemble; XGBoost
Received: 2020/6/1
Revised: 2021/4/9

Detection of spam reviews based on subjectivity and EasyEnsemble algorithm
Tao Chaojie, Yang Jin. Detection of spam reviews based on subjectivity and EasyEnsemble algorithm[J]. Application Research of Computers, 2021, 38(5): 1403-1408.
Authors: Tao Chaojie and Yang Jin
Affiliation: College of Science, University of Shanghai for Science and Technology, Shanghai 200093, China
Abstract: To detect online spam reviews effectively, this paper proposes a detection method based on the XGBoost-EasyEnsemble algorithm. First, based on the characteristics of spam reviews, the paper proposes a method for calculating subjectivity and builds a multi-dimensional feature model. Second, to address the class-imbalance problem, the EasyEnsemble algorithm uses an ensemble strategy to compensate for the shortcomings of under-sampling and to make full use of the sample information. Finally, XGBoost models, which combine high diversity with high accuracy, are chosen as the base classifiers. Comparative experiments against five popular machine learning algorithms were conducted on reviews from Yelp.com, with AUC as the evaluation metric, and the results verify the validity of the method.
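
AUC is a common choice for class-imbalanced problems because plain accuracy can look high when a classifier simply predicts the majority class. As a small illustration (not code from the paper), AUC can be computed as the fraction of positive-negative pairs in which the positive example receives the higher score, counting ties as one half. The function name `auc` and the 0/1 label convention are assumptions made for this sketch.

```python
def auc(labels, scores):
    """Pairwise AUC: probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Three of the four positive-negative pairs are ranked correctly.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75

# A constant score is 75% "accurate" as an all-negative prediction here,
# yet it carries no ranking information, so its AUC is 0.5.
print(auc([0, 0, 0, 1], [0.0, 0.0, 0.0, 0.0]))  # 0.5
```

The double loop is quadratic in the number of examples; it is meant to show the definition, and a rank-based computation would be preferable on large data.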
Keywords:spam review  class-imbalance  subjectivity  EasyEnsemble  XGBoost
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).