Social Platform Spam Filtering Based on TF-IDF and Optimized BP Neural Network
Citation: WANG Yang, WANG Fei-Fan, ZHANG Shu-Yi, HUANG Shao-Fen, XU Shan-Shan, ZHAO Chen-Xi, ZHAO Chuan-Xin. Social Platform Spam Filtering Based on TF-IDF and Optimized BP Neural Network[J]. Computer Systems & Applications, 2019, 28(3): 126-132.
Authors: WANG Yang, WANG Fei-Fan, ZHANG Shu-Yi, HUANG Shao-Fen, XU Shan-Shan, ZHAO Chen-Xi, ZHAO Chuan-Xin
Affiliation: School of Computer and Information, Anhui Normal University, Wuhu 241000, China (all authors)
Funding: National Natural Science Foundation of China (61572036); Anhui Provincial Social Science Planning Project (AHSKY2017D42); Anhui Provincial Major Humanities and Social Sciences Fund (SK2014ZD033)
Abstract: In recent years, as the pace of life has quickened and the Internet has developed rapidly, people have increasingly communicated through short texts on social platforms. Some users exploit this by posting spam texts, which hinders normal social interaction and pollutes the online environment. To address this problem, we propose a spam text detection method for social platforms based on TF-IDF and an optimized BP neural network, and use it to filter spam text on social platforms. First, a keyword dataset is constructed through Jieba word segmentation and stop-word removal. Second, the weight of each keyword in the keyword vector representing a text is computed with TF-IDF, and the text vector is reduced in dimension accordingly to obtain a feature vector. Finally, on this basis, a BP neural network classifier classifies the short texts, detecting and filtering out spam texts. Experimental results show that the method achieves an average classification accuracy of 97.720% with 1000-dimensional text feature vectors.

Keywords: TF-IDF; optimized BP neural network; Jieba word segmentation; spam text filtering
Received: 2018-09-27
Revised: 2018-10-23
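The three steps named in the abstract (keyword extraction, TF-IDF-based dimension reduction, BP neural network classification) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how the BP network was optimized or how keywords were ranked, so this sketch assumes a plain one-hidden-layer backpropagation network and keeps the `k` terms with the highest summed TF-IDF weight. The toy documents are assumed to be already segmented (for example with Jieba) and stripped of stop words, and every function name, parameter, and data value here is hypothetical.

```python
import math
import random
from collections import defaultdict

def tfidf_weights(docs):
    """Per-document TF-IDF weights; each doc is a list of tokens (assumed pre-segmented)."""
    n = len(docs)
    df = defaultdict(int)                      # document frequency of each term
    for doc in docs:
        for term in set(doc):
            df[term] += 1
    weights = []
    for doc in docs:
        counts = defaultdict(int)
        for term in doc:
            counts[term] += 1
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()})
    return weights

def select_vocabulary(weights, k):
    """Assumed reduction rule: keep the k terms with the largest summed TF-IDF weight."""
    total = defaultdict(float)
    for w in weights:
        for term, value in w.items():
            total[term] += value
    return sorted(total, key=lambda t: (-total[t], t))[:k]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(model, x):
    """Return hidden activations and the output of a one-hidden-layer network."""
    W1, b1, W2, b2 = model
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    o = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, o

def train_bp(X, y, hidden=4, lr=0.5, epochs=300, seed=0):
    """Plain stochastic backpropagation on the squared error 0.5 * (o - t)^2."""
    rng = random.Random(seed)
    d = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            h, o = forward((W1, b1, W2, b2), x)
            delta_o = (o - t) * o * (1.0 - o)                    # output-layer error term
            delta_h = [delta_o * W2[j] * h[j] * (1.0 - h[j])     # error propagated back
                       for j in range(hidden)]                   # through the old W2
            for j in range(hidden):
                W2[j] -= lr * delta_o * h[j]
                b1[j] -= lr * delta_h[j]
                for i in range(d):
                    W1[j][i] -= lr * delta_h[j] * x[i]
            b2 -= lr * delta_o
    return W1, b1, W2, b2

def predict(model, x):
    """Spam score in [0, 1]; higher means more spam-like."""
    return forward(model, x)[1]

# Toy corpus (hypothetical): 1 = spam, 0 = normal.
docs = [
    ["free", "prize", "click", "link"],
    ["win", "free", "prize", "now"],
    ["lunch", "meeting", "noon"],
    ["project", "meeting", "tomorrow"],
]
labels = [1, 1, 0, 0]

weights = tfidf_weights(docs)
vocab = select_vocabulary(weights, k=8)        # 11 distinct terms reduced to 8 features
X = [[w.get(term, 0.0) for term in vocab] for w in weights]
model = train_bp(X, labels)
for doc, x in zip(docs, X):
    print(doc, round(predict(model, x), 3))
```

On a real corpus, `k` would be set to the target feature dimension (the abstract reports results for 1000 dimensions), and the scores would be evaluated on held-out texts, not on the training set.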
Indexed in Wanfang Data and other databases.