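Chat Content Filtering Based on Doc2Vec and SVM (基于Doc2Vec与SVM的聊天内容过滤)
Cite this article: YUE Wen-Ying (岳文应). Chat Content Filtering Based on Doc2Vec and SVM [J]. Computer Systems & Applications (计算机系统应用), 2018, 27(7): 127-132
Author: YUE Wen-Ying (岳文应)
Affiliation: School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018
Abstract: Real-time interception of user chat content in live-streaming systems is of great practical importance. To improve classification accuracy and efficiency, this paper proposes a text classification model that combines Doc2Vec with an SVM to classify chat content and decide whether a message should be intercepted. First, a Doc2Vec model represents each chat message as a dense numeric vector; second, an SVM classifier performs the classification. Experiments show that the model effectively reduces the dimensionality of the text representation and improves training efficiency, achieving 97% accuracy and 89.82% recall, which outperforms both Naive Bayes and a Doc2Vec-based logistic regression model.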

Keywords: text classification; natural language processing; Doc2Vec model; support vector machine
Received: 2017-10-16
Revised: 2017-11-03

Chat Content Filtering Based on Doc2Vec and SVM
YUE Wen-Ying. Chat Content Filtering Based on Doc2Vec and SVM[J]. Computer Systems & Applications, 2018, 27(7): 127-132
Authors:YUE Wen-Ying
Affiliation:School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
Abstract: Real-time interception of user chat content in live-streaming systems is of great practical importance. To improve the accuracy and efficiency of classification, a text classification model combining Doc2Vec and SVM is proposed to classify chat content and decide whether a message should be intercepted. First, the Doc2Vec model represents the chat content as a dense numeric vector; then an SVM classifier performs the classification. The experimental results show that the model effectively reduces the dimensionality of the text representation and improves training efficiency, and that it achieves 97% accuracy and 89.82% recall, outperforming both Naive Bayes and a Doc2Vec-based logistic regression model.
Keywords:text classification  Natural Language Processing (NLP)  Doc2Vec model  Support Vector Machine (SVM)