
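A Naïve Bayes-Based Method for Microblog Sentiment Classification
Cite this article: LIN Jiang-hao, YANG Ai-min, ZHOU Yong-mei, CHEN Jin, CAI Ze-jian. A Naïve Bayes-Based Method for Microblog Sentiment Classification[J]. Computer Engineering & Science (计算机工程与科学), 2012, 34(9): 160-165.
Authors: LIN Jiang-hao (林江豪), YANG Ai-min (阳爱民), ZHOU Yong-mei (周咏梅), CHEN Jin (陈锦), CAI Ze-jian (蔡泽键)
Affiliations: 1. School of Management, Guangdong University of Foreign Studies, Guangzhou, Guangdong 510006
2. Cisco School of Informatics, Guangdong University of Foreign Studies, Guangzhou, Guangdong 510006
3. School of English Language and Culture, Guangdong University of Foreign Studies, Guangzhou, Guangdong 510006
Funding: National Social Science Fund of China project; Ministry of Education Humanities and Social Sciences Research Youth project; Guangdong Province Science and Technology Plan project; Guangdong University of Foreign Studies Graduate Research Innovation project; Guangdong University of Foreign Studies Undergraduate Innovation Experiment project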
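Abstract: This paper builds on a two-pass ("twice") sentiment feature extraction algorithm. The first pass extracts sentiment features from text using syntactic dependency relations; the second pass refines them using a sentiment lexicon. A naïve Bayes classifier is then built to classify the sentiment polarity of collected hot-topic microblog posts and hotel reviews. The experiments compare classification performance across different combinations of emoticons, punctuation marks, lexicon-based feature extraction, and two-pass sentiment feature extraction, in search of a better preprocessing method for microblog sentiment classification. The microblog results are also compared with those for hotel reviews to identify what limits microblog classification performance. The results show that two-pass feature extraction achieves a higher F1, and that the best preprocessing configuration in the experiments is "emoticons + punctuation marks + two-pass sentiment feature extraction + BOOL values". Naïve Bayes also performs better on hotel reviews, mainly because the objects being evaluated in microblogs are far more varied.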

Keywords: microblog; text sentiment classification; two-pass sentiment feature extraction; naïve Bayes
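
The two extraction passes named in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the abstract does not say which dependency relations are used or which lexicon is consulted, so the relation labels in `CANDIDATE_RELATIONS`, the `Dep` record, and the toy lexicon are all hypothetical, and the dependency parse is assumed to be produced by some external parser.

```python
from typing import Iterable, List, NamedTuple, Set


class Dep(NamedTuple):
    """One dependency arc from an external parser (hypothetical record type)."""
    head: str
    relation: str
    dependent: str


# Hypothetical relation labels; the abstract does not state which ones are kept.
CANDIDATE_RELATIONS = {"SBV", "VOB", "ATT", "ADV"}


def first_pass(deps: Iterable[Dep]) -> List[str]:
    """Pass 1: collect words that take part in the selected dependency relations."""
    candidates: List[str] = []
    for dep in deps:
        if dep.relation in CANDIDATE_RELATIONS:
            for word in (dep.head, dep.dependent):
                if word not in candidates:  # keep first-seen order, no duplicates
                    candidates.append(word)
    return candidates


def second_pass(candidates: Iterable[str], lexicon: Set[str]) -> List[str]:
    """Pass 2: keep only the candidates that appear in the sentiment lexicon."""
    return [word for word in candidates if word in lexicon]


# Toy input, written in English for readability (assumed data, not from the paper).
example_deps = [
    Dep("like", "SBV", "I"),
    Dep("like", "VOB", "hotel"),
    Dep("hotel", "ATT", "clean"),
    Dep("like", "WP", "."),
]
example_lexicon = {"like", "clean", "awful"}
example_features = second_pass(first_pass(example_deps), example_lexicon)
```

Splitting the work this way lets the dependency pass narrow the text to syntactically relevant words before the lexicon decides which of them carry sentiment.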

Classification of Microblog Sentiment Based on Naïve Bayesian
LIN Jiang-hao, YANG Ai-min, ZHOU Yong-mei, CHEN Jin, CAI Ze-jian. Classification of Microblog Sentiment Based on Naïve Bayesian[J]. Computer Engineering & Science, 2012, 34(9): 160-165.
Authors:LIN Jiang-hao  YANG Ai-min  ZHOU Yong-mei  CHEN Jin  CAI Ze-jian
Affiliation: 1. School of Management, Guangdong University of Foreign Studies, Guangzhou 510006; 2. Cisco School of Informatics, Guangdong University of Foreign Studies, Guangzhou 510006; 3. School of English Language and Culture, Guangdong University of Foreign Studies, Guangzhou 510006, China
Abstract: Based on the twice (two-pass) sentiment feature extraction approach, this paper uses syntactic dependency relations for the first extraction pass and a sentiment lexicon for the second. A naïve Bayesian sentiment classifier is constructed to classify the sentiment polarity of collected hot-topic Chinese microblog posts and hotel reviews. The experiments compare classification performance across combinations of emoticons, punctuation, lexicon-based feature extraction, and twice sentiment feature extraction, in order to find a better preprocessing method for sentiment classification of microblog text. They also compare the results for microblog text with those for hotel reviews to identify what limits classification performance on microblogs. The results indicate that twice sentiment feature extraction achieves a higher F1, and that "emoticons + punctuation + twice sentiment feature extraction + BOOL" is the best preprocessing configuration. The naïve Bayesian classifier also performs better on hotel reviews, probably because the topics discussed in microblogs are more varied.
Keywords: microblog; text sentiment classification; twice sentiment feature extraction; naïve Bayesian
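
As a rough illustration of the classification step, the sketch below trains a naïve Bayes classifier on Boolean (present/absent) features and computes F1 for one class. It is a minimal sketch, not the paper's implementation: the abstract does not specify the model variant or smoothing, so the Bernoulli-style likelihood, the add-one smoothing, the function names, and the toy documents are all assumptions.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs: list of (set_of_features, label). Returns counts used at prediction time."""
    class_counts = Counter(label for _, label in docs)
    feature_counts = defaultdict(Counter)  # label -> feature -> number of docs containing it
    vocab = set()
    for feats, label in docs:
        vocab |= feats
        for feat in feats:
            feature_counts[label][feat] += 1
    return class_counts, feature_counts, vocab


def predict(model, feats):
    """Pick the label with the highest log-posterior under Boolean features."""
    class_counts, feature_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total)  # log prior
        for feat in vocab:  # features outside the training vocabulary are ignored
            # Add-one smoothing over the two outcomes present/absent (an assumption).
            p_present = (feature_counts[label][feat] + 1) / (n_docs + 2)
            score += math.log(p_present if feat in feats else 1 - p_present)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def f1_score(gold, predicted, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, predicted))
    fp = sum(g != positive and p == positive for g, p in zip(gold, predicted))
    fn = sum(g == positive and p != positive for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy training data mixing emoticons, punctuation, and lexicon words (assumed data).
training_docs = [
    ({":)", "good", "!"}, "pos"),
    ({"good", "clean"}, "pos"),
    ({":(", "bad", "!"}, "neg"),
    ({"bad", "dirty"}, "neg"),
]
model = train(training_docs)
label = predict(model, {"good", ":)"})
```

Representing each post as a set of features mirrors the "BOOL" weighting in the abstract: a feature counts once per post no matter how often it occurs.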
This article is indexed in CNKI, Wanfang Data, and other databases.