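Research on Chinese Microblog Sentiment Classification Based on Diverse Features
Cite this article:ZHANG Zhilin, ZONG Chengqing. Research on Chinese microblog sentiment classification based on diverse features[J]. Journal of Chinese Information Processing (中文信息学报), 2015, 29(4): 134-143.
Authors:ZHANG Zhilin  ZONG Chengqing
Affiliation:National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190
Abstract:With the rise of Web 2.0, the microblog, a new information-sharing platform, has become an important source and channel of information in people's lives. In recent years, sentiment classification of microblogs has attracted growing attention. This paper analyzes in depth how traditional sentiment text classification and microblog sentiment classification differ in feature representation and feature selection. To address the shortcomings of current microblog sentiment classification in how features are selected and used, it proposes three simple but very effective methods for selecting and adding features: lexicalized topic features, sentiment word content features, and probabilistic sentiment word polarity features. Experimental results show that, with the proposed feature selection and feature addition methods, microblog sentiment classification accuracy rises from 73.17% for the traditional method to 84.17%, a significant improvement in microblog sentiment analysis performance.

Keywords:Chinese microblog  sentiment classification  machine learning  feature selection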

Sentiment Analysis of Chinese Micro Blog Based on Rich-features
ZHANG Zhilin,ZONG Chengqing.Sentiment Analysis of Chinese Micro Blog Based on Rich-features[J].Journal of Chinese Information Processing,2015,29(4):134-143.
Authors:ZHANG Zhilin  ZONG Chengqing
Affiliation:National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Abstract:With the rise of Web 2.0, the micro blog, a new information-sharing platform, now plays an important role in people's daily lives, and micro blog sentiment analysis has attracted growing attention in recent years. This paper provides an in-depth analysis of how traditional sentiment classification and micro blog sentiment analysis differ in feature representation and feature selection. To address the drawbacks of feature selection in existing methods, we propose three simple but effective approaches to feature representation and selection: the lexicalized hashtag feature, the sentiment word feature, and the probabilistic sentiment lexicon feature. Experimental results show that the proposed methods raise micro blog sentiment classification accuracy from 73.17% to 84.17%, significantly outperforming the state-of-the-art method.
Keywords:Chinese micro blog  sentiment analysis  machine learning  feature selection  
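
The abstract names three feature types but does not give their definitions, so the following is only a minimal sketch of how such features might be extracted, not the authors' implementation. It assumes Weibo-style hashtags written as `#topic#`, a pre-tokenized post, a sentiment lexicon given as a Python `set`, and a polarity estimate P(positive | word) computed from labeled posts with add-alpha smoothing. All function names, feature names, and the smoothing choice are hypothetical.

```python
import re
from collections import Counter

# Assumption for this sketch: hashtags are delimited as #topic#.
HASHTAG_RE = re.compile(r"#([^#]+)#")


def hashtag_features(text):
    """Lexicalized hashtag features: one binary feature per hashtag string."""
    return {f"hashtag={tag.strip()}": 1.0 for tag in HASHTAG_RE.findall(text)}


def sentiment_word_features(tokens, lexicon):
    """Sentiment word features: one binary feature per token found in the lexicon."""
    return {f"senti={tok}": 1.0 for tok in tokens if tok in lexicon}


def estimate_polarity(labeled_docs, lexicon, alpha=1.0):
    """Estimate P(positive | word) for each lexicon word from (tokens, label) pairs.

    Uses document-level counts with add-alpha smoothing (an assumption here).
    """
    pos, total = Counter(), Counter()
    for tokens, label in labeled_docs:
        for tok in set(tokens) & lexicon:
            total[tok] += 1
            if label == "pos":
                pos[tok] += 1
    return {w: (pos[w] + alpha) / (total[w] + 2 * alpha) for w in lexicon}


def polarity_features(tokens, polarity):
    """Probabilistic polarity features: mean P(positive | word) over sentiment words."""
    scores = [polarity[t] for t in tokens if t in polarity]
    if not scores:
        return {}
    p_pos = sum(scores) / len(scores)
    return {"p_pos": p_pos, "p_neg": 1.0 - p_pos}


def extract_features(text, tokens, lexicon, polarity):
    """Merge the three feature groups into one sparse feature dictionary."""
    feats = {}
    feats.update(hashtag_features(text))
    feats.update(sentiment_word_features(tokens, lexicon))
    feats.update(polarity_features(tokens, polarity))
    return feats
```

The resulting dictionaries could be fed to any classifier that accepts sparse features; which classifier the paper uses is not stated in the abstract.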