
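Micro-blog Sentimental Feature Selection Based on Bigram Collocation (基于二元搭配词的微博情感特征选择)
Cite this article: ZHOU Jian-feng, YANG Ai-min, ZHOU Yong-mei, WANG Xuan-xuan. Micro-blog Sentimental Feature Selection Based on Bigram Collocation [J]. Computer Engineering (计算机工程), 2014(6): 162-165.
Authors: ZHOU Jian-feng (周剑峰)  YANG Ai-min (阳爱民)  ZHOU Yong-mei (周咏梅)  WANG Xuan-xuan (王璇璇)
Affiliations: [1] Library, Guangdong University of Foreign Studies, Guangzhou 510006; [2] Cisco School of Informatics, Guangdong University of Foreign Studies, Guangzhou 510006; [3] Faculty of European Languages & Cultures, Guangdong University of Foreign Studies, Guangzhou 510006
Funding: National Social Science Fund of China (12BYY045); Ministry of Education Humanities and Social Sciences Research Youth Fund (10YJCZH247); Ministry of Education Humanities and Social Sciences Fund, General Project (09YJCZH019); Ministry of Education Program for New Century Excellent Talents (NCET-12-0939); Guangdong Province Science and Technology Plan (2010B031000014); Guangdong University of Foreign Studies university-level fund (12Q22); Guangdong University of Foreign Studies Graduate Research Innovation Fund.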
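Abstract: Analyzing and monitoring the sentiment expressed in micro-blog texts helps mine user behavior and informs the supervision of public opinion on micro-blogs. Micro-blog texts, however, are short and irregular and contain many variant word forms and newly coined words, so classifiers that use sentiment words as their only features achieve low accuracy and fall short of practical needs. To address this, a bigram collocation lexicon is built from a micro-blog corpus, and PMI-IR-P, a method for computing the sentiment weight of a collocation, is proposed by combining the PMI-IR algorithm with corpus statistics. Together with a sentiment dictionary, statistical methods are used to generate sentiment feature vectors for micro-blog posts, and the C4.5 machine-learning algorithm is used to build a classification model that classifies the sentiment polarity of micro-blog texts. Separate data sets are used to build the collocation lexicon and the classification model, and the approach is compared with a sentiment-dictionary-based classification method and with naive Bayes classification. The experimental results show that the proposed sentiment features, classified with C4.5, reach an accuracy of 87% on micro-blog sentiment classification, which is a good result.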

Keywords: collocation lexicon  micro-blog sentiment features  micro-blog sentiment classification  machine learning  C4.5 algorithm
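
The abstract does not state the PMI-IR-P formula, so the Python sketch below is not the authors' method. It only illustrates the general PMI-based polarity score that PMI-IR-style weighting builds on: a collocation scores positive when it co-occurs with a positive seed word more often than chance would predict, and negative in the opposite case. The function names `hits` and `so_pmi`, the seed words, the smoothing constant `eps`, and the toy corpus are all assumptions made for illustration.

```python
import math


def hits(posts, *required):
    """Count posts (sets of tokens) that contain every required word."""
    return sum(1 for post in posts if all(word in post for word in required))


def so_pmi(posts, collocation, pos_seed, neg_seed, eps=0.01):
    """PMI-style polarity of a two-word collocation.

    Computes PMI(collocation, pos_seed) - PMI(collocation, neg_seed),
    which simplifies to a single log ratio of co-occurrence counts.
    eps is a small smoothing constant (an assumption) that avoids
    division by zero when a count is 0.
    """
    w1, w2 = collocation
    with_pos = hits(posts, w1, w2, pos_seed) + eps
    with_neg = hits(posts, w1, w2, neg_seed) + eps
    pos_total = hits(posts, pos_seed) + eps
    neg_total = hits(posts, neg_seed) + eps
    return math.log2((with_pos * neg_total) / (with_neg * pos_total))


# Toy corpus: each post is a set of tokens (hypothetical data).
posts = [
    {"service", "slow", "bad"},
    {"service", "slow", "bad", "queue"},
    {"service", "fast", "good"},
    {"food", "tasty", "good"},
    {"food", "cold", "bad"},
]

slow_score = so_pmi(posts, ("service", "slow"), "good", "bad")
fast_score = so_pmi(posts, ("service", "fast"), "good", "bad")
print(slow_score < 0, fast_score > 0)
```

In this sketch a collocation counts as present when both of its words appear anywhere in a post; a real lexicon would likely apply a stricter definition, such as adjacency or a window.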

Micro-blog Sentimental Feature Selection Based on Bigram Collocation
ZHOU Jian-feng,YANG Ai-min,ZHOU Yong-mei,WANG Xuan-xuan.Micro-blog Sentimental Feature Selection Based on Bigram Collocation[J].Computer Engineering,2014(6):162-165.
Authors:ZHOU Jian-feng  YANG Ai-min  ZHOU Yong-mei  WANG Xuan-xuan
Affiliation: 1. Library; 2. Cisco School of Informatics; 3. Faculty of European Languages & Cultures (Guangdong University of Foreign Studies, Guangzhou 510006, China)
Abstract: Analyzing and monitoring the sentiment information in micro-blog texts helps mine user behavior and provides a reference for supervising public opinion on micro-blogs. However, micro-blog texts are short and non-standard and contain many variant word forms and new words, so classifying them with sentiment words as the only features yields poor accuracy and does not meet practical demands. Therefore, a bigram collocation lexicon is constructed from a micro-blog corpus, and the PMI-IR-P algorithm, which builds on the PMI-IR algorithm, is proposed to calculate the sentiment weight of each collocation. Combined with a sentiment dictionary, a statistical method is used to generate micro-blog sentiment feature vectors. The C4.5 algorithm is then used to build classification models that classify the sentiment polarity of micro-blog texts. In the experiments, different data sets are used to construct the collocation lexicon and the classification models, and the results are compared with those of a method based on a sentiment dictionary and rules and of the Naive Bayes method. Experimental results show that, with the C4.5 algorithm, the proposed features reach an accuracy of 87% in micro-blog sentiment classification.
Keywords: collocation dictionary  micro-blog sentimental feature  micro-blog sentimental classification  machine learning  C4.5 algorithm
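
The abstract describes turning each post into a sentiment feature vector and classifying it with C4.5, but it does not list the actual features. The Python sketch below assumes three hypothetical features (positive-word count, negative-word count, and the sum of collocation weights over adjacent word pairs) and then computes the gain ratio, the criterion C4.5 uses to choose splits, for one threshold on one feature. The names `features` and `gain_ratio` and all data are illustrative assumptions, and this is not a full C4.5 implementation.

```python
import math
from collections import Counter


def features(tokens, pos_words, neg_words, collocation_weights):
    """Build a small feature vector for one tokenized post (assumed design)."""
    pos = sum(t in pos_words for t in tokens)
    neg = sum(t in neg_words for t in tokens)
    # Adjacent word pairs stand in for bigram collocations here.
    bigrams = zip(tokens, tokens[1:])
    colloc = sum(collocation_weights.get(b, 0.0) for b in bigrams)
    return [pos, neg, colloc]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    total = len(labels)
    weights = [len(left) / total, len(right) / total]
    info_gain = (entropy(labels)
                 - weights[0] * entropy(left)
                 - weights[1] * entropy(right))
    split_info = -sum(w * math.log2(w) for w in weights)
    return info_gain / split_info


vector = features(["service", "slow", "bad"],
                  pos_words={"good"}, neg_words={"bad"},
                  collocation_weights={("service", "slow"): -1.5})

# Collocation-weight feature for four hypothetical labeled posts.
colloc_values = [-2.0, -1.0, 1.5, 2.0]
labels = ["neg", "neg", "pos", "pos"]
ratio = gain_ratio(colloc_values, labels, threshold=0.0)
print(vector, ratio)
```

A full C4.5 learner would evaluate candidate thresholds for every feature, split on the one with the best gain ratio, recurse on each branch, and then prune the tree.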
This article is indexed in CNKI, VIP (维普), and other databases.