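Automatic identification of new sentiment words in microblogs based on word relatedness
Cite this article:CHEN Xin, WANG Suge, LIAO Jian. Automatic identification of new sentiment words in microblogs based on word relatedness[J]. Journal of Computer Applications (计算机应用), 2016, 36(2): 424-427.
Authors:CHEN Xin (陈鑫)  WANG Suge (王素格)  LIAO Jian (廖健)
Affiliations:1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006; 2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education (Shanxi University), Taiyuan 030006
Funding:National High Technology Research and Development Program of China (863 Program) (2015AA015407); National Natural Science Foundation of China (61175067, 61272095, 61432011, 61573231, U1435212); Shanxi Province Science and Technology Infrastructure Platform Program (2015091001-0102); Shanxi Province Research Project for Returned Overseas Scholars (2013-014).
Abstract:To address the identification of new sentiment words in microblogs, a method for automatically identifying new microblog sentiment words based on word relatedness is proposed. First, because word segmentation software often splits a new word into several words, adjacent words are merged to form candidate new words. Second, to make full use of the semantic information in a word's context, a neural network is trained on the corpus to obtain vector representations of the candidate new words. Finally, guided by an existing sentiment lexicon, a lexicon-set-based association ranking algorithm and a maximum-association ranking algorithm are fused to filter the candidates and obtain the final new sentiment words. On the Task 3 corpus of COAE2014 (the Sixth Chinese Opinion Analysis Evaluation), the proposed fusion algorithm improves precision by at least 22% over Pointwise Mutual Information (PMI), Enhanced Mutual Information (EMI), Multi-word Expression Distance (MED), New Word Probability (NWP), and a word-vector-based new word identification method, indicating that it identifies new microblog sentiment words more effectively than the other five methods.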

Keywords:sentiment word identification  word relatedness  word vector  ranking algorithm  microblog
Received:2015-08-29
Revised:2015-09-13

Automatic identification of new sentiment word about microblog based on word association
CHEN Xin, WANG Suge, LIAO Jian. Automatic identification of new sentiment word about microblog based on word association[J]. Journal of Computer Applications, 2016, 36(2): 424-427.
Authors:CHEN Xin  WANG Suge  LIAO Jian
Affiliation:1. School of Computer and Information Technology, Shanxi University, Taiyuan Shanxi 030006, China;2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education(Shanxi University), Taiyuan Shanxi 030006, China
Abstract:To identify new sentiment words in microblogs, an automatic identification method based on word association is proposed. First, because a Chinese word segmentation system may incorrectly split a new word into several words, adjacent words are merged to form candidate words. Second, to make full use of the semantic information in word context, vector representations of the candidate words are obtained by training a neural network. Finally, guided by an existing sentiment lexicon, the association-sort algorithm based on a vocabulary list and the max association-sort algorithm are combined to select the final new sentiment words from the candidates. Experimental results on Task 3 of COAE2014 show that the precision of the proposed method is at least 22% higher than that of Pointwise Mutual Information (PMI), Enhanced Mutual Information (EMI), Normalized Multi-word Expression Distance (NMED), New Word Probability (NWP), and a word-embedding-based method for identifying new sentiment words, which demonstrates the effectiveness of the proposed method.
Keywords:sentiment word recognition  word association  word vector  sort algorithm  microblog
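
The pipeline summarized in the abstract (merge adjacent segments into candidates, represent candidates as vectors, rank them against a sentiment lexicon) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the toy two-dimensional vectors, the use of cosine similarity as the association measure, the mean over the lexicon as the "set-based" score, and the weighted sum with `alpha` as the fusion rule are all assumptions made for illustration; the abstract does not specify these details. In practice the vectors would come from a neural model trained on the corpus.

```python
import math

def adjacent_candidates(tokens, max_len=2):
    """Merge runs of 2..max_len adjacent segmented tokens into candidate words."""
    candidates = []
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            candidates.append("".join(tokens[i:i + n]))
    return candidates

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def set_association(vec, lexicon_vecs):
    """Assumed set-based score: mean similarity to every lexicon word."""
    return sum(cosine(vec, s) for s in lexicon_vecs) / len(lexicon_vecs)

def max_association(vec, lexicon_vecs):
    """Assumed max score: similarity to the closest lexicon word."""
    return max(cosine(vec, s) for s in lexicon_vecs)

def rank_candidates(cand_vecs, lexicon_vecs, alpha=0.5):
    """Rank candidates by a weighted sum of the two scores (assumed fusion rule)."""
    scores = {
        word: alpha * set_association(vec, lexicon_vecs)
        + (1 - alpha) * max_association(vec, lexicon_vecs)
        for word, vec in cand_vecs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Toy data (hypothetical): a segmenter has split the neologism "给力" in two.
tokens = ["给", "力", "的", "表现"]
candidates = adjacent_candidates(tokens)

# Hypothetical 2-D vectors standing in for trained word vectors.
lexicon_vecs = [[1.0, 0.0], [0.8, 0.6]]
cand_vecs = {
    "给力": [0.9, 0.1],
    "力的": [0.0, 1.0],
    "的表现": [-1.0, 0.0],
}
ranked = rank_candidates(cand_vecs, lexicon_vecs)
print(candidates)
print(ranked)
```

With these toy vectors, the candidate closest to the lexicon words is ranked first. A cutoff on the ranked list (a top-k or a score threshold, both left out here) would then select the final new sentiment words.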