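Method of micro-blog event tracking based on word vector
Cite this article: ZHANG Jiaming, XI Yaoyi, WANG Bo, TANG Haohao, LI Tiancai. Method of micro-blog event tracking based on word vector[J]. Computer Engineering and Applications, 2016, 52(17): 73-78.
Authors: ZHANG Jiaming  XI Yaoyi  WANG Bo  TANG Haohao  LI Tiancai
Affiliation: Institute of Information and System Engineering, PLA Information Engineering University, Zhengzhou 450001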
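Abstract: Micro-blog posts are short and new internet words appear constantly, so traditional methods perform poorly at micro-blog event tracking. To address this problem, a word-vector-based method for micro-blog event tracking is proposed. Word vectors make it possible to compute the semantic similarity between words and also improve the accuracy of semantic similarity computed between micro-blog posts. The method first trains word vectors with the Skip-gram model on a large-scale dataset; it then extracts keywords to build representation models for the initial event and for each micro-blog post; finally, it uses the word vectors to compute the semantic similarity between each post and the initial event and decides against a preset threshold, which completes the tracking. Experimental results show that, compared with traditional methods, the proposed method makes full use of the semantic information introduced by word vectors and effectively improves micro-blog event tracking performance.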

Keywords: micro-blog  event tracking  short text  Skip-gram model  word vector  semantic information

Method of micro-blog event tracking based on word vector
ZHANG Jiaming, XI Yaoyi, WANG Bo, TANG Haohao, LI Tiancai. Method of micro-blog event tracking based on word vector[J]. Computer Engineering and Applications, 2016, 52(17): 73-78.
Authors:ZHANG Jiaming  XI Yaoyi  WANG Bo  TANG Haohao  LI Tiancai
Affiliation:Institute of Information and System Engineering, PLA Information Engineering University, Zhengzhou 450001, China
Abstract: Traditional methods perform poorly in micro-blog event tracking because micro-blog texts are short and new cyber-words emerge constantly. To solve this problem, a method of micro-blog event tracking based on word vectors is proposed. Word vectors make it possible to compute the semantic similarity between words, and they also improve the accuracy of the semantic similarity computed between micro-blogs. First, a Skip-gram model is trained on a large dataset to obtain the word vectors. Then, representation models for the initial event and for each micro-blog are constructed by extracting keywords. Finally, the semantic similarity between each micro-blog and the initial event is computed from the word vectors, and event tracking is completed by comparing that similarity against a predefined threshold. Experimental results show that the proposed method makes full use of the semantic information carried by word vectors and effectively improves tracking performance compared with traditional methods.
Keywords:micro-blog  event tracking  short text  Skip-gram model  word vector  semantic information  
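
The final step described in the abstract (score a micro-blog against the initial event with word vectors, then compare the score with a threshold) can be illustrated with a minimal sketch. The abstract does not state the similarity formula, so this sketch assumes cosine similarity between word vectors and averages, over the post's keywords, the best match among the event's keywords. The function names, the 0.7 threshold, and the three-dimensional toy vectors are all hypothetical; in the described method the vectors would come from a trained Skip-gram model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def keyword_set_similarity(post_keywords, event_keywords, vectors):
    """Assumed scoring rule: for each post keyword, take its best cosine match
    among the event keywords, then average those best matches.
    Words missing from `vectors` are skipped."""
    post_vecs = [vectors[w] for w in post_keywords if w in vectors]
    event_vecs = [vectors[w] for w in event_keywords if w in vectors]
    if not post_vecs or not event_vecs:
        return 0.0
    best = [max(cosine(p, e) for e in event_vecs) for p in post_vecs]
    return sum(best) / len(best)


def is_related(post_keywords, event_keywords, vectors, threshold=0.7):
    """Threshold decision: True if the post is judged to belong to the event.
    The default threshold is an arbitrary illustrative value."""
    return keyword_set_similarity(post_keywords, event_keywords, vectors) >= threshold


# Toy vectors invented for illustration; not the output of a real Skip-gram model.
TOY_VECTORS = {
    "earthquake": [0.9, 0.1, 0.0],
    "quake": [0.85, 0.15, 0.05],
    "rescue": [0.7, 0.3, 0.1],
    "concert": [0.0, 0.2, 0.9],
    "tickets": [0.1, 0.1, 0.95],
}

event = ["earthquake", "rescue"]
on_topic = ["quake", "rescue"]
off_topic = ["concert", "tickets"]

print(is_related(on_topic, event, TOY_VECTORS))
print(is_related(off_topic, event, TOY_VECTORS))
```

Here "quake" never appears among the event keywords, yet the post still scores high because its vector lies close to "earthquake". This is the kind of semantic matching that exact keyword overlap would miss on short texts.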