
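A microblog hotspot mining method based on semantic clustering of word vectors
Cite this article: LIU Pei-lei, TANG Jin-tao, WANG Ting, XIE Song-xian, YUE Da-peng, LIU Hai-chi. A microblog hotspot mining method based on semantic clustering of word vectors [J]. Computer Engineering & Science, 2018, 40(2): 313-319.
Authors: LIU Pei-lei  TANG Jin-tao  WANG Ting  XIE Song-xian  YUE Da-peng  LIU Hai-chi
Affiliation: (College of Computer, National University of Defense Technology, Changsha, Hunan 410073)
Funding: National Natural Science Foundation of China (61532001, 61472436)
Abstract: With the rapid development of social media, information overload is becoming increasingly severe, so discovering and mining hot topics or hot events from massive, short, and noisy social media data has become an important problem. Drawing on the characteristics of social media data (real-time, geographic, and rich in metadata), a hotspot mining method is proposed that combines user behavior analysis with text content analysis. In the content analysis, clustering is performed at the finer granularity of words, replacing the classical approach of clustering at message granularity. To improve topic keyword extraction, word vector techniques are introduced and hotspots are mined through semantic clustering. Experimental results on a real dataset show that the keywords extracted by the method are strongly semantically related and the topics are well separated, and that the method outperforms traditional hotspot mining methods on the main metrics.

Keywords: hotspot mining  social media  word vectors  semantic clustering
Received: 2016-03-29
Revised: 2018-02-25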

A Twitter hotspot mining method based on semantic clustering of word vectors
LIU Pei-lei, TANG Jin-tao, WANG Ting, XIE Song-xian, YUE Da-peng, LIU Hai-chi. A Twitter hotspot mining method based on semantic clustering of word vectors [J]. Computer Engineering & Science, 2018, 40(2): 313-319.
Authors:LIU Pei-lei  TANG Jin-tao  WANG Ting  XIE Song-xian  YUE Da-peng  LIU Hai-chi
Affiliation:(College of Computer,National University of Defense Technology,Changsha 410073,China)
Abstract: With the rapid development of social media, information overload has become a serious challenge. As a result, automatically mining hotspots from large volumes of short, noisy data is an important problem. Social media data are real-time and geographic, and they usually contain plentiful metadata. Based on these characteristics, this paper proposes a hotspot mining method that combines user behavior analysis with text content analysis. In the content analysis, text is clustered at the word level rather than the message level. In addition, semantic clustering of word vectors is used to improve keyword extraction. Experimental results on real datasets show that the keywords extracted by this method are strongly semantically related and the topics are well separated, and that the method outperforms traditional hotspot mining methods on the main metrics.
Keywords:hotspot mining  Twitter  word embedding  semantic clustering  
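
The idea of clustering at the word level on word vectors can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify; it assumes a simple greedy centroid clustering with a cosine-similarity threshold, and the three-dimensional vectors, the `cluster_words` function, and the `threshold` value are all hypothetical stand-ins chosen for illustration. In practice the vectors would come from a trained word embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cluster_words(word_vectors, threshold=0.8):
    """Greedy single-pass clustering (an assumed stand-in technique).

    Each word joins the existing cluster whose centroid is most similar,
    provided the similarity reaches `threshold`; otherwise it starts a
    new cluster. Returns a list of clusters, each a list of words.
    """
    clusters = []
    for word, vec in word_vectors.items():
        best, best_sim = None, threshold
        for members in clusters:
            sim = cosine(vec, centroid([word_vectors[w] for w in members]))
            if sim >= best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters.append([word])
        else:
            best.append(word)
    return clusters


# Hypothetical toy vectors; real ones would be learned from a corpus.
toy_vectors = {
    "earthquake": [0.9, 0.1, 0.0],
    "aftershock": [0.8, 0.2, 0.1],
    "rescue": [0.7, 0.3, 0.0],
    "concert": [0.1, 0.9, 0.1],
    "singer": [0.0, 0.8, 0.2],
}

topics = cluster_words(toy_vectors)
for i, words in enumerate(topics):
    print(f"topic {i}: {words}")
```

With these toy vectors the sketch groups the three disaster-related words into one cluster and the two music-related words into another, so each cluster can be read as a candidate set of topic keywords. Because the words themselves are clustered, a single message can contribute keywords to more than one topic.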