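Topic Evolution Analysis of News Comments Combining Word Vectors and a Clustering Algorithm
Cite this article: LIN Jiang-hao, ZHOU Yong-mei, YANG Ai-min, WANG Wei. Topic evolution analysis of news comments combining word vectors and a clustering algorithm[J]. Computer Engineering & Science (计算机工程与科学), 2016, 38(11): 2368-2374
Authors: LIN Jiang-hao (林江豪), ZHOU Yong-mei (周咏梅), YANG Ai-min (阳爱民), WANG Wei (王伟)
Affiliations: 1. Laboratory for Language Engineering and Computing, Guangdong University of Foreign Studies; 2. Cisco School of Informatics, Guangdong University of Foreign Studies
Funding: National Social Science Fund of China (12BYY045); Guangdong Province Philosophy and Social Sciences "Twelfth Five-Year" Plan Project (GD15YTS01)
Abstract: Topic evolution analysis mines how the content of a topic evolves over time. The content of a topic can be represented by keywords. word2vec is trained on 750,000 news articles and microblog texts to obtain a word vector model. The processed text stream is fed into the model to obtain the word vectors of all words in each time slice, and K-means is used to cluster the word vectors, which yields the topic keywords. Experiments compare this approach with topic extraction based on the PLSA and LDA topic models and find that it gives better topic analysis results than the topic-model methods. In addition, collecting a sufficiently large and content-rich corpus makes it possible to train a model with fairly strong generalization ability, which benefits research on real-time topic evolution analysis.

Keywords: topic evolution  word2vec  PLSA  LDA
Received: 2016-07-01
Revised: 2016-11-25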

Analysis on topic evolution of news comments by combining word vector and clustering algorithm
LIN Jiang-hao, ZHOU Yong-mei, YANG Ai-min, WANG Wei. Analysis on topic evolution of news comments by combining word vector and clustering algorithm[J]. Computer Engineering & Science, 2016, 38(11): 2368-2374
Authors: LIN Jiang-hao  ZHOU Yong-mei  YANG Ai-min  WANG Wei
Affiliation: (1. Laboratory for Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou 510006; 2. Cisco School of Informatics, Guangdong University of Foreign Studies, Guangzhou 510006, China)
Abstract: Topic evolution analysis mines how the content of a topic changes over time. Based on the hypothesis that topic content can be represented by keywords, this article trains word2vec on 750 thousand news and microblog texts to build a word vector model. The text stream is fed into the model to obtain the word vectors for each time slice. K-means is then used to cluster the word vectors, after which the keywords are extracted and the topic evolution is visualized. A comparison with topic extraction based on the PLSA and LDA topic models shows that the word vector approach is more effective than either of them. In addition, collecting abundant and varied data facilitates training a word vector model with better generalization ability, which supports real-time analysis of topic evolution.
Keywords:topic evolution  word2vec  PLSA  LDA  
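
The pipeline described in the abstract (look up word vectors for the words in each time slice, cluster them with K-means, read keywords off the clusters) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 2-D vectors in `TOY_VECTORS` are hypothetical stand-ins for vectors from a trained word2vec model, the names `kmeans` and `topic_keywords` are invented here, and picking the words closest to each centroid as keywords is an assumption, since the abstract does not say how keywords are chosen within a cluster.

```python
from math import dist

# Hypothetical 2-D "word vectors"; a real run would look these up in a
# trained word2vec model with far more dimensions.
TOY_VECTORS = {
    "earthquake": (0.9, 0.1), "rescue": (0.7, 0.3), "aftershock": (1.0, 0.0),
    "stocks": (0.1, 0.9), "market": (0.3, 0.7), "shares": (0.0, 1.0),
}

def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means with deterministic farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def topic_keywords(words, vectors, k, top_n=2):
    """Cluster one time slice's words; return the top_n words nearest each centroid."""
    vocab = sorted({w for w in words if w in vectors})
    points = [vectors[w] for w in vocab]
    labels, centers = kmeans(points, k)
    topics = []
    for j in range(k):
        members = [w for w, lab in zip(vocab, labels) if lab == j]
        members.sort(key=lambda w: dist(vectors[w], centers[j]))
        topics.append(members[:top_n])
    return topics

# Two hypothetical time slices of an already tokenized comment stream.
slices = {
    "day1": ["earthquake", "rescue", "aftershock", "market"],
    "day2": ["stocks", "market", "shares", "rescue"],
}
evolution = {t: topic_keywords(ws, TOY_VECTORS, k=2) for t, ws in slices.items()}
for t, topics in evolution.items():
    print(t, topics)
```

Comparing the keyword lists of consecutive slices is what exposes the drift of a topic over time. In this toy data the dominant cluster moves from earthquake-related words on `day1` to stock-market words on `day2`.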