结合互信息和主题模型的微博话题发现方法
引用本文:孙曰昕,马慧芳,姚 伟,张志昌.结合互信息和主题模型的微博话题发现方法[J].计算机工程与应用,2016,52(6):61-66.
作者姓名:孙曰昕  马慧芳  姚 伟  张志昌
作者单位:西北师范大学 计算机科学与工程学院,兰州 730070
摘    要:为了解决短文本信息流的特征稀疏性对热点话题发现带来的挑战,提出了结合词语互信息和概率主题模型的微博热点话题发现方法。通过建立词共现矩阵并应用对称非负矩阵分解算法获取词项-主题矩阵,再利用概率潜在语义分析模型进行主题发现,最终通过定义微博热度分析和排序,有效地支持微博热点话题发现。实验表明,此方法能有效地进行话题聚类并检测出热点话题。

关 键 词:词共现矩阵  对称非负矩阵分解  概率潜在语义分析  微博热点话题发现  

Microblog hot topic detection based on positive point mutual information and probabilistic topic model
SUN Yuexin,MA Huifang,YAO Wei,ZHANG Zhichang.Microblog hot topic detection based on positive point mutual information and probabilistic topic model[J].Computer Engineering and Applications,2016,52(6):61-66.
Authors:SUN Yuexin  MA Huifang  YAO Wei  ZHANG Zhichang
Affiliation:College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
Abstract:To address the challenge that the feature sparsity of short text messages poses for microblog hot topic detection, this paper proposes a hot topic detection method that combines term mutual information with a probabilistic topic model. A word co-occurrence matrix weighted by word mutual information is built, and Symmetric Nonnegative Matrix Factorization (sNMF) is applied to it to infer the term-topic matrix. The Probabilistic Latent Semantic Analysis (pLSA) model is then used to model the topic-microblog relationship. Finally, topic hotness is defined, analyzed, and ranked. Experiments show that the method can effectively cluster topics and detect hot topics.
Keywords:term co-occurrence matrix  symmetrical nonnegative matrix factorization  probabilistic latent semantic analysis  micro-blog hot topic detection  
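
The first two steps named in the abstract (a mutual-information-weighted word co-occurrence matrix, then symmetric NMF to obtain a term-topic matrix) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `ppmi_matrix` and `symmetric_nmf` are hypothetical, co-occurrence is counted once per microblog, the weighting is positive pointwise mutual information, and the factorization uses a damped multiplicative update that is one common choice for symmetric NMF. The pLSA and hotness-ranking steps are not shown because the abstract does not specify them.

```python
import numpy as np


def ppmi_matrix(docs):
    """Symmetric word co-occurrence matrix weighted by positive PMI.

    Assumption: two words co-occur if they appear in the same microblog.
    """
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for d in docs:
        words = sorted(set(d))
        for a in words:
            for b in words:
                if a != b:
                    counts[index[a], index[b]] += 1.0
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    # PMI = log( p(a, b) / (p(a) * p(b)) ); zero counts give -inf or nan.
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row @ col))
    pmi[~np.isfinite(pmi)] = 0.0
    # Keep only positive associations.
    return vocab, np.maximum(pmi, 0.0)


def symmetric_nmf(A, k, iters=200, beta=0.5, seed=0, eps=1e-9):
    """Approximate A by H @ H.T with H >= 0 (H has shape n_terms x k).

    Assumption: damped multiplicative update
    H <- H * (1 - beta + beta * (A H) / (H H^T H)).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[0], k))
    for _ in range(iters):
        numer = A @ H
        denom = H @ (H.T @ H) + eps  # eps avoids division by zero
        H *= (1.0 - beta) + beta * numer / denom
    return H


# Toy microblogs, already tokenized (hypothetical data).
docs = [
    ["earthquake", "rescue", "sichuan"],
    ["earthquake", "rescue", "donation"],
    ["football", "match", "goal"],
    ["football", "goal", "league"],
]
vocab, A = ppmi_matrix(docs)
H = symmetric_nmf(A, k=2)

# Print the three highest-weighted terms of each topic column.
for t in range(H.shape[1]):
    top = [vocab[i] for i in np.argsort(-H[:, t])[:3]]
    print(t, top)
```

Each row of `H` gives one term's nonnegative weights over the `k` topics, which is the role the abstract assigns to the term-topic matrix. Factorizing term co-occurrence instead of the term-by-microblog matrix is what lets the method sidestep the sparsity of individual short texts.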