
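Microblog topic mining based on the MB-HL model*
Cite this article:Jiang Quan, Zhen Shanhong, Liu Kai. Microblog topic mining based on the MB-HL model*[J]. Application Research of Computers, 2018, 35(11)
Authors:Jiang Quan  Zhen Shanhong  Liu Kai
Affiliation:Changchun University of Technology (all three authors)
Funding:Supported by the Natural Science Foundation of Jilin Province and the "12th Five-Year Plan" Science and Technology Research Fund of the Jilin Provincial Department of Education
Abstract:Traditional text topic models achieve low accuracy on microblog topic mining and ignore the relations between topics. To address both problems, and taking the characteristics of Chinese microblog corpora into account, this paper analyzes the strengths and weaknesses of the LDA and HMM models and proposes a microblog topic mining model, MB-HL (Microblog-Hidden Markov Model Latent Dirichlet Allocation). The model treats each individual microblog post as the processing unit and builds and optimizes a distributed topic-word matrix. It uses the LDA model to model the different behaviors of microblog users and extract features, relies on the HMM's strong ability to model temporal state sequences to compensate for LDA's weakness in capturing topic correlation, and uses Gibbs sampling for inference. Comparative experiments on real Sina Weibo data show that MB-HL improves the accuracy of topic keywords by nearly 9% and effectively discovers the relations between topics.

Keywords:microblog  topic mining  LDA model  HMM model  MB-HL model  Gibbs sampling
Received:2017-06-10
Revised:2018-09-30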

The study of topic mining for microblog based on MB-HL model
Jiang Quan, Zhen Shanhong and Liu Kai. The study of topic mining for microblog based on MB-HL model[J]. Application Research of Computers, 2018, 35(11)
Authors:Jiang Quan  Zhen Shanhong  Liu Kai
Affiliation:Changchun University of Technology
Abstract:Traditional text topic models achieve low accuracy on microblog topic mining and do not consider the relations between topics. To address these problems, and based on the characteristics of Chinese microblog corpora, the advantages and disadvantages of the LDA and HMM models are analyzed and a microblog topic mining model, MB-HL (Microblog-Hidden Markov Model Latent Dirichlet Allocation), is proposed. The model uses each individual microblog post as the processing unit and builds and optimizes a distributed topic-word matrix. It models the different behaviors of microblog users and extracts features with LDA, uses the temporal state modeling ability of the HMM to compensate for LDA's weakness in capturing topic correlation, and performs inference with Gibbs sampling. Experimental results on a real Sina microblog dataset show that the MB-HL model improves the accuracy of topic keywords by nearly 9% and can effectively find the relations between topics.
Keywords:microblog   topic mining   LDA model   HMM model   MB-HL model   Gibbs sampling
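
The abstract names Gibbs sampling as the inference method and treats each microblog post as one processing unit. As a rough illustration only, the sketch below runs collapsed Gibbs sampling for plain LDA with one post per document. It is not the MB-HL model: the HMM component and the user-behavior modeling are not reproduced, and the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy word ids are all assumptions made for this example.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (an illustration, not MB-HL).

    docs: list of posts, each a list of integer word ids in [0, vocab_size).
    Returns (n_dk, n_kw): per-post topic counts and per-topic word counts.
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]               # topic counts per post
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # word counts per topic
    n_k = [0] * num_topics                                # total tokens per topic
    z = []                                                # topic of every token

    # Assign every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Draw a new topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_dk, n_kw


# Toy corpus: four short "posts" over a six-word vocabulary (hypothetical data).
posts = [[0, 1, 2, 0], [3, 4, 5, 3], [0, 2, 1], [4, 5, 3]]
n_dk, n_kw = lda_gibbs(posts, num_topics=2, vocab_size=6)
```

The rows of `n_kw` can be ranked to list the most frequent words per topic, which is the kind of topic keyword output the abstract evaluates. Relations between topics, which the abstract attributes to the HMM part of MB-HL, are outside what this sketch covers.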
