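On-line forum hot topic mining method based on topic cluster evaluation
Citation: JIANG Hao, CHEN Xingshu, DU Min. On-line forum hot topic mining method based on topic cluster evaluation[J]. Journal of Computer Applications, 2013, 33(11): 3071-3075
Authors: JIANG Hao, CHEN Xingshu, DU Min
Affiliation: School of Computer Science, Sichuan University, Chengdu 610065
Funding: Project of the National Key Technology R&D Program of China
Abstract: Hot topic mining is an important technical foundation for public opinion monitoring. Existing forum hot topic mining methods do not deal with the large amount of word noise in the data, and they rely on a single way of evaluating topic popularity. To address these problems, a hot topic mining method based on topic cluster evaluation is proposed. The forum text is modeled with the Latent Dirichlet Allocation (LDA) topic model; topic noise is then removed from the documents mapped into topic space, and the documents are clustered with K-means++, a K-means variant with optimized selection of the initial cluster centers. Finally, each cluster is evaluated in three respects: topic burstiness, topic purity and cluster attention. Experimental analysis shows that setting the topic noise threshold to 0.75 and the number of cluster centers to 50 yields the best clustering quality and clustering speed. Tests on a real data set show that the method effectively ranks clusters by their likelihood of containing a hot topic. A method for presenting hot topics is also designed.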

Keywords: Latent Dirichlet Allocation; topic model; K-means clustering; cluster evaluation; hot topic
Received: 2013-05-08
Revised: 2013-07-14
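The clustering stage described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it takes the per-document topic distributions already inferred by LDA as input (the LDA step itself is not reproduced), and it reads the "topic noise threshold" of 0.75 as a cumulative-probability cutoff, keeping each document's most probable topics until they cover 75% of its probability mass and zeroing the rest. The abstract does not define the threshold, so that reading is an assumption. The functions `filter_topic_noise`, `kmeans_pp_init` and `kmeans` are hypothetical names; the seeding follows standard K-means++ (each new center drawn with probability proportional to its squared distance from the nearest chosen center), followed by ordinary Lloyd iterations.

```python
import random


def filter_topic_noise(theta, mass=0.75):
    """ASSUMED reading of "topic noise removal": per document, keep the most
    probable topics until their cumulative probability reaches `mass`,
    zero the remaining tail, and renormalize."""
    filtered = []
    for row in theta:
        order = sorted(range(len(row)), key=lambda t: row[t], reverse=True)
        kept, cum = set(), 0.0
        for t in order:
            kept.add(t)
            cum += row[t]
            if cum >= mass:
                break
        total = sum(row[t] for t in kept)
        filtered.append([row[t] / total if t in kept else 0.0 for t in range(len(row))])
    return filtered


def sq_dist(a, b):
    # Squared Euclidean distance between two topic vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: first center uniform, each further center sampled
    with probability proportional to squared distance to the nearest center."""
    centers = [list(rng.choice(points))]
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0.0:  # all points coincide with existing centers
            idx = rng.randrange(len(points))
        else:
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(list(points[idx]))
        d2 = [min(d, sq_dist(p, centers[-1])) for d, p in zip(d2, points)]
    return centers


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd iterations started from K-means++ centers; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = []
    for _ in range(max_iter):
        # Assign each document to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Recompute the center as the mean; keep the old one if the cluster is empty.
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    # Toy topic distributions: 4 documents over 3 topics.
    theta = [[0.9, 0.05, 0.05], [0.85, 0.1, 0.05],
             [0.05, 0.05, 0.9], [0.1, 0.05, 0.85]]
    labels, _ = kmeans(filter_topic_noise(theta), k=2)
    print(labels)  # documents 0 and 1 share one label, 2 and 3 the other
```

The toy call uses k = 2 only to keep the example small; the abstract reports k = 50 on the real forum data.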

On-line forum hot topic mining method based on topic cluster evaluation
JIANG Hao, CHEN Xingshu, DU Min. On-line forum hot topic mining method based on topic cluster evaluation[J]. Journal of Computer Applications, 2013, 33(11): 3071-3075
Authors: JIANG Hao, CHEN Xingshu, DU Min
Affiliation: School of Computer Science, Sichuan University, Chengdu Sichuan 610065, China
Abstract: Hot topic mining is an important technical foundation for monitoring public opinion. Current hot topic mining methods neither handle the heavy word noise in the data nor evaluate topic popularity in more than one way, so a new mining method based on topic cluster evaluation was proposed. The forum data were modeled with the Latent Dirichlet Allocation (LDA) topic model, topic noise was removed, and the data were then clustered with K-means++, a K-means variant with improved cluster center selection. Finally, the clusters were evaluated in three aspects: topic burstiness, topic purity and attention degree. The experimental results show that setting the topic noise threshold to 0.75 and the number of clusters to 50 yields the best cluster quality and clustering speed. Tests on real data sets also show that the method effectively ranks clusters by their probability of containing a hot topic. A method for displaying hot topics was also developed.
Keywords: Latent Dirichlet Allocation (LDA); topic model; K-means clustering; cluster evaluation; hot topic
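The abstract names three cluster-level criteria (topic burstiness, topic purity and cluster attention) but gives no formulas for them. The sketch below is purely illustrative and is not the paper's scoring: the `Post` record and its fields, the window lengths, the 0.1 weight on views, and combining the three criteria by multiplication are all hypothetical choices. It only shows how clusters could be ranked by their likelihood of holding a hot topic once each criterion is made computable.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical post record; field names are illustrative assumptions.
    day: int        # publication day index
    replies: int    # number of replies
    views: int      # number of views
    top_topic: int  # most probable LDA topic of the post


def burstiness(posts, current_day, window=1, history=7):
    """Hypothetical burstiness: daily post rate in the last `window` days
    divided by the smoothed daily rate over the preceding `history` days."""
    recent = sum(1 for p in posts if current_day - window < p.day <= current_day)
    past = sum(1 for p in posts
               if current_day - window - history < p.day <= current_day - window)
    return (recent / window) / (past / history + 1.0)


def purity(posts):
    """Hypothetical purity: share of posts whose dominant topic equals the
    cluster's most common dominant topic (cluster assumed non-empty)."""
    counts = Counter(p.top_topic for p in posts)
    return counts.most_common(1)[0][1] / len(posts)


def attention(posts):
    """Hypothetical attention: log-scaled total engagement, where one view
    counts as 0.1 of a reply."""
    return math.log1p(sum(p.replies + 0.1 * p.views for p in posts))


def hot_score(posts, current_day):
    # Hypothetical combination of the three criteria: a plain product.
    return burstiness(posts, current_day) * purity(posts) * attention(posts)


def rank_clusters(clusters, current_day):
    """Return cluster ids sorted from most to least likely to hold a hot topic."""
    return sorted(clusters, key=lambda cid: hot_score(clusters[cid], current_day),
                  reverse=True)


if __name__ == "__main__":
    clusters = {
        # Older, topically mixed, little engagement.
        "quiet": [Post(day=d, replies=1, views=10, top_topic=t)
                  for d, t in zip(range(3, 8), [0, 1, 0, 2, 1])],
        # Sudden, single-topic, heavily discussed.
        "bursty": [Post(day=10, replies=20, views=200, top_topic=3) for _ in range(5)],
    }
    print(rank_clusters(clusters, current_day=10))  # prints ['bursty', 'quiet']
```

A product makes a cluster that scores zero on any one criterion drop to the bottom; a weighted sum would be an equally plausible (and equally hypothetical) alternative.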