
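Microblog Topic Detection Method Based on Clustering Ensemble
Cite this article: FENG Xupeng, MA Zhen, XIE Bo, LIU Lijun, HUANG Qingsong. Microblog topic detection method based on clustering ensemble[J]. Computer Engineering and Applications, 2017, 53(8): 81-86. DOI: 10.3778/j.issn.1002-8331.1511-0156
Authors: FENG Xupeng, MA Zhen, XIE Bo, LIU Lijun, HUANG Qingsong
Affiliations: 1. Educational Technology and Campus Network Center, Kunming University of Science and Technology, Kunming 650500; 2. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500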
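Abstract: The short length of microblog posts, their irregular wording, and the large amount of noise they contain prevent traditional topic detection methods from reliably extracting new topics. To address these characteristics and the dynamic nature of topics, this paper proposes a microblog topic detection method based on clustering ensemble. The method takes into account a nonlinear time factor for when each post is published, uses an improved K-Means method to build a separate base clusterer from each microblog feature, and evaluates the effectiveness and diversity of the base clusterers to set the voting weights for the final clustering ensemble. Comparative experiments show that the method improves topic detection accuracy on microblogs by about 9.5% and detects new topics more effectively.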
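The abstract does not state the form of the nonlinear time factor or exactly how the improved K-Means uses it. The sketch below is one possible reading, not the authors' algorithm: it assumes an exponential half-life decay (the hypothetical `time_decay`, with an arbitrary 24-hour half-life) and lets that decay scale each post's pull on its cluster centroid in a weighted K-Means (the hypothetical `weighted_kmeans`). The two-dimensional vectors stand in for real post feature vectors.

```python
def time_decay(age_hours, half_life_hours=24.0):
    """Hypothetical nonlinear time factor: exponential decay that is 1.0 for a
    brand-new post and halves every `half_life_hours` (24 h is an arbitrary choice)."""
    return 0.5 ** (age_hours / half_life_hours)


def weighted_kmeans(points, weights, k, iters=20):
    """K-Means in which each point's pull on its centroid is scaled by its weight.

    Centroids start at the first k points (deterministic, for illustration only).
    Returns (labels, centroids).
    """
    dims = len(points[0])
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: weight-averaged position of each cluster's members.
        for c in range(k):
            members = [(p, w) for p, w, lab in zip(points, weights, labels) if lab == c]
            total = sum(w for _, w in members)
            if total > 0:  # leave an empty cluster's centroid where it was
                centroids[c] = [sum(w * p[d] for p, w in members) / total for d in range(dims)]
    return labels, centroids


# Toy 2-D "feature vectors" for four posts and their ages in hours (made-up data).
points = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
ages = [1, 2, 30, 48]
weights = [time_decay(a) for a in ages]
labels, cents = weighted_kmeans(points, weights, k=2)
print(labels)  # [0, 1, 0, 1]
```

Because the fresher post at `[1.0, 0.0]` carries more weight than the 30-hour-old one, the first centroid ends up closer to it than the unweighted mean (x = 0.95) would be, which is the intended effect of down-weighting stale posts.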

Keywords: short text  noise  topic detection  dynamics  nonlinear time  base clusterer  clustering ensemble

Microblog topic detection method based on clustering ensemble
FENG Xupeng, MA Zhen, XIE Bo, LIU Lijun, HUANG Qingsong. Microblog topic detection method based on clustering ensemble[J]. Computer Engineering and Applications, 2017, 53(8): 81-86. DOI: 10.3778/j.issn.1002-8331.1511-0156
Authors: FENG Xupeng  MA Zhen  XIE Bo  LIU Lijun  HUANG Qingsong
Affiliation: 1. Educational Technology and Campus Network Center, Kunming University of Science and Technology, Kunming 650500, China; 2. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
Abstract: The short length, irregular wording, and large amount of noise in microblog posts prevent traditional topic detection methods from finding new topics, and these methods do not consider the time at which a microblog is posted. This paper proposes a microblog topic detection method based on clustering ensemble that accounts for the characteristics of microblogs and the dynamic nature of topics. The method considers a nonlinear time factor for each post, uses an improved K-Means method to construct a base clusterer from each microblog feature, and evaluates the effectiveness of each base clusterer and the diversity among them to set the ensemble voting weights; the resulting clustering ensemble is then used for microblog topic detection. Experimental results show that the proposed method improves microblog topic detection accuracy by about 9.5% and detects new topics more effectively.
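The page does not specify the effectiveness measure, the diversity measure, or the voting rule. As a minimal sketch of one common weighted consensus scheme, under stated assumptions: each base clusterer's weight is a supplied effectiveness score (a hypothetical input, e.g. from an internal validity index) multiplied by its diversity, taken here as one minus its mean Rand index against the other clusterers; the weighted co-association vote for each pair of posts is then thresholded at 0.5, and posts linked by a majority vote are merged into one topic. The functions `rand_index`, `ensemble_weights`, and `consensus` are illustrative names, not the paper's.

```python
from itertools import combinations


def rand_index(a, b):
    """Fraction of object pairs on which two labelings agree (same vs. different cluster).
    Requires at least two objects."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def ensemble_weights(labelings, effectiveness):
    """Hypothetical weighting: effectiveness x diversity, normalized to sum to 1.
    Requires at least two base clusterers."""
    m = len(labelings)
    raw = []
    for k in range(m):
        sims = [rand_index(labelings[k], labelings[o]) for o in range(m) if o != k]
        diversity = 1.0 - sum(sims) / len(sims)
        raw.append(effectiveness[k] * diversity)
    total = sum(raw)
    if total == 0:  # all clusterers identical: fall back to equal votes
        return [1.0 / m] * m
    return [r / total for r in raw]


def consensus(labelings, weights, threshold=0.5):
    """Merge two objects when the weighted share of clusterers grouping them exceeds threshold."""
    n = len(labelings[0])
    parent = list(range(n))

    def find(x):  # union-find root with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        vote = sum(w for w, lab in zip(weights, labelings) if lab[i] == lab[j])
        if vote > threshold:
            parent[find(i)] = find(j)
    # Relabel connected components as 0..K-1 in order of first appearance.
    ids, out = {}, []
    for i in range(n):
        out.append(ids.setdefault(find(i), len(ids)))
    return out


labelings = [
    [0, 0, 0, 1, 1, 1],  # e.g. a content-based base clusterer (hypothetical)
    [0, 0, 1, 1, 1, 1],  # e.g. an interaction-based base clusterer (hypothetical)
    [1, 1, 1, 0, 0, 0],  # same partition as the first, different label ids
]
effectiveness = [0.8, 0.6, 0.7]  # made-up scores for illustration
weights = ensemble_weights(labelings, effectiveness)
topics = consensus(labelings, weights)
print(topics)  # [0, 0, 0, 1, 1, 1]
```

Working on pairwise co-association sidesteps the label-correspondence problem: the first and third clusterers use swapped label ids but are recognized as agreeing, while the more diverse second clusterer receives the largest single vote without overriding the other two.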
Keywords:short text  noise  topic detection  dynamic  nonlinear time factor  base cluster  clustering ensemble  