
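Burst Topic Detection for Large-Scale Microblog Message Streams
Cite this article:Shen Guowei, Yang Wu, Wang Wei, Yu Miao. Burst Topic Detection Oriented Large-Scale Microblogs Streams[J]. Journal of Computer Research and Development, 2015, 52(2): 512-521. DOI: 10.7544/issn1000-1239.2015.20131336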
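Authors:Shen Guowei  Yang Wu  Wang Wei  Yu Miao
Affiliation:1.(Research Center of Information Security, Harbin Engineering University, Harbin 150001) (shenguowei@hrbeu.edu.cn)
Funding:National High Technology Research and Development Program of China (863 Program); National Natural Science Foundation of China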
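Abstract:Emergent events spread rapidly through microblogs and exert enormous influence, so bursts of public opinion draw wide attention from governments and enterprises. Existing burst topic detection algorithms consider only a single type of feature entity and cannot handle bursts triggered by new words, pictures, links, and similar content in microblogs. For large-scale microblog message streams, this paper proposes a real-time burst topic detection framework that requires no Chinese word segmentation. The model dynamically adjusts the window size according to the message stream and measures the burst weight of each entity by its spread influence. A high-order co-clustering algorithm clusters entities, messages, and users simultaneously, so that detecting a burst topic also yields the topic's related messages and participating users. Comparative experiments show that the algorithm is highly accurate and detects burst topics earlier.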

Keywords:burst topic detection  microblog  co-clustering  influence  large scale

Burst Topic Detection Oriented Large-Scale Microblogs Streams
Shen Guowei, Yang Wu, Wang Wei, Yu Miao. Burst Topic Detection Oriented Large-Scale Microblogs Streams[J]. Journal of Computer Research and Development, 2015, 52(2): 512-521. DOI: 10.7544/issn1000-1239.2015.20131336
Authors:Shen Guowei  Yang Wu  Wang Wei  Yu Miao
Affiliation:1.(Research Center of Information Security, Harbin Engineering University, Harbin 150001)
Abstract:In microblogs, emergent events spread quickly and exert tremendous influence, so bursts of public opinion draw wide attention from governments and enterprises. Existing burst topic detection methods consider only one type of entity, such as words or tags. However, Chinese microblogs contain not only new or colloquial words but also pictures and links, whose burst patterns are difficult to detect. To tackle this problem, we propose a real-time burst topic detection framework for multi-type entities. Unlike existing methods, our method does not require Chinese word segmentation; instead, it generates new words as the last step. In this framework, the window size is adjusted dynamically based on the microblog stream. To measure the burst weight of an entity, its spread influence is calculated. Moreover, a high-order co-clustering algorithm based on non-negative matrix decomposition is used to cluster two types of entities, messages, and users simultaneously. While detecting burst topics, we also obtain the related messages and participating users, which can be used to analyze the cause of a burst topic. Experiments on a large Sina Weibo dataset show that our algorithm achieves higher accuracy and detects burst topics earlier than existing algorithms.
Keywords:burst topic detection  microblogs  co-clustering  influence  large scale
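
The abstract mentions a window whose size follows the message stream and a burst weight derived from spread influence, but gives no formulas. The following is only a minimal sketch of how those two ideas could fit together, not the authors' method. Everything in it is an assumption made for illustration: the function names, the `max_msgs` and `max_span` limits, the use of `1 + reposts` as an influence proxy, and the z-score style burst weight.

```python
from collections import defaultdict
from statistics import mean, pstdev


def split_windows(messages, max_msgs=3, max_span=60.0):
    """Cut a time-ordered stream into windows that close after max_msgs
    messages or max_span seconds, whichever limit is reached first.
    Busy periods therefore produce windows covering less time."""
    windows, current = [], []
    for msg in messages:
        too_full = len(current) >= max_msgs
        too_long = bool(current) and msg["t"] - current[0]["t"] >= max_span
        if too_full or too_long:
            windows.append(current)
            current = []
        current.append(msg)
    if current:
        windows.append(current)
    return windows


def entity_scores(window):
    """Sum a hypothetical influence proxy (1 + reposts) per entity."""
    scores = defaultdict(float)
    for msg in window:
        for entity in set(msg["entities"]):
            scores[entity] += 1.0 + msg["reposts"]
    return scores


def burst_weights(history, current):
    """Compare each entity's current score with its scores in past windows."""
    weights = {}
    for entity, score in current.items():
        past = [h.get(entity, 0.0) for h in history]
        mu = mean(past) if past else 0.0
        sigma = pstdev(past) if len(past) > 1 else 0.0
        weights[entity] = (score - mu) / (sigma + 1.0)  # +1 avoids dividing by zero
    return weights


# Toy stream: entities may be tags, links, or any other token type.
stream = [
    {"t": 0, "entities": ["#a"], "reposts": 0},
    {"t": 70, "entities": ["#a"], "reposts": 1},
    {"t": 140, "entities": ["#b"], "reposts": 0},
    {"t": 141, "entities": ["#quake", "http://x"], "reposts": 5},
    {"t": 142, "entities": ["#quake"], "reposts": 10},
    {"t": 143, "entities": ["#quake"], "reposts": 2},
]
windows = split_windows(stream)
history = [entity_scores(w) for w in windows[:2]]
weights = burst_weights(history, entity_scores(windows[2]))
top_entity = max(weights, key=weights.get)
```

In this toy stream the two sparse messages each fall into their own window because of the time limit, while the dense run is cut by the message-count limit, and `#quake` receives the largest weight in the third window. The co-clustering step described in the abstract is not covered by this sketch.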