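Microblog Community Discovery Algorithm Based on Dynamic Topic Model with Multidimensional Data Fusion
Cite this article: LIU Bing-Yu, WANG Cui-Rong, WANG Cong, WANG Jun-Wei, WANG Xing-Wei, HUANG Min. Microblog community discovery algorithm based on dynamic topic model with multidimensional data fusion[J]. Journal of Software, 2017, 28(2): 246-261.
Authors: LIU Bing-Yu  WANG Cui-Rong  WANG Cong  WANG Jun-Wei  WANG Xing-Wei  HUANG Min
Affiliations: School of Information Science and Engineering, Northeastern University, Shenyang 110819, Liaoning, China (LIU Bing-Yu, WANG Cui-Rong, WANG Cong, WANG Jun-Wei, HUANG Min); School of Software, Northeastern University, Shenyang 110819, Liaoning, China (WANG Xing-Wei)
Funding: National Science Fund for Distinguished Young Scholars (61225012, 71325002); National Natural Science Foundation of China (61572123, 61300195); Specialized Research Fund for the Doctoral Program of Higher Education, Priority Development Areas (20120042130003); Liaoning BaiQianWan Talents Program (2013921068); Natural Science Foundation of Hebei Province (F2014501078); Science and Technology Program of Hebei Province (15210146)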
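Abstract: As the number of microblog users keeps growing, microblog networks have become a platform for users to exchange information. Because the length of microblog posts is limited, traditional community discovery algorithms cannot effectively handle problems such as the sparsity of microblog networks. To address this, the DC-DTM algorithm is proposed. DC-DTM first maps the microblog network to a directed weighted network in which the direction of an edge reflects the following relationship between nodes; the proposed DTM model computes the semantic similarity between nodes, and this similarity is used as the weight of the edge between them. DTM is a microblog topic model that not only mines the topic distribution of posts but also computes each user's influence within a given topic. Second, a proposed low-complexity label propagation algorithm, WLPA, is used to discover communities in the microblog network. In its initialization phase, WLPA takes highly influential user nodes as the initial nodes, and labels propagate in descending order of node influence. This overcomes the backflow (counter-current) phenomenon of traditional label propagation algorithms and improves their stability. Experiments on real data show that the DTM model mines topics from microblogs well and that DC-DTM effectively discovers communities in microblog networks.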
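The abstract describes DTM as a topic model, and the keywords name LDA and Gibbs sampling. DTM's own extensions (multidimensional data fusion and per-topic user influence) are not specified here, so the following is only a minimal sketch of plain LDA trained by collapsed Gibbs sampling, the standard baseline such a model builds on. It assumes, hypothetically, that each user is one document formed by concatenating that user's posts, which yields one topic distribution per user. The function name `lda_gibbs` and the default hyperparameters are illustrative choices, not values from the paper.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Plain LDA via collapsed Gibbs sampling (a stand-in, not DTM itself).

    docs: list of token lists; hypothetically one list per user, made by
    concatenating that user's posts. Returns (theta, phi, vocab), where
    theta[d] is document d's topic distribution and phi[k] is topic k's
    word distribution over vocab.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Assign every token a random initial topic.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | rest), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    return theta, phi, vocab


if __name__ == "__main__":
    users = [
        "football match goal team".split(),
        "team goal striker football".split(),
        "stock market price shares".split(),
        "market shares investor price".split(),
    ]
    theta, phi, vocab = lda_gibbs(users, num_topics=2, iterations=100)
    for u, row in enumerate(theta):
        print(u, [round(p, 2) for p in row])
```

Row `theta[u]` can then serve as user u's topic vector when comparing users. Pooling short posts per author before topic modeling is one common way to soften the sparsity problem the abstract mentions.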

Keywords: Sina microblog  text mining  DC-DTM  Gibbs sampling  LDA  topic model
Received: 2015/12/26
Revised: 2016/3/17

Microblog Community Discovery Algorithm Based on Dynamic Topic Model with Multidimensional Data Fusion
LIU Bing-Yu,WANG Cui-Rong,WANG Cong,WANG Jun-Wei,WANG Xing-Wei and HUANG Min.Microblog Community Discovery Algorithm Based on Dynamic Topic Model with Multidimensional Data Fusion[J].Journal of Software,2017,28(2):246-261.
Authors:LIU Bing-Yu  WANG Cui-Rong  WANG Cong  WANG Jun-Wei  WANG Xing-Wei and HUANG Min
Affiliation: School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China (LIU Bing-Yu, WANG Cui-Rong, WANG Cong, WANG Jun-Wei, HUANG Min); School of Software, Northeastern University, Shenyang 110819, China (WANG Xing-Wei)
Abstract: With the dramatic increase in microblog users, microblog websites have become a platform where a wide spectrum of users obtain information. Because microblog posts are a special kind of text with restricted length, traditional community detection algorithms cannot effectively handle the sparsity of microblog data. Therefore, the DC-DTM (Discovery Community by DTM) algorithm is proposed. First, the algorithm maps the microblog network to a directed weighted network, in which edge direction represents the following relationship between users and edge weight is the topic similarity between nodes, calculated by the DTM (Data Topic Mining) model. DTM is a microblog topic model that not only accurately mines the topics of each post but also calculates each author's influence on a given topic. Second, the algorithm uses WLPA, a low-complexity label propagation algorithm, to find communities in the microblog network. In the initialization phase, the most influential nodes are selected as initial nodes, and labels are propagated in descending order of node influence. This overcomes the backflow (counter-current) phenomenon of traditional label propagation algorithms and gives the algorithm better stability. Experiments on real data show that the DTM model mines microblog topics well and that DC-DTM effectively discovers microblog communities.
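The two DC-DTM steps in the abstract (weighting follow edges by topic similarity, then influence-ordered label propagation) can be sketched as follows. The paper's exact similarity measure and WLPA update rule are not given here, so this sketch makes four labeled assumptions: (i) each follow edge is weighted by the cosine similarity of the two users' topic vectors; (ii) per-user influence scores are a precomputed input rather than derived from DTM; (iii) labels may flow along a follow edge in either direction; and (iv) seed labels stay fixed. The names `build_weighted_graph` and `wlpa` are hypothetical.

```python
import math
from collections import defaultdict


def cosine(p, q):
    """Cosine similarity of two topic vectors (assumed similarity metric)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def build_weighted_graph(follows, topic_dist):
    """Map (follower, followee) pairs to directed edges weighted by similarity."""
    return {(u, v): cosine(topic_dist[u], topic_dist[v]) for u, v in follows}


def wlpa(weights, influence, num_seeds, max_iter=20):
    """Influence-ordered label propagation (hypothetical reading of WLPA).

    weights: {(u, v): w} directed edges; influence: {node: score} covering
    every node in the graph. Returns {node: community label}.
    """
    # Assumption: labels may flow along a follow edge in either direction.
    adj = defaultdict(dict)
    for (u, v), w in weights.items():
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w

    # Visit nodes from most to least influential; the top ones are seeds.
    order = sorted(influence, key=influence.get, reverse=True)
    seeds = set(order[:num_seeds])
    label = {n: (n if n in seeds else None) for n in order}

    for _ in range(max_iter):
        changed = False
        for n in order:
            if n in seeds:
                continue  # assumption: seed labels are never overwritten
            score = defaultdict(float)
            for m, w in adj[n].items():
                if label[m] is not None:
                    score[label[m]] += w
            if not score:
                continue  # no labelled neighbour yet
            # Heaviest label wins; ties go to the more influential seed.
            best = max(score, key=lambda lab: (score[lab], influence[lab]))
            if best != label[n]:
                label[n] = best
                changed = True
        if not changed:
            break

    # Nodes that no label ever reached form singleton communities.
    return {n: (lab if lab is not None else n) for n, lab in label.items()}


if __name__ == "__main__":
    topics = {"a": [0.9, 0.1], "b": [0.8, 0.2], "c": [0.85, 0.15],
              "x": [0.1, 0.9], "y": [0.2, 0.8], "z": [0.15, 0.85]}
    follows = [("b", "a"), ("c", "a"), ("c", "b"),
               ("y", "x"), ("z", "x"), ("z", "y"), ("c", "z")]
    influence = {"a": 10, "x": 9, "b": 5, "y": 4, "c": 3, "z": 2}
    graph = build_weighted_graph(follows, topics)
    # a, b, c end up with label "a"; x, y, z with label "x"
    print(wlpa(graph, influence, num_seeds=2))
```

Visiting nodes in descending influence order and never overwriting seed labels means a low-influence node cannot push its label back onto an influential one. That is one possible reading of the abstract's claim about suppressing backflow. Because ties are broken by seed influence and the visiting order is fixed, the result is also deterministic for a given input.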
Keywords:Sina microblog  text mining  DC-DTM  Gibbs sampling  LDA  topic model