
Microblog User Interest Mining Method Based on Text Clustering and Interest Decay
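Cite this article: Qin Yongbin, Sun Yujie, Wei Xiao. Microblog user interest mining method based on text clustering and interest decay[J]. Application Research of Computers (计算机应用研究), 2019, 36(5).
Authors: Qin Yongbin, Sun Yujie, Wei Xiao
Affiliations: College of Computer Science and Technology, Guizhou University, Guiyang 550025; Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025
Funding: Major Research Plan of the National Natural Science Foundation of China (91746116); Major Applied Basic Research Program of Guizhou Province (黔科合JZ字[2014]2001); Major Science and Technology Special Program of Guizhou Province (黔科合重大专项字[2017]3002)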
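Abstract: Microblog platforms hold latent information about their users, so mining user interests from microblog data has considerable social value. Taking into account the characteristics of both user interests and microblog content, this paper proposes TCID-MUIM, a microblog user interest mining method based on text clustering and interest decay. First, a synonym-merging strategy based on Tongyici Cilin (a Chinese synonym thesaurus) compensates for the sparse word-frequency information available during modeling. Second, a two-pass incomplete Single-Pass clustering algorithm partitions each user's microblogs into several clusters, and the posts in each cluster are merged into a single document, which mitigates the difficulty of extracting topics from short microblog texts. Finally, the documents are modeled with LDA; to account for the way user interests change over time, a time factor is introduced that compresses the microblog-topic matrix into a user-topic matrix, from which the user's interests are obtained. Experiments show that, compared with conventional modeling and with modeling that merges all of a user's historical microblogs into one document, the interest topics mined by TCID-MUIM are more clearly distinguished from one another and match users' actual interest preferences more closely.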

Keywords: microblog; Single-Pass clustering; LDA model; user interest mining; interest decay
Received: 2017/11/17
Revised: 2019/4/2
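The synonym-merging and clustering steps summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `SYNONYMS` dictionary is a hypothetical stand-in for Tongyici Cilin, the similarity measure (cosine over term counts) and the `threshold` value of 0.3 are assumptions, and only a basic single pass is shown. The paper's second pass and its "incomplete" stopping rule are not specified in the abstract and are not reproduced here.

```python
import math
from collections import Counter

# Hypothetical stand-in for Tongyici Cilin: maps each word to one
# representative of its synonym group (the real thesaurus uses category codes).
SYNONYMS = {"laptop": "computer", "pc": "computer", "film": "movie"}


def normalize(tokens, synonyms=SYNONYMS):
    """Map each token to its synonym-group representative so that
    synonyms pool their term frequencies."""
    return [synonyms.get(t, t) for t in tokens]


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass(posts, threshold=0.3):
    """Basic one-pass Single-Pass clustering over tokenized posts.

    Each post joins the most similar existing cluster when its cosine
    similarity to that cluster's centroid reaches `threshold` (an assumed
    value); otherwise it starts a new cluster. Returns lists of post indices.
    """
    centroids, clusters = [], []
    for i, tokens in enumerate(posts):
        vec = Counter(normalize(tokens))
        best, best_sim = -1, 0.0
        for k, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = k, sim
        if best >= 0 and best_sim >= threshold:
            clusters[best].append(i)
            # Centroid kept as summed counts; cosine is scale-invariant,
            # so this behaves like a mean vector.
            centroids[best].update(vec)
        else:
            clusters.append([i])
            centroids.append(Counter(vec))
    return clusters


def cluster_documents(posts, threshold=0.3):
    """Merge each cluster's posts into one pseudo-document for topic modeling."""
    return [
        [t for i in members for t in normalize(posts[i])]
        for members in single_pass(posts, threshold)
    ]


posts = [["laptop", "review"], ["pc", "review"],
         ["film", "tonight"], ["movie", "tonight"]]
print(single_pass(posts))  # [[0, 1], [2, 3]]
```

Here "laptop" and "pc" collapse to "computer", so the first two posts become identical after normalization and fall into one cluster; the merged pseudo-documents returned by `cluster_documents` are what would then be passed to an LDA implementation.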

Microblog user interest mining based on text clustering and interest decay
Qin Yongbin, Sun Yujie, Wei Xiao. Microblog user interest mining based on text clustering and interest decay[J]. Application Research of Computers, 2019, 36(5).
Authors: Qin Yongbin, Sun Yujie, Wei Xiao
Abstract: Microblog platforms contain latent information about users, and mining user interests from microblog data has important social significance. Based on the characteristics of user interests and microblog content, this paper proposes a microblog user interest mining method based on text clustering and interest decay (TCID-MUIM). First, it applies a synonym-merging strategy based on Tongyici Cilin to compensate for the lack of word-frequency information during modeling. Second, it uses a two-pass incomplete Single-Pass clustering algorithm to group each user's microblogs into clusters that are merged into single documents, addressing the difficulty of mining topic information from short microblog texts. Finally, it models the documents with LDA and, to account for how user interests change over time, introduces a time factor that compresses the microblog-topic matrix into a user-topic matrix from which user interests are obtained. Experimental results show that, compared with traditional modeling methods and with methods that merge all of a user's historical microblogs into one document, TCID-MUIM yields topics that are more clearly differentiated and closer to users' real interest preferences.
Keywords: microblog; Single-Pass clustering; LDA model; user interest mining; interest decay
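The final step described in the abstract, using a time factor to compress the microblog-topic matrix into a user-topic matrix, can be sketched as a time-weighted average of per-post topic proportions. The abstract does not give the form of the time factor, so the exponential half-life decay, the `half_life_days` default of 30, and the normalization by total weight are all assumptions made for illustration; `decay_weight` and `user_topic_vector` are hypothetical helper names.

```python
def decay_weight(age_days, half_life_days=30.0):
    """Hypothetical exponential interest-decay factor: 1.0 for a post
    published now, 0.5 for one that is half_life_days old."""
    return 0.5 ** (age_days / half_life_days)


def user_topic_vector(post_topic, ages_days, half_life_days=30.0):
    """Compress a (posts x topics) matrix of topic proportions into one
    user-topic vector, weighting each post by its assumed decay factor."""
    if not post_topic or len(post_topic) != len(ages_days):
        raise ValueError("need a non-empty matrix and one age per row")
    weights = [decay_weight(a, half_life_days) for a in ages_days]
    total = sum(weights)
    n_topics = len(post_topic[0])
    # Weighted average per topic; dividing by the total weight keeps the
    # result a probability distribution when each row sums to 1.
    return [
        sum(w * row[j] for w, row in zip(weights, post_topic)) / total
        for j in range(n_topics)
    ]


# Two posts: a recent one mostly about topic 0, a 30-day-old one about topic 1.
vec = user_topic_vector([[0.9, 0.1], [0.2, 0.8]], ages_days=[0, 30])
print([round(v, 4) for v in vec])  # [0.6667, 0.3333]
```

With a 30-day half-life, the older post counts half as much as the recent one, so the user's profile leans toward topic 0 even though each post is dominated by a different topic. Stacking one such vector per user gives the user-topic matrix.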