Similar User Mining Based on User Interest Topics in Weibo
Citation: LI Pengfei, DONG Xu, ZHONG Zhaoman, LI Cunhua. Similar User Mining Based on User Interest Topics in Weibo[J]. Computer Engineering and Applications, 2019, 55(11): 102-109.
Authors: LI Pengfei  DONG Xu  ZHONG Zhaoman  LI Cunhua
Affiliation: 1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221000, China 2. School of Computer Engineering, Huaihai Institute of Technology, Lianyungang, Jiangsu 222005, China 3. Software R&D Center, Jiangsu Jinge Network Technology Co., Ltd., Lianyungang, Jiangsu 222005, China
Funding: National Natural Science Foundation of China; Jiangsu Province High-Level Talent Training Project; Jiangsu Universities Brand Major Construction Project; Lianyungang "521" High-Level Talent Training Program; Huaihai Institute of Technology Higher Education Research Project
Abstract: Similar user mining is an important way to improve the quality of social network services. In the era of big-data social networks, accurate similar user mining matters to both users and Internet companies, and similar users mined from a user's own interest topics better match what a similar user should be. This paper proposes a method for mining similar users based on user interest topics. The method first extracts each user's interest topics with the TextRank topic extraction method, then trains on the content users have posted to compute the similarity between all words. Four methods for computing the similarity of users' interest-topic words are proposed: CP (Corresponding Position similarity), CPW (Corresponding Position Weighted similarity), AP (All Position similarity), and APW (All Position Weighted similarity). The mining results are validated by the overlap rate of followees and fans between a user and the mined similar users. With APW similarity, the followee/fan overlap percentage of similar users is 1.687%, an improvement of 26.3%, 2.8%, and 12.4%, respectively, over the other three proposed methods, and of 20.4%, 21.2%, and 45.0%, respectively, over the traditional text similarity methods Jaccard similarity, edit distance, and cosine similarity. The APW method therefore mines a user's similar users more effectively.
Keywords:Weibo  similar users  interest topic  text training  user mining  
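The abstract names the four similarity variants but does not give their formulas. The sketch below is one plausible reading inferred from the names alone, not the paper's definitions: each user is a ranked list of topic words, "corresponding position" compares words of equal rank, "all position" compares every cross pair, and the weighted variants favor higher-ranked words. The `1/(rank+1)` weights, the `TOY_SIM` table, and all function names are assumptions made for illustration; in the paper the word similarities come from a model trained on users' posts.

```python
from itertools import product

# Hypothetical word-similarity scores standing in for a trained model.
TOY_SIM = {
    frozenset(("football", "soccer")): 0.9,
    frozenset(("movie", "film")): 0.8,
    frozenset(("football", "movie")): 0.1,
}


def word_sim(a, b):
    """Similarity of two words: 1.0 if identical, else a table lookup (0.0 if unknown)."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


def rank_weights(k):
    """Assumed position weights: the word at rank i (0-based) gets 1 / (i + 1)."""
    return [1.0 / (i + 1) for i in range(k)]


def cp(u, v):
    """Corresponding Position: mean similarity of words at the same rank."""
    pairs = list(zip(u, v))
    if not pairs:
        return 0.0
    return sum(word_sim(a, b) for a, b in pairs) / len(pairs)


def cpw(u, v):
    """Corresponding Position Weighted: rank-weighted mean over same-rank pairs."""
    pairs = list(zip(u, v))
    if not pairs:
        return 0.0
    w = rank_weights(len(pairs))
    return sum(wi * word_sim(a, b) for wi, (a, b) in zip(w, pairs)) / sum(w)


def ap(u, v):
    """All Position: mean similarity over every cross pair of topic words."""
    if not u or not v:
        return 0.0
    return sum(word_sim(a, b) for a, b in product(u, v)) / (len(u) * len(v))


def apw(u, v):
    """All Position Weighted: cross pairs weighted by the product of rank weights."""
    if not u or not v:
        return 0.0
    wu, wv = rank_weights(len(u)), rank_weights(len(v))
    num = sum(
        wu[i] * wv[j] * word_sim(a, b)
        for i, a in enumerate(u)
        for j, b in enumerate(v)
    )
    return num / (sum(wu) * sum(wv))


if __name__ == "__main__":
    user_a = ["football", "movie"]
    user_b = ["soccer", "film"]
    for fn in (cp, cpw, ap, apw):
        print(fn.__name__, round(fn(user_a, user_b), 4))
```

Under this reading, ranking candidate users by `apw` against a target user's topic list would yield that user's mined similar users; whether the paper normalizes or weights the same way is not stated in the abstract.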
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).
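For reference, the three traditional baselines named in the abstract (Jaccard similarity, edit distance, cosine similarity) can be sketched in their textbook forms as follows. How the paper applied them (to topic-word lists, raw text, or something else) is not stated in the abstract, so treating each user as a list of topic words here is an assumption, as are the function names.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two word collections: |A ∩ B| / |A ∪ B|."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(sa & sb) / len(sa | sb)


def edit_distance(s, t):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(
                prev[j] + 1,               # delete cs
                cur[j - 1] + 1,            # insert ct
                prev[j - 1] + (cs != ct),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def cosine(a, b):
    """Cosine similarity of the bag-of-words count vectors of a and b."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(c * c for c in ca.values()))
    nb = math.sqrt(sum(c * c for c in cb.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


if __name__ == "__main__":
    x = ["football", "movie", "travel"]
    y = ["movie", "travel", "music"]
    print(jaccard(x, y), edit_distance(x, y), cosine(x, y))
```

All three compare words only by exact match, which is the limitation the paper's word-similarity-based methods are designed to address.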