
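Multi-dimensional text clustering based on user behavior characteristics
Cite this article: Li Wanying, Huang Ruizhang, Ding Zhiyuan, Chen Yanping, Xu Liyang. Multi-dimensional text clustering based on user behavior characteristics[J]. Journal of Computer Applications, 2018, 38(11): 3127-3131.
Authors: Li Wanying, Huang Ruizhang, Ding Zhiyuan, Chen Yanping, Xu Liyang
Affiliations: 1. College of Computer Science and Technology, Guizhou University, Guiyang 550025; 2. Guizhou Provincial Key Laboratory of Public Big Data (Guizhou University), Guiyang 550025; 3. State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210093
Funding: National Natural Science Foundation of China (61462011); Major Research Plan of the National Natural Science Foundation of China (91746116); Guizhou Provincial Major Applied Basic Research Project (Qiankehe JZ [2014]2001); Guizhou Provincial Natural Science Foundation (Qiankehe Jichu [2018]1035); Guizhou Provincial Science and Technology Major Special Program (Qiankehe Zhongda Zhuanxiang [2017]3002).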
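Abstract: Traditional multi-dimensional text clustering generally extracts features from text content and rarely considers the interaction information between users and texts in the data (for example, behaviors such as likes, forwards, comments, follows, and citations). In addition, traditional multi-dimensional text clustering mainly combines multiple spatial dimensions linearly and does not consider in depth the relationships between attributes within each dimension. To make effective use of the user behavior information associated with texts, a multi-dimensional text clustering model that incorporates user behavior information (MTCUBC) is proposed. Following the principle that the similarity between texts should remain consistent across different spaces, the model uses user behavior information as constraints on text content clustering to adjust the similarity, and then applies metric learning to improve the distances between texts, thereby improving the clustering result. Experiments show that, compared with linearly combined multi-dimensional clustering, the MTCUBC model has a clear advantage on high-dimensional sparse data.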

Keywords: multi-dimensional clustering; metric learning; constraint; user behavior characteristics
Received: 2018-04-30
Revised: 2018-06-21

Multi-dimensional text clustering with user behavior characteristics
LI Wanying, HUANG Ruizhang, DING Zhiyuan, CHEN Yanping, XU Liyang. Multi-dimensional text clustering with user behavior characteristics[J]. Journal of Computer Applications, 2018, 38(11): 3127-3131.
Authors:LI Wanying  HUANG Ruizhang  DING Zhiyuan  CHEN Yanping  XU Liyang
Affiliation: 1. College of Computer Science and Technology, Guizhou University, Guiyang, Guizhou 550025, China; 2. Guizhou Provincial Key Laboratory of Public Big Data (Guizhou University), Guiyang, Guizhou 550025, China; 3. State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing, Jiangsu 210093, China
Abstract: Traditional multi-dimensional text clustering generally extracts features from text content alone and seldom considers the interaction information between users and texts, such as likes, forwards, comments, follows, and citations. Moreover, traditional multi-dimensional text clustering mainly combines multiple spatial dimensions linearly and fails to consider the relationships between attributes within each dimension. To make effective use of text-related user behavior information, a Multi-dimensional Text Clustering with User Behavior Characteristics (MTCUBC) model is proposed. Following the principle that the similarity between texts should be consistent across different spaces, the model uses user behavior information as constraints on text content clustering to adjust the similarity, and then applies metric learning to improve the distances between texts, thereby improving clustering performance. Experiments show that, compared with linearly combined multi-dimensional clustering, the MTCUBC model has a clear advantage on high-dimensional sparse data.
Keywords: multi-dimensional clustering; metric learning; constraint; user behavior characteristics
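
The pipeline outlined in the abstract (behavior-derived constraints, then metric learning, then clustering) can be illustrated with a small sketch. This is not the MTCUBC model itself, whose formulation is not given on this page. It is a minimal stand-in under stated assumptions: constraints are obtained by thresholding cosine similarity in the behavior space, the "metric" is a simple diagonal feature weighting (a ratio heuristic, not the authors' metric learner), and clustering is plain k-means. All function names and thresholds (`behavior_constraints`, `learn_diagonal_metric`, `must_thr`, `cannot_thr`) are hypothetical.

```python
import numpy as np


def cosine_similarity(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)
    return Xn @ Xn.T


def behavior_constraints(B, must_thr=0.8, cannot_thr=0.2):
    """Assumption: texts with very similar user-behavior vectors should be
    clustered together (must-link); very dissimilar ones should not
    (cannot-link). Thresholds are illustrative only."""
    S = cosine_similarity(np.asarray(B, dtype=float))
    rows, cols = np.triu_indices(len(S), k=1)
    must = [(a, b) for a, b in zip(rows, cols) if S[a, b] >= must_thr]
    cannot = [(a, b) for a, b in zip(rows, cols) if S[a, b] <= cannot_thr]
    return must, cannot


def learn_diagonal_metric(X, must, cannot, eps=1e-6):
    """Heuristic diagonal metric: up-weight content features that differ
    across cannot-link pairs and agree within must-link pairs."""
    def mean_sq_diff(pairs):
        if not pairs:
            return np.ones(X.shape[1])
        idx = np.array(pairs)
        return ((X[idx[:, 0]] - X[idx[:, 1]]) ** 2).mean(axis=0)

    w = mean_sq_diff(cannot) / (mean_sq_diff(must) + eps)
    return w / w.sum()


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means with random initial centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels


def cluster_with_behavior(X, B, k):
    """Cluster content vectors X under a metric shaped by behavior vectors B."""
    X = np.asarray(X, dtype=float)
    must, cannot = behavior_constraints(B)
    w = learn_diagonal_metric(X, must, cannot)
    # Scaling each feature by sqrt(w) makes Euclidean distance in the
    # scaled space equal to the learned weighted distance.
    return kmeans(X * np.sqrt(w), k), w
```

Here `X` would hold one content feature vector per text (for example, TF-IDF) and `B` one behavior vector per text (for example, counts of likes or forwards per user group). A full metric learner would replace the diagonal heuristic with a learned matrix, but the flow of information from the behavior space into the content-space distance is the same.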
