
S-SimRank: A Document Similarity Computation Method Combining Content and Link Information
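Cite this article: CAI Yuanzhe, LI Pei, LIU Hongyan, HE Jun, DU Xiaoyong. S-SimRank: A Document Similarity Computation Method Combining Content and Link Information[J]. Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2009, 3(4): 378-391. DOI: 10.3778/j.issn.1673-9418.2009.04.005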
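Authors: CAI Yuanzhe, LI Pei, LIU Hongyan, HE Jun, DU Xiaoyong
Affiliations: Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education, Renmin University of China, Beijing 100872; School of Information, Renmin University of China, Beijing 100872; Department of Management Science and Engineering, Tsinghua University, Beijing 100084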
Funding: National Natural Science Foundation of China. The preliminary version of this paper first appeared in the Proceedings of the 4th International Conference on Advanced Data Mining and Applications.
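Abstract: Content analysis and link analysis are two approaches to computing document similarity. Link analysis can uncover implicit relationships between documents, but noise among the documents makes it difficult for this approach to produce accurate results. To address this problem, a new algorithm, S-SimRank (Star-SimRank), is proposed. It combines the content information and link information of documents and thereby improves the accuracy of document similarity computation. On the ACM data set, S-SimRank substantially outperforms other algorithms in both accuracy and efficiency. Finally, a mathematical proof of the convergence of S-SimRank is given.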

Keywords: link analysis; similarity computation; text analysis

S-SimRank: Combining Content and Link Information to Cluster Papers Effectively and Efficiently
CAI Yuanzhe, LI Pei, LIU Hongyan, HE Jun, DU Xiaoyong. S-SimRank: Combining Content and Link Information to Cluster Papers Effectively and Efficiently[J]. Journal of Frontiers of Computer Science and Technology, 2009, 3(4): 378-391. DOI: 10.3778/j.issn.1673-9418.2009.04.005
Authors:CAI Yuanzhe  LI Pei  LIU Hongyan  HE Jun  DU Xiaoyong
Affiliation: CAI Yuanzhe(1,2), LI Pei(1), LIU Hongyan(3), HE Jun(1,2,+), DU Xiaoyong(1,2). 1. Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education, Renmin University of China, Beijing 100872, China; 2. School of Information, Renmin University of China, Beijing 100872, China; 3. Department of Management Science and Engineering, Tsinghua University, Beijing 100084, China
Abstract: Content analysis and link analysis among documents are two common methods in recommender systems. Compared with content analysis, link analysis can discover more implicit relationships between documents. At the same time, because of noise, these methods cannot produce precise results. To solve this problem, a new algorithm, S-SimRank (Star-SimRank), is proposed to effectively combine content analysis and link analysis and improve the accuracy of similarity calculation. The experimental results for the ACM data set show t...
Keywords:linkage mining  similarity calculation  text mining
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.