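TS-PageRank Algorithm Based on a Topic Similarity Model
Citation: HUANG De-cai, QI Hua-chun, QIAN Neng. TS-PageRank algorithm based on a topic similarity model[J]. Mini-micro Systems, 2007, 28(3): 510-514.
Authors: HUANG De-cai, QI Hua-chun, QIAN Neng
Affiliation: College of Information Engineering, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
Abstract: PageRank is the core algorithm of the well-known search engine Google, but it suffers from topic drift, which leaves too many pages unrelated to the query topic in the search results. After analyzing PageRank and several of its improved variants, this paper proposes a topic similarity model based on virtual documents and a TS-PageRank algorithm framework built on that model. Choosing a different similarity model yields a different TS-PageRank algorithm, so the framework defines a family of web-page ranking algorithms. Theoretical analysis and numerical simulation experiments show that the algorithm greatly reduces topic drift, and thereby improves query efficiency and quality, without requiring additional text information or increasing the algorithm's time or space complexity.

Keywords: link analysis; topic similarity; PageRank algorithm
Article ID: 1000-1220(2007)03-0510-05
Revised: 2005-12-26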
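The abstract describes a topic similarity model that compares pages through their virtual documents, but it does not spell out how a virtual document is built or weighted. The sketch below illustrates only the comparison step, using cosine similarity between sparse term-weight vectors. It assumes each virtual document has already been reduced to a bag of terms; the helper `virtual_document_vector` and its use of raw term counts are hypothetical choices, not the paper's definitions.

```python
import math
from collections import Counter


def virtual_document_vector(terms):
    """Hypothetical helper: represent a virtual document by raw term counts.

    The paper's actual virtual-document construction and term weighting
    are not given in the abstract; TF-IDF or other weights could be
    substituted here.
    """
    return Counter(terms)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-weight vectors (0.0 if either is empty)."""
    # Iterate over the smaller vector; a Counter returns 0 for missing terms.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(w * b[t] for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    page = virtual_document_vector(["pagerank", "link", "rank"])
    query = virtual_document_vector(["pagerank", "search"])
    print(round(cosine_similarity(page, query), 3))  # 1 / sqrt(6), prints 0.408
```

Cosine similarity ignores vector length, so a long page and a short page that share the same term proportions score 1.0. This is one common reason to pick it when documents vary widely in size.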

TS-PageRank Algorithm Based on the Model of Topic Similarity
HUANG De-cai, QI Hua-chun, QIAN Neng. TS-PageRank Algorithm Based on the Model of Topic Similarity[J]. Mini-micro Systems, 2007, 28(3): 510-514.
Authors: HUANG De-cai, QI Hua-chun, QIAN Neng
Affiliation: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310014, China
Abstract: The PageRank algorithm is the key algorithm used in the well-known search engine Google, but it suffers from topic drift, which leaves too many web pages unrelated to the user's search topic in the search results. After analyzing the PageRank algorithm and its modified versions, this paper proposes a similarity model based on virtual document vectors and cosine similarity, and puts forward a TS-PageRank algorithm framework. Plugging different similarity models into the framework yields different TS-PageRank algorithms, which together form a family of TS-PageRank algorithms. Theoretical analysis and numerical simulation show that the TS-PageRank algorithm greatly reduces topic drift and effectively improves the quality of web search, without adding any extra text information or increasing time or space complexity.
Keywords: hyperlink analysis; topic similarity; PageRank algorithm
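The framework idea, that plugging a different similarity model into the same ranking procedure yields a different member of the TS-PageRank family, can be illustrated with a similarity-weighted variant of the standard PageRank power iteration. This is a minimal sketch under an assumed weighting rule: each page splits its score among its out-links in proportion to a caller-supplied `similarity(j, i)` instead of uniformly. The abstract does not give the exact TS-PageRank formula, so `ts_pagerank`, the damping factor `d`, the tolerance `tol`, and the fallback rules for dangling pages and zero similarity are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# page id -> ids of the pages it links to (page ids are 0..n-1)
Graph = Dict[int, List[int]]


def ts_pagerank(graph: Graph, n: int,
                similarity: Callable[[int, int], float],
                d: float = 0.85, tol: float = 1e-10,
                max_iter: int = 200) -> List[float]:
    """Similarity-weighted PageRank by power iteration (illustrative sketch).

    Assumed weighting rule, not taken from the paper: page j passes its
    score to each out-link i in proportion to similarity(j, i), rather
    than splitting it evenly as plain PageRank does.
    """
    # Precompute each page's normalized out-link weights once.
    weights: Dict[int, List[Tuple[int, float]]] = {}
    for j in range(n):
        out = graph.get(j, [])
        if not out:
            continue  # dangling page, handled in the loop below
        sims = [max(similarity(j, i), 0.0) for i in out]
        total = sum(sims)
        if total > 0.0:
            weights[j] = [(i, s / total) for i, s in zip(out, sims)]
        else:
            # No topical overlap with any target: fall back to a uniform split.
            weights[j] = [(i, 1.0 / len(out)) for i in out]

    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Score held by dangling pages is spread evenly over all pages.
        dangling = sum(rank[j] for j in range(n) if j not in weights)
        new = [(1.0 - d) / n + d * dangling / n] * n
        for j, edges in weights.items():
            for i, w in edges:
                new[i] += d * rank[j] * w
        # Stop once the L1 change between iterations is below tol.
        if sum(abs(x - y) for x, y in zip(new, rank)) < tol:
            return new
        rank = new
    return rank
```

With a constant similarity such as `lambda j, i: 1.0`, the sketch reduces to ordinary PageRank. On the graph `{0: [1, 2], 1: [2], 2: [0]}`, a similarity that favors the link 0 → 1 over 0 → 2 shifts score toward page 1. Swapping in another similarity function, for example a cosine measure over page term vectors, changes only how scores flow along links. Like plain PageRank, each iteration of this sketch visits every link once.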
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.