
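Application of an Improved PageRank in Web Crawler
Cite this article: Qin Zheng, Zhang Ling, Li Na. Application of an Improved PageRank in Web Crawler [J]. Journal of Computer Research and Development, 2006, 43(6): 1044-1049.
Authors: Qin Zheng  Zhang Ling  Li Na
Affiliation: 1. Software College, Hunan University, Changsha 410082
2. College of Computer and Communications, Hunan University, Changsha 410082
Funding: National Natural Science Foundation of China; Hunan Province Scientific Research Project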
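Abstract: PageRank is an algorithm for ranking Web pages; it evaluates a page's importance from the link relationships among pages. However, because it assigns the same weight to every outlink and ignores how relevant a page is to the topic, it is prone to topic drift. After analyzing several PageRank algorithms, this paper proposes a new PageRank algorithm based on topical blocks. The algorithm segments a page into blocks according to the page's structure, passes different PageRank values to the links in each block according to the block's relevance to the topic, and can apply relevance feedback to the blocks based on the links already visited. Experiments show that the proposed algorithm improves the precision of search results.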

Keywords: PageRank algorithm  topical blocks  Web information gathering
Received: 2005-07-11
Revised: 2006-02-20

Application of an Improved PageRank in Web Crawler
Qin Zheng,Zhang Ling,Li Na.Application of an Improved PageRank in Web Crawler[J].Journal of Computer Research and Development,2006,43(6):1044-1049.
Authors:Qin Zheng  Zhang Ling  Li Na
Affiliation: 1. Software College, Hunan University, Changsha 410082; 2. College of Computer and Communications, Hunan University, Changsha 410082
Abstract: The PageRank algorithm is used to rank Web pages. It estimates a page's authority from the link structure of the Web. However, it assigns every outlink the same weight and is independent of topic, which leads to topic drift. In this paper, an improved PageRank algorithm based on topical blocks is proposed. The algorithm segments a Web page into blocks and passes the page's PageRank to the outlinks in each block in proportion to the block's relevance to the given topic. Moreover, it uses visited outlinks as feedback to adjust each block's relevance. An experiment with a Web crawler shows that the new algorithm improves the precision of search results.
Keywords:PageRank algorithm  topical blocks  Web crawler
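
The block-weighted propagation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `block_weighted_pagerank`, the input layout (each page as a list of `(relevance, outlinks)` blocks), the damping factor 0.85, the even split within a block, and the handling of pages without usable outlinks are all assumptions made for illustration. The feedback step is left out because the abstract does not give its update rule.

```python
def block_weighted_pagerank(pages, damping=0.85, iterations=50):
    """Toy PageRank in which each block's share of a page's rank is
    proportional to the block's topic relevance (an assumed scheme).

    pages: dict mapping url -> list of (relevance, [outlink urls]) blocks.
    Returns a dict mapping url -> rank; ranks sum to 1.
    """
    # Collect every URL that appears as a source or as a link target.
    nodes = set(pages)
    for blocks in pages.values():
        for _, links in blocks:
            nodes.update(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}

    for _ in range(iterations):
        new_rank = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            # Keep only blocks that have links and positive relevance.
            blocks = [(rel, links) for rel, links in pages.get(u, [])
                      if links and rel > 0]
            total_rel = sum(rel for rel, _ in blocks)
            if total_rel == 0:
                # Assumption: a page with no usable outlinks spreads
                # its rank evenly over all pages.
                for v in nodes:
                    new_rank[v] += damping * rank[u] / n
                continue
            for rel, links in blocks:
                # The block receives rank in proportion to its relevance,
                # then splits it evenly among its own links.
                block_share = damping * rank[u] * rel / total_rel
                for v in links:
                    new_rank[v] += block_share / len(links)
        rank = new_rank
    return rank


# Hypothetical page with one on-topic block and one off-topic block.
example = {"hub": [(0.9, ["on_topic"]), (0.1, ["off_topic"])]}
ranks = block_weighted_pagerank(example)
print(sorted(ranks, key=ranks.get, reverse=True))
```

With uniform relevance values the sketch reduces to ordinary PageRank over the same links, which is the equal-weight behaviour the abstract identifies as the cause of topic drift.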
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.