
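Research of Page Ranking Algorithm Based on Nutch
Cite this article: HU Wei-hua, CAO Qi-feng. Research of Page Ranking Algorithm Based on Nutch[J]. Journal of Hangzhou Dianzi University, 2013(6): 74-77.
Authors: HU Wei-hua  CAO Qi-feng
Affiliation: School of Computer Science, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China
Abstract: A vertical search engine devoted to a particular topic or discipline is an extension and refinement of the general search engine, offering vertical search to specific users. The page ranking algorithm determines how good a search engine is; its purpose is to rank topically relevant and authoritative pages at the top of a massive set of search results, helping users find the resources they need. The Nutch search engine implements only a basic composite ranking model. To make Nutch better meet the needs of professional users, this paper designs a composite ranking model that considers both topic relevance and page authority, incorporating a topic relevance factor and an improved PageRank factor into the Nutch page scoring formula. Experiments show that the improved ranking algorithm raises the precision of retrieved information, shows a clear topical bias, and is useful in practical applications.

Keywords: search engine  vector space model  page ranking algorithm  similarity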

Research of Page Ranking Algorithm Based on Nutch
HU Wei-hua,CAO Qi-feng.Research of Page Ranking Algorithm Based on Nutch[J].Journal of Hangzhou Dianzi University,2013(6):74-77.
Authors:HU Wei-hua  CAO Qi-feng
Affiliation: School of Computer Science, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China
Abstract: A vertical search engine, which targets a particular subject or discipline, is an extension and subdivision of the general search engine and provides vertical search for specific users. The page ranking algorithm is the key to search engine quality: it places relevant and authoritative pages at the front of massive search results and helps users locate information rapidly. The ranking algorithm in the Nutch search engine is only a basic composite ranking model. To meet the needs of professional users, a topic relevance factor and a page authority factor are added to the Nutch page scoring formula. Experimental results show that the improved algorithm raises the precision of search results and is useful in practical applications.
Keywords:search engine  vector space model  PageRank algorithm  similarity
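
The abstract does not state the scoring formula itself, so the following is only a minimal sketch of the general idea it describes: a base text score adjusted by a topic relevance factor and a page authority factor. It assumes cosine similarity over raw term counts for the vector space model and the standard (not the paper's improved) PageRank. The function names, the weights `w_topic` and `w_auth`, the multiplicative blend in `combined_score`, and the toy data are all hypothetical and are not taken from the paper or from Nutch.

```python
import math
from collections import Counter


def cosine_similarity(doc_terms, topic_terms):
    """Vector space model similarity between two bags of words (raw counts)."""
    a, b = Counter(doc_terms), Counter(topic_terms)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pagerank(links, damping=0.85, iterations=50):
    """Standard PageRank by power iteration; links maps page -> list of outlinks."""
    pages = set(links) | {q for outs in links.values() for q in outs}
    n = len(pages)
    if n == 0:
        return {}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for p, outs in links.items():
            for q in outs:
                new_rank[q] += damping * rank[p] / len(outs)
        rank = new_rank
    return rank


def combined_score(base_score, topic_sim, authority, w_topic=0.5, w_auth=0.5):
    """Hypothetical blend: boost the base score by topic and authority factors."""
    return base_score * (1.0 + w_topic * topic_sim + w_auth * authority)


# Toy data (assumed for illustration only).
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
docs = {
    "a": ["nutch", "search", "engine", "ranking"],
    "b": ["cooking", "recipes", "search"],
    "c": ["nutch", "ranking", "pagerank", "search"],
    "d": ["travel", "blog"],
}
topic = ["nutch", "ranking", "search"]

ranks = pagerank(links)
max_rank = max(ranks.values())
scores = {
    page: combined_score(
        base_score=1.0,  # stand-in for the engine's own text score
        topic_sim=cosine_similarity(terms, topic),
        authority=ranks[page] / max_rank,  # normalise authority to [0, 1]
    )
    for page, terms in docs.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The multiplicative form keeps the engine's original score as the baseline and lets the two factors only raise it; an additive or weighted-sum blend would be an equally plausible reading of the abstract.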
This article is indexed in VIP (维普) and other databases.