
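Design and Algorithm of a Focused Search Engine Robot
Cite this article: Long Yuwei, Wang Yongcheng, Xu Huanqing. Design and Algorithm of a Focused Search Engine Robot[J]. Computer Simulation, 2004, 21(4): 69-73.
Authors: Long Yuwei  Wang Yongcheng  Xu Huanqing
Affiliation: Department of Computer Science, Shanghai Jiao Tong University, Shanghai 200030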
Funding: Supported by the National 863 Program (2002AA119050)
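Abstract: A focused search engine restricts information retrieval to a specific topic domain and provides retrieval services for that topic; it is one of the development directions for the new generation of search engines. This paper introduces NetBat version 2.02, a focused search robot system that crawls the Web and downloads topic-relevant pages. The key technologies of focused search are searching for topic-relevant information and analyzing page relevance. The paper analyzes the strengths and weaknesses of traditional focused search algorithms and proposes a focused search algorithm based on backlinks combined with hyperlink text analysis. It also discusses in detail a content-based page relevance analysis algorithm. Comparative search experiments show that the system performs well and accurately crawls topic-relevant pages.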

Keywords: Focused search  Search engine  Robot  Relevance analysis  Crawling algorithm  Information retrieval
Article ID: 1006-9348(2004)04-0069-04
Revision date: 21 July 2003

Design and Algorithm of a Focused Search Engine Robot
LONG Yu-wei, WANG Yong-cheng, XU Huan-qing. Design and Algorithm of a Focused Search Engine Robot[J]. Computer Simulation, 2004, 21(4): 69-73.
Authors: LONG Yu-wei  WANG Yong-cheng  XU Huan-qing
Abstract: A focused search engine restricts information search to a specific topic field and provides search services within that field. It is one of the development directions for the new generation of search engines. This paper describes NetBat 2.02, a focused search Robot system that crawls the Web and downloads topic-related pages. The key technologies of focused search are topic-related information search and page relevance analysis. The paper analyzes the advantages and disadvantages of the Fish-Search and Shark-Search algorithms, then presents the InverseLink-Based Search Algorithm. It also provides a detailed discussion of a content-based page relevance analysis algorithm. Experimental results indicate that the system performs well and can crawl more topic-relevant pages.
Keywords: Focused crawling  Robot  Search engine  Crawling algorithm  Relevance analysis
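
The abstract names two ingredients of focused crawling: scoring pages for topic relevance from their content, and using link text to decide which links to follow. As a rough illustration only, the sketch below shows a generic best-first crawler that scores text by cosine similarity of term counts. It is not the NetBat implementation and does not reproduce the paper's backlink analysis. The function names, the `fetch` callback, the 0.5/0.5 weighting, and the threshold are all assumptions made for this example.

```python
import heapq
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercased word counts; a stand-in for real tokenization and weighting."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def focused_crawl(seeds, fetch, topic, max_fetches=10, threshold=0.1):
    """Best-first crawl (hypothetical sketch).

    `fetch(url)` is supplied by the caller and must return
    (page_text, [(link_url, anchor_text), ...]).
    Returns [(url, relevance)] for pages at or above `threshold`.
    """
    topic_vec = term_vector(topic)
    # heapq is a min-heap, so scores are negated to pop the best link first.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    relevant = []
    fetches = 0
    while frontier and fetches < max_fetches:
        _, url = heapq.heappop(frontier)
        text, links = fetch(url)
        fetches += 1
        page_score = cosine(term_vector(text), topic_vec)
        if page_score >= threshold:
            relevant.append((url, page_score))
        for link_url, anchor in links:
            if link_url in seen:
                continue
            seen.add(link_url)
            # Assumed heuristic: average parent relevance and anchor-text relevance.
            anchor_score = cosine(term_vector(anchor), topic_vec)
            link_score = 0.5 * page_score + 0.5 * anchor_score
            heapq.heappush(frontier, (-link_score, link_url))
    return relevant
```

In this sketch a link inherits part of its priority from the page that contains it, so links found on relevant pages are visited before links found on irrelevant ones, even when their anchor text is uninformative.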
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.