
Design and Implementation of a Web-Based Web Crawler

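Cite this article:	Xu Yuanchao, Liu Jianghua, Liu Lizhen, Guan Yong. Design and Implementation of a Web-Based Web Crawler [J]. 微计算机信息 (Control & Automation), 2007, 23(21): 119-121.

Authors:	Xu Yuanchao, Liu Jianghua, Liu Lizhen, Guan Yong

Affiliation:	College of Information Engineering, Capital Normal University, Beijing 100037

Funding:	Beijing Natural Science Foundation; Science and Technology Development Program of the Beijing Municipal Education Commission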

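Abstract:	Both on-site information retrieval and targeted Web information collection depend on the core module of a full-text search engine system: the web crawler. This paper describes one design and implementation scheme in detail, covering the basic workflows of the page collector and the page indexer, the data storage structure, the core algorithms, and the main technical difficulties. The system performed well in actual operation. The paper closes by identifying areas that need further improvement.
Keywords:	search engine; web crawler; information retrieval; page indexing
Article ID:	1008-0570(2007)07-3-0119-03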
Revision dates:	2007-05-03; 2007-06-05
Design and Implementation of Spider on Web-based Full-text Search Engine

XU YUANCHAO, LIU JIANGHUA, LIU LIZHEN, GUAN YONG. Design and Implementation of Spider on Web-based Full-text Search Engine[J]. Control & Automation, 2007, 23(21): 119-121.

Authors:	XU YUANCHAO, LIU JIANGHUA, LIU LIZHEN, GUAN YONG

Abstract:	Whether for retrieving information within a website or for collecting specific web information, the spider is the essential and most important module. This paper introduces in detail one design and implementation of a spider for a web-based full-text search engine, including the basic working principles, database structure, key algorithms, and technical difficulties of webpage collecting and webpage indexing. The design has proved feasible in practice. Finally, the paper lists some aspects to be improved.

Keywords:	search engine; spider; information retrieval; webpage indexing
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).
