
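WAN-Based Distributed Web Crawling
Cite this article: XU Xiao, ZHANG Wei-Zhe, ZHANG Hong-Li, FANG Bin-Xing. WAN-Based Distributed Web Crawling [J]. Journal of Software, 2010, 21(4): 1067-1082.
Authors: XU Xiao, ZHANG Wei-Zhe, ZHANG Hong-Li, FANG Bin-Xing
Affiliation: School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China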
Funding: Supported by the National Natural Science Foundation of China under Grant No.60703014; the National Basic Research Program of China (973 Program) under Grant No.G2005CB321806; the National High-Tech Research and Development Plan of China (863 Program) under Grant No.2009AA01Z437; the Specialized Research Fund for the Doctoral Program of Higher Education of China under Grant No.20070213044; the China Postdoctoral Science Foundation under Grant No.20070410263; the Heilongjiang Postdoctoral Foundation of China under Grant No.LBH-Z07108; the Development Program for Outstanding Young Teachers in Harbin Institute of Technology of China under Grant No.HITQNJS.2007.034.
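Abstract: This paper analyzes the advantages of WAN-based distributed Web crawlers over LAN-based crawlers and identifies three core issues for WAN-based distributed Web crawling: Web partition, Agent collaboration, and Agent deployment. Around these three issues, it gives a comprehensive survey of the implementation schemes and strategies that have appeared in academia and industry, discusses in depth the problems and challenges encountered in this research, and describes the evaluation models for WAN-based distributed Web crawlers. Finally, it summarizes directions for future research.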

Keywords: search engine; WAN-based distributed crawler; Web partition; Agent collaboration; Agent deployment
Revised: 9/3/2009

WAN-Based Distributed Web Crawling
XU Xiao, ZHANG Wei-Zhe, ZHANG Hong-Li and FANG Bin-Xing. WAN-Based Distributed Web Crawling [J]. Journal of Software, 2010, 21(4): 1067-1082.
Authors: XU Xiao, ZHANG Wei-Zhe, ZHANG Hong-Li and FANG Bin-Xing
Abstract: This paper identifies three core issues for WAN-based distributed Web crawling systems: Web partition, agent collaboration, and agent deployment. Centering on these issues, it presents a comprehensive overview of the strategies currently adopted by the academic and business communities. The experiences, problems, and challenges encountered by WAN-based distributed Web crawlers are classified and discussed in depth. A summary of the current evaluation indicators is also given. Finally, conclusions and suggestions for future research are put forward.
Keywords: search engine; WAN-based distributed crawling; Web partition; agent collaboration; agent deployment