
Research of Web crawler technology
Cite this article: GUAN Hui-fen, SHI Jun, MA Ji-hong. Research of Web crawler technology[J]. Journal of Zhengzhou Institute of Light Industry (Natural Science), 2008, 23(6)
Authors: GUAN Hui-fen, SHI Jun, MA Ji-hong
Affiliation: School of Computer Science, Shaanxi Normal University, Xi'an, Shaanxi 710062
Abstract: This paper describes the functions, advantages, and disadvantages of different types of Web crawlers, including whole-Web (scalable) crawlers, incremental crawlers, and topic-focused crawlers. It then analyzes the traversal algorithms used by Web crawlers in China and abroad in recent years, including depth-first, breadth-first, and best-first (topic-first) search. The analysis indicates that a crawling strategy based on a genetic algorithm can effectively speed up page fetching and widen the search scope.

Keywords: crawler; breadth-first algorithm; best-first search (topic-first strategy); genetic algorithm
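
The three traversal strategies named in the abstract differ mainly in how the crawl frontier is ordered. The following is a minimal sketch of that idea, not the authors' implementation: the link graph `LINKS`, the relevance scores `SCORE`, and the function `crawl` are all hypothetical, and a small in-memory graph stands in for real page fetching.

```python
import heapq
from collections import deque

# Hypothetical in-memory link graph standing in for the Web: page -> outgoing links.
LINKS = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["E"],
    "D": [],
    "E": [],
}

# Hypothetical topic-relevance scores (higher means more relevant to the topic).
SCORE = {"A": 0.5, "B": 0.7, "C": 0.9, "D": 0.95, "E": 0.1}


def crawl(seed, strategy):
    """Return pages in visit order for strategy 'bfs', 'dfs', or 'best'."""
    if strategy not in ("bfs", "dfs", "best"):
        raise ValueError(f"unknown strategy: {strategy}")

    visited = []
    seen = {seed}  # pages already queued, so each page is visited once
    if strategy == "best":
        # Min-heap on negated score, so the most relevant page pops first.
        frontier = [(-SCORE[seed], seed)]
    else:
        frontier = deque([seed])

    while frontier:
        if strategy == "bfs":
            page = frontier.popleft()           # FIFO queue: oldest link first
        elif strategy == "dfs":
            page = frontier.pop()               # LIFO stack: newest link first
        else:
            page = heapq.heappop(frontier)[1]   # highest-scoring link first
        visited.append(page)

        for link in LINKS[page]:
            if link in seen:
                continue
            seen.add(link)
            if strategy == "best":
                heapq.heappush(frontier, (-SCORE[link], link))
            else:
                frontier.append(link)
    return visited


if __name__ == "__main__":
    for name in ("bfs", "dfs", "best"):
        print(name, crawl("A", name))
```

Only the frontier discipline changes between the three runs. The genetic-algorithm strategy mentioned in the abstract is not sketched here, because the abstract gives no details of it.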
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.