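A Distributed Crawler Based on an Improved Kademlia Protocol
Cite this article: TAO Yao-Dong, XIANG Zhong-Xi. Distributed Crawler Based on the Improved Kademlia Protocol[J]. Computer Systems & Applications, 2016, 25(4): 156-161.
Authors: TAO Yao-Dong, XIANG Zhong-Xi
Affiliation: Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168; University of Chinese Academy of Sciences, Beijing 100049
Funding: Shenyang Science and Technology Program (F14-056-7-00)
Abstract: With the explosive growth of information on the Internet, fields such as search engines and big data urgently need an efficient, stable, and highly scalable crawler architecture for data collection and analysis. Drawing on the ideas of peer-to-peer networks, this paper uses a distributed hash table as the carrier for data exchange between nodes and, in view of the characteristics of web crawlers, improves the Kademlia protocol, one implementation of a distributed hash table, to meet the needs of a distributed crawler. On this basis, a scalable and fault-tolerant distributed crawler cluster is designed and refined. Single-machine multi-threaded experiments and distributed-cluster experiments were carried out and analyzed in terms of system performance and system load; the results demonstrate the effectiveness of this distributed cluster approach.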

Keywords: distributed hash table; P2P; web crawler; Kademlia protocol; decentralization
Received: 2015/7/21
Revised: 2015/9/14

Distributed Crawler Based on the Improved Kademlia Protocol
TAO Yao-Dong and XIANG Zhong-Xi. Distributed Crawler Based on the Improved Kademlia Protocol[J]. Computer Systems & Applications, 2016, 25(4): 156-161.
Authors: TAO Yao-Dong and XIANG Zhong-Xi
Affiliation: Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, China; University of Chinese Academy of Sciences, Beijing 100049, China
Abstract: With the explosive growth of information on the Internet, research on search engines and big data calls for an efficient, stable, and scalable crawler architecture to collect and analyze Internet data. Inspired by peer-to-peer networks, we use a distributed hash table as the carrier of communication between nodes, and we modify the Kademlia protocol, an implementation of a distributed hash table, to meet the distributed crawler cluster's needs for scalability and fault tolerance. We carried out a multi-threaded experiment on a single computer and a node-expansion experiment on a distributed cluster. Analyzed in terms of system performance and system load, the experimental results show the effectiveness of this kind of distributed cluster.
Keywords: distributed hash table; peer to peer; web crawler; Kademlia protocol; decentralization
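
The abstract does not say how the improved protocol divides work among crawler nodes. As a rough illustration of the standard Kademlia idea it builds on, the sketch below hashes URLs and node names into the same 160-bit ID space and hands each URL to the node whose ID is closest under the XOR metric. The function names, the node names, and the use of SHA-1 for both keys and node IDs are assumptions made for this example; this is not the authors' modified protocol.

```python
import hashlib
from typing import Dict, List


def key_id(text: str) -> int:
    """Hash a string to a 160-bit integer ID (SHA-1 is an assumption here)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: bitwise XOR of two IDs, read as an integer."""
    return a ^ b


def responsible_node(url: str, node_ids: Dict[str, int]) -> str:
    """Return the name of the node whose ID is XOR-closest to the URL's ID."""
    target = key_id(url)
    return min(node_ids, key=lambda name: xor_distance(node_ids[name], target))


def assign(urls: List[str], node_names: List[str]) -> Dict[str, List[str]]:
    """Partition URLs across nodes by XOR-closeness (hypothetical helper)."""
    node_ids = {name: key_id(name) for name in node_names}
    plan: Dict[str, List[str]] = {name: [] for name in node_names}
    for url in urls:
        plan[responsible_node(url, node_ids)].append(url)
    return plan


if __name__ == "__main__":
    urls = [f"http://example.com/page{i}" for i in range(10)]
    for node, assigned in assign(urls, ["node-a", "node-b", "node-c"]).items():
        print(node, len(assigned))
```

With this kind of assignment, a node leaving the cluster only reassigns the URLs it held; every other URL keeps its closest node, which is one reason a DHT suits a crawler that must tolerate node failures.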