
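A review of research on topic-focused web crawlers
Cite this article: YU Juan, LIU Qiang. A review of research on topic-focused web crawlers[J]. Computer Engineering & Science, 2015, 37(2): 231-237.
Authors: YU Juan  LIU Qiang
Affiliation: School of Economics and Management, Fuzhou University, Fuzhou, Fujian 350108
Funding: National Natural Science Foundation of China (71201032); Fujian Provincial Social Science Planning Project (2012C021); Social Science Research Project of the Fujian Provincial Department of Education (JA11040S)
Abstract: Online information resources are growing exponentially, and topic-focused web crawlers have emerged in response to users' increasingly personalized needs. A topic-focused web crawler is a program that downloads web pages on a specific topic. By using specific information obtained while collecting pages, it ensures that the pages it fetches are relevant to the topic. Applications such as search engines built on topic-focused crawlers and domain corpus construction based on topic-focused crawlers are already in wide use. This paper first introduces the definition and working principles of topic-focused crawlers, then reviews recent research on them in China and abroad and compares the strengths and weaknesses of the various crawling strategies and related algorithms, and finally proposes future research directions for topic-focused web crawlers.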

Keywords: web crawler  topic-focused crawler  search engine
Received: 2013-08-27
Revised: 2013-10-18

Survey on topic-focused crawlers
YU Juan, LIU Qiang. Survey on topic-focused crawlers[J]. Computer Engineering & Science, 2015, 37(2): 231-237.
Authors:YU Juan  LIU Qiang
Affiliation:(School of Economics and Management,Fuzhou University,Fuzhou 350108,China)
Abstract: With the exponential growth of online information resources and increasingly personalized user demands, topic-focused crawlers have emerged. A topic-focused crawler is a program designed to download web pages that are relevant to a specific topic. Using information gathered at run time, a topic-focused crawler explores the web by following promising hyperlinks and fetches only pages that appear to be relevant. Search engines and domain corpus construction based on topic-focused crawling are widely used. We first define topic-focused crawlers and describe their operating principles, then review recent research in China and abroad and compare the advantages and disadvantages of the various crawling strategies and related algorithms. Finally, we point out future research directions for topic-focused crawling.
Keywords: web crawler  focused crawler  search engine
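
The operating principle summarized in the abstract (use information gathered at run time to follow promising hyperlinks and keep only relevant pages) can be illustrated with a minimal best-first sketch. This is not taken from the paper: the in-memory `pages` dictionary, the term-overlap `relevance` score, the `threshold` value, and all function names are assumptions chosen for illustration, and a real crawler would fetch pages over the network and use one of the relevance algorithms the survey compares.

```python
import heapq

def relevance(text, topic_terms):
    """Toy relevance score (assumption): fraction of words that are topic terms."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in topic_terms for w in words) / len(words)

def focused_crawl(pages, seeds, topic_terms, threshold=0.2, max_pages=10):
    """Best-first crawl over a hypothetical in-memory web.

    pages maps url -> (text, [outgoing links]); returns the relevant urls
    in the order they were visited.
    """
    # heapq is a min-heap, so priorities are negated to pop the best link first.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    kept = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        if url not in pages:
            continue  # dangling link: nothing to fetch
        text, links = pages[url]
        fetched += 1
        score = relevance(text, topic_terms)
        if score < threshold:
            continue  # off-topic page: neither kept nor expanded
        kept.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                # The parent's score, gathered at run time, ranks the child link.
                heapq.heappush(frontier, (-score, link))
    return kept
```

Called on a small hand-built `pages` dictionary with a set of topic terms, the function returns only the on-topic URLs and never expands links found on off-topic pages, which is the behavior that distinguishes a focused crawler from a general breadth-first one.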
This article is indexed in CNKI, Wanfang Data, and other databases.