
Research on crawler program based on Internet

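Cite this article:	GUO Yinfang, HAN Kai, GUO Fengming, WANG Guosheng, LI Xuemeng. Research on crawler program based on Internet[J]. 计算机应用文摘 (Computer Application Digest), 2022(2).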

Authors:	GUO Yinfang, HAN Kai, GUO Fengming, WANG Guosheng, LI Xuemeng

Affiliation:	Taiyuan University

Funding:	Taiyuan University Undergraduate Innovation and Entrepreneurship Training Program (TYX2021020).

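Abstract:	With the rapid development of the Internet, big data has become a buzzword in the Internet technology industry. Today, a crawler is undoubtedly a very convenient tool for obtaining large amounts of data. This paper introduces the principles of crawlers and methods for analyzing web pages, presents the Scrapy framework, and uses Scrapy to crawl data from a website; finally, it processes the data with visualization tools so that the data can be analyzed more intuitively. The paper takes the recruitment website Lagou as its crawl target and summarizes the problems encountered during crawling together with their solutions. In addition, it optimizes the program using the Scrapy framework to improve crawling efficiency.
Keywords:	focused crawler; search strategy; Scrapy framework; full-site crawling; distributed crawling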
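The keywords name a focused crawler and a search strategy, but this page does not describe the authors' actual method. The sketch below illustrates one common focused-crawling strategy, best-first search, using only the Python standard library rather than Scrapy. It rests on several assumptions that are not from the paper: page relevance is a simple keyword count, links inherit their parent page's score as queue priority, the seed page is always expanded, off-topic pages (score 0) are visited but not expanded, and pages come from an in-memory dict instead of real HTTP requests. The names `PageParser`, `relevance`, `focused_crawl` and the `example.test` URLs are hypothetical.

```python
import heapq
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects hyperlink targets and text content from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the current page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        self.text_parts.append(data)


def relevance(text, keywords):
    """Hypothetical relevance score: total keyword occurrences (case-insensitive)."""
    lowered = text.lower()
    return sum(lowered.count(k.lower()) for k in keywords)


def focused_crawl(seed, fetch, keywords, max_pages=10):
    """Best-first focused crawl (illustrative sketch, not the paper's code).

    fetch(url) returns an HTML string or None. Links found on a page are
    queued with that page's relevance as priority; links on pages scoring 0
    are not followed, except on the seed page.
    """
    frontier = [(0, seed)]  # heapq is a min-heap, so scores are negated
    seen = {seed}           # URLs already queued, to avoid duplicates
    visited = []            # (url, score) in the order pages were crawled
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser(url)
        parser.feed(html)
        parser.close()  # flush any buffered trailing text
        score = relevance(" ".join(parser.text_parts), keywords)
        visited.append((url, score))
        if score == 0 and url != seed:
            continue  # prune: do not expand off-topic pages
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return visited


# Hypothetical in-memory site standing in for real HTTP fetches.
PAGES = {
    "http://example.test/": '<a href="/jobs">Jobs</a> <a href="/about">About</a>',
    "http://example.test/jobs": 'python crawler jobs <a href="/jobs/1">Python developer</a>',
    "http://example.test/about": 'company history <a href="/contact">Contact</a>',
    "http://example.test/jobs/1": "Python crawler engineer, Scrapy required",
    "http://example.test/contact": "email us",
}

result = focused_crawl("http://example.test/", PAGES.get, ["python", "scrapy"])
print(result)
```

In this toy run the off-topic `/about` page is fetched but not expanded, so `/contact` is never requested, while the on-topic `/jobs` branch is followed. In a Scrapy project, the same idea would typically live in the spider's link-following logic, with the scheduler's request priorities playing the role of the heap.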
Research on crawler program based on Internet

Authors:	GUO Yinfang, HAN Kai, GUO Fengming, WANG Guosheng, LI Xuemeng

Affiliation:	(Taiyuan University,Taiyuan 030032,China)

Abstract:	With the rapid development of the Internet, big data has become a buzzword in the Internet technology industry. Today, a crawler is undoubtedly a very convenient tool for obtaining large amounts of data. This paper first introduces the principles of Python crawlers and methods for analyzing web pages, presents the Scrapy framework, and then uses Scrapy to crawl data from a website. Finally, data visualization tools are used to process the data so that it can be analyzed more intuitively. The paper takes the recruitment website Lagou as its crawl target and summarizes the problems encountered during crawling together with their solutions. The program is also optimized with the Scrapy framework to improve crawling efficiency.

Keywords:	focused crawler; search strategy; Scrapy framework; whole-site crawling; distributed crawling
This article is indexed in VIP (Weipu) and other databases.
