
A Microblog Data Collection Method Based on Simulated Login Technology

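Cite this article:	SUN Qing-yun, WANG Jun-feng, ZHAO Zong-qu, GAO Meng-chao. A Microblog Data Collection Method Based on Simulated Login Technology[J]. Computer Technology and Development, 2014(3): 6-10.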

Authors:	SUN Qing-yun, WANG Jun-feng, ZHAO Zong-qu, GAO Meng-chao

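Affiliations:	[1] College of Computer Science, Sichuan University, Chengdu, Sichuan 610065 [2] College of Computer Science, Sichuan University, Chengdu, Sichuan 610065; National Key Laboratory of Fundamental Science on Synthetic Vision, Chengdu, Sichuan 610064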

Funding:	National Science and Technology Major Project (2012ZX10004-901001); National Natural Science Foundation of China (11102124)

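Abstract:	With the arrival of the Web 2.0 era, public opinion information arises and spreads faster on microblogs. To analyze microblog public opinion effectively, acquiring microblog data is especially important. Taking Sina Weibo as the object of study, this paper proposes a web crawler collection scheme based on simulated login. The scheme works around the limit on the number of calls that the microblog API imposes on developers, solves the problem that a traditional web crawler requires authentication, speeds up microblog data collection, and can obtain massive amounts of microblog data in a short time. Experiments show that a system built with this scheme collects microblog information quickly, is more flexible, and can supply a public opinion analysis system with a large volume of accurate data.
Keywords:	simulated login technology; web crawler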
A Microblog Data Collection Method Based on Simulated Login Technology

SUN Qing-yun, WANG Jun-feng, ZHAO Zong-qu, GAO Meng-chao. A Microblog Data Collection Method Based on Simulated Login Technology[J]. Computer Technology and Development, 2014(3): 6-10.

Authors:	SUN Qing-yun, WANG Jun-feng, ZHAO Zong-qu, GAO Meng-chao

Affiliation:	1. College of Computer Science, Sichuan University, Chengdu 610065, China; 2. National Key Laboratory of Fundamental Science on Synthetic Vision, Chengdu 610064, China

Abstract:	With the arrival of the Web 2.0 era, public sentiment information is generated rapidly and disseminated widely on microblogs. Collecting this information is therefore increasingly important for public sentiment analysis. This paper presents a Web crawler for the Sina microblog based on simulated login technology. The crawler avoids the limit on the number of microblog API calls imposed on developers, and it also solves the authentication problem faced by traditional Web crawlers. Because collection is faster, it can gather a huge amount of data in a short time. Experimental results show that the system improves microblog information collection speed, is more flexible, and can provide accurate data for a public sentiment analysis system.

Keywords:	microblog API; simulated login technology; Web crawler
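
The abstract names the technique but gives no implementation detail, so the following is only a minimal sketch of the general idea of simulated login: submit credentials once, keep the session cookie the site returns, and replay it on later page requests. It does not reproduce the authors' system or Sina Weibo's actual login procedure. The local stand-in server `DemoSite`, the helpers `simulated_login` and `fetch`, the form fields `user` and `password`, and the cookie name `session` are all hypothetical names chosen for illustration.

```python
import http.cookiejar
import threading
import urllib.error
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoSite(BaseHTTPRequestHandler):
    """Stand-in for a login-protected site (hypothetical, not Sina Weibo)."""

    def do_POST(self):
        # Accept one hard-coded account and hand back a session cookie.
        length = int(self.headers.get("Content-Length", 0))
        form = urllib.parse.parse_qs(self.rfile.read(length).decode())
        ok = form.get("user") == ["demo"] and form.get("password") == ["secret"]
        self.send_response(200 if ok else 403)
        if ok:
            self.send_header("Set-Cookie", "session=abc123; Path=/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        # Serve content only to requests that carry the session cookie.
        logged_in = "session=abc123" in self.headers.get("Cookie", "")
        body = b"post: hello" if logged_in else b"login required"
        self.send_response(200 if logged_in else 401)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def simulated_login(base_url, user, password):
    """Log in once and return an opener that replays the session cookie."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    data = urllib.parse.urlencode({"user": user, "password": password}).encode()
    opener.open(base_url + "/login", data=data).close()
    return opener


def fetch(opener, url):
    """Fetch a page through the authenticated opener."""
    with opener.open(url) as resp:
        return resp.read().decode()


# Start the stand-in site on a free local port.
server = HTTPServer(("127.0.0.1", 0), DemoSite)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

# Without logging in, the page request is rejected.
try:
    urllib.request.urlopen(base + "/timeline").close()
    anonymous_status = 200
except urllib.error.HTTPError as err:
    anonymous_status = err.code

# After the simulated login, the same request succeeds.
opener = simulated_login(base, "demo", "secret")
page = fetch(opener, base + "/timeline")

server.shutdown()
server.server_close()
```

In this sketch the crawler never calls a rate-limited API: once the opener holds the cookie, any number of pages can be requested through it like an ordinary logged-in browser session. A real site would typically add further steps (for example token exchange or password encryption) that are not modeled here.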
This article is indexed in VIP and other databases.
