
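Application of HTMLUNIT into Network Information-collecting System
Cite this article: CHEN Yong-jiang, ZHONG Zhao-man, CHEN Zong-hua. Application of HTMLUNIT into Network Information-collecting System[J]. Journal of Huaihai Institute of Technology, 2013, 0(4): 31-35
Authors: CHEN Yong-jiang  ZHONG Zhao-man  CHEN Zong-hua
Affiliation: Jiangsu Jinge Network Technology Co., Ltd., Lianyungang, Jiangsu 222006, China
Funding: Ministry of Science and Technology program for science and technology personnel serving enterprises (2009GJC10043)
Abstract: This paper first analyzes the shortcomings of the traditional HttpClient approach to fetching web page content. It then discusses how HTMLUNIT supports automated crawling of JavaScript-heavy pages, of data loaded asynchronously via Ajax, and of pages that require simulated user interaction, and gives examples and implementations. A comparative experiment on JavaScript parsing speed between HTMLUNIT and popular browser kernels is carried out, and conclusions are drawn from the analysis.

Keywords: collection system  HTMLUNIT  Java browser kernel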

Application of HTMLUNIT into Network Information-collecting System
CHEN Yong-jiang,ZHONG Zhao-man,CHEN Zong-hua. Application of HTMLUNIT into Network Information-collecting System[J]. Journal of Huaihai Institute of Technology:Natural Sciences Edition, 2013, 0(4): 31-35
Authors:CHEN Yong-jiang  ZHONG Zhao-man  CHEN Zong-hua
Affiliation:Jiangsu Jinge Network Technology Co., Ltd., Lianyungang 222006, China
Abstract:We first analyze the shortcomings of collecting web page information with the traditional HttpClient approach. We then focus on using HtmlUnit to collect information from pages that rely heavily on JavaScript or that load data asynchronously via Ajax, and discuss how to log in to websites with HtmlUnit from a Java program, providing examples and implementations. We also compare the JavaScript parsing speed of HtmlUnit with that of popular browser kernels and draw conclusions from the analysis.
Keywords:data-collecting system  HTMLUNIT  Java browser kernel
This article is indexed in VIP (维普) and other databases.