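An Ajax Crawling Algorithm Based on State Transition Graph
Cite this article: GUO Hao, LU Yu-liang, LIU Jin-hong. An Ajax crawling algorithm based on state transition graph[J]. Application Research of Computers, 2009, 26(11): 4266-4269. DOI: 10.3969/j.issn.1001-3695.2009.11.076
Authors: GUO Hao, LU Yu-liang, LIU Jin-hong
Affiliation: Department of Network, Electronic Engineering Institute, Hefei 230037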
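Abstract: Traditional Web crawlers cannot handle the problems that arise when crawling Ajax applications, such as JavaScript execution, state identification and switching, and duplicate-state detection. To address this, the paper first defines a state transition graph for Ajax applications and then designs an Ajax crawling algorithm based on it. The algorithm retrieves both the state information of an Ajax application and the Deep Web resources behind it. To improve crawling accuracy and reduce the number of states to be crawled, the algorithm is refined with Ajax fingerprinting and DOM structure filtering. Experimental results demonstrate the algorithm's effectiveness and performance.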

Keywords: Ajax crawler; state transition graph; Web crawler; Deep Web
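The abstract does not spell out the algorithm. What follows is a minimal sketch, under stated assumptions, of what breadth-first crawling of a state transition graph can look like. It is not the paper's method. A hypothetical application model stands in for a real JavaScript engine: `find_events(dom)` lists the events that can be fired in a state, and `fire(dom, event)` returns the resulting DOM. States are identified by hashing the full DOM string, and duplicate states are linked in the graph but never expanded again. Because the model can fire an event from any stored DOM, it sidesteps state switching. A browser-driven crawler would have to handle that step itself, for example by reloading the page and replaying the event path. All names here (`crawl`, `StateGraph`, `state_id`, `PAGES`) are illustrative.

```python
import hashlib
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical application model standing in for a JavaScript-capable browser:
# find_events(dom) lists the events that can be fired in a state, and
# fire(dom, event) returns the DOM produced by firing that event.
FindEvents = Callable[[str], List[str]]
FireEvent = Callable[[str, str], str]


def state_id(dom: str) -> str:
    """Identify a state by hashing its full DOM; identical DOMs map to one state."""
    return hashlib.sha1(dom.encode("utf-8")).hexdigest()


@dataclass
class StateGraph:
    """Nodes are DOM states; edges are (event, target state id) transitions."""
    states: Dict[str, str] = field(default_factory=dict)                 # id -> DOM
    edges: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)

    def add_state(self, dom: str) -> Tuple[str, bool]:
        """Insert a state if unseen; return its id and whether it was new."""
        sid = state_id(dom)
        is_new = sid not in self.states
        if is_new:
            self.states[sid] = dom
            self.edges[sid] = []
        return sid, is_new


def crawl(initial_dom: str, find_events: FindEvents, fire: FireEvent,
          max_states: int = 100) -> StateGraph:
    """Breadth-first construction of the state transition graph.

    max_states is a soft cap: no new state is expanded once it is reached,
    though the state being expanded may still add a few more.
    """
    graph = StateGraph()
    root, _ = graph.add_state(initial_dom)
    queue = deque([root])
    while queue and len(graph.states) < max_states:
        sid = queue.popleft()
        dom = graph.states[sid]
        for event in find_events(dom):
            target, is_new = graph.add_state(fire(dom, event))
            graph.edges[sid].append((event, target))
            if is_new:  # duplicate states are linked but never re-expanded
                queue.append(target)
    return graph


# Toy two-tab page: clicking either tab swaps the content panel.
PAGES = {
    "tab1": "<div id='tabs'><a>tab1</a><a>tab2</a></div><p>one</p>",
    "tab2": "<div id='tabs'><a>tab1</a><a>tab2</a></div><p>two</p>",
}

graph = crawl(PAGES["tab1"], lambda dom: ["tab1", "tab2"],
              lambda dom, event: PAGES[event])
print(len(graph.states), sum(len(e) for e in graph.edges.values()))  # prints: 2 4
```

In the toy run, clicking "tab1" from the initial state reproduces that state. The crawler therefore records the self-loop edge but does not enqueue the state a second time. This is the duplicate-state detection problem the abstract lists, reduced to its simplest form.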

Ajax crawling algorithm based on state transition graph
GUO Hao,LU Yu-liang,LIU Jin-hong. Ajax crawling algorithm based on state transition graph[J]. Application Research of Computers, 2009, 26(11): 4266-4269. DOI: 10.3969/j.issn.1001-3695.2009.11.076
Authors: GUO Hao, LU Yu-liang, LIU Jin-hong
Affiliation: Dept. of Network, Electronic Engineering Institute, Hefei 230037, China
Abstract: Traditional Web crawlers cannot meet the challenges of crawling Ajax applications, such as JavaScript execution, state identification and navigation, and duplicate-state elimination. To address these challenges, this paper introduced a state transition graph and, based on it, proposed an algorithm that retrieves Ajax states and the background Deep Web. To improve accuracy and eliminate unnecessary states, the algorithm was refined with Ajax fingerprinting and DOM filtering. The experimental results indicate the effectiveness and efficiency of the algorithm.
Keywords: Ajax crawler; state transition graph; Web crawler; Deep Web
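The abstract names Ajax fingerprinting and DOM filtering as refinements but does not define them. One common way to reduce duplicate states is to fingerprint a filtered view of the DOM: discard text nodes and volatile attributes, then hash what remains. The sketch below illustrates that idea only; it is not the paper's technique. The `structural_fingerprint` function and the `id`/`class` attribute whitelist are hypothetical choices. Only Python's standard `html.parser` and `hashlib` modules are used.

```python
import hashlib
from html.parser import HTMLParser
from typing import Iterable, List

# Hypothetical whitelist: only these attributes are treated as structural;
# text nodes and all other attributes (style, title, value, ...) are filtered out.
STRUCTURAL_ATTRS = ("id", "class")


class _StructureCollector(HTMLParser):
    """Collects a structure-only token stream from an HTML string."""

    def __init__(self, keep_attrs: Iterable[str]):
        super().__init__()
        self.keep_attrs = set(keep_attrs)
        self.tokens: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Keep only whitelisted attributes, sorted so attribute order is irrelevant.
        kept = sorted(f"{k}={v}" for k, v in attrs if k in self.keep_attrs)
        self.tokens.append(f"<{tag} {' '.join(kept)}>" if kept else f"<{tag}>")

    def handle_endtag(self, tag):
        self.tokens.append(f"</{tag}>")

    # handle_data is not overridden, so text content never reaches the tokens.


def structural_fingerprint(html: str, keep_attrs: Iterable[str] = STRUCTURAL_ATTRS) -> str:
    """Hash the filtered DOM structure so states differing only in text collide."""
    collector = _StructureCollector(keep_attrs)
    collector.feed(html)
    collector.close()
    return hashlib.sha1("".join(collector.tokens).encode("utf-8")).hexdigest()


a = "<ul id='news'><li>Item 1</li><li>Item 2</li></ul>"
b = "<ul id='news'><li>Breaking</li><li>Sports</li></ul>"
c = "<ul id='news'><li>Item 1</li></ul>"
print(structural_fingerprint(a) == structural_fingerprint(b))  # True
print(structural_fingerprint(a) == structural_fingerprint(c))  # False
```

A fingerprint like this could replace a full-DOM hash as a crawler's state identifier. The filtering level trades off the two goals the abstract names. Aggressive filtering leaves fewer states to crawl. However, states that differ only in text, such as two result pages with the same layout, would then be merged, and Deep Web content could be missed.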
This article is indexed in Wanfang Data and other databases.