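Deep Web Sources Focused Crawler

Cite this article:	LIN Chao, ZHAO Peng-peng, CUI Zhi-ming. Deep Web Sources Focused Crawler[J]. Computer Engineering, 2008, 34(7): 56-58

Authors:	LIN Chao, ZHAO Peng-peng, CUI Zhi-ming

Affiliation:	Institute of Intelligent Information Processing and Application, Suzhou University, Suzhou 215006 (all three authors)

Funding:	National Natural Science Foundation of China; Science and Technology Research Project of the Ministry of Education; Specialized Research Fund for the Doctoral Program of Higher Education; Jiangsu Province High-Tech Research and Development Program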

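Abstract:	A large number of pages on the Internet are generated dynamically by back-end databases. These pages cannot be accessed through traditional search engines and are known as the Deep Web. Data source discovery is a key step in large-scale Deep Web data source integration. This paper proposes a focused crawling algorithm for Deep Web data sources. When evaluating the importance of a link, it considers both the relevance of the page to the topic and link-related information. Experiments show that the method is effective.
Keywords:	Deep Web; data source; focused crawler; Bayes classifier
Article ID:	1000-3428(2008)07-0056-03
Revised:	2007-04-10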
Deep Web Sources Focused Crawler

LIN Chao,ZHAO Peng-peng,CUI Zhi-ming. Deep Web Sources Focused Crawler[J]. Computer Engineering, 2008, 34(7): 56-58

Authors:	LIN Chao, ZHAO Peng-peng, CUI Zhi-ming

Affiliation:	(Institute of Intelligent Information Processing and Application, Suzhou University, Suzhou 215006)

Abstract:	Many pages on the Internet are generated dynamically by back-end databases and cannot be reached by traditional search engines; this part of the Web is called the Deep Web. This paper proposes a focused crawling algorithm for Deep Web sources. When evaluating the importance of a hyperlink, it takes into account both the relevance of the page to the topic and link-related information. Experiments indicate that the method is effective.

Keywords:	Deep Web sources; focused crawler; Bayes classifier
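
The abstract states only that link importance combines page-topic relevance with link-related information, and the keywords mention a Bayes classifier; the actual scoring formula is not given here. The following is a minimal sketch of that general idea, not the authors' algorithm. The names `NaiveBayes`, `link_score`, `Frontier`, the equal weighting, and the use of anchor text plus URL tokens as "link-related information" are all assumptions made for illustration.

```python
import heapq
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Tiny two-class multinomial naive Bayes with Laplace smoothing.

    Hypothetical helper: both classes must receive at least one
    training document before prob_relevant is called.
    """

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, text, relevant):
        self.word_counts[relevant].update(tokenize(text))
        self.doc_counts[relevant] += 1

    def prob_relevant(self, text):
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total_docs = sum(self.doc_counts.values())
        log_post = {}
        for label in (True, False):
            total_words = sum(self.word_counts[label].values())
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab)))
            log_post[label] = score
        # Normalize in log space to avoid underflow on long texts.
        top = max(log_post.values())
        rel = math.exp(log_post[True] - top)
        irr = math.exp(log_post[False] - top)
        return rel / (rel + irr)


def link_score(classifier, page_text, anchor_text, url, weight=0.5):
    """Blend page relevance with link-context relevance (assumed form)."""
    page_rel = classifier.prob_relevant(page_text)
    link_rel = classifier.prob_relevant(anchor_text + " " + url)
    return weight * page_rel + (1 - weight) * link_rel


class Frontier:
    """Priority queue of unvisited URLs; the highest score pops first."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url, score):
        if url not in self._seen:
            self._seen.add(url)
            heapq.heappush(self._heap, (-score, url))

    def pop(self):
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score
```

In a crawler loop, each link extracted from a fetched page would be scored with `link_score` and pushed onto the `Frontier`, so that links judged more likely to lead to a searchable data source are fetched first. The weight of 0.5 is arbitrary and would need tuning against labeled data.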
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.