
Multi-domain deep Web crawler based on most efficient queries*

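Cite this article:	FENG Ming-yuan, LIN Huai-zhong. Multi-domain deep Web crawler based on most efficient queries*[J]. Application Research of Computers, 2009, 26(9): 3375-3377.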

Authors:	FENG Ming-yuan, LIN Huai-zhong

Affiliation:	College of Computer Science and Technology, Zhejiang University, Hangzhou 310027

Funding:	Supported by the Science and Technology Plan Fund of Zhejiang Province (2007C23086)

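Abstract:	Deep Web information is obtained by submitting query terms to search interfaces on Web pages. General-purpose search engines crawl pages by following hyperlinks and therefore cannot index deep Web data. To address this problem, this paper presents a deep Web crawler based on most efficient queries: it generates the most efficient queries from clustered Web pages, submits them automatically, and finally indexes the query results. Experiments show that the system can crawl multi-domain deep Web data automatically and efficiently.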
Keywords:	deep Web; deep Web crawler; most efficient query; page clustering
Multi-domain deep Web crawler based on most efficient queries

FENG Ming-yuan, LIN Huai-zhong. Multi-domain deep Web crawler based on most efficient queries[J]. Application Research of Computers, 2009, 26(9): 3375-3377.

Authors:	FENG Ming-yuan, LIN Huai-zhong

Affiliation:	College of Computer Science & Technology, Zhejiang University, Hangzhou 310027, China

Abstract:	Deep Web information can only be obtained through queries submitted to search forms in Web pages. Traditional search engines, which crawl by following hyperlinks, have difficulty indexing deep Web data. To address this problem, this paper proposes a deep Web crawler based on most efficient queries. It generates the most efficient queries from clustered Web pages, submits the queries, and indexes the returned results. Experiments show that it can crawl data from multi-domain deep Web sources automatically and efficiently.

Keywords:	deep Web
This article is indexed in CNKI, Wanfang Data, and other databases.