
基于多Agent系统的定题爬虫算法

Cite this article:	徐照财, 程显毅. 基于多Agent系统的定题爬虫算法[J]. 计算机工程, 2008, 34(16): 204-206.

Authors:	徐照财 (XU Zhao-cai), 程显毅 (CHENG Xian-yi)

Affiliation:	School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013

Funding:	Key project supported by the Jiangsu Province Science and Technology Research Fund

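Abstract:	Focused crawling is a key technology for focused search engines. This paper proposes a crawling algorithm based on a multi-Agent system. It fetches topic-relevant pages by filtering on ontology-based semantic topic keywords, and uses the semantic network of an ontology base to filter synonyms and near-synonyms within the ontology domain. Topic-relevant pages are predicted from the different weights that HTML tags assign to recognized keywords and from hyperlink anchor text. The agents interact through a blackboard communication mechanism. Experimental results show that the algorithm gives some improvement in the precision and recall of the crawled pages.
Keywords:	focused crawling; topic keyword filtering; semantics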
Focused Crawling Algorithm Based on Multi-agent System

XU Zhao-cai, CHENG Xian-yi. Focused Crawling Algorithm Based on Multi-agent System[J]. Computer Engineering, 2008, 34(16): 204-206.

Authors:	XU Zhao-cai, CHENG Xian-yi

Affiliation:	School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013

Abstract:	Focused crawling is a key technology for focused search engines. This paper presents a focused crawling algorithm based on a multi-Agent system. The algorithm collects the URLs of topic-relevant pages using an ontology-based topic keyword filtering method, in which the semantic network of the ontology is used to filter words of similar meaning. Page relevance is predicted from keywords weighted by the HTML tags in which they appear and from hyperlink anchor text. The agents in the system model interact through a blackboard communication mechanism. Experimental results show that the system improves both the precision and the recall of the crawled pages.
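
The tag-weighted keyword filtering described in the abstract can be illustrated with a minimal sketch. The abstract gives neither the weights nor the ontology, so the `TAG_WEIGHTS` values, the `SYNONYMS` table (a stand-in for the ontology's semantic network), and the names `expand_keywords`, `WeightedKeywordScorer`, and `page_score` are all hypothetical and are not the authors' implementation.

```python
from html.parser import HTMLParser

# Hypothetical tag weights; the abstract does not state the paper's values.
TAG_WEIGHTS = {"title": 5.0, "h1": 3.0, "h2": 2.0, "a": 2.0, "b": 1.5}
DEFAULT_WEIGHT = 1.0

# Hypothetical synonym table standing in for the ontology's semantic network.
SYNONYMS = {"crawler": {"crawler", "spider", "robot"}}


def expand_keywords(keywords, synonyms=SYNONYMS):
    """Return the topic keywords plus their listed synonyms, lowercased."""
    expanded = set()
    for kw in keywords:
        kw = kw.lower()
        expanded.add(kw)
        expanded |= synonyms.get(kw, set())
    return expanded


class WeightedKeywordScorer(HTMLParser):
    """Sum keyword hits, each weighted by the heaviest enclosing tag."""

    def __init__(self, keywords):
        super().__init__()
        self.keywords = expand_keywords(keywords)
        self.stack = []   # currently open tags
        self.score = 0.0

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        weight = max(
            (TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.stack),
            default=DEFAULT_WEIGHT,
        )
        for word in data.lower().split():
            if word.strip(".,;:!?()\"'") in self.keywords:
                self.score += weight


def page_score(html, keywords):
    """Return the weighted keyword score of an HTML page for a topic."""
    scorer = WeightedKeywordScorer(keywords)
    scorer.feed(html)
    scorer.close()
    return scorer.score
```

In this sketch a hit inside `<title>` counts more than one in body text, and anchor text inside `<a>` is weighted above plain text, so a crawler could compare `page_score` against a threshold (also an assumption) to decide whether a page or link is worth following.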

Keywords:	focused crawling; theme key words filtering; semantics
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.
